Make WordPress Core

Opened 5 years ago

Last modified 5 years ago

#47514 new enhancement

Change priority of make_clickable callback to boost performance

Reported by: olliverh87's profile olliverh87 Owned by:
Milestone: Awaiting Review Priority: normal
Severity: minor Version: 5.2.1
Component: Comments Keywords:
Focuses: performance Cc:

Description

Hi!

Preemptive: please excuse me if this ticket is inappropriate. Since WordPress' HackerOne states explicit disinterest in DoS related issues, I hesitated at first to create this ticket here. However, the effectiveness of the DoS attack and the signifcant performance boost I gained after I understood the logic behind the attack and changed a line of code made me think this might be worth sharing here.

Last week one of our clients who runs a very high profile site with WordPress fell victim to a sophisticated DoS attack where attackers spammed the site with comments containing thousands of strings such as ftp.z. The processing of requests that displayed these comments took a huge portion of the server's available RAM.

Long story short, we identified the following callbacks as the troublemakers:

  1. make_clickable
  2. wp_kses_post
  3. wptexturize
  4. convert_chars
  5. convert_smilies
  6. force_balance_tags
  7. wpautop

They all have in common that they operate on the entire comment string and attempt to parse out all HTML tags contained within them via REGEX functions, which takes up a lot of RAM. As a solution for our client, we established some more spam filtering for created comments.
However, out of personal interest in how this attack worked I tried to understand the logic behind the attack.

The reason the attackers had created comments containing thousands of strings such as ftp.x is that the make_clickable callback converts these strings into full <a> tags via PHP's built-in preg_replace_callback.

Per default, comments can only contain 65500 characters. The string ftp.x (including the whitespace) contains only 6 characters. Thus it is possible to have the string ftp.x contained 10916 times within a single comment. make_clickable then turns those 10916 strings into 10916 tags, leading to a comment containing over 500.000 characters. Since make_clickable runs first, all of the other 6 callbacks that operate on the comment string use REGEX to parse 10916 * 2 (one opening and one closing) tags, leading to a huge memory overhead.

I figured that changing the priority of the make_clickable filter so that it runs after force_balance_tags significantly increased the efficiency of the RAM usage of our server when confronted with comments containing links. Since the <a> tags are properly generated by WP itself, there is no reason to have it run through functions such as force_balance_tags.

To reproduce the problem, just open a test setup of WordPress and navigate to a post that has comments enabled. Create a new one and open the developer console of the browser and type

comment.value = "ftp.x ".repeat(10916)

Submit the comment and notice how long it takes to load the site. Imagine hundreds of such comments being loaded by hundreds of bots at the same time, then you get what happened to our client's site.

The following shows the patch I applied (also attached as a diff file):

--- a.php	2019-06-09 22:47:58.744746903 +0200
+++ b.php	2019-06-09 22:49:00.080742874 +0200
@@ -190,7 +190,7 @@
 
 add_filter( 'comment_text', 'wptexturize' );
 add_filter( 'comment_text', 'convert_chars' );
-add_filter( 'comment_text', 'make_clickable', 9 );
+add_filter( 'comment_text', 'make_clickable', 26 );
 add_filter( 'comment_text', 'force_balance_tags', 25 );
 add_filter( 'comment_text', 'convert_smilies', 20 );
 add_filter( 'comment_text', 'wpautop', 30 );

Best regards,
Olliver

Attachments (1)

47514.diff (470 bytes) - added by olliverh87 5 years ago.

Download all attachments as: .zip

Change History (4)

@olliverh87
5 years ago

#1 @SergeyBiryukov
5 years ago

  • Focuses performance added

#2 @SergeyBiryukov
5 years ago

Hi @olliverh87, welcome to WordPress Trac! Thank you for the detailed problem description.

Since the <a> tags are properly generated by WP itself, there is no reason to have it run through functions such as force_balance_tags.

What if there's an unclosed link in the comment, e.g. if the commenter forgot to close the <a> tag?

<a href="#" rel="nofollow">test

 ftp.x

Without the patch, the first link is properly closed before the second one is opened (even though there are extra </p><p> tags in between:

<p><a href="#" rel="nofollow">test</p>
<p> </a><a href="http://ftp.x" rel="nofollow">http://ftp.x</a></p>

With the patch, the links are nested, causing invalid markup:

<p><a href="#" rel="nofollow">test</p>
<p> <a href="http://ftp.x" rel="nofollow">http://ftp.x</a></a></p>
Last edited 5 years ago by SergeyBiryukov (previous) (diff)

#3 @olliverh87
5 years ago

Hi @SergeyBiryukov ,

thank you so much for your feedback! I did not think of this scenario, please let me apologize.

It seems that force_balance_tags does a great job at producing mostly valid markup, so to address the issue with the test case you showed, it seems to be the easiest solution to make a trade off and have make_clickable run after the other filters (such as convert_smilies but run before force_balance_tags.

I added the following snippet to the functions.php file of my active theme on my test-setup to do some performance measurements:

<?php
add_action('init', function() {

    $comment_string = "x " . str_repeat("ftp.x ", 10916);

    // 1st test, run with default configuration
    $default_time = microtime(true);
    apply_filters('comment_text', $comment_string);
    $default_time = microtime(true) - $default_time;


    //2nd test, run with changed priorities (make_clickable before force_balance_tags)
    remove_filter('comment_text', 'make_clickable', 9);
    add_filter('comment_text', 'make_clickable', 24);
    $changed_time = microtime(true);
    apply_filters('comment_text', $comment_string);
    $changed_time = microtime(true) - $changed_time;

    // echo results
    echo "Default configuration runtime: " . $default_time . " secs" . "<br />";
    echo "Changed configuration runtime: " . $changed_time . " secs" . "<br /> <br />";


    exit;

});

Running it for a 100 times showed this modification increases performance by ~23% on average, while still producing valid markup as addressed with your test case. What do you think?

The diff would then become:

--- a.php	2019-06-09 22:47:58.744746903 +0200
+++ b.php	2019-06-09 22:49:00.080742874 +0200
@@ -190,7 +190,7 @@
 
 add_filter( 'comment_text', 'wptexturize' );
 add_filter( 'comment_text', 'convert_chars' );
-add_filter( 'comment_text', 'make_clickable', 9 );
+add_filter( 'comment_text', 'make_clickable', 24 );
 add_filter( 'comment_text', 'force_balance_tags', 25 );
 add_filter( 'comment_text', 'convert_smilies', 20 );
 add_filter( 'comment_text', 'wpautop', 30 );
Note: See TracTickets for help on using tickets.