Make WordPress Core

Opened 10 months ago

Closed 10 months ago

Last modified 10 months ago

#59607 closed defect (bug) (fixed)

HTML API: Fix - avoid calling subclass method while internally scanning in Tag Processor

Reported by: dmsnell's profile dmsnell Owned by: sergeybiryukov's profile SergeyBiryukov
Milestone: 6.4 Priority: normal
Severity: normal Version: 6.4
Component: HTML API Keywords: has-patch has-unit-tests has-testing-info
Focuses: Cc:

Description

After modifying tags in the HTML API, the Tag Processor backs up to before the tag being modified and then re-parses its attributes. This saves on the code-complexity involved in applying updates, which have already been transformed to "lexical updates" by the time they are applied.

In order to do that, get_updated_html() calls next_tag() to reuse its logic. Unfortunately, as a public method, subclasses may change the behavior of that method, and the HTML Processor does just this. It maintains an HTML stack of open elements and when the Tag Processor calls this method to re-scan a tag and its attributes, it leads to a broken stack.

To fix this, this patch replaces the call to next_tag() with a more appropriate reapplication of its internal parsing logic to rescan the tag name and its attributes. Given the limited nature of what's occurring in get_updated_html() this should bring with it certain guarantees that no HTML structure is being changed (that structure will only be changed by subclasses like the HTML Processor).

This patch resolves an issue discovered by @adamziel during testing of the HTML Processor.

---

No code in Gutenberg relies on the HTML Processor and no code has appeared in the WP Directory relying on it, so the probably extent of the impact of this defect is minimal at worst. It's possible that this bug has never been triggered on a real site.

One could argue that this patch belongs in 6.4 because it fixes a bug in get_breadcrumbs() and matches_breadcrumbs(), but given the severity and expose I don't see a compelling argument to rush it into the release, if people don't want to add yet-another change.

Change History (11)

This ticket was mentioned in PR #5475 on WordPress/wordpress-develop by @dmsnell.


10 months ago
#1

  • Keywords has-patch has-unit-tests added

Trac ticket: Core-59607

After modifying tags in the HTML API, the Tag Processor backs up to before the tag being modified and then re-parses its attributes. This saves on the code-complexity involved in applying updates, which have already been transformed to "lexical updates" by the time they are applied.

In order to do that, get_updated_html() calls next_tag() to reuse its logic. Unfortunately, as a public method, subclasses may change the behavior of that method, and the HTML Processor does just this. It maintains an HTML stack of open elements and when the Tag Processor calls this method to re-scan a tag and its attributes, it leads to a broken stack.

To fix this, this patch replaces the call to next_tag() with a more appropriate reapplication of its internal parsing logic to rescan the tag name and its attributes. Given the limited nature of what's occurring in get_updated_html() this should bring with it certain guarantees that no HTML structure is being changed (that structure will only be changed by subclasses like the HTML Processor).

This patch resolves an issue discovered by @adamziel during testing of the HTML Processor.

@dmsnell commented on PR #5475:


10 months ago
#2

cc: @adamziel @westonruter @ockham

#3 @SergeyBiryukov
10 months ago

  • Milestone changed from Awaiting Review to 6.4

Thanks for the ticket!

I think this is straightforward enough to include in 6.4, moving to the milestone for visibility.

Last edited 10 months ago by SergeyBiryukov (previous) (diff)

#4 @nicolefurlan
10 months ago

  • Keywords needs-testing-info added

Thank you @dmsnell!

Would you be able to add testing info so that we can get some testers to submit a test report or two for this change?

Would you like to take ownership of this ticket?

#5 @dmsnell
10 months ago

Thanks @nicolefurlan - for testing, it would be helpful for someone to follow the pattern in the added unit test. There should be no existing code affected by this change, so I think any testing is going to follow one of three approaches:

  • replicate the unit test
  • smoke test WordPress with this patch and ensure that things aren't entirely obviously broken (e.g. do heading block H2's have the wp-block-heading class). most of these smoke-tests are already covered by existing tests in the test suite.
  • think up some interesting way to try and break the HTML processing, which is what led to this patch in the first place.

I'm sorry I don't have better testing instructions, but on the other hand, at least the test suite is fairly comprehensive, so I'm confident this isn't causing blatant and unwanted side-effects

#6 @dmsnell
10 months ago

to make the unit test clearer, without this patch the breadcrumbs misreport after updating the tag's attribute. after calling get_updated_html() the method get_breadcrumbs() incorrectly reports array( 'HTML', 'BODY', 'DIV', 'BUTTON', 'B', 'B' ), because when next_tag() was called the HTML Processor thought it was entering the B element for the first time and appended it to the stack of open elements.

#7 @nicolefurlan
10 months ago

  • Keywords has-testing-info added; needs-testing-info removed

Sounds good @dmsnell! Thank you. Hopefully we can get this into RC1 next week. If you're able to rally folks to test this, that would certainly help! :)

#8 @SergeyBiryukov
10 months ago

  • Owner set to SergeyBiryukov
  • Resolution set to fixed
  • Status changed from new to closed

In 56941:

HTML API: Avoid calling subclass method while internally scanning in Tag Processor.

After modifying tags in the HTML API, the Tag Processor backs up to before the tag being modified and then re-parses its attributes. This saves on the code complexity involved in applying updates, which have already been transformed to “lexical updates” by the time they are applied.

In order to do that, ::get_updated_html() called ::next_tag() to reuse its logic. However, as a public method, subclasses may change the behavior of that method, and the HTML Processor does just this. It maintains an HTML stack of open elements and when the Tag Processor calls this method to re-scan a tag and its attributes, it leads to a broken stack.

This commit replaces the call to ::next_tag() with a more appropriate reapplication of its internal parsing logic to rescan the tag name and its attributes. Given the limited nature of what's occurring in ::get_updated_html(), this should bring with it certain guarantees that no HTML structure is being changed (that structure will only be changed by subclasses like the HTML Processor).

Follow-up to [56274], [56702].

Props dmsnell, zieladam, nicolefurlan.
Fixes #59607.

@SergeyBiryukov commented on PR #5475:


10 months ago
#9

Thanks for the PR! Merged in r56941.

@dmsnell commented on PR #5475:


10 months ago
#10

thanks @SergeyBiryukov!

@andrewserong commented on PR #5475:


10 months ago
#11

I'm not sure if it's related, but Gutenberg trunk PHP tests have started failing, in particular WP_Directive_Processor tests:

There were 2 failures:

1) WP_Directive_Processor_Test::test_set_inner_html_subsequent_updates_on_the_same_tag_work
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-'<div>outside</div><section>This is the even newer section content.</section>'
+'<This is the even newer section content.</section>'

/var/www/html/wp-content/plugins/gutenberg/phpunit/experimental/interactivity-api/class-wp-directive-processor-test.php:90
phpvfscomposer:///var/www/html/wp-content/plugins/gutenberg/vendor/phpunit/phpunit/phpunit:60

2) WP_Directive_Processor_Test::test_set_inner_html_preceded_by_set_attribute_works
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-'<div>outside</div><section id="thesection">This is the new section content.</section>'
+'<This is the new section content.</section>'

/var/www/html/wp-content/plugins/gutenberg/phpunit/experimental/interactivity-api/class-wp-directive-processor-test.php:108
phpvfscomposer:///var/www/html/wp-content/plugins/gutenberg/vendor/phpunit/phpunit/phpunit:60

Could that be related to this change, since WP_Directive_Processor extends WP_HTML_Tag_Processor?

There's also a bit of discussion over in the #core-editor Slack channel: https://wordpress.slack.com/archives/C02QB2JS7/p1697514694299429?thread_ts=1697471144.510409&cid=C02QB2JS7

Note: See TracTickets for help on using tickets.