WordPress/wp-includes/html-api
dmsnell b342d5c7b8 HTML API: Join text nodes on invalid-tag-name boundaries.
A fix was introduced to the Tag Processor to ensure that contiguous text
in an HTML document emerges as a single text node spanning the full
sequence. Unfortunately, that patch was marginally over-zealous in
checking if a "<" started a syntax token or not. It used the following:

{{{
<?php
if ( 'A' <= $c && 'z' >= $c ) { ... }
}}}

This was based on the assumption that the A-Z and a-z letters are
contiguous in the ASCII range; they aren't: six characters
([, \, ], ^, _, and `) sit between "Z" and "a". As a result, in some
cases the parser created a text boundary when it didn't need to.
Text boundaries can be surprising and can be created when reaching
invalid syntax, HTML comments, and other hidden elements, so
semantically this wasn't a major bug, but it was an aesthetic
challenge.

In this patch the check compares the character against the upper- and
lower-case letter ranges separately, so that only letters that could
form tag names match.

{{{
<?php
if ( ( 'A' <= $c && 'Z' >= $c ) || ( 'a' <= $c && 'z' >= $c ) ) { ... }
}}}

This solves the problem and ensures that contiguous text appears
as a single text node when scanning tokens.
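
As an illustration only, the standalone sketch below compares the two
checks over the ASCII range and lists the characters that the broad
check accepts but the corrected one rejects. The function names are
hypothetical and do not appear in the Tag Processor; the two
conditions are the ones quoted in this message.

{{{
<?php
// Hypothetical helper wrapping the original, over-broad check.
function is_alpha_overbroad( string $c ): bool {
	return 'A' <= $c && 'z' >= $c;
}

// Hypothetical helper wrapping the corrected check.
function is_ascii_alpha( string $c ): bool {
	return ( 'A' <= $c && 'Z' >= $c ) || ( 'a' <= $c && 'z' >= $c );
}

// Collect every ASCII character the two checks disagree on.
$gap = '';
for ( $code = 0; $code < 128; $code++ ) {
	$c = chr( $code );
	if ( is_alpha_overbroad( $c ) && ! is_ascii_alpha( $c ) ) {
		$gap .= $c;
	}
}

echo $gap, "\n"; // Prints: [\]^_`
}}}

Under the old check, a "<" followed by any of these six characters
would have been treated as though it might start a syntax token.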

Developed in https://github.com/WordPress/wordpress-develop/pull/6041
Discussed in https://core.trac.wordpress.org/ticket/60385

Follow-up to [57489]
Props dmsnell, jonsurrell
Fixes #60385


Built from https://develop.svn.wordpress.org/trunk@57542


git-svn-id: http://core.svn.wordpress.org/trunk@57043 1a063a9b-81f0-0310-95a4-ce76da25c4cd
2024-02-06 19:23:13 +00:00
class-wp-html-active-formatting-elements.php HTML API: Apply linting changes to `@TODO` comments. 2023-12-20 12:36:31 +00:00
class-wp-html-attribute-token.php HTML API: Track spans of text with (offset, length) instead of (start, end). 2023-12-10 13:19:28 +00:00
class-wp-html-open-elements.php HTML API: Add support for list elements. 2024-01-10 14:05:17 +00:00
class-wp-html-processor-state.php HTML API: Store current token reference in HTML Processor state. 2023-09-12 15:12:17 +00:00
class-wp-html-processor.php HTML API: Fix typo setting the wrong self-closing flag. 2024-02-02 23:27:14 +00:00
class-wp-html-span.php HTML API: Track spans of text with (offset, length) instead of (start, end). 2023-12-10 13:19:28 +00:00
class-wp-html-tag-processor.php HTML API: Join text nodes on invalid-tag-name boundaries. 2024-02-06 19:23:13 +00:00
class-wp-html-text-replacement.php HTML API: Track spans of text with (offset, length) instead of (start, end). 2023-12-10 13:19:28 +00:00
class-wp-html-token.php HTML-API: Prevent unintended behavior when WP_HTML_Token is unserialized. 2023-12-06 16:05:19 +00:00
class-wp-html-unsupported-exception.php HTML-API: Introduce minimal HTML Processor. 2023-07-20 13:43:25 +00:00