This patch introduces two related changes:
- It adds missing subclass methods on the HTML Processor which needed
to be implemented since it started visiting virtual nodes. These
methods need to account for the fact that not all tokens truly exist.
- It adds a new concept and internal method, `is_virtual()`, indicating
if the currently-matched token comes from the raw text in the input
HTML document or if it was the byproduct of semantic parsing rules.
This internal method and new vocabulary around token provenance
considerably simplifies the logic spread throughout the rest of the
class and its subclass methods.
Developed in https://github.com/WordPress/wordpress-develop/pull/6860
Discussed in https://core.trac.wordpress.org/ticket/61348
Follow-up to [58304].
Props dmsnell, jonsurrell, gziolo.
See #61348.
Built from https://develop.svn.wordpress.org/trunk@58558
git-svn-id: http://core.svn.wordpress.org/trunk@58006 1a063a9b-81f0-0310-95a4-ce76da25c4cd
HTML is a kind of short-hand for a DOM structure. This means that there are
many cases in HTML where an element's opening tag or closing tag is missing (or
both). This is because many of the parsing rules imply creating elements in the
DOM which may not exist in the text of the HTML.
The HTML Processor, being the higher-level counterpart to the Tag Processor, is
already aware of these nodes, but since it's inception has not paused on them
when scanning through a document. Instead, these are visible when pausing on a
child of such an element, but otherwise not seen.
In this patch the HTML Processor starts exposing those implicitly-created nodes,
including opening tags, and closing tags, that aren't foudn in the text content
of the HTML input document.
Previously, the sequence of matched tokens when scanning with
`WP_HTML_Processor::next_token()` would depend on how the HTML document was written,
but with this patch, all semantically equal HTML documents will parse and scan in
the same exact manner, presenting an idealized or "perfect" view of the document
the same way as would occur when traversing a DOM in a browser.
Developed in https://github.com/WordPress/wordpress-develop/pull/6348
Discussed in https://core.trac.wordpress.org/ticket/61348
Props audrasjb, dmsnell, gziolo, jonsurrell.
Fixes#61348.
Built from https://develop.svn.wordpress.org/trunk@58304
git-svn-id: http://core.svn.wordpress.org/trunk@57761 1a063a9b-81f0-0310-95a4-ce76da25c4cd