
Try/html api with packages update and layout.php back ports #3955

Closed

ntsekouras commented Feb 1, 2023

PR which includes:

  1. It is branched from the HTML Tag Processor.
  2. It includes the WordPress packages update with the Gutenberg 15.0.1 changes.
  3. It includes layout support back ports from these Gutenberg PRs: Layout: ensure block content is always returned as a string after processing gutenberg#45330, Try adding layout classnames to inner block wrapper gutenberg#44600, Add Layout controls to children of Flex layout blocks gutenberg#45364, Layout child fixed size should not be fixed by default and should always have a value set gutenberg#46139.

Notes

  1. I'll check it thoroughly tomorrow regarding the layout additions, although I think they are fine.
  2. Currently some blocks (Calendar and Gallery) might cause a fatal error, but these are going to be fixed with this PR.

Trac ticket:


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

dmsnell and others added 23 commits January 26, 2023 15:48
This commit pulls in the HTML Tag Processor from the Gutenberg repository.
The Tag Processor attempts to be an HTML5-spec-compliant parser that
provides the ability in PHP to find specific HTML tags and then add,
remove, or update attributes on that tag. It provides a safe and reliable
way to modify the attributes on HTML tags.

```php
// Add missing `rel` attribute to links.
$p = new WP_HTML_Tag_Processor( $block_content );
while ( $p->next_tag( 'A' ) ) {
    // set_attribute() takes the attribute name and its value.
    if ( empty( $p->get_attribute( 'rel' ) ) ) {
        $p->set_attribute( 'rel', 'noopener nofollow' );
    }
}
return $p->get_updated_html();
```

Introduced originally in WordPress/gutenberg#42485 and developed within
the Gutenberg repository, this HTML parsing system was built in order
to address a persistent need (properly modifying HTML tag attributes)
and was motivated after a sequence of block editor defects which stemmed
from mismatches between actual HTML code and expectations for HTML
input running through existing naive string-search-based solutions.

The Tag Processor is intended to operate fast enough to avoid being an
obstacle on page render while using as little memory overhead as possible.
It is practically a zero-memory-overhead system, and only allocates memory
as changes to the input HTML document are enqueued, releasing that memory
when flushing those changes to the document, moving on to find the next
tag, or flushing its entire output via `get_updated_html()`.

Care has been taken to ensure that the Tag Processor will not be confused
by unexpected or non-normative HTML input, including issues arising from
quoting, from different syntax rules within `<title>`, `<textarea>`, and
`<script>` tags, from the appearance of rare but legitimate comment and
XML-like regions, and from a variety of syntax abnormalities such as
unbalanced tags, incomplete syntax, and overlapping tags.

The Tag Processor is constrained to parsing an HTML document as a stream
of tokens. It will not build an HTML tree or generate a DOM representation
of a document. It is designed to start at the beginning of an HTML
document and linearly scan through it, potentially modifying that document
as it scans. It has no access to the markup inside or around tags and it
has no ability to determine which tag openers and tag closers belong to each
other, or determine the nesting depth of a given tag.

It includes a primitive bookmarking system to remember tags it has previously
visited. These bookmarks refer to specific tags, not to string offsets, and
continue to point to the same place in the document as edits are applied. By
asking the Tag Processor to seek to a given bookmark, it's possible to back
up and reprocess content that has already been traversed.
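
As a rough illustration of how a bookmark can keep naming the same tag while edits land before it, here is a toy model in plain PHP 8. It is an assumption-laden sketch, not the Tag Processor's code: `SketchDocument` and its `mark()`, `replace()` and `bookmark_offset()` methods are hypothetical, and the bookmark is reduced to one byte offset that is shifted by the length change of every earlier edit.

```php
<?php
// Toy model, not WordPress code: callers refer to a bookmark by name, while
// internally it is stored as a byte offset that is shifted by every edit made
// before it, so the name keeps pointing at the same tag.

final class SketchDocument {
	/** @var array<string,int> Bookmark name => byte offset of the tag's "<". */
	private array $bookmarks = array();

	public function __construct( public string $html ) {}

	// Hypothetical: remember the tag that starts at $offset under $name.
	public function mark( string $name, int $offset ): void {
		$this->bookmarks[ $name ] = $offset;
	}

	public function bookmark_offset( string $name ): int {
		return $this->bookmarks[ $name ];
	}

	// Replace $length bytes at $start with $text, then shift every bookmark
	// that sits at or after the end of the replaced range.
	public function replace( int $start, int $length, string $text ): void {
		$this->html = substr_replace( $this->html, $text, $start, $length );
		$delta      = strlen( $text ) - $length;
		foreach ( $this->bookmarks as $name => $offset ) {
			if ( $offset >= $start + $length ) {
				$this->bookmarks[ $name ] = $offset + $delta;
			}
		}
	}
}

$doc = new SketchDocument( '<div><p>Hi</p></div>' );
$doc->mark( 'para', strpos( $doc->html, '<p>' ) ); // offset 5
$doc->replace( 4, 0, ' class="wrap"' );            // add an attribute to <div>
echo $doc->html, PHP_EOL;                          // <div class="wrap"><p>Hi</p></div>
echo substr( $doc->html, $doc->bookmark_offset( 'para' ), 3 ), PHP_EOL; // <p>
```

The toy ignores bookmarks that fall inside a replaced range; it only shows why an offset that is adjusted on every edit behaves like a reference to a tag rather than to a fixed string position.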

Attribute values are sanitized with `esc_attr()` and rendered as double-quoted
attributes. On read they are unescaped and unquoted. Authors wishing to rely on
the Tag Processor therefore are free to pass around data as normal strings.
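
To make the escape-on-write, decode-on-read round trip concrete, here is a minimal sketch using only PHP built-ins. It assumes `htmlspecialchars()` with `ENT_QUOTES` as a stand-in for `esc_attr()` (which is WordPress-specific and differs in details) and `html_entity_decode()` as a stand-in for the Tag Processor's decoding on read; both `sketch_*` functions are hypothetical.

```php
<?php
// Sketch only: htmlspecialchars()/html_entity_decode() stand in for
// WordPress's esc_attr() and the Tag Processor's decoding on read.

// Hypothetical helper: render name="value" with the value escaped so it
// cannot break out of the double quotes.
function sketch_render_attribute( string $name, string $value ): string {
	return $name . '="' . htmlspecialchars( $value, ENT_QUOTES, 'UTF-8' ) . '"';
}

// Hypothetical helper: turn an escaped attribute value back into plain text.
function sketch_decode_attribute_value( string $escaped ): string {
	return html_entity_decode( $escaped, ENT_QUOTES, 'UTF-8' );
}

$value    = 'Tom & "Jerry" <3';
$rendered = sketch_render_attribute( 'title', $value );
echo $rendered, PHP_EOL; // title="Tom &amp; &quot;Jerry&quot; &lt;3"

// Strip the `title="` prefix and the closing quote, then decode.
$raw     = substr( $rendered, strlen( 'title="' ), -1 );
$decoded = sketch_decode_attribute_value( $raw );
var_dump( $decoded === $value ); // bool(true)
```

The caller only ever handles the plain string `$value`; the escaped form exists only inside the document.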

Convenience methods for adding and removing CSS class names exist so that
callers do not need to parse and rewrite the `class` attribute themselves.

```php
// Update heading block class names.
$p = new WP_HTML_Tag_Processor( $html );
while ( $p->next_tag() ) {
	// get_tag() returns the upper-case tag name.
	switch ( $p->get_tag() ) {
		case 'H1':
		case 'H2':
		case 'H3':
		case 'H4':
		case 'H5':
		case 'H6':
			$p->remove_class( 'wp-heading' );
			$p->add_class( 'wp-block-heading' );
			break;
	}
}
return $p->get_updated_html();
```
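
For comparison, this is roughly the bookkeeping that `add_class()` and `remove_class()` take off the caller's hands when it is done by hand on a raw `class` string. It is a sketch under assumptions: the `sketch_*` helpers are hypothetical, they treat the attribute value as a whitespace-separated list, and they ignore escaping and locating the tag, which the Tag Processor also handles.

```php
<?php
// Hypothetical helpers, not part of WordPress: they show the manual parsing
// that add_class() and remove_class() save callers from doing.

// Split a class attribute value on ASCII whitespace, dropping empty entries.
function sketch_class_list( string $class_attr ): array {
	return preg_split( '/[ \t\n\f\r]+/', $class_attr, -1, PREG_SPLIT_NO_EMPTY );
}

// Append $name unless it is already present.
function sketch_add_class( string $class_attr, string $name ): string {
	$classes = sketch_class_list( $class_attr );
	if ( ! in_array( $name, $classes, true ) ) {
		$classes[] = $name;
	}
	return implode( ' ', $classes );
}

// Drop every occurrence of $name.
function sketch_remove_class( string $class_attr, string $name ): string {
	$classes = array_filter(
		sketch_class_list( $class_attr ),
		static fn ( string $c ): bool => $c !== $name
	);
	return implode( ' ', $classes );
}

// Same rename as the heading example, applied to a raw attribute string.
$updated = sketch_add_class(
	sketch_remove_class( 'wp-heading  has-text-color', 'wp-heading' ),
	'wp-block-heading'
);
echo $updated, PHP_EOL; // has-text-color wp-block-heading
```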

The Tag Processor is intended to be a reliable low-level library for traversing
HTML documents, with higher-level APIs to be built on top of it. In the immediate
term, within core Gutenberg blocks, it is meant to replace HTML modification that
currently relies on RegExp patterns and simpler string replacements.

See the following for examples of such replacement:
    WordPress/gutenberg@1315784
    https://github.com/WordPress/gutenberg/pull/45469/files#diff-dcd9e1f9b87ca63efe9f1e834b4d3048778d3eca41aa39c636f8b16a5bb452d2L46
    WordPress/gutenberg#46625

Co-Authored-By: Adam Zielinski <adam@adamziel.com>
Co-Authored-By: Bernie Reiter <ockham@raz.or.at>
Co-Authored-By: Grzegorz Ziolkowski <grzegorz@gziolo.pl>
* Rename data providers to match test per coding standard.
* Restructure data provider datasets into a single array form for consistency.
* Add `WP_HTML_Tag_Processor::` to `@covers` annotations per coding standard.
* Add empty line between set up and assertion groupings.
* Moved well-formed HTML into separate test of updating attributes.
* Replaced assertEquals() with assertSame().
Tests_{APIorGroup}_className.
ntsekouras changed the title from "Try/html api with packages update" to "Try/html api with packages update and layout.php back ports" on Feb 1, 2023
felixarntz (Member)

@ntsekouras I tried using this, but it resulted in a couple of errors for me. I created #3971 as a way to test #3920; that PR only includes the actual PHP usage of the new WP_HTML_Tag_Processor class / API.

ntsekouras closed this on Feb 3, 2023