Blocks: Add utilities to help find next block comment delimiter in a document. #6760

Open: wants to merge 3 commits into trunk

dmsnell commented Jun 8, 2024

Trac ticket: Core-61401.
Theme: Everything could be streaming, single-pass, reentrant, and low-overhead.

Todo

  • While this is developed here, any changes to the Block Parser need to be coordinated with the package in the Gutenberg repository.

Breaking changes

  • When a block delimiter carries both a closing flag and a void flag (for example, <!-- /wp:paragraph /-->), the existing parser returns it as a void block, but this code returns it as a block closer. This edge case only arises when the content is already erroneous, but while writing this it made more sense to me to prefer closing over introducing a void block: the void flag is the more likely mistake, and treating a closer as void could lead to deep chains of unclosed blocks. I'd like to re-examine this across block parsing as a whole, taking lessons from HTML's stack machine (for example, treating the delimiter as a closer if a block of the given name is open), but not in this change.
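To make the precedence concrete, here is a minimal sketch that classifies a single delimiter string, letting the closing flag win over the void flag. The function name example_classify_block_delimiter() and its simplified pattern are hypothetical and for illustration only; this is not the patch's code.

```php
<?php
/**
 * Hypothetical helper, for illustration only: classifies one block comment
 * delimiter as 'opener', 'closer', or 'void', preferring 'closer' when both
 * the closing flag and the void flag are present.
 */
function example_classify_block_delimiter( string $delimiter ): ?string {
	// Simplified delimiter grammar; not the exact pattern used by the patch.
	$pattern = '~^<!--\s+(?P<closer>/)?wp:(?P<name>[a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)\s+(?:(?P<attrs>\{.*?\})\s+)?(?P<void>/)?-->$~s';

	if ( 1 !== preg_match( $pattern, $delimiter, $matches ) ) {
		return null; // Not a block comment delimiter.
	}

	// The closing flag takes precedence, even if a void flag is also present.
	if ( '/' === ( $matches['closer'] ?? '' ) ) {
		return 'closer';
	}

	// Unmatched trailing groups are omitted by preg_match(), hence the "??".
	return '/' === ( $matches['void'] ?? '' ) ? 'void' : 'opener';
}
```

Under this rule, <!-- /wp:paragraph /--> closes an open paragraph instead of starting a new, empty one.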

Summary

In this patch, two new functions are introduced that return PCRE patterns for quickly and efficiently finding blocks within an HTML document, without parsing the entire document and without building a full block tree.

These functions enable more efficient processing for tasks that only need to examine document structure, or to learn a few facts about a document without knowing everything about it, including but not limited to:

  • Finding the URL of the first image block in a document.
  • Inserting hooked blocks.
  • Analyzing block counts.
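As a minimal sketch of the pattern-based approach applied to the block-count use case: the functions below build a pattern for one block type and count its instances without building a block tree. The names example_named_block_delimiter_regex() and example_count_blocks(), the group names, and the simplified grammar are all assumptions for illustration; this is not the patch's get_named_block_delimiter_regex().

```php
<?php
/**
 * Hypothetical sketch: builds a PCRE pattern matching every comment
 * delimiter (opener, closer, or void) for a single block type.
 */
function example_named_block_delimiter_regex( string $block_name ): string {
	// Serialized core blocks usually omit the "core/" namespace, so accept both forms.
	if ( 0 === strpos( $block_name, 'core/' ) ) {
		$block_name = substr( $block_name, 5 );
	}

	$name = false === strpos( $block_name, '/' )
		? '(?:core/)?' . preg_quote( $block_name, '~' )
		: preg_quote( $block_name, '~' );

	// Simplified grammar: optional "/", "wp:", name, optional JSON object, optional "/".
	return '~<!--\s+(?P<closer>/)?wp:' . $name . '\s+(?:(?P<attrs>\{.*?\})\s+)?(?P<void>/)?-->~s';
}

/**
 * Hypothetical helper: counts block instances (openers plus void blocks)
 * of one type by scanning delimiters only.
 */
function example_count_blocks( string $post_content, string $block_name ): int {
	$pattern = example_named_block_delimiter_regex( $block_name );
	$count   = 0;

	if ( false === preg_match_all( $pattern, $post_content, $all, PREG_SET_ORDER ) ) {
		return 0;
	}

	foreach ( $all as $match ) {
		// A closer ends a block that was already counted at its opener.
		if ( '/' !== ( $match['closer'] ?? '' ) ) {
			++$count;
		}
	}

	return $count;
}
```

Requiring whitespace after the name keeps a pattern for "image" from also matching a hypothetical "image-gallery" block.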

In addition, this patch introduces a new class, WP_Parsed_Block_Delimiter_Info, to manage the process of finding block comment delimiters. It is based on a hand-crafted parser designed for high performance.

This class provides a number of conveniences:

  • It performs zero allocations beyond a static set of numeric indices.
  • It holds a reference to the text it scanned, but can be detached to release that text. Detaching creates a substring of the text containing the full delimiter match.
  • It can indicate if the delimiter is for a given block type without performing any allocations.
  • By default, it returns a lazy JSON parser for the attributes (not implemented yet), allowing more efficient interaction with the block attributes.
  • As far as possible, all costs are explicit and paid only when the calling code requests them.
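The list above describes a delimiter object that stores offsets, defers costs, and can detach from its source text. Below is a minimal sketch of that design under stated assumptions: the class Example_Delimiter_Info and all of its methods are hypothetical, it finds delimiters with a regular expression for brevity instead of a hand-crafted parser, and get_attributes() decodes eagerly with json_decode() as a stand-in for the planned lazy JSON parser.

```php
<?php
/**
 * Hypothetical sketch, not the patch's WP_Parsed_Block_Delimiter_Info.
 * After scanning, it keeps only integer offsets into the scanned text.
 */
final class Example_Delimiter_Info {
	private string $text;       // Scanned text, or only the delimiter after detach().
	private int $start;         // Byte offset of "<!--" within $text.
	private int $length;        // Byte length of the whole delimiter.
	private int $name_at;       // Byte offset of the block name within $text.
	private int $name_length;
	private int $attrs_at;      // -1 when the delimiter has no attributes.
	private int $attrs_length;
	private bool $is_closer;

	private function __construct() {}

	/** Finds the next delimiter at or after $offset, or returns null if none remains. */
	public static function next( string $text, int $offset ): ?self {
		$pattern = '~<!--\s+(?P<closer>/)?wp:(?P<name>[a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)\s+(?:(?P<attrs>\{.*?\})\s+)?(?P<void>/)?-->~s';
		if ( 1 !== preg_match( $pattern, $text, $m, PREG_OFFSET_CAPTURE, $offset ) ) {
			return null;
		}

		$info               = new self();
		$info->text         = $text; // PHP strings are copy-on-write, so this does not copy.
		$info->start        = $m[0][1];
		$info->length       = strlen( $m[0][0] );
		$info->name_at      = $m['name'][1];
		$info->name_length  = strlen( $m['name'][0] );
		$info->attrs_at     = isset( $m['attrs'] ) ? $m['attrs'][1] : -1;
		$info->attrs_length = isset( $m['attrs'] ) ? strlen( $m['attrs'][0] ) : 0;
		$info->is_closer    = -1 !== $m['closer'][1]; // Unmatched groups report offset -1.
		return $info;
	}

	public function is_closer(): bool {
		return $this->is_closer;
	}

	/** Byte offset just past the delimiter, relative to the current $text. */
	public function get_end(): int {
		return $this->start + $this->length;
	}

	/** Compares the block name in place against the text, without extracting it. */
	public function is_block_type( string $block_type ): bool {
		$slash_at      = strpos( $this->text, '/', $this->name_at );
		$has_namespace = false !== $slash_at && $slash_at < $this->name_at + $this->name_length;

		// Names written without a namespace belong to "core/".
		if ( ! $has_namespace && 0 === strncmp( $block_type, 'core/', 5 ) ) {
			$block_type = substr( $block_type, 5 );
		}

		return strlen( $block_type ) === $this->name_length
			&& 0 === substr_compare( $this->text, $block_type, $this->name_at, $this->name_length );
	}

	/** Pays the JSON decoding cost only when the caller asks for attributes. */
	public function get_attributes(): ?array {
		if ( -1 === $this->attrs_at ) {
			return null;
		}
		$attrs = json_decode( substr( $this->text, $this->attrs_at, $this->attrs_length ), true );
		return is_array( $attrs ) ? $attrs : null;
	}

	/** Releases the scanned text, keeping a substring of just this delimiter. */
	public function detach(): void {
		$this->text     = substr( $this->text, $this->start, $this->length );
		$this->name_at -= $this->start;
		if ( -1 !== $this->attrs_at ) {
			$this->attrs_at -= $this->start;
		}
		$this->start = 0;
	}
}
```

Keeping offsets rather than substrings means a caller that only asks "is this an image opener?" never pays for copying the name or decoding the attributes.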

Example

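// Get the URL of the first image in a post with the PCRE pattern.
function get_first_image_url( string $post_content ): ?string {
	$pattern = get_named_block_delimiter_regex( 'image' );
	$at      = 0;

	// PREG_OFFSET_CAPTURE reports where each match starts so the search can resume after it.
	while ( 1 === preg_match( $pattern, $post_content, $matches, PREG_OFFSET_CAPTURE, $at ) ) {
		$at = $matches[0][1] + strlen( $matches[0][0] );

		// Skip block closers; only openers carry attributes.
		if ( isset( $matches['closer'] ) && '/' === $matches['closer'][0] ) {
			continue;
		}

		$attrs = isset( $matches['attrs'] ) ? json_decode( $matches['attrs'][0], true ) : null;
		if ( isset( $attrs['url'] ) ) {
			return $attrs['url'];
		}
	}

	return null;
}
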
// Get the first image in a post with the utility class.
$image = null;
$at    = 0;
while ( true ) {
	$delimiter = WP_Parsed_Block_Delimiter_Info::next_delimiter( $post_content, $at, $next_delimiter_at, $next_delimiter_length );

	// Stop when no further delimiter is found (assumes a null return in that case).
	if ( null === $delimiter ) {
		break;
	}

	if (
		'opener' === $delimiter->get_delimiter_type() &&
		$delimiter->is_block_type( 'core/image' )
	) {
		$image = $delimiter;
		break;
	}

	// Resume scanning just past the end of this delimiter.
	$at = $next_delimiter_at + $next_delimiter_length;
}


github-actions bot commented Jun 8, 2024

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

dmsnell force-pushed the blocks/add-block-delimiter-regex branch from c05e05a to f35de59 on June 8, 2024 16:06
WordPress deleted a comment from github-actions bot on Jun 8, 2024
dmsnell force-pushed the blocks/add-block-delimiter-regex branch from f35de59 to 729e5b3 on June 8, 2024 16:10
dmsnell changed the title from "Blocks: Add functions to return PCRE pattern (regex) for finding blocks." to "Blocks: Add utilities to help find next block comment delimiter in a document." on Jun 9, 2024

1 participant