
Block Parser: Explore a streaming lazy interface #5705

Draft

dmsnell wants to merge 2 commits into trunk
Conversation

dmsnell (Member) commented Nov 26, 2023

Augmented but not replaced entirely by the Unified Block Parser in #6381
Alternatively provided by the block-delimiter-finder in #6760

For a 3 MB document which took 5 seconds and 14 GB to parse, this version of the parser parsed it in 27 ms and 20 MB.

Initial testing

This version is slower for the home page render of `twentytwentyfour`. While unfortunate, this is not entirely surprising, as it was designed to fix the catastrophically bad cases.

[Chart: lazy block parser slower for TT4 home page render]

However, in catastrophic cases it's wildly better than trunk. The following was tested with a 15 KB / 400-line chunk of the 3 MB post mentioned above.

[Chart: lazy block parser faster for large page]

The algorithm has wild complexity, too. For the same post, including only the first 599 lines (just 23 KB of HTML), trunk consumes 520 MB of memory while this branch consumes only 10 MB. Even with no more than 15 samples, the difference is extremely significant.

[Chart: lazy block parser faster for very large page]

Testing Results

This may be slightly slower for a number of normal posts. For the home page render of `twentytwentyfour` it rendered 3.7 ms slower than trunk. However, for my catastrophically broken test post, the impact of the lazy parsing is dramatic and significant after only a single request.

The lazy parser is still slow for really pathological cases, but unlike trunk it runs within a mostly bounded memory footprint. The more pathological the post, the more dramatic the improvement in both runtime and memory use becomes. The table below compares slices of my test file against both parsers; between each test run the database was reset. The number of lines reported is how many lines of the original 3 MB document were extracted as the test post.
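To make the "streaming lazy interface" idea concrete, here is a minimal sketch in PHP. None of these names come from this PR's actual code; it only illustrates the shape of the approach: a generator yields one block-comment delimiter at a time, so the caller never has to hold a fully materialized block tree.

```php
<?php
/*
 * Illustrative sketch only (not this PR's API): scan the document with
 * strpos() and yield each block-comment delimiter as it is found. The
 * caller pulls delimiters on demand, so memory tracks the scan position
 * rather than the size or nesting depth of the parsed tree.
 */
function stream_block_delimiters( string $html ): Generator {
	$at = 0;

	while ( false !== ( $open = strpos( $html, '<!--', $at ) ) ) {
		$close = strpos( $html, '-->', $open + 4 );
		if ( false === $close ) {
			break; // Unterminated comment: the rest is freeform HTML.
		}

		$text = trim( substr( $html, $open + 4, $close - $open - 4 ) );

		// Block delimiters look like "wp:paragraph {...}" or "/wp:paragraph".
		if ( 0 === strpos( $text, 'wp:' ) || 0 === strpos( $text, '/wp:' ) ) {
			yield array(
				'offset'    => $open,
				'length'    => $close + 3 - $open,
				'delimiter' => $text,
			);
		}

		$at = $close + 3;
	}
}

foreach ( stream_block_delimiters( '<!-- wp:paragraph --><p>Hi</p><!-- /wp:paragraph -->' ) as $d ) {
	echo $d['delimiter'], "\n"; // "wp:paragraph", then "/wp:paragraph".
}
```

In a shape like this, a consumer that only needs delimiter positions never pays for work it doesn't use, which is where the bounded memory footprint comes from.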

Of particular note is that this lazy parser allows even more control over performance thresholds. Further expansion would allow setting a time limit, an upper bound on memory usage, and a content-length threshold after which the parser could pause and/or collapse the remainder of the post into a single unparsed block, essentially turning everything after the limit into a chunk of raw HTML (the static fallback render).
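As a hypothetical illustration of that knob (reusing the stream_block_delimiters() sketch above; again, not this PR's actual API), a budget check could look roughly like this:

```php
<?php
/*
 * Hypothetical sketch: stop fine-grained parsing once a time or memory
 * budget is exhausted and return the remainder as one freeform chunk of
 * raw HTML, which is what the static fallback render would emit anyway.
 */
function parse_with_budget( string $html, float $max_seconds = 0.05, int $max_bytes = 20 * 1024 * 1024 ): array {
	$started = microtime( true );
	$blocks  = array();

	foreach ( stream_block_delimiters( $html ) as $delimiter ) {
		$over_time   = ( microtime( true ) - $started ) > $max_seconds;
		$over_memory = memory_get_usage() > $max_bytes;

		if ( $over_time || $over_memory ) {
			// Collapse everything from this delimiter onward into a single unparsed block.
			$blocks[] = array(
				'blockName' => null,
				'innerHTML' => substr( $html, $delimiter['offset'] ),
			);

			return $blocks;
		}

		// Normal, fine-grained block construction would happen here.
		$blocks[] = array( 'delimiter' => $delimiter['delimiter'] );
	}

	return $blocks;
}
```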

| Lines | Max depth | Size | Trunk time | Branch time | Δ | Speedup | Trunk memory | Branch memory |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| (tt4) | 25 | | 92.6 ms | 96.1 ms | +3.78% | x0.96 | | |
| 400 | 124 | 15 KB | 1,170 ms | 624 ms | -47% | x1.88 | | |
| 600 | 187 | 23 KB | 4.82 s | 968 ms | -80% | x5.00 | 520 MB | 10 MB |
| 792 | 248 | 30 KB | 16.7 s | 3.53 s | -79% | x4.73 | 1.8 GB | 14 MB |
| 1000 | 316 | 38 KB | 118 s | 10.1 s | -91% | x11.7 | 6.5 GB | 23 MB |
| 1200 | 380 | 46 KB | 8.01 min | 22 s | -95% | x22 | 13.9 GB | 32 MB |
| 79k (all) | 25683 | 3 MB | | | | | | 30 MB |

With memory_limit=55G on a 60 GB system, I was unable to create the post on trunk via wp_insert_post(); it failed after some tens of minutes.

On this branch the post inserted after a few seconds and used a peak of 64 MB of memory.
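One way such a measurement might be taken (not necessarily how these numbers were gathered; $pathological_html is a placeholder for the 3 MB test document):

```php
<?php
// Insert the pathological document and report PHP's peak memory usage.
$post_id = wp_insert_post( array(
	'post_title'   => 'Lazy parser stress test',
	'post_content' => $pathological_html,
	'post_status'  => 'draft',
) );

printf( "inserted post %d, peak memory %.1f MB\n", $post_id, memory_get_peak_usage( true ) / 1048576 );
```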
