Why

The token queue adds a level of indirection that makes some issues harder to fix. For example, #292 becomes easy to fix once the token queue is gone. Debugging is also currently complicated, as stack traces end at the token queue.

With the queue gone, stack traces will point at the corresponding line in the tokenizer, and V8 will be able to optimise more aggressively: in my branch combining all of the changes, I see a ~15% performance increase on htmlparser-benchmark.
Game plan
1. Update the tokenizer to produce events. A QueuedTokenizer class will wrap the tokenizer and provide the current queue-based interface for the parser. Opened as refactor(tokenizer): Introduce events #404
2. Invert event processing in the parser. The parser currently checks the insertion mode first and then the token type. By inverting this (checking the token type first, then the insertion mode), we prepare the parser to accept the events from (1). Opened as refactor(parser): Invert event processing #405

(1) and (2) do not depend on one another and can be merged independently.
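To illustrate the direction of (1), here is a minimal sketch of an event-producing tokenizer. All names here (`TokenHandler`, `EventTokenizer`, the handler methods) are hypothetical, not parse5's actual API, and the tokenizer body is greatly simplified — the real one is a full HTML state machine with attributes, comments, entities, and error recovery.

```typescript
// Hypothetical sketch: instead of pushing tokens onto a queue for the parser
// to drain later, the tokenizer invokes handler callbacks directly as tokens
// are produced.

interface TokenHandler {
    onStartTag(name: string): void;
    onEndTag(name: string): void;
    onText(text: string): void;
}

class EventTokenizer {
    constructor(private handler: TokenHandler) {}

    // Greatly simplified: handles only bare tags and text.
    write(html: string): void {
        let i = 0;
        while (i < html.length) {
            if (html[i] === "<") {
                const end = html.indexOf(">", i);
                const inner = html.slice(i + 1, end);
                if (inner.startsWith("/")) {
                    this.handler.onEndTag(inner.slice(1));
                } else {
                    this.handler.onStartTag(inner);
                }
                i = end + 1;
            } else {
                const next = html.indexOf("<", i);
                const stop = next === -1 ? html.length : next;
                // Emitted directly: if a handler throws, the stack trace
                // points into the tokenizer, not into a queue-draining loop.
                this.handler.onText(html.slice(i, stop));
                i = stop;
            }
        }
    }
}

// Record the emitted events for a tiny input.
const events: string[] = [];
new EventTokenizer({
    onStartTag: (name) => events.push(`start:${name}`),
    onEndTag: (name) => events.push(`end:${name}`),
    onText: (text) => events.push(`text:${text}`),
}).write("<p>hi</p>");
// events is now ["start:p", "text:hi", "end:p"]
```

The parser-facing `QueuedTokenizer` wrapper mentioned above would implement such a handler and buffer the callbacks back into a queue, preserving the existing pull-based interface during the transition.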
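For (2), a schematic sketch of the inversion, with hypothetical names and only two insertion modes: the outer dispatch moves from the insertion mode to the token type, so that each tokenizer event from (1) can map directly onto a parser method.

```typescript
// Hypothetical sketch: mode and method names are illustrative, not parse5's
// actual internals.
type InsertionMode = "IN_BODY" | "IN_TABLE";

// Before the inversion, a single entry point switched on this.insertionMode
// first and on the token type inside each case. After the inversion there is
// one entry point per token type (matching the tokenizer events), and the
// insertion-mode switch lives inside each handler:
class Parser {
    insertionMode: InsertionMode = "IN_BODY";

    onStartTag(name: string): string {
        switch (this.insertionMode) {
            case "IN_BODY":
                return `<${name}> inserted in body`;
            case "IN_TABLE":
                return `<${name}> handled by table insertion rules`;
        }
    }

    onText(text: string): string {
        switch (this.insertionMode) {
            case "IN_BODY":
                return `"${text}" inserted as character data`;
            case "IN_TABLE":
                return `"${text}" foster-parented out of the table`;
        }
    }
}

const parser = new Parser();
const inBody = parser.onStartTag("div"); // "<div> inserted in body"
parser.insertionMode = "IN_TABLE";
const inTable = parser.onText("x");
```

With this shape in place, the `QueuedTokenizer` wrapper from (1) can later be deleted and the tokenizer wired straight to these methods.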
cc @wooorm @43081j