From fa4ab376da948c3f78d8d8b6798b4dcbcf37338c Mon Sep 17 00:00:00 2001
From: Tomas Hubelbauer
Date: Sun, 13 Oct 2024 14:12:04 +0200
Subject: [PATCH] Update tasks related to HTML parsing with `HTMLRewriter` and CSS options

This should allow me to trim down the project to focus only on the layout and
rendering aspect and externalize the HTML and CSS handling somewhat.

I do not plan on adding JavaScript support but if I were to do it I could either
use the `vm` API or just Bun's `eval` or `Function` constructor. Maybe I could
also bring in QuickJS with Bun's C support via TinyCC (TCC).
---
 readme.md | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/readme.md b/readme.md
index 51069cb..0f6feb0 100644
--- a/readme.md
+++ b/readme.md
@@ -18,16 +18,27 @@ state; perhaps during the next Browser Jam, more improvements will be delivered.
 
 ## Tasks
 
-### Look into using `HTMLRewriter` as a solution for HTML parsing
+### Use my `HTMLRewriter`-based `DOMParser` and drop the custom HTML parser here
 
-As in https://github.com/TomasHubelbauer/bun-domparser.
+https://github.com/TomasHubelbauer/bun-domparser
 
 ### Look into using Bun's experimental CSS parser
 
 https://bun.sh/blog/bun-v1.1.30#experimental-css-parsing-bundling
 
+`HTMLRewriter` will also probably be usable to implement a basic query selector
+engine by parsing the whole HTML and then parsing it again with the CSS selector
+and comparing the two trees to see which nodes of the original tree match the
+nodes of the selector-driven tree.
+
+This should be combined with the usage of the `HTMLRewriter`-based `DOMParser` I
+mention above.
+
 ### Improve the HTML parser to not set a node as `cursor` until fully closed
 
+Note that this will become obsolete once I switch to my `DOMParser` based on the
+`HTMLRewriter` API bundled with Bun.
+
 Right now, I'm materializing nodes the moment their opening tag finishes parsing
 which allows me to simplify the attribute parsing states, but results in an
 unfaithful representation of the incomplete DOM tree (stuff that's not fully