Rewrite with compressed virtual DOM tree #1

bakape · 2019-04-23T13:25:17Z

Use as few string allocations as possible and speed up diffs through tokenization:

Tokenize all property and tag strings to uint16 IDs and keep in global map.
Separate storage field for classes, the most used property. Store as an integer ID pointing to a sorted array of class IDs.
Predefine most common HTML tags and properties as constants
- Have node constructor accept these constants or a string through an Into generic parameter.
Have HTML macro use these predefined constants, where possible.
Property/tag string storage without extra heap allocations: [1B length][15B string]
- Fit to 16B for double int alignment
- Since the type is fixed size, don't derive the Hash trait for it and just return the contents as u64.
- If string larger than 15B, use fallback String map storage.
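
A minimal sketch of the `[1B length][15B string]` layout listed above. The type name `InlineStr`, its methods, and the XOR fold that squeezes 16 bytes into a `u64` are assumptions made for illustration; the issue only says the contents should be returned as a `u64`.

```rust
use std::hash::{Hash, Hasher};

/// Fixed-size string: 1 length byte followed by up to 15 bytes of UTF-8.
/// (Hypothetical type, not taken from the library.)
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(C, align(8))]
pub struct InlineStr([u8; 16]);

impl InlineStr {
    pub const MAX_LEN: usize = 15;

    /// Returns `None` if `s` does not fit in 15 bytes; a caller would then
    /// use the fallback String map storage.
    pub fn new(s: &str) -> Option<Self> {
        let bytes = s.as_bytes();
        if bytes.len() > Self::MAX_LEN {
            return None;
        }
        let mut buf = [0u8; 16];
        buf[0] = bytes.len() as u8;
        buf[1..1 + bytes.len()].copy_from_slice(bytes);
        Some(InlineStr(buf))
    }

    pub fn as_str(&self) -> &str {
        let len = self.0[0] as usize;
        // The bytes were copied from a valid &str in `new`.
        std::str::from_utf8(&self.0[1..1 + len]).expect("valid UTF-8")
    }

    /// Assumption: fold the two 8-byte halves into a single u64.
    pub fn to_u64(&self) -> u64 {
        let mut lo = [0u8; 8];
        let mut hi = [0u8; 8];
        lo.copy_from_slice(&self.0[..8]);
        hi.copy_from_slice(&self.0[8..]);
        u64::from_le_bytes(lo) ^ u64::from_le_bytes(hi).rotate_left(32)
    }
}

// Manual Hash instead of a derived one: feed the folded u64 to the hasher.
impl Hash for InlineStr {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.to_u64());
    }
}
```

Equal values always produce the same `u64`, so this stays consistent with the derived `PartialEq`.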

Nodes:

Implement text nodes as a special kind of <span>, so they stay addressable.
Store various node flags in a frontal byte.
Function for node data lookup by ID in the patch tree.
Method for flagging a node and its subtree immutable. Such a node will never be diffed.
Store non-class property values as a hash_map<property_id, String>
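
The node layout above could look roughly like this. It is only a sketch: the flag names, field names, and the `count_diffable` helper are hypothetical and exist to show how a frontal flag byte lets a diff skip immutable subtrees.

```rust
use std::collections::HashMap;

// Bit flags packed into the node's leading byte (names are hypothetical).
pub const FLAG_TEXT: u8 = 1 << 0; // text node, rendered as a special <span>
pub const FLAG_IMMUTABLE: u8 = 1 << 1; // node and its subtree are never diffed

pub struct Node {
    pub flags: u8,
    pub tag: u16,                    // tokenized tag ID
    pub class_set: u16,              // ID of a sorted array of class IDs
    pub attrs: HashMap<u16, String>, // non-class properties by tokenized key
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(tag: u16) -> Self {
        Node {
            flags: 0,
            tag,
            class_set: 0,
            attrs: HashMap::new(),
            children: Vec::new(),
        }
    }

    pub fn is_text(&self) -> bool {
        self.flags & FLAG_TEXT != 0
    }

    pub fn is_immutable(&self) -> bool {
        self.flags & FLAG_IMMUTABLE != 0
    }

    /// Flags this node and its whole subtree as immutable.
    pub fn freeze(&mut self) {
        self.flags |= FLAG_IMMUTABLE;
        for child in &mut self.children {
            child.freeze();
        }
    }

    /// Counts the nodes a diff would still have to visit.
    pub fn count_diffable(&self) -> usize {
        if self.is_immutable() {
            return 0; // skip the entire subtree
        }
        1 + self.children.iter().map(Node::count_diffable).sum::<usize>()
    }
}
```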

Patching:

DOM events:

Node handles are used to declare event handlers.
You can build higher level frameworks on top of this.
When a DOM event fires, check whether the event type has any registered handlers. If it does, bubble up to the body, collecting brunhild IDs along the way. Then descend the DOM representation tree, looking for event handlers as we go. If any are found, look up the ID in the map and trigger the event handlers registered on the node handles.
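
A rough sketch of that dispatch flow, under the assumption that the JS side has already collected the IDs while bubbling and passes them in as a slice ordered from the event target up to the body. `EventRegistry`, `on`, and `dispatch` are made-up names, not the library's API.

```rust
use std::collections::HashMap;

/// A handler receives the ID of the node it was registered on.
pub type Handler = Box<dyn Fn(u64)>;

#[derive(Default)]
pub struct EventRegistry {
    // event type -> node ID -> handlers registered on that node
    handlers: HashMap<String, HashMap<u64, Vec<Handler>>>,
}

impl EventRegistry {
    pub fn on(&mut self, event: &str, node_id: u64, handler: Handler) {
        self.handlers
            .entry(event.to_string())
            .or_default()
            .entry(node_id)
            .or_default()
            .push(handler);
    }

    /// `bubble_path` holds node IDs from the event target up to the body.
    /// Returns how many handlers were triggered.
    pub fn dispatch(&self, event: &str, bubble_path: &[u64]) -> usize {
        // Cheap early exit when nothing listens for this event type.
        let by_node = match self.handlers.get(event) {
            Some(map) => map,
            None => return 0,
        };
        let mut fired = 0;
        // Descend from the body towards the target.
        for id in bubble_path.iter().rev() {
            if let Some(list) = by_node.get(id) {
                for handler in list {
                    handler(*id);
                    fired += 1;
                }
            }
        }
        fired
    }
}
```

Handlers fire from the body down to the target here because that is the order of the descent described above; a framework built on top could reverse it to mimic native bubbling.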

Node creation macros:

Macros for element and text node creation.
Macros take one parameter: a dict-like literal that will be allocated in static memory.
The element macro can take 1-2 parameters; the second is an array of child Nodes.
Macro also parses a dict-like attribute list into a slice of tuples, if the parameter is a dict and not a hashmap iterator or similar.
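
One way the element macro could be written with `macro_rules!`, as a sketch only. The `Element` struct, the `element!` name, and the `"key" => "value"` syntax are assumptions; the point is that a dict-like literal can be turned into a `'static` slice of tuples with no heap allocation for the attributes.

```rust
#[derive(Debug, PartialEq)]
pub struct Element {
    pub tag: &'static str,
    pub attrs: &'static [(&'static str, &'static str)],
    pub children: Vec<Element>,
}

macro_rules! element {
    // One parameter after the tag: the dict-like attribute literal.
    ($tag:literal, { $($key:literal => $value:literal),* $(,)? }) => {{
        // A const item places the attribute slice in static memory.
        const ATTRS: &[(&str, &str)] = &[$(($key, $value)),*];
        Element { tag: $tag, attrs: ATTRS, children: Vec::new() }
    }};
    // Two parameters: attributes plus an array of child nodes.
    ($tag:literal, { $($key:literal => $value:literal),* $(,)? },
     [$($child:expr),* $(,)?]) => {{
        const ATTRS: &[(&str, &str)] = &[$(($key, $value)),*];
        Element { tag: $tag, attrs: ATTRS, children: vec![$($child),*] }
    }};
}
```

For example, `element!("div", { "id" => "main" }, [element!("span", {})])` builds a `div` with one attribute and one child.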

Details:

The library takes over the HTML body. Everything to be left untouched must be stored in the <head>.
Such tight string and vector storage regions should be good for cache locality

Chiiruno · 2019-04-24T08:12:36Z

> Implement text nodes as a special kind of <span>, so they stay addressable.

Elaborate please. Do you mean a tag or other identifier in the field?

bakape · 2019-04-24T12:16:25Z

You can stick an id attribute onto a span, but not a text node. Most virtual DOMs keep a reference to the node in memory, but I plan to do no such thing across the FFI and look up nodes each time by ID. Though reversing that is up for consideration.

Chiiruno · 2019-04-24T16:40:34Z

Hmm, memory lookup by reference sounds a lot faster, however I'm not considering the cross-FFI penalty.
I don't suppose you could give me a short list of pros and cons for looking up nodes each time by ID, vs a memory reference map/list, could you?

bakape · 2019-04-24T16:51:19Z

References:
+ Faster node lookup
+ Lets the user define the element id attribute
- Uses more memory
- Have to take references of all created nodes on insertion, which is a ton of FFI calls

ID lookup:
+ Faster inserts and subtree overwrites (just generate HTML and insert that in one FFI call)
+ Uses less memory
- Can't let the user define their own IDs
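
To illustrate the "generate HTML and insert that in one FFI call" point, here is a sketch that renders a whole subtree into a single string, stamping each element with a generated id so it can be looked up later. The `VNode` type and the `bh-` id prefix are assumptions for the example; the actual insertion across the FFI is left out.

```rust
use std::fmt::Write;

pub struct VNode {
    pub id: u64,
    pub tag: &'static str,
    pub text: Option<String>,
    pub children: Vec<VNode>,
}

/// Escapes the characters that are significant in HTML text content.
fn escape(s: &str, out: &mut String) {
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
}

/// Appends the HTML of `node` and all of its descendants to `out`.
pub fn render(node: &VNode, out: &mut String) {
    // Writing to a String cannot fail, so unwrap is safe here.
    write!(out, "<{} id=\"bh-{}\">", node.tag, node.id).unwrap();
    if let Some(text) = &node.text {
        escape(text, out);
    }
    for child in &node.children {
        render(child, out);
    }
    write!(out, "</{}>", node.tag).unwrap();
}
```

The resulting string can be handed to the JS side once, instead of making one call per created node to take its reference.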

Chiiruno · 2019-04-24T17:10:43Z

The crux here seems to be the user being able to define IDs, and the FFI calls.
Could something like this (https://nullprogram.com/blog/2018/05/27/) help with a large volume of FFI calls between JS and WASM, where there shouldn't be much memory sharing to begin with?

bakape · 2019-04-24T19:18:44Z

Interesting read, but that is completely not applicable here. I consider the core difference to be whether or not JS object references are stored in the WASM memory: storing them gives faster lookup times, not storing them gives much faster whole subtree patches (this includes the initial page render). IDs are secondary.

bakape · 2019-04-29T01:41:23Z

@Chiiruno

Store non-class property values and text node strings inside a global reference-counting map.

The map needs to be accessible both by string ID and by actual string.

Maybe somehow not allocate a String twice and use references.

Any idea how to implement this map, that both maps strings to IDs and IDs to strings without actually having to allocate the string twice?
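
One possible answer, sketched under the assumption that single-threaded `Rc<str>` sharing is acceptable: the string is allocated once and the same allocation is referenced from both directions. `Interner`, `intern`, and `resolve` are hypothetical names, and removal of unused entries is left out.

```rust
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Default)]
pub struct Interner {
    by_str: HashMap<Rc<str>, u16>, // string -> ID
    by_id: Vec<Rc<str>>,           // ID -> string (the ID is the index)
}

impl Interner {
    /// Returns the ID for `s`, allocating the string only on first sight.
    pub fn intern(&mut self, s: &str) -> u16 {
        // Rc<str> implements Borrow<str>, so a &str works as the lookup key.
        if let Some(&id) = self.by_str.get(s) {
            return id;
        }
        assert!(self.by_id.len() <= u16::MAX as usize, "ID space exhausted");
        let id = self.by_id.len() as u16;
        let shared: Rc<str> = Rc::from(s); // the single allocation
        self.by_id.push(Rc::clone(&shared)); // bumps the refcount only
        self.by_str.insert(shared, id);
        id
    }

    pub fn resolve(&self, id: u16) -> Option<&str> {
        self.by_id.get(id as usize).map(|s| &**s)
    }
}
```

Both containers hold a pointer to the same heap string, so the extra cost per entry is one pointer plus the reference count rather than a second copy.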

Chiiruno · 2019-04-29T01:57:26Z

With a map, you can't have the value find the key, right?
Whereas vice versa you can, right?
If the string is the same, you could have a pointer perhaps?

Chiiruno · 2019-04-29T02:02:25Z

Oh
Perhaps have the first X characters of the actual string be the string id, have the key be a pointer to the string, and use the first X characters as a key for the string to be found?
Although, you could ditch the map with that.

bakape · 2019-04-29T02:02:26Z

Yes, but how to do that in Rust with references or something? This will be used in at least 3 separate locations, so a generic type for this would be nice.

Chiiruno · 2019-04-29T02:03:50Z

I've had a bad migraine the past few days, so I've just been catching up on manga and listening to quiet music, um... Maybe arc/box for references/pointers?
I'm pretty sure you can make a map with those.
You can also use unsafe to work with raw pointers, while safely storing them in the map.

Chiiruno · 2019-04-29T02:06:13Z

With Rust, you could make a type, and set the impl for each location, while having shared ("generic" non-specific) functions higher up in the type I think, I'll have to reread that.

Chiiruno · 2019-04-29T02:07:37Z

Trait, I think that might have been it. The type implements trait along with your specific implementation, there may have been a more virtual approach I may be thinking about.

Chiiruno · 2019-04-29T02:08:19Z

As for copy operations, for the type you'll want to set it to move instead, I forgot how to do that, but it's easily searchable.

Chiiruno · 2019-04-29T02:09:25Z

Never mind, it's move default, and you can add a copy trait if you desire it.
https://doc.rust-lang.org/rust-by-example/trait/derive.html

bakape · 2019-04-29T02:14:16Z

It might be better to use a sparse vector with a freelist and 2 maps to point to indices in the vector.
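
A sketch of the sparse vector with a freelist, with reference counting added. It deviates from the idea above in one respect that is purely an assumption of this example: the slot index doubles as the string ID, so a single hash-to-index map is enough instead of two maps. `StringPool` and its methods are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

struct Slot {
    value: String,
    refs: usize,
}

#[derive(Default)]
pub struct StringPool {
    slots: Vec<Option<Slot>>,          // sparse vector; index is the string ID
    free: Vec<usize>,                  // vacated indices, reused before growing
    by_hash: HashMap<u64, Vec<usize>>, // string hash -> candidate slot indices
}

fn hash_of(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

impl StringPool {
    /// Stores `s` (or bumps its refcount) and returns its ID.
    pub fn acquire(&mut self, s: &str) -> usize {
        let hash = hash_of(s);
        if let Some(candidates) = self.by_hash.get(&hash) {
            for &i in candidates {
                if let Some(slot) = self.slots[i].as_mut() {
                    if slot.value == s {
                        slot.refs += 1;
                        return i;
                    }
                }
            }
        }
        let slot = Some(Slot { value: s.to_string(), refs: 1 });
        let index = match self.free.pop() {
            Some(i) => {
                self.slots[i] = slot;
                i
            }
            None => {
                self.slots.push(slot);
                self.slots.len() - 1
            }
        };
        self.by_hash.entry(hash).or_default().push(index);
        index
    }

    pub fn get(&self, id: usize) -> Option<&str> {
        self.slots.get(id)?.as_ref().map(|slot| slot.value.as_str())
    }

    /// Drops one reference; the slot is vacated when none remain.
    pub fn release(&mut self, id: usize) {
        let vacate = match self.slots.get_mut(id).and_then(|s| s.as_mut()) {
            Some(slot) => {
                slot.refs -= 1;
                slot.refs == 0
            }
            None => return,
        };
        if !vacate {
            return;
        }
        if let Some(slot) = self.slots[id].take() {
            let hash = hash_of(&slot.value);
            let now_empty = match self.by_hash.get_mut(&hash) {
                Some(list) => {
                    list.retain(|&i| i != id);
                    list.is_empty()
                }
                None => false,
            };
            if now_empty {
                self.by_hash.remove(&hash);
            }
            self.free.push(id);
        }
    }
}
```

Because vacated slots are reused, the vector only grows when every existing slot is occupied.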

Chiiruno · 2019-04-29T02:24:07Z

Is there any way to guarantee that we only push to the end of the vector?
I think middle reads are a given here, but we may be able to speed things up by ensuring that we always only push at the end of the vector, to prevent reallocations.

bakape · 2019-04-29T02:30:43Z

Actually, it might not be worth it to dedup strings that aren't tags or property names. In which case we only have to deal with a 16 byte array and no extra heap allocations for 2 maps. Yeah, let's do that.

Chiiruno · 2019-04-29T02:39:20Z

As long as the memory usage isn't morbidly high, it should be okay to sacrifice some of it for speed.

Chiiruno · 2019-04-29T02:40:05Z

It is important that it fits into the CPU cache though, hm...

bakape · 2019-04-29T02:40:39Z

The question is whether the strings would even repeat enough in typical usage scenarios to justify the complication.

Chiiruno · 2019-04-29T02:42:09Z

Assuming an English alphabet, yes.
Common HTML terms and some other terms may be worth the complication.

Chiiruno · 2019-04-29T02:43:26Z

You may also be able to compress common non-programming terms in strings down quite a bit with a lookup map, thus saving memory.
However, this assumes English.

bakape · 2019-04-29T02:43:34Z

Whitelist them?

Chiiruno · 2019-04-29T02:44:15Z

Yeah, whitelist certain common terms, and maybe words in strings to be compressed.

bakape · 2019-04-29T02:46:44Z

I don't know about sub-string compression, as opposed to simple exact-match tokenization. The compression/decompression overhead might weigh too heavily there. This is all in memory and is not a database.

Chiiruno · 2019-04-29T02:48:48Z

I guess we can't accept a CPU/memory spike on page load to do all of that, since very rarely is someone on the same page for long. The memory usage would be a lot lower assuming they stay on the same page however.
Would there be a way for this to smartly detect pages that are usually active for a long time, or perhaps have a programmer set it?

bakape · 2019-04-29T02:50:43Z

Adaptive algorithms can come later.

Chiiruno · 2019-04-29T02:51:27Z

That's fine, but I think it's best to consider the concept now, so less rewriting later.
Of course, avoiding premature optimization.

bakape · 2019-04-29T03:20:53Z

Need an append-only data structure that is writable like a string.

Could you work on this one?

bakape · 2019-04-29T03:22:37Z

Actually, before that we need to wipe clean and migrate this repo to the newer Rust WASM toolkit.

Chiiruno · 2019-04-29T03:25:52Z

> Could you work on this one?

Sure, seems pretty well-defined.

bakape · 2019-05-06T12:36:31Z

@Chiiruno I set up a basic project with wasm_bindgen as the linking layer.

bakape · 2019-05-06T17:46:29Z

String tokenizer done.

bakape · 2019-06-01T14:49:10Z

unsafe unsafe unsafe like grandpa C was intended.

bakape added the enhancement label Apr 23, 2019

bakape changed the title ~~Idea: Compress virtual DOM tree~~ Ideas: Compressed virtual DOM tree Apr 23, 2019

bakape changed the title ~~Ideas: Compressed virtual DOM tree~~ Ideas: Rewrite with compressed virtual DOM tree Apr 23, 2019

bakape changed the title ~~Ideas: Rewrite with compressed virtual DOM tree~~ Rewrite with compressed virtual DOM tree May 7, 2019
