Populate cache from last context rather than hxb #11519

Open
back2dos opened this issue Jan 30, 2024 · 5 comments

@back2dos
Member

The other day on Slack we discussed whether it might make sense to "load" types from the previous compilation's set of types rather than from hxb, as that is probably easier on the CPU.

back2dos
[...] naively, I would somehow think that keeping a reference to the old state and "loading" from that, i.e. lookup in there and recursively clone what you find (while again looking up as you recurse), would accomplish the same with less CPU cycles [...]
simn
Yes a typed AST "reloading mapping" could perhaps work as well, now that we cleaned up quite a few related parts

I have no idea if it's truly worth the trouble. The hxb PR mentions a potential slowdown, so I thought I'd put forward the idea in case the ongoing optimization efforts fall short of delivering the desired performance (and assuming that this would indeed provide a speedup).
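For illustration, here is a minimal sketch of that lookup-and-clone idea, using made-up stand-in types (`cached_type`, `context`) rather than the compiler's actual module/type definitions:

```ocaml
(* Illustrative stand-ins, not the compiler's actual module/type records. *)
type type_ref = string  (* e.g. a dot path *)

type cached_type = {
  name : type_ref;
  deps : type_ref list;  (* types this one refers to *)
}

type context = {
  types : (type_ref, cached_type) Hashtbl.t;
}

(* Look a type up in the old context and "clone" it into the new one,
   recursing into its dependencies; anything already cloned is reused. *)
let rec clone_from_old ~old_ctx ~new_ctx path =
  match Hashtbl.find_opt new_ctx.types path with
  | Some t -> t
  | None ->
    let old_t = Hashtbl.find old_ctx.types path in
    let cloned = { old_t with deps = old_t.deps } in  (* shallow copy *)
    Hashtbl.replace new_ctx.types path cloned;
    List.iter
      (fun dep -> ignore (clone_from_old ~old_ctx ~new_ctx dep))
      old_t.deps;
    cloned
```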

@Simn
Member

Simn commented Jan 30, 2024

While I'm saying that it could work, I have some doubts about the cost/benefit here. I imagine that a hybrid cache would work like this:

  1. Keep any module that hasn't changed for the last X compilations in memory (TODO: how to detect "has changed").
  2. Lower everything else to hxb and reload it.

A middle ground where we lower to a non-binary representation could, in theory, give better performance, but I'd first like to check whether the b part of hxb really has an unacceptable overhead in practice.
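For what it's worth, a rough sketch of that hybrid scheme could look like this (the names and the `write_hxb` helper are hypothetical, not actual compiler API):

```ocaml
(* Sketch of the hybrid cache idea; names are hypothetical, not compiler API. *)
type cached_module = {
  path : string;
  mutable last_changed : int;   (* compilation counter of the last change *)
  mutable hxb : bytes option;   (* lowered hxb blob, if any *)
}

let keep_threshold = 10  (* the "X compilations" from point 1 above *)

let cache_module ~current_compilation ~write_hxb m =
  if current_compilation - m.last_changed >= keep_threshold then
    (* 1. stable module: keep the typed representation in memory as-is *)
    m.hxb <- None
  else
    (* 2. everything else: lower to hxb, to be reloaded from the blob *)
    m.hxb <- Some (write_hxb m)
```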

@back2dos
Member Author

I'd first like to check whether the b part of hxb really has an unacceptable overhead in practice.

Yes, yes. This is primarily meant as a backup plan ;)

I'm not sure how expensive the hxb roundtrip will ultimately be, nor how much faster this alternative would turn out to be. Some reasons why I think it might be faster:

  1. You skip the whole encoding part entirely: string pool population (which has to involve quite a bit of hashing) as well as buffer allocation (I'm assuming the individual hxb blobs will be kept in memory rather than written out, because I can definitely see how adding IO to the mix would not be insignificant).
  2. Cloning will be a whole lot faster and less memory-intensive than decoding. "Copying" an int is no doubt faster than roundtripping through a variable-length-encoded int (see the sketch below). Copying a reference to an immutable string also has to be faster than copying it into and back out of some buffer. Some structures can simply be reused, like positions, perhaps even the few plain expressions left (not the typed ones) or the whole metadata that contains them.
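To illustrate point 2, here is a generic LEB128-style varint roundtrip (an assumption about what such formats typically do per int, not hxb's actual encoding) next to the "just keep the int" alternative:

```ocaml
(* Generic LEB128-style varint roundtrip for non-negative ints; this is an
   illustration of what a binary format does per int, not hxb's encoding. *)
let write_varint buf n =
  let rec go n =
    let b = n land 0x7f and rest = n lsr 7 in
    if rest = 0 then Buffer.add_char buf (Char.chr b)
    else (Buffer.add_char buf (Char.chr (b lor 0x80)); go rest)
  in
  go n

let read_varint s pos =
  let rec go pos shift acc =
    let b = Char.code s.[pos] in
    let acc = acc lor ((b land 0x7f) lsl shift) in
    if b land 0x80 = 0 then (acc, pos + 1) else go (pos + 1) (shift + 7) acc
  in
  go pos 0 0

(* Cloning, by contrast, just copies the unboxed int and can share immutable
   strings and position records by reference. *)
let () =
  let buf = Buffer.create 16 in
  write_varint buf 300;
  assert (fst (read_varint (Buffer.contents buf) 0) = 300)
```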

Of course, on any macro-heavy project this difference will probably barely be noticeable, but not everyone has that kink.


Keep any module that hasn't changed for the last X compilations in memory (TODO: how to detect "has changed").

Yeah, that could well make more of a difference than the above idea.

@Simn
Member

Simn commented Jan 31, 2024

I agree. All this can be summarized as the difference between "doing something" and "doing nothing" that I mentioned in the hxb PR. Our profiling suggested that we spend by far the most time in the texpr decoder, mostly due to type instances and positions. The good news is that we can often skip that for display requests, which is where overhead matters most.

Rudy also had some cases where the string pool overhead was surprisingly high. I've opened #11511 to support persisting the string pool itself in the server, which will take a lot of the burden off the reader. The writer still has to deal with the hashing, as you point out, but fortunately OCaml's Hashtbl is known to be quite efficient even with large data sets.
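As a simplified illustration of where that hashing lives (not the actual string pool from #11511): the writer pays a hash lookup per interned string, while a persisted pool lets the reader resolve strings by plain index.

```ocaml
(* Simplified interning pool: the writer pays one Hashtbl lookup per string,
   the reader only needs index-based array access. Illustration only. *)
type pool = {
  lookup : (string, int) Hashtbl.t;  (* writer side: string -> index *)
  mutable entries : string list;     (* interned strings, most recent first *)
  mutable count : int;
}

let create () = { lookup = Hashtbl.create 1024; entries = []; count = 0 }

(* Writer: hashing happens here, once per interned string. *)
let intern pool s =
  match Hashtbl.find_opt pool.lookup s with
  | Some i -> i
  | None ->
    let i = pool.count in
    Hashtbl.add pool.lookup s i;
    pool.entries <- s :: pool.entries;
    pool.count <- pool.count + 1;
    i

(* Reader: with a persisted pool, resolving a string is a plain array read. *)
let to_array pool = Array.of_list (List.rev pool.entries)
```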

@Simn
Member

Simn commented Jan 31, 2024

Another aspect that is often overlooked is the amount of persisted memory. While we tend to treat memory as this big cheap thing that everyone has plenty of, a large memory footprint can cause performance problems in a GC environment as well. From our C# unit tests after hacking the compiler to store both hxb and the in-memory cache:

[screenshot: memory measurements with both caches enabled]

I expect this to scale linearly in larger projects, and at some point that is going to cause issues. With OCaml's generational GC, we also have a lot of old-data-points-to-new-data situations, and a lot of write barriers. This is difficult to track and profile, but I'll definitely make the claim that a fully hxb-restored module is going to have better memory behavior. It's unlikely to offset the encoding overhead, but it's not nothing.
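For reference, the old-points-to-new pattern in question looks roughly like this (generic OCaml, not compiler code): each mutation of long-lived cache data so that it points at freshly allocated values goes through the write barrier.

```ocaml
(* Generic illustration: [cache] survives many collections and ends up in the
   old generation; updating an entry makes old data point at freshly allocated
   (young) data, which triggers OCaml's write barrier on the assignment. *)
type entry = { name : string; fields : string list }

let cache : (string, entry ref) Hashtbl.t = Hashtbl.create 64

let update_entry path fields =
  match Hashtbl.find_opt cache path with
  | Some r -> r := { name = path; fields }  (* old ref -> young record *)
  | None -> Hashtbl.add cache path (ref { name = path; fields })
```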

@kLabz Could you check these memory values on a Shiro codebase?

@Simn
Member

Simn commented Feb 7, 2024

The module-to-binary memory ratio is about 5:1 in the codebases we tested. It will be interesting to compare this once #11511 has been implemented.
