Populate cache from last context rather than hxb #11519

Open
back2dos opened this issue Jan 30, 2024 · 5 comments

@back2dos
Member

The other day on Slack we discussed whether it might make sense to "load" types from the previous compilation's set of types rather than from hxb, as that is probably easier on the CPU.

back2dos
[...] naively, I would somehow think that keeping a reference to the old state and "loading" from that, i.e. lookup in there and recursively clone what you find (while again looking up as you recurse), would accomplish the same with less CPU cycles [...]
simn
Yes a typed AST "reloading mapping" could perhaps work as well, now that we cleaned up quite a few related parts

I have no idea if it's truly worth the trouble. The hxb PR mentions a potential slowdown, so I thought I'd put forward the idea in case the ongoing optimization efforts fall short of delivering the desired performance (and assuming that this would indeed provide a speedup).
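For illustration, here is a minimal sketch of that lookup-and-clone idea, using made-up stand-in types (`cached_type`, `context`) rather than the compiler's actual module/type definitions:

```ocaml
(* Illustrative stand-ins, not the compiler's actual module/type records. *)
type type_ref = string  (* e.g. a dot path *)

type cached_type = {
  name : type_ref;
  deps : type_ref list;  (* types this one refers to *)
}

type context = {
  types : (type_ref, cached_type) Hashtbl.t;
}

(* Look a type up in the old context and "clone" it into the new one,
   recursing into its dependencies; anything already cloned is reused. *)
let rec clone_from_old ~old_ctx ~new_ctx path =
  match Hashtbl.find_opt new_ctx.types path with
  | Some t -> t
  | None ->
    let old_t = Hashtbl.find old_ctx.types path in
    let cloned = { old_t with deps = old_t.deps } in  (* shallow copy *)
    Hashtbl.replace new_ctx.types path cloned;
    List.iter
      (fun dep -> ignore (clone_from_old ~old_ctx ~new_ctx dep))
      old_t.deps;
    cloned
```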

@Simn
Member

Simn commented Jan 30, 2024

While I'm saying that it could work, I have some doubts about the cost/benefit here. I imagine that a hybrid cache would work like this:

  1. Keep any module that hasn't changed for the last X compilations in memory (TODO: how to detect "has changed").
  2. Lower everything else to hxb and reload it.

A middle ground where we lower to a non-binary representation could, in theory, give better performance, but I'd first like to check whether the b part of hxb really has an unacceptable overhead in practice.
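For what it's worth, a rough sketch of that hybrid scheme could look like this (the names and the `write_hxb` helper are hypothetical, not actual compiler API):

```ocaml
(* Sketch of the hybrid cache idea; names are hypothetical, not compiler API. *)
type cached_module = {
  path : string;
  mutable last_changed : int;   (* compilation counter of the last change *)
  mutable hxb : bytes option;   (* lowered hxb blob, if any *)
}

let keep_threshold = 10  (* the "X compilations" from point 1 above *)

let cache_module ~current_compilation ~write_hxb m =
  if current_compilation - m.last_changed >= keep_threshold then
    (* 1. stable module: keep the typed representation in memory as-is *)
    m.hxb <- None
  else
    (* 2. everything else: lower to hxb, to be reloaded from the blob *)
    m.hxb <- Some (write_hxb m)
```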

@back2dos
Member Author

I'd first like to check whether the b part of hxb really has an unacceptable overhead in practice.

Yes, yes. This is primarily meant as a backup plan ;)

I'm not sure how expensive the hxb roundtrip will ultimately be, nor how much faster this alternative would turn out to be. Some reasons why I think it might be faster:

  1. You skip the whole encoding part entirely: string pool population (which has to involve quite a bit of hashing) as well as buffer allocation (I'm assuming the individual hxb blobs will be kept in memory rather than written out, because I can definitely see how adding IO to the mix would not be insignificant).
  2. Cloning will be a whole lot faster and less memory-intensive than decoding. "Copying" an int is no doubt faster than roundtripping through a variable-length-encoded int (see the sketch below). Copying a reference to an immutable string also has to be faster than copying it into and back out of some buffer. Some structures can simply be reused, like positions, perhaps even the few plain expressions left (not the typed ones) or the whole metadata that contains them.
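To illustrate point 2, here is a generic LEB128-style varint roundtrip (an assumption about what such formats typically do per int, not hxb's actual encoding) next to the "just keep the int" alternative:

```ocaml
(* Generic LEB128-style varint roundtrip for non-negative ints; this is an
   illustration of what a binary format does per int, not hxb's encoding. *)
let write_varint buf n =
  let rec go n =
    let b = n land 0x7f and rest = n lsr 7 in
    if rest = 0 then Buffer.add_char buf (Char.chr b)
    else (Buffer.add_char buf (Char.chr (b lor 0x80)); go rest)
  in
  go n

let read_varint s pos =
  let rec go pos shift acc =
    let b = Char.code s.[pos] in
    let acc = acc lor ((b land 0x7f) lsl shift) in
    if b land 0x80 = 0 then (acc, pos + 1) else go (pos + 1) (shift + 7) acc
  in
  go pos 0 0

(* Cloning, by contrast, just copies the unboxed int and can share immutable
   strings and position records by reference. *)
let () =
  let buf = Buffer.create 16 in
  write_varint buf 300;
  assert (fst (read_varint (Buffer.contents buf) 0) = 300)
```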

Of course, on any macro-heavy project this difference will probably barely be noticeable, but not everyone has that kink.


Keep any module that hasn't changed for the last X compilations in memory (TODO: how to detect "has changed").

Yeah, that could well make more of a difference than the above idea.

@Simn
Member

Simn commented Jan 31, 2024

I agree. All this can be summarized as the difference between "doing something" and "doing nothing" that I mentioned in the hxb PR. Our profiling suggested that we spend by far the most time in the texpr decoder, mostly due to type instances and positions. The good news is that we can often skip that for display requests, which is where overhead matters most.

Rudy also had some cases where the string pool overhead was surprisingly high. I've opened #11511 to support persisting the string pool itself in the server, which will take a lot of the burden off the reader. The writer still has to deal with the hashing, as you point out, but fortunately OCaml's Hashtbl is known to be quite efficient even with large data sets.
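As a simplified illustration of where that hashing lives (not the actual string pool from #11511): the writer pays a hash lookup per interned string, while a persisted pool lets the reader resolve strings by plain index.

```ocaml
(* Simplified interning pool: the writer pays one Hashtbl lookup per string,
   the reader only needs index-based array access. Illustration only. *)
type pool = {
  lookup : (string, int) Hashtbl.t;  (* writer side: string -> index *)
  mutable entries : string list;     (* interned strings, most recent first *)
  mutable count : int;
}

let create () = { lookup = Hashtbl.create 1024; entries = []; count = 0 }

(* Writer: hashing happens here, once per interned string. *)
let intern pool s =
  match Hashtbl.find_opt pool.lookup s with
  | Some i -> i
  | None ->
    let i = pool.count in
    Hashtbl.add pool.lookup s i;
    pool.entries <- s :: pool.entries;
    pool.count <- pool.count + 1;
    i

(* Reader: with a persisted pool, resolving a string is a plain array read. *)
let to_array pool = Array.of_list (List.rev pool.entries)
```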

@Simn
Member

Simn commented Jan 31, 2024

Another aspect that is often overlooked is the amount of persisted memory. While we tend to treat memory as this big cheap thing that everyone has plenty of, a large memory footprint can cause performance problems in a GC environment as well. From our C# unit tests after hacking the compiler to store both hxb and the in-memory cache:

[screenshot: memory measurements with both caches enabled]

I expect this to scale linearly in larger projects, and at some point that is going to cause issues. With OCaml's generational GC, we also have a lot of old-data-points-to-new-data situations, and a lot of write barriers. This is difficult to track and profile, but I'll definitely make the claim that a fully hxb-restored module is going to have better memory behavior. It's unlikely to offset the encoding overhead, but it's not nothing.
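For reference, the old-points-to-new pattern in question looks roughly like this (generic OCaml, not compiler code): each mutation of long-lived cache data so that it points at freshly allocated values goes through the write barrier.

```ocaml
(* Generic illustration: [cache] survives many collections and ends up in the
   old generation; updating an entry makes old data point at freshly allocated
   (young) data, which triggers OCaml's write barrier on the assignment. *)
type entry = { name : string; fields : string list }

let cache : (string, entry ref) Hashtbl.t = Hashtbl.create 64

let update_entry path fields =
  match Hashtbl.find_opt cache path with
  | Some r -> r := { name = path; fields }  (* old ref -> young record *)
  | None -> Hashtbl.add cache path (ref { name = path; fields })
```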

@kLabz Could you check these memory values on a Shiro codebase?

@Simn
Member

Simn commented Feb 7, 2024

The module-to-binary memory ratio is about 5:1 in the codebases we tested. It will be interesting to compare this once #11511 has been implemented.
