
Rust - LMDB Cache back-end #9827

Closed · wants to merge 40 commits

Conversation

@yamadapc (Contributor) commented Jul 1, 2024

  • Adds a Rust LMDB wrapper and a benchmark
  • Sets up bincode for our types (see the sketch after this list)
  • rkyv had been added, but was removed in 71e7fc269881a44bab0d44b054e53ce8beabc477
  • Adds docs/rfcs/parcel-v3/0001_cache_implementation.md, which discusses some cache-related ideas and measurements taken from this branch
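
As a rough illustration of the bincode setup mentioned above, here is a minimal sketch, assuming bincode 1.x with serde derive; the `Request` struct is hypothetical and only stands in for the real types in `crates/parcel_core`:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical stand-in for a request type; the real types live in parcel_core.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Request {
    id: u64,
    file_path: String,
    invalidations: Vec<String>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let request = Request {
        id: 1,
        file_path: "src/index.js".into(),
        invalidations: vec!["src/util.js".into()],
    };

    // Serialize with the bincode 1.x serde API and round-trip it back.
    let bytes: Vec<u8> = bincode::serialize(&request)?;
    let decoded: Request = bincode::deserialize(&bytes)?;
    assert_eq!(request, decoded);

    Ok(())
}
```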

- Serializing a single request - **165ns** with `bincode` and 288ns with `rkyv`
- Read a single request and deserialize - **821ns**
- Write 1 request with async writes - **15204ns** / **15us**
- Write 1 request with sync writes - **3000000ns** / **3000us** / **3ms**

Member commented:

That's expensive! That means if you have 100,000 nodes and 500,000 edges in the RequestGraph (not unreasonable, my current benchmark has 87k nodes and 400k edges, and I'm sure Jira has 10x more than that), just writing them alone will take 30 minutes.

Even with async writes, that would be 9 seconds added to the build time just to write request nodes to the cache. That's 11x slower than performing a complete build from scratch with no cache.
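
For reference, a back-of-the-envelope check of these figures, using the hypothetical 100k-node / 500k-edge graph above and the per-write timings from the benchmark:

```rust
fn main() {
    let entries: u64 = 100_000 + 500_000; // nodes + edges
    let sync_write_ns: u64 = 3_000_000;   // ~3 ms per synchronous write
    let async_write_ns: u64 = 15_204;     // ~15 µs per asynchronous write

    let sync_total_s = entries * sync_write_ns / 1_000_000_000;
    let async_total_s = (entries * async_write_ns) as f64 / 1e9;

    println!("sync:  {} s (~{} min)", sync_total_s, sync_total_s / 60); // 1800 s, ~30 min
    println!("async: {:.1} s", async_total_s);                          // ~9.1 s
}
```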

Contributor Author (@yamadapc) replied:

Note we'd write only request nodes, so those estimates are about 6x higher than reality.

It's quite expensive, though. I think there is likely still some tuning that could speed it up a lot, or something my benchmark does incorrectly.

@MonicaOlejniczak (Contributor) commented Jul 8, 2024:

Considering the cache is still unused, let's iterate on the implementation and keep the discussion open over the next few weeks as we get closer to having an E2E build.

@devongovett (Member) commented Jul 1, 2024

I don't think we need a database for this. We don't need any of the features that a database offers, e.g. durability, fault tolerance, transactions, indexes, etc. We don't care about data loss - a cache can simply be rebuilt from scratch if something goes wrong. All of those features have significant overhead and added complexity.

The history here is that we started with a file system based cache in Parcel. We used this cache not only as an actual cache, but as temporary storage within a build to pass data from one phase to another (e.g. transformation to packaging), which may be on different threads. In the JS implementation of Parcel, this was necessary due to the lack of shared memory between threads, as well as v8 heap size limits that we ran into while building large apps. We eventually moved to LMDB because it was faster than the FS cache due to batching, but avoiding disk IO during builds entirely would be much faster still.

In Rust, neither of these is a problem anymore. We can keep as much as we want in memory (even more than the total physical system memory if necessary, since virtual memory is already paged by the OS), and we don't need to write to disk in order to share memory between threads. Avoiding writes to disk until Parcel shuts down (or maybe during idle time) will be much faster; I think your benchmarks show the cost well. This is worth doing because the common case is a dev server that does many builds over its lifetime, none of which need to be written to disk, so we should optimize that dev loop. Not only is this faster, it's also simpler to implement, and it avoids the inconsistent-state problems that could occur when data structures are sharded.
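
To make the deferred-write idea concrete, here is a minimal, hypothetical sketch (not Parcel's actual API): entries live in a plain in-memory map during the build and are only flushed to disk at shutdown or during idle time.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

#[derive(Default)]
struct InMemoryCache {
    entries: HashMap<String, Vec<u8>>,
}

impl InMemoryCache {
    fn set(&mut self, key: String, value: Vec<u8>) {
        self.entries.insert(key, value);
    }

    fn get(&self, key: &str) -> Option<&[u8]> {
        self.entries.get(key).map(Vec::as_slice)
    }

    // One write pass at shutdown (or during idle time) instead of a disk
    // write per request during the build. Keys are assumed to be valid
    // file names in this sketch.
    fn flush_to_disk(&self, dir: &Path) -> io::Result<()> {
        fs::create_dir_all(dir)?;
        for (key, value) in &self.entries {
            fs::write(dir.join(key), value)?;
        }
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut cache = InMemoryCache::default();
    cache.set("request-1".into(), b"serialized request".to_vec());
    assert!(cache.get("request-1").is_some());
    cache.flush_to_disk(Path::new(".parcel-cache"))?;
    Ok(())
}
```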

@yamadapc (Contributor, Author) commented Jul 2, 2024

We might care about transactions / consistency for some types of entries, but generally I agree that most of the time we just want to write the transformation outputs (or other intermediary assets) into files. That's also the path to supporting remote caching.

To do that, we'd need individual cache entries for each of these intermediary assets.
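
As a hypothetical sketch of what "individual cache entries" could mean in practice, each intermediary artifact could get its own key derived from the asset path and its content. The helper below is illustrative only; a real implementation would use a stable content hash rather than `DefaultHasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative only: DefaultHasher is not stable across processes, so a real
// cache would use a stable content hash instead.
fn cache_key(asset_path: &str, content: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    asset_path.hash(&mut hasher);
    content.hash(&mut hasher);
    format!("{}-{:016x}", asset_path.replace('/', "_"), hasher.finish())
}

fn main() {
    let key = cache_key("src/index.js", b"transformed output");
    println!("{key}"); // e.g. src_index.js-<hash> (hash varies per run)
}
```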

),
|b| {
let mut scratch = vec![];
scratch.resize(10000, 0); // pre-alloc

Contributor commented:

Isn't this equivalent to `vec![0; 10000]` on a single line?

@devongovett (Member) commented:

I'm not sure remote caching will be beneficial anymore, tbh. If done per file, the network latency alone will be an order of magnitude slower than the CPU time needed to simply recompile that file from scratch (a couple of milliseconds). Even if you downloaded the entire cache at once up front, it's questionable whether that would be faster than recompiling without a cache, depending on your internet speed.

@yamadapc (Contributor, Author) commented Jul 8, 2024

I don't think that's how it'll work, but that is a good point.

I reckon remote caching would work through downloading large incremental intermediary artifacts that are combined into the output files. That would surely be faster than rebuilding locally.

Currently there's a large amount of complexity related to cache writes, which works around the fact that there's a single cache entry. We'd like to avoid this complexity while also addressing a number of pitfalls it causes.

Regarding cache write times, I think we'll be able to address this better once we have an E2E build. My estimate was quite a bit lower, but since we don't need to wait on these writes, we can also offload any writes we do want to make.

I do agree about storing transformation outputs as files.

@devongovett (Member) commented:

> That would surely be faster than rebuilding locally.

I'm not so sure. You might have to download a lot of data. Say you start with no cache. Just downloading a large cache folder over a typical internet connection might be slower than building from scratch.

> once we have an E2E build

Just pointing out that we have had this for a while on the core-rs2 branch. You're welcome to test stuff there. Not sure why we need to rewrite everything again... 😕

@yamadapc (Contributor, Author) commented:

We've decided to split the code from the markdown RFC on this branch.

Therefore I'll remove all code from this PR, and we can keep it open as a draft while we gather information and discuss how we'd like to implement persistence of "requests" or transformation outputs.

Pending information would be:

  • Performance of builds after the rewrites
  • Combined size of all serialized requests for large builds
  • Combined runtime impact of reading and writing requests individually as opposed to all in one go

It'll be possible to gather this information once E2E builds are working on v2 with the v3 feature flag turned on.

@yamadapc (Contributor, Author) commented:

It will also be useful to clarify the meaning of, and the process around, RFC documents added to this project, which is something we can do soon.
