Unify and lower state caches #5313
Conversation
Thanks for implementing this! It looks awesome on the whole!

I'll do some benchmarks for the committee cache removal on mainnet or Holesky. I think the hot committees should be fine as we will use the head or some state from the `state_cache`, but as you said via DM the cold committees are going to be slower due to the non-zero cost of going from `HDiffBuffer` -> `BeaconState` -> `CommitteeCache`.

I guess a hybrid approach could be to retain the `shuffling_cache` as a simpler LRU, which relies on the de-dupe at the store level for parallel requests?
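For illustration, a minimal sketch of that hybrid, assuming the third-party `lru` crate and placeholder `ShufflingId`/`CommitteeCache` types (not Lighthouse's real ones): the cache itself stays a plain LRU and leaves de-duplication of parallel misses to the store-level promise cache.

```rust
use std::num::NonZeroUsize;
use std::sync::{Arc, Mutex};

use lru::LruCache;

// Placeholder types standing in for Lighthouse's real ones.
#[derive(Clone, PartialEq, Eq, Hash)]
struct ShufflingId(u64);

struct CommitteeCache;

struct SimpleShufflingCache {
    inner: Mutex<LruCache<ShufflingId, Arc<CommitteeCache>>>,
}

impl SimpleShufflingCache {
    fn new(capacity: NonZeroUsize) -> Self {
        Self {
            inner: Mutex::new(LruCache::new(capacity)),
        }
    }

    /// Plain lookup: no promises or waiting here. Concurrent misses for the
    /// same shuffling are expected to be de-duplicated at the store level.
    fn get(&self, id: &ShufflingId) -> Option<Arc<CommitteeCache>> {
        self.inner.lock().unwrap().get(id).cloned()
    }

    fn insert(&self, id: ShufflingId, committees: Arc<CommitteeCache>) {
        self.inner.lock().unwrap().put(id, committees);
    }
}
```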
```rust
diff.apply(&mut buffer)?;

// Load diff and apply it to buffer.
let diff = self.load_hdiff_for_slot(slot)?;
diff.apply(Arc::make_mut(&mut buffer))?;
```
I like the addition of the `Arc`s here. In future we might be able to go further and just clone the `buffer.balances`, because that's the only one we actually mutate.

The main state diff only needs to be mutable because we re-assign it here:

lighthouse/beacon_node/store/src/hdiff.rs, lines 156 to 159 in 928915c:

```rust
pub fn apply_xdelta(&self, source: &[u8], target: &mut Vec<u8>) -> Result<(), Error> {
    *target = xdelta3::decode(&self.bytes, source).ok_or(Error::UnableToApplyDiff)?;
    Ok(())
}
```

i.e. it's a bit wasteful as-is, because we clone the `buffer.state` and then don't even use that memory.
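To make the idea concrete, here is a rough sketch with simplified stand-in types (the real `HDiff`/`HDiffBuffer` and the balance-diff application are more involved; only the shape of `apply_xdelta` matches the snippet above): since `apply_xdelta` replaces its target wholesale, the state bytes can start from an empty `Vec` and only the balances need to be cloned.

```rust
// Simplified stand-ins for the real types; not Lighthouse's actual API.
struct StateDiff {
    bytes: Vec<u8>,
}

#[derive(Debug)]
enum Error {
    UnableToApplyDiff,
}

impl StateDiff {
    // Same shape as the `apply_xdelta` shown above: the target is overwritten.
    fn apply_xdelta(&self, source: &[u8], target: &mut Vec<u8>) -> Result<(), Error> {
        *target = xdelta3::decode(&self.bytes, source).ok_or(Error::UnableToApplyDiff)?;
        Ok(())
    }
}

struct HDiffBuffer {
    state: Vec<u8>,
    balances: Vec<u64>,
}

// Apply a diff without cloning `prev.state`: `apply_xdelta` overwrites the
// target anyway, so we start from an empty Vec and clone only the balances,
// which are the part that is genuinely mutated in place.
fn apply_diff(
    state_diff: &StateDiff,
    prev: &HDiffBuffer,
    apply_balances: impl FnOnce(&mut Vec<u64>) -> Result<(), Error>,
) -> Result<HDiffBuffer, Error> {
    let mut state = Vec::new();
    state_diff.apply_xdelta(&prev.state, &mut state)?;

    let mut balances = prev.balances.clone();
    apply_balances(&mut balances)?;

    Ok(HDiffBuffer { state, balances })
}
```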
Aaah, I missed that! I committed a quick suggestion, but am not sure if it's the best approach.

Hmm. I think I would prefer re-enabling the historic state cache with computed committee caches and a default size of 1 over that. But it is kind of hard to judge, as I don't know the API usage profile of the average user interested in cold states.
This is tagged as "waiting-on-author". How do we want to proceed with it?
Looks ready for a re-review; I think the tag was just outdated.
Hey @dknopik, sorry for the slow review on this one. I was testing out your changes while also trying to get tree-states to help rescue the Goerli network. I found we were still getting slogged by lots of cache misses and parallel state loads, so I tried re-working the cache to de-duplicate even more requests on this branch: https://github.com/michaelsproul/lighthouse/commits/tree-states-goerli-special/

Even with those changes, Goerli was still basically untenable. This has led me to reconsider whether we want to continue using state diffs in the hot database. The problem is that, during periods of long non-finality, when you get a cache miss you need to load potentially hundreds of diffs (one every few epochs) back to the finalized state. This takes a lot of time.

In tandem, we're also working on merging tree-states down to unstable gradually. The plan there is to split it into 3 parts:

For (2) we don't need all the complexity of the state diff handling in the cache, so I will try to incorporate some form of your changes. To keep the change gradual I think I will also keep the attester shuffling caches, until we can show they're not necessary. For the disk-based changes, I think if we abandon the state diffing and store full …
Hi @michaelsproul, thanks for the detailed update! Unfortunately I'll be pretty busy this week, but if I find some time I'll try to properly catch up. It probably makes no sense for me to get involved through coding right now, but if I have any ideas regarding caching (or the current state of affairs in general) I'll let you know.
No worries @dknopik! I think I can handle merging stuff down in a satisfactory way. The in-memory PR is here for reference:
Closing, obsolete
Issue Addressed
Proposed Changes
- Rewrite the `PromiseCache`: Instead of holding finished values for an indefinite amount of time, we only supply them to threads that wait for them, and then discard them. The caller may or may not store the finished value in another cache. (A rough sketch of this pattern follows the list.)
  - The `PromiseCache` can be easily combined with a "long-term" caching solution such as an `LruCache` or another situationally appropriate cache.
  - A future wrapper could combine the `PromiseCache` and some kind of "long-term" cache automatically.
- Introduce the new `PromiseCache` in `HotColdDB::get_hot_state`.
- Introduce the new `PromiseCache` in `HotColdDB::load_hdiff_buffer_for_slot` to accelerate parallel cold state loads.
  - Caching here rather than in `load_cold_state_by_slot` allows us to benefit from the cache not only if we request the state of a specific slot in parallel, but also if we request states requiring the same diff that is currently computed.
  - Compared to `load_cold_state_by_slot`, we have more writing state accesses, which are superfluous if there is no parallel access.
- Remove the `ShufflingCache`.
  - The caches in the `HotColdDB` are sufficient, so we save memory by removing this cache.
- `Arc` the `HDiffBuffer`s in the `diff_buffer_cache`.
  - Previously, buffers were copied out of the `diff_buffer_cache`; this change avoids those copies.
  - In the other caches, an `Arc` only adds unnecessary indirection and does not save a clone.
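Below is a minimal sketch of the promise-cache pattern described in the first bullet. It is not the actual Lighthouse `PromiseCache` API; the names (`Attempt`, `get_or_wait`, `resolve`) are made up for illustration. In-flight computations are de-duplicated, waiting threads block until the value is published, and the entry is discarded afterwards, leaving long-term storage to a separate cache such as an `LruCache`.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Condvar, Mutex};

struct Promise<V> {
    value: Mutex<Option<V>>,
    ready: Condvar,
}

pub enum Attempt<V> {
    /// This thread claimed the work: compute the value, then call `resolve`.
    Compute,
    /// Another thread computed the value; here it is.
    Ready(V),
}

pub struct PromiseCache<K, V> {
    in_flight: Mutex<HashMap<K, Arc<Promise<V>>>>,
}

impl<K: Hash + Eq + Clone, V: Clone> PromiseCache<K, V> {
    pub fn new() -> Self {
        Self {
            in_flight: Mutex::new(HashMap::new()),
        }
    }

    /// Either claim the computation for `key`, or wait for the thread that did.
    pub fn get_or_wait(&self, key: &K) -> Attempt<V> {
        let promise = {
            let mut in_flight = self.in_flight.lock().unwrap();
            match in_flight.get(key) {
                Some(promise) => promise.clone(),
                None => {
                    let promise = Arc::new(Promise {
                        value: Mutex::new(None),
                        ready: Condvar::new(),
                    });
                    in_flight.insert(key.clone(), promise);
                    return Attempt::Compute;
                }
            }
        };
        // Block until the computing thread publishes the value.
        let mut guard = promise.value.lock().unwrap();
        while guard.is_none() {
            guard = promise.ready.wait(guard).unwrap();
        }
        match &*guard {
            Some(value) => Attempt::Ready(value.clone()),
            None => unreachable!("loop only exits once the value is set"),
        }
    }

    /// Publish the finished value to all waiters and forget the entry.
    pub fn resolve(&self, key: &K, value: V) {
        if let Some(promise) = self.in_flight.lock().unwrap().remove(key) {
            *promise.value.lock().unwrap() = Some(value);
            promise.ready.notify_all();
        }
    }
}
```

A caller that receives `Attempt::Compute` performs the expensive load, calls `resolve`, and may then insert the result into a long-term cache; a caller that receives `Attempt::Ready` just uses the value. Error handling (a computing thread that never resolves) is omitted for brevity.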
Additional Info

- I considered reintroducing the `historic_state_cache`, but decided against it, as I believe the performance of the new caching is still sufficient. Reintroducing it is however still a viable option to aid with rapid requests for the same slot.
- Threads waiting on the `PromiseCache` are blocking, i.e. they are not available for the tokio executor. Avoiding this is currently not possible, as the store is not `async`. Therefore, while they save system resources by not running parallel computations, they still are not available for tokio to run some other task on that thread. Future work might make the state retrieval interface `async` to allow use of an `async` `PromiseCache` variant to make these threads usable for our tokio executor. (A rough sketch of such a variant follows.)
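As a thought experiment for that future work, here is one way an `async` variant could look, assuming tokio; `AsyncPromiseCache` and `get_or_load` are hypothetical names, not existing Lighthouse or tokio APIs (only `tokio::sync::OnceCell` is real). Waiting tasks yield to the executor instead of parking a thread.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex};

use tokio::sync::OnceCell;

pub struct AsyncPromiseCache<K, V> {
    in_flight: Mutex<HashMap<K, Arc<OnceCell<V>>>>,
}

impl<K: Hash + Eq + Clone, V: Clone> AsyncPromiseCache<K, V> {
    pub fn new() -> Self {
        Self {
            in_flight: Mutex::new(HashMap::new()),
        }
    }

    /// Await the value for `key`, running `load` at most once across all
    /// concurrent callers; everyone else awaits the same initialization.
    pub async fn get_or_load<F, Fut, E>(&self, key: K, load: F) -> Result<V, E>
    where
        F: FnOnce() -> Fut,
        Fut: std::future::Future<Output = Result<V, E>>,
    {
        let cell = {
            // The std mutex is only held briefly and never across an await.
            let mut in_flight = self.in_flight.lock().unwrap();
            in_flight
                .entry(key.clone())
                .or_insert_with(|| Arc::new(OnceCell::new()))
                .clone()
        };

        // Concurrent callers await the same cell; only one `load` future runs.
        let result = cell.get_or_try_init(load).await.map(|value| value.clone());

        // Discard the entry either way; long-term storage belongs elsewhere.
        self.in_flight.lock().unwrap().remove(&key);
        result
    }
}
```

Whether this is worthwhile depends on first making the store's state retrieval interface `async`, as noted above.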