Lighthouse OOM mitigations #7053

Open
michaelsproul opened this issue Feb 27, 2025 · 7 comments
Labels
optimization (Something to make Lighthouse run more efficiently.)
v7.0.0-beta.clean (Clean release post Holesky rescue)
v7.0.0 (New release c. Q1 2025)

Comments

@michaelsproul
Member

michaelsproul commented Feb 27, 2025

Short term plan:

  1. Move banned block checks higher in block verification to prevent repeat state lookups (before every instance of load_parent in block_verification.rs)
  2. Encourage use of --state-cache-size 4 to avoid bad state cache pruning logic that is keeping 128x 180MB epoch boundary states around (~24GB of states).
  3. (DONE) Remove block root lookups from status processing. We are getting killed looking up old states to compute the block root. We need a more aggressive version of this PR: Optimise status processing #5481.

Point (1) is intended to fix an OOM that happens to nodes that are in sync and forced to process junk.
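
Roughly, the point (1) check could look like the sketch below. This is only an illustration, not the real `block_verification.rs` code: `BannedBlocks` and the surrounding types are hypothetical stand-ins for whatever structure tracks known-invalid roots, and the real check would sit before every `load_parent` call.

```rust
use std::collections::HashSet;

// Stand-in for Lighthouse's Hash256 block root type.
type Hash256 = [u8; 32];

#[derive(Debug)]
enum BlockError {
    /// The block (or its parent) is already known to be invalid.
    KnownInvalid(Hash256),
}

struct BannedBlocks {
    /// Roots of blocks previously determined to be invalid.
    roots: HashSet<Hash256>,
}

impl BannedBlocks {
    /// O(1) check performed *before* any parent-state lookup, so a junk block
    /// descending from a known-invalid root is rejected without loading a
    /// (potentially ~180MB) beacon state from cache or disk.
    fn check(&self, block_root: Hash256, parent_root: Hash256) -> Result<(), BlockError> {
        for root in [block_root, parent_root] {
            if self.roots.contains(&root) {
                return Err(BlockError::KnownInvalid(root));
            }
        }
        Ok(())
    }
}
```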

Point (2) fixes OOMs during head sync due to lots of epoch boundary states being retained.

To investigate later:

  1. Why are epoch boundary state diffs so large (180MB+), given that we should be basing them off each other while syncing sequential blocks? Answer: balances and inactivity_scores.
  2. Is an earlier invalid block check sufficient to prevent OOM while synced? Are there other states or valid side chains that are forcing us to load states and use too much memory?
  3. Why is sync sending us so many copies of the invalid block? Is there parallelism that is causing the OOM near the head?

Future plans (long-term fixes):

  1. Implement the PromiseCache concept used for attestation committees for beacon states. This is quite subtle to get right; a version was previously attempted but abandoned (Unify and lower state caches #5313). Tracking issue: Improve & unify parallel de-duplication caches #5112
  2. Implement size-based pruning for the state cache (a rough sketch follows this list). This is possible with my WIP changes from State cache memory size WIP #6532. However, that code is quite immature and the pruning itself is expensive (1.5s-4s or more), so we cannot ship this quickly. There is also some subtlety around deciding which states to prune based on size (we could use a heuristic similar to the existing cull method on the 20% largest states).
  3. Re-think pruning logic in cull so that it doesn't hang on to so many useless epoch boundary states.
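
To make plan (2) a bit more concrete, here is a very rough sketch of size-based pruning. The cache layout, the per-entry size estimate, and the "largest 20%, least-recently-used first" heuristic are all assumptions for illustration; the real state cache and its cull method are more involved.

```rust
use std::collections::HashMap;

type Hash256 = [u8; 32];

struct CachedState {
    /// Approximate in-memory size of this state's unshared data, in bytes.
    approx_size: usize,
    /// Monotonic counter; larger means more recently used.
    last_used: u64,
}

struct StateCacheSketch {
    states: HashMap<Hash256, CachedState>,
    /// Soft memory budget for the whole cache, in bytes.
    max_bytes: usize,
}

impl StateCacheSketch {
    /// Evict states until the estimated total size fits the budget, picking
    /// victims from the ~20% largest entries, least-recently-used first.
    fn prune_by_size(&mut self) {
        let total: usize = self.states.values().map(|s| s.approx_size).sum();
        if total <= self.max_bytes {
            return;
        }

        // Rank entries by size (descending) and keep the top 20% as eviction candidates.
        let mut by_size: Vec<(Hash256, usize, u64)> = self
            .states
            .iter()
            .map(|(root, s)| (*root, s.approx_size, s.last_used))
            .collect();
        by_size.sort_by(|a, b| b.1.cmp(&a.1));
        let cutoff = (by_size.len() / 5).max(1);
        let mut candidates: Vec<_> = by_size.into_iter().take(cutoff).collect();

        // Among the largest states, evict least-recently-used first.
        candidates.sort_by_key(|&(_, _, last_used)| last_used);

        let mut freed = 0usize;
        for (root, size, _) in candidates {
            if total - freed <= self.max_bytes {
                break;
            }
            self.states.remove(&root);
            freed += size;
        }
    }
}
```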
@michaelsproul added the optimization label on Feb 27, 2025
@michaelsproul added the v7.0.0 and v7.0.0-beta.clean labels on Feb 27, 2025
@michaelsproul
Member Author

Merged the status processing fix to the holesky-rescue branch.

@michaelsproul
Member Author

Just thought of another source of unbounded state lookups: BlocksByRoot and BlocksByRange.

It might be time to build a dedicated in-memory DAG of block roots which we can use instead of the state-based block iterators.
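
A hedged sketch of what that could look like: a map from block root to (parent root, slot), populated from fork choice / the hot DB, that lets BlocksByRoot and BlocksByRange walk ancestors without ever loading a state. The names here are illustrative, not Lighthouse's.

```rust
use std::collections::HashMap;

type Hash256 = [u8; 32];
type Slot = u64;

/// In-memory DAG of block roots: child root -> (parent root, slot).
#[derive(Default)]
struct BlockRootDag {
    nodes: HashMap<Hash256, (Hash256, Slot)>,
}

impl BlockRootDag {
    fn insert(&mut self, block_root: Hash256, parent_root: Hash256, slot: Slot) {
        self.nodes.insert(block_root, (parent_root, slot));
    }

    /// Walk back from `head_root`, yielding (root, slot) pairs until the DAG
    /// runs out (e.g. at the finalized boundary). No state loads required.
    fn ancestors<'a>(&'a self, head_root: Hash256) -> impl Iterator<Item = (Hash256, Slot)> + 'a {
        std::iter::successors(
            self.nodes
                .get(&head_root)
                .map(|&(parent, slot)| (head_root, parent, slot)),
            move |&(_, parent, _)| {
                self.nodes
                    .get(&parent)
                    .map(|&(grandparent, slot)| (parent, grandparent, slot))
            },
        )
        .map(|(root, _, slot)| (root, slot))
    }
}
```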

cc @dapplion

@eserilev
Collaborator

eserilev commented Feb 27, 2025

Why are epoch boundary state diffs so large (180MB+), given that we should be basing them off each other while syncing sequential blocks?

Inactivity leak penalties are applied at each epoch boundary. The longer we're in non-finality, the heavier the penalties become, so not only are validator balances changing at epoch boundaries, but potentially effective balances as well. This could be one reason epoch boundary state diffs are so large, especially considering how large the validator set is on Holesky.

@michaelsproul
Member Author

michaelsproul commented Mar 3, 2025

Jimmy and I had a look at the diffs using lcli (based on the states from our experiment the other day, logs here). They are legitimately big.

Mostly balances and inactivity scores.

[2025-03-03T01:16:43Z INFO  lcli::skip_slots] Using mainnet spec
[2025-03-03T01:16:43Z INFO  lcli::skip_slots] Advancing 32 slots
[2025-03-03T01:16:43Z INFO  lcli::skip_slots] Doing 1 runs
[2025-03-03T01:16:43Z INFO  lcli::skip_slots] State path: "/home/michael/eth2/milhouse-diff-test/state_3714912.ssz"
[2025-03-03T01:16:44Z DEBUG lcli::transition_blocks] SSZ decoding /home/michael/eth2/milhouse-diff-test/state_3714912.ssz: 542.830343ms
[2025-03-03T01:16:49Z INFO  lcli::skip_slots] Post-state balances size (total/diff): 84594032-84594032 B/79783384 B
[2025-03-03T01:16:49Z INFO  lcli::skip_slots] Post-state validators size: 522924032-522924032 B/9432 B
[2025-03-03T01:16:49Z INFO  lcli::skip_slots] Post-state inactivity_scores size: 84594032-84594032 B/79783384 B

We've started working on a milhouse PR to intra-rebase a list on itself, i.e. to exploit internal structural sharing. If there are e.g. 1k identical inactivity scores in a row, then we can reuse memory for them.
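
This isn't milhouse's actual API, but the idea can be sketched with a toy example: chunk a flat list into fixed-size leaves and intern identical leaves behind a shared `Arc`, so a long run of equal inactivity scores costs a handful of allocations instead of one per chunk.

```rust
use std::collections::HashMap;
use std::sync::Arc;

const CHUNK_SIZE: usize = 1024;

/// Deduplicate a flat list into shared chunks: identical 1024-element leaves
/// (e.g. runs of equal inactivity scores) point at the same allocation.
fn dedup_into_chunks(values: &[u64]) -> Vec<Arc<Vec<u64>>> {
    let mut interner: HashMap<Vec<u64>, Arc<Vec<u64>>> = HashMap::new();
    values
        .chunks(CHUNK_SIZE)
        .map(|chunk| {
            interner
                .entry(chunk.to_vec())
                .or_insert_with(|| Arc::new(chunk.to_vec()))
                .clone()
        })
        .collect()
}

fn main() {
    // One million identical scores in a row.
    let scores = vec![7u64; 1_000_000];
    let chunks = dedup_into_chunks(&scores);

    // Count distinct backing allocations: one for the full chunks plus one
    // for the shorter tail chunk.
    let mut unique: Vec<*const Vec<u64>> = chunks.iter().map(|c| Arc::as_ptr(c)).collect();
    unique.sort();
    unique.dedup();
    println!("{} chunks share {} allocations", chunks.len(), unique.len());
}
```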

@michaelsproul
Member Author

Latest investigation reveals sources of state cache miss:

  • BlocksByRange requests that span across the finalized epoch. We can probably tweak our logic to avoid this requiring a state lookup.
  • Gossip blocks and advance head. We seem to be flushing good states from the cache, to the point of having to reload them.

Ideas to fix:

  • Splice block roots from freezer with fork choice for BlocksByRange (rough sketch below)
  • Maybe protect the head state in the state cache
  • Maybe avoid adding "ancestor states" to the state cache
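
For the splice idea, a minimal sketch of how the two sources could be stitched together per slot: the pre-finalization part of a BlocksByRange request is served from frozen (finalized) block roots, the rest from fork choice, with no beacon state loaded. The two lookup closures are stand-ins, not real Lighthouse calls.

```rust
type Hash256 = [u8; 32];
type Slot = u64;

/// Block roots for slots [start_slot, start_slot + count), spliced from two
/// state-free sources: the freezer DB below the finalized slot, and fork
/// choice (which tracks all unfinalized blocks) at and above it.
/// `None` entries represent skipped slots.
fn block_roots_for_range(
    start_slot: Slot,
    count: u64,
    finalized_slot: Slot,
    frozen_root: impl Fn(Slot) -> Option<Hash256>,       // stand-in for a freezer DB lookup
    fork_choice_root: impl Fn(Slot) -> Option<Hash256>,  // stand-in for a fork choice lookup
) -> Vec<Option<Hash256>> {
    (start_slot..start_slot + count)
        .map(|slot| {
            if slot < finalized_slot {
                frozen_root(slot)
            } else {
                fork_choice_root(slot)
            }
        })
        .collect()
}
```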

@realbigsean
Member

BlocksByRange requests that span across the finalized epoch. We can probably tweak our logic to avoid this requiring a state lookup.
Splice block roots from freezer with fork choice for BlocksByRange

Made a PR for this here #7066

@realbigsean
Member

I've made a PR here that tries to make the state cache more intelligent; it's a bigger/more complicated change:

https://github.com/sigp/lighthouse/pull/7069/files
