🔷 [ProjectTracking] Implement column-dependent garbage collection limits #10497

Open
4 tasks
posvyatokum opened this issue Jan 25, 2024 · 0 comments

Goals

Background

GC = garbage collection
Right now GC first selects the blocks to delete, and then deletes everything that can be deleted for those blocks, across all columns at once. That means the GC limit (number of epochs to keep) is dictated by the most demanding column. But not every column needs to stay available for the same number of blocks/epochs.

Why should NEAR One work on this

By introducing a separate limit for the State and TrieChanges columns, we can save disk space during high load. Notably, during the first Cosmose onboarding we saw RocksDB grow by about 20GB per saved epoch: roughly 100GB with the default settings, and 200GB on internal nodes that were keeping 10 epochs. This growth was entirely due to garbage-collectable State/TrieChanges data that we kept only because of the GC limits.

What needs to be accomplished

  • Implement separate configurable GC limit for State/TrieChanges.
  • [Optional] Implement separate configurable GC limit for every column.
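A minimal sketch of what such a config could look like, assuming a per-column override that falls back to the global limit. All names here (`GcConfig`, `state_gc_num_epochs_to_keep`, `MIN_STATE_EPOCHS_TO_KEEP`) are hypothetical and not taken from the code base; the default of 5 epochs is only inferred from the 100GB / 20GB-per-epoch figures above, and the minimum of 1 is a placeholder until the real minimum is established.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder lower bound (assumption): the real minimum still has to be established.
MIN_STATE_EPOCHS_TO_KEEP = 1

# Columns that would share the separate limit.
STATE_COLUMNS = ("State", "TrieChanges")


@dataclass(frozen=True)
class GcConfig:
    # Global limit applied to every column without an override (assumed default).
    gc_num_epochs_to_keep: int = 5
    # Hypothetical separate limit for State/TrieChanges; None means "use the global limit".
    state_gc_num_epochs_to_keep: Optional[int] = None

    def __post_init__(self) -> None:
        # Value check: reject limits below the placeholder minimum.
        limit = self.state_gc_num_epochs_to_keep
        if limit is not None and limit < MIN_STATE_EPOCHS_TO_KEEP:
            raise ValueError(
                f"state_gc_num_epochs_to_keep must be >= {MIN_STATE_EPOCHS_TO_KEEP}, got {limit}"
            )

    def epochs_to_keep(self, column: str) -> int:
        # Return the limit that applies to the given column.
        if column in STATE_COLUMNS and self.state_gc_num_epochs_to_keep is not None:
            return self.state_gc_num_epochs_to_keep
        return self.gc_num_epochs_to_keep
```

With `GcConfig(state_gc_num_epochs_to_keep=2)`, State and TrieChanges would be kept for 2 epochs while every other column keeps the global limit.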

Main use case

With this config option validators can decrease RocksDB size, and thereby also improve block production time.

Links to external documentations and discussions

Zulip thread about this idea.
Zulip thread about the relevant issue on mainnet.

Estimated effort

For the State/TrieChanges part of the project I expect 2-3 weeks of effort from one engineer.

Assumptions

There are no specific assumptions that this project is making.

Pre-requisites

N/A

Out of scope

N/A

Task list:

  • Transpose GC from
heights_to_gc = calculate_heights_to_gc()
for height in heights_to_gc:
    for column in columns:
        do_gc(column, height)

to

for column in columns:
    heights_to_gc = calculate_heights_to_gc(column)
    for height in heights_to_gc:
        do_gc(column, height)

to allow different GC restrictions for different columns. Both code snippets are gross oversimplifications of the actual code.

  • Establish minimum limits for State GC (how many blocks do we need to keep for block production, Flat Storage, State Sync, and other projects to work)
  • Add config for State/TrieChanges GC limit with value checks
  • Refactor RPC methods that read State
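The transposed loop from the first task can be made concrete with a toy in-memory store. This is only a sketch under stated assumptions: the store is modeled as a set of `(column, height)` pairs, limits are expressed in blocks rather than epochs, and the `head` and `blocks_to_keep` parameters are hypothetical additions that do not mirror the real GC code.

```python
from typing import Dict, List, Set, Tuple

# Toy store (assumption): one entry per (column, height) pair.
Store = Set[Tuple[str, int]]


def calculate_heights_to_gc(
    store: Store, column: str, head: int, blocks_to_keep: Dict[str, int]
) -> List[int]:
    # Keep the last `blocks_to_keep[column]` heights up to and including `head`;
    # everything at or below the cutoff is eligible for GC.
    cutoff = head - blocks_to_keep[column]
    return sorted(h for (c, h) in store if c == column and h <= cutoff)


def do_gc(store: Store, column: str, height: int) -> None:
    # Delete the data of one column at one height.
    store.discard((column, height))


def run_gc(
    store: Store, columns: List[str], head: int, blocks_to_keep: Dict[str, int]
) -> None:
    # Column-first iteration: each column computes its own heights to delete.
    for column in columns:
        for height in calculate_heights_to_gc(store, column, head, blocks_to_keep):
            do_gc(store, column, height)


def remaining_heights(store: Store, column: str) -> List[int]:
    # Helper for inspecting what survived GC for a column.
    return sorted(h for (c, h) in store if c == column)
```

For example, with heights 1 to 10 stored for both `Block` and `State`, calling `run_gc(store, ["Block", "State"], head=10, blocks_to_keep={"Block": 5, "State": 2})` leaves heights 6 to 10 for `Block` but only 9 and 10 for `State`, which is the effect the separate limit is meant to achieve.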