Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Fix a storage leak in parachains db #5594

Merged
merged 15 commits into from
Jun 13, 2022
Merged

Conversation

eskimor
Copy link
Member

@eskimor eskimor commented May 25, 2022

This PR makes sure old votes are pruned from the database in dispute-coordinator.

Should be tested on Versi and with paritydb.

  • Tested on Rococo - disk usage seems to go down.
  • Tested on paritydb: Nodes are crashing, but things are working after a db reset:
 WARN tokio-runtime-worker sc_service::builder: The NetworkStart returned as part of `build_network` has been silently dropped    
Error: 
   0: InvalidConfiguration("Column config mismatch for column 4. Expected \"preimage: false, uniform: false, refc: false, compression: 0, ordered: true\", got \"preimage: false, uniform: false, refc: false, compression: 0, ordered: false\"")
  • burnin on Kusama and check for saved storage space.

Release Notes

Operators of nodes running paritydb will need to delete the parachains database before upgrading.

@github-actions github-actions bot added the A3-in_progress Pull request is in progress. No review needed at this stage. label May 25, 2022
@eskimor eskimor added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels May 25, 2022
@eskimor eskimor requested a review from sandreim May 25, 2022 13:57
@eskimor eskimor marked this pull request as ready for review May 25, 2022 14:07
Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

node/core/dispute-coordinator/src/db/v1.rs Outdated Show resolved Hide resolved
#[cfg(test)]
const MAX_CLEAN_BATCH_SIZE: u32 = 10;
#[cfg(not(test))]
const MAX_CLEAN_BATCH_SIZE: u32 = 300;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the impact of doing this many items on each earliest session update? (for nodes who have a lot of dangling storage items to clean)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I have seen so far, it seems to be pretty fast - although that might have been mostly empty sessions. Worst thing that could happen is that a valdiator is heavily loaded at a session boundary and fails to do some work. I also don't have any good data yet about how much wasted storage we are actually talking about, if it is tiny we can go with smaller batch sizes as then it does not matter if it takes forever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

burnin on Kusama will tell.

@eskimor eskimor changed the title Fix cleanup of old votes. Fix a storage leak in parachains db Jun 3, 2022
Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did burn-in on Kusama go? What are the expected savings?

node/core/dispute-coordinator/src/db/v1.rs Outdated Show resolved Hide resolved
node/core/dispute-coordinator/src/db/v1.rs Outdated Show resolved Hide resolved
eskimor and others added 2 commits June 4, 2022 10:00
Co-authored-by: Andronik <write@reusable.software>
Co-authored-by: Andronik <write@reusable.software>
@eskimor
Copy link
Member Author

eskimor commented Jun 4, 2022

burnin on Kusama was not done yet. On Rococo I don't have definite data, but it looks like around 10%. I will add a metric how long cleanup rounds take and will then burnin on Kusama.

@eskimor eskimor removed the B0-silent Changes should not be mentioned in any release notes label Jun 8, 2022
@eskimor eskimor merged commit 3e4fe06 into master Jun 13, 2022
@eskimor eskimor deleted the rk-fix-dispute-storage-leak branch June 13, 2022 13:38
@eskimor
Copy link
Member Author

eskimor commented Jun 13, 2022

burnin looks fine so far. Given that cleanup takes a while, let's include this sooner than later.

al3mart pushed a commit that referenced this pull request Jul 14, 2022
* Fix cleanup of old votes.

* Cleanup.

* Get rid of redundant import

* Tests + logging

* Fix db key name.

* Add some reasoning to batch size.

* Add dispute data to indexed columns

* Fix fmt

* Add helper function.

* Fix typos.

* Update node/core/dispute-coordinator/src/db/v1.rs

Co-authored-by: Andronik <write@reusable.software>

* Update node/core/dispute-coordinator/src/db/v1.rs

Co-authored-by: Andronik <write@reusable.software>

* Add metric for how long cleanup takes.

Co-authored-by: Andronik <write@reusable.software>
@Generic-Chain
Copy link

Generic-Chain commented Jul 18, 2022

this has not been resolved, since the release of v0.9.25 (and after a full cleanup of parachains/db folder) it has gradually increased with .sst files left behind since Jul 6 every day. At the moment on a validator that is active it has gotten to 376 .sst files total and 23GB in size of the parachains/db folder.

du -hs .local/share/polkadot/chains/ksmcc3/db/full/parachains/db/
24G .local/share/polkadot/chains/ksmcc3/db/full/parachains/db/

ls -ltr .local/share/polkadot/chains/ksmcc3/db/full/parachains/db/*.sst | wc -l
376

@eskimor
Copy link
Member Author

eskimor commented Jul 22, 2022

Thanks @Generic-Chain ! Indeed there exists another leak, @vstakhov already identified that one as well and fixed it.

@vstakhov
Copy link
Contributor

It is not very likely that the leak I have fixed could lead to 23Gb of storage waste. One thing we could try is to examine the current database to figure out what's happening.

@Generic-Chain
Copy link

I'm just reporting what I am seeing, I'm not familiar with what exactly was fixed and if there's some other storage leak unrelated to this one - the files are there since the cleanup and upgrade and the size and count keeps increasing

8 days after my post is up to 40GB and 576 .sst files (so +16GB and +200 .sst files)

du -hs .local/share/polkadot/chains/ksmcc3/db/full/parachains/db/
40G .local/share/polkadot/chains/ksmcc3/db/full/parachains/db/

ls -ltr .local/share/polkadot/chains/ksmcc3/db/full/parachains/db/*.sst | wc -l
576

anyone with an active validator can confirm this - I have 2 and both are having the same issue, 2nd one has 40GB and 545 files in the parachains/db folder

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants