Perf/improve snap request concurrency #5342

Merged
dceleda merged 14 commits into master from perf/improve-snap-request-concurrency on Mar 7, 2023

Conversation


@asdacap asdacap commented Feb 26, 2023

  • Improves snap request concurrency by splitting the top-level account range into n configurable, fixed-size partitions (see the sketch after this list).
  • Improves my snap sync time by about 20% to 25%, but more importantly, it reliably saturates my SSD, so snap sync now reliably takes less than one hour to complete.
  • This matters because currently only some peers are used in snap sync: requests are limited by the top-level account range progress, which determines what else there is to request. If the network is fast and Nethermind gets lucky with a fast peer (and dispatches the account range requests to that peer), the sync will be quick. Otherwise, it does not matter how fast your internet and SSD are.
  • This change splits the top-level account range into n partitions (8 by default) and dispatches them concurrently to multiple peers, significantly increasing the request rate.
  • The snap protocol already has limit parameters, so this works well.
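
For illustration, here is a minimal sketch (in C#, with illustrative names that are not the PR's actual types) of splitting the 256-bit account-hash keyspace into n fixed-size partitions; each partition keeps its own progress marker so account range requests for different partitions can be in flight to different peers at the same time.

```csharp
// Minimal sketch: divide the keccak256 account-path keyspace [0, 2^256 - 1]
// into partitionCount contiguous, equal-sized ranges. Names are illustrative only.
using System;
using System.Numerics;

class AccountRangePartition
{
    public BigInteger Start;  // inclusive lower bound of the partition
    public BigInteger Limit;  // inclusive upper bound of the partition
    public BigInteger Next;   // next starting hash to request (per-partition progress)
}

static class AccountRangePartitioner
{
    static readonly BigInteger MaxHash = (BigInteger.One << 256) - 1; // 0xff...ff

    public static AccountRangePartition[] Create(int partitionCount)
    {
        var partitions = new AccountRangePartition[partitionCount];
        BigInteger size = MaxHash / partitionCount + 1;
        for (int i = 0; i < partitionCount; i++)
        {
            BigInteger start = size * i;
            BigInteger limit = i == partitionCount - 1 ? MaxHash : size * (i + 1) - 1;
            partitions[i] = new AccountRangePartition { Start = start, Limit = limit, Next = start };
        }
        return partitions;
    }
}
```

Each partition's [Next, Limit] range can then be handed to a different peer as a snap GetAccountRange request; when a response arrives, Next moves past the last returned account hash, and the partition is finished once Next passes Limit.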

Screenshot from 2023-02-26 11-23-56

Changes

  • Add a config option to specify the number of partitions (a hedged config sketch follows this list).
  • Dispatch multiple concurrent account range requests where possible.
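
As a hedged illustration only: the option name, type, and placement below are assumptions for the sketch, not necessarily how the PR exposes the setting; only the default of 8 partitions comes from the description above.

```csharp
// Hypothetical configuration surface; the real Nethermind option name may differ.
public class SnapSyncSettings
{
    // Number of fixed-size partitions the top-level account range is split into.
    // The PR description mentions 8 as the default.
    public int AccountRangePartitionCount { get; set; } = 8;
}
```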

Types of changes

What types of changes does your code introduce?

  • Optimization

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

  • Multiple successful mainnet syncs.
  • Had to stop the mainnet VerifyTrie; it was taking too much time. Last progress: 2023-02-28 00:03:43.0144|WARN|13|Collected info from 1100001490 nodes. Missing CODE 0 STATE 0 STORAGE 0 .
  • Goerli can sync.
  • Sepolia was still looking for peers (known issue), and my new ISP has a nasty habit of randomly cutting out when I start Nethermind. It's synced now; I don't know exactly when, I just left it running for the whole day.

Other notes

  • It seems to be possible to remove the commit lock by creating a new TrieStore every time, which further reduces sync time by another 20%. I did not include that here because it seems like something that could silently break things; maybe in a different PR.

@asdacap asdacap marked this pull request as ready for review March 1, 2023 04:58
@LukaszRozmej LukaszRozmej requested a review from dceleda March 1, 2023 09:35

@dceleda dceleda left a comment


Now healing will always be required, even for small states, because stitching between partitions won't work, but I don't think that's a problem.
In general it looks good. A few minor comments added.
@kamilchodola we need to run multiple test syncs on mainnet and then verify the whole tree, although that's going to be slow.
We should also test scenarios where we close the node after Phase 1 and start it again, etc.


dceleda commented Mar 5, 2023

@damian-orzechowski please also review.


asdacap commented Mar 5, 2023

Aside from the fact that the partitions are going to cause interleaving ranges from different root hashes, why doesn't the stitching work? Or do you mean it just doesn't work as well?


dceleda commented Mar 6, 2023

> Aside from the fact that the partitions are going to cause interleaving ranges from different root hashes, why doesn't the stitching work? Or do you mean it just doesn't work as well?

I'll explain on a call.

@dceleda dceleda merged commit 1cc8729 into master Mar 7, 2023
@dceleda dceleda deleted the perf/improve-snap-request-concurrency branch March 7, 2023 14:29