snap sync does not work on polygon fork #25965

Closed
manav2401 opened this issue Oct 11, 2022 · 4 comments

manav2401 commented Oct 11, 2022

Hey team, we've been unable to run snap sync on our mainnet (https://github.com/maticnetwork/bor). To be more specific, it completes the state sync phase, but keeps running in a never-ending healing phase. We are aware that the sync has to outpace block production and state growth in order to finish. We have performed some experiments and checks on our end (the block time and block gas limit for our mainnet are 2s and 30M respectively). Also, we know that the process is I/O and network heavy, and we've allocated more than enough of both on these machines.

  1. We tried running a node in snap sync on our Mumbai testnet to make sure the issue has nothing to do with the other components of our PoS chain, specifically the consensus. It works well on the testnet.
  2. We tried scaling up some of the parameters involved in the sync mechanism, like the pivot marker, the size (in bytes) of data to be received in the snap sync trie node (and storage) requests, and the dynamic timeout, to observe the behaviour. We did not see any significant changes in the mechanism. Also, we're not sure which metrics would be appropriate to watch while modifying these parameters. To be specific, modifying the pivot parameters didn't really work: it stopped the healing phase and the node stopped syncing entirely.
  3. We conducted an experiment where we took a fully synced mainnet node, disconnected it from the outside world, and only let one fresh node snap sync from it. We saw a lot of peer connectivity issues, and at some point the snap syncing node wasn't able to connect to the fully synced peer (maybe it figured out that the peer was stale?). This was an attempt to see whether the issue is the state moving too fast.

We're currently exploring some ways to understand the mechanism and internals through tests, as we thought just tweaking the parameters might not help and would make the process much longer. It would be great if you could suggest some important points/places to look at to dig further, or some experiments to conduct and ways to do so (like finding more internal details about the trie nodes and the rate at which they're being produced vs downloaded, etc.).

Let us know if there's anything more we can share from our end. Thanks!

EDIT: the label was auto-chosen as "docs". I'd put this under "help wanted".


holiman commented Oct 11, 2022

We tried scaling up some of the parameters involved in the sync mechanism, like the pivot marker, the size (in bytes) of data to be received in the snap sync trie node (and storage) requests, and the dynamic timeout, to observe the behaviour.

  1. Pivot. Ideally, you never want to pivot. Pivoting is not something we want to do; it's a necessity due to the fact that peers do in-memory pruning. The (geth) in-memory pruning does not touch the most recent 128 states, but once a state becomes older than 128 blocks, we gc it as best we can. So, if you want your peers to deliver, you need to ask them about roots that are within the last 128 blocks, otherwise you'll get no responses.

So adjusting the pivot block is not really doable, unless the whole peer ecosystem changes the pruning thresholds.
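
A minimal sketch of that constraint, with hypothetical constants and names (this is not the actual downloader code): the root you sync against has to stay inside your peers' ~128-state pruning window, so the pivot can only move forward with the chain, never further back.

```go
package main

import "fmt"

const (
	triesInMemory = 128 // assumed: states a serving geth peer keeps before gc
	pivotOffset   = 64  // assumed: distance behind head at which a fresh pivot is picked
	safetyMargin  = 16  // assumed: re-pivot a bit before peers actually prune the root
)

// choosePivot returns the block whose state to sync, and whether the pivot had
// to be moved because the old root is about to fall out of peers' windows.
func choosePivot(head, currentPivot uint64) (uint64, bool) {
	if head < pivotOffset {
		return 0, false
	}
	if currentPivot == 0 || head-currentPivot >= triesInMemory-safetyMargin {
		return head - pivotOffset, true
	}
	return currentPivot, false
}

func main() {
	pivot := uint64(0)
	for _, head := range []uint64{1000, 1040, 1120} {
		var moved bool
		pivot, moved = choosePivot(head, pivot)
		fmt.Printf("head=%d pivot=%d repivoted=%v\n", head, pivot, moved)
	}
}
```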

  2. Size of data. Retrieving larger trie node responses during healing may have some effect, but it's not an easy problem. The trienode healing is essentially a classic fast-sync, and we can only start storing to disk once we reach the leaf level. If we ask for too much data, we expand the trie iteration too much in breadth -- what we ideally want is to go depth-first. We recently made a fix in this area (eth/protocols/snap: throttle trie heal requests when peers DoS us #25666), specifically for this problem.

Anyway, tl;dr: if you change the trienode heal request/response size, you may shoot yourself in the foot. Changing the size for other types (account/storage) is probably less dangerous, but you should keep an eye out for timeouts -- if a request times out, the data is thrown away, so that would be a net loss.
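
To make the "net loss" point concrete, here is a trivial sketch (illustrative numbers, not geth's scheduler): a bigger request fetches more per round trip, but a response that misses the deadline is discarded wholesale.

```go
package main

import (
	"fmt"
	"time"
)

type response struct {
	bytes   int           // payload size delivered by the peer
	elapsed time.Duration // how long the request/response round trip took
}

// usefulBytes returns how much of a response is actually kept: anything that
// arrives after the timeout is thrown away in full.
func usefulBytes(r response, timeout time.Duration) int {
	if r.elapsed > timeout {
		return 0
	}
	return r.bytes
}

func main() {
	timeout := 5 * time.Second
	small := response{bytes: 128 * 1024, elapsed: 2 * time.Second} // modest request, arrives in time
	large := response{bytes: 512 * 1024, elapsed: 7 * time.Second} // oversized request, times out
	for _, r := range []response{small, large} {
		fmt.Printf("asked for %d bytes, kept %d\n", r.bytes, usefulBytes(r, timeout))
	}
}
```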

We conducted an experiment where we took a fully synced mainnet node, disconnected it from the outside world, and only let one fresh node snap sync from it.

This should work fine, but there are a couple of caveats... Let's call the fully synced node A.

If A is shut down, then restarted with e.g. --netrestrict so that it only sees local nodes, then you have a problem. The shutdown will cause it to store trie nodes for only three states: head, head-1 and head-127 (iirc). It will happily try to serve snap data, and it has a lot of snap data for all the most recent 128 layers, but it will only be able to provide proofs (from the trie) for those three specific states. Requests for any other root will not yield any response.

So what you need to do is basically start it up, import 128 blocks, and then take it offline. Or set it to --gcmode=archive, let it import 128 blocks in archive mode, and shut it down. After that, you can boot it up and it will have all of the last 128 states available.
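
Before re-running the single-peer experiment, a quick sanity check along these lines might help: probe node A for state at each of its last 128 blocks over JSON-RPC; heights whose state was pruned come back with a "missing trie node"-style error. A rough sketch, assuming node A exposes HTTP RPC locally (the endpoint and probe address are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"math/big"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	ctx := context.Background()
	client, err := ethclient.Dial("http://127.0.0.1:8545") // node A's RPC endpoint (assumed)
	if err != nil {
		panic(err)
	}
	head, err := client.BlockNumber(ctx)
	if err != nil {
		panic(err)
	}
	probe := common.HexToAddress("0x0000000000000000000000000000000000000000") // any address works
	available := 0
	for i := uint64(0); i < 128 && i <= head; i++ {
		n := new(big.Int).SetUint64(head - i)
		// BalanceAt needs the state for that block; it errors if that state was pruned.
		if _, err := client.BalanceAt(ctx, probe, n); err == nil {
			available++
		}
	}
	fmt.Printf("state available for %d of the last 128 blocks\n", available)
}
```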

Also, here are some recent fixes we've made to snap-sync:
https://github.com/ethereum/go-ethereum/pulls?q=is%3Apr+is%3Aclosed+label%3Abackport

We are going to backport them and make a new release with them. I recommend that you also make use of these fixes.

Lastly, re places to dig further: it would benefit you to understand your own trie churn, given your block time (2s) and gas limit (30M). If you put the node in archive mode, each new trie node will be stored to disk after each block. During commit, you should thus be able to get some raw figures on exactly how many trie updates are performed during a block.

Now, let's say it's for example 5k modifications, spread out across the trie. Then you could model how many trie heal requests would be needed to heal that.
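
A back-of-envelope version of that model (all numbers below are illustrative assumptions, not measurements): healing only terminates if trie nodes are downloaded faster than the chain invalidates them.

```go
package main

import "fmt"

func main() {
	const (
		blockTime        = 2.0    // seconds per block (bor mainnet)
		churnPerBlock    = 5000.0 // assumed trie nodes modified per block
		nodesPerResponse = 500.0  // assumed trie nodes delivered per heal response
		roundTrip        = 0.3    // assumed seconds per heal request/response
	)

	churnRate := churnPerBlock / blockTime   // nodes invalidated per second
	healRate := nodesPerResponse / roundTrip // nodes healed per second from a single peer
	fmt.Printf("churn: %.0f nodes/s, heal: %.0f nodes/s\n", churnRate, healRate)
	if healRate <= churnRate {
		fmt.Println("healing never catches up: more peers/bandwidth needed, or less churn")
	} else {
		fmt.Println("healing can catch up; the backlog shrinks over time")
	}
}
```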

I'll leave this ticket open for a while longer, in case @karalabe has anything to add.

@manav2401 (Contributor, Author)

Thanks a lot for the response.

So adjusting the pivot block is not really doable, unless the whole peer ecosystem changes the pruning thresholds.

I see. Also, just to confirm, peers would only respond if they have those blocks in the difflayer which is present due to the --snapshot flag, right?

Anyway, tl;dr: if you change the trienode heal request/response size, you may shoot yourself in the foot. Changing the size for other types (account/storage) is probably less dangerous, but you should keep an eye out for timeouts -- if a request times out, the data is thrown away, so that would be a net loss.

Alright. Yes, we did try using a static timeout (~30s I believe) instead of the dynamic logic there.

Re that experiment, we didn't think of this. We'll pull in some of the changes/fixes you've made and re-run the experiment, making sure that the last 128 block states are available on the fully synced node.

Lastly, re places to dig further: it would benefit you to understand your own trie churn, given your block time (2s) and gas limit (30M). If you put the node in archive mode, each new trie node will be stored to disk after each block. During commit, you should thus be able to get some raw figures on exactly how many trie updates are performed during a block.

Any scripts that can help us speed up the process?

We'll get moving on these action items quickly and share the results here. CC: @JekaMas


holiman commented Oct 11, 2022

I see. Also, just to confirm, peers would only respond if they have those blocks in the difflayer which is present due to the --snapshot flag, right?

In order for a peer to deliver responses to account/storage requests, it needs to have both the snapshot layer for that root and the trie nodes for that root.

In order for a peer to deliver responses to trie healing requests, it theoretically only needs to have the trie nodes for that root. In practice, however, it required having the snapshot layer too, until this PR (#25644).
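
Put as a purely illustrative predicate (hypothetical types, not geth's actual handler code), the two serving requirements look like this:

```go
package main

import "fmt"

type peerState struct {
	snapshotLayers map[string]bool // flat snapshot layers the peer holds, keyed by state root
	trieNodes      map[string]bool // complete trie node sets the peer holds, keyed by state root
}

// canServeRanges: account/storage range requests need the snapshot (to iterate
// flat data) plus the trie (to build the Merkle proofs for the range edges).
func (p *peerState) canServeRanges(root string) bool {
	return p.snapshotLayers[root] && p.trieNodes[root]
}

// canServeHeal: trie-node (healing) requests only need the trie itself; before
// PR #25644 geth in practice demanded the snapshot layer here too.
func (p *peerState) canServeHeal(root string) bool {
	return p.trieNodes[root]
}

func main() {
	p := &peerState{
		snapshotLayers: map[string]bool{"rootRecent": true},
		trieNodes:      map[string]bool{"rootRecent": true, "rootPersisted": true},
	}
	fmt.Println(p.canServeRanges("rootPersisted"), p.canServeHeal("rootPersisted")) // false true
}
```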

Any scripts that can help us speed up the process?

Nothing off the top of my head, no. This might also be interesting to look into: #25022


holiman commented Mar 8, 2023

Seems answered, closing
