Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix disconnecting peers during snap sync #4528

Merged
merged 3 commits into from
Sep 9, 2022

Conversation

asdacap
Copy link
Contributor

@asdacap asdacap commented Sep 6, 2022

Fix disconnecting peers during snap sync by incrementally increase snap sync's response byte limit until its latency goes too high. Latency threshold was set low since 3s+ still causes some snap sync failure. Previously, snap request was too large and was also causing FCU peer refresh to fail which trigger peer disconnect.

Screenshot from 2022-09-06 11-08-52

Alternatively just reducing the respone bytes to fixed 200KB works also.

Changes:

  • Auto adjust response bytes.

Types of changes

What types of changes does your code introduce?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Other (please describe):

Testing

Requires testing

  • Yes
  • No

In case you checked yes, did you write tests??

  • Yes

  • No

  • Goerli can sync to head.

  • Performance implication is unclear.

Further comments (optional)

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...

@asdacap asdacap requested review from MarekM25 and dceleda September 6, 2022 03:26
@dceleda
Copy link
Member

dceleda commented Sep 6, 2022

@asdacap do you know why it (2_000_000) wasn't a problem before and suddenly we started to have the problem with peers?

@asdacap
Copy link
Contributor Author

asdacap commented Sep 6, 2022

I've noticed it before, but I've increase FCU PeerRefresher's timeout to compensate. It's unclear if there are any trend in the network that causes it to be a bigger problem.

Copy link
Contributor

@MarekM25 MarekM25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it looks good. It would be great to get approval from @dceleda too

@asdacap asdacap merged commit 092fb5d into master Sep 9, 2022
@asdacap asdacap deleted the fix-disconnecting-peers-during-snap-sync branch September 9, 2022 15:08
@dceleda
Copy link
Member

dceleda commented Sep 9, 2022

@asdacap have you compared sync times before and after the change?
Any improvement?

@dceleda
Copy link
Member

dceleda commented Sep 9, 2022

I've run it on my Fast VM and the time is not worse than it used to be - just below 2h.
Looks good then :)

@asdacap
Copy link
Contributor Author

asdacap commented Sep 10, 2022

About 20% improvement on goerli. No significant change on mainnet for some reason.... Positive anecdotes from Kamil.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants