Fix forward sync getting stuck on slow peer #4725

Merged on Oct 13, 2022 (2 commits)

asdacap commented Oct 6, 2022

  • Fix forward sync getting stuck on a slow peer. The strategy keeps picking among peers with a known speed, and that set only ever contains one peer because only one has been tried.

Changes:

  • Modified BySpeedStrategy to pick a peer with no known speed as long as the number of peers with a known speed is less than 5.
  • Modified BlockDownloader so that it does not cancel the current download immediately. It continues processing the blocks already downloaded before checking whether a better peer is available.
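
The selection rule in the first change can be illustrated with a minimal sketch. This is not the Nethermind implementation (which is C# and, judging by the test discussion in this PR, randomized); it is a simplified, deterministic Python illustration. The function name `pick_peer`, the `explore` flag, and the peer ids are hypothetical; only the threshold of 5 comes from the description above.

```python
from typing import Dict, Optional

# Threshold taken from the PR description; everything else here is illustrative.
MIN_PEERS_WITH_KNOWN_SPEED = 5


def pick_peer(speeds: Dict[str, Optional[float]], explore: bool = True) -> str:
    """Pick a peer id from a mapping of peer id -> measured speed.

    A speed of None means the peer has never been measured.
    With explore=False, only peers with a known speed are considered,
    which mimics the stuck behavior described in this PR.
    """
    if not speeds:
        raise ValueError("no peers available")

    known = {peer: s for peer, s in speeds.items() if s is not None}
    unknown = [peer for peer, s in speeds.items() if s is None]

    # While too few peers have been measured, give an unmeasured peer a turn.
    if explore and unknown and len(known) < MIN_PEERS_WITH_KNOWN_SPEED:
        return unknown[0]

    # Otherwise prefer the fastest peer with a known speed.
    if known:
        return max(known, key=lambda peer: known[peer])

    # Nothing has been measured yet: fall back to any peer.
    return unknown[0]


peers = {"slow": 1.0, "a": None, "b": None}
print(pick_peer(peers, explore=False))  # always the one measured peer
print(pick_peer(peers))                 # an unmeasured peer gets tried
```

With `explore=False` the only measured peer wins every time, however slow it is. With exploration enabled, unmeasured peers are tried until at least 5 speeds are known, after which the fastest known peer is chosen.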

Types of changes

What types of changes does your code introduce?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Other (please describe):

Testing

Requires testing

  • Yes
  • No

In case you checked yes, did you write tests?

  • Yes
  • No

Comments about testing, should you have some (optional)

  • About 24 to 32 mainnet smoke tests. None got stuck on slow peers.

Further comments (optional)

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...


asdacap commented Oct 10, 2022

Piggybacked this on several smoke tests for consistency, probably 32 mainnet runs in total. No run got stuck on a slow peer.

asdacap force-pushed the fix/forward-sync-stuck-on-slow-peer branch from 5e2c230 to 391db6e on October 10, 2022 02:44
asdacap marked this pull request as ready for review on October 10, 2022 02:49
[TestCase(10, 0, 1, 0)]
[TestCase(10, 10, 1, 0.5)]
[TestCase(10, 10, 0.5, 0.25)]
[Retry(3)]
Contributor

Why do we need the retry?

Contributor Author

The logic is random. The test already allows a margin of error, but it still fails sometimes.
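
The effect of the retry can be estimated with a short calculation. The sketch below assumes each run of the randomized test fails independently with the same probability, and that `[Retry(3)]` allows up to three attempts in total, with the test passing if any attempt passes. The 5% single-run failure rate is a made-up figure for illustration, not a measurement from this PR.

```python
def failure_probability_with_retries(p_single: float, attempts: int) -> float:
    """Probability that all `attempts` independent runs fail.

    p_single is the chance that one run fails on its own; the test as a
    whole fails only when every attempt fails.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return p_single ** attempts


# Hypothetical 5% flake rate, up to three attempts.
print(failure_probability_with_retries(0.05, 3))
```

Under these assumptions, a test that flakes once in twenty runs would fail all three attempts roughly once in eight thousand runs.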

Contributor

MarekM25 left a comment

Looks OK. It would be great to sync a pre-merge node (or nodes) too, for example Gnosis. We should check the Engine API Hive tests as well. I would also consider running it with a higher pruning cache to see how the node recovers after a timeout.

asdacap force-pushed the fix/forward-sync-stuck-on-slow-peer branch from 391db6e to e70ec56 on October 12, 2022 04:22

asdacap commented Oct 12, 2022

All Hive tests passed.

asdacap requested a review from MarekM25 on October 12, 2022 12:36
asdacap merged commit dfe3ab4 into master on Oct 13, 2022
asdacap deleted the fix/forward-sync-stuck-on-slow-peer branch on October 13, 2022 12:37