
Enable multithreading on P2P deserialization workload #4637

Merged
asdacap merged 2 commits into master from fix/deserialization-capped-to-one-thread on Oct 12, 2022

Conversation

@asdacap (Contributor) commented Sep 26, 2022

Fixes an issue where the old bodies download was capped by an unknown bottleneck.

Changes:

  • Add P2PHandlerThreadCount config.
  • Set the default thread count to ProcessorCount - 2 (see the sketch below).
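
A minimal sketch of the new config surface, using the names from this PR description; the exact Nethermind interface and wiring may differ:

```csharp
using System;

// Hedged sketch only: the option name is taken from the PR description,
// not necessarily the merged Nethermind interface.
public interface INetworkConfig
{
    // Number of DotNetty event loop threads used for P2P message handling.
    int P2PHandlerThreadCount { get; set; }
}

public class NetworkConfig : INetworkConfig
{
    // Default per this PR: ProcessorCount - 2, clamped to at least 1
    // (the clamp is suggested in the review discussion below).
    public int P2PHandlerThreadCount { get; set; } =
        Math.Max(Environment.ProcessorCount - 2, 1);
}
```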

Types of changes

What types of changes does your code introduce?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Other (please describe):

Testing

Requires testing

  • Yes
  • No

In case you checked yes, did you write tests?

  • Yes
  • No

Running smoke tests...


```diff
- _group = new SingleThreadEventLoop();
+ _group = new MultithreadEventLoopGroup();
```
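
For context, here is a hedged sketch of how the event loop group could be built from the configured thread count, assuming DotNetty's MultithreadEventLoopGroup(int) constructor; the helper and its name are illustrative, not the exact PR code:

```csharp
using System;
using DotNetty.Transport.Channels;

static class P2PEventLoopFactory
{
    // Illustrative helper. A non-positive threadCount falls back to this
    // PR's default of ProcessorCount - 2 (at least 1); otherwise the
    // configured value is passed to DotNetty, which runs that many
    // event loop threads for channel handlers.
    public static IEventLoopGroup Create(int threadCount)
    {
        if (threadCount <= 0)
        {
            threadCount = Math.Max(Environment.ProcessorCount - 2, 1);
        }

        return new MultithreadEventLoopGroup(threadCount);
    }
}
```

Note that with no explicit count, DotNetty's MultithreadEventLoopGroup defaults to core count * 2 event loops, which matches the default mentioned in the thread below.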
A contributor commented:

One thing that we should consider: if we do more writes during the old bodies/receipts sync, we will have more problems with attestations during that time. On the other hand, we will finish the sync faster, so I'm not sure which is better.

@asdacap (author) replied:

We can specify the number of threads. The default is core count * 2.

@LukaszRozmej (Member) commented Sep 26, 2022:

Math.Max(Environment.ProcessorCount - 2, 1)?

@asdacap (author) replied:

We can also just ask the user to set a lower peer count if sync is too resource intensive. They would have the same problem if they set a very high sync peer count anyway.

@asdacap commented Sep 26, 2022

3 out of 4 smoke tests failed with an invalid block. Isolating the PR...

@asdacap force-pushed the fix/deserialization-capped-to-one-thread branch 8 times, most recently from a5d3433 to 85c035d on October 4, 2022
@asdacap force-pushed the fix/deserialization-capped-to-one-thread branch from 85c035d to 2ad3ba2 on October 5, 2022
@asdacap force-pushed the fix/deserialization-capped-to-one-thread branch from 2ad3ba2 to 4b4e496 on October 5, 2022
@asdacap commented Oct 6, 2022

The invalid block is probably due to the package issue (this was in develop).
Ran again with a configurable thread count. The following are the old bodies durations:

  • Lighthouse: stuck on a slow peer vs 2 hours.
  • Lodestar: 1 hour 20 minutes vs 1 hour 20 minutes.
  • Prysm: 1 hour 20 minutes vs 1 hour 20 minutes.
  • Teku: 1 hour 20 minutes vs 1 hour 30 minutes.

So that is disappointing.

@asdacap commented Oct 7, 2022

Mixed with #4725 and #4692 to reduce the likelihood of getting stuck on a slow peer and to maximize efficiency.

1 thread vs 8 threads:

  • Lighthouse: 7 hours 40 minutes vs 7 hours.
  • Lodestar: 5 hours 30 minutes vs 5 hours.
  • Prysm: 5 hours 15 minutes vs 4 hours 45 minutes.
  • Teku: 5 hours 45 minutes vs 5 hours 15 minutes.

The second run shows a regression across the board, probably unrelated to this PR.

@asdacap commented Oct 10, 2022

Locally it caused a segmentation fault even when I set the thread count to 1. It turns out I had accidentally patched in a known problematic commit.

@asdacap marked this pull request as ready for review on October 10, 2022
@asdacap commented Oct 11, 2022

Another set of runs, 1 thread vs 8 threads:

  • Lighthouse: 7 hours 40 minutes vs 7 hours 30 minutes.
  • Lodestar: 5 hours 45 minutes vs 5 hours 45 minutes.
  • Prysm: 5 hours 30 minutes vs 5 hours 15 minutes.
  • Teku: 5 hours 30 minutes vs more than 11 hours; not complete yet.

@asdacap commented Oct 11, 2022

Teku was scheduled on an EPYC 7601 while the rest were on an EPYC 7713, which explains everything...

@asdacap commented Oct 12, 2022

More runs after requesting a 16-core instance, 1 thread vs 8 threads:

  • Lighthouse: 14 hours (scheduled on a slow VM) vs 4 hours 35 minutes.
  • Lodestar: 4 hours 5 minutes vs 4 hours.
  • Prysm: 4 hours vs 3 hours 50 minutes.
  • Teku: 3 hours 50 minutes vs 4 hours 5 minutes.

@asdacap commented Oct 12, 2022

Defaulted to CPU count - 2. No regression found after a lot of smoke tests, and there is usually a small improvement when measuring whole sync time.

@LukaszRozmej (Member) commented:

Approved, but it looks like the current improvements are minimal?

@asdacap commented Oct 12, 2022

Unfortunately yes.

@asdacap merged commit 915c016 into master on Oct 12, 2022
@asdacap deleted the fix/deserialization-capped-to-one-thread branch on October 12, 2022
asdacap added a commit that referenced this pull request Oct 13, 2022
MarekM25 pushed a commit that referenced this pull request Oct 13, 2022
rubo pushed a commit that referenced this pull request Oct 13, 2022:

  * Make P2P handler thread count configurable
  * Auto set based on CPU count.

rubo pushed a commit that referenced this pull request Oct 13, 2022