Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate long epoch transition during Pyrmont sync #2046

Closed
twoeths opened this issue Feb 15, 2021 · 8 comments
Closed

Investigate long epoch transition during Pyrmont sync #2046

twoeths opened this issue Feb 15, 2021 · 8 comments
Assignees
Labels
scope-performance Performance issue and ideas to improve performance.

Comments

@twoeths
Copy link
Contributor

twoeths commented Feb 15, 2021

I was able to catch a 4.84s epoch transition in my local environment using Chrome profiler
Screen Shot 2021-02-15 at 18 42 17

An overview:

  • prepareEpochProcessState takes 530ms ++, and there are some small executions happened after that
  • processRewardsAndPenalties takes 2.4s, getAttestationDeltas takes 1.17s, epochBalances setter takes 764ms
  • rotateEpoch takes more than 1s, computeEpochShuffling takes 687ms
  • hashTreeRoot takes 425ms

event loop lag somehow was at 1.5s only, I'm attaching the snapshot as well.
CPU-20210215T150254-4s-epoch-transition.cpuprofile.zip

@wemeetagain
Copy link
Member

IMO we have likely milked as much performance out of things as reasonable and should just attempt to mitigate the slowness.
We should just break up epoch processing with await sleep(0); to just prevent any single piece of the epoch transition from taking

For instance, there's not much that can be done regarding re-hashing the balances. It's just computationally intensive work that has to be done.
Similar issue with just setting the state balances. The balances list needs to be converted to a tree, and that just takes time.

If we yield back to the event loop in between our longer-running subtasks, it likely won't even matter that the epoch transition takes time.

@dapplion
Copy link
Contributor

@wemeetagain I had a chat with Paul from Lighthouse and their epoch transitions take 20-30 ms, 100 times faster than ours. That's an insane difference that should be at least investigated.

@wemeetagain
Copy link
Member

It looks like we might not be iterating thru our cached validators/balances where we can?

@twoeths
Copy link
Contributor Author

twoeths commented Feb 19, 2021

getAttesationDeltas()/BigInt performance is really unpredictable (ranging from 98ms to 1375ms) as shown in PR #2064 , one possibility is to implement the function in assemblyscript. From a quick look, we can just pass balances, config params (5 of them), EpochProcess (may pass 20 different params), finalized epoch and receive back new balances (if assemblyscript does not support passing objects as argument).

not sure if we want to do a POC for it at this time.

@dapplion
Copy link
Contributor

not sure if we want to do a POC for it at this time.

I think it's worth it! Using assembly script won't introduce new complexities such as multi-treading

@twoeths
Copy link
Contributor Author

twoeths commented Feb 23, 2021

so I tried assemblyscript, one issue I had was OOM issue although I configured different gc strategies there and manually called __collect() (tried with v0.9.2), so for now I think this is more suitable to functions where we can allocate a static memory allocation similar to how as-sha256 was implemented.
Also in order to pass bigint to wasm we have to use --experimental-wasm-bigint flag which is a different barrier, not sure when that become official in nodejs.

@twoeths
Copy link
Contributor Author

twoeths commented Mar 9, 2021

current epoch transition for Pyrmont at around slot 363972 is 890ms.

@ChainSafe/eth2-0 I think the last thing to do for this issue is to add await sleep(0) in state transition and make sure we don't block for 200ms?

@dapplion dapplion added the scope-performance Performance issue and ideas to improve performance. label Jun 11, 2021
@dapplion
Copy link
Contributor

dapplion commented Sep 3, 2021

IMO we have likely milked as much performance out of things as reasonable and should just attempt to mitigate the slowness.
We should just break up epoch processing with await sleep(0); to just prevent any single piece of the epoch transition from taking

For instance, there's not much that can be done regarding re-hashing the balances. It's just computationally intensive work that has to be done.
Similar issue with just setting the state balances. The balances list needs to be converted to a tree, and that just takes time.

If we yield back to the event loop in between our longer-running subtasks, it likely won't even matter that the epoch transition takes time.

Oh man, we have milked so much more performance out of this 🎉

Closing, add new issues for specific improvement strategies

@dapplion dapplion closed this as completed Sep 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scope-performance Performance issue and ideas to improve performance.
Projects
None yet
Development

No branches or pull requests

3 participants