-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up altair block processing 2x #3115
Conversation
Like #3089, this PR drastially speeds up historical REST queries and other long state replays. * cache sync committee validator indices * use ~80mb less memory for validator pubkey mappings * batch-verify sync aggregate signature (fixes #2985) * document sync committee hack with head block vs sync message block * add batch signature verification failure tests Before: ``` ../env.sh nim c -d:release -r ncli_db --db:mainnet_0/db bench --start-slot:-1000 All time are ms Average, StdDev, Min, Max, Samples, Test Validation is turned off meaning that no BLS operations are performed 5830.675, 0.000, 5830.675, 5830.675, 1, Initialize DB 0.481, 1.878, 0.215, 59.167, 981, Load block from database 8422.566, 0.000, 8422.566, 8422.566, 1, Load state from database 6.996, 1.678, 0.042, 14.385, 969, Advance slot, non-epoch 93.217, 8.318, 84.192, 122.209, 32, Advance slot, epoch 20.513, 23.665, 11.510, 201.561, 981, Apply block, no slot processing 0.000, 0.000, 0.000, 0.000, 0, Database load 0.000, 0.000, 0.000, 0.000, 0, Database store ``` After: ``` 7081.422, 0.000, 7081.422, 7081.422, 1, Initialize DB 0.553, 2.122, 0.175, 66.692, 981, Load block from database 5439.446, 0.000, 5439.446, 5439.446, 1, Load state from database 6.829, 1.575, 0.043, 12.156, 969, Advance slot, non-epoch 94.716, 2.749, 88.395, 100.026, 32, Advance slot, epoch 11.636, 23.766, 4.889, 205.250, 981, Apply block, no slot processing 0.000, 0.000, 0.000, 0.000, 0, Database load 0.000, 0.000, 0.000, 0.000, 0, Database store ```
@@ -744,9 +747,9 @@ proc get_next_sync_committee*(state: altair.BeaconState | merge.BeaconState): | |||
# see signatures_batch, TODO shouldn't be here | |||
# Deposit processing ensures all keys are valid | |||
var attestersAgg: AggregatePublicKey | |||
attestersAgg.init(res.pubkeys.data[0].loadWithCache().get) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's wrong with using the cache here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache would only be used once per day for the 512 lookups and uses quite a lot of memory (absolute minimum 144 bytes per validator + hashset and threadvar overhead which is significant) - thanks to the index cache (about 4kb fixed size), the cost no longer matches the benefit. There are no other uses of this cache outside of test cases (I'd generally remove it).
There's an additional optimization that can be done to pre-seed the statecache with the index cache as long as the states line up, but that's for a separate PR - the main benefit is during replays where many blocks need to be applied - the creation of the index mapping is the elephant in the room.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, the keys would end up in the cache anyway because the cache was also used in gossip validation. I guess it's now possible to introduce a more precise eviction of the cache around the update points of the SyncCommitteeCache
(or the cooked keys could be stored in the SyncCommitteeCache
, but this seems potentially sub-optimal if we end up needing the same validators keys in other caches).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have two caches of the same data, where one would be enough: loadWithCache
and validatorKey
- the latter is tied to our database storage and should be preferred, also because constructing it is faster: loadWithCache
works from compressed keys. If anything, we could keep cooked keys in the in-memory representation that backs validatorKey
@@ -414,4 +414,46 @@ proc collectSignatureSets*( | |||
volex.message.epoch, | |||
DOMAIN_VOLUNTARY_EXIT) | |||
|
|||
block: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For documentation it would be nice to have a comment section
7. SyncAggregate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8e445dc - there's not much to write though, like proposer signature, it's just a single field with a signature in it
var resCur: Table[ValidatorPubKey, int] | ||
var resNxt: Table[ValidatorPubKey, int] | ||
var resCur: Table[uint64, int] | ||
var resNxt: Table[uint64, int] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some alias type that tells us what this magic uint64 corresponds to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they're uint64
values from the message that are repeated in the response, and in the message, they don't have any alias type - they're kind of validator indices, except we don't want to use ValidatorIndex
because that's messy from a type size perspective.
syncCommitteeSlot, | ||
committeeIdx, | ||
msg.message.contribution.aggregation_bits): | ||
let validatorPubKey = validatorPubKey.loadWithCache.get | ||
let validatorPubKey = dag.validatorKey(validatorIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous use of the cache here was quite important for performance. @mratsim, is dag.validatorKey(index)
considered fast enough? (it's based internally on loadValid
which in turn is based on fromBytesKnownOnCurve
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
block_sim
is a good benchmark for this, because it simulates the full network traffic around sync committees and their aggregation. The ncli_db bench
just replays blocks, so it covers only the sync aggregate verification code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImmutableValidatorData2
stores uncompressed pubkeys that are quite fast to load - this was introduced a while ago, giving a decent performance boost: d859bc1
We're using this same approach for attestations and everything else, pretty much - if we're going to update it, we should update it across the board by changing the backend storage for validatorKey
- sync message validation was the outlier here, and loadWithCache
is in general a problematic construct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posting before/after block_sim
results will be a convincing evidence that there is no performance regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a benchmark for pubkeys and signature deserialization.
On a 11th gen Intel it's 14 µs per key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that said, it's likely a valid optimization point regardless, to load the data into cooked pubkeys on startup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure 13440 is relevant here - different data set, multiple flaws in methodology
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if loading uncompressed keys is so fast, loading them on start-up shouldn't be a problem either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate PR though, since it affects attestations significantly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mratsim what are you measuring there btw? compressed or uncompressed deserialization? |
Like #3089, this PR drastially speeds up historical REST queries and
other long state replays.
Before:
After: