Penalties+activs+exits only at 64-epoch boundary #2192
Conversation
Moves all validator activations and exits to a multiple of 64 epochs, and amortizes offline penalties so that they are also processed at that boundary. This makes chains with "empty epochs" easier to process, because all but 1/64 of epochs would not require validator-set or balance-related processing. Also adds per-validator quadratic leaks as in #2125.
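As a rough sketch of what this gating means for the epoch transition (illustrative only; apart from the 64-epoch period constant, the function and helper names below are assumptions rather than the PR's actual definitions):

```python
# Illustrative sketch, not the PR's actual spec code: everything that mutates the
# validator set or amortized balances is gated on the 64-epoch period boundary.
EPOCHS_PER_ACTIVATION_EXIT_PERIOD = 64  # the period constant this PR introduces


def process_epoch(state: BeaconState) -> None:
    process_justification_and_finalization(state)
    process_rewards_and_penalties(state)
    # Activations, exits, and the amortized offline/leak penalties only run at the
    # period boundary, so 63 out of every 64 "empty" epochs need no validator-set
    # or balance-related processing.
    if get_current_epoch(state) % EPOCHS_PER_ACTIVATION_EXIT_PERIOD == 0:
        process_registry_updates(state)               # activations and exits
        process_amortized_penalties_and_leaks(state)  # hypothetical helper name
    process_final_updates(state)
```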
I like the first pass.
clean!
Note: I reverted this PR to the pre-Justin cleanups and moved @JustinDrake's cleanups into a separate PR targeting this PR, for easier review.
I'm in favour of the per-validator quadratic leak, but skeptical of the amortised epoch transition for these reasons:
1. Having a regular penalty that wipes out previous gains is a bit strange from the perspective of a validator, and I suspect we will have to spend considerable energy explaining why this is OK (or necessary). We already have validators on Discord freaking out about the tiniest penalties for a single epoch, let alone a punishing ~0.0005 ETH penalty every 64 epochs which can't be avoided. I can't see a way to remove the sawtooth penalty mechanism without unmaking the amortisation.
2. The immediate benefit of the amortisation would be a compute-time benefit for forks < 64 epochs in duration, yet I think resilience against DoS by forking is determined more by how clients store states in memory and on disk. In Lighthouse, for example, we are still copying around full `BeaconState` objects in memory (no copy-on-write binary merkle trees yet), and storing full states on disk with repetition (no fancy deduplication or storing of diffs). The way we've made the client good enough for current network conditions is by ruthlessly reducing the frequency with which clones and disk writes of these states occur (particularly for finalized states, which are de-duplicated a little). In this sense I'm concerned the PR might be a premature optimisation. On the other hand, a mostly-immutable validator set could be exploited for memory + storage optimisations in future.
3. Clients try to avoid spikes in resource consumption, yet the amortisation of epoch processing works contrary to this aim. Pushing extra tree hashing (caused by mutation of the validator set) and state storage to the period boundary could spike load, potentially impacting other duties (block proposal, etc.). This also creates a larger DoS target for an attacker who is willing to wait for a period boundary to throw a bunch of forks at the network.
I'm open to changing my mind, but these are my thoughts after ~1 day of chewing over this proposal.
@@ -341,49 +360,72 @@ def get_flag_deltas(state: BeaconState,
    for index in get_eligible_validator_indices(state):
This loop could be written more succinctly as:

    for index in unslashed_participating_indices:
        base_reward = get_base_reward(state, index)
        penalty_canceling_reward = base_reward * numerator // FLAG_DENOMINATOR
        # ... and so on

(as there's no `else` branch for the penalty anymore)
For any such cleanups please see #2212 :) I believe this is addressed there.
for index in get_eligible_validator_indices(state):
    # Inactivity-leak-specific penalty
    if state.leak_score[index] >= EPOCHS_PER_ACTIVATION_EXIT_PERIOD * LEAK_SCORE_BIAS:
Just checking that the logic behind this threshold is that quadratic leaks on timespans shorter than `EPOCHS_PER_ACTIVATION_EXIT_PERIOD` are inconsequential anyway (would evaluate to a 0 penalty)?
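If that reading is right, the effect is just integer-division flooring; a toy illustration (the constants below are made-up placeholders, not the spec's actual values or its exact formula):

```python
# Toy numbers only, to show how a small leak_score floors to a zero penalty under
# integer division; these are NOT the spec's constants or its exact formula.
EFFECTIVE_BALANCE = 32 * 10**9       # Gwei (assumed max effective balance)
SMALL_LEAK_SCORE = 4                 # a validator that has leaked for only a few epochs
HYPOTHETICAL_LEAK_QUOTIENT = 10**12  # placeholder for the quadratic-leak divisor

penalty = EFFECTIVE_BALANCE * SMALL_LEAK_SCORE // HYPOTHETICAL_LEAK_QUOTIENT
assert penalty == 0  # 128_000_000_000 // 1_000_000_000_000 == 0
```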
Thanks for the feedback, this is helpful!
Hmm, I see the point and how it could make the validator experience less intuitive. But does there have to be a sawtooth at the UI level? You could just show the balances at the beginning of the period (and we could enshrine this further by having an incrementing counter instead of updating balances), or we could add a non-consensus function
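For instance, a purely presentational helper along these lines (hypothetical, non-consensus code; not part of this PR) could hide the intra-period sawtooth by reporting the balance snapshotted at the last period boundary:

```python
from typing import Dict

# Hypothetical, non-consensus helper: a UI or API layer keeps a snapshot of each
# balance taken at the last 64-epoch boundary and shows that instead of the
# sawtoothing consensus balance. Nothing here touches state.balances itself.
def get_display_balance(period_start_balances: Dict[ValidatorIndex, Gwei],
                        state: BeaconState,
                        index: ValidatorIndex) -> Gwei:
    return period_start_balances.get(index, state.balances[index])
```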
By far the bulk of the state is the validator set, so if this PR ensures that the validator set is completely immutable between 64 epoch boundaries, that would be a large win for dealing with that kind of DoS attack, no? I imagine it would also make fork choice optimizations easier.
Would there actually be that much extra tree hashing at the period boundary? Validator balance updates would only happen when hysteresis demands it, which would continue to be, on average, something like once every 2-6 months per validator.
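For reference, the hysteresis being referred to is (paraphrasing the phase 0 spec's effective-balance update in `process_final_updates`; constant names as in the spec):

```python
def update_effective_balances(state: BeaconState) -> None:
    # Paraphrased from the phase 0 spec: effective_balance only changes when the
    # actual balance has drifted far enough past a hysteresis threshold, which is
    # why a given validator record typically mutates only every few months.
    for index, validator in enumerate(state.validators):
        balance = state.balances[index]
        HYSTERESIS_INCREMENT = EFFECTIVE_BALANCE_INCREMENT // HYSTERESIS_QUOTIENT
        DOWNWARD_THRESHOLD = HYSTERESIS_INCREMENT * HYSTERESIS_DOWNWARD_MULTIPLIER
        UPWARD_THRESHOLD = HYSTERESIS_INCREMENT * HYSTERESIS_UPWARD_MULTIPLIER
        if (
            balance + DOWNWARD_THRESHOLD < validator.effective_balance
            or validator.effective_balance + UPWARD_THRESHOLD < balance
        ):
            validator.effective_balance = min(
                balance - balance % EFFECTIVE_BALANCE_INCREMENT,
                MAX_EFFECTIVE_BALANCE,
            )
```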
Thank you both for the thoughtful defence.

Regarding (1), I'm happy with the

Regarding (2) and (3):
Having thought about it more, I think you're probably right. For the sake of argument, let's assume that the period boundary is approximately the same cost as an epoch boundary today. Even under this assumption, I think the optimisation is of dubious usefulness for security. In the case where an attacker is trying to DoS the network by producing forks, even with this change they can DoS just as effectively by simply waiting for a period boundary before they produce their fork blocks. By making blocks with parents equal to

That's looking at things from a security PoV. In terms of optimising the chain during normal operation it might be more viable. I suspect that writing the validator set to disk involves a large constant factor, something like
By far, like in Lighthouse, the dominant bottleneck in Nimbus is disk I/O. To give an idea, processing an epoch transition takes ~15 ms, while interacting with a state on disk can take up to a second when combined with other epoch-related database processing, so like Lighthouse we do everything we can to avoid that; in particular, we will only write states that result from blocks that are viable for fork choice.

The second, more dubious perhaps, thing we do is limit the horizon of empty-slot processing for attestation validation. For example, if someone votes for a target epoch rooted in a very old block, we will not process the attestation, as this would require hitting the disk and doing many epochs of empty-slot processing. In a healthy network this condition is never triggered, because finalization ensures that the target root must be recent; in the case of non-finality, we deem such votes unlikely to bring significant value and simply discard them, as loading the state in this scenario is too costly at current disk I/O rates. Finally, if someone produces a block that requires a lot of empty-slot processing, the bottleneck will still be disk I/O, but it seems that the rate at which one can produce such forks is pretty low and a sustained attack is difficult.

To mitigate fork scenarios (and to prep for 200k validators) we're in the process of splitting out the truly immutable part of the validator set (keys, withdrawals) from the mutable parts and storing these separately. This allows us, for each snapshot we consider useful, to not store 12 MB out of the 16 MB that a state takes up (mainnet, 100k validators). It turns out that the mutable part also compresses well with snappy, while the immutable part doesn't, further adding to the benefits; there are some benchmark numbers in the linked PR that might be interesting to look at.
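A rough sketch of that split, in spec-style Python rather than Nimbus's actual Nim code (the container and function names here are illustrative):

```python
from typing import Tuple

# Illustrative only: keep the never-changing validator fields in one record that is
# stored once, and persist only the small mutable remainder with each snapshot.
class ImmutableValidatorData(Container):
    pubkey: BLSPubkey
    withdrawal_credentials: Bytes32

class MutableValidatorData(Container):
    effective_balance: Gwei
    slashed: boolean
    activation_eligibility_epoch: Epoch
    activation_epoch: Epoch
    exit_epoch: Epoch
    withdrawable_epoch: Epoch

def split_validator(validator: Validator) -> Tuple[ImmutableValidatorData, MutableValidatorData]:
    immutable = ImmutableValidatorData(
        pubkey=validator.pubkey,
        withdrawal_credentials=validator.withdrawal_credentials,
    )
    mutable = MutableValidatorData(
        effective_balance=validator.effective_balance,
        slashed=validator.slashed,
        activation_eligibility_epoch=validator.activation_eligibility_epoch,
        activation_epoch=validator.activation_epoch,
        exit_epoch=validator.exit_epoch,
        withdrawable_epoch=validator.withdrawable_epoch,
    )
    return immutable, mutable
```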
(3) Excluding BLS, we currently spend ~25% of CPU time updating the merkle tree cache and ~20% shuffling. If accumulating 64 epochs of work ends up dirtying a larger part of the tree, this could make the tree cache update a tiny bit more expensive, but probably not significantly so. In the same vein, we don't have an optimal tree cache right now: we don't cache hashes of individual fields, but rather treat the entire validator as one cache entry. This is close enough that we didn't want to complicate the code further at this time. Maybe we could offset any 64-epoch buildup by addressing these kinds of issues, but it doesn't seem likely that we would any time soon.
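As a sketch of that coarse per-validator caching strategy (hypothetical names; the real caches live inside the clients):

```python
from typing import Dict, Set

# Hypothetical sketch: one cached root per whole validator record. Any mutation of
# any field invalidates the entire entry; there is no per-field subtree caching.
validator_root_cache: Dict[ValidatorIndex, Root] = {}
dirty_validators: Set[ValidatorIndex] = set()

def cached_validator_root(state: BeaconState, index: ValidatorIndex) -> Root:
    if index in dirty_validators or index not in validator_root_cache:
        # Recompute the whole validator's root on any change; extra mutation packed
        # into the period boundary therefore means more full-record recomputation.
        validator_root_cache[index] = hash_tree_root(state.validators[index])
        dirty_validators.discard(index)
    return validator_root_cache[index]
```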
By far the bulk of the validator set is the key and withdrawal credentials; the rest compresses away. All in all, this change looks both low-risk and low-benefit, and we'd be happy to leave it for another fork: the risks it mitigates seem remote, while at the same time there are much more expensive things going on in clients, especially wrt managing the truly immutable part of the validator set. "Completely immutable for 64 epochs" isn't as attractive an optimization target as "completely immutable, period", because of the complexity involved in managing those mutations coupled with the limited benefits (because of compression, etc.). We'll be in a pretty good spot already when this is the lowest fruit on the tree :)
This optimization will not go into Altair. More R&D is desired before making such spec optimizations. See call notes here: ethereum/eth2.0-pm#208