
Penalties+activs+exits only at 64-epoch boundary #2192

Closed · wants to merge 23 commits

Conversation

vbuterin (Contributor)

Moves all validator activations and exits to a multiple of 64 epochs, and amortizes offline penalties so that they are also processed at that boundary. This makes chains with "empty epochs" easier to process, because all but 1/64 of epochs would not require validator-set or balance-related processing.

Also adds per-validator quadratic leaks as in #2125.
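As a rough, non-normative sketch of the scheduling described above (only the constant EPOCHS_PER_ACTIVATION_EXIT_PERIOD comes from the PR's diff; the gating shown here is illustrative, not the PR's actual code):

EPOCHS_PER_ACTIVATION_EXIT_PERIOD = 64

def is_period_boundary(epoch: int) -> bool:
    # Validator activations/exits and the amortized offline penalties are only
    # processed at epochs that are a multiple of this period.
    return epoch % EPOCHS_PER_ACTIVATION_EXIT_PERIOD == 0

# Inside process_epoch, registry updates and penalty processing would be gated on
# is_period_boundary(next_epoch), so only 1 epoch in 64 touches the validator set
# or balances:
assert sum(is_period_boundary(e) for e in range(64 * 4)) == 4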

@djrtwo (Contributor) left a comment

I like the first pass.
clean!

Base automatically changed from accounting-reform to dev on February 4, 2021 17:16
@djrtwo (Contributor) commented Feb 23, 2021

Note: I reverted this PR to its pre-Justin-cleanup state and moved @JustinDrake's cleanups into a separate PR targeting this one, for easier review:

#2212

@djrtwo force-pushed the vbuterin-patch-10 branch from cded289 to f4dbb40 on March 2, 2021 14:26
@michaelsproul (Contributor) left a comment

I'm in favour of the per-validator quadratic leak, but skeptical of the amortised epoch transition for these reasons:

  1. Having a regular penalty that wipes out previous gains is a bit strange from the perspective of a validator and I suspect we will have to spend considerable energy explaining why this is OK (or necessary). We already have validators on Discord freaking out about the tiniest penalties for a single epoch, let alone a punishing ~0.0005 ETH penalty every 64 epochs which can't be avoided. I can't see a way to remove the sawtooth penalty mechanism without unmaking the amortisation.
  2. The immediate benefit of the amortisation would be a compute time benefit for forks < 64 epochs in duration, yet I think resilience against DoS by forking is determined more by how clients store states in memory and on disk. In Lighthouse for example, we are still copying around full BeaconState objects in memory (no copy-on-write binary merkle trees yet), and storing full states on disk with repetition (no fancy deduplication or storing of diffs). The way we've made the client good enough for current network conditions is by ruthlessly reducing the frequency with which clones and disk writes of these states occur (particularly for finalized states, which are de-duplicated a little). In this sense I'm concerned the PR might be a premature optimisation. On the other hand, a mostly-immutable validator set could be exploited for memory + storage optimisations in future.
  3. Clients try to avoid spikes in resource consumption, yet the amortisation of epoch processing works contrary to this aim. Pushing extra tree hashing (caused by mutation of the validator set) and state storage to the period boundary could spike load, potentially impacting other duties (block proposal, etc). This also creates a larger DoS target for an attacker who is willing to wait for a period boundary to throw a bunch of forks at the network.

I'm open to changing my mind, but these are my thoughts after ~1 day of chewing over this proposal.

@@ -341,49 +360,72 @@ def get_flag_deltas(state: BeaconState,
for index in get_eligible_validator_indices(state):
Contributor

This loop could be written more succinctly as:

for index in unslashed_participating_indices:
    base_reward = get_base_reward(state, index)
    penalty_canceling_reward = base_reward * numerator // FLAG_DENOMINATOR
    # ... and so on

(as there's no else branch for the penalty anymore)

@JustinDrake (Contributor) commented Mar 3, 2021

For any such cleanups please see #2212 :) I believe this is addressed there.


for index in get_eligible_validator_indices(state):
    # Inactivity-leak-specific penalty
    if state.leak_score[index] >= EPOCHS_PER_ACTIVATION_EXIT_PERIOD * LEAK_SCORE_BIAS:
Contributor

Just checking: is the logic behind this threshold that quadratic leaks on timespans shorter than EPOCHS_PER_ACTIVATION_EXIT_PERIOD are inconsequential anyway (i.e. would evaluate to a 0 penalty)?

@JustinDrake added the "Altair aka HF1" label on Mar 3, 2021
@vbuterin (Contributor, Author) commented Mar 3, 2021

Thanks for the feedback, this is helpful!

> I'm in favour of the per-validator quadratic leak, but skeptical of the amortised epoch transition for these reasons:
>
> 1. Having a regular penalty that wipes out previous gains is a bit strange from the perspective of a validator and I suspect we will have to spend considerable energy explaining why this is OK (or necessary). We already have validators on Discord freaking out about the tiniest penalties for a single epoch, let alone a punishing ~0.0005 ETH penalty every 64 epochs _which can't be avoided_. I can't see a way to remove the sawtooth penalty mechanism without unmaking the amortisation.

Hmm, I see the point and how it could make the validator experience less intuitive. But does there have to be a sawtooth at the UI level? You could just show the balances at the beginning of the period (and we could enshrine this further by having an incrementing counter instead of updating balances), or we could add a non-consensus function `get_predicted_balance` that pre-computes the inactivity penalty and have the APIs point to that.
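One possible shape for that helper, as a hedged sketch only (get_predicted_balance is merely the name suggested above, and compute_pending_inactivity_penalty is a hypothetical stand-in for whatever the amortized penalty calculation ends up being):

def get_predicted_balance(state: BeaconState, index: ValidatorIndex) -> Gwei:
    # Non-consensus view: the current balance minus the inactivity penalty that
    # will be charged at the next 64-epoch boundary, so APIs and UIs never
    # display the sawtooth.
    pending_penalty = compute_pending_inactivity_penalty(state, index)  # hypothetical helper
    return Gwei(max(state.balances[index] - pending_penalty, 0))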

> 2. The immediate benefit of the amortisation would be a compute time benefit for forks < 64 epochs in duration, yet I think resilience against DoS by forking is determined more by how clients store states in memory and on disk. In Lighthouse for example, we are still copying around full `BeaconState` objects in memory (no copy-on-write binary merkle trees yet), and storing full states on disk with repetition (no fancy deduplication or storing of diffs). The way we've made the client good enough for current network conditions is by ruthlessly reducing the frequency with which clones and disk writes of these states occur (particularly for finalized states, which are de-duplicated a little). In this sense I'm concerned the PR might be a premature optimisation. On the other hand, a mostly-immutable validator set could be exploited for memory + storage optimisations in future.

By far the bulk of the state is the validator set, so if this PR ensures that the validator set is completely immutable between 64 epoch boundaries, that would be a large win for dealing with that kind of DoS attack, no? I imagine it would also make fork choice optimizations easier.

> 3. Clients try to avoid spikes in resource consumption, yet the amortisation of epoch processing works contrary to this aim. Pushing extra tree hashing (caused by mutation of the validator set) and state storage to the period boundary could spike load, potentially impacting other duties (block proposal, etc). This also creates a _larger_ DoS target for an attacker who is willing to wait for a period boundary to throw a bunch of forks at the network.

Would there actually be that much extra tree hashing at the period boundary? Validator balance updates would only happen when hysteresis demands it, which would continue to be, on average, something like once every 2-6 months per validator.
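For context, the hysteresis referred to here is the phase0 effective-balance update rule, reproduced roughly from memory (treat the exact form as illustrative; the canonical version lives in the phase0 spec):

def process_effective_balance_updates(state: BeaconState) -> None:
    # Effective balances only move when the actual balance drifts far enough past
    # a threshold, which for a typical validator happens rarely.
    for index, validator in enumerate(state.validators):
        balance = state.balances[index]
        HYSTERESIS_INCREMENT = EFFECTIVE_BALANCE_INCREMENT // HYSTERESIS_QUOTIENT
        DOWNWARD_THRESHOLD = HYSTERESIS_INCREMENT * HYSTERESIS_DOWNWARD_MULTIPLIER
        UPWARD_THRESHOLD = HYSTERESIS_INCREMENT * HYSTERESIS_UPWARD_MULTIPLIER
        if (
            balance + DOWNWARD_THRESHOLD < validator.effective_balance
            or validator.effective_balance + UPWARD_THRESHOLD < balance
        ):
            validator.effective_balance = min(
                balance - balance % EFFECTIVE_BALANCE_INCREMENT, MAX_EFFECTIVE_BALANCE
            )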

@djrtwo (Contributor) commented Mar 4, 2021

  1. I agree that this could damage the psychology of staking for some, but this could easily be fixed in the UI/UX layer, at least for block explorers. I think if we get ahead of it, this won't be too much of an issue.
  2. Interesting. I don't want to get into a place where we optimize the spec without fully understanding how it will affect practical engineering optimizations. I can see how this primarily targets compute time without first considering whether compute time is the real bottleneck. Before we commit to this, I'd want to better understand (1) whether compute time is a serious bottleneck and (2) whether an immutable active validator set over a longer period opens up valuable optimizations.
  3. Same question as Vitalik. The idea is that the period boundary (every 64 epochs) would be similar in cost to a normal epoch boundary today.

@michaelsproul (Contributor) commented Mar 6, 2021

Thank you both for the thoughtful defence. Regarding (1), I'm happy with the `get_predicted_balance` solution!

Regarding (2) and (3):

> The idea is that the period boundary (every 64 epochs) would be similar in cost to a normal epoch boundary today.

Having thought about it more, I think you're probably right. For the sake of argument, let's assume that the period boundary is approximately the same cost as an epoch boundary today. Even under this assumption, I think the optimisation is of dubious usefulness for security. In the case where an attacker is trying to DoS the network by producing forks, even with this change they can DoS just as effectively by simply waiting for a period boundary before they produce their fork blocks. By making blocks with parents equal to `boundary_slot - 1`, `boundary_slot - 2`, etc., they force clients to compute multiple distinct period boundary transitions. Loading historic states like this also incurs substantial I/O costs, amortisation or not (e.g. Lighthouse loads a full state and replays up to 32 blocks on a cache miss). In the case where the period boundary is more expensive, it's worse than the situation today, which was my original point. But I'm now willing to weaken that to say that if the costs are equal, we're in the same situation as we are today.

That's looking at things from a security PoV. In terms of optimising the chain during normal operation it might be more viable. I suspect that writing the validator set to disk involves a large constant factor, something like `cost = C + k * nodes_modified`, where C could represent the latency of doing any disk write. If we obviate the need to write every epoch, we might save `63 * C`. That's somewhat theoretical though, and I'm not sure whether a state storage scheme designed to exploit this would be optimal in other ways, i.e. more research is required. I know there's already a lot more we could do to optimise disk access patterns (something like Geth's snapshots) without a spec update, and it would be my preference not to introduce spec updates until we push it further and hit a limit. I would be interested to know if other clients feel similarly (cc @arnetheduck @ajsutton @terencechain).

@arnetheduck (Contributor) commented Mar 9, 2021

As in Lighthouse, the dominant bottleneck in Nimbus is by far disk I/O. To give an idea, processing an epoch transition takes ~15 ms, while interacting with a state on disk can take up to a second when combined with other epoch-related database processing. So, like Lighthouse, we do everything we can to avoid that; in particular, we will only write states that result from blocks that are viable for fork choice.

The second, perhaps more dubious, thing we do is limit the horizon of empty-slot processing for attestation validation. For example, if someone votes for a target epoch rooted in a very old block, we will not process the attestation, as this would require hitting the disk and doing many epochs of empty-slot processing. In a healthy network this condition is never triggered, because finalization ensures that the target root must be recent; in the case of non-finality, we deem such votes unlikely to bring significant value and simply discard them, as loading the state in this scenario is too costly at current disk I/O rates.
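A rough sketch of that horizon check (not Nimbus code; the names and the cutoff are made up, this is just the shape of the heuristic being described):

EMPTY_SLOT_HORIZON_EPOCHS = 4  # hypothetical cutoff

def worth_processing(attestation_target_epoch: Epoch, target_block_epoch: Epoch) -> bool:
    # If the attestation's target root points at a block many epochs in the past,
    # validating it would mean loading an old state from disk and running that many
    # empty epoch transitions, so the attestation is simply discarded.
    return attestation_target_epoch - target_block_epoch <= EMPTY_SLOT_HORIZON_EPOCHS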

Finally, if someone produces a block that requires a lot of empty-slot processing, the bottleneck will still be disk I/O, but it seems that the rate at which one can produce such forks is pretty low, and a sustained attack is difficult.

To mitigate fork scenarios (and to prepare for 200k validators), we're in the process of splitting out the truly immutable part of the validator set (keys, withdrawal credentials) from the mutable parts and storing these separately. For each snapshot we consider useful, this allows us to avoid storing 12 MB of the 16 MB that a state takes up (mainnet, 100k validators). It turns out that the mutable part also compresses well with snappy, while the immutable part doesn't, further adding to the benefits. There are some benchmark numbers in the linked PR that might be interesting to look at.
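A minimal sketch of that split, in terms of the fields of the spec's Validator container (the class names here are made up; see the linked PR for the actual Nimbus storage layout):

class ImmutableValidatorData(Container):
    # Fixed at deposit time; dominated by the 48-byte pubkey, which does not compress.
    pubkey: BLSPubkey
    withdrawal_credentials: Bytes32

class MutableValidatorData(Container):
    # Everything else in Validator; small, and compresses well with snappy.
    effective_balance: Gwei
    slashed: boolean
    activation_eligibility_epoch: Epoch
    activation_epoch: Epoch
    exit_epoch: Epoch
    withdrawable_epoch: Epoch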

Regarding (3):

> Same question as Vitalik. The idea is that the period boundary (every 64 epochs) would be similar in cost to a normal epoch boundary today.

Excluding BLS, we currently spend ~25% of CPU time updating the merkle tree cache and ~20% shuffling. If accumulating 64 epochs of work ends up dirtying a larger part of the tree, this could make the tree cache update a tiny bit more expensive, but probably not significantly so. In the same vein, we don't have an optimal tree cache right now: we don't cache hashes of individual fields but rather treat the entire validator as one cache entry. This is close enough that we didn't want to complicate the code further at this time. Maybe we could offset any 64-epoch buildup by addressing these kinds of issues, but it doesn't seem likely that we will any time soon.

> By far the bulk of the state is the validator set

By far the bulk of the validator set is the keys and withdrawal credentials; the rest compresses away.

All in all, this change looks both low-risk and low-benefit, and we'd be happy to leave it for another fork. The risks it mitigates seem remote, while at the same time there are much more expensive things going on in clients, especially around managing the truly immutable part of the validator set. "Completely immutable for 64 epochs" isn't as attractive an optimization target as "completely immutable, period", because of the complexity involved in managing those mutations, coupled with the limited benefits (because of compression, etc.). We'll be in a pretty good spot already when this is the lowest fruit on the tree :)

@djrtwo (Contributor) commented Mar 11, 2021

This optimization will not go into Altair. More R&D is desired before making such spec optimizations.

See the call notes here: ethereum/eth2.0-pm#208

@djrtwo closed this on Mar 11, 2021
@jtraglia deleted the vbuterin-patch-10 branch on January 22, 2025 19:59