Fix timing attack #2101
Has there been any consideration of the impact this could have on attestations going "too early" and missing the current block? From local measurements, around 90% of blocks are currently received by my local validator before the 4 second mark. However, only around 60% of blocks are received before the 2 second mark. Obviously we'd expect a spread of validators across the 2-6 second range, but it does appear that this will reduce the % of validator clients that will use the block of the current slot in their attestation.
Tuning the numbers according to real-world data was the motivation for converting the constants to configuration parameters 😃 Given your observations about block timings, it makes sense to change the attestation production time to 4 ± 1 sec from the start of the slot. @mcdee Can you share the collected data so we can analyze this more?
Here's a dump of the last ~7.5K blocks against one of my validator clients, in standard Prometheus histogram format.
Thanks for sharing! Looks like ~75% of blocks are seen within 3 seconds and ~85% are seen within 4 seconds of the slot start, so 4 ± 1 sec is a reasonable configuration.
Pretty minor comments.
Also, we use the magic number of 3 in aggregation broadcast. Consider updating to the constant.
specs/phase0/validator.md
A validator should create and broadcast the `attestation` to the associated attestation subnet when the earlier one of these two events occurs:
- the validator has received a valid block from the expected block proposer for the assigned `slot`, or
- `SECONDS_PER_SLOT/3 + slot_timing_entropy` seconds have elapsed since the start of the `slot` (using the `slot_timing_entropy` generated for this slot)
- `SECONDS_PER_SLOT/3 + slot_timing_entropy` seconds have elapsed since the start of the `slot` (using the `slot_timing_entropy` generated for this slot)
- `SECONDS_PER_SLOT / 3 + slot_timing_entropy` seconds have elapsed since the start of the `slot` (using the `slot_timing_entropy` generated for this slot)
Co-authored-by: Danny Ryan <dannyjryan@gmail.com>
@@ -391,7 +393,13 @@ def get_block_signature(state: BeaconState, block: BeaconBlock, privkey: int) ->

A validator is expected to create, sign, and broadcast an attestation during each epoch. The `committee`, assigned `index`, and assigned `slot` for which the validator performs this role during an epoch are defined by `get_committee_assignment(state, epoch, validator_index)`.

A validator should create and broadcast the `attestation` to the associated attestation subnet when either (a) the validator has received a valid block from the expected block proposer for the assigned `slot` or (b) one-third of the `slot` has transpired (`SECONDS_PER_SLOT / 3` seconds after the start of `slot`) -- whichever comes _first_.
For each `slot`, a validator must generate a uniform random variable `slot_timing_entropy` between `(-SECONDS_PER_SLOT / ATTESTATION_ENTROPY_DIVISOR, SECONDS_PER_SLOT / ATTESTATION_ENTROPY_DIVISOR)` with millisecond resolution and using local entropy. |
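A minimal sketch of how a client might draw `slot_timing_entropy` as described above. It assumes `SECONDS_PER_SLOT = 12` and a hypothetical `ATTESTATION_ENTROPY_DIVISOR = 12`, a value chosen only because it yields the ± 1 sec window discussed in this thread; the function name `generate_slot_timing_entropy_ms` is illustrative and not part of the spec.

```python
import secrets

SECONDS_PER_SLOT = 12             # phase 0 mainnet value
ATTESTATION_ENTROPY_DIVISOR = 12  # ASSUMPTION: gives a +/- 1 s window, not confirmed by the PR


def generate_slot_timing_entropy_ms() -> int:
    """Draw a uniform offset in milliseconds from the open interval (-bound, bound).

    Uses the OS CSPRNG (local entropy), so the value is not predictable
    by anyone who does not control this process.
    """
    bound_ms = SECONDS_PER_SLOT * 1000 // ATTESTATION_ENTROPY_DIVISOR
    # randbelow(2 * bound - 1) yields 0 .. 2*bound-2; shifting by bound-1
    # maps that onto -(bound-1) .. +(bound-1), excluding both endpoints.
    return secrets.randbelow(2 * bound_ms - 1) - (bound_ms - 1)
```

Working in integer milliseconds keeps the "millisecond resolution" requirement explicit and avoids floating-point rounding at the interval boundaries.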
Can the entropy be shared by multiple validators that are served under the same beacon node?
This will simplify the beacon node implementation.
Yes, the entropy can be shared by multiple validators that are served under the same beacon node.
Do you mean the source of entropy can be shared, or that the randomly selected `slot_timing_entropy` value can be shared by all validators served by the same beacon node?
There are significant performance implications if every individual validator has to select the latest head and create its attestation at a different time. Currently a Validator Client only needs to ask the beacon node to create at most one `AttestationData` per committee per slot, because all validators in that same committee can create an attestation from that `AttestationData`. And all validators can share the same selected head block.
With this change, if the value of `slot_timing_entropy` can't be shared, the number of validators a beacon node could support would be significantly reduced, as it would need to update fork choice and create a new `AttestationData` for each individual validator.
This is the gist of this fix:
"By making the attestation production time unpredictable to the attacker & unique for each validator, ..."
The attestation production time doesn't have to be unique for each and every validator. However, it is absolutely crucial that the attestation production time is unpredictable for anyone who does not control the validator and/or beacon node (for clients where the beacon node is the driver of validator duties). So, validators served by the same beacon node can have the same attestation production time, i.e., they can share the source of the entropy and the actual `slot_timing_entropy` value.
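One way the sharing described above could look in a client, as a sketch only: the class `SharedSlotEntropy` and its `bound_ms` parameter are hypothetical names, and the ± 1000 ms default is an assumption taken from the "4 ± 1 sec" figure in this thread. A single value is drawn per slot and handed to every validator served by the same process.

```python
import secrets
from typing import Dict


class SharedSlotEntropy:
    """Caches one slot_timing_entropy value (in ms) per slot for all local validators."""

    def __init__(self, bound_ms: int = 1000) -> None:
        self._bound_ms = bound_ms           # ASSUMPTION: +/- 1 s window
        self._by_slot: Dict[int, int] = {}

    def get(self, slot: int) -> int:
        if slot not in self._by_slot:
            # Drop stale entries so the cache does not grow without bound.
            self._by_slot = {s: v for s, v in self._by_slot.items() if s >= slot - 1}
            # Uniform over the open interval (-bound_ms, bound_ms), from the OS CSPRNG.
            self._by_slot[slot] = secrets.randbelow(2 * self._bound_ms - 1) - (self._bound_ms - 1)
        return self._by_slot[slot]
```

Because every validator in the process sees the same offset for a given slot, the beacon node still needs to build at most one `AttestationData` per committee per slot, while the offset stays unpredictable to outside observers.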
I'd agree that this could make attacks harder, but I don't think it's a substitute for deeper changes (e.g. my "the proposer has 1/4 slot weight" proposal) that provide liveness in the standard model (attacker chooses the latency of every message within the bounds [0, delta]). The attack under this proposal (i.e. this PR) would be: the attacker connects to every node (e.g. by connecting to the network with a huge number of nodes and just waiting until they get included in the network and they make up 80%+ of all nodes in the network), and then splits the network 50/50 by broadcasting a set of attestations at exactly the time window when the
The goal of the PR is to provide some satisfactory mitigation of the attack in v1.0 of the spec, while having relatively low code impact and low risk of the proposed changes. In addition, this fix is definitely backwards-compatible. Since the attack is feasible & has become well-known by now, it would be a bad move to go ahead with v1.0 without any fixes. |
For each `slot`, a validator must generate a uniform random variable `slot_timing_entropy` between `(-SECONDS_PER_SLOT / ATTESTATION_ENTROPY_DIVISOR, SECONDS_PER_SLOT / ATTESTATION_ENTROPY_DIVISOR)` with millisecond resolution and using local entropy. | ||
A validator must create and broadcast the `attestation` to the associated attestation subnet when the earlier one of the following two events occurs: | ||
- The validator has received a valid block from the expected block proposer for the assigned `slot`. In this case, the validator must set a timer for `abs(slot_timing_entropy)`. The end of this timer will be the trigger for attestation production. |
What is the attack vector of sending the attestation on block receipt? Doesn't that have some randomness built into it "naturally"?
This is to mitigate the risk from an adversary who has faster connections to all validators than the validators have between themselves. There are already some "Layer-0" projects in this space that provide this as a service (either currently, or in the near future), e.g., bloXroute and Marlin.
An attacker with this capability would be able to trigger attestation production at a predictable time of its choosing by always being the first one to inform validators about a new block. Hence the timing entropy is added, to make this attack vector infeasible.
By the way, the time `block_arrived + abs(slot_timing_entropy)` should be capped at `SECONDS_PER_SLOT / ATTESTATION_PRODUCTION_DIVISOR + slot_timing_entropy`. Otherwise, in the worst case, we'd effectively see the attestation being sent out at `slot + 6s` instead of `slot + 5s` being the maximum, which starts being very close to the aggregation cutoff time and increases the risk of loss of reward.
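The cap suggested above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `attestation_trigger_ms` is a hypothetical helper, `ATTESTATION_PRODUCTION_DIVISOR = 3` is inferred from the `SECONDS_PER_SLOT / 3` wording in the diff, and all times are milliseconds since the start of the slot.

```python
from typing import Optional

SECONDS_PER_SLOT = 12               # phase 0 mainnet value
ATTESTATION_PRODUCTION_DIVISOR = 3  # ASSUMPTION: inferred from SECONDS_PER_SLOT / 3


def attestation_trigger_ms(slot_timing_entropy_ms: int,
                           block_arrival_ms: Optional[int]) -> int:
    """Return when (ms after slot start) the attestation should be produced.

    The timeout path fires at SECONDS_PER_SLOT / DIVISOR + entropy.
    The block path fires at block arrival + abs(entropy), capped at the timeout.
    """
    timeout_ms = (SECONDS_PER_SLOT * 1000 // ATTESTATION_PRODUCTION_DIVISOR
                  + slot_timing_entropy_ms)
    if block_arrival_ms is None:
        # No valid block seen for this slot: fall back to the timeout.
        return timeout_ms
    return min(block_arrival_ms + abs(slot_timing_entropy_ms), timeout_ms)
```

For example, with an entropy of 999 ms and a block arriving at 4990 ms, the uncapped rule would wait until 5989 ms, whereas the capped version returns 4999 ms and stays within the intended 4 ± 1 sec window.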
Yes, this attack is possible even with the fix from this PR in place. However, the chance of success of the attack is substantially lower than before! Let's label the time when
I'm curious about how valuable this fix is (e.g., how much weaker the network model under which liveness is guaranteed becomes, and how much the fault tolerance changes) compared to the additional complexity of the implementation, the effect on the efficiency of attestation aggregation, and the risk of unknown side effects (for instance, this fix will affect the analysis of the incentive compatibility of the timing of attesting).
The attacker's attestations are useful even if some portion of validators receive the attacker's attestations from the other subset and switch the chain they vote for. The difference of the two target chains' scores at the end of the current slot is
From the above observation, considering only the minimum network latency between the two subsets is not enough. We need to precisely analyze how many of the attacker's attestations are exchanged within the time window and how large the difference of the scores becomes as a result.
closing this. likely to go another path |
The PR introduces a simple fix for the fork choice timing attack presented in this paper: https://arxiv.org/abs/2009.04987
By making the attestation production time unpredictable to the attacker & unique for each validator, we make it harder for an attacker to separately influence the fork choice of disjoint subsets of validators by sending well-timed messages to each set, such that these messages are not gossiped with the other subset of validators before attestations for the slot are produced.