PeerDAS fork-choice, validator custody and parameter changes #3779

Open · wants to merge 25 commits into base: dev

Conversation

fradamt
Contributor

@fradamt fradamt commented May 24, 2024

This PR does three things:

  • Introduce the parameter changes discussed at the interop. I set the subnet count to 128 instead of 64 after discussions with the Codex team and Dankrad, the idea being that we might as well try out something more ambitious (a better ratio of custody to total data) and then go back to 64 if devnets/testnets point in that direction. I'm happy to revert to 64 if this turns out to be a contentious choice. For context, a subnet count of 64 would mean that nodes with a single validator attached custody 1/4 of the original data, which still gives us quite a bit of room to increase the blob count without increasing bandwidth consumption.
  • Introduce validator custody. Full nodes still custody a minimum of 1/32 of the extended data, as in the current spec (the minimum custody is CUSTODY_REQUIREMENT = 4 out of 128 subnets), while nodes with validators attached are asked to custody at least VALIDATOR_CUSTODY_REQUIREMENT = 6 subnets, to provide a minimum level of security for their attestations, plus one extra subnet for every 16 ETH of balance (by balance rather than by validator count, to account for the MaxEB change). Any node with at least 61 minimum-balance validators (~2000 ETH) would by default download all of the data and always be able to reconstruct whenever reconstruction is possible. Moreover, its consensus participation would be completely unaffected by sampling, making it much harder for sampling to introduce any consensus risk.
    Edit: on Justin's suggestion, validator custody has been changed so that the rule is "1 subnet per 32 ETH, minimum 8, maximum 128" (a minimal sketch is included right after this list).
  • Clarify the role of data availability in the fork-choice. I propose to mostly rely on the custody check, in particular using it to filter out unavailable blocks in get_head rather than not importing them at all. Peer sampling is instead only used to gate justifications and finalizations (by not importing blocks whose state has an unavailable unrealized justification), which accomplishes two goals. Firstly, it ensures that confirming transactions by waiting for finality comes with an extra layer of safety. Secondly, it makes it harder for validators to end up voting to finalize an unavailable checkpoint in case of a supermajority attack. Restricting the use of peer sampling to these limited goals (where it actually has meaningful benefits over custody checks) also makes it very hard for sampling to disrupt consensus.
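
For reference, a minimal sketch of the updated custody rule, assuming the constants discussed later in this thread (VALIDATOR_CUSTODY_REQUIREMENT = 8, DATA_COLUMN_SIDECAR_SUBNET_COUNT = 128, BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET = 32 ETH); the actual helper lives in the diff:

```python
GWEI_PER_ETH = 10**9
VALIDATOR_CUSTODY_REQUIREMENT = 8                          # minimum for nodes with validators attached
DATA_COLUMN_SIDECAR_SUBNET_COUNT = 128                     # cap: all subnets
BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET = 32 * GWEI_PER_ETH  # 1 subnet per 32 ETH

def custody_subnets(total_node_balance: int) -> int:
    count = total_node_balance // BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET
    return min(max(count, VALIDATOR_CUSTODY_REQUIREMENT), DATA_COLUMN_SIDECAR_SUBNET_COUNT)

assert custody_subnets(8 * 32 * GWEI_PER_ETH) == 8      # up to 8 validators of 32 ETH: the minimum
assert custody_subnets(64 * 32 * GWEI_PER_ETH) == 64    # 2048 ETH: half of the subnets
assert custody_subnets(128 * 32 * GWEI_PER_ETH) == 128  # 4096 ETH and above: everything
```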

Resources:

Todo:

  • Agree on the parameters, in particular subnet count and the validator custody parameters
  • Decide whether we instead want validator custody to be assigned in-protocol, so as to have (at least social) accountability in case of extreme failures like finalization of an unavailable block. The trade-off is that in the current design validator custody contributes to the network (if a peer has many validators, this is reflected in its advertised custody, and you can use that information for peer sampling) without deanonymization concerns.
  • Decide whether we are ok with the "normal" fork-choice proposed here, or whether we want to introduce some variant of (block, slot) to deal with the attack where a (non-supernode) proposer is tricked into extending an unavailable block. Alternatively, we could have proposers do peer sampling when blocks have very little weight. See here and here for more context. Currently my thinking is that this attack only affects a small percentage of proposers (those attached to a node with < 61 validators) and is not much easier than a proposer DoS, so perhaps we can treat it the same way, i.e., watch out for it and have credible countermeasures ready to implement if needed (while SSLE also works in this case, we can fix the problem completely with much simpler fork-choice changes, so it would be reasonably simple to deal with if it actually came up). Moreover, the attack requires controlling two slots in a row (proposer boost reorging would kick in otherwise), so it is even a bit harder to pull off.

| Name | Value | Description |
| - | - | - |
| `TARGET_NUMBER_OF_PEERS` | `70` | Suggested minimum peer count |
| `SAMPLES_PER_SLOT` | `16` | Number of `DataColumn` random samples a node queries per slot |
| `CUSTODY_REQUIREMENT` | `4` | Minimum number of subnets an honest node custodies and serves samples from |
| `VALIDATOR_CUSTODY_REQUIREMENT` | `6` | Minimum number of subnets an honest node with validators attached custodies and serves samples from |
Member

I think VALIDATOR_CUSTODY_REQUIREMENT is a little misleading. In practice, this will never be 6.

  • Given a validator with a balance of 32 ETH, get_validators_custody_requirement will return 8.
  • Given a validator with a balance of 17 ETH, get_validators_custody_requirement will return 7.
  • Given a validator with a balance of 16 ETH, get_validators_custody_requirement will return 6.
    • But it will never really get to this value, as the validator is queued for ejection at 16.75 ETH.

Why not just have a single CUSTODY_REQUIREMENT plus additional custodies per validator?

Contributor Author
@fradamt fradamt May 28, 2024

What I would like to preserve is:

  • a full node custodies only 4 subnets (I don't see much reason to go beyond that)
  • validators custody at least 8 subnets (I think it's a good minimum for security reasons)
  • the custody does not grow too fast with validator count (the distribution of the number of validators per node is quite bimodal, with either just a few or hundreds, and I think it's good to keep the requirements low for the former). Growing it by 4 per validator (per 32 ETH) is too high imo

How do you feel about this, with VALIDATOR_CUSTODY_REQUIREMENT = 8?

```python
def get_validators_custody_requirement(state: BeaconState, validator_indices: List[ValidatorIndex]) -> uint64:
    total_node_balance = sum(state.balances[index] for index in validator_indices)
    validator_custody_requirement = VALIDATOR_CUSTODY_REQUIREMENT
    if total_node_balance >= MIN_ACTIVATION_BALANCE:
        validator_custody_requirement += (total_node_balance - MIN_ACTIVATION_BALANCE) // BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET
    return validator_custody_requirement
```
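
For concreteness, a quick check of what this version implies, with hypothetical values plugged in for the constants (MIN_ACTIVATION_BALANCE = 32 ETH, BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET = 16 ETH, VALIDATOR_CUSTODY_REQUIREMENT = 8):

```python
GWEI_PER_ETH = 10**9
MIN_ACTIVATION_BALANCE = 32 * GWEI_PER_ETH
BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET = 16 * GWEI_PER_ETH
VALIDATOR_CUSTODY_REQUIREMENT = 8

def requirement(total_node_balance: int) -> int:
    # same logic as the helper above, with the constants plugged in
    req = VALIDATOR_CUSTODY_REQUIREMENT
    if total_node_balance >= MIN_ACTIVATION_BALANCE:
        req += (total_node_balance - MIN_ACTIVATION_BALANCE) // BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET
    return req

assert requirement(32 * GWEI_PER_ETH) == 8      # a single 32 ETH validator
assert requirement(64 * GWEI_PER_ETH) == 10     # two validators
assert requirement(2048 * GWEI_PER_ETH) == 134  # 64 validators: effectively every subnet (no cap in this version)
```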

Member
@jtraglia jtraglia May 28, 2024

Hmm, I understand the rationale. Your alternative is generally fine, but it does feel a little overly complex.

How about something like the following?

```python
def get_validators_custody_requirement(state: BeaconState, validator_indices: List[ValidatorIndex]) -> uint64:
    total_node_balance = sum(state.balances[index] for index in validator_indices)
    count = total_node_balance // BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET
    return min(max(count, VALIDATOR_CUSTODY_REQUIREMENT), DATA_COLUMN_SIDECAR_SUBNET_COUNT)
```

This would provide the following custody requirements:

| Validators | Custody Requirement |
| - | - |
| 1 | 8 |
| 2 | 8 |
| 3 | 8 |
| 4 | 8 |
| 5 | 10 |
| 6 | 12 |
| ... | ... |
| 63 | 126 |
| 64 | 128 |
| 65 | 128 |

This makes the computation relatively straightforward:

  • 2x the number of validators on the node, minimum 8, maximum 128.

Member

+1 on using BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET

Member

Francesco's implementation uses that too. But yes, the constant is a good idea.

Member
@jtraglia jtraglia May 29, 2024

Ah, you're right. It was a pseudocode mistake, and the declaration/usage of multiplier can be removed. I believe I was thinking that BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET should be defined as:

MAX_EFFECTIVE_BALANCE_ELECTRA // DATA_COLUMN_SIDECAR_SUBNET_COUNT

So that it properly scales if we (1) increase the max EB again or (2) increase the subnet count.

(Also, I fixed the backwards min and max)
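
For reference, a sketch of that proposed definition with today's values plugged in (assuming MAX_EFFECTIVE_BALANCE_ELECTRA of 2048 ETH and a subnet count of 128):

```python
GWEI_PER_ETH = 10**9
MAX_EFFECTIVE_BALANCE_ELECTRA = 2048 * GWEI_PER_ETH
DATA_COLUMN_SIDECAR_SUBNET_COUNT = 128

BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET = (
    MAX_EFFECTIVE_BALANCE_ELECTRA // DATA_COLUMN_SIDECAR_SUBNET_COUNT
)
assert BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET == 16 * GWEI_PER_ETH  # 16 ETH per subnet

# With this derivation, a node holding the maximum effective balance maps to exactly
# all 128 subnets, and the ratio re-derives itself if either constant changes.
```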

Contributor Author

I like this version, because it only starts increasing from 8 after a few validators, which is imo a fairly desirable property in itself. I would even consider setting BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET to 32 ETH, so that it's "1x the number of validators on the node, minimum 8, max 128", and it only starts increasing from the minimum after 8 validators.

Contributor

I am a bit worried about the direction of dynamically increasing the custody count depending on how many validators you run. For context, last year we finally moved to a new attestation subnet backbone structure where the responsibility for subscribing to long-lived attestation subnets is distributed equally amongst all nodes rather than concentrated on those running many validators:
#2749
#3312

The way validator custody is currently specified, you would reintroduce the same downsides by requiring nodes running many validators to custody all the subnets. Is it necessary to scale the custody count this way? Anyway, we could simply have an upper bound rather than custodying all the subnets if you run more than 64 validators.

Contributor

I am a bit worried about the direction of dynamically increasing the custody count depending on how many validators you run. For context, last year we finally moved to a new attestation subnet backbone structure where the responsibility for subscribing to long-lived attestation subnets is distributed equally amongst all nodes rather than concentrated on those running many validators:
#2749
#3312
The way validator custody is currently specified, you would reintroduce the same downsides by requiring nodes running many validators to custody all the subnets. Is it necessary to scale the custody count this way?

The rationale behind making custody-count depend on something is as follows:

  1. the system performs much better if we have nodes that can repair on-the-fly. This makes "just available" blocks "overwhelmingly available", which improves the number of blocks that become canonical (because after repair, these blocks will get enough votes), and it also improves the sampling process (because we will have far fewer false negatives during sampling). We call this availability amplification.
  2. in the 1D erasure coding case, only nodes that have at least half of the columns can repair. (Note that we do not need this in the 2D case, where any node can repair a row or a column).
  3. The most intuitive way to force the system to have such "supernodes" is to make custody depend on validator count. There could be other ways, like
    • random allocation of a "supernode" role,
    • hoping that there will be supernodes,
    • having nodes do incrementalDAS and eventual repair,
    but custody-based allocation seems to align best with the resources needed to actually download the data and do the reconstruction. In other words, if someone has many validators, they can pay for the bandwidth and compute.

This goes against the "hiding" property achieved by equally distributing, but improves system performance. Once we change to 2D encoding, we can go back to equally distributing custody.

Anyway, we could simply have an upper bound rather than custodying all the subnets if you run more than 64 validators.

I'm not sure I'm interpreting this right, but it is important that we not cap the custody requirement at 64. Otherwise, if exactly 64 columns were released, we would need a supernode that happens, by miracle, to be subscribed to exactly those 64 columns. There are far too many combinations for that; we would need far too many supernodes (see the quick check below). If really needed, we could stop before 128, but we need significantly more than 64.
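
To put a rough number on "too many combinations", a quick check assuming the 1D setting discussed here (128 columns, at least half of which are needed to reconstruct):

```python
from math import comb

NUMBER_OF_COLUMNS = 128
COLUMNS_NEEDED_TO_RECONSTRUCT = NUMBER_OF_COLUMNS // 2  # 64 in the 1D erasure-coding case

# Number of distinct 64-column subsets an attacker could choose to release:
print(comb(NUMBER_OF_COLUMNS, COLUMNS_NEEDED_TO_RECONSTRUCT))  # ~2.4e37
```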

Contributor Author

Echoing the above, and also @dapplion's comment that validator custody means that in practice, given the actual stake distribution, most of the stake will be run on supernodes. That is imho a good thing, because it hugely derisks the whole system: it basically means that the introduction of DAS is essentially irrelevant for 90% of the validator set, other than moving from gossiping a few large objects to gossiping many smaller ones. And why shouldn't someone who runs hundreds of validators, with millions or even tens of millions in stake, be downloading the whole data and contributing to the security and stability of the network?

This is quite different from the attestation subnets case imho, because there are huge, tangible benefits to be had from linearly scaling the load based on stake. Also, validator custody does not change the fact that all nodes still share the responsibility of forming the backbone of long-lived subscriptions, just not equally.

Another point here is that downloading the whole data is by far the best way to ensure that you always correctly fulfil your validator duties, including protecting you when proposing.

@dapplion
Member

dapplion commented Jun 5, 2024

The vast majority of Ethereum mainnet stake is run by entities controlling > 64 validators each. So with validator custody, a ~90% majority will be gossiping and importing everything. Any issues with partial custody or sampling will affect a small minority and may not even affect the overall network's health noticeably.

I am not judging this fact, but it feels like an important consideration.

@leobago

leobago commented Jun 12, 2024

The vast majority of Ethereum mainnet stake is run by entities controlling > 64 validators each.

"Majority of stake" yes, but that does not necessarily translate into "majority of nodes"

So with validator custody, a ~90% majority will be gossiping and importing everything.

According to historical data from crawlers, it was estimated that only about ~10% of the nodes had over 64 validators.

Any issues with partial custody or sampling will affect a small minority and may not even affect the overall network's health noticeably.

I actually expect the majority of the nodes to run small custody sets.
But I agree this is an important thing to keep in mind.

@fradamt
Contributor Author

fradamt commented Jun 14, 2024

The vast majority of Ethereum mainnet stake is run by entities controlling > 64 validators each.

"Majority of stake" yes, but that does not necessarily translate into "majority of nodes"

So with validator custody, a ~90% majority will be gossiping and importing everything.

According to historical data from crawlers, it was estimated that only about ~10% of the nodes had over 64 validators.

Any issues with partial custody or sampling will affect a small minority and may not even affect the overall network's health noticeably.

I actually expect the majority of the nodes to run small custody sets. But I agree this is an important thing to keep in mind.

When it comes to the stability and security of consensus, the minority of nodes that holds 90% of the stake is mostly what matters. Even if most nodes in the network were regular nodes doing the minimum custody, we would still get huge benefits from 90% of the stake downloading everything, because consensus would be essentially unaffected by availability issues, and everyone else (even non-staking nodes and nodes with few validators) would end up following the same fully available chain.

Member

@jtraglia jtraglia left a comment

LGTM 👍

@ppopth
Member

ppopth commented Jul 16, 2024

I just did some research on peer count: https://notes.ethereum.org/@pop/peer-count-peerdas (it's still a WIP).

I have a concern about CUSTODY_REQUIREMENT=4 out of 128 subnets. It increases the number of peers you need to cover all subnets from 32 to 172, which is a lot.

>>> peer_count(128, 4)
172.0125
>>> peer_count(32, 1)
32.0

cc: @cskiraly
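
For intuition, here is a rough, self-contained way to sanity-check figures of this kind, under the simplifying assumption that each peer custodies a uniformly random set of distinct subnets (the linked note may use a different model, so the exact values need not match):

```python
import random

def mean_peers_to_cover_all_subnets(subnet_count: int, custody: int, trials: int = 1000) -> float:
    """Average number of random peers needed before their custody sets cover every subnet."""
    total = 0
    for _ in range(trials):
        covered = set()
        peers = 0
        while len(covered) < subnet_count:
            covered.update(random.sample(range(subnet_count), custody))
            peers += 1
        total += peers
    return total / trials

print(mean_peers_to_cover_all_subnets(128, 4))  # in the same ballpark as the 172 quoted above
print(mean_peers_to_cover_all_subnets(128, 8))  # roughly halves the required peer count
```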

@fradamt
Contributor Author

fradamt commented Jul 16, 2024

I just did some research on peer count: https://notes.ethereum.org/@pop/peer-count-peerdas (it's still a WIP).

I have a concern about CUSTODY_REQUIREMENT=4 out of 128 subnets. It increases the number of peers you need to cover all subnets from 32 to 172, which is a lot.

>>> peer_count(128, 4)
172.0125
>>> peer_count(32, 1)
32.0

cc: @cskiraly

Good thing to point out :) While I agree that this is a concern and something we should definitely take into account when deciding the parameters, I think we should also keep in mind that it is a worst-case measure that assumes all nodes are full nodes. If we were instead to assume that all nodes are validators (also not correct, of course), the relevant number would be peer_count(128, 8), which is 85. And even that leaves out nodes with multiple validators, which have a higher custody requirement.

Still, we could consider being conservative and, for example, setting the custody group count to 128 and the minimum custody requirement for full nodes to 8. For quite some time this wouldn't be a problem, as we would still be able to go up to a maximum of 48 blobs per slot without increasing full-node bandwidth requirements compared to 4844 (see the back-of-the-envelope check below). Eventually, we can hopefully increase peer counts and be less conservative about parameter choices.
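
Back-of-the-envelope check of the "48 blobs without exceeding 4844 bandwidth" point, assuming 128 KiB blobs, a 2x 1D extension, 128 column subnets, and Deneb's maximum of 6 blobs per block:

```python
BLOB_SIZE = 128 * 1024      # bytes per blob (4096 field elements * 32 bytes)
DENEB_MAX_BLOBS = 6         # 4844/Deneb maximum blobs per block
EXTENSION_FACTOR = 2        # 1D erasure coding doubles the data
SUBNET_COUNT = 128
FULL_NODE_CUSTODY = 8       # the more conservative full-node minimum floated above

def full_node_bytes_per_slot(blob_count: int) -> float:
    # a full node downloads only its custodied fraction of the extended data
    return blob_count * BLOB_SIZE * EXTENSION_FACTOR * FULL_NODE_CUSTODY / SUBNET_COUNT

assert full_node_bytes_per_slot(48) == DENEB_MAX_BLOBS * BLOB_SIZE  # 48 blobs -> 6 blobs' worth of download
```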

CUSTODY_REQUIREMENT: 4
VALIDATOR_CUSTODY_REQUIREMENT: 8
BALANCE_PER_ADDITIONAL_CUSTODY_SUBNET: 32000000000 # 2**5 * 10**9 (= 32,000,000,000)
TARGET_NUMBER_OF_PEERS: 100
Contributor

Note that most clients don't use this config value, so I guess this is more of a reference / recommendation?

#3766 (comment)

Member

I have created a PR to remove TARGET_NUMBER_OF_PEERS as a config variable.

| `SAMPLES_PER_SLOT` | `8` | Number of `DataColumnSidecar` random samples a node queries per slot |
| `CUSTODY_REQUIREMENT` | `1` | Minimum number of subnets an honest node custodies and serves samples from |
| `TARGET_NUMBER_OF_PEERS` | `70` | Suggested minimum peer count |
| `SAMPLES_PER_SLOT` | `16` | Number of `DataColumn` random samples a node queries per slot |
Member

There is no such thing as `DataColumn`.

Member

Nice catch!

The application of `process_justification_and_finalization` now happens in `on_block`.

```python
def compute_pulled_up_tip(store: Store, pulled_up_state: BeaconState, block_root: Root) -> None:
    ...
```
Contributor
@saltiniroberto saltiniroberto Aug 13, 2024

Has this function been modified only to avoid executing process_justification_and_finalization twice (as it is already executed at line 146 now), or is there another reason?
If there is no other reason, I think it is better not to modify this function, as the original function is more self-contained and therefore, I think, more readable (e.g. one does not need to trace back the value assigned to pulled_up_state).

```python
# [New in EIP7594] Do not import the block if its unrealized justified checkpoint is not available
pulled_up_state = state.copy()
process_justification_and_finalization(pulled_up_state)
assert is_chain_available(store, pulled_up_state.current_justified_checkpoint.root)
```
Contributor

I think that we should also check the current justified checkpoint of the non-pulled-up state, i.e. is_chain_available(store, state.current_justified_checkpoint.root).
This is because if a block B is from the current epoch and is chosen as the head, then the voting source corresponds to the current justified checkpoint of the non-pulled-up state, i.e., store.block_states[B].current_justified_checkpoint.
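
A minimal sketch of the suggested extra gate (a reviewer suggestion, not part of the current diff; placement next to the existing pulled-up check in `on_block` is assumed):

```python
# [Suggested above, not in the current diff] also require availability of the
# non-pulled-up justified checkpoint, since for a current-epoch head B the voting
# source is store.block_states[B].current_justified_checkpoint
assert is_chain_available(store, state.current_justified_checkpoint.root)
```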
