ATTESTATION_SUBNET_COUNT should be dynamic #1777

jrhea · 2020-04-30T20:13:31Z

ATTESTATION_SUBNET_COUNT is currently a spec constant that defines the valid range of subnets persistent committees should randomly subscribe to (this value is currently set to 64). When the network has a smaller number of validators, this results in nodes joining subnets with committee counts of 0. Spreading persistent committee members across unused subnets could significantly inhibit our ability to find subnets in a timely fashion.

The following table compares active validators to resulting committee counts:

ACTIVE VALIDATORS	COMMITTEE COUNT
16384	4
32768	8
...	...
262144	64

In other words, we don't need 64 attestation subnets until we reach 262,144 active validators.

To fix this, I suggest that we replace ATTESTATION_SUBNET_COUNT with a method that calculates it based on the committee count. Consider something like this to start:

def get_attestation_subnet_count_at_slot(state: BeaconState, slot: Slot) -> uint64:  
    return min(get_committee_count_at_slot(state,slot), 64)

that seems fine if we assume a static validator set, but we really should consider how many validators could exit or activate within a PERSISTENT_COMMITTEE_PERIOD to ensure we create enough attestation subnets. To do that, we should take a look at worst case validator churn limits. From the spec, get_validator_churn_limit() is defined as follows:

def get_validator_churn_limit(state: BeaconState) -> uint64:
    """
    Return the validator churn limit for the current epoch.
    """
    active_validator_indices = get_active_validator_indices(state, get_current_epoch(state))
    return max(MIN_PER_EPOCH_CHURN_LIMIT, len(active_validator_indices) // CHURN_LIMIT_QUOTIENT)

In order to have a churn limit > 4 , this would have to be true:

len(active_validator_indices) > 4 * CHURN_LIMIT_QUOTIENT

In other words, we would need 262,144 active validators. From the table above, we know this is the exact amount needed to reach a committee count of 64. This means that we can calculate the max increase/decrease in committee counts within a PERSISTEN_COMMITTEE_PERIOD as follows:

Change in num active validators = 4*PERSISTEN_COMMITTEE_PERIOD = +/- 8,192 validators
Change in committee count = 8,192/SLOTS_PER_EPOCH/TARGET_COMMITTEE_SIZE = +/- 2

It's probably safe to only consider the max increase in committee count (worst, worst case we have 2 additional attestation subnets). As a result, we can modify our new method get_attestation_subnet_count_at_slot() from above to account for the worst case scenario where 2 additional committees could be added within a PERSISTENT_COMMITTEE_PERIOD:

def get_attestation_subnet_count_at_slot(state: BeaconState, slot: Slot) -> uint64:  
    max_additional_committees = MIN_PER_EPOCH_CHURN_LIMIT*PERSISTEN_COMMITTEE_PERIOD//SLOTS_PER_EPOCH//TARGET_COMMITTEE_SIZE
    return min(get_committee_count_at_slot(state,slot) + max_additional_committees, 64)

If we want to retain the ability to control the committee to attestation subnets ratio, we could add a configurable spec constant called COMMITTEES_PER_SUBNET that can be set to 1 and get_attestation_subnet_count_at_slot could be modified as follows:

def get_attestation_subnet_count_at_slot(state: BeaconState, slot: Slot) -> uint64:  
    max_additional_committees = MIN_PER_EPOCH_CHURN_LIMIT*PERSISTEN_COMMITTEE_PERIOD//SLOTS_PER_EPOCH//TARGET_COMMITTEE_SIZE
    return min(get_committee_count_at_slot(state,slot) + max_additional_committees, 64) // COMMITTEES_PER_SUBNET

Credit/blame to @mkalinin for talking this out with me and @djrtwo for pointing out that low validator counts could result in persistent committee members joining unused subnets.

The text was updated successfully, but these errors were encountered:

zilm13 · 2020-05-01T07:27:54Z

Also we need EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2 (54 hours) warm-up period when validators are required to join randomly to new subnet, but it's not involved in any attestation/block duties. Otherwise we have 0 nodes skeleton when new subnet is added. It couldn't be discovered.

djrtwo · 2020-05-01T16:30:07Z

Thanks @jrhea! I had similar thoughts as we were discussing the attnets debugging work you've been doing.

I think that predicting the validator set size in some future epoch and accounting for how many committees will be is not worth the added complexity here (especially re: @zilm13's comment about the warm-up).

Another couple of notes:

Phase 1 won't have a notion of unused subnets due to the actual assignment of shards to each so replicating full subnets (and the size of persistent vals on the used subnets) to some extent seems reasonable here
Having more validators on persistent subnets has diminishing returns in terms of increased bandwidth to each node.

The biggest arguments might lie in being able to find peers more easily in the dht but this might not be a realistic approximation of what comes in phase 1.

jrhea · 2020-05-01T16:58:00Z

especially re: @zilm13's comment about the warm-up

I don't see his comment as really being an issue. The warmup is accounted for in the PERSISTENT_COMMITTEE_PERIOD of 9 days. If that constant is going away, we simple use EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2 instead. We are just looking for a worst case here to apply as a lead. With a 9 day lead...we are looking at an additional 2 committees.

Phase 1 won't have a notion of unused subnets due to the actual assignment of shards to each so replicating full subnets (and the size of persistent vals on the used subnets) to some extent seems reasonable here

My understanding was that we would be using topic discovery in phase 1.

Having more validators on persistent subnets has diminishing returns in terms of increased bandwidth to each node.

I believe this can be addressed, but it is part of a solution to another concern i have.

The biggest arguments might lie in being able to find peers more easily in the dht but this might not be a realistic approximation of what comes in phase 1.

Again, i thought we plan on implementing topic discovery for phase 1. if not, i see your point. If we are, then i would consider this argument less compelling.

jrhea · 2020-05-01T17:09:49Z

I think that predicting the validator set size in some future epoch and accounting for how many committees will be is not worth the added complexity here

I forgot to comment on this. Ya, I figured you might say that and you would be a better judge of the ramifications of this. Your call boss

nisdas · 2020-05-02T03:50:49Z

This would make sense to use the first option

def get_attestation_subnet_count_at_slot(state: BeaconState, slot: Slot) -> uint64:  
    return min(get_committee_count_at_slot(state,slot), 64)

Using the current active validator set seems like a good gauge as any for determining the amount of subnets.

Although there might be a possibility of newly added validators biasing the subnets through this period EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2. although I dont think it is something we should worry about.

The biggest arguments might lie in being able to find peers more easily in the dht but this might not be a realistic approximation of what comes in phase 1.

While that maybe true, the initial genesis validator set would only have 4 committees. Having validators subscribe to the other 60 empty subnets seems a bit pointless. You would have 94% of your validators not participating in a functioning subnet.

djrtwo · 2020-05-04T15:05:24Z

You would have 94% of your validators not participating in a functioning subnet.

Which is totally fine. If we move to phase 1 with similar number of validators (this is feasible though unlikely), all persistent subnets would then be used, but they would also be the same size as those today. The size of the persistent subnet approximates the phase 1 behaviour more than all of the validators being assigned to subnets that are actually used.

Using the current active validator set seems like a good gauge as any for determining the amount of subnets.

But this does not adequately capture the moment in which the subnet count would jump up and as @zilm13 noted, the subnet would be under-provisioned in number of validators.

jrhea · 2020-05-04T15:42:17Z

But this does not adequately capture the moment in which the subnet count would jump up and as @zilm13 noted, the subnet would be under-provisioned in number of validators.

That's why I suggested adding a calculation that gives the worst case possible number of new committees.

def get_attestation_subnet_count_at_slot(state: BeaconState, slot: Slot) -> uint64:  
    max_additional_committees = MIN_PER_EPOCH_CHURN_LIMIT*PERSISTENT_COMMITTEE_PERIOD//SLOTS_PER_EPOCH//TARGET_COMMITTEE_SIZE
    return min(get_committee_count_at_slot(state,slot) + max_additional_committees, 64) // COMMITTEES_PER_SUBNET

Granted that PERSISTENT_COMMITTEE_PERIOD should be changed to EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2, but considering that PERSISTENT_COMMITTEE_PERIOD = 2048 epochs and only results in an additional 2 committees....i think switching to EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2 = 512 epochs is even easier.

We change:

max_additional_committees = MIN_PER_EPOCH_CHURN_LIMIT*PERSISTENT_COMMITTEE_PERIOD//SLOTS_PER_EPOCH//TARGET_COMMITTEE_SIZE = 2 committees

to

max_additional_committees = MIN_PER_EPOCH_CHURN_LIMIT*EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2//SLOTS_PER_EPOCH//TARGET_COMMITTEE_SIZE = 0.5 (round up to 1) committees

What am i missing?

jrhea · 2020-05-09T21:10:48Z

@djrtwo if I remember correctly, one of the possibilities we discussed in the networking call was reverting back to using all 64 committees so that our network is more similar to how it will be in phase 1. Is that still on the table?

djrtwo · 2020-05-11T15:45:46Z

working on that now
On the table, but not certain worth the complexity/change at this point

djrtwo · 2020-05-14T20:29:20Z

fix merged! thanks @jrhea !

nisdas mentioned this issue May 1, 2020

Suspicious looking enr["attnets"] bitfields in Topaz prysmaticlabs/prysm#5696

Closed

protolambda added the scope:networking label May 1, 2020

djrtwo mentioned this issue May 4, 2020

Eth2 Networking Call 5 ethereum/eth2.0-pm#148

Closed

djrtwo mentioned this issue May 11, 2020

[for discussion] Use all attestation subnets in phase 0 #1804

Merged

djrtwo closed this as completed May 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ATTESTATION_SUBNET_COUNT should be dynamic #1777

ATTESTATION_SUBNET_COUNT should be dynamic #1777

jrhea commented Apr 30, 2020

zilm13 commented May 1, 2020

djrtwo commented May 1, 2020

jrhea commented May 1, 2020 •

edited

Loading

jrhea commented May 1, 2020

nisdas commented May 2, 2020 •

edited

Loading

djrtwo commented May 4, 2020

jrhea commented May 4, 2020 •

edited

Loading

jrhea commented May 9, 2020 •

edited

Loading

djrtwo commented May 11, 2020 •

edited

Loading

djrtwo commented May 14, 2020

ATTESTATION_SUBNET_COUNT should be dynamic #1777

ATTESTATION_SUBNET_COUNT should be dynamic #1777

Comments

jrhea commented Apr 30, 2020

zilm13 commented May 1, 2020

djrtwo commented May 1, 2020

jrhea commented May 1, 2020 • edited Loading

jrhea commented May 1, 2020

nisdas commented May 2, 2020 • edited Loading

djrtwo commented May 4, 2020

jrhea commented May 4, 2020 • edited Loading

jrhea commented May 9, 2020 • edited Loading

djrtwo commented May 11, 2020 • edited Loading

djrtwo commented May 14, 2020

jrhea commented May 1, 2020 •

edited

Loading

nisdas commented May 2, 2020 •

edited

Loading

jrhea commented May 4, 2020 •

edited

Loading

jrhea commented May 9, 2020 •

edited

Loading

djrtwo commented May 11, 2020 •

edited

Loading