Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ATTESTATION_SUBNET_COUNT should be dynamic #1777

Closed
jrhea opened this issue Apr 30, 2020 · 10 comments
Closed

ATTESTATION_SUBNET_COUNT should be dynamic #1777

jrhea opened this issue Apr 30, 2020 · 10 comments

Comments

@jrhea
Copy link
Contributor

jrhea commented Apr 30, 2020

ATTESTATION_SUBNET_COUNT is currently a spec constant that defines the valid range of subnets persistent committees should randomly subscribe to (this value is currently set to 64). When the network has a smaller number of validators, this results in nodes joining subnets with committee counts of 0. Spreading persistent committee members across unused subnets could significantly inhibit our ability to find subnets in a timely fashion.

The following table compares active validators to resulting committee counts:

ACTIVE VALIDATORS COMMITTEE COUNT
16384 4
32768 8
... ...
262144 64

In other words, we don't need 64 attestation subnets until we reach 262,144 active validators.

To fix this, I suggest that we replace ATTESTATION_SUBNET_COUNT with a method that calculates it based on the committee count. Consider something like this to start:

def get_attestation_subnet_count_at_slot(state: BeaconState, slot: Slot) -> uint64:  
    return min(get_committee_count_at_slot(state,slot), 64)

that seems fine if we assume a static validator set, but we really should consider how many validators could exit or activate within a PERSISTENT_COMMITTEE_PERIOD to ensure we create enough attestation subnets. To do that, we should take a look at worst case validator churn limits. From the spec, get_validator_churn_limit() is defined as follows:

def get_validator_churn_limit(state: BeaconState) -> uint64:
    """
    Return the validator churn limit for the current epoch.
    """
    active_validator_indices = get_active_validator_indices(state, get_current_epoch(state))
    return max(MIN_PER_EPOCH_CHURN_LIMIT, len(active_validator_indices) // CHURN_LIMIT_QUOTIENT)

In order to have a churn limit > 4 , this would have to be true:

len(active_validator_indices) > 4 * CHURN_LIMIT_QUOTIENT

In other words, we would need 262,144 active validators. From the table above, we know this is the exact amount needed to reach a committee count of 64. This means that we can calculate the max increase/decrease in committee counts within a PERSISTEN_COMMITTEE_PERIOD as follows:

Change in num active validators = 4*PERSISTEN_COMMITTEE_PERIOD = +/- 8,192 validators
Change in committee count = 8,192/SLOTS_PER_EPOCH/TARGET_COMMITTEE_SIZE = +/- 2

It's probably safe to only consider the max increase in committee count (worst, worst case we have 2 additional attestation subnets). As a result, we can modify our new method get_attestation_subnet_count_at_slot() from above to account for the worst case scenario where 2 additional committees could be added within a PERSISTENT_COMMITTEE_PERIOD:

def get_attestation_subnet_count_at_slot(state: BeaconState, slot: Slot) -> uint64:  
    max_additional_committees = MIN_PER_EPOCH_CHURN_LIMIT*PERSISTEN_COMMITTEE_PERIOD//SLOTS_PER_EPOCH//TARGET_COMMITTEE_SIZE
    return min(get_committee_count_at_slot(state,slot) + max_additional_committees, 64)

If we want to retain the ability to control the committee to attestation subnets ratio, we could add a configurable spec constant called COMMITTEES_PER_SUBNET that can be set to 1 and get_attestation_subnet_count_at_slot could be modified as follows:

def get_attestation_subnet_count_at_slot(state: BeaconState, slot: Slot) -> uint64:  
    max_additional_committees = MIN_PER_EPOCH_CHURN_LIMIT*PERSISTEN_COMMITTEE_PERIOD//SLOTS_PER_EPOCH//TARGET_COMMITTEE_SIZE
    return min(get_committee_count_at_slot(state,slot) + max_additional_committees, 64) // COMMITTEES_PER_SUBNET

Credit/blame to @mkalinin for talking this out with me and @djrtwo for pointing out that low validator counts could result in persistent committee members joining unused subnets.

@zilm13
Copy link
Contributor

zilm13 commented May 1, 2020

Also we need EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2 (54 hours) warm-up period when validators are required to join randomly to new subnet, but it's not involved in any attestation/block duties. Otherwise we have 0 nodes skeleton when new subnet is added. It couldn't be discovered.

@djrtwo
Copy link
Contributor

djrtwo commented May 1, 2020

Thanks @jrhea! I had similar thoughts as we were discussing the attnets debugging work you've been doing.

I think that predicting the validator set size in some future epoch and accounting for how many committees will be is not worth the added complexity here (especially re: @zilm13's comment about the warm-up).

Another couple of notes:

  • Phase 1 won't have a notion of unused subnets due to the actual assignment of shards to each so replicating full subnets (and the size of persistent vals on the used subnets) to some extent seems reasonable here
  • Having more validators on persistent subnets has diminishing returns in terms of increased bandwidth to each node.

The biggest arguments might lie in being able to find peers more easily in the dht but this might not be a realistic approximation of what comes in phase 1.

@jrhea
Copy link
Contributor Author

jrhea commented May 1, 2020

especially re: @zilm13's comment about the warm-up

I don't see his comment as really being an issue. The warmup is accounted for in the PERSISTENT_COMMITTEE_PERIOD of 9 days. If that constant is going away, we simple use EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2 instead. We are just looking for a worst case here to apply as a lead. With a 9 day lead...we are looking at an additional 2 committees.

Phase 1 won't have a notion of unused subnets due to the actual assignment of shards to each so replicating full subnets (and the size of persistent vals on the used subnets) to some extent seems reasonable here

My understanding was that we would be using topic discovery in phase 1.

Having more validators on persistent subnets has diminishing returns in terms of increased bandwidth to each node.

I believe this can be addressed, but it is part of a solution to another concern i have.

The biggest arguments might lie in being able to find peers more easily in the dht but this might not be a realistic approximation of what comes in phase 1.

Again, i thought we plan on implementing topic discovery for phase 1. if not, i see your point. If we are, then i would consider this argument less compelling.

@jrhea
Copy link
Contributor Author

jrhea commented May 1, 2020

I think that predicting the validator set size in some future epoch and accounting for how many committees will be is not worth the added complexity here

I forgot to comment on this. Ya, I figured you might say that and you would be a better judge of the ramifications of this. Your call boss

@nisdas
Copy link
Contributor

nisdas commented May 2, 2020

This would make sense to use the first option

def get_attestation_subnet_count_at_slot(state: BeaconState, slot: Slot) -> uint64:  
    return min(get_committee_count_at_slot(state,slot), 64)

Using the current active validator set seems like a good gauge as any for determining the amount of subnets.

Although there might be a possibility of newly added validators biasing the subnets through this period EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2. although I dont think it is something we should worry about.

The biggest arguments might lie in being able to find peers more easily in the dht but this might not be a realistic approximation of what comes in phase 1.

While that maybe true, the initial genesis validator set would only have 4 committees. Having validators subscribe to the other 60 empty subnets seems a bit pointless. You would have 94% of your validators not participating in a functioning subnet.

@djrtwo
Copy link
Contributor

djrtwo commented May 4, 2020

You would have 94% of your validators not participating in a functioning subnet.

Which is totally fine. If we move to phase 1 with similar number of validators (this is feasible though unlikely), all persistent subnets would then be used, but they would also be the same size as those today. The size of the persistent subnet approximates the phase 1 behaviour more than all of the validators being assigned to subnets that are actually used.

Using the current active validator set seems like a good gauge as any for determining the amount of subnets.

But this does not adequately capture the moment in which the subnet count would jump up and as @zilm13 noted, the subnet would be under-provisioned in number of validators.

@jrhea
Copy link
Contributor Author

jrhea commented May 4, 2020

But this does not adequately capture the moment in which the subnet count would jump up and as @zilm13 noted, the subnet would be under-provisioned in number of validators.

That's why I suggested adding a calculation that gives the worst case possible number of new committees.

def get_attestation_subnet_count_at_slot(state: BeaconState, slot: Slot) -> uint64:  
    max_additional_committees = MIN_PER_EPOCH_CHURN_LIMIT*PERSISTENT_COMMITTEE_PERIOD//SLOTS_PER_EPOCH//TARGET_COMMITTEE_SIZE
    return min(get_committee_count_at_slot(state,slot) + max_additional_committees, 64) // COMMITTEES_PER_SUBNET

Granted that PERSISTENT_COMMITTEE_PERIOD should be changed to EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2, but considering that PERSISTENT_COMMITTEE_PERIOD = 2048 epochs and only results in an additional 2 committees....i think switching to EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2 = 512 epochs is even easier.

We change:

max_additional_committees = MIN_PER_EPOCH_CHURN_LIMIT*PERSISTENT_COMMITTEE_PERIOD//SLOTS_PER_EPOCH//TARGET_COMMITTEE_SIZE = 2 committees

to

max_additional_committees = MIN_PER_EPOCH_CHURN_LIMIT*EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION * 2//SLOTS_PER_EPOCH//TARGET_COMMITTEE_SIZE = 0.5 (round up to 1) committees

What am i missing?

@jrhea
Copy link
Contributor Author

jrhea commented May 9, 2020

@djrtwo if I remember correctly, one of the possibilities we discussed in the networking call was reverting back to using all 64 committees so that our network is more similar to how it will be in phase 1. Is that still on the table?

@djrtwo
Copy link
Contributor

djrtwo commented May 11, 2020

working on that now
On the table, but not certain worth the complexity/change at this point

@djrtwo
Copy link
Contributor

djrtwo commented May 14, 2020

fix merged! thanks @jrhea !

@djrtwo djrtwo closed this as completed May 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants