Partition proactive refresh #504
Conversation
Force-pushed from eff91fe to f10877c
I still need to read through your algorithm code and test a bit. I left some comments and questions for now
@@ -350,6 +356,7 @@ pub mod pallet {
    pub fn new_session_handler(
        validators: &[<T as pallet_session::Config>::ValidatorId],
    ) -> Result<(), DispatchError> {
        // TODO add back in refresh trigger and refreshed counter https://github.com/entropyxyz/entropy-core/issues/511
Don't think we need this TODO here since we have #511 open.
Suggested change (remove):
// TODO add back in refresh trigger and refreshed counter https://github.com/entropyxyz/entropy-core/issues/511
pallets/staking/src/lib.rs
Outdated
/// Total amount of refreshes done on the network
#[pallet::storage]
#[pallet::getter(fn refreshes_done)]
pub type RefreshesDone<T: Config> = StorageValue<_, u128, ValueQuery>;
So the partition algorithm uses this as a sort of nonce to do its work with? Would it be safe to use a pseudo-random hash or something instead?
Yes to the first question. And I want it to go in order, like do the first 10 accounts, then the next 10. If it was pseudo-random, some accounts would get refreshed more times than needed while others wouldn't be refreshed at all.
Gotcha, thanks.
Is there any risk to validators knowing they were "next up" on being refreshed?
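A minimal sketch of the ordered moving-batch selection described above; the constant and function names here are illustrative assumptions, not the actual code in this PR. It only shows how a monotonically increasing counter such as RefreshesDone picks the next window of accounts in order rather than pseudo-randomly.

```rust
// Illustrative sketch only. A counter like `RefreshesDone` selects the next
// ordered window of accounts, wrapping around once the end of the list is
// reached, so every account is visited before any account is visited twice.
const REFRESHES_PER_SESSION: usize = 10; // batch size; name and value assumed

fn next_batch<T: Clone>(all_keys: &[T], refreshes_done: usize) -> Vec<T> {
    if all_keys.is_empty() {
        return Vec::new();
    }
    // Start of this session's window, modulo the number of registered keys.
    let start = (refreshes_done * REFRESHES_PER_SESSION) % all_keys.len();
    all_keys
        .iter()
        .cycle() // wrap around the end of the key list
        .skip(start)
        .take(REFRESHES_PER_SESSION.min(all_keys.len()))
        .cloned()
        .collect()
}
```

With this shape, `next_batch(&keys, 0)` covers the first ten accounts, `next_batch(&keys, 1)` the next ten, and so on, matching the "first 10, then the next 10" ordering described above.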
crypto/shared/src/lib.rs
Outdated
@@ -18,3 +18,6 @@ pub const PRUNE_BLOCK: u32 = 14400;

/// Timeout for validators to wait for other validators to join protocol committees
pub const SETUP_TIMEOUT_SECONDS: u64 = 20;

/// The amount of proactive refreshes we do per session
pub const REFRESHES_PRE_SESSION: u128 = 10;
Suggested change:
- pub const REFRESHES_PRE_SESSION: u128 = 10;
+ pub const REFRESHES_PER_SESSION: u32 = 10;
We can get away with using something smaller here for the type, maybe even a u8.
I'm a bit confused by this type. Is this referring to the number of times per session we do the whole proactive refresh algorithm, or the number of validators per session that are split according to the proactive refresh algorithm?
This type is u128 so I don't have to do extra conversions. I guess the question is what we optimize for: a smaller type or fewer conversions in the partition function.
It is how many accounts we do a proactive refresh on per session, so if we have 1000 registered accounts we would take them x accounts at a time per session so as not to overstress our validators.
Left #513 as a suggestion for changing things to a u32
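As a rough worked example of the pacing this constant implies, assuming the batch size stays at 10 as in the diff above:

```rust
fn main() {
    // With 1000 registered accounts and 10 refreshes per session, a full pass
    // over every account takes 100 sessions; the division rounds up so a
    // partial final batch still gets its own session.
    let registered_accounts: u128 = 1_000;
    let refreshes_per_session: u128 = 10;
    let sessions_for_full_cycle =
        (registered_accounts + refreshes_per_session - 1) / refreshes_per_session;
    assert_eq!(sessions_for_full_cycle, 100);
}
```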
/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys
/// https://github.com/entropyxyz/entropy-core/issues/510
pub async fn partition_all_keys(
We should be able to have this as a non-async function.
/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys
/// https://github.com/entropyxyz/entropy-core/issues/510
Suggested change (remove):
/// https://github.com/entropyxyz/entropy-core/issues/510
Same thing here, we've got the issue so we're good.
idk man, personally I like to keep these in the code base.
Okay, made a different suggestion in another comment then
let all_keys_length = all_keys.len() as u128;
let usized_refreshed_pre_session = REFRESHES_PRE_SESSION as usize;
There's a lot of type casting going on in this function. Maybe we should try and change the types so they work together a bit better
ya, so the issue here is that I need usizes for the len and the slicing, but I was not able to get Substrate to compile with usizes, so I opted for u128 and converting to usize where I needed it.
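A hedged sketch of the trade-off being discussed: with a u32 constant (as suggested in #513) the casts can stay local to the partition function. The function below is illustrative only and does not mirror the PR's actual partition_all_keys.

```rust
// Illustrative only: compute the slice bounds for one session's batch with a
// u32 constant, converting to usize exactly where `len()` and slicing need it.
const REFRESHES_PER_SESSION: u32 = 10; // per the suggestion in #513

fn batch_bounds(all_keys_len: usize, refreshes_done: u32) -> (usize, usize) {
    // u32 -> usize is lossless on 32- and 64-bit targets.
    let batch = REFRESHES_PER_SESSION as usize;
    if all_keys_len == 0 {
        return (0, 0);
    }
    let start = (refreshes_done as usize * batch) % all_keys_len;
    let end = (start + batch).min(all_keys_len);
    (start, end)
}
```

Slicing with `&all_keys[start..end]` then only involves the one `as usize` cast instead of carrying u128 values through the function.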
#[tokio::test]
async fn test_partition_all_keys() {
    initialize_test_logger();
This isn't strictly necessary since there's no logging in partition_all_keys. I guess there could be in the future though.
ya, I mean it just seemed like the thing to do. idk, your call, should I remove or keep it?
Let's remove it then. I've done that in my u32 PR.
/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys
/// https://github.com/entropyxyz/entropy-core/issues/510
pub async fn partition_all_keys(
How come this is being done by the server instead of on-chain?
To avoid stress being put on the chain. Having a subset of validators know this, instead of every node in the network doing it, seems like a better call.
Can there not be problems if validators don't end up coming to agreement on how the network has been partitioned? That's why I'm thinking it might be better to have the chain be the canonical source of truth here
I'm trying to understand this. I think I don't totally understand the process by which proactive refresh gets activated in the first place. It looks like the propagation pallet sends out an OCW message (entropy-core/pallets/propagation/src/lib.rs, line 142 at 41d605d), and then in this PR those TSS servers ask the staking pallet for the RefreshesDone value by which to decide which accounts should get a proactive refresh this time around. But I'm missing where RefreshesDone gets updated.

Also, could we put that value in the OCW message? The algorithm itself looks great! But from what you said on the dev call I have the impression you're not totally happy with this setup. And I still don't understand what the issue is relating to how rocks/paritydb orders keys. I'd be up for having a call sometime to look at this in more detail.

ya, we can have a call tomorrow. The activating of proactive refresh can be seen here: #511, which includes the incrementing of the refreshed counter. And yes, that can def be sent in the OCW message.
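To summarise the flow described in this thread, here is a heavily hedged outline from the TSS server's side. Every helper below is a local stub invented for illustration, not a real entropy-core function; the actual trigger and counter increment are tracked in #511.

```rust
// Hypothetical outline only; all helpers are stand-in stubs.
async fn fetch_refreshes_done() -> u128 { 0 }                  // stub: read RefreshesDone from the staking pallet
async fn fetch_registered_keys() -> Vec<String> { Vec::new() } // stub: read the registered accounts
fn partition_batch(keys: Vec<String>, _refreshes_done: u128) -> Vec<String> { keys } // stub: the PR's partitioning step
async fn refresh_accounts(_batch: Vec<String>) {}              // stub: run the proactive refresh protocol

// Order of operations described above: the propagation pallet's off-chain
// worker notifies the TSS server, the server reads RefreshesDone, partitions
// the registered keys, and refreshes only that session's batch.
async fn handle_ocw_proactive_refresh() {
    let refreshes_done = fetch_refreshes_done().await;
    let all_keys = fetch_registered_keys().await;
    let batch = partition_batch(all_keys, refreshes_done);
    refresh_accounts(batch).await;
}
```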
/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys
/// https://github.com/entropyxyz/entropy-core/issues/510
Okay, no worries. Let's add a bit more context here then. How about something like:
Suggested change:
/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys.
///
/// See https://github.com/entropyxyz/entropy-core/issues/510 for some issues which exist
/// around the scaling of this function.
#504 (comment) @HCastano sorry, it's outdated so I couldn't respond there, but it's possible only in a DDoS scenario. If that is the case it would have to be fixed in DKG too; this should be a standalone issue.
Approving this, after getting clear on the ordering issues on a call. Looks great!
@@ -86,6 +86,12 @@ pub mod pallet {
        pub x25519_public_key: X25519PublicKey,
        pub endpoint: TssServerURL,
    }
    /// Info that is requiered to do a proactive refresh
    #[derive(Clone, Encode, Decode, Eq, PartialEq, RuntimeDebug, TypeInfo, Default)]
    pub struct RefreshInfo {
This struct is very similar to OcwMessageProactiveRefresh. I'm guessing the reason you didn't use the same type is because of compilation issues with using the RuntimeDebug and TypeInfo traits in entropy-shared. Possibly we could get that to work by fiddling around with feature flags, but I think this is fine as it is.
yes lol this is exactly what happened
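For reference, a rough sketch of the feature-flag fiddling mentioned above. The crate and feature names here (`codec` as the conventional rename of parity-scale-codec, a "std" feature gating Debug, and a hypothetical feature gating TypeInfo) are assumptions about how entropy-shared could be set up, not its actual configuration.

```rust
// Sketch only: gate the runtime-facing derives behind cargo features so one
// shared struct can compile both for the runtime and for the TSS client.
use codec::{Decode, Encode}; // `codec` = the conventional rename of parity-scale-codec

#[derive(Clone, Encode, Decode, Eq, PartialEq, Default)]
#[cfg_attr(feature = "std", derive(Debug))]                    // stands in for RuntimeDebug
#[cfg_attr(feature = "runtime", derive(scale_info::TypeInfo))] // hypothetical feature name
pub struct RefreshInfo {
    pub refreshes_done: u128, // illustrative field; the real struct's fields may differ
}
```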
Creates an algorithm for partitioning the network into chunks for the proactive refresh.