
Partition proactive refresh #504

Merged: JesseAbram merged 11 commits into master from partition-proactive-refresh on Nov 20, 2023

Conversation

Member

@JesseAbram JesseAbram commented Nov 15, 2023

Creates an algorithm for partitioning the network into chunks for the proactive refresh.

  • This allows us to set a constant REFRESHES_PRE_SESSION, and the network will grab a rotating batch of registered keys to refresh
  • The network will "loop around" when all keys have been refreshed, starting the process again
  • This does not take into account that the registered accounts will change over time (new accounts are inserted in sorted order rather than appended onto the registered key struct); will make an issue to handle this: Fix Partition all keys function #510
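The rotating-batch idea described above can be sketched roughly like this (a minimal illustration, not the PR's actual code; names and types are simplified):

```rust
/// A minimal sketch of the rotating-batch partition. `refreshes_done` stands
/// in for the on-chain counter and `all_keys` for the ordered registered keys.
const REFRESHES_PRE_SESSION: usize = 10;

fn partition_all_keys(refreshes_done: usize, all_keys: &[String]) -> Vec<String> {
    if all_keys.is_empty() {
        return Vec::new();
    }
    // Start where the previous session left off, wrapping around the list.
    let start = refreshes_done % all_keys.len();
    all_keys
        .iter()
        .cycle()
        .skip(start)
        .take(REFRESHES_PRE_SESSION.min(all_keys.len()))
        .cloned()
        .collect()
}
```

Because the starting offset is derived from the shared counter, every server that sees the same counter value computes the same batch.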


@JesseAbram JesseAbram force-pushed the partition-proactive-refresh branch from eff91fe to f10877c Compare November 16, 2023 18:51
Collaborator

@HCastano HCastano left a comment

I still need to read through your algorithm code and test a bit. I left some comments and questions for now

@@ -350,6 +356,7 @@ pub mod pallet {
pub fn new_session_handler(
validators: &[<T as pallet_session::Config>::ValidatorId],
) -> Result<(), DispatchError> {
// TODO add back in refresh trigger and refreshed counter https://github.com/entropyxyz/entropy-core/issues/511
Collaborator

Don't think we need this TODO here since we have #511 open.

Suggested change
// TODO add back in refresh trigger and refreshed counter https://github.com/entropyxyz/entropy-core/issues/511

/// Total amount of refreshes done on the network
#[pallet::storage]
#[pallet::getter(fn refreshes_done)]
pub type RefreshesDone<T: Config> = StorageValue<_, u128, ValueQuery>;
Collaborator

So the partition algorithm uses this as a sort of nonce to do its work with? Would it be safe to use a pseudo-random hash or something instead?

Member Author

Yes to the first question. And I want it to go in order, like do the first 10 accounts, then the next 10. If it were pseudorandom, some accounts would get refreshed more times than needed while others wouldn't be refreshed at all.
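To illustrate the fairness argument, here is a toy simulation (not project code): sequential, wrapping batches touch every key exactly once per full pass when the key count is a multiple of the batch size, which pseudo-random selection would not guarantee:

```rust
use std::collections::HashMap;

/// Toy simulation: count how often each key index is refreshed over one
/// full pass of sequential, wrapping batches.
fn simulate_pass(num_keys: usize, batch: usize) -> HashMap<usize, usize> {
    let mut counts: HashMap<usize, usize> = HashMap::new();
    // Ceiling division: enough sessions to cover every key once.
    let sessions = (num_keys + batch - 1) / batch;
    for session in 0..sessions {
        for i in 0..batch {
            let key = (session * batch + i) % num_keys;
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    counts
}
```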

Collaborator

Gotcha thanks.

Is there any risk to validators knowing they were "next up" on being refreshed?

@@ -18,3 +18,6 @@ pub const PRUNE_BLOCK: u32 = 14400;

/// Timeout for validators to wait for other validators to join protocol committees
pub const SETUP_TIMEOUT_SECONDS: u64 = 20;

/// The amount of proactive refreshes we do per session
pub const REFRESHES_PRE_SESSION: u128 = 10;
Collaborator

Suggested change
pub const REFRESHES_PRE_SESSION: u128 = 10;
pub const REFRESHES_PER_SESSION: u32 = 10;

We can get away with using something smaller here for the type, maybe even a u8.

Collaborator

I'm a bit confused by this type. Is this referring to the number of times per session we do the whole proactive refresh algorithm, or the number of validators per session that are split according to the proactive refresh algorithm?

Member Author

This type is u128 so I don't have to do extra conversions; I guess the question is what we optimize for, a smaller type or fewer conversions in the partition function.

It is how many accounts we do a proactive refresh on per session, so if we have 1000 registered accounts we would take them x accounts per session to not overstress our validators.
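As a worked example (illustrative numbers only, with the batching expressed as a hypothetical helper): 1000 registered accounts refreshed 10 at a time take 100 sessions for a full pass:

```rust
/// Illustrative arithmetic only: sessions needed for one full pass over all
/// registered accounts, using ceiling division so a short final batch still
/// counts as a session.
fn sessions_per_full_pass(registered_accounts: u32, refreshes_per_session: u32) -> u32 {
    (registered_accounts + refreshes_per_session - 1) / refreshes_per_session
}
```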

Collaborator

Left #513 as a suggestion for changing things to a u32

/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys
/// https://github.com/entropyxyz/entropy-core/issues/510
pub async fn partition_all_keys(
Collaborator

We should be able to have this as a non-async function


/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys
/// https://github.com/entropyxyz/entropy-core/issues/510
Collaborator

Suggested change
/// https://github.com/entropyxyz/entropy-core/issues/510

Same thing here, we've got the issue so we're good

Member Author

idk man personally I like to keep these in the code base

Collaborator

Okay, made a different suggestion in another comment then

Comment on lines 269 to 270
let all_keys_length = all_keys.len() as u128;
let usized_refreshed_pre_session = REFRESHES_PRE_SESSION as usize;
Collaborator

There's a lot of type casting going on in this function. Maybe we should try and change the types so they work together a bit better

Member Author

ya so the issue here is that I need usizes for the len and the slicing, but I was not able to get Substrate to compile with usizes, so I opted for u128 and converting to usize where needed
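The tension can be seen in a small sketch (a hypothetical helper, not the PR's code): `len()` and slice bounds want `usize`, and a `u32` constant converts to `usize` infallibly on 32- and 64-bit targets, avoiding the `u128` round-trips:

```rust
const REFRESHES_PER_SESSION: u32 = 10; // the reviewer-suggested u32 form

/// Hypothetical helper showing the conversion path: `u32 -> usize` is
/// infallible on 32- and 64-bit targets, so no `u128` casts are needed
/// for the length comparison or the slice bound.
fn first_batch(all_keys: &[String]) -> &[String] {
    let batch = usize::try_from(REFRESHES_PER_SESSION).expect("u32 fits in usize");
    &all_keys[..batch.min(all_keys.len())]
}
```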


#[tokio::test]
async fn test_partition_all_keys() {
initialize_test_logger();
Collaborator

This isn't strictly necessary since there's no logging in partition_all_keys. I guess there could be in the future though

Member Author

ya I mean it just seemed like the thing to do, idk, your call: should I remove or keep it?

Collaborator

Let's remove it then. I've done that in my u32 PR

/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys
/// https://github.com/entropyxyz/entropy-core/issues/510
pub async fn partition_all_keys(
Collaborator

How come this is being done by the server instead of on-chain?

Member Author

To avoid undue stress being put on chain; having a subset of validators do this instead of every node in the network seems like a better call

Collaborator

Can there not be problems if validators don't end up coming to agreement on how the network has been partitioned? That's why I'm thinking it might be better to have the chain be the canonical source of truth here

@ameba23
Contributor

ameba23 commented Nov 16, 2023

I'm trying to understand this - I think I don't totally understand the process by which proactive refresh gets activated in the first place.

It looks like the propagation pallet sends out OcwMessageProactiveRefresh messages to all TSS servers:

pallet_relayer::Pallet::<T>::get_validator_info().unwrap_or_default();

and then in this PR, those TSS servers ask the staking pallet for the RefreshesDone value, by which they decide which accounts should be proactively refreshed this time around. But I'm missing where RefreshesDone gets updated.

Also, could we put RefreshesDone into OcwMessageProactiveRefresh?

The algorithm itself looks great! But from what you said on the dev call I have the impression you're not totally happy with this setup. And I still don't understand what the issue is relating to how rocks/paritydb orders keys.

I'd be up for having a call sometime to look at this in more detail.

@JesseAbram
Member Author

> I'm trying to understand this - I think I don't totally understand the process by which proactive refresh gets activated in the first place.
>
> It looks like the propagation pallet sends out OcwMessageProactiveRefresh messages to all TSS servers:
>
> pallet_relayer::Pallet::<T>::get_validator_info().unwrap_or_default();
>
> and then in this PR, those TSS servers ask the staking pallet for the RefreshesDone value, by which they decide which accounts should be proactively refreshed this time around. But I'm missing where RefreshesDone gets updated.
>
> Also, could we put RefreshesDone into OcwMessageProactiveRefresh?
>
> The algorithm itself looks great! But from what you said on the dev call I have the impression you're not totally happy with this setup. And I still don't understand what the issue is relating to how rocks/paritydb orders keys.
>
> I'd be up for having a call sometime to look at this in more detail.

ya we can have a call tomorrow

the activation of proactive refresh can be seen in #511, which includes the incrementing of the refreshed counter

and yes, that can def be sent in the OCW message
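If the counter were carried in the OCW message, it could look roughly like this (a hypothetical sketch only; the real OcwMessageProactiveRefresh and ValidatorInfo in entropy-shared differ in fields and derives):

```rust
/// Hypothetical sketch only; the real `ValidatorInfo` and
/// `OcwMessageProactiveRefresh` live in entropy-shared and differ in detail.
#[derive(Clone, Debug, PartialEq)]
pub struct ValidatorInfo {
    pub x25519_public_key: [u8; 32],
    pub tss_account: Vec<u8>,
    pub endpoint: Vec<u8>,
}

#[derive(Clone, Debug, PartialEq)]
pub struct OcwMessageProactiveRefresh {
    pub validators_info: Vec<ValidatorInfo>,
    /// Carrying the counter in the message spares TSS servers a storage query.
    pub refreshes_done: u128,
}
```

Bundling the counter into the message would also mean every recipient partitions against the same snapshot of RefreshesDone, rather than racing a separate chain query.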



Comment on lines 262 to 264
/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys
/// https://github.com/entropyxyz/entropy-core/issues/510
Collaborator

Okay no worries. Let's add a bit more context here then. How about something like:

Suggested change
/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys
/// https://github.com/entropyxyz/entropy-core/issues/510
/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys.
///
/// See https://github.com/entropyxyz/entropy-core/issues/510 for some issues which exist
/// around the scaling of this function.

* Change refresh session type to `u32`

* Fix typo

* Remove `Result` return type

* Remove `unwrap`s from test
@JesseAbram
Member Author

#504 (comment) @HCastano sorry, outdated so I couldn't respond, but that's possible only in a DDoS scenario; if that is the case it would have to be fixed in DKG too, so this should be a standalone issue

Contributor

@ameba23 ameba23 left a comment

Approving this, after getting clear on the ordering issues on a call. Looks great!

@@ -86,6 +86,12 @@ pub mod pallet {
pub x25519_public_key: X25519PublicKey,
pub endpoint: TssServerURL,
}
/// Info that is required to do a proactive refresh
#[derive(Clone, Encode, Decode, Eq, PartialEq, RuntimeDebug, TypeInfo, Default)]
pub struct RefreshInfo {
Contributor

This struct is very similar to OcwMessageProactiveRefresh. I'm guessing the reason you didn't use the same type is because of compilation issues with using the RuntimeDebug and TypeInfo traits in entropy-shared. Possibly we could get that to work by fiddling around with feature flags, but I think this is fine as it is.

Member Author

yes lol this is exactly what happened

@JesseAbram JesseAbram merged commit 7b7aa10 into master Nov 20, 2023
5 checks passed
@JesseAbram JesseAbram deleted the partition-proactive-refresh branch November 20, 2023 15:02