Partition proactive refresh #504

Merged · 11 commits · Nov 20, 2023
Changes from 6 commits
Binary file modified crypto/server/entropy_metadata.scale
Binary file not shown.
15 changes: 15 additions & 0 deletions crypto/server/src/helpers/substrate.rs
@@ -6,6 +6,7 @@ use subxt::{

use crate::{
chain_api::{entropy, EntropyConfig},
signing_client::ProtocolErr,
user::UserErr,
};

@@ -150,3 +151,17 @@ pub async fn get_key_visibility(
.ok_or_else(|| UserErr::NotRegistering("Register Onchain first"))?;
Ok(result.key_visibility.0)
}

/// Queries chain for the amount of proactive refreshes done so far
pub async fn get_refreshes_done(
api: &OnlineClient<EntropyConfig>,
rpc: &LegacyRpcMethods<EntropyConfig>,
) -> Result<u128, ProtocolErr> {
let block_hash = rpc
.chain_get_block_hash(None)
.await?
.ok_or_else(|| ProtocolErr::OptionUnwrapError("Error getting block hash".to_string()))?;
let refreshes_done_query = entropy::storage().staking_extension().refreshes_done();
let result = api.storage().at(block_hash).fetch_or_default(&refreshes_done_query).await?;
Ok(result)
}
54 changes: 51 additions & 3 deletions crypto/server/src/signing_client/api.rs
@@ -16,7 +16,9 @@ use entropy_protocol::{
};
use parity_scale_codec::Encode;

use entropy_shared::{KeyVisibility, OcwMessageProactiveRefresh, SETUP_TIMEOUT_SECONDS};
use entropy_shared::{
KeyVisibility, OcwMessageProactiveRefresh, REFRESHES_PRE_SESSION, SETUP_TIMEOUT_SECONDS,
};
use kvdb::kv_manager::{
helpers::{deserialize, serialize as key_serialize},
KvManager,
@@ -34,7 +36,9 @@ use crate::{
chain_api::{entropy, get_api, get_rpc, EntropyConfig},
helpers::{
launch::LATEST_BLOCK_NUMBER_PROACTIVE_REFRESH,
substrate::{get_key_visibility, get_subgroup, return_all_addresses_of_subgroup},
substrate::{
get_key_visibility, get_refreshes_done, get_subgroup, return_all_addresses_of_subgroup,
},
user::{check_in_registration_group, send_key},
validator::{get_signer, get_subxt_signer},
},
@@ -67,6 +71,8 @@ pub async fn proactive_refresh(
// TODO batch the network keys into smaller groups per session
let all_keys =
get_all_keys(&api, &rpc).await.map_err(|e| ProtocolErr::ValidatorErr(e.to_string()))?;
let refreshes_done = get_refreshes_done(&api, &rpc).await?;
let proactive_refresh_keys = partition_all_keys(refreshes_done, all_keys).await?;
let (subgroup, stash_address) = get_subgroup(&api, &rpc, &signer)
.await
.map_err(|e| ProtocolErr::UserError(e.to_string()))?;
@@ -77,7 +83,7 @@
let subxt_signer = get_subxt_signer(&app_state.kv_store)
.await
.map_err(|e| ProtocolErr::UserError(e.to_string()))?;
for key in all_keys {
for key in proactive_refresh_keys {
let sig_request_address = AccountId32::from_str(&key).map_err(ProtocolErr::StringError)?;
let key_visibility =
get_key_visibility(&api, &rpc, &sig_request_address.clone().into()).await.unwrap();
@@ -252,3 +258,45 @@ pub async fn validate_proactive_refresh(
kv_manager.kv().put(reservation, latest_block_number.to_be_bytes().to_vec()).await?;
Ok(())
}

/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
/// Currently rotates between a moving batch of all keys
/// https://github.com/entropyxyz/entropy-core/issues/510
Collaborator

Suggested change:
-/// https://github.com/entropyxyz/entropy-core/issues/510

Same thing here, we've got the issue so we're good.

Member Author

idk man, personally I like to keep these in the code base.

Collaborator

Okay, made a different suggestion in another comment then.

Collaborator

Okay, no worries. Let's add a bit more context here then. How about something like:

Suggested change:
-/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
-/// Currently rotates between a moving batch of all keys
-/// https://github.com/entropyxyz/entropy-core/issues/510
+/// Partitions all registered keys into a subset of the network (REFRESHES_PRE_SESSION)
+/// Currently rotates between a moving batch of all keys.
+///
+/// See https://github.com/entropyxyz/entropy-core/issues/510 for some issues which exist
+/// around the scaling of this function.

pub async fn partition_all_keys(
Collaborator

We should be able to have this as a non-async function.

Collaborator

How come this is being done by the server instead of on-chain?

Member Author

To avoid undue stress on-chain; having a subset of validators do this instead of every node in the network seems like a better call.

Collaborator

Can there not be problems if validators don't end up coming to agreement on how the network has been partitioned? That's why I'm thinking it might be better to have the chain be the canonical source of truth here.

refreshes_done: u128,
all_keys: Vec<String>,
) -> Result<Vec<String>, ProtocolErr> {
let all_keys_length = all_keys.len() as u128;
let usized_refreshed_pre_session = REFRESHES_PRE_SESSION as usize;
Collaborator

There's a lot of type casting going on in this function. Maybe we should try and change the types so they work together a bit better.

Member Author

Ya, so the issue here is that I need usizes for the len and the slicing, but I was not able to get Substrate to compile with usizes, so I opted for u128 and converting to usize when I needed to.

// just return all keys no need to partition network
if REFRESHES_PRE_SESSION > all_keys_length {
return Ok(all_keys);
}
let mut refresh_keys: Vec<String> = vec![];
// handles early on refreshes before refreshes done > all keys
if refreshes_done + REFRESHES_PRE_SESSION <= all_keys_length {
refresh_keys = all_keys
[refreshes_done as usize..refreshes_done as usize + usized_refreshed_pre_session]
.to_vec();
}
// normalize refreshes done down to a partition of the network
let normalized_refreshes_done = refreshes_done % all_keys_length;
let normalized_refreshes_done_as_usize = normalized_refreshes_done as usize;

if normalized_refreshes_done + REFRESHES_PRE_SESSION <= all_keys_length {
refresh_keys = all_keys[normalized_refreshes_done_as_usize
..normalized_refreshes_done_as_usize + usized_refreshed_pre_session]
.to_vec();
}
// handles if number does not perfectly fit
// loops around the partition adding the beginning of the network to the end
if normalized_refreshes_done + REFRESHES_PRE_SESSION > all_keys_length {
let leftover =
usized_refreshed_pre_session - (all_keys.len() - normalized_refreshes_done_as_usize);
refresh_keys = all_keys[normalized_refreshes_done_as_usize..all_keys.len()].to_vec();
let mut post_turnaround_keys = all_keys[0..leftover].to_vec();
refresh_keys.append(&mut post_turnaround_keys);
}

Ok(refresh_keys)
}
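
For reference, a minimal non-async sketch of the same moving-batch rotation (the function name here is hypothetical and this is not the code the PR merges): because the result depends only on refreshes_done and all_keys, both read from chain state, validators with the same chain view compute the same batch.

// Hypothetical sketch, not part of this PR: the wrap-around expressed with
// iterator adapters instead of manual slice arithmetic.
pub fn partition_all_keys_sketch(refreshes_done: u128, all_keys: Vec<String>) -> Vec<String> {
    let batch_size = REFRESHES_PRE_SESSION as usize;
    // No need to partition if a single batch already covers every key
    if all_keys.is_empty() || batch_size >= all_keys.len() {
        return all_keys;
    }
    // Normalize the counter down to an offset inside the key list
    let start = (refreshes_done % all_keys.len() as u128) as usize;
    all_keys
        .iter()
        .cycle() // wrap around the end of the key list
        .skip(start) // begin where the previous session left off
        .take(batch_size) // one batch per session
        .cloned()
        .collect()
}
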
27 changes: 26 additions & 1 deletion crypto/server/src/signing_client/tests.rs
@@ -1,4 +1,4 @@
use super::api::validate_proactive_refresh;
use super::api::{partition_all_keys, validate_proactive_refresh};
use crate::{
chain_api::{get_api, get_rpc},
helpers::{
@@ -196,3 +196,28 @@ async fn test_proactive_refresh_validation_fail() {
assert_eq!(err_stale_data, Err("Data is repeated".to_string()));
clean_tests();
}

#[tokio::test]
async fn test_partition_all_keys() {
initialize_test_logger();
Collaborator

This isn't strictly necessary since there's no logging in partition_all_keys. I guess there could be in the future though.

Member Author

Ya, I mean it just seemed like the thing to do. idk, your call: should I remove or keep?

Collaborator

Let's remove it then. I've done that in my u32 PR.

let all_keys: Vec<String> = (1..=25).map(|num| num.to_string()).collect();

let result_normal_10 = partition_all_keys(2u128, all_keys.clone()).await.unwrap();
assert_eq!(result_normal_10, all_keys[2..12].to_vec());

let result_next_set = partition_all_keys(12u128, all_keys.clone()).await.unwrap();
assert_eq!(result_next_set, all_keys[12..22].to_vec());

let wrap_around_partial = partition_all_keys(23u128, all_keys.clone()).await.unwrap();
let mut wrap_around_partial_vec = all_keys[23..25].to_vec();
wrap_around_partial_vec.append(&mut all_keys[0..8].to_vec());
assert_eq!(wrap_around_partial, wrap_around_partial_vec);

let result_larger = partition_all_keys(32u128, all_keys.clone()).await.unwrap();
assert_eq!(result_larger, all_keys[7..17].to_vec());

let wrap_around_partial_larger = partition_all_keys(42u128, all_keys.clone()).await.unwrap();
let mut wrap_around_partial_larger_vec = all_keys[17..25].to_vec();
wrap_around_partial_larger_vec.append(&mut all_keys[0..2].to_vec());
assert_eq!(wrap_around_partial_larger, wrap_around_partial_larger_vec);
}
3 changes: 3 additions & 0 deletions crypto/shared/src/lib.rs
@@ -18,3 +18,6 @@ pub const PRUNE_BLOCK: u32 = 14400;

/// Timeout for validators to wait for other validators to join protocol committees
pub const SETUP_TIMEOUT_SECONDS: u64 = 20;

/// The amount of proactive refreshes we do per session
pub const REFRESHES_PRE_SESSION: u128 = 10;
Collaborator

Suggested change:
-pub const REFRESHES_PRE_SESSION: u128 = 10;
+pub const REFRESHES_PER_SESSION: u32 = 10;

We can get away with using something smaller here for the type, maybe even a u8.

Collaborator

I'm a bit confused by this type. Is this referring to the number of times per session we do the whole proactive refresh algorithm, or the number of validators per session that are split according to the proactive refresh algorithm?

Member Author

This type is u128 so I don't have to do extra conversions; I guess the question is what we optimize for, a smaller type or fewer conversions in the partition function.

It is how many accounts we do a proactive refresh on per session, so if we have 1000 registered accounts we would take them in x amount of accounts per session to not overstress our validators.

Collaborator

Left #513 as a suggestion for changing things to a u32.
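
To make the reviewer's point concrete, a small sketch under the naming suggested above (REFRESHES_PER_SESSION as a u32 is the #513 proposal, not what this PR merges): a narrower type still comfortably covers the batch size, and since the constant means "accounts refreshed per session", full coverage of N registered accounts takes roughly N / 10 sessions.

// Hypothetical u32 constant per the suggestion tracked in #513.
pub const REFRESHES_PER_SESSION: u32 = 10;

// With 1000 registered accounts and 10 refreshes per session, every account is
// refreshed once after 100 sessions before any account is refreshed again.
fn sessions_to_cover_all_accounts(total_accounts: u32) -> u32 {
    // Ceiling division so a partial final batch still counts as a session
    (total_accounts + REFRESHES_PER_SESSION - 1) / REFRESHES_PER_SESSION
}
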

7 changes: 7 additions & 0 deletions pallets/staking/src/lib.rs
@@ -140,10 +140,16 @@ pub mod pallet {
ValueQuery,
>;

/// A trigger for the proactive refresh OCW
#[pallet::storage]
#[pallet::getter(fn proactive_refresh)]
pub type ProactiveRefresh<T: Config> = StorageValue<_, Vec<ValidatorInfo>, ValueQuery>;

/// Total amount of refreshes done on the network
#[pallet::storage]
#[pallet::getter(fn refreshes_done)]
pub type RefreshesDone<T: Config> = StorageValue<_, u128, ValueQuery>;
Collaborator

So the partition algorithm uses this as a sort of nonce to do its work with? Would it be safe to use a pseudo-random hash or something instead?

Member Author

Yes, to the first question. And I want it to go in order, like do the first 10 accounts, then the next 10; if it were pseudorandom we would have some accounts refreshed more times than needed while others are not refreshed at all.

Collaborator

Gotcha, thanks.

Is there any risk to validators knowing they were "next up" on being refreshed?


#[pallet::genesis_config]
#[derive(DefaultNoBound)]
pub struct GenesisConfig<T: Config> {
@@ -350,6 +356,7 @@ pub mod pallet {
pub fn new_session_handler(
validators: &[<T as pallet_session::Config>::ValidatorId],
) -> Result<(), DispatchError> {
// TODO add back in refresh trigger and refreshed counter https://github.com/entropyxyz/entropy-core/issues/511
Collaborator

Don't think we need this TODO here since we have #511 open.

Suggested change:
-// TODO add back in refresh trigger and refreshed counter https://github.com/entropyxyz/entropy-core/issues/511
// Init a 2D Vec where indices and values represent subgroups and validators,
// respectively.
let mut new_validators_set: Vec<Vec<<T as pallet_session::Config>::ValidatorId>> =
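
For context on the TODO above, a plain-Rust sketch (hypothetical, not pallet code and not taken from issue #511) of the bookkeeping that re-enabling the counter would involve: each new session advances RefreshesDone by one batch, and that counter is exactly the offset the server's partition_all_keys reads back.

// Hypothetical sketch of the per-session counter bump; saturating_add mirrors the
// usual on-chain habit of avoiding arithmetic panics.
fn advance_refreshes_done(refreshes_done: u128, refreshes_per_session: u128) -> u128 {
    refreshes_done.saturating_add(refreshes_per_session)
}

fn main() {
    // With 25 registered keys and 10 refreshes per session, three sessions move the
    // counter to 30, and the server's modulo by 25 wraps the next batch back to key 5.
    let mut done = 0u128;
    for _ in 0..3 {
        done = advance_refreshes_done(done, 10);
    }
    assert_eq!(done, 30);
    assert_eq!(done % 25, 5);
}
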