Integration test for validator joining network #493

Merged
JesseAbram merged 12 commits into master from integration-test-validator-joining on Nov 13, 2023

Conversation

@JesseAbram (Member) commented Nov 10, 2023

Adds an integration test for joining and syncing a validator.


@ameba23 (Contributor) left a comment

The test looks great.

A lot of my comments relate to the sync_validator function, which isn't really relevant to this PR since that code was already there; this PR just moves it.

One other thing while I am looking at this: get_and_store_values will fail if it encounters a keyshare which is already in the db, as reserve_key will return an error. So sync_validator can only be run once, when starting for the first time. If for whatever reason it fails to complete and we get some keyshares but not all, we can't run it again. For this case it might be better to ignore the error from reserve_key and keep getting the other keys in case there are some we don't have. I can make this into an issue.
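A rough sketch of the tolerant loop being suggested; store_keyshare and StoreError are hypothetical stand-ins for the existing fetch/reserve logic and its error type, not code from this repo:

// Sketch: skip keyshares we already hold instead of aborting the whole sync.
enum StoreError {
    AlreadyExists,
    Other(String),
}

async fn store_all_keyshares(shares: Vec<(String, Vec<u8>)>) -> Result<(), StoreError> {
    for (key, value) in shares {
        match store_keyshare(&key, &value).await {
            Ok(()) => {}
            // An earlier partial sync may already have reserved this key; keep going.
            Err(StoreError::AlreadyExists) => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}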

pub async fn setup_mnemonic(kv: &KvManager, is_alice: bool, is_bob: bool) -> Result<(), KvError> {
pub async fn setup_mnemonic(
kv: &KvManager,
is_alice: bool,
Contributor

An enum would be neater, but I'm happy with leaving it like this.

// if not in subgroup retry until you are
let mut my_subgroup = get_subgroup(&api, &rpc, &signer).await;
while my_subgroup.is_err() {
println!("you are not currently a validator, retrying");
Contributor

Replace with log::warn, or are we happy with printlns in server?

Member Author

I'm OK with println in server unless others are not, but currently we only do logging in the substrate portions. Gonna leave it here, but if you feel strongly, create an issue and a card and we can overhaul all printlns in server.

let health = rpc.system_health().await.expect("Issue checking chain health");
is_syncing = health.is_syncing;
if is_syncing {
println!("chain syncing, retrying {is_syncing:?}");
Contributor

use log::info or log::debug?

}
}
// TODO: find a proper batch size
let batch_size = 10;
Contributor

Probably not relevant to this PR as I think this was already here, but could this be a constant rather than needing to be passed between functions?
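For illustration, the constant could look something like this (the name and placement are made up; the value is just the current placeholder):

/// How many keyshares to request per batch while syncing; placeholder value.
pub const KEY_SYNC_BATCH_SIZE: usize = 10;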

String::from_utf8(key_server_info.endpoint).expect("failed to parse IP address.");
let recip_key = x25519_dalek::PublicKey::from(key_server_info.x25519_public_key);
let all_keys = get_all_keys(&api, &rpc).await.expect("failed to get all keys.");
let _ = get_and_store_values(
Contributor

Just to check this does what you intended: get_and_store_values returns a Result, so if there is an error we ignore it and go ahead with tell_chain_syncing_is_done.

Member Author

This handles a failure in reserving a key in the kvdb, but does it in a bad way; I'll add an existence check here to fix it.
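A rough idea of that existence check (the exists call on the key-value store is assumed here and may need adjusting to the real kvdb API):

for key in &all_keys {
    // Skip keys already held from a previous (partial) sync; `exists` is an assumed helper.
    if kv_store.kv().exists(key).await.unwrap_or(false) {
        continue;
    }
    // fetch and reserve the keyshare as before
}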

dev: bool,
endpoint: &str,
kv_store: &KvManager,
) -> Result<(), ()> {
Contributor

Seems strange to have a Result that never errors. I guess since this is called when launching the server we are happy with panicking if something goes wrong. But I think I'd prefer to make this function return an error, and call it with expect from main.rs.

anyhow is quite good for this - you can also make a main function return an anyhow::Result and it will display the error nicely and give a non-zero process exit code.

But for now I think panicking is fine.
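For reference, the anyhow pattern described above looks roughly like this (setup_and_sync is an illustrative stand-in for the server's launch routine, not a function from this repo):

use anyhow::Context;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // On error, anyhow prints the error chain and the process exits non-zero.
    setup_and_sync().await.context("failed to launch server")?;
    Ok(())
}

async fn setup_and_sync() -> anyhow::Result<()> {
    // launch and sync logic would go here
    Ok(())
}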

Member Author

Ya, will remove the Result and leave the panics. I mean, we can remove them too, but I don't know if it's worth it; we can talk about that in another PR.

}
// if not in subgroup retry until you are
let mut my_subgroup = get_subgroup(&api, &rpc, &signer).await;
while my_subgroup.is_err() {
Contributor

should we have a maximum number of tries?

@JesseAbram (Member Author) commented Nov 13, 2023

No, in this case this is NPoS stuff: you can set up your validator while you are trying to get into the validator rotation and just keep it running.

Member Author

As in, a validator can keep this running as long as they are willing to pay the compute cost while they try to get into the validator set.

let mut is_syncing = true;
let sleep_time = Duration::from_secs(20);
// wait for chain to be fully synced before starting key swap
while is_syncing {
Contributor

Add a maximum number of tries? When we are deploying a lot of these, it might be better to assume no one is reading the logs to see why it's just hanging.

Member Author

No, I mean in that case it is just a validator running on its own machine. Say a year from now the chain could get long and it could take time to sync; there is a linear relation between how long this will take and how long the chain is.

@JesseAbram (Member Author), replying to the comment above about get_and_store_values failing on keyshares already in the db:

Agreed, I'm going to make an issue and a card for this.

Comment on lines 116 to 118
is_alice: bool,
is_bob: bool,
is_charlie: bool,
Collaborator

What if we took an enum here instead and called the function a few times from the respective tests?

E.g

enum Keyring {
    Alice,
    Bob,
    // ...
}

pub async fn setup_mnemonic(kv: &KvManager, account: Keyring) { /* ... */ }

#[tokio::test]
async fn example() {
    setup_mnemonic(&kv, Keyring::Alice).await;
    setup_mnemonic(&kv, Keyring::Bob).await;
}

crypto/server/src/validator/api.rs (two outdated review threads, resolved)
crypto/testing-utils/src/substrate_context.rs (outdated review thread, resolved)
let rpc = get_rpc(&ctx.ws_url).await.unwrap();
let values = vec![vec![10], vec![11], vec![12]];
let keys = vec![
"5CiPPseXPECbkjWCa6MnjNokrgYjMqmKndv2rSnekmSK2DjL".to_string(),
Collaborator

Would be nice to have these as part of a keyring enum or sets of consts
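One possible shape for the consts version, reusing the address already in the test (the names here are illustrative only, not existing code):

/// Well-known test account IDs shared across the sync tests.
pub const TEST_ACCOUNT_IDS: &[&str] = &[
    "5CiPPseXPECbkjWCa6MnjNokrgYjMqmKndv2rSnekmSK2DjL",
    // ...the other test accounts used above...
];

fn test_keys() -> Vec<String> {
    TEST_ACCOUNT_IDS.iter().map(|k| k.to_string()).collect()
}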

Comment on lines 48 to 51
sync: bool,
dev: bool,
endpoint: &str,
kv_store: &KvManager,
Collaborator

We can also move these into a SyncConfig struct or something to make it easier to understand what's going on from the call site
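Sketched out, the struct might look like this (name and field set taken from the comment and the current signature; not existing code):

/// Bundles the flags currently passed individually to the launch/sync routine.
pub struct SyncConfig {
    pub sync: bool,
    pub dev: bool,
    pub endpoint: String,
    // The KvManager handle could stay a separate parameter or live behind an Arc here.
}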

// if not in subgroup retry until you are
let mut my_subgroup = get_subgroup(&api, &rpc, &signer).await;
while my_subgroup.is_err() {
println!("you are not currently a validator, retrying");
@HCastano (Collaborator) commented Nov 13, 2023

Do we use any sort of logging library yet (e.g. log)? If not, we should make an issue to use that instead of println!s.

Member Author

We do not. I feel like we continued this in Discord, IIRC.
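For reference, switching the printlns over would look roughly like this with the log crate plus env_logger as a backend (crate choice is illustrative; tracing would work just as well):

fn main() {
    // Initialise a logger backend once at startup.
    env_logger::init();
    log::warn!("you are not currently a validator, retrying");
    log::info!("chain syncing, retrying");
}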

JesseAbram and others added 5 commits November 13, 2023 15:51
Co-authored-by: Hernando Castano <HCastano@users.noreply.github.com>
Co-authored-by: Hernando Castano <HCastano@users.noreply.github.com>
Co-authored-by: Hernando Castano <HCastano@users.noreply.github.com>
JesseAbram merged commit 46a3602 into master on Nov 13, 2023
5 checks passed
JesseAbram deleted the integration-test-validator-joining branch on November 13, 2023, 22:16
ameba23 added a commit that referenced this pull request Nov 14, 2023
* master:
  Integration test for validator joining network (#493)
  Add `ENTROPY_NODE` environment variable (#497)
  Change `SocketAddr` type for `String` (#496)
  Allow big protocol messages (#495)
  Add test for `EncryptedConnection` (#494)
  Use bincode rather than JSON for protocol and subscribe messages (#492)
  Remark on possible need to address TSS server from the Docker host. (#490)
  no proactive refresh on private key visibility (#485)
  Auxiliary data for program evaluation (#475)
  Validate proactive refresh endpoint (#483)