refactor: Make validator_signer mutable #11400
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #11400 +/- ##
==========================================
+ Coverage 71.57% 71.64% +0.06%
==========================================
Files 787 787
Lines 160728 160969 +241
Branches 160728 160969 +241
==========================================
+ Hits 115039 115319 +280
+ Misses 40648 40614 -34
+ Partials 5041 5036 -5
@@ -291,6 +299,7 @@ impl super::NetworkState {
             }
         }
         if let Some(vc) = validator_cfg {
+            let validator_signer = if let Some(v) = vc.signer.get() { v } else { return };
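For readers unfamiliar with the pattern in the hunk above, here is a minimal, self-contained sketch of how a mutable signer slot of this kind could behave. All names (`MutableSignerSlot`, `set`, and `String` standing in for `ValidatorSigner`) are hypothetical stand-ins, not the actual nearcore types.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the real MutableConfigValue<Option<Arc<ValidatorSigner>>>;
// names and fields are illustrative only.
#[derive(Clone)]
struct MutableSignerSlot {
    inner: Arc<Mutex<Option<Arc<String>>>>, // String stands in for ValidatorSigner
}

impl MutableSignerSlot {
    fn new(signer: Option<Arc<String>>) -> Self {
        Self { inner: Arc::new(Mutex::new(signer)) }
    }

    // Returns a snapshot of the current value; callers must not cache it across epochs.
    fn get(&self) -> Option<Arc<String>> {
        self.inner.lock().unwrap().clone()
    }

    // Hot swap: replace the signer without restarting the node.
    fn set(&self, signer: Option<Arc<String>>) {
        *self.inner.lock().unwrap() = signer;
    }
}

fn tier1_broadcast(slot: &MutableSignerSlot) {
    // Mirrors the early return in the diff: bail out if the node currently has no validator key.
    let validator_signer = if let Some(v) = slot.get() { v } else { return };
    println!("advertising as {validator_signer}");
}

fn main() {
    let slot = MutableSignerSlot::new(None);
    tier1_broadcast(&slot); // no-op: not a validator yet
    slot.set(Some(Arc::new("node0".to_string())));
    tier1_broadcast(&slot); // now advertises
}
```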
Question about tier1 - when the node becomes a validator it should join tier1. Would that happen automatically or do we need to handle it?
We're getting closer :)
@@ -539,7 +540,7 @@ impl Chain {
             .iter()
             .filter(|shard_uid| {
                 shard_tracker.care_about_shard(
-                    validator_account_id,
+                    validator.get().map(|v| v.validator_id().clone()).as_ref(),
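As a side note on the new call site: `get()` returns a temporary snapshot, so the validator id has to be cloned out before a reference can be passed on. A minimal sketch of that dance, with `String` standing in for `AccountId` and a hypothetical `get_snapshot()` in place of the real `MutableConfigValue::get()`:

```rust
use std::sync::Arc;

// String stands in for AccountId; the struct is a hypothetical stand-in for ValidatorSigner.
struct Signer { account_id: String }

impl Signer {
    fn validator_id(&self) -> &String { &self.account_id }
}

// Hypothetical stand-in for MutableConfigValue::get(): returns a snapshot.
fn get_snapshot() -> Option<Arc<Signer>> {
    Some(Arc::new(Signer { account_id: "node0".to_string() }))
}

// The callee (in the spirit of ShardTracker::care_about_shard) expects Option<&AccountId>.
fn care_about_shard(account_id: Option<&String>) -> bool {
    account_id.is_some()
}

fn main() {
    // `get_snapshot()` yields a temporary Arc, so we clone the id out of it first;
    // `.as_ref()` then turns the owned Option<String> into the Option<&String> the callee wants.
    let id = get_snapshot().map(|v| v.validator_id().clone());
    let tracked = care_about_shard(id.as_ref());
    println!("tracked: {tracked}");
}
```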
Can you remind me what the plan is regarding loading memtries when using the hot swap?
This is the place where we load memtries on startup. We do not have shadow tracking implemented yet, so currently the new node would have to load memtries for all shards.
            // Chunk producers should not receive state witness from themselves.
            log_assert!(
                signer.is_some(),
                "Received a chunk state witness but this is not a validator node. Witness={:?}",
                witness
            );
            let signer = signer.unwrap();
not your change but this is iffy
- The comment doesn't match the code.
- Can an attacker send state witness to an RPC node and crash it?
Best to handle it outside of this PR though.
cc @pugachAG
> Can an attacker send state witness to an RPC node and crash it?

I don't think so; this code should only be executed after the witness was validated as part of the partial witness actor, so this is safe.
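For illustration only, one possible non-panicking shape for that spot (a sketch under the assumption that `signer` is an `Option` and the witness type is `Debug`; not something this PR changes):

```rust
use std::sync::Arc;

// Hypothetical minimal stand-ins so the sketch compiles on its own.
#[derive(Debug)]
struct ChunkStateWitness;
struct ValidatorSigner;

// Sketch of a non-panicking variant: instead of `log_assert!` + `unwrap()`,
// log and drop the witness when the node has no validator signer.
fn process_witness(signer: Option<Arc<ValidatorSigner>>, witness: ChunkStateWitness) {
    let _signer = match signer {
        Some(signer) => signer,
        None => {
            // An RPC node receiving a witness is unexpected, but prefer logging over crashing.
            eprintln!("Dropping chunk state witness on non-validator node: {witness:?}");
            return;
        }
    };
    // ... continue processing with `_signer` ...
}

fn main() {
    process_witness(None, ChunkStateWitness); // logs and returns instead of panicking
}
```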
-                .validator
-                .as_ref()
-                .map(|validator| validator.account_id()),
+                near_config.network_config.validator.account_id(),
Should this be mutable as well?
I've added a TODO here
nearcore/chain/network/src/peer/peer_actor.rs
Lines 310 to 311 in b053500
// TODO(validator-key-hot-swap) Consider using mutable validator signer instead of PeerInfo.account_id ?
// That likely requires bigger changes and account_id here is later used for debug / logging purposes only.
This account_id is consumed here: https://github.com/near/nearcore/blob/master/core/o11y/src/opentelemetry.rs#L45-L49
to create a name for the opentelemetry channel. Changing it would require bigger changes, and changing the name would mean logs from a single node are no longer visible together.
LGTM, thanks, that was quite a bit of work!
Non-validator logs are spammed with:
See #11400 (comment). It seems non-validator nodes still call `process_ready_orphan_witnesses_and_clean_old`, and logging an error in that case is spammy.
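To make the split concrete, here is a rough sketch of the idea with hypothetical names and simplified types; the point is that the cleanup half needs no signer, so non-validator nodes can call it without hitting the error path:

```rust
use std::collections::VecDeque;

// Hypothetical sketch of splitting the combined handler so that non-validator nodes
// can run the cleanup half without triggering the "no signer" error path.
struct Witness { height: u64 }

fn process_ready_orphan_witnesses(ready: &mut VecDeque<Witness>, signer_present: bool) {
    // Only meaningful on validator nodes; callers skip this when there is no signer.
    if !signer_present {
        return;
    }
    while let Some(w) = ready.pop_front() {
        println!("processing orphan witness at height {}", w.height);
    }
}

fn clean_old_witnesses(pool: &mut VecDeque<Witness>, final_height: u64) {
    // Safe on every node: just drop witnesses that can no longer become relevant.
    pool.retain(|w| w.height > final_height);
}

fn main() {
    let mut pool = VecDeque::from([Witness { height: 5 }, Witness { height: 11 }]);
    clean_old_witnesses(&mut pool, 10);
    process_ready_orphan_witnesses(&mut pool, false); // non-validator: no-op, no error spam
}
```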
Issue: #11264

This PR builds on:
* #11372
* #11400

and contains the actual changes and test for validator key hot swap.

### Summary

- Extend `UpdateableConfig` with the validator key.
- Update the client's mutable validator key when we detect a change in the updateable config (see the sketch after this description).
- Advertise our new validator key through `advertise_tier1_proxies()`.
- Add an integration test for the new behaviour:
  - We start with 2 validating nodes (`node0`, `node1`) and 1 non-validating node (`node2`). It is important that the non-validating node tracks all shards, because we do not know which shard it will track when we switch validator keys.
  - We copy the validator key from `node0` to `node2`.
  - We stop `node0`, then we trigger a validator key reload for `node2`.
  - Now `node2` is a validator, but it appears as `node0` because it copied the validator key from `node0`.
  - We wait for a couple of epochs and require that both remaining nodes progress the chain. Both nodes should be synchronised after a few epochs.

Test with:
```
cargo build -pneard --features test_features,rosetta_rpc && cargo build -pgenesis-populate -prestaked -pnear-test-contracts && python3 pytest/tests/sanity/validator_switch_key_quick.py
```

#### Extra changes:
- Use the `MutableValidatorSigner` alias instead of `MutableConfigValue<Option<Arc<ValidatorSigner>>>`.
- Return `ConfigUpdaterResult` from the config updater.
- Remove (de)serialization derives for `UpdateableConfigs`.

Co-authored-by: Your Name <you@example.com>
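To sketch the hot-swap flow described above: a rough, self-contained illustration of how an updateable-config step might detect a changed validator key and swap the signer in place. `ValidatorKey`, `MutableSigner`, and `maybe_hot_swap` are hypothetical names, not the actual nearcore API (which uses `MutableConfigValue` / `UpdateableConfigs`).

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins; the real code uses MutableConfigValue / UpdateableConfigs.
#[derive(Clone, PartialEq, Debug)]
struct ValidatorKey(String);

#[derive(Clone)]
struct MutableSigner(Arc<Mutex<Option<ValidatorKey>>>);

impl MutableSigner {
    fn get(&self) -> Option<ValidatorKey> { self.0.lock().unwrap().clone() }
    fn update(&self, new: Option<ValidatorKey>) { *self.0.lock().unwrap() = new; }
}

// Sketch of the config-updater step: reload the key from disk and, if it changed,
// swap it into the shared slot so client/network components pick it up on the next get().
fn maybe_hot_swap(signer: &MutableSigner, reloaded_from_disk: Option<ValidatorKey>) -> bool {
    if signer.get() != reloaded_from_disk {
        signer.update(reloaded_from_disk);
        // The real flow would also re-advertise TIER1 proxies here.
        return true;
    }
    false
}

fn main() {
    let signer = MutableSigner(Arc::new(Mutex::new(None)));
    // Operator copies node0's validator key onto this node and triggers a reload.
    let swapped = maybe_hot_swap(&signer, Some(ValidatorKey("node0".into())));
    println!("swapped: {swapped}, now: {:?}", signer.get());
}
```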
Issue: #11264. This is a follow-up to #11372. The actual changes (+test) for #11264 will be done in a third, final PR: #11536. This PR should mostly be a no-op. It focuses on propagating `MutableConfigValue` for `validator_signer` everywhere. All instances of mutable `validator_signer` are synchronized. In case only the validator_id is needed, we propagate `validator_signer` anyway as it contains the current validator info. - Remove signer as a field and pass it to methods instead: `Doomslug`, `InfoHelper`, `ChunkValidator`. - Make some public methods internal where they do not need to be public. - Split `process_ready_orphan_witnesses_and_clean_old` into two functions. - Removed `block_production_started` from `ClientActorInner`. - Add `FrozenValidatorConfig` to make it possible to return a snapshot of `ValidatorConfig`. --------- Co-authored-by: Your Name <you@example.com>
Issue: #11264
This is a follow-up to #11372.
The actual changes (+test) for #11264 will be done in a third, final PR: #11536.
Summary
This PR should mostly be a no-op. It focuses on propagating `MutableConfigValue` for `validator_signer` everywhere. All instances of mutable `validator_signer` are synchronized. In case only the validator_id is needed, we propagate `validator_signer` anyway as it contains the current validator info.

Extra changes

- Remove signer as a field and pass it to methods instead: `Doomslug`, `InfoHelper`, `ChunkValidator`.
- Make some public methods internal where they do not need to be public.
- Split `process_ready_orphan_witnesses_and_clean_old` into two functions.
- Removed `block_production_started` from `ClientActorInner`.
- Add `FrozenValidatorConfig` to make it possible to return a snapshot of `ValidatorConfig` (see the sketch below).
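As a rough illustration of the `FrozenValidatorConfig` idea (field names here are hypothetical; the real types live in nearcore), a "frozen" snapshot type lets callers read a consistent view without holding the lock behind the mutable signer:

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for nearcore's ValidatorConfig / FrozenValidatorConfig.
struct ValidatorSigner { account_id: String }

struct ValidatorConfig {
    // Mutable at runtime (hot swap); other fields stay fixed after startup.
    signer: Arc<Mutex<Option<Arc<ValidatorSigner>>>>,
    fixed_setting: u64,
}

// A "frozen" view: plain values captured at one point in time, safe to hand out
// to debug pages or tests without exposing the lock.
struct FrozenValidatorConfig {
    signer: Option<Arc<ValidatorSigner>>,
    fixed_setting: u64,
}

impl ValidatorConfig {
    fn frozen(&self) -> FrozenValidatorConfig {
        FrozenValidatorConfig {
            signer: self.signer.lock().unwrap().clone(),
            fixed_setting: self.fixed_setting,
        }
    }
}

fn main() {
    let cfg = ValidatorConfig {
        signer: Arc::new(Mutex::new(Some(Arc::new(ValidatorSigner { account_id: "node0".into() })))),
        fixed_setting: 42,
    };
    let snap = cfg.frozen();
    println!("{} {:?}", snap.fixed_setting, snap.signer.map(|s| s.account_id.clone()));
}
```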