Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Make validator_signer mutable #11400

Merged
merged 42 commits into from
Jun 21, 2024
Merged

refactor: Make validator_signer mutable #11400

merged 42 commits into from
Jun 21, 2024

Conversation

staffik
Copy link
Contributor

@staffik staffik commented May 27, 2024

Issue: #11264

This is follow-up to #11372.
The actual changes (+test) for #11264 will be done in a third, final PR: #11536.

Summary

This PR should mostly be no-op. It focuses on propagating MutableConfigValue for validator_signer everywhere.
All instances of mutable validator_signer are synchronized.
In case validator_id only is needed, we propagate validator_signer anyway as it contains the current validator info.

Extra changes

  • Remove signer as a field and pass to methods instead: Doomslug, InfoHelper, ChunkValidator.
  • Make some public methods internal where they do not need to be public.
  • Split process_ready_orphan_witnesses_and_clean_old into two functions.
  • Removed block_production_started from ClientActorInner.
  • Add FrozenValidatorConfig to make it possible to return a snapshot of ValidatorConfig.

Copy link

codecov bot commented May 27, 2024

Codecov Report

Attention: Patch coverage is 84.83548% with 106 lines in your changes missing coverage. Please review.

Project coverage is 71.64%. Comparing base (57c9796) to head (18b9705).

Files Patch % Lines
chain/chunks/src/shards_manager_actor.rs 88.88% 9 Missing and 12 partials ⚠️
chain/client/src/client.rs 86.48% 3 Missing and 12 partials ⚠️
nearcore/src/dyn_config.rs 0.00% 11 Missing ⚠️
...alidation/partial_witness/partial_witness_actor.rs 65.38% 5 Missing and 4 partials ⚠️
chain/client/src/client_actor.rs 90.80% 4 Missing and 4 partials ⚠️
nearcore/src/config.rs 69.23% 8 Missing ⚠️
integration-tests/src/node/mod.rs 0.00% 4 Missing ⚠️
chain/chain-primitives/src/error.rs 0.00% 3 Missing ⚠️
...nt/src/stateless_validation/chunk_validator/mod.rs 86.36% 3 Missing ⚠️
...hain/network/src/peer_manager/network_state/mod.rs 0.00% 3 Missing ⚠️
... and 14 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11400      +/-   ##
==========================================
+ Coverage   71.57%   71.64%   +0.06%     
==========================================
  Files         787      787              
  Lines      160728   160969     +241     
  Branches   160728   160969     +241     
==========================================
+ Hits       115039   115319     +280     
+ Misses      40648    40614      -34     
+ Partials     5041     5036       -5     
Flag Coverage Δ
backward-compatibility 0.23% <0.00%> (-0.01%) ⬇️
db-migration 0.23% <0.00%> (-0.01%) ⬇️
genesis-check 1.36% <0.00%> (-0.01%) ⬇️
integration-tests 37.77% <60.08%> (+0.03%) ⬆️
linux 69.04% <78.90%> (+0.04%) ⬆️
linux-nightly 71.13% <84.69%> (+0.05%) ⬆️
macos 51.36% <71.47%> (+0.38%) ⬆️
pytests 1.59% <0.00%> (-0.01%) ⬇️
sanity-checks 1.39% <0.00%> (-0.01%) ⬇️
unittests 66.21% <80.23%> (+0.07%) ⬆️
upgradability 0.28% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Base automatically changed from refactor-signer to master June 10, 2024 14:26
@staffik staffik changed the title Validator key hot swap refactor: Make validator_signer mutable Jun 10, 2024
@staffik staffik requested a review from wacban June 11, 2024 17:32
chain/client/src/client.rs Outdated Show resolved Hide resolved
chain/chunks/src/shards_manager_actor.rs Outdated Show resolved Hide resolved
@@ -291,6 +299,7 @@ impl super::NetworkState {
}
}
if let Some(vc) = validator_cfg {
let validator_signer = if let Some(v) = vc.signer.get() { v } else { return };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about tier1 - when the node becomes a validator it should join tier1. Would that happen automatically or do we need to handle it?

nearcore/src/config.rs Outdated Show resolved Hide resolved
nearcore/src/lib.rs Show resolved Hide resolved
nearcore/src/lib.rs Outdated Show resolved Hide resolved
nearcore/src/state_sync.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@wacban wacban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're getting closer :)

@@ -539,7 +540,7 @@ impl Chain {
.iter()
.filter(|shard_uid| {
shard_tracker.care_about_shard(
validator_account_id,
validator.get().map(|v| v.validator_id().clone()).as_ref(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remind me what is the plan regarding loading memtries when using the hot swap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the place where we load memtries on startup. We do not have shadow tracking implemented yet, so currently the new node would have to load memtries for all shards

chain/chunks/src/shards_manager_actor.rs Show resolved Hide resolved
chain/chunks/src/shards_manager_actor.rs Show resolved Hide resolved
chain/chunks/src/shards_manager_actor.rs Outdated Show resolved Hide resolved
chain/client/src/client.rs Outdated Show resolved Hide resolved
Comment on lines +258 to +265
// Chunk producers should not receive state witness from themselves.
log_assert!(
signer.is_some(),
"Received a chunk state witness but this is not a validator node. Witness={:?}",
witness
);
let signer = signer.unwrap();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not your change but this is iffy

  1. The comment doesn't match the code.
  2. Can an attacker send state witness to an RPC node and crash it?

Best to handle it outside of this PR though.
cc @pugachAG

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can an attacker send state witness to an RPC node and crash it?

I don't this so, this code should only be executed after the witness was validated as part of partial witness actor, so this is safe

.validator
.as_ref()
.map(|validator| validator.account_id()),
near_config.network_config.validator.account_id(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be mutable as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a TODO here

// TODO(validator-key-hot-swap) Consider using mutable validator signer instead of PeerInfo.account_id ?
// That likely requires bigger changes and account_id here is later used for debug / logging purposes only.

This account_id is consumed here: https://github.com/near/nearcore/blob/master/core/o11y/src/opentelemetry.rs#L45-L49
to create a name for opentelemetry channel. Changing it would require bigger changes and changing the name would make logs from single node are not visible together.

chain/network/src/config.rs Show resolved Hide resolved
@staffik staffik force-pushed the validator-key-hot-swap branch from 295add3 to b2646ba Compare June 20, 2024 11:49
@staffik staffik force-pushed the validator-key-hot-swap branch from 3a71acf to 2b04477 Compare June 20, 2024 13:03
Copy link
Contributor

@wacban wacban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks, that was quite a bit of work!

chain/chain-primitives/src/error.rs Outdated Show resolved Hide resolved
@staffik staffik added this pull request to the merge queue Jun 21, 2024
Merged via the queue into master with commit e95c123 Jun 21, 2024
30 checks passed
@staffik staffik deleted the validator-key-hot-swap branch June 21, 2024 08:54
@andrei-near
Copy link
Contributor

Non validator logs are spammed with:
Cannot process ready orphan witnesses - not a validator

@staffik

github-merge-queue bot pushed a commit that referenced this pull request Jun 21, 2024
See #11400 (comment)

It seems non-validator nodes still call
`process_ready_orphan_witnesses_and_clean_old` and logging an error in
such case is spammy.
github-merge-queue bot pushed a commit that referenced this pull request Jun 24, 2024
Issue: #11264

This PR build on:
* #11372
* #11400

and contains actual changes and test for validator key hot swap.

### Summary
- Extend `UpdateableConfig` with validator key.
- Update client's mutable validator key when we detect it changed in the
updateable config.
- Advertise our new validator key through `advertise_tier1_proxies()`.
- Add integration test for the new behaviour:
- We start with 2 validating nodes (`node0`, `node1`) and 1
non-validating node (`node2`). It is important that the non-validating
node tracks all shards, because we do not know which shard it will track
when we switch validator keys.
  - We copy validator key from `node0` to `node2`.
  - We stop `node0`, then we trigger validator key reload for `node2`.
- Now `node2` is a validator, but it figures out as `node0` because it
copied validator key from `node0`.
- We wait for a couple of epochs and we require that both remaining
nodes progress the chain. Both nodes should be synchronised after a few
epochs.


Test with:
```
cargo build -pneard --features test_features,rosetta_rpc &&
cargo build -pgenesis-populate -prestaked -pnear-test-contracts &&
python3 pytest/tests/sanity/validator_switch_key_quick.py
```

#### Extra changes:
- Use `MutableValidatorSigner` alias instead of
`MutableConfigValue<Option<Arc<ValidatorSigner>>>`
- Return `ConfigUpdaterResult` from config updater.
- Remove (de)serialization derives for `UpdateableConfigs`.
-

---------

Co-authored-by: Your Name <you@example.com>
staffik added a commit that referenced this pull request Jun 25, 2024
Issue: #11264

This is follow-up to #11372.
The actual changes (+test) for
#11264 will be done in a third,
final PR: #11536.
This PR should mostly be no-op. It focuses on propagating
`MutableConfigValue` for `validator_signer` everywhere.
All instances of mutable `validator_signer` are synchronized.
In case validator_id only is needed, we propagate `validator_signer`
anyway as it contains the current validator info.
- Remove signer as a field and pass to methods instead: `Doomslug`,
`InfoHelper`, `ChunkValidator`.
- Make some public methods internal where they do not need to be public.
- Split `process_ready_orphan_witnesses_and_clean_old` into two
functions.
- Removed `block_production_started` from `ClientActorInner`.
- Add `FrozenValidatorConfig` to make it possible to return a snapshot
of `ValidatorConfig`.

---------

Co-authored-by: Your Name <you@example.com>
staffik added a commit that referenced this pull request Jun 25, 2024
See #11400 (comment)

It seems non-validator nodes still call
`process_ready_orphan_witnesses_and_clean_old` and logging an error in
such case is spammy.
staffik added a commit that referenced this pull request Jun 25, 2024
Issue: #11264

This PR build on:
* #11372
* #11400

and contains actual changes and test for validator key hot swap.

### Summary
- Extend `UpdateableConfig` with validator key.
- Update client's mutable validator key when we detect it changed in the
updateable config.
- Advertise our new validator key through `advertise_tier1_proxies()`.
- Add integration test for the new behaviour:
- We start with 2 validating nodes (`node0`, `node1`) and 1
non-validating node (`node2`). It is important that the non-validating
node tracks all shards, because we do not know which shard it will track
when we switch validator keys.
  - We copy validator key from `node0` to `node2`.
  - We stop `node0`, then we trigger validator key reload for `node2`.
- Now `node2` is a validator, but it figures out as `node0` because it
copied validator key from `node0`.
- We wait for a couple of epochs and we require that both remaining
nodes progress the chain. Both nodes should be synchronised after a few
epochs.


Test with:
```
cargo build -pneard --features test_features,rosetta_rpc &&
cargo build -pgenesis-populate -prestaked -pnear-test-contracts &&
python3 pytest/tests/sanity/validator_switch_key_quick.py
```

#### Extra changes:
- Use `MutableValidatorSigner` alias instead of
`MutableConfigValue<Option<Arc<ValidatorSigner>>>`
- Return `ConfigUpdaterResult` from config updater.
- Remove (de)serialization derives for `UpdateableConfigs`.
-

---------

Co-authored-by: Your Name <you@example.com>
@staffik staffik added the A-stateless-validation Area: stateless validation label Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-stateless-validation Area: stateless validation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants