-
Notifications
You must be signed in to change notification settings - Fork 619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Always record storage when stateless validation is enabled #10859
Conversation
For stateless validation we need to record all trie reads performed when applying a chunk in order to prepare a `PartialState` that will be included in `ChunkStateWitness` Initially it was only necessary to record trie reads when producing a chunk. Validators could read all the values from the provided `PartialState` without recording anything. A recent change (near#10703) introduced a soft limit on the size of `PartialState`. When applying a chunk we watch how much state was recorded, and once the amount of state exceeds the soft limit we stop applying the receipts in this chunk. This needs to be done on both the chunk producer and chunk validator - if the chunk validator doesn't record reads and enforce the limit, it will produce a different result of chunk application, which would lead to validation failure. This means that state should be recorded in all cases when a statelessly validated chunk is applied. Let's remove the configuration option that controls whether trie reads should be recorded (`record_storage`) and just record the reads on every chunk application (when `statelessnet_protocol` feature is enabled). Refs: [zulip conversation](https://near.zulipchat.com/#narrow/stream/407237-core.2Fstateless-validation/topic/State.20witness.20size.20limit/near/428313518)
…idation is not enabled prepare_transactions() doesn't produce a storage proof when stateless validation isn't enabled. Let's disable tests that require this storage proof when stateless validation is disabled.
This is (going) to change the protocol, right? Then it needs to be gated by ProtocolFeature / protocol version, so we could smoothly roll it out to statelessnet and mainnet. |
I'd say that it's a fix for the protocol version introduced by #10703. |
using the `checked_feature!` macro covers both `statelessnet_protocol` and `nightly`, so it's better to use it instead of checking for just `statelessnet_protocol`.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #10859 +/- ##
==========================================
+ Coverage 71.61% 71.63% +0.01%
==========================================
Files 758 758
Lines 151765 151789 +24
Branches 151765 151789 +24
==========================================
+ Hits 108692 108730 +38
+ Misses 38537 38518 -19
- Partials 4536 4541 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
if !checked_feature!("stable", StatelessValidationV0, PROTOCOL_VERSION) { | ||
println!("Test not applicable without StatelessValidation enabled"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that this test won't kick in before hitting testnet (with stable build)? isn't it too late?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that the test will only execute in builds where the default protocol version has stateless validation enabled.
In practice this means that the test will be enabled for builds with --features statelessnet_protocol
and --features nightly
. Both of those options are present in the CI, so the test will be run on every change, and it will be enabled on statelessnet builds.
Tbh I don't know what "stable"
in checked_feature
is supposed to mean, the macro just checks the protocol version:
#[macro_export]
macro_rules! checked_feature {
("stable", $feature:ident, $current_protocol_version:expr) => {{
$crate::version::ProtocolFeature::$feature.protocol_version() <= $current_protocol_version
}};
The code is here if anyone wants to take a look:
nearcore/core/primitives-core/src/version.rs
Lines 224 to 256 in f2e87c0
pub const PROTOCOL_VERSION: ProtocolVersion = if cfg!(feature = "statelessnet_protocol") { | |
// Current StatelessNet protocol version. | |
83 | |
} else if cfg!(feature = "nightly_protocol") { | |
// On nightly, pick big enough version to support all features. | |
140 | |
} else { | |
// Enable all stable features. | |
STABLE_PROTOCOL_VERSION | |
}; | |
/// Both, outgoing and incoming tcp connections to peers, will be rejected if `peer's` | |
/// protocol version is lower than this. | |
pub const PEER_MIN_ALLOWED_PROTOCOL_VERSION: ProtocolVersion = STABLE_PROTOCOL_VERSION - 2; | |
#[macro_export] | |
macro_rules! checked_feature { | |
("stable", $feature:ident, $current_protocol_version:expr) => {{ | |
$crate::version::ProtocolFeature::$feature.protocol_version() <= $current_protocol_version | |
}}; | |
($feature_name:tt, $feature:ident, $current_protocol_version:expr) => {{ | |
#[cfg(feature = $feature_name)] | |
let is_feature_enabled = $crate::version::ProtocolFeature::$feature.protocol_version() | |
<= $current_protocol_version; | |
#[cfg(not(feature = $feature_name))] | |
let is_feature_enabled = { | |
// Workaround unused variable warning | |
let _ = $current_protocol_version; | |
false | |
}; | |
is_feature_enabled | |
}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have this kind of check in many other tests that don't work without stateless validation, e.g:
if !checked_feature!("stable", StatelessValidationV0, PROTOCOL_VERSION) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Could @Longarithm confirm whether it's fine to delete the should_produce_state_witness_for_this_or_next_epoch
function? In the future, with single shard tracking, we shouldn't even require the condition self.epoch_manager.is_chunk_producer_for_epoch(epoch_id, account_id)? || self.epoch_manager.is_chunk_producer_for_epoch(&next_epoch_id, account_id)?
?
storage_data_source: StorageDataSource::Db, | ||
state_patch, | ||
record_storage: self | ||
.should_produce_state_witness_for_this_or_next_epoch(me, block.header())?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should_produce_state_witness_for_this_or_next_epoch
is used in one other place in prepare_transactions
. We should remove it from there as well and delete the function.
Good point, actually. We should still call |
chain/chain/src/runtime/mod.rs
Outdated
if checked_feature!("stable", StatelessValidationV0, PROTOCOL_VERSION) { | ||
trie = trie.recording_reads(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recording should probably also be enabled with the shadow_chunk_validation
feature.
v2:
|
Merged |
Didn't help, opened #10876 about the problem. |
@Longarithm are you okay with this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. I think we need to make protocol version check more strict.
if checked_feature!("stable", StatelessValidationV0, PROTOCOL_VERSION) | ||
|| cfg!(feature = "shadow_chunk_validation") | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PROTOCOL_VERSION
is not enough, we need a check with protocol version of next epoch here. If we stabilize this version and some validator picks it up before others, its performance will be worsened because they will have to read all the nodes unlike others. And if protocol version switch is delayed for some reason, validator may miss blocks/chunks because of that.
We need next epoch, not current epoch, because state transitions must be recorded 1 epoch before protocol upgrade. EpochManager has a method for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I imagined that before switching to stateless validation we would enable this feature (or make it the default), and then all nodes would start recording the data needed for stateless validation. At some point the protocol version would change and the nodes would start using this recorded data, but it's okay to start recording much earlier, without stateless validation the data would just be discarded.
Basically we don't need to start recording at the exact moment when it's needed, we can do it as soon as the updated node starts.
Is it this that much of a performance hit? We have to do it on the epoch before switching to stateless validation, so it has to be performant enough for normal traffic. Anton's shadow_chunk_validation
records data and it's able to keep up with the mainnet traffic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anton's shadow_chunk_validation records data and it's able to keep up with the mainnet traffic.
Oh, that's nice.
I also realised that memtrie should be enabled not later than stateless validation, which will give desired performance. Then looks good!
if checked_feature!("stable", StatelessValidationV0, PROTOCOL_VERSION) | ||
|| cfg!(feature = "shadow_chunk_validation") | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as below. If we want to be super precise, we can use current epoch protocol version, because that recorded storage is not saved anythere, it is used immediately on chunk production.
@@ -413,6 +419,7 @@ impl<'a> ChainUpdate<'a> { | |||
block: &Block, | |||
block_preprocess_info: BlockPreprocessInfo, | |||
apply_chunks_results: Vec<(ShardId, Result<ShardUpdateResult, Error>)>, | |||
should_save_state_transition_data: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be computed here, because postprocess_block
has me
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, but Chain
also calls ChainUpdate::process_apply_chunk_result
, which doesn't have a me
argument. This means that should_produce_state_witness_for_this_or_next_epoch
would have to be available from both Chain
and ChainUpdate
. I can't just do everything inside ChainUpdate
.
I guess it would be possible to move should_produce_state_witness_for_this_or_next_epoch
to EpochManager
and use it from both Chain
and ChainUpdate
, but I'm not convinced if there's any real benefit gained in exchange for this additional work. At this point I'm tempted to just merge it and unblock the statelessnet release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhh right. Okay, makes sense, feel free to merge
…ear#10859)" This reverts commit c2f9695.
For stateless validation we need to record all trie reads performed when applying a chunk in order to prepare a `PartialState` that will be included in `ChunkStateWitness` Initially it was only necessary to record trie reads when producing a chunk. Validators could read all the values from the provided `PartialState` without recording anything. A recent change (near#10703) introduced a soft limit on the size of `PartialState`. When applying a chunk we watch how much state was recorded, and once the amount of state exceeds the soft limit we stop applying the receipts in this chunk. This needs to be done on both the chunk producer and chunk validator - if the chunk validator doesn't record reads and enforce the limit, it will produce a different result of chunk application, which would lead to validation failure. This means that state should be recorded in all cases when a statelessly validated chunk is applied. Let's remove the configuration option that controls whether trie reads should be recorded (`record_storage`) and just record the reads on every chunk application (when `statelessnet_protocol` feature is enabled). Refs: [zulip conversation](https://near.zulipchat.com/#narrow/stream/407237-core.2Fstateless-validation/topic/State.20witness.20size.20limit/near/428313518)
…ng back always recording storage (#10904) Fixes the issue that caused #10859 to break validation. The flag `charge_gas_for_trie_node_access` should be forwarded to the new Trie created by `recording_reads`, otherwise the node counting won't work correctly and there'll be invalid state. Fix the issue and bring back #10859 which was reverted in #10900.
For stateless validation we need to record all trie reads performed when applying a chunk in order to prepare a
PartialState
that will be included inChunkStateWitness
Initially it was only necessary to record trie reads when producing a chunk. Validators could read all the values from the provided
PartialState
without recording anything.A recent change (#10703) introduced a soft limit on the size of
PartialState
. When applying a chunk we watch how much state was recorded, and once the amount of state exceeds the soft limit we stop applying the receipts in this chunk.This needs to be done on both the chunk producer and chunk validator - if the chunk validator doesn't record reads and enforce the limit, it will produce a different result of chunk application, which would lead to validation failure.
This means that state should be recorded in all cases when a statelessly validated chunk is applied. Let's remove the configuration option that controls whether trie reads should be recorded (
record_storage
) and just record the reads on every chunk application (whenstatelessnet_protocol
feature is enabled).Refs: zulip conversation