-
Notifications
You must be signed in to change notification settings - Fork 1.6k
pre-checking: Reject failed PVFs #6492
Changes from 10 commits
30c3215
d14c979
79f1677
978e992
3675052
baed8f7
2806d88
f05224d
b7cf378
db1a822
30ec605
c94135b
876daa6
af102c3
1d5753b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -190,16 +190,15 @@ async fn handle_pvf_check( | |
PreCheckOutcome::Valid => Judgement::Valid, | ||
PreCheckOutcome::Invalid => Judgement::Invalid, | ||
PreCheckOutcome::Failed => { | ||
// Abstain. | ||
// | ||
// Returning here will leave the PVF in the view dangling. Since it is there, no new | ||
// pre-checking request will be sent. | ||
// Always vote against in case of failures. Voting against a PVF when encountering a | ||
// timeout (or an unlikely node-specific issue) can be considered safe, since | ||
// there is no slashing for being on the wrong side on a pre-check vote. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Exactly. Also by being more strict here, we can safely be more lenient during preparation and avoid risk of getting slashed there. |
||
gum::info!( | ||
target: LOG_TARGET, | ||
?validation_code_hash, | ||
"Pre-check failed, abstaining from voting", | ||
"Pre-check failed, voting against", | ||
); | ||
return | ||
Judgement::Invalid | ||
}, | ||
}; | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -525,7 +525,7 @@ async fn handle_execute_pvf( | |
}, | ||
ArtifactState::FailedToProcess { last_time_failed, num_failures, error } => { | ||
if can_retry_prepare_after_failure(*last_time_failed, *num_failures, error) { | ||
gum::debug!( | ||
gum::info!( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Make it a warn or leave it debug. Here you attach enough information. |
||
target: LOG_TARGET, | ||
?pvf, | ||
?artifact_id, | ||
|
@@ -595,7 +595,7 @@ async fn handle_heads_up( | |
}, | ||
ArtifactState::FailedToProcess { last_time_failed, num_failures, error } => { | ||
if can_retry_prepare_after_failure(*last_time_failed, *num_failures, error) { | ||
gum::debug!( | ||
gum::info!( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same warn or debug |
||
target: LOG_TARGET, | ||
?active_pvf, | ||
?artifact_id, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,11 +16,13 @@ To be relevant for the subsystem, a PVF must be returned by the [`pvfs_require_p | |
|
||
When a PVF just becomes relevant, the subsystem will send a message to the [Candidate Validation] subsystem asking for the pre-check. | ||
|
||
Upon receving a message from the candidate-validation subsystem, the pre-checker will note down that the PVF has its judgement and will also sign and submit a [`PvfCheckStatement`][PvfCheckStatement] via the [`submit_pvf_check_statement` runtime API][PVF pre-checking runtime API]. In case, a judgement was received for a PVF that is no longer in view it is ignored. It is possible that the candidate validation was not able to check the PVF. In that case, the PVF pre-checker will abstain and won't submit any check statements. | ||
Upon receving a message from the candidate-validation subsystem, the pre-checker will note down that the PVF has its judgement and will also sign and submit a [`PvfCheckStatement`][PvfCheckStatement] via the [`submit_pvf_check_statement` runtime API][PVF pre-checking runtime API]. In case, a judgement was received for a PVF that is no longer in view it is ignored. | ||
|
||
Since a vote only is valid during [one session][overview], the subsystem will have to resign and submit the statements for for the new session. The new session is assumed to be started if at least one of the leaves has a greater session index that was previously observed in any of the leaves. | ||
It is possible that the candidate validation was not able to check the PVF, e.g. if it timed out. In that case, the PVF pre-checker will vote against it. This is considered safe, as there is no slashing for being on the wrong side of a pre-check vote. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be nice to have a rationale on why we must reject instead of abstaining. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is better in two ways:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know why, sr-labs explains it, but I don't see it in a guide :) |
||
|
||
The subsystem tracks all the statement that it submitted within a session. If for some reason a PVF became irrelevant and then becomes relevant again, the subsystem will not submit a new statement for that PVF within the same session. | ||
Since a vote only is valid during [one session][overview], the subsystem will have to resign and submit the statements for the new session. The new session is assumed to be started if at least one of the leaves has a greater session index that was previously observed in any of the leaves. | ||
|
||
The subsystem tracks all the statements that it submitted within a session. If for some reason a PVF became irrelevant and then becomes relevant again, the subsystem will not submit a new statement for that PVF within the same session. | ||
|
||
If the node is not in the active validator set, it will still perform all the checks. However, it will only submit the check statements when the node is in the active validator set. | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should the operator do with this information? Do we log the worker death error somewhere?
Info is also probably not seen, you would need to make this
warn!
to appear in monitoring.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is not enough information attached to make anything out of this. Even if they open an issue for this, we will be clueless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There wasn't any more information available here, but we can log the error when it happens!
Done!