-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix error in resolve funcs #815
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems fine. If it's not actually an issue with "arbitrary user-defined logic" but some other fatal error, instead something local to the node, would there least be an app hash mismatch? Or would that be go unnoticed for some arbitrary time?
There are only two cases I could see where the node does something node-specific, resulting in a different result:
Since we do not include user balances and validator power in app-hashes, there is a risk that a non-deterministic change there might not result in an app-hash mismatch until later, but this seems more like a Kwil issue to fix. |
Point 2 there is not really the subject of this, I think. The question is, what is the meaning of a nil vs. non-nil error for the purposes of the extension author writing a But we do have examples and built-in resolutions that look like: ResolveFunc: func(ctx context.Context, app *common.App, resolution *resolutions.Resolution) error {
removeReq := &UpdatePowerRequest{}
if err := removeReq.UnmarshalBinary(resolution.Body); err != nil {
return fmt.Errorf("failed to unmarshal remove request: %w", err)
}
if removeReq.Power != 0 {
// this should never happen since UpdatePowerRequest is only used for internal communication
// between modules. Removes are sent from the client in a separate message.
return fmt.Errorf("remove request with non-zero power")
}
return SetValidatorPower(ctx, app.DB, removeReq.PubKey, 0)
}, So in part an error can be created by several things:
I agree the majority of cases should not halt consensus, so this change is best, but it's funny that we have no distinction between fatal an non-fatal errors here. If there is something funny with the node that caused |
I see what you're saying. IMO, it still seems like this could be solved with improved apphash tracking. Right now, only user schemas affect apphash. If the validator schema affected apphash too, then the node that gets an unexpected result / error from changing validator power would recognize their state diverged in the next block, right? |
Hypothetically. I think it's handled more easily by hard failing where the extension author does not anticipate an error. You cannot roll back an already-committed an app state change. If you commit something erroneous (and we can know it is), then you diverge permanently and have to resync if you want to get back on the network even if apphash mismatch catches it right away. Not a good outcome. I think the API is simply lacking for the resolution resolution function, or the ResolveFunc should log (not the app) when it's not fatal, and return an error when it's fatal. |
In any case, if I were writing an extension, I would look at the docs for the field: // ResolveFunc is a function that is called once a resolution has
// received a required number of votes, as defined by the
// ConfirmationThreshold. It is given a readwrite database
// connection and the information for the resolution that has
// been confirmed. All nodes will call this function as a part of
// block execution. It is therefore expected that the function is
// deterministic, regardless of a node's local configuration.
ResolveFunc func(ctx context.Context, app *common.App, resolution *Resolution) error When should I return an error. How does the app handle an error? How should I deal with internal errors? The contract for that API could use some clarification. |
Agreed |
Unfortunately, this is hard to delineate right now. For example, if procedure execution failed due to an |
While working through the Powerpod integration, I spotted a bug in our resolution code. If a resolution fails, it will halt consensus. This should not be the case, since resolutions are made to target databases, which can have arbitrary user-defined logic.