Make core `Error` enums less specific #270

plafer · 2022-11-29T18:12:58Z (edited by seanchen1991)

Follow-up to #255.

Our Error enums have too many variants. We should drastically condense them following the ideas described here.

Tasks


Move host-relevant error variants to a newly-defined HostError type #1320 (labels: O: code-hygiene, O: maintainability, O: usability, S: errors)

Define an error type that captures all error variants shared between ibc-rs's errors #1249 (labels: S: errors, rust)

Improve error handling when decoding Any to MsgEnvelope #950 (label: S: errors)

plafer · 2023-02-08T15:40:42Z

A thought: ideally, internal functions that use an Error enum would return an Error enum whose variants capture only the errors that can actually occur in that function. For example, verify_delay_passed()'s Error would only have the TimestampOverflow, NotEnoughTimeElapsed, and NotEnoughBlocksElapsed variants (or fewer, if we don't care to be that specific). This better documents what can go wrong, and callers won't write code for errors that cannot happen.
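A minimal sketch of what such a function-scoped error could look like. It assumes times are plain u64 nanoseconds and heights plain u64 block numbers; the real verify_delay_passed() takes ibc-rs types, and the signature and field names below are hypothetical.

```rust
use std::fmt;

/// Hypothetical error scoped to `verify_delay_passed`: only the three
/// failure modes this function can hit are representable.
#[derive(Debug, PartialEq, Eq)]
pub enum DelayError {
    TimestampOverflow,
    NotEnoughTimeElapsed { current_time: u64, earliest_time: u64 },
    NotEnoughBlocksElapsed { current_height: u64, earliest_height: u64 },
}

impl fmt::Display for DelayError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DelayError::TimestampOverflow => {
                write!(f, "timestamp overflow while computing the earliest valid time")
            }
            DelayError::NotEnoughTimeElapsed { current_time, earliest_time } => write!(
                f,
                "not enough time elapsed: current time {} < earliest time {}",
                current_time, earliest_time
            ),
            DelayError::NotEnoughBlocksElapsed { current_height, earliest_height } => write!(
                f,
                "not enough blocks elapsed: current height {} < earliest height {}",
                current_height, earliest_height
            ),
        }
    }
}

impl std::error::Error for DelayError {}

/// Simplified, hypothetical signature using plain integers.
pub fn verify_delay_passed(
    current_time: u64,
    current_height: u64,
    processed_time: u64,
    processed_height: u64,
    delay_time: u64,
    delay_blocks: u64,
) -> Result<(), DelayError> {
    // Adding the delay to the processed time may overflow.
    let earliest_time = processed_time
        .checked_add(delay_time)
        .ok_or(DelayError::TimestampOverflow)?;
    if current_time < earliest_time {
        return Err(DelayError::NotEnoughTimeElapsed { current_time, earliest_time });
    }
    // Height overflow is not one of the listed failure modes here, so saturate.
    let earliest_height = processed_height.saturating_add(delay_blocks);
    if current_height < earliest_height {
        return Err(DelayError::NotEnoughBlocksElapsed { current_height, earliest_height });
    }
    Ok(())
}
```

A caller matching on `DelayError` sees exactly three cases, so the compiler's exhaustiveness check doubles as documentation.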

plafer · 2023-02-27T21:53:49Z

In #164, there was some push-back against using anyhow for our errors. However, looking at projects that use ibc-rs, none of them actually make use of our Error enum:

Nomic's errors are modeled as String;
Namada makes heavy use of the Other(String) error variant;
Penumbra uses anyhow in its own implementation of IBC handlers.

In other words, none of these projects has programmatic logic based on which error occurred. This tells me that the most important use case for our errors is aiding in debugging, whether of host chain implementations or of relayers such as Hermes.

Therefore, I feel like our error system should focus on producing great human-readable error diagnostics, which is exactly what libraries such as anyhow are designed to do. It seems to me like the best way forward is to ignore the conventional wisdom that libraries shouldn't use anyhow, and use anyhow::Error as our main Error type.

I would love to hear people's thoughts on this!

cc @Farhad-Shabani @romac @yito88 @keppel

Farhad-Shabani · 2023-03-01T02:29:14Z

My take: anyhow has a concise syntax for propagating errors, which cleans up the code, and it is easier to adopt since it doesn't care what error types your functions return. However, this free-form approach (as you mentioned) comes at the cost of not having a finely tuned error-handling system. We would be limited in the places where we might need to perform specific actions based on the type of error. It can sometimes even make debugging harder, by providing unnecessary info/backtraces and by lumping all kinds of errors together, which makes it tricky to identify the exact cause.

Maybe you already have a way of adopting anyhow in mind? It might not be a bad idea to demo it on part of our codebase if we think it works.

It would also be great if you could elaborate more on the limitations we face with the current error enum. Couldn't we refactor it into a better one?

I see your point that ibc-rs errors are not used much in those projects, but I think the main reason is that they are still in development (e.g. lots of unwrap() in those codebases) and don't have a refined error-handling system yet. That's true for us too: we're still figuring out some of the designs, so our error enum isn't perfect either. But I don't think that is necessarily why our current errors are not widely used.

Actually, I am a bit concerned, as IIRC we just revamped our error system. If we switch to anyhow, that would be the second change. To me, it sounds like a more detailed study/proposal is required to make sure! Maybe other folks have excellent insights and ideas that make it an easy choice.

romac · 2023-03-01T07:44:02Z

I agree with Farhad, and believe he has a good point wrt codebases not handling errors properly while under heavy development.

Moreover, I am also not sure I see how anyhow can improve the quality of the reported errors over custom error enums. Perhaps you could go into more detail about the problems you are seeing?

plafer · 2023-03-15T19:43:36Z

I believe I found a good strategy for a better error system (discussions with @romac heavily influenced the design).

The new system recognizes 2 "error sources":

Host errors: the errors controlled by the host, coming from methods of ValidationContext/ExecutionContext
Protocol errors: the errors that come from IBC handlers' validation (e.g. channel not in the expected state)

Host errors

The host error type will be defined by the host. Specifically, we'd make it an associated type of ValidationContext:

trait ValidationContext {
  type Error;

  fn host_timestamp(&self) -> Result<Time, Self::Error>;
}
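A minimal sketch of what the associated type lets hosts do. It assumes Time is simplified to a u64 newtype and the trait is trimmed to the single method shown above; both host types are hypothetical. One models its errors as String (as Nomic does), the other uses its own enum, and generic ibc-rs code just propagates whichever one it gets.

```rust
/// Hypothetical stand-in for ibc-rs's timestamp type (u64 nanoseconds).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Time(pub u64);

/// Trimmed-down version of the proposed trait.
pub trait ValidationContext {
    type Error;

    fn host_timestamp(&self) -> Result<Time, Self::Error>;
}

/// Hypothetical host whose errors are plain strings.
pub struct StringHost {
    pub now: Option<u64>,
}

impl ValidationContext for StringHost {
    type Error = String;

    fn host_timestamp(&self) -> Result<Time, Self::Error> {
        self.now
            .map(Time)
            .ok_or_else(|| "block header not available".to_string())
    }
}

/// Hypothetical host with a structured error enum.
#[derive(Debug, PartialEq, Eq)]
pub enum StoreError {
    Unavailable,
}

pub struct TypedHost;

impl ValidationContext for TypedHost {
    type Error = StoreError;

    fn host_timestamp(&self) -> Result<Time, Self::Error> {
        Err(StoreError::Unavailable)
    }
}

/// Generic code never inspects the host's error; `?` just passes it along.
pub fn read_now<Ctx: ValidationContext>(ctx: &Ctx) -> Result<Time, Ctx::Error> {
    let now = ctx.host_timestamp()?;
    Ok(now)
}
```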

Protocol errors

The type for protocol errors would roughly be a cleaned-up version of our current ContextError. The crucial differences between ContextError in this system and the current one are:

we no longer need error variants for host errors in ContextError
its purpose is solely to generate clear debugging error messages; that is, we don't expect users to handle these errors
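A minimal sketch of a protocol error whose job is only to render a clear message. Only the "channel not in the expected state" case comes from the proposal; the type name ProtocolError, its fields, the message wording, and the helper ensure_channel_state are hypothetical.

```rust
use std::fmt;

/// Hypothetical cleaned-up protocol error: no host variants, and the
/// variant exists to produce a readable diagnostic, not to be matched on.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProtocolError {
    ChannelNotInExpectedState {
        channel_id: String,
        expected: String,
        actual: String,
    },
}

impl fmt::Display for ProtocolError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProtocolError::ChannelNotInExpectedState { channel_id, expected, actual } => write!(
                f,
                "channel `{}` is in state `{}`, but the handler expected `{}`",
                channel_id, actual, expected
            ),
        }
    }
}

impl std::error::Error for ProtocolError {}

/// Hypothetical handler-side check that produces the error above.
pub fn ensure_channel_state(
    channel_id: &str,
    actual: &str,
    expected: &str,
) -> Result<(), ProtocolError> {
    if actual == expected {
        Ok(())
    } else {
        Err(ProtocolError::ChannelNotInExpectedState {
            channel_id: channel_id.to_string(),
            expected: expected.to_string(),
            actual: actual.to_string(),
        })
    }
}
```

Since users are only expected to log these, the `Display` output is effectively the public contract of the type.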

Top-level error type

Our top-level error type, returned by dispatch(), would be:

enum Error<E> {
    Host(E),
    Protocol(CleanedUpContextError),
}

As an example, I would expect the user code to look like:

match dispatch(...) {
    Ok(()) => { /* success */ }
    Err(Error::Host(err)) => { /* optionally handle my errors */ }
    Err(Error::Protocol(err)) => { /* log errors */ }
}
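A runnable sketch of that user-side match, under stated assumptions: CleanedUpContextError is reduced to a message string, MyHostError is a hypothetical host error type, and the stand-in dispatch takes a hypothetical FakeMsg selector instead of a real context and message.

```rust
/// Placeholder for the cleaned-up protocol error (hypothetical: just a message).
#[derive(Debug)]
pub struct CleanedUpContextError(pub String);

/// The proposed top-level error, generic over the host's error type `E`.
#[derive(Debug)]
pub enum Error<E> {
    Host(E),
    Protocol(CleanedUpContextError),
}

/// Hypothetical host-defined error type.
#[derive(Debug)]
pub enum MyHostError {
    DbTimeout,
}

/// Hypothetical input used only to drive the stand-in `dispatch` below.
pub enum FakeMsg {
    Valid,
    HostFails,
    BadChannel,
}

/// Stand-in for `dispatch()`; the real one takes a context and a message.
pub fn dispatch(msg: FakeMsg) -> Result<(), Error<MyHostError>> {
    match msg {
        FakeMsg::Valid => Ok(()),
        FakeMsg::HostFails => Err(Error::Host(MyHostError::DbTimeout)),
        FakeMsg::BadChannel => Err(Error::Protocol(CleanedUpContextError(
            "channel not in the expected state".to_string(),
        ))),
    }
}

/// Host-side handling: react to its own errors, only log protocol errors.
pub fn handle(msg: FakeMsg) -> String {
    match dispatch(msg) {
        Ok(()) => "ok".to_string(),
        // The host owns `MyHostError`, so it can match on specific variants.
        Err(Error::Host(MyHostError::DbTimeout)) => "retry later".to_string(),
        // Protocol errors are meant for diagnostics, not for control flow.
        Err(Error::Protocol(err)) => format!("logged: {}", err.0),
    }
}
```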

Internal logic based on errors from the host

This section borrows the great idea presented in this sled.rs blog post.

In a few instances we need to perform some logic based on the specific error returned from the host. For example, when the packet commitment is not present during acknowledgement processing, we want to return early without an error. We currently return early for all errors, but ideally we'd distinguish the "packet commitment not present" error from all other errors, and only return early for the "packet commitment not present" error. Here's where we'd use sled's idea. get_packet_commitment would look like:

fn get_packet_commitment(
    &self,
    commitment_path: &CommitmentPath,
) -> Result<Result<PacketCommitment, CommitmentError>, Self::Error>;

enum CommitmentError {
    NotFound,
}

The benefit of this signature is that our internal code would look like:

// note how the outer `Self::Error` gets propagated out with `?`, and we
// match on the inner error that we care about
if matches!(ctx_a.get_packet_commitment(&commitment_path_on_a)?, Err(CommitmentError::NotFound)) {
    return Ok(());
}
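A runnable sketch of the nested-Result pattern, assuming a hypothetical in-memory Store, a hypothetical StorageDown host error, string paths instead of CommitmentPath, and a hypothetical AckOutcome so the early return is observable. The outer Result carries host failures; the inner one carries the "not found" case the handler branches on.

```rust
use std::collections::HashMap;

/// Simplified stand-in for ibc-rs's commitment type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PacketCommitment(pub Vec<u8>);

/// Defined by ibc-rs, not the host: the one outcome the handler branches on.
#[derive(Debug, PartialEq, Eq)]
pub enum CommitmentError {
    NotFound,
}

/// Hypothetical host error: the storage backend could not be reached.
#[derive(Debug, PartialEq, Eq)]
pub struct StorageDown;

/// Hypothetical in-memory host store.
pub struct Store {
    pub up: bool,
    pub commitments: HashMap<String, PacketCommitment>,
}

impl Store {
    /// Outer `Result`: host failure. Inner `Result`: whether the commitment exists.
    pub fn get_packet_commitment(
        &self,
        path: &str,
    ) -> Result<Result<PacketCommitment, CommitmentError>, StorageDown> {
        if !self.up {
            return Err(StorageDown);
        }
        Ok(self.commitments.get(path).cloned().ok_or(CommitmentError::NotFound))
    }
}

/// What the acknowledgement handler did (hypothetical, for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum AckOutcome {
    Processed,
    SkippedNoCommitment,
}

pub fn process_ack(store: &Store, path: &str) -> Result<AckOutcome, StorageDown> {
    // `?` propagates only the outer, host-level error...
    let _commitment = match store.get_packet_commitment(path)? {
        Ok(commitment) => commitment,
        // ...while the inner "not found" case returns early without an error.
        Err(CommitmentError::NotFound) => return Ok(AckOutcome::SkippedNoCommitment),
    };
    // Verifying the acknowledgement against `_commitment` would happen here.
    Ok(AckOutcome::Processed)
}
```

Unlike returning early on every error, a storage outage here still surfaces as `Err(StorageDown)`.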

And that's it! I also expect cleaning up ContextError to solve #342.

Farhad-Shabani · 2023-04-06T21:05:11Z

I liked the idea of separating errors by their source. That way, we only have to take care of ours without assuming anything about the hosts! That's perfect in my view.

There are just a few questions that may help clarify some of the details:

Given that hosts will be free to introduce their desired errors through the associated Error type under the ValidationContext, why is it needed to include the Host(E) variant, and basically having a top-level error type?
Regarding the "not present" state, I have a different perspective. In similar cases, unavailability shouldn't be treated as an error. Instead, it should be represented by None as a legitimate response from a storage call. Accordingly, users won’t have to deal with the additional overhead of introducing errors like the CommitmentError. Since there is a single acceptable error variant, we can automate this process in our boundary.
I believe Result<Option<Output>, Error> is the correct signature, and I've expanded my view on this in [Context] Issues caused by improper output signatures of provable store readers #607
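For comparison, a minimal sketch of the Result<Option<Output>, Error> signature suggested above. The names ProvableStore, MemStore, HostError, and commitment_exists are hypothetical, and paths are simplified to strings.

```rust
/// Simplified stand-in for ibc-rs's commitment type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PacketCommitment(pub Vec<u8>);

/// Hypothetical host error, reserved for genuine failures.
#[derive(Debug, PartialEq, Eq)]
pub struct HostError(pub String);

pub trait ProvableStore {
    /// `Ok(None)` means "not stored"; `Err` means the read itself failed.
    fn get_packet_commitment(&self, path: &str) -> Result<Option<PacketCommitment>, HostError>;
}

/// Hypothetical in-memory store.
pub struct MemStore(pub Vec<(String, PacketCommitment)>);

impl ProvableStore for MemStore {
    fn get_packet_commitment(&self, path: &str) -> Result<Option<PacketCommitment>, HostError> {
        Ok(self
            .0
            .iter()
            .find(|(p, _)| p.as_str() == path)
            .map(|(_, c)| c.clone()))
    }
}

/// ibc-rs side: absence is handled at the boundary, so the host never
/// has to define a "not found" error of its own.
pub fn commitment_exists<S: ProvableStore>(store: &S, path: &str) -> Result<bool, HostError> {
    Ok(store.get_packet_commitment(path)?.is_some())
}
```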

And a suggestion:

Some of the confusing error variants we already have stem from how we implemented certain logic. Prioritizing the study of error sources and resolving issues like #603, #607, and #536 would streamline the error variants and reduce the burden of redesigning the system.

plafer · 2023-04-10T18:40:46Z (edited by seanchen1991)

Given that hosts will be free to introduce their desired errors through the associated Error type under the ValidationContext, why is it needed to include the Host(E) variant, and basically having a top-level error type?

The "top-level" type is just the type that validate(), execute() and dispatch() return. We have two variants because two separate types of errors can occur (ValidationContext::Error and CleanedUpContextError), and both can be bubbled up and returned from any of the three calls. Without a Host(E) variant, how would you return host errors?
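A minimal sketch of how both kinds of errors could bubble up into the single top-level type, assuming a trimmed-down ValidationContext with a hypothetical channel_exists method, a hypothetical ContextError::ChannelNotFound variant, and a hypothetical Host type. Host errors are wrapped with map_err(Error::Host), while protocol errors convert through a From impl.

```rust
/// Hypothetical protocol error (stands in for the cleaned-up ContextError).
#[derive(Debug, PartialEq, Eq)]
pub enum ContextError {
    ChannelNotFound(String),
}

/// Top-level error returned by validate()/execute()/dispatch().
#[derive(Debug, PartialEq, Eq)]
pub enum Error<E> {
    Host(E),
    Protocol(ContextError),
}

impl<E> From<ContextError> for Error<E> {
    fn from(err: ContextError) -> Self {
        Error::Protocol(err)
    }
}

/// Trimmed-down context; `channel_exists` is a hypothetical method.
pub trait ValidationContext {
    type Error;

    fn channel_exists(&self, channel_id: &str) -> Result<bool, Self::Error>;
}

pub fn validate<Ctx: ValidationContext>(
    ctx: &Ctx,
    channel_id: &str,
) -> Result<(), Error<Ctx::Error>> {
    // Host failures are wrapped explicitly in `Error::Host`...
    let exists = ctx.channel_exists(channel_id).map_err(Error::Host)?;
    // ...while protocol failures convert via `From`.
    if !exists {
        return Err(ContextError::ChannelNotFound(channel_id.to_string()).into());
    }
    Ok(())
}

/// Hypothetical host whose errors are strings.
pub struct Host {
    pub channels: Vec<String>,
    pub db_ok: bool,
}

impl ValidationContext for Host {
    type Error = String;

    fn channel_exists(&self, channel_id: &str) -> Result<bool, String> {
        if !self.db_ok {
            return Err("db unavailable".to_string());
        }
        Ok(self.channels.iter().any(|c| c.as_str() == channel_id))
    }
}
```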

Accordingly, users won’t have to deal with the additional overhead of introducing errors like the CommitmentError. Since there is a single acceptable error variant, we can automate this process in our boundary.

Errors such as CommitmentError are defined by us, not the user. The user only defines ValidationContext::Error; we define all the others.

In similar cases, unavailability shouldn't be treated as an error. Instead, it should be represented by None as a legitimate response from a storage call.

Returning an error doesn't mean the outcome is "illegitimate". Result<Option<Output>, Error> is mathematically equivalent to Result<Result<PacketCommitment, CommitmentError>, Self::Error> (where CommitmentError is defined with just one variant); they're isomorphic. Choosing one over the other is a subjective decision, a matter of taste. I prefer Result<Result<PacketCommitment, CommitmentError>, Self::Error>, since it avoids a form of boolean blindness (which we could call "Option blindness"). Basically, reading

if matches!(ctx_a.get_packet_commitment(&commitment_path_on_a)?, Err(CommitmentError::NotFound)) {
    return Ok(());
}

tells me a ton more about what's going on; I see that the commitment was not found directly at the call site. Compare to your suggested signature:

if ctx_a.get_packet_commitment(&commitment_path_on_a)?.is_none() {
    return Ok(());
};

is_none() doesn't tell me the semantics of None in this case; I need to go read the docstring of the method to remember what None means in this case.

So using a Result doesn't mean the error is "illegitimate"; it's just an equivalent but more readable way of encoding the scenario where the packet commitment is not found.
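To make the isomorphism concrete: with a single-variant CommitmentError, the two encodings convert losslessly in both directions. A minimal sketch; the helper names to_result and to_option are hypothetical.

```rust
/// Single-variant error, so it carries no information beyond "absent".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitmentError {
    NotFound,
}

/// `None` becomes the explicitly named `NotFound` case.
pub fn to_result<T>(opt: Option<T>) -> Result<T, CommitmentError> {
    opt.ok_or(CommitmentError::NotFound)
}

/// Dropping the error loses nothing, because there is only one variant.
pub fn to_option<T>(res: Result<T, CommitmentError>) -> Option<T> {
    res.ok()
}
```

The same helpers lift to the full signatures: `Result<Option<T>, E>` maps to `Result<Result<T, CommitmentError>, E>` with `.map(to_result)`, and back with `.map(to_option)`.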

plafer added the O: new-feature label Nov 29, 2022

Farhad-Shabani added the S: errors and O: maintainability labels and removed the O: new-feature label Jan 5, 2023

plafer mentioned this issue Jan 10, 2023: Remove #![allow(clippy::result_large_err)] #342 (Open)

plafer added this to the v0.30.0 milestone Feb 22, 2023

Farhad-Shabani modified the milestones: v0.30.0, v0.31.0 Feb 27, 2023

plafer self-assigned this Feb 27, 2023

Farhad-Shabani modified the milestones: v0.31.0, v0.32.0 Feb 28, 2023

Farhad-Shabani modified the milestones: v0.32.0, v0.33.0 Mar 13, 2023

plafer mentioned this issue Mar 15, 2023: Supersede ClientState proof verification methods with generic interfaces #531 (Merged)

Farhad-Shabani modified the milestones: v0.33.0, v0.34.0 Mar 17, 2023

Farhad-Shabani mentioned this issue Mar 17, 2023: 🚀 Road to V1 #554 (Open)

This was referenced Mar 20, 2023:

Make ValidationContext and ExecutionContext use non-core Error types #269 (Closed)

Fix timeout validation for timeout height #556 (Merged)

Farhad-Shabani removed this from the v0.34.0 milestone Mar 27, 2023

Farhad-Shabani mentioned this issue Apr 5, 2023: [Context] Issues caused by improper output signatures of provable store readers #607 (Open)

plafer mentioned this issue Apr 5, 2023: Check if ClientStatePath is empty during client creation #605 (Merged)

plafer removed their assignment Sep 22, 2023

Farhad-Shabani mentioned this issue Nov 7, 2023: Improve error handling when decoding Any to MsgEnvelope #950 (Closed)

Farhad-Shabani mentioned this issue Dec 1, 2023: Minimize prost dependency via ToVec trait #997 (Closed)

seanchen1991 self-assigned this Apr 4, 2024

seanchen1991 mentioned this issue Jun 4, 2024: Define top-level basecoin error type informalsystems/basecoin-rs#186 (Open)

Farhad-Shabani mentioned this issue Aug 12, 2024: Critical review of error handling across ibc-rs #1310 (Open)

Farhad-Shabani mentioned this issue Sep 24, 2024: imp(ibc)!: refactor error handling throughout codebase #1350 (Merged)

Farhad-Shabani added this to the 0.55.0 milestone Sep 24, 2024

Farhad-Shabani closed this as completed in #1350 Sep 24, 2024
