Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: reinvestigate error handling and reporting in the system #11443

Open
BugenZhao opened this issue Aug 4, 2023 · 7 comments
Open

refactor: reinvestigate error handling and reporting in the system #11443

BugenZhao opened this issue Aug 4, 2023 · 7 comments

Comments

@BugenZhao
Copy link
Member

BugenZhao commented Aug 4, 2023

Note

Tracking the tasks in the comment: #11443 (comment)


As more and more features are introduced in our system, it turns out that error handling and reporting become increasingly messy at the same time. We're struggling with several issues, including...

Besides, developers also find it annoying to deal with errors...

  • Find it hard to categorize errors into different types or enum variants.
  • Write a lot of boilerplate code just for propagating the error to the caller.
  • Really verbose to construct an error.
    return Err(ErrorCode::NotImplemented(
    format!(
    "window frame exclusion `{}` is not supported yet",
    exclusion
    ),
    9124.into(),
    )
    .into());
  • RwError still lives in the frontend and batch executors even though we proposed to clean it up one year ago. (Tracking: Eliminate usages of RwError #4077)
  • As a result, developers often use anyhow or bail extensively, resulting in a messy categorization of the majority of errors as internal error.

Based on this, I propose several guidelines on how we should deal with errors in different modules in RisingWave:

  • Use snafu (or thiserror) to define errors, if a considerable usage of this error is to construct new errors, and the rest is to propagate errors attaching contexts optionally.

    • This applies to modules that reside in the lower level of RisingWave or sound like a library, like ExprError, StreamError, and OptimizerError.
    • Define ad-hoc error types as much as possible, instead of passing String everywhere.
    • Use anyhow as the fall-back variant of the error for the errors that cannot be easily categorized or can rarely happen.
  • Directly wrap anyhow::Error into a new error type, if a considerable usage of this error is to propagate errors from lower-level crates without strong necessity of attaching contexts, and the rest is to construct new errors.

    • This applies to modules that sound like an application, like StreamExecutorError.
    • Backtraces will be automatically captured in this case.
@xxchan
Copy link
Member

xxchan commented Aug 4, 2023

strong +1

some points

Error's source chain gets broken when reporting error, so the actual cause for the error is lost.

We might need to revisit this to see whether other places have the same problem. I didn't care much about source before.

Besides, developers also find it annoying to deal with errors...

We might need a developer guide (and/or crate rustdocs) for developers to follow. Even I've read a lot about error handling before, I feel still confused about how to do it. A short and actionable HOWTOs (and some simple explanations) should reduce a lot of confusions. It can be opinionated, but we need some guide.

Find it hard to categorize errors into different types or enum variants.

This is especially confusing. I'm now wondering whether categorization is needed at all.

Directly wrap anyhow::Error into a new error type

I recently noticed wasmtime::Error is like that. I had thought it should be an enum or sth, since it's a library. It turns out they use error.downcast_ref::<Trap> to get the "kind" (Trap). Might worth checking how they do it.


Some arbitrary refs (for myself to read later)

@xxchan
Copy link
Member

xxchan commented Aug 4, 2023

BTW, I like this sentence: "Be either actionable or informational" (from smithy-lang/smithy-rs#1950)

At a high level, errors were refactored to:

  • Be either actionable or informational. Actionable errors can be matched upon, leading to different program flow. Informational errors indicate why a failure occurred, but are not intended to be matched upon.
  • No longer print error sources in Display impls. A DisplayErrorContext utility has been re-exported in the types module to easily log or print the entire error source chain.
  • Reliably return their cause/source in their Error::cause/Error::source impl

P.S., I don't know whether the result of their refactor is good, since it has some learning burden for me. Haven't check its every details yet. AWS SDK is complex beasts though. 🤣

@BugenZhao
Copy link
Member Author

  • No longer print error sources in Display impls.

I also found this a good practice we can follow. This is also how anyhow handles Display.

https://github.com/dtolnay/anyhow/blob/7fc0c073c4c31ef8664f699e9e4883b1c92b2fb6/src/fmt.rs#L7

@BugenZhao
Copy link
Member Author

  • No longer print error sources in Display impls.

I also found this a good practice we can follow. This is also how anyhow handles Display.

dtolnay/anyhow@7fc0c07/src/fmt.rs#L7

However, not all libraries follows the convention. So eventually we get https://docs.rs/snafu/latest/snafu/struct.CleanedErrorText.html 🤣.

@BugenZhao
Copy link
Member Author

BugenZhao commented Nov 2, 2023

Tracking the task:

@xxchan
Copy link
Member

xxchan commented Nov 6, 2023

There are some good discussions. Note here for future reference.

#13215 about boxed error: benefits and limitations

#13248 about thiserror: what does its attributes do; when to add #[source] and #[backtrace] (almost always)

Copy link
Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants