Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: correctly maintain the source chain of error types #13248

Merged
merged 7 commits into from
Nov 7, 2023

Conversation

BugenZhao
Copy link
Member

@BugenZhao BugenZhao commented Nov 3, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Correctly mark the field with #[source], #[from], #[backtrace] or #[transparent] for all thiserror derived error types in the code base. Did almost nothing else.

We overlooked them a lot possibly due to the reason discussed in #13215:

  • What is even more concerning is that developers are not giving enough attention to thiserror annotations like #[from] and #[source], since they seem useless. However, they're really critical since they help us to maintain the source chain of an error.
    #[error("Array error: {0}")]
    ArrayError(ArrayError),

Cheatsheet for how to choose among them:

  • If there's no extra context to provide for the inner error, or the message of the inner error is self-contained enough, use #[error(transparent)].

    #[error(transparent)]
    ObjectStore(
    #[from]
    #[backtrace]
    ObjectError,
    ),

    A common case is uncategorized error, which we call internal error.
    #[error(transparent)]
    Internal(
    #[from]
    #[backtrace]
    anyhow::Error,
    ),

  • If you want to directly convert the inner error type to the defined type with ?, mark the error field with #[from].

    #[error("Unable to set up an ssl connection")]
    SslError(#[from] openssl::ssl::Error),

  • If the #[from] is not possible, or implicitly doing the conversion is not desired, mark the error field with #[source]. Generally this is because we need the variant name itself to provide some context.

    #[error("QueryError: {0}")]
    QueryError(#[source] BoxedError),
    #[error("ParseError: {0}")]
    ParseError(#[source] BoxedError),

  • NEVER leave the field unmarked, as long as it's an error type (i.e., impl std::error::Error)!

  • ALMOST ALWAYS mark field with #[backtrace] to allow recursive visiting to find the backtrace provided by the error tree, as long as it's anyhow::Error or an error type defined by us. Exceptions:

    • Errors from 3rd-party crates or the standard library: they rarely support providing the backtrace with Error::provide due to its instability.
    • Inner error is lightweight enough or acts as the leaf node: may omit it since we're sure there's no backtrace to visit, though it has no performance impact.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let me think aloud to learn thiserror again and ask some questions first:

https://docs.rs/thiserror/latest/thiserror/ (doc here for convenience)

First, all thiserror do is just to help you impl error methods (mainly Display + source() + backtrace), via the markers. No magic at all.

You get the same thing as if you had written an implementation of std::error::Error by hand

So it's not about incorrect usage of the crate, but more like not following practices of err definition (mainly the source pattern). And that's because we didn't use the source chain to print good err msg before.

So Q1: It seems maintain source chain ("the main trouble" and the PR title) is only about #[source]. Other refactors are just done together by convenicence? They look like different problems IMO.


Regarding the markers, it's not like "choose one", but each one provides a feature orthogonally. So for each err, we need to consider each marker.

Hmmm, actually not orthogonal, but have some implication relationships. But I think maybe we can consider them in the following order.

#[error()]

a must-have, for Display. It's at the outer level (enum label)

  • #[error(transparent)] is a special case. (= #[error("{0}")] and implies #[source])

Other markers are at the inner level (enum fields)

#[from]

  • If you want to directly convert the inner error type to the defined type with ?, mark the error field with #[from].

    Because #[from] generates From impl.

  • Also implies #[source]

  • If the #[from] is not possible, or implicitly doing the conversion is not desired, mark the error field with #[source].

    • "not possible" is easy to understand: "cannot derive From because another variant has the same source type"
    • Q2: When is "implicitly doing the conversion is not desired"?

#[source]:

the main problem. Lack of it causes us trouble. We should try to always add it. If not #[from] (or #[error(transparent)]), which gives us #[source] implicitly (we can also add it explicitly...), add it manually.

NEVER leave the field unmarked, as long as it's an error type

Q3: I think the "unmarked" here is exactly unmarked with #[source], right?

#[backtrace]

To provide backtrace, we either have a Backtrace field to capture now, or use #[backtrace] to rely on the inner error to capture one.

ALMOST ALWAYS ... Exceptions:

If we #[backtrace] in the exceptions' case, we lost a chance to get the backtrace, right?

But now I'm confused again...

Q4: For the exceptions, should we add the field to manually capture backtrace? If not, when some variants provide backtrace, and others don't, how do we use it? If we capture backtrace in the boxed error, it seems wasteful (and we will lost some context?)

@BugenZhao
Copy link
Member Author

Q1: It seems that maintain source chain ("the main trouble" and the PR title) is only about #[source]. Other refactors are just done together by convenicence? They look like different problems IMO.

There are barely other refactors (except for specifying #[backtrace] together in the PR). Some deletions of impl From are replaced by #[from] attributes.

#[error(transparent)] is a special case. (= #[error("{0}")] and implies #[source])

Basically yes, but not that precise. Actually transparent delegates all methods to the source error, instead of introducing a new "no-op" layer. This also matches the best practice that we should not include the source's message in Display.

Q2: When is "implicitly doing the conversion is not desired"?

An error type can be really general, like std::io::Error. Always converting it to a single variant (like IoError) does work, but is not very helpful for error reporting due to the lack of context. So developers may put it into a finer-grained variant (like ReadConfigFile) and intentionally not implement From for it, in case future functionalities (perhaps significantly differs from the reading config file) accidentally convert std::io::Error into that variant with ?.

BTW, there's also the case where we may want to attach some contexts for the inner error, where we're not able to implicitly cast certainly.

EvalError {
  #[source] #[backtrace]
  inner: ExprError,
  original_input: String,
}

Q3: I think the "unmarked" here is exactly unmarked with #[source], right?

Exactly.

If we #[backtrace] in the exceptions' case, we lost a chance to get the backtrace, right?

Yeah, and all that we can do is to capture the backtrace when From occurs.

If we capture backtrace in the boxed error, it seems wasteful (and we will lost some context?)

We can decide whether to capture a backtrace by first checking whether there's already one provided. 😄

If not, when some variants provide backtrace, and others don't, how do we use it?

What do you mean by "use"? IIUC, currently the only way to use it is to print it out. And it seems always to be a good-to-have but definitely not a must-have.

Actually, I've recently thinking about how much the backtrace can really help us. To be honest, I don't find most backtraces useful but suffer from its verbosity and performance overhead, except for panicking. 😕 What do you think?

@BugenZhao
Copy link
Member Author

Oh, seems that I forget to mention the implicit implementations based on field names. For example, an error named source will be automatically implemented as the source, and a backtrace will be (always) automatically captured.

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Copy link

codecov bot commented Nov 6, 2023

Codecov Report

Merging #13248 (4b1e456) into main (fe70f9e) will decrease coverage by 0.01%.
Report is 13 commits behind head on main.
The diff coverage is 77.27%.

@@            Coverage Diff             @@
##             main   #13248      +/-   ##
==========================================
- Coverage   68.09%   68.08%   -0.01%     
==========================================
  Files        1517     1517              
  Lines      257522   257499      -23     
==========================================
- Hits       175355   175324      -31     
- Misses      82167    82175       +8     
Flag Coverage Δ
rust 68.08% <77.27%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/batch/src/error.rs 0.00% <ø> (ø)
src/common/src/array/error.rs 0.00% <ø> (ø)
src/common/src/util/value_encoding/error.rs 0.00% <ø> (ø)
src/connector/src/error.rs 0.00% <ø> (ø)
src/connector/src/lib.rs 52.08% <ø> (ø)
src/connector/src/parser/unified/mod.rs 87.50% <ø> (ø)
src/connector/src/sink/mod.rs 60.71% <ø> (ø)
src/expr/core/src/error.rs 21.05% <ø> (ø)
src/frontend/src/binder/expr/value.rs 85.14% <100.00%> (ø)
src/frontend/src/lib.rs 20.00% <ø> (ø)
... and 18 more

... and 4 files with indirect coverage changes

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

@xxchan
Copy link
Member

xxchan commented Nov 6, 2023

If not, when some variants provide backtrace, and others don't, how do we use it?

What do you mean by "use"? IIUC, currently the only way to use it is to print it out. And it seems always to be a good-to-have but definitely not a must-have.

Yes, I just mean how to print it. I think backtrace_if_absent is enough to solve my confusion.

Actually, I've recently thinking about how much the backtrace can really help us. To be honest, I don't find most backtraces useful but suffer from its verbosity and performance overhead, except for panicking. 😕 What do you think?

I totally agree. However, I feel it's kind of "annoying most of the time, while really wanted but absent sometimes". 🤡 I don't have any better suggestions or solutions.

Other points quite clear. THanks!

#[error("Panicked when processing: {0}.
This is a bug. We would appreciate a bug report at https://github.com/risingwavelabs/risingwave/issues/new?labels=type%2Fbug&template=bug_report.yml")]
#[error("Panicked when processing: {0}.\n
This is a bug. We would appreciate a bug report at https://github.com/risingwavelabs/risingwave/issues/new?labels=type%2Fbug&template=bug_report.yml.")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer not to add trailing dot after url. 😟

Copy link
Member Author

@BugenZhao BugenZhao Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my own perspective, it's not about whether there's a URL. Since there's multiple sentences (even paragraphs) in the error message, maybe a period is necessary to become more consistent. 😟

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I know, but e.g., 😟
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can :\n<url>

Copy link
Member Author

@BugenZhao BugenZhao Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most terminal can parse this correctly with cmd+click. However it's not good for those copy-paste guys. 😢

image

maybe we can :\n<url>

Cool, will piggy-back in my next PR.

Comment on lines -198 to +202
.inspect_err(|error| error!(%error, "error when process message"));
.inspect_err(|error| {
error!(
error = error as &dyn std::error::Error,
"error when process message"
)
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why to change this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we used Display(%). By directly passing &dyn Error, the sources are also getting printed:

https://docs.rs/tracing-subscriber/latest/src/tracing_subscriber/fmt/format/mod.rs.html#1234

However it's ugly to cast to &dyn manually and I'm not quite sure why coercion does not work here. Need further investigation. 🙁

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others LGTM. Great!

@BugenZhao BugenZhao added this pull request to the merge queue Nov 7, 2023
Merged via the queue into main with commit e3c8649 Nov 7, 2023
@BugenZhao BugenZhao deleted the bz/maintain-error-source branch November 7, 2023 04:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants