Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protocols/relay/src/v2: Refactor error handling #7

Merged
merged 6 commits into from
Dec 27, 2021

Conversation

mxinden
Copy link
Owner

@mxinden mxinden commented Dec 21, 2021

Does the following changes:

  • Distinguishes between fatal and non-fatal errors within v2/protocol/*.
  • Emits fatal errors via ProtocolsHandlerEvent::Close not attempting to emit events to the NetworkBehaviour or Transport before closing
  • Emits non-fatal errors via ProtocolsHandler::OutEvent and consecutively via NetworkBehaviour::OutEvent.

@elenaf9
Copy link

elenaf9 commented Dec 21, 2021

First of all: I don't feel strongly the whole FatalUpgradeError discussion and the current implementation looks good to me.

That being said, for the sake of the discussion I will still write down my thoughts; I think error handling in rust in general is an interesting topic with lots of different opinions.

So continuing on libp2p#2059 (comment):

Because of the above mentioned SwarmEvent I would still be in favor of reducing the FatalUpgradeError to the two for the user relevant variants IO and Decode. Otherwise it looks good to me.

I might be missing something. Why aggregate the fatal error types into the two (IO and Decode)? In other words, why loose the details?

Because imho said details are much more confusing than helpful.
For the lack of better words: Just throwing all available information at the user is not always the best decision, because most users aren't (and should not have to be) familiar with the implementation details, so it does more bad than good imho.
Instead, a library and all its user-facing types should abstract these implementation details away.

Ideally, when integrating a library, the user should look for each operation at the possible errors, differentiate between the different variants, and handle the errors accordingly. Some errors may be fatal, whereas in other cases, e.g. a timeout, the user would want to retry the operation.
While this sounds good in theory, it is in practice often difficult for the user if they didn't read through the whole library code.
And having error enums with a large number of variants makes this even harder, especially if:

  1. the names are not really self-descriptive, e.g. FatalUpgradeError::ParseTypeField
    • it deosn't mention that this is related to the decoding process
    • the user is not even aware of the underlying protobuf schema
  2. there are no code docs. In many cases when #[derive(thiserror::Errror)] is used, descriptive information is added to each error via #[errror(..)], but there are no code docs (unless I am missing something).

So the user would be confronted with 12 variants where it's

  • not clear what the actual error is unless they read the protocol details
  • they won't handle each variant individually anyway; at least not all 12.

On the contrary; it will much more likely cause them to just treat no variant separately at all, and either just wrap them if they are themselves a library (resulting in the kitchen-sink anti-pattern), or treat all variants the same, e.g. abort/ shutdown.

From my perspective, the only situation where a user needs the detailed information that FatalUpgradeError currently includes is for debug reasons. But this can be solved much better by either
a) log::debug with the details of why decoding failed
b) add a debug string to variant FatalUpgradeError::ParseTypeField(String) where the string includes the details

Now I know the latter is so far not really common in rust-libp2p. On the other hand, having a separate error variant for each step in the decoding process is not common in other rust-libp2p protocols either.

Looking at identify for instance, it does the decoding via

fn parse_proto_msg(msg: impl AsRef<[u8]>) -> Result<IdentifyInfo, io::Error>;

where io::Error is returned in case any step in the parsing failed due to an unexpected field. Imho this is much cleaner.

And especially for the case of decoding a protobuf: Does it really make a difference if decoding the received bytes to the protobuf message failed, or if that protobuf message then contains fields that are not what we expected? Both cases are protocol violations.

@mxinden
Copy link
Owner Author

mxinden commented Dec 27, 2021

Thank you @elenaf9 for the review and the detailed suggestion.

I still need to give the above more thoughts.

I agree that "Just throwing all available information at the user is not always the best decision". As you pointed out above, next to using the returned error as a simple signal that the operation failed, I am also (mis-) using it as a debugging instrument, containing all relevant information.

Instead, a library and all its user-facing types should abstract these implementation details away.

True. I am not yet sure how to combine the simple abstraction with good debuggability.

a) log::debug with the details of why decoding failed

Logging the detailed error and returning the abstracted error seems simple code-wise, but in my eyes hard to debug. E.g. you loose the coupling between log lines.

b) add a debug string to variant FatalUpgradeError::ParseTypeField(String) where the string includes the details

Including the detailed error in the abstracted error as a String hurts my desire for strong-typing. In case a user wants to react to a specific error variant, I really don't want them to parse a String.


If I am not mistaken we agree that this pull request represents a step forward. Thus I am merging it. I agree that we should give error handling more thoughts in rust-libp2p, ideally coming up with consistent patterns across protocol implementations.

mxinden added a commit that referenced this pull request Dec 27, 2021
See #7 for similar change in
libp2p-relay.
@mxinden mxinden merged commit 14e7912 into relay-v2 Dec 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants