Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error reporting for network & consensus errors #4015

Merged
merged 6 commits into from
Sep 27, 2022

Conversation

coot
Copy link
Contributor

@coot coot commented Sep 20, 2022

Description

runRethrowPolicy logs the exception through a separate tracer:
RethrowPolicyTracer. A seprate tracer will allow to not use a configuration
to disable it in cardano-node.

Also added a top level handler for exceptions thrown by the consensus code.
This tracer should also not be configurable. To avoid duplication it does not
log diiffusion startup failure exceptions and exceptions that were rethrown
from connection handler thread (which are wrapped in ExceptionInHandler).

Fixes #4000.

  • rethrow-policies: fixed typos
  • connection-handler: rethrow exceptions using ExceptionInHandler
  • ouroboros-consensus: updated consensusRethrowPolicy
  • ouroboros-newtork-framework: changed ExceptionInHandler
  • ouroboros-consensus: added top level error tracer

Checklist

  • Branch
    • Commit sequence broadly makes sense
    • Commits have useful messages
    • New tests are added if needed and existing tests are updated
    • If this branch changes Consensus and has any consequences for downstream repositories or end users, said changes must be documented in interface-CHANGELOG.md
    • If this branch changes Network and has any consequences for downstream repositories or end users, said changes must be documented in interface-CHANGELOG.md
    • If serialization changes, user-facing consequences (e.g. replay from genesis) are confirmed to be intentional.
  • Pull Request
    • Self-reviewed the diff
    • Useful pull request description at least containing the following information:
      • What does this PR change?
      • Why these changes were needed?
      • How does this affect downstream repositories and/or end-users?
      • Which ticket does this PR close (if any)? If it does, is it linked?
    • Reviewer requested

@coot coot added networking consensus issues related to ouroboros-consensus labels Sep 20, 2022
Copy link
Contributor

@bolt12 bolt12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

llrnRunDataDiffusion registry apps appsExtra
withRegistry $ \registry ->
handleJust
-- ignore exception thrown in connection handlers and diffusion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should provide a rationale on why we ignore these errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are already logged by the network layer. I'll update the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Included, I also caught a freshman mistake, check the diff 😁

@@ -86,12 +95,32 @@ data ErrorContext = OutboundError
type RethrowPolicy_ = ErrorContext -> SomeException -> ErrorCommand

newtype RethrowPolicy = RethrowPolicy {
runRethrowPolicy :: RethrowPolicy_
runRethrowPolicyPure :: RethrowPolicy_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the Pure suffix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is pure unlike runRethrowPolicy.

Initially, I thought that runRethrowPolicyPure won't be needed, but it's actually useful in consensusRethrowPolicy.

@coot
Copy link
Contributor Author

coot commented Sep 22, 2022

@bolt12 I added a commit which ignores ExceptionInHandler in top level Diffusion.InitializationTracer (they are now logged by RethrowPolicyTracer).

@coot
Copy link
Contributor Author

coot commented Sep 22, 2022

Hmm, I think this PR should be changed. Parts of it are inferior to what we already have, let me try to explain.

The diffusion already has a top level handler which logs all exceptions (the only confusing part about it is that the tracer is called InitializationTracer). If we introduce RethrowPolicyTracer we cannot distinguish errors thrown by connection handler which should kill the node (which ought to use Alert severity) from ones that just should kill the connection (which ought to have Info severity).

For this reason I will remove the part that introduced RethrowPolicyTracer.

This allows the consensus to use it's error policy to wrap around the
`Ouroboros.Network.Consensus.Node.runWith` to catch all initialisation
errors without interfering with all the exceptions rethrown by the
network layer.
Made `ExceptionInHandler` an existential type.  This allows to catch all
exceptions regardless of peer address type.
Renamed InitialisationTracer as DiffusionTracer: it not only logs
initialisation messages but also terminal errors.
@coot coot requested a review from bolt12 September 22, 2022 08:01
@coot
Copy link
Contributor Author

coot commented Sep 27, 2022

bors merge

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Sep 27, 2022

@iohk-bors iohk-bors bot merged commit 9249a70 into master Sep 27, 2022
@iohk-bors iohk-bors bot deleted the coot/rethrow-policy-tracer branch September 27, 2022 22:12
iohk-bors bot added a commit that referenced this pull request Nov 9, 2022
4120: Cherry picked network changes for cardano-node-1.35.5 release r=coot a=coot

This cherry-picked patches from the following PRs:

* #3794
* #3844
* #3785
* #3904
* #3915
* #3852
* #3970
* #3979
* #4015
* #4067
* #4004
* #4086
* #4113
* #4106
* #4127
* #4103

Also cherry-picked almost all the commits which modify GitHub actions:
* 18c5244 Run GitHub Actions on pull requests   
* 3adf5a9 Use newer version of io-sim           
* ee9b7a6 Fix GH Actions Windows CI: switch from pkgconf to pkg-config 
* e6cf074 github-actions: use `ubuntu-latest`   
* 9a8b959 Updated versions of github actions    
* fc8f8f0 Fix GH Actions Windows CI caching     
* 7f07c40 Windows Github Actions now use MSYS2  
* b21a7ce Fix chocolatey CI error
* #4134               

TODO:

* [x] bump versions of packages
* [x] input-output-hk/cardano-haskell-packages#84

Co-authored-by: Mark Tullsen <tullsen@galois.com>
Co-authored-by: Marcin Szamotulski <coot@coot.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus issues related to ouroboros-consensus
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RethrowPolicy is not logging exceptions, unlike ErrorPolicy
3 participants