Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make monitorPeerConnection non-blocking #4067

Merged
merged 11 commits into from
Oct 14, 2022

Conversation

coot
Copy link
Contributor

@coot coot commented Oct 10, 2022

Description

Fixes #4064

TODO:

  • improve network simulation test so it catches the problem (it will be done in another PR)
  • update changelog
  • connection-manager: reworded some comments
  • peer-state-actions: make monitorPeerSelection non-blocking
  • peer-state-actions: use updateUnlessCold
  • peer-state-actions: use epErrorDelay
  • exit-policy: derive semigroup via Max
  • exit-policy: derive Fractional instance for ReconnectDelay
  • peer-selection: trace reconnect delay with 'TraceDemoteAsynchronous'
  • peer-selection: refactor monitoring connections
  • peer-selection: added localRoots to PeerSelectionCounters

Checklist

  • Branch
    • Commit sequence broadly makes sense
    • Commits have useful messages
    • New tests are added if needed and existing tests are updated
    • If this branch changes Consensus and has any consequences for downstream repositories or end users, said changes must be documented in interface-CHANGELOG.md
    • If this branch changes Network and has any consequences for downstream repositories or end users, said changes must be documented in interface-CHANGELOG.md
    • If serialization changes, user-facing consequences (e.g. replay from genesis) are confirmed to be intentional.
  • Pull Request
    • Self-reviewed the diff
    • Useful pull request description at least containing the following information:
      • What does this PR change?
      • Why these changes were needed?
      • How does this affect downstream repositories and/or end-users?
      • Which ticket does this PR close (if any)? If it does, is it linked?
    • Reviewer requested

@coot coot requested a review from bolt12 October 11, 2022 16:58
@coot coot force-pushed the coot/peer-selection-action-fix branch from 6c535f1 to cf6f9dd Compare October 11, 2022 17:05
@coot coot marked this pull request as ready for review October 11, 2022 17:05
@coot coot requested a review from njd42 as a code owner October 11, 2022 17:05
@coot coot force-pushed the coot/peer-selection-action-fix branch from cf6f9dd to c733e87 Compare October 11, 2022 17:10
coot added a commit to IntersectMBO/cardano-node that referenced this pull request Oct 11, 2022
coot added a commit to IntersectMBO/cardano-node that referenced this pull request Oct 11, 2022
Comment on lines +215 to +217
-- | Monitor peer state. Must be non-blocking.
--
monitorPeerConnection :: peerconn -> STM m (PeerStatus, ReconnectDelay),
monitorPeerConnection :: peerconn -> STM m (PeerStatus, Maybe ReconnectDelay),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it must be non-blocking could we somehow get rid of STM?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no because the Guarded monoid is executed in STM monad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I had edited this comment don't know why it didn't get through. But I think this comment is misleading, we want monitorPeerConnection to be non-blocking in a particular function's logic. Saying this at the top level and then have a STM type is bit confusing. I'd be more accurate and explain in what situation we do not want to block

Comment on lines 756 to 764
-> STM m (Map MiniProtocolNum (Maybe (HasReturned a)))
-- do not block when a mini-protocol is still running
f = traverse (((\stm -> (Just <$> stm) `orElse` pure Nothing)))
<=< readTVar . ahMiniProtocolResults
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the culprit? Maybe explain here with more detail why we do not want this to block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

There's no need to wake up the outbound-governor multiple times when the
mini-protocols are terminating.

Added comments.
`ReconnectDelay` ought to use `max` as its semigroup rather than `+`.
This allows to use fractional literals without wrapping them in
`ReconnectDelay` constructor.
Instead of relaying on `policyErrorDelay` we can relay on the reconnect
delay computed by `monitorPeerConnection`.
@coot coot force-pushed the coot/peer-selection-action-fix branch from c733e87 to 332c303 Compare October 12, 2022 10:04
This will cause the `peerSelectionGovernor` to fail and will stop the
diffusion layer.  This is better than silently skipping update of the
state due to an asynchronous demotion.
@coot
Copy link
Contributor Author

coot commented Oct 14, 2022

bors merge

coot added a commit to IntersectMBO/cardano-node that referenced this pull request Oct 14, 2022
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Oct 14, 2022

@iohk-bors iohk-bors bot merged commit b995908 into master Oct 14, 2022
@iohk-bors iohk-bors bot deleted the coot/peer-selection-action-fix branch October 14, 2022 08:19
iohk-bors bot added a commit that referenced this pull request Nov 9, 2022
4120: Cherry picked network changes for cardano-node-1.35.5 release r=coot a=coot

This cherry-picked patches from the following PRs:

* #3794
* #3844
* #3785
* #3904
* #3915
* #3852
* #3970
* #3979
* #4015
* #4067
* #4004
* #4086
* #4113
* #4106
* #4127
* #4103

Also cherry-picked almost all the commits which modify GitHub actions:
* 18c5244 Run GitHub Actions on pull requests   
* 3adf5a9 Use newer version of io-sim           
* ee9b7a6 Fix GH Actions Windows CI: switch from pkgconf to pkg-config 
* e6cf074 github-actions: use `ubuntu-latest`   
* 9a8b959 Updated versions of github actions    
* fc8f8f0 Fix GH Actions Windows CI caching     
* 7f07c40 Windows Github Actions now use MSYS2  
* b21a7ce Fix chocolatey CI error
* #4134               

TODO:

* [x] bump versions of packages
* [x] input-output-hk/cardano-haskell-packages#84

Co-authored-by: Mark Tullsen <tullsen@galois.com>
Co-authored-by: Marcin Szamotulski <coot@coot.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Outbound governor does not remove a hot peer when mux errors
2 participants