Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make PeerCooling more robust #4748

Merged
merged 4 commits into from
Dec 14, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion cardano-client/cardano-client.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ library
ouroboros-network-api >= 0.5.2 && < 0.7,
ouroboros-network >= 0.9 && < 0.11,
ouroboros-network-framework >= 0.8 && < 0.11,
network-mux ^>= 0.4.2,
network-mux ^>= 0.4.4,

ghc-options: -Wall
-Wno-unticked-promoted-constructors
Expand Down
2 changes: 1 addition & 1 deletion cardano-ping/cardano-ping.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ library
si-timers ^>=1.3,
strict-stm,

network-mux ^>=0.4.2,
network-mux ^>=0.4.4,
tdigest ^>=0.3,
text >=1.2.4 && <2.1,
transformers >=0.5 && <0.7,
Expand Down
7 changes: 7 additions & 0 deletions network-mux/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,13 @@

### Non-breaking changes

* Make sure jobs are removed from the `JobPool`.
* Use `io-sim-1.3.1.0`.

## 0.4.3.0 -- 2023-11-16

### Non-breaking changes

* Use `io-sim-1.3.0.0`.

## 0.4.2.0
Expand Down
2 changes: 1 addition & 1 deletion network-mux/network-mux.cabal
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
cabal-version: 3.0

name: network-mux
version: 0.4.3.0
version: 0.4.4.0
synopsis: Multiplexing library
description: Multiplexing library.
license: Apache-2.0
Expand Down
6 changes: 6 additions & 0 deletions ouroboros-network-api/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,12 @@

## next release

## 0.6.2.0 -- 2023-12-14

### Non-breaking changes

* Refactored `NodeToNodeVersionData` decoder.

## 0.6.1.0 -- 2023-11-29

### Breaking changes
Expand Down
4 changes: 2 additions & 2 deletions ouroboros-network-api/ouroboros-network-api.cabal
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
cabal-version: 3.0
name: ouroboros-network-api
version: 0.6.1.0
version: 0.6.2.0
synopsis: A networking api shared with ouroboros-consensus
description: A networking api shared with ouroboros-consensus.
license: Apache-2.0
Expand Down Expand Up @@ -67,7 +67,7 @@ library
contra-tracer,

io-classes ^>=1.3.1,
network-mux ^>=0.4,
network-mux ^>=0.4.4,
strict-stm,
si-timers,
typed-protocols ^>=0.1.1,
Expand Down
6 changes: 6 additions & 0 deletions ouroboros-network-framework/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,12 @@

### Non-breaking changes

## 0.10.2.0 -- 2023-12-14

### Non-breaking changes

* Use `io-sim-1.3.1.0`.

## 0.10.1.0 -- 2023-11-16

### Non-breaking changes
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
cabal-version: 3.0
name: ouroboros-network-framework
version: 0.10.1.0
version: 0.10.2.0
synopsis: Ouroboros network framework
description: Ouroboros network framework.
license: Apache-2.0
Expand Down Expand Up @@ -98,9 +98,9 @@ library
, monoidal-synchronisation
^>=0.1.0.3
, network >=3.1.2.2 && < 3.2
, network-mux ^>=0.4.2
, network-mux ^>=0.4.4
, ouroboros-network-api
^>=0.6
^>=0.6.1
, ouroboros-network-testing
, typed-protocols ^>=0.1.1
, typed-protocols-cborg
Expand Down
5 changes: 5 additions & 0 deletions ouroboros-network-protocols/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,13 @@

### Non-breaking changes

## 0.6.1.0 -- 2023-12-14

### Non-breaking changes

* Testlib depends on `cardano-slotting`'s `testlib` at version
`0.1.2.0` and uses its instances.
* Use `io-sim-1.3.1.0`.

## 0.6.0.1 -- 2023-11-16

Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
cabal-version: 3.0
name: ouroboros-network-protocols
version: 0.6.0.1
version: 0.6.1.0
synopsis: Ouroboros Network Protocols
description: Ouroboros Network Protocols.
license: Apache-2.0
Expand Down Expand Up @@ -101,7 +101,7 @@ library
si-timers,

ouroboros-network-api
^>=0.6,
^>=0.6.1,
serialise,
typed-protocols ^>=0.1.1,
typed-protocols-cborg
Expand Down
6 changes: 6 additions & 0 deletions ouroboros-network-testing/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,12 @@

### Non-breaking changes

## 0.4.1.0 -- 2023-12-14

### Non-breaking changes

* Use `io-sim-1.3.1.0`

## 0.4.0.1 -- 2023-11-16

### Non-breaking changes
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
cabal-version: 3.0

name: ouroboros-network-testing
version: 0.4.0.1
version: 0.4.1.0
synopsis: Common modules used for testing in ouroboros-network and ouroboros-consensus
description: Common modules used for testing in ouroboros-network and ouroboros-consensus.
license: Apache-2.0
Expand Down
11 changes: 8 additions & 3 deletions ouroboros-network/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,20 @@
# Revision history for ouroboros-network


## next release

### Breaking changes

### Non-breaking changes

## 0.10.1.0 -- 2023-11-29
## 0.10.2.0 -- 2023-12-14

### Breaking changes
### Non-breaking changes

* Fixed a bug in `outbound-governor`: PR #4748. In rare cases the Outbound
Governor could lose track of a connection, and thus not being able to
reconnect to a remote peer.

## 0.10.1.0 -- 2023-11-29

### Non-breaking changes

Expand Down
8 changes: 4 additions & 4 deletions ouroboros-network/ouroboros-network.cabal
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
cabal-version: 3.0
name: ouroboros-network
version: 0.10.1.0
version: 0.10.2.0
synopsis: A networking layer for the Ouroboros blockchain protocol
description: A networking layer for the Ouroboros blockchain protocol.
license: Apache-2.0
Expand Down Expand Up @@ -129,9 +129,9 @@ library
io-classes-mtl ^>=0.1,
network-mux,
si-timers,
ouroboros-network-api ^>=0.6,
ouroboros-network-framework ^>=0.10,
ouroboros-network-protocols ^>=0.6,
ouroboros-network-api ^>=0.6.2,
ouroboros-network-framework ^>=0.10.2,
ouroboros-network-protocols ^>=0.6.1,
strict-stm,
typed-protocols ^>=0.1.1,
if !os(windows)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -272,15 +272,20 @@ connections PeerSelectionActions{
| peeraddr `Set.member` activePeers
, peeraddr `Set.notMember` inProgressDemoteHot = Just (PeerWarm, returnCommand)

-- a warm -> cooling transition has occurred if it is now cooling, and it was
-- warm, but not in the set we were deliberately demoting synchronously
-- a `{PeerHot,PeerWarm} -> PeerCooling` transition has occurred if it is
-- now cooling, and it was warm, but not in the set of peers being demoted
-- synchronously, e.g. `inProgressDemote{Hot,Warm}`.
--
-- If the peer is a member of the `inProgressDemoteToCold` set it means we
-- already accounted it, since we are adding peers to
-- `inProgressDemoteToCold` only if this function returns
-- `Just (PeerCooling, ...)`.
--
-- A peer in the `PeerCooling` state is going to be a member of the established set
-- until its connection is effectively terminated on the outbound side when
-- it will become `PeerCold`. We check if the peer does not exist in the
-- `inProgressDemoteToCold` to see if it is a new asynchronous demotion.
--
-- If the peer is a member of inProgressDemoteToCold it means we already
-- accounted it, e.g. traced it. A peer in cooling state is going to be a
-- member of the established set until its connection is effectively
-- terminated on the outbound side so we have to be careful. So, we need to
-- check if the peer does not exist in the inProgressDemoteToCold to see if
-- it is a new/unique async demotion. Same for hot->cooling transitions.
asyncDemotion peeraddr (PeerCooling, returnCommand)
| peeraddr `EstablishedPeers.member` establishedPeers
, peeraddr `Set.notMember` activePeers
Expand All @@ -297,8 +302,16 @@ connections PeerSelectionActions{
asyncDemotion peeraddr (PeerCold, returnCommand)
| peeraddr `EstablishedPeers.member` establishedPeers || peeraddr `Set.member` activePeers
, peeraddr `Set.notMember` inProgressDemoteWarm
, peeraddr `Set.notMember` inProgressDemoteHot
, peeraddr `Set.member` inProgressDemoteToCold = Just (PeerCold, returnCommand)
, peeraddr `Set.notMember` inProgressDemoteHot = Just (PeerCold, returnCommand)
-- Note:
--
-- We need to take care of direct transitions too `PeerCold` without going
-- through `PeerCooling` which can be triggered by
-- `deactivatePeerConnection`.
--
-- Also the peer might not be in `inProgressDemoteToCold`, that could
-- happen in `outbound-governor` skipped seeing `PeerCooling`. This can
-- happen under load or we could be just unlucky.

asyncDemotion _ _ = Nothing

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -596,7 +596,7 @@ withPeerStateActions PeerStateActionsArguments {
monitorPeerConnection,
activatePeerConnection,
deactivatePeerConnection,
closePeerConnection
closePeerConnection = void . closePeerConnection
}

where
Expand Down Expand Up @@ -705,11 +705,15 @@ withPeerStateActions PeerStateActionsArguments {
-- supposed to terminate (unless the remote peer did something
-- wrong).
Just (WithSomeProtocolTemperature (WithWarm MiniProtocolSuccess {})) -> do
closePeerConnection pch
peerMonitoringLoop pch
isCooling <- closePeerConnection pch
if isCooling
then peerMonitoringLoop pch
else return ()
Just (WithSomeProtocolTemperature (WithEstablished MiniProtocolSuccess {})) -> do
closePeerConnection pch
peerMonitoringLoop pch
isCooling <- closePeerConnection pch
if isCooling
then peerMonitoringLoop pch
else return ()

Nothing ->
traceWith spsTracer (PeerStatusChanged (CoolingToCold pchConnectionId))
Expand Down Expand Up @@ -974,7 +978,7 @@ withPeerStateActions PeerStateActionsArguments {


closePeerConnection :: PeerConnectionHandle muxMode responderCtx peerAddr versionData ByteString m a b
-> m ()
-> m Bool
closePeerConnection
PeerConnectionHandle {
pchConnectionId,
Expand Down Expand Up @@ -1002,21 +1006,24 @@ withPeerStateActions PeerStateActionsArguments {
Nothing -> do
-- timeout fired
Mux.stopMux pchMux
atomically (writeTVar pchPeerStatus PeerCooling)
traceWith spsTracer (PeerStatusChangeFailure
(WarmToCooling pchConnectionId)
TimeoutError)
wasWarm <- atomically (updateUnlessCoolingOrCold pchPeerStatus PeerCooling)
when wasWarm $
traceWith spsTracer (PeerStatusChangeFailure
(WarmToCooling pchConnectionId)
TimeoutError)
return wasWarm

Just (SomeErrored errs) -> do
-- some mini-protocol errored
--
-- we don't need to notify the connection manager, we can instead
-- rely on mux property: if any of the mini-protocols errors, mux
-- throws an exception as well.
atomically (writeTVar pchPeerStatus PeerCooling)
traceWith spsTracer (PeerStatusChangeFailure
(WarmToCooling pchConnectionId)
(ApplicationFailure errs))
wasWarm <- atomically (updateUnlessCoolingOrCold pchPeerStatus PeerCooling)
when wasWarm $
traceWith spsTracer (PeerStatusChangeFailure
(WarmToCooling pchConnectionId)
(ApplicationFailure errs))
throwIO (MiniProtocolExceptions errs)

Just AllSucceeded {} -> do
Expand All @@ -1026,8 +1033,10 @@ withPeerStateActions PeerStateActionsArguments {
-- connection manager would simultaneously promote it, but this is not
-- possible.
_ <- unregisterOutboundConnection spsConnectionManager (remoteAddress pchConnectionId)
atomically (writeTVar pchPeerStatus PeerCooling)
traceWith spsTracer (PeerStatusChanged (WarmToCooling pchConnectionId))
wasWarm <- atomically (updateUnlessCoolingOrCold pchPeerStatus PeerCooling)
when wasWarm $
traceWith spsTracer (PeerStatusChanged (WarmToCooling pchConnectionId))
return wasWarm

--
-- Utilities
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,14 @@ data PeerStatus =
| PeerCooling
-- ^ Peer is in cold state but its connection still lingers.
-- I.e. it is still in progress to be fully demoted.
--
-- Note:
-- The `PeerCooling -> PeerCold` state transition is an `outbound-governor`
-- reflection of the connection-manager's `TerminatingSt -> TerminatedSt`
-- state transition (our version of tcp's `TimeWait`). It is only
-- triggered in case of a clean connection shutdown, not in the case of
-- errors.
--
| PeerWarm
| PeerHot
deriving (Eq, Ord, Show)
Expand Down
Loading