-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sweeper: avoid deadlock on shutdown #4851
sweeper: avoid deadlock on shutdown #4851
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix looks good to me, would it be possible to add a unit test for this case, or is it too involved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good just have a question with regard to how the deadlock can actually happen.
|
||
// The collector exited and won't longer handle incoming | ||
// requests. This can happen on shutdown, when the block | ||
// notifier shuts down before the sweeper and its clients. In |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which case does the block notifier shut down? Aren't sweeper clients listen to the same quit channel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When lnd is signalled to shut down, the notifier will be stopped. This means the the blockEpoch
channel will close and the collector
goroutine in the sweeper will exit.
This all happens before the sweeper's quit
channel has been closed, and any of the subsystems attemping to sweep an input at this point will block which stalls lnd shut down.
In the case I saw the contractcourt
resolvers would attempt to sweep inputs, but since the notifier already had exited, the collector
go routine wouldn't pick up their request. This caused shutdown to deadlock, since the contractcourt
subsystems are stopped before the sweeper, and they would never exit.
ca9b9cd
to
ae9bdb4
Compare
Unit test added and comments addressed, PTAL! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
||
// TestSweeperShutdownHandling tests that we notify callers when the sweeper | ||
// cannot handle requests since it's in the process of shutting down. | ||
func TestSweeperShutdownHandling(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice!
sweep/sweeper_test.go
Outdated
select { | ||
case res := <-resultChan: | ||
require.Equal(t, ErrSweeperShuttingDown, res.Err) | ||
case <-time.After(2 * time.Second): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: defaultTestTimeout
?
nitnit: newline between cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 💯
ae9bdb4
to
d64c988
Compare
We risked deadlocking on shutdown if a client (in our case a contract resolver) attempted to schedule a sweep of an input after the ChainNotifier had been shut down. This would cause the `collector` goroutine to exit, and not handle incoming requests, causing a deadlock (since the ChainArbitrator is being stopped before the Sweeper in the server). To fix this we could change the order these subsystems are stopped, but this doesn't ensure there aren't other clients that could end up in the same deadlock scenario. So instead we keep handling the incoming requests even after the collector has exited (immediatly returning an error), until the sweeper is signalled to shutdown.
d64c988
to
77daa3d
Compare
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
We risked deadlocking on shutdown if a client (in our case a contract
resolver) attempted to schedule a sweep of an input after the
ChainNotifier had been shut down. This would cause the
collector
goroutine to exit, and not handle incoming requests, causing a deadlock
(since the ChainArbitrator is being stopped before the Sweeper in the
server).
To fix this we could change the order these subsystems are stopped, but
this doesn't ensure there aren't other clients that could end up in the
same deadlock scenario. So instead we keep handling the incoming
requests even after the collector has exited (imemdiately returning an
error), until the sweeper is signalled to shutdown.