Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sweeper: avoid deadlock on shutdown #4851

Merged
merged 1 commit into from
Dec 10, 2020

Conversation

halseth
Copy link
Contributor

@halseth halseth commented Dec 9, 2020

We risked deadlocking on shutdown if a client (in our case a contract
resolver) attempted to schedule a sweep of an input after the
ChainNotifier had been shut down. This would cause the collector
goroutine to exit, and not handle incoming requests, causing a deadlock
(since the ChainArbitrator is being stopped before the Sweeper in the
server).

To fix this we could change the order these subsystems are stopped, but
this doesn't ensure there aren't other clients that could end up in the
same deadlock scenario. So instead we keep handling the incoming
requests even after the collector has exited (imemdiately returning an
error), until the sweeper is signalled to shutdown.

Copy link
Collaborator

@carlaKC carlaKC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix looks good to me, would it be possible to add a unit test for this case, or is it too involved?

sweep/sweeper.go Show resolved Hide resolved
sweep/sweeper.go Show resolved Hide resolved
Copy link
Collaborator

@bhandras bhandras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good just have a question with regard to how the deadlock can actually happen.

sweep/sweeper.go Show resolved Hide resolved

// The collector exited and won't longer handle incoming
// requests. This can happen on shutdown, when the block
// notifier shuts down before the sweeper and its clients. In
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case does the block notifier shut down? Aren't sweeper clients listen to the same quit channel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When lnd is signalled to shut down, the notifier will be stopped. This means the the blockEpoch channel will close and the collector goroutine in the sweeper will exit.

This all happens before the sweeper's quit channel has been closed, and any of the subsystems attemping to sweep an input at this point will block which stalls lnd shut down.

In the case I saw the contractcourt resolvers would attempt to sweep inputs, but since the notifier already had exited, the collector go routine wouldn't pick up their request. This caused shutdown to deadlock, since the contractcourt subsystems are stopped before the sweeper, and they would never exit.

@halseth halseth force-pushed the sweeper-cnct-deadlock branch from ca9b9cd to ae9bdb4 Compare December 10, 2020 10:35
@halseth
Copy link
Contributor Author

halseth commented Dec 10, 2020

Unit test added and comments addressed, PTAL!

@halseth halseth requested review from bhandras and carlaKC December 10, 2020 10:39
Copy link
Collaborator

@carlaKC carlaKC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!


// TestSweeperShutdownHandling tests that we notify callers when the sweeper
// cannot handle requests since it's in the process of shutting down.
func TestSweeperShutdownHandling(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice!

select {
case res := <-resultChan:
require.Equal(t, ErrSweeperShuttingDown, res.Err)
case <-time.After(2 * time.Second):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: defaultTestTimeout?
nitnit: newline between cases

Copy link
Collaborator

@bhandras bhandras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 💯

@halseth halseth force-pushed the sweeper-cnct-deadlock branch from ae9bdb4 to d64c988 Compare December 10, 2020 11:10
We risked deadlocking on shutdown if a client (in our case a contract
resolver) attempted to schedule a sweep of an input after the
ChainNotifier had been shut down. This would cause the `collector`
goroutine to exit, and not handle incoming requests, causing a deadlock
(since the ChainArbitrator is being stopped before the Sweeper in the
server).

To fix this we could change the order these subsystems are stopped, but
this doesn't ensure there aren't other clients that could end up in the
same deadlock scenario. So instead we keep handling the incoming
requests even after the collector has exited (immediatly returning an
error), until the sweeper is signalled to shutdown.
@halseth halseth force-pushed the sweeper-cnct-deadlock branch from d64c988 to 77daa3d Compare December 10, 2020 12:27
@halseth halseth merged commit 08bb8ab into lightningnetwork:master Dec 10, 2020
yyforyongyu added a commit to yyforyongyu/lnd that referenced this pull request Nov 18, 2024
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit to yyforyongyu/lnd that referenced this pull request Nov 18, 2024
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit to yyforyongyu/lnd that referenced this pull request Nov 18, 2024
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 18, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 18, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 18, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 19, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit to yyforyongyu/lnd that referenced this pull request Nov 19, 2024
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 19, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 19, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 19, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 21, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 21, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 21, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 25, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Nov 27, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Dec 3, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Dec 3, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Dec 5, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Dec 6, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit to yyforyongyu/lnd that referenced this pull request Dec 9, 2024
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit to yyforyongyu/lnd that referenced this pull request Dec 9, 2024
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Dec 10, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit to yyforyongyu/lnd that referenced this pull request Dec 10, 2024
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit to yyforyongyu/lnd that referenced this pull request Dec 12, 2024
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit to yyforyongyu/lnd that referenced this pull request Dec 13, 2024
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Dec 15, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Dec 17, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Dec 19, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit to yyforyongyu/lnd that referenced this pull request Dec 20, 2024
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Dec 20, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
yyforyongyu added a commit that referenced this pull request Dec 20, 2024
This commit removes the hack introduced in #4851. Previously we had this
issue because the chain notifier was stopped before the sweeper, which
was changed a while back and we now always stop the chain notifier last.
In addition, since we no longer subscribe to the block epoch chan
directly, this issue can no longer happen.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants