-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Contract court acting on confirmed chain events #1017
Contract court acting on confirmed chain events #1017
Conversation
c0c6549
to
f0829f9
Compare
One integration tests hit what looks to be a deadlock. The other instance ran into this error (which may or may not be new?):
(or possibly is an incorrect test that was coded around a prior race condition?) |
The "transactions not found in mempool" error should be fixed when rebased on The deadlock I'm not sure about, not seen before 🤔 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! This is definitely something that should have been properly implemented when lnd
was created even. It should resolve a number of issues we've been running into in the integration tests, and also some weird states we've seen channels get into on mainnet.
Once this PR finally gets in, I'll make a follow up PR that modifies things to ensure we broadcast the commitment accordingly in the face of channel desynchronization.
@@ -1374,17 +1374,6 @@ func (c *ChannelArbitrator) channelAttendant(bestHeight int32, | |||
CommitResolution: uniClosure.CommitResolution, | |||
HtlcResolutions: *uniClosure.HtlcResolutions, | |||
} | |||
if contractRes.IsEmpty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on the simplification here!
contractcourt/briefcase.go
Outdated
// StateCommitmentBroadcasted is a state that indicates that the | ||
// attendant has broadcasted the commitment transaction, and is now | ||
// waiting for it to confirm. | ||
StateCommitmentBroadcasted ArbitratorState = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As is now, this'll break any dormant nodes that are in the StateContractClosed
state. We didn't use iota
here as we need the enums to be stable on disk. instead this should be added as the final state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -1475,6 +1475,7 @@ func (c *ChannelArbitrator) channelAttendant(bestHeight int32, | |||
log.Infof("ChannelArbitrator(%v): all "+ | |||
"contracts resolved, exiting", | |||
c.cfg.ChanPoint) | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I think I've seen a few lingering chan arbitrators that never exited, I think this was the souce
contractcourt/chain_watcher.go
Outdated
@@ -307,6 +311,13 @@ func (c *chainWatcher) closeObserver(spendNtfn *chainntnfs.SpendEvent) { | |||
&commitmentHash, | |||
) | |||
if isOurCommitment { | |||
if err := c.dispatchLocalClose( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming suggestion: dispatchLocalForceClose
.
contractcourt/chain_watcher.go
Outdated
if err != nil { | ||
return err | ||
} | ||
// With the event processed, we'll now notify all subscribers of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: missing new line above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
if err := net.DisconnectNodes(ctxt, net.Alice, net.Bob); err != nil { | ||
t.Fatalf("unable to disconnect Bob's peer from Alice's: err %v", err) | ||
var predErr error | ||
err = lntest.WaitPredicate(func() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why's this necessary now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we now don't consider the channel closed (adding the CloseSummary
to the db) before the commitment is on-chain, we run into a race with the above force close. Alternative is to modify the DisconnectPeer
rpc to allow disconnecting from a peer where the commitment has been broadcasted, but not confirmed yet?
chainntnfs/interface_test.go
Outdated
} | ||
|
||
// TODO(halseth): create new clients that should be registered after tx | ||
// is in the mempool already, when btcd supports notifying on these. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btcd
supports this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue created to address this.
channeldb/channel.go
Outdated
@@ -1551,6 +1596,7 @@ type ChannelCloseSummary struct { | |||
|
|||
// CloseType details exactly _how_ the channel was closed. Three | |||
// closure types are possible: cooperative, force, and breach. | |||
// TODO: migration logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No migration logic needed if we append the new close types to the end.
contractcourt/channel_arbitrator.go
Outdated
case <-c.cfg.ChainEvents.CooperativeClosure: | ||
log.Infof("ChannelArbitrator(%v) closing due to co-op "+ | ||
"closure", c.cfg.ChanPoint) | ||
return | ||
|
||
// We have broadcasted our commitment, and it is now confirmed | ||
// on-chain. | ||
case closeSummary := <-c.cfg.ChainEvents.LocalUnilateralClosure: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would need to double check, but I think it's the case that for all of these events, we need to ensure we always pass in the block height of the event. Otherwise, we may end up writing the incorrect height hint to disk, which has ramifications for neutrino (possibly missing an event).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now using the spendingHeight
when advancing state. Also changed the variable name to triggerHeight
to make it more clear.
// } | ||
} | ||
|
||
func TestChannelArbitratorCooperativeClose(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder to at least finish this pending started set of tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests finished.
Also needs a rebase in light of the recent merges. |
716d9f4
to
5f35d47
Compare
Addressed everything except integration tests and |
56547dd
to
7433356
Compare
contractcourt/channel_arbitrator.go
Outdated
// When the state machine reaches that state, we'll log | ||
// out the set of resolutions. | ||
stateCb := func(nextState ArbitratorState) error { | ||
if nextState == StateContractClosed { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you invert this condition, you can save a level of indentation and prevent a lot of line-wrapping.
if nextState != StateContractClosed {
return nil
}
Admittedly you'd have to have two "return nil"s then, but on net it still seems nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! 👍
c87c51f
to
93e23f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
This PR is getting veeeerrry close...
Moving to start to test this on some lives nodes to weed out any issues introduced and also test out the new RPC modifications.
One lingering thing w.r.t the PR is the incomplete unit tests. We can either block and wait for them to be completed (which will allow be to do some manual testing while they're being finalized), or go ahead with this PR and the follow up with another PR to add the unit tests. The other dependency to this PR (atm at least), is the change to fix the race condition between the brar and the chain watcher. Seems easiest to delete the commit which introduces the partial fix, in favor of going with the ultimate solution developed within the pending PR.
contractcourt/channel_arbitrator.go
Outdated
@@ -1406,7 +1406,7 @@ func (c *ChannelArbitrator) channelAttendant(bestHeight int32) { | |||
// We'll now advance our state machine until it reaches | |||
// a terminal state. | |||
_, _, err := c.advanceState( | |||
uint32(bestHeight), | |||
uint32(uniClosure.SpendingHeight), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
// trimmed. We'll need to wait for a CSV timeout before we can | ||
// reclaim the funds. | ||
commitRes := contractResolutions.CommitResolution | ||
if commitRes != nil && commitRes.MaturityDelay > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so.
if dbChan.IsBorked { | ||
peerLog.Warnf("ChannelPoint(%v) is borked, won't "+ | ||
"start.", chanPoint) | ||
if dbChan.ChanStatus != channeldb.Default { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Not a blocker, but we may want to add a String()
method to the ChanStatus
variables so this logs a bit nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
channeldb/db.go
Outdated
if channel.IsPending != pending { | ||
continue | ||
} | ||
if (channel.ChanStatus != Default) != waitingClose { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprised this actually compiles...
I think this would be clearer written as:
switch channel.ChanStatus {
case Borked && !waitingClose:
continue
case CommitmentBroadcasted && !waitingClose:
continue
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote and added comments to make it more clear what is going on.
|
||
// The commitment transaction has not been confirmed, so we expect to | ||
// see a maturity height and blocks til maturity of 0. | ||
assertCommitmentMaturity(t, forceClose, 0, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check was removed, but never replaced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with the check above for RecoveredBalance
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WaitingClose
state doesn't have the commitment maturity property, as it is implicit 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for RecoveredBalance
@@ -251,6 +251,30 @@ func (c *chainWatcher) SubscribeChannelEvents(syncDispatch bool) *ChainEventSubs | |||
|
|||
if syncDispatch { | |||
sub.ProcessACK = make(chan error, 1) | |||
|
|||
// If this client is syncDispatch, we cannot safely delete it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this commit need to be part of the wider set of commits?
Perhaps we should merge in the fix that you can @cfromknecht are working on before we merge this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept this commit for now, as it is needed to make the integration tests pass reliably. It should be reverted when #1104 is merged.
93e23f1
to
f69bf6f
Compare
Comments addressed and rebased! PTAL 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking really good here! The added unit tests seem comprehensive, and changes to the integration suite make sense. Nothing major from me, just a couple small things/questions :)
channeldb/channel.go
Outdated
@@ -285,6 +285,38 @@ type ChannelCommitment struct { | |||
// * lets just walk through | |||
} | |||
|
|||
// ChannelStatus is used to indicate whether an OpenChannel is in the default | |||
// usable state, or a state where it shouln't be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouln't -> shouldn't
// Borked indicates that the channel has entered an irreconcilable | ||
// state, triggered by a state desynchronization or channel breach. | ||
// Channels in this state should never be added to the htlc switch. | ||
Borked ChannelStatus = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we ever decide if we need to distinguish the case between realizing we are borked, or if we detect that they have been borked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'll be a follow which will lead to a full data-loss protection implementation.
channeldb/channel.go
Outdated
// closure types are possible: cooperative, force, and breach. | ||
// CloseType details exactly _how_ the channel was closed. Four closure | ||
// types are possible: cooperative, local force, remote force, and | ||
// breach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add FundingCancelled
to this list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we currently have a We do, will add.FundingCancelled
state within the daemon? The state could be added to the list when that scenario is something we're aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it removed in this diff? Didn't pick up...or are we talking about merging some of the other ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
contractcourt/chain_arbitrator.go
Outdated
|
||
return channel.CloseChannel(summary) | ||
MarkCommitmentBroadcasted: func() error { | ||
return channel.MarkCommitmentBroadcasted() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thinkk this closure is unnecessary, since they have the same signature
|
||
return NewChannelArbitrator(arbCfg, nil, testLog), | ||
resolvedChan, cleanUp, nil | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra newline
contractcourt/channel_arbitrator.go
Outdated
|
||
// As a new block has been connected to the end of the main | ||
// chain, we'll check to see if we need to make any on-chain | ||
// claims on behalf of the channel contract that we're | ||
// arbitrating for. | ||
chainActions := c.checkChainActions(uint32(bestHeight), | ||
chainActions := c.checkChainActions(uint32(triggerHeight), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like triggerHeight
is already a uint32
, maybe conversion is unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🔥
Needs rebase to resolve proto conflict due to recent merge. |
…t, advance from height of event not current height
…_watcher This commit changes the channel arbitrator state machine to only care about commitment transactions that are being confirmed on-chain according to the chain_watcher. This is meant to handles the cases where we would broadcast our commitment, expecting it to get confirmed, but instead a competing transaction was confirmed. This commit readies the ChannelArbitrator state machine for the change that will make the ChainWatcher only notify on confirmed commitments. The state machine has gotten a new state, StateCommitmentBroadcasted, which we'll transition to after we have broadcasted our own commitment. From this state we'll go to the StateContractClosed state regardless of which commitment the ChainWatcher notifies about, unifying the contract resolution betweee the local and remote force close.
…enChannel This commit changes the bool `IsBorked` in OpenChannel to a `ChanStatus` struct, of type ChannelStatus. This is used to indicated that a channel that is technically still open, is either borked, or has had a commitment broadcasted, but is not confirmed on-chain yet. The ChannelStatus type has the value 1 for the status Borked, meaning it is backwards compatible with the old database format.
…d when detected This commit changes the ChainWatcher to only send a chain event in case the various spends are _confirmed_ on-chain, not only seen on the network. A consequence of this is that we now give the ChainWatcher the responsibility of marking the channel closed when the closing tx is confirmed, instead of the ChannelArbitrator.
… instead of closed after broadcast
…elArbitrator config
This commit adds MVP unit tests for the following scenarios in the ChannelArbitrator: 1) A cooperative close is confirmed. 2) A remote force close is confirmed. 3) A local force close is requested and confirmed. 4) A local force close is requested, but a remote force close gets confirmed.
This commit adds a new method FetchWaitingCloseChannels to the database, used for fetching OpenChannels that have a ChanStatus != Default. These are channels that are borked, or have had a commitment broadcasted, and is now waiting for it to confirm. The fetchChannels method is rewritten to return channels exclusively based on wheter they are pending or waitingClose.
This commit changes from using FetchAllChannels to FetchAllOpenChannels, making the check for whether a channel is pending unnecessary.
…g channels response
This commit modifies the integration tests to work with the recent changes to the ChannelArbitrator, where it will only act on commitments that has been confirmed. Main changes involving when to look for transactions in the mempool and in blocks, and using the new RPC for getting channels in the "waiting close" phase when they are waiting for the commitment to confirm.
…cel() This commit makes clients subscribing to channel events that are marked "sync dispatch" _not_ being deleted from the list of clients when they call Cancel(). Instead a go routine will be launched that will send an error on every read of the ProcessACK channel. This fixes a race in handing off the breach info while lnd was shutting down. The breach arbiter could end up being shut down (and calling Cancel()) before while the ChainWatcher was in the process of dispatching a breach. Since the breach arbiter no longer was among the registered clients at this point, the ChainWatcher would assume the breach was handed off successfully, and mark the channel as pending closed. When lnd now was restarted, the breach arbiter would not know about the breach, and the ChainWatcher wouldn't attempt to re-dispatch, as it was already marked as pending closed.
9e27499
to
4320421
Compare
Decided to keep as is. Addressed last comment and rebased 💯 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🌟
This PR changes the way the
ChannelArbitrator
state machine is triggered, by making theChainEvents
sent from theChainWatcher
only being sent if a commitment is confirmed on-chain. This is done to ensure that if we act on a commitment (force close, breach, cooperative close etc), then the commitment we act on will not be replaced by another, valid commitment in-chain.ChannelArbitrator
The
ChannelArbitrator
's state machine is changed to accept any commitment being seen on-chain, regardless what state it is currently in. In particular, we would previously assume a local force close transaction always would be confirmed if we had successfully broadcasted it. Now we broadcast, and wait for any commitment to be accepted into a block before making any more assumptions.OpenChannel
By having to wait for a commitment to appear in-chain before taking any further action, a new "pending close state" a channel can be in is introduced. We add a field
BroadcastedCommitment
to theOpenChannel
struct, which keeps track of the intermediate state a channel can be in if the commitment has been broadcasted. This field is needed to make sure we don't try to forward payments over such a channel, as it might be our commitment that is broadcasted.ChainWatcher
The
ChainWatcher
now only dispatches confirmed events, making all it's clients unaware of any unconfirmed spends.Fixes #538
Fixes #880
Fixes #1112
Fixes #1071
Fixes #1091
Note: builds on top of #956 and #969, and the integration tests need Roasbeef/btcd#20 to avoid flakes.
TODO before merge:
Database migration logic