-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle missing case in reestablish local commitment number checks #2721
Handle missing case in reestablish local commitment number checks #2721
Conversation
The data loss protect test was panicking in a message assertion which should be passing, but because the test was marked only `#[should_panic]` it was being treated as a successful outcome. Instead, we use `catch_unwind` on exactly the line we expect to panic to ensure we are hitting the right one.
Codecov ReportAttention:
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #2721 +/- ##
==========================================
+ Coverage 88.80% 90.09% +1.28%
==========================================
Files 113 114 +1
Lines 89170 101278 +12108
Branches 89170 101278 +12108
==========================================
+ Hits 79191 91243 +12052
+ Misses 7735 7647 -88
- Partials 2244 2388 +144 ☔ View full report in Codecov by Sentry. |
4a5d7f7
to
cc441dc
Compare
I dont think that the PR is doing that though, it just logs and closes the channel with a different case but it would still result in loss of funds from the looks of it. |
the off by one is fixed in the final commit |
lightning/src/ln/channel.rs
Outdated
} else { | ||
Err(ChannelError::Close("Peer attempted to reestablish channel with a very old remote commitment transaction".to_owned())) | ||
Err(ChannelError::Close(format!( | ||
"Peer attempted to reestablish channel with a future local commitment transaction: {} (received) vs {} (expected)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they have a future commitment shouldn't we just send them an an error and have them close, so we don't close with a revoked state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we won't be closing with a revoked state, it implies we have lost some state, but only a new commitment transaction, we haven't revoked the latest state we have, so its still safe to broadcast.
@@ -4295,8 +4302,18 @@ impl<SP: Deref> Channel<SP> where | |||
order: self.context.resend_order.clone(), | |||
}) | |||
} | |||
} else if msg.next_local_commitment_number < next_counterparty_commitment_number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the case before this is else if msg.next_local_commitment_number == next_counterparty_commitment_number - 1
which would qualify for this as well. Should this be msg.next_local_commitment_number < next_counterparty_commitment_number - 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its an else if and and the intent is to capture any "they're behind us" case (that we cannot recover from, which is handled in the above conditionals).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just think it'd be more clear if this was msg.next_local_commitment_number < next_counterparty_commitment_number - 1
since the statement before is else if msg.next_local_commitment_number == next_counterparty_commitment_number - 1
, shouldn't have to rely on the -1
in the previous statement to make this one correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, oh, okay so that's where the disagreement is. So the statement in the error message (even if not closing in response, or the "very" part of the comment) is correct with or without the - 1. Given the else
clause is "future", it seems better to use "very old" in the "one step old" case.
cc441dc
to
c4e9da4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK!
This pull request effectively addresses the scenario where the next_remote_commitment_number
is equal to our_commitment_transaction + 1
. Previously, this specific case was not accounted for, leading to the triggering of the else statement and resulting in a final close and hence loss of funds. Now, with the recent changes, this situation is properly handled under the 'We have fallen behind case…'
I'm planning to review the test soon and will share my complete review shortly!
FYI, since we started panic-ing here instead of allowing it to force close when running behind, we have had a lot less force closes and none that have resulted in justice transactions. And no reports of errors on the wallet side yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after squash
When we reestablish there are generally always 4 conditions for both local and remote commitment transactions: * we're stale and have possibly lost data * we're ahead and the peer has lost data * we're caught up * we're nearly caught up and need to retransmit one update. In especially the local commitment case we had a mess of different comparisons, which is improved here. Further, the error messages are clarified and include more information.
If we're behind exactly one commitment (which we've revoked), we'd previously force-close the channel, guaranteeing we'll lose funds as the counterparty has our latest local commitment state's revocation secret. While this shouldn't happen because users should never lose data, sometimes issues happen, and we should ensure we always panic. Further, `test_data_loss_protect` is updated to test this case.
1e4c520
to
ac3fd98
Compare
Squashed without any changes - $ git diff-tree -U1 1e4c520a ac3fd98e
$ |
@@ -4152,14 +4152,15 @@ impl<SP: Deref> Channel<SP> where | |||
return Err(ChannelError::Close("Peer sent an invalid channel_reestablish to force close in a non-standard way".to_owned())); | |||
} | |||
|
|||
let our_commitment_transaction = INITIAL_COMMITMENT_NUMBER - self.context.cur_holder_commitment_transaction_number - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somewhat poor choice for a variable name that's a number/index, and not an actual tx
If we're behind exactly one commitment (which we've revoked), we'd
previously force-close the channel, guaranteeing we'll lose funds
as the counterparty has our latest local commitment state's
revocation secret.
While this shouldn't happen because users should never lose data,
sometimes issues happen, and we should ensure we always panic.
Further,
test_data_loss_protect
is updated to test this case.