Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not broadcast commitment txn on Permanent mon update failure #1106

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

See doc updates for more info on the edge case this prevents, and
there isn't really a strong reason why we would need to broadcast
the latest state immediately. Specifically, in the case of HTLC
claims (the most important reason to ensure we have state on chain
if it cannot be persisted), we will still force-close if there are
HTLCs which need claiming and are going to expire.

Surprisingly, there were no tests which failed as a result of this
change, but a new one has been added.

@codecov
Copy link

codecov bot commented Oct 5, 2021

Codecov Report

Merging #1106 (5f848de) into main (107c6c7) will increase coverage by 1.71%.
The diff coverage is 90.00%.

❗ Current head 5f848de differs from pull request most recent head f99da12. Consider uploading reports for the commit f99da12 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1106      +/-   ##
==========================================
+ Coverage   90.40%   92.12%   +1.71%     
==========================================
  Files          68       66       -2     
  Lines       34796    40933    +6137     
==========================================
+ Hits        31458    37708    +6250     
+ Misses       3338     3225     -113     
Impacted Files Coverage Δ
lightning/src/ln/channelmanager.rs 89.20% <50.00%> (+5.68%) ⬆️
lightning/src/chain/channelmonitor.rs 91.28% <100.00%> (+0.33%) ⬆️
lightning/src/ln/chanmon_update_fail_tests.rs 98.79% <100.00%> (+1.14%) ⬆️
lightning/src/chain/mod.rs 58.82% <0.00%> (-2.29%) ⬇️
lightning-background-processor/src/lib.rs 93.75% <0.00%> (-0.49%) ⬇️
lightning/src/chain/onchaintx.rs 94.27% <0.00%> (-0.48%) ⬇️
lightning/src/ln/onion_utils.rs 94.91% <0.00%> (-0.46%) ⬇️
lightning-invoice/src/utils.rs 84.09% <0.00%> (-0.18%) ⬇️
lightning-persister/src/lib.rs 94.21% <0.00%> (-0.10%) ⬇️
lightning/src/ln/payment_tests.rs 98.75% <0.00%> (-0.09%) ⬇️
... and 23 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 107c6c7...f99da12. Read the comment docs.

lightning/src/chain/channelmonitor.rs Outdated Show resolved Hide resolved
lightning/src/chain/channelmonitor.rs Outdated Show resolved Hide resolved
Comment on lines 150 to 151
/// or, e.g. we've moved on to a different watchtower and cannot update with all watchtowers
/// that were previously informed of this channel).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to parse this watchtower case -- in this situation, we're switching watchtowers, but have no way of contacting the old watchtower to just delete our data, therefore the channel needs to close via PermanentFailure?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, but honestly its confusing, I replaced it with a better example.

lightning/src/chain/channelmonitor.rs Outdated Show resolved Hide resolved
lightning/src/chain/channelmonitor.rs Outdated Show resolved Hide resolved
Comment on lines 167 to 168
/// [`ChannelMonitor::get_latest_holder_commitment_txn`] once you've safely ensured no further
/// off-chain updates to the channel can occur.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking -- "ensure no further off-chain updates can occur" == "applied the final monitor update mentioned on L170"? Or?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means "no ChannelManager still knows about this channel", I reworded it.

lightning/src/chain/channelmonitor.rs Outdated Show resolved Hide resolved
lightning/src/chain/channelmonitor.rs Outdated Show resolved Hide resolved
@valentinewallace
Copy link
Contributor

Bit confused how to implement these new requirements in the sample. I guess we now need a proxy layer implementing Persist between the FilesystemPersister and ChainMonitor?

@TheBlueMatt
Copy link
Collaborator Author

Bit confused how to implement these new requirements in the sample.

I think we, in general, need to think hard about how we handle disk failures in the sample/lightning-persister. If the disk gets unplugged while we're running, we should just shut down and move on, but there's no way to get an error out of the Persister. The way the code works today is unsafe in this condition - if the disk gets unplugged, we'll broadcast the latest state and keep going, if the user then plugs the disk back in and restarts we may revoke the now-broadcasted state.

I think the end result for the sample needs to be a "really definitely broadcast the latest state from the channelmonitor for channel X" command, which is marked unsafe. We should probably also have a programatic way to detect monitors in this condition and not just rely on the logs in ChannelManager deserialization, I'll work on that bit here.

@TheBlueMatt
Copy link
Collaborator Author

Rebased on upstream. Need to make sure all the updated content in the failure enum docs is still kept and should rename temporaryfailure -> asyncpersist or so.

@codecov-commenter
Copy link

codecov-commenter commented Jul 17, 2022

Codecov Report

Base: 90.78% // Head: 91.21% // Increases project coverage by +0.43% 🎉

Coverage data is based on head (74745cb) compared to base (48d21ba).
Patch coverage: 89.09% of modified lines in pull request are covered.

❗ Current head 74745cb differs from pull request most recent head 52934e0. Consider uploading reports for the commit 52934e0 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1106      +/-   ##
==========================================
+ Coverage   90.78%   91.21%   +0.43%     
==========================================
  Files          86       87       +1     
  Lines       46631    50794    +4163     
  Branches    46631    50794    +4163     
==========================================
+ Hits        42335    46333    +3998     
- Misses       4296     4461     +165     
Impacted Files Coverage Δ
lightning/src/chain/mod.rs 68.18% <ø> (ø)
lightning/src/util/errors.rs 72.22% <0.00%> (ø)
lightning/src/chain/channelmonitor.rs 91.21% <75.00%> (+<0.01%) ⬆️
lightning/src/ln/channelmanager.rs 88.11% <83.23%> (+3.05%) ⬆️
lightning-invoice/src/payment.rs 93.13% <87.50%> (+2.26%) ⬆️
lightning/src/util/persist.rs 95.23% <87.50%> (+0.50%) ⬆️
lightning/src/chain/chainmonitor.rs 97.78% <88.46%> (-0.28%) ⬇️
lightning-persister/src/lib.rs 93.45% <100.00%> (ø)
lightning/src/ln/chanmon_update_fail_tests.rs 97.72% <100.00%> (+<0.01%) ⬆️
lightning/src/ln/channel.rs 88.66% <100.00%> (ø)
... and 17 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@TheBlueMatt TheBlueMatt force-pushed the 2021-10-no-perm-err-broadcast branch from 2572ac5 to 0fe1f28 Compare July 18, 2022 01:59
@TheBlueMatt TheBlueMatt marked this pull request as ready for review July 18, 2022 01:59
@TheBlueMatt
Copy link
Collaborator Author

Finally getting back to the monitor work, finally updated this to include a monitor update failure type rework, but no further functional changes.

lightning/src/chain/channelmonitor.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/chain/chainmonitor.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/chanmon_update_fail_tests.rs Show resolved Hide resolved
@TheBlueMatt
Copy link
Collaborator Author

Oops, somehow I totally lost track of this PR having a pending review, so sorry about that! Rebased on latest git and addressed feedback.

See doc updates for more info on the edge case this prevents, and
there isn't really a strong reason why we would need to broadcast
the latest state immediately. Specifically, in the case of HTLC
claims (the most important reason to ensure we have state on chain
if it cannot be persisted), we will still force-close if there are
HTLCs which need claiming and are going to expire.

Surprisingly, there were no tests which failed as a result of this
change, but a new one has been added.
@TheBlueMatt
Copy link
Collaborator Author

Rebased to address conflicts and went ahead and squashed as its been a while.

lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
@@ -4492,7 +4492,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
// We do not do a force-close here as that would generate a monitor update for
// a monitor that we didn't manage to store (and that we don't care about - we
// don't respond with the funding_signed so the channel can never go on chain).
let (_monitor_update, failed_htlcs) = chan.force_shutdown(true);
let (_monitor_update, failed_htlcs) = chan.force_shutdown(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is not test for this scenario. All tests pass when reverting to true.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its unobservable - note that the monitor_update, which is the place the bool is propagated to, is drop'd.

Comment on lines 49 to 50
/// An attempt to call watch/update_channel returned an Err (ie you did this!), causing the
/// attempted action to fail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the docs since it's no longer from an Err? Would be good to link to the corresponding method docs and maybe phrase the docs to be a little clearer, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, also a bit weird that it's still considered an APIError variant, but that might require some more brainstorming than a simple rename/doc update.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, did update the docs, but agree - this shouldnt really be an error at all, it should just be Ok. I think there's only one(-ish) place where it Errs - sending payments - so it should be easy to do in a followup.

Comment on lines +223 to +228
/// When this is returned, [`ChannelManager`] will force-close the channel but *not* broadcast
/// our current commitment transaction. This avoids a dangerous case where a local disk failure
/// (e.g. the Linux-default remounting of the disk as read-only) causes [`PermanentFailure`]s
/// for all monitor updates. If we were to broadcast our latest commitment transaction and then
/// restart, we could end up reading a previous [`ChannelMonitor`] and [`ChannelManager`],
/// revoking our now-broadcasted state before seeing it confirm and losing all our funds.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, while the user could choose not to broadcast the latest commitment transaction, which has been revoked, it could still be done by LDK if it contained any pending HTLCs that are expiring soon.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, somewhat of a separate issue that should be addressed in #1593

lightning/src/ln/channel.rs Outdated Show resolved Hide resolved
Comment on lines 49 to 50
/// An attempt to call watch/update_channel returned an Err (ie you did this!), causing the
/// attempted action to fail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, also a bit weird that it's still considered an APIError variant, but that might require some more brainstorming than a simple rename/doc update.

Copy link
Contributor

@wpaulino wpaulino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, feel free to squash.

lightning/src/ln/channel.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please squash.

lightning/src/util/errors.rs Outdated Show resolved Hide resolved
@TheBlueMatt
Copy link
Collaborator Author

Addressed one more comment and squashed. Diff since last push:

$ git diff-tree -U1 fc18acb4 4ac58048
diff --git a/lightning/src/util/errors.rs b/lightning/src/util/errors.rs
index 83324eab6..ad6993542 100644
--- a/lightning/src/util/errors.rs
+++ b/lightning/src/util/errors.rs
@@ -48,7 +48,9 @@ pub enum APIError {
 	},
-	/// An attempt to call watch/update_channel returned a
-	/// [`ChannelMonitorUpdateStatus::InProgress`] indicating the persistence of a monitor update
-	/// is awaiting async resolution. Once it resolves the attempted action should complete
-	/// automatically.
+	/// An attempt to call [`chain::Watch::watch_channel`]/[`chain::Watch::update_channel`]
+	/// returned a [`ChannelMonitorUpdateStatus::InProgress`] indicating the persistence of a
+	/// monitor update is awaiting async resolution. Once it resolves the attempted action should
+	/// complete automatically.
 	///
+	/// [`chain::Watch::watch_channel`]: crate::chain::Watch::watch_channel
+	/// [`chain::Watch::update_channel`]: crate::chain::Watch::update_channel
 	/// [`ChannelMonitorUpdateStatus::InProgress`]: crate::chain::ChannelMonitorUpdateStatus::InProgress

When a `chain::Watch` `ChannelMonitor` update method is called, the
user has three options:
 (a) persist the monitor update immediately and return success,
 (b) fail to persist the monitor update immediately and return
     failure,
 (c) return a flag indicating the monitor update is in progress and
     will complete in the future.

(c) is rather harmless, and in some deployments should be expected
to be the return value for all monitor update calls, but currently
requires returning `Err(ChannelMonitorUpdateErr::TemporaryFailure)`
which isn't very descriptive and sounds scarier than it is.

Instead, here, we change the return type used to be a single enum
(rather than a Result) and rename `TemporaryFailure`
`UpdateInProgress`.
If we receive a monitor event from a forwarded-to channel which
contains a preimage for an HTLC, we have to propogate that preimage
back to the forwarded-from channel monitor. However, once we have
that update, we're running in a relatively unsafe state - we have
the preimage in memory, but if we were to crash the forwarded-to
channel monitor will not regenerate the update with the preimage
for us. If we haven't managed to write the monitor update to the
forwarded-from channel by that point, we've lost the preimage, and,
thus, money!
This much more accurately represents the error, indicating that a
monitor update is in progress asynchronously and may complete at a
later time.
@TheBlueMatt
Copy link
Collaborator Author

Oops, sorry, reordered the missing hunk into the right commit, no ultimate diff, though.

@TheBlueMatt TheBlueMatt merged commit 7544030 into lightningdevkit:main Sep 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants