
OutputSweeper: Delay pruning until monitors have likely been archived #3559

Merged
3 commits merged into lightningdevkit:main from the 2025-01-sweeper-improvements branch on Jan 27, 2025

Conversation

@tnull (Contributor) commented Jan 24, 2025

Previously, we would prune tracked descriptors once we see a spend hit
ANTI_REORG_DELAY = 6 confirmations. However, this could lead to a
scenario where lingering ChannelMonitors waiting to be archived would
still regenerate and replay Event::SpendableOutputs, i.e., we would
re-add the same outputs (now unspendable, as they had in fact already
been spent) after having initially pruned them.

Here, we therefore keep the tracked descriptors around for longer, in
particular for at least ARCHIVAL_DELAY_BLOCKS + ANTI_REORG_DELAY = 4038
confirmations, at which point we assume the lingering monitors have
likely been archived and it is 'safe' for us to also forget about the
descriptors.
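
For concreteness, the arithmetic behind the new threshold (the constant values are as stated in this PR and its commits; the four-week reading assumes the usual ~144 blocks per day):

```rust
// Values as given in this PR; the actual definitions live in LDK's chain-monitoring code.
const ARCHIVAL_DELAY_BLOCKS: u32 = 4032; // ~4 weeks of blocks: 144 blocks/day * 28 days
const ANTI_REORG_DELAY: u32 = 6; // confirmations after which a spend is treated as irreversible

fn main() {
	// The new pruning threshold introduced by this PR.
	assert_eq!(ARCHIVAL_DELAY_BLOCKS + ANTI_REORG_DELAY, 4038);
}
```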

@tnull tnull requested a review from TheBlueMatt January 24, 2025 10:25
@@ -32,6 +32,9 @@ use bitcoin::{BlockHash, Transaction, Txid};

use core::ops::Deref;

/// The number of blocks we wait before we prune the tracked spendable outputs.
pub const PRUNE_DELAY_BLOCKS: u32 = ARCHIVAL_DELAY_BLOCKS + ANTI_REORG_DELAY;
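
For illustration, a minimal sketch of the pruning decision this constant feeds into, assuming a hypothetical TrackedSpendableOutput stand-in; the real OutputSweeper state machine in the crate is more involved:

```rust
const ARCHIVAL_DELAY_BLOCKS: u32 = 4032;
const ANTI_REORG_DELAY: u32 = 6;
pub const PRUNE_DELAY_BLOCKS: u32 = ARCHIVAL_DELAY_BLOCKS + ANTI_REORG_DELAY;

// Hypothetical stand-in for the sweeper's per-output tracking state.
struct TrackedSpendableOutput {
	// Height at which the spending transaction confirmed, if it has confirmed.
	spend_confirmation_height: Option<u32>,
}

impl TrackedSpendableOutput {
	fn is_prunable(&self, cur_height: u32) -> bool {
		self.spend_confirmation_height.map_or(false, |conf_height| {
			// Confirmations the spend has, counting the block it confirmed in.
			let confirmations = cur_height.saturating_sub(conf_height) + 1;
			// Previously this compared against ANTI_REORG_DELAY alone; now we also
			// wait out the monitor archival window before forgetting the descriptor.
			confirmations >= PRUNE_DELAY_BLOCKS
		})
	}
}

fn main() {
	let output = TrackedSpendableOutput { spend_confirmation_height: Some(800_000) };
	// Six confirmations was enough to prune before this PR, but no longer is.
	assert!(!output.is_prunable(800_005));
	// Well past the archival window, pruning is allowed again.
	assert!(output.is_prunable(800_000 + PRUNE_DELAY_BLOCKS));
}
```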
Contributor

how likely is it that 4032 blocks was insufficient, but 4038 was?

Contributor Author

Well, the idea here is that we start pruning after the monitors have been archived. For that to work we need to wait at least 4032 blocks, but of course we're not super sure how soon after the threshold has been met the user will actually call archive_fully_resolved_monitors, so I added the reorg delay on top. Could be debatable one way or another.

FWIW, I alternatively considered introducing a new MonitorArchivalComplete event type, but given that we intentionally keep the archival best-effort, it seemed wrong to go this way.
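
To illustrate the timing uncertainty discussed here, a rough sketch of the application side; every type and the 144-block cadence below are hypothetical stand-ins for an application's own LDK wiring, chosen only to show that archival happens at a point LDK cannot predict:

```rust
struct ChainMonitorHandle;

impl ChainMonitorHandle {
	// Stand-in for the best-effort monitor archival call discussed above.
	fn archive_fully_resolved_monitors(&self) { /* drop monitors whose outputs are resolved */ }
}

struct Node {
	chain_monitor: ChainMonitorHandle,
	last_archival_height: u32,
}

impl Node {
	// Called on each connected block; the archival cadence (here: every 144 blocks,
	// i.e. roughly daily) is an arbitrary application choice, which is exactly why the
	// sweeper cannot assume archival happens right at the 4032-block threshold.
	fn on_new_block(&mut self, height: u32) {
		if height >= self.last_archival_height + 144 {
			self.chain_monitor.archive_fully_resolved_monitors();
			self.last_archival_height = height;
		}
	}
}

fn main() {
	let mut node = Node { chain_monitor: ChainMonitorHandle, last_archival_height: 800_000 };
	node.on_new_block(800_144); // archival happens here, some blocks after outputs resolved
}
```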

Collaborator

pretty unlikely, but holding onto them doesn't really cost us anything, so might as well...

Contributor

yeah, I would have thought max(4032, 6) would also have worked, but given that it indeed costs nothing…

@TheBlueMatt (Collaborator) left a comment

LGTM, one comment.

tnull added 2 commits January 27, 2025 09:56
Previously we just used the 4032 magic number; here we put it in a
`pub const` that is reusable elsewhere.
Previously, we would prune tracked descriptors once we see a spend hit
`ANTI_REORG_DELAY = 6` confirmations. However, this could lead to a
scenario where lingering `ChannelMonitor`s waiting to be archived would
still regenerate and replay `Event::SpendableOutput`s, i.e., we would
re-add the same outputs (now unspendable, as they had in fact already
been spent) after having initially pruned them.

Here, we therefore keep the tracked descriptors around for longer, in
particular for at least `ARCHIVAL_DELAY_BLOCKS + ANTI_REORG_DELAY = 4038`
confirmations, at which point we assume the lingering monitors have
likely been archived and it is 'safe' for us to also forget about the
descriptors.
@tnull tnull force-pushed the 2025-01-sweeper-improvements branch from 0aa813f to 84412cc on January 27, 2025 08:56
@tnull tnull merged commit 79267d3 into lightningdevkit:main Jan 27, 2025
24 of 25 checks passed
@TheBlueMatt TheBlueMatt mentioned this pull request Jan 28, 2025
@TheBlueMatt (Collaborator)

Backported in #3567

TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jan 29, 2025
v0.1.1 - Jan 28, 2025 - "Onchain Matters"

API Updates
===========

 * A `ChannelManager::send_payment_with_route` was (re-)added, with semantics
   similar to `ChannelManager::send_payment` (rather than like the pre-0.1
   `send_payment_with_route`, lightningdevkit#3534).
 * `RawBolt11Invoice::{to,from}_raw` were added (lightningdevkit#3549).

Bug Fixes
=========

 * HTLCs which were forwarded where the inbound edge times out within the next
   three blocks will have the inbound HTLC failed backwards irrespective of the
   status of the outbound HTLC. This avoids the peer force-closing the channel
   (and claiming the inbound edge HTLC on-chain) even if we have not yet managed
   to claim the outbound edge on chain (lightningdevkit#3556).
 * On restart, replay of `Event::SpendableOutput`s could have caused
   `OutputSweeper` to generate double-spending transactions, making it unable to
   claim any delayed claims. This was resolved by retaining old claims for more
   than four weeks after they are claimed on-chain to detect replays (lightningdevkit#3559).
 * Fixed the additional feerate we will pay each time we RBF on-chain claims to
   match the Bitcoin Core policy (1 sat/vB) instead of 16 sats/vB (lightningdevkit#3457).
 * Fixed a case where a custom `Router` which returns an invalid `Route`,
   provided to `ChannelManager`, can result in an outbound payment remaining
   pending forever despite no HTLCs being pending (lightningdevkit#3531).

Security
========

0.1.1 fixes a denial-of-service vulnerability allowing channel counterparties to
cause force-closure of unrelated channels.
 * If a malicious channel counterparty force-closes a channel, broadcasting a
   revoked commitment transaction while the channel at closure time included
   multiple non-dust forwarded outbound HTLCs with identical payment hashes and
   amounts, failure to fail the HTLCs backwards could cause the channels on
   which we received the corresponding inbound HTLCs to be force-closed. Note
   that we'll receive, at a minimum, the malicious counterparty's reserve value
   when they broadcast the stale commitment (lightningdevkit#3556). Thanks to Matt Morehouse for
   reporting this issue.