
Add test for aggregated revoked HTLC claim on anchors channel #2034

Merged

Conversation

wpaulino
Contributor

This PR adds a new test to cover the aggregated revoked HTLC claim edge case for channels featuring anchor outputs.

Along the way, we also fix a few issues to allow the anchors build to run, and we start running it in CI.

Fixes #1905.

@wpaulino wpaulino force-pushed the anchor-revoked-aggregate-claim branch from 581d8b3 to 84ec2f6 Compare February 15, 2023 17:52
// is txid of the initial claiming transaction and is immutable until outpoint is
// post-anti-reorg-delay solved, confirmation_block is used to erase entry if
// block with output gets disconnected.
pending_claim_events: Vec<(PackageID, ClaimEvent)>,
Collaborator

Right, I think this is totally fine, but what's the worst case here? Can we get a combinatorial explosion from the ~800 HTLCs in a state? In practice we rarely combine HTLC claims, but let's say we have one HTLC claim which is soon and 800 which are a while away; then we could combine them into one claim, and then the counterparty could peel them off by claiming on-chain one at a time? That's still only 800 entries, which is fine. Is there a way for them to somehow get N^2 behavior or worse?

Contributor Author

As I was confirming the runtime cost here, I realized that pending_claim_requests keeps a bunch of stale entries if something like this ends up happening, because we don't clear the previous claim (now invalid since the counterparty claimed at least one of the same outputs as us). The reason for this is to merge the output the counterparty claimed back into its original claim if the counterparty's claim is reorged out of the chain.

So if we aggregate the HTLCs to timeout into a single claim, and the counterparty claims them one-by-one with the preimage, we'll end up with a new entry in pending_claim_requests per HTLC the counterparty claimed, but only one entry for the latest claim in pending_claim_events. The reason the events don't grow linearly in this case is that we can be more aggressive about removing them: they don't represent the "main" state of a claim/request, they're simply a proxy to the consumer's claim.

1. Claiming 483 HTLCs in a single claim:
   • 1 pending_claim_requests (483 HTLCs)
   • 1 pending_claim_events (483 HTLCs)
2. Counterparty claims 1 HTLC:
   • 2 pending_claim_requests (first 483 HTLCs (stale), second 482 HTLCs)
   • 1 pending_claim_events (482 HTLCs)
3. Counterparty claims another HTLC:
   • 3 pending_claim_requests (first 483 HTLCs (stale), second 482 HTLCs (stale), third 481 HTLCs)
   • 1 pending_claim_events (481 HTLCs)
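
To make the counting above concrete, here is a minimal sketch of the bookkeeping with stand-in types (plain integer IDs and HTLC counts rather than the real PackageID/PackageTemplate/ClaimEvent structures), showing the requests map accumulating stale entries while the events vec is replaced in place:

use std::collections::HashMap;

fn main() {
    // id -> number of HTLCs still covered by that claim
    let mut pending_claim_requests: HashMap<u32, usize> = HashMap::new();
    let mut pending_claim_events: Vec<(u32, usize)> = Vec::new();

    // 1. One aggregated claim over 483 HTLCs.
    pending_claim_requests.insert(0, 483);
    pending_claim_events.push((0, 483));

    // 2./3. The counterparty peels off one HTLC at a time with the preimage.
    for (new_id, htlcs_left) in [(1u32, 482usize), (2, 481)] {
        // The superseded request goes stale but is kept for the reorg case...
        pending_claim_requests.insert(new_id, htlcs_left);
        // ...while the superseded event is dropped and replaced outright.
        pending_claim_events.retain(|(id, _)| *id != new_id - 1);
        pending_claim_events.push((new_id, htlcs_left));
    }

    assert_eq!(pending_claim_requests.len(), 3); // grows linearly, bounded by the HTLC count
    assert_eq!(pending_claim_events.len(), 1);   // never grows
}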

Collaborator

Ah! Indeed, I misread. Yea, okay, so this doesn't even grow at all, cool. FWIW, I'm not at all worried about one-entry-per-HTLC (isn't it 966, i.e. 483 max per direction?), only if we get to N^3 or something stupid do we need to start worrying about it.

I still think it's 483, as we shouldn't aggregate offered outputs (timeout path for us) and received outputs (preimage path for us) into a single pending_claim_events entry. The runtime cost reasoning sounds correct; it's at worst 482 pending_claim_requests entries + 1 pending_claim_events entry, IIUC.

Can document the above reasoning near pending_claim_events.

@wpaulino wpaulino force-pushed the anchor-revoked-aggregate-claim branch 2 times, most recently from fead0c6 to 1368a62 Compare February 15, 2023 20:07
@codecov-commenter

codecov-commenter commented Feb 15, 2023

Codecov Report

Patch coverage: 90.67% and project coverage change: -0.14 ⚠️

Comparison is base (48fa2fd) 91.33% compared to head (ba8a280) 91.20%.

❗ Current head ba8a280 differs from pull request most recent head d5bd25c. Consider uploading reports for the commit d5bd25c to get more accurate results


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2034      +/-   ##
==========================================
- Coverage   91.33%   91.20%   -0.14%     
==========================================
  Files         101      101              
  Lines       48837    49489     +652     
  Branches    48837    49489     +652     
==========================================
+ Hits        44604    45134     +530     
- Misses       4233     4355     +122     
Impacted Files Coverage Δ
lightning/src/util/events.rs 34.05% <0.00%> (+2.51%) ⬆️
lightning/src/chain/onchaintx.rs 92.66% <81.57%> (-3.11%) ⬇️
lightning/src/ln/monitor_tests.rs 97.54% <93.08%> (-2.10%) ⬇️
lightning/src/ln/chan_utils.rs 94.62% <100.00%> (+0.91%) ⬆️
lightning/src/ln/functional_test_utils.rs 92.65% <100.00%> (+0.23%) ⬆️

... and 36 files with indirect coverage changes


@TheBlueMatt
Collaborator

Feel free to squash. LGTM, I think; it needs another review though.

@wpaulino wpaulino force-pushed the anchor-revoked-aggregate-claim branch from 1368a62 to 1bfeea5 Compare February 17, 2023 02:40
@ariard ariard left a comment

Thanks for the additional test coverage, gonna look more.

@ariard ariard left a comment

Thanks for the great test!

@@ -248,15 +252,17 @@ pub struct OnchainTxHandler<ChannelSigner: WriteableEcdsaChannelSigner> {
pub(crate) pending_claim_requests: HashMap<PackageID, PackageTemplate>,
#[cfg(not(test))]
pending_claim_requests: HashMap<PackageID, PackageTemplate>,

// Used to track external events that need to be forwarded to the `ChainMonitor`.

The expected forwarding frequency to ChainMonitor could be documented more.

IIRC, we generate a new ClaimEvent in update_claims_view_from_requests, which itself relies on two events: a new block being connected and an off-chain preimage release for forwarding. ChainMonitor could call get_and_clear_pending_claim_events() more often to generate ClaimEvent::BumpHTLC in reaction to the second trigger. However, in the absence of fee estimation based on ongoing mempool congestion, I don't think it matters.

Contributor Author

It's not public, so it can only follow the same frequency as ChannelMonitor::get_and_clear_pending_events, which we already require to be called pretty often (at intervals orders of magnitude shorter than the average block time).


Yeah, not public, though I still think those interfaces might be exposed more some day for dual-funding/splicing support, if OnchainTxHandler becomes its own trait :)

@wpaulino
Contributor Author

wpaulino commented Mar 6, 2023

Rebased and updated the test to claim HTLCs across two commitments per @ariard's request.

@TheBlueMatt
Collaborator

Assigned @arik-so as second reviewer.

@@ -1232,24 +1232,23 @@ macro_rules! check_warn_msg {

/// Check that a channel's closing channel update has been broadcasted, and optionally
Contributor

this sentence hurts my head lol

MessageSendEvent::BroadcastChannelUpdate { ref msg } => {
assert_eq!(msg.contents.flags & 2, 2);
None
},
MessageSendEvent::HandleError { action: msgs::ErrorAction::SendErrorMessage { ref msg }, node_id: _ } => {
Contributor

is there a point in making sure that the handle error message is odd?

Contributor Author

Their order doesn't really matter – one message is at the gossip level, the other at the peer/channel level.

}

// Bob force closes by broadcasting his revoked state for each channel.
nodes[1].node.force_close_broadcasting_latest_txn(&chan_a.2, &nodes[0].node.get_our_node_id()).unwrap();
Contributor

This kinda looks like it wants to be a lambda, seeing how the only change is that it's chan_a once and chan_b once.

Contributor Author

Definitely could be. FWIW this is being simplified a bit in #2059, so I think we can leave it as is here.
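
For illustration, the closure suggested above might look roughly like this; it is only a sketch that assumes the test's surrounding nodes, chan_a, and chan_b bindings, not code from the PR:

// Hypothetical test-local helper; relies on the surrounding test context.
let force_close_revoked = |channel_id: &[u8; 32]| {
    nodes[1].node.force_close_broadcasting_latest_txn(channel_id, &nodes[0].node.get_our_node_id()).unwrap();
};
force_close_revoked(&chan_a.2);
force_close_revoked(&chan_b.2);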

@ariard ariard left a comment

Thanks for extending the test coverage to the correct punishment of multiple revoked channels!

// end up producing an invalid transaction by double spending
// input(s) that already have a confirmed spend. If such spend is
// reorged out of the chain, then we'll attempt to re-spend the
// inputs once we see it.

We still re-generate an adequate ClaimEvent at L965 in block_disconnected, though I'm not sure we have test coverage for both the post-anchors bump-HTLC and bump-commitment cases? At least it doesn't seem so from a look at monitor_tests, reorg_tests, and functional_tests. This can be added in future PRs to avoid delaying this one further.

Contributor Author

Yeah, I definitely plan on improving the test coverage; this PR is mostly just about addressing the revoked commitment case.

@wpaulino wpaulino force-pushed the anchor-revoked-aggregate-claim branch from 3173fbc to ac90285 Compare March 10, 2023 20:28
@@ -846,10 +885,15 @@ impl<ChannelSigner: WriteableEcdsaChannelSigner> OnchainTxHandler<ChannelSigner>
#[cfg(anchors)]
OnchainClaim::Event(claim_event) => {
log_info!(logger, "Yielding RBF-bumped onchain event to spend inputs {:?}", request.outpoints());
self.pending_claim_events.insert(*first_claim_txid, claim_event);
if let Some((existing_claim_idx, _)) = self.pending_claim_events.iter().enumerate()
Contributor

Should this stop at the first find? Also, what's the utility of this rather verbose expression over a retain for inequality?

Contributor Author

There should be at most one entry per package ID. No reason we can't use retain, though.
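
A retain-based version of that replacement could look roughly like this (a minimal sketch with simplified stand-in types, not the actual OnchainTxHandler code):

type PackageID = [u8; 32];

struct ClaimEvent; // stand-in for the real claim event type

fn upsert_claim_event(
    pending_claim_events: &mut Vec<(PackageID, ClaimEvent)>,
    claim_id: PackageID,
    claim_event: ClaimEvent,
) {
    // Drop any stale event for this package ID (there is at most one), then
    // append the fresh event so the generation order of events is preserved.
    pending_claim_events.retain(|(id, _)| *id != claim_id);
    pending_claim_events.push((claim_id, claim_event));
}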

@wpaulino wpaulino force-pushed the anchor-revoked-aggregate-claim branch 2 times, most recently from 2746dc2 to ba8a280 Compare March 14, 2023 14:40
@wpaulino
Contributor Author

Also rebased on main again for the Windows CI fix.

@wpaulino
Contributor Author

Hmm, seeing an unrelated failure when running with the anchors flag on our MSRV:

error[E0599]: no function or associated item named `as_ptr` found for type `std::sync::Arc<_>` in the current scope
  --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-task-0.3.27/src/waker_ref.rs:61:20
   |
61 |     let ptr = Arc::as_ptr(wake).cast::<()>();
   |                    ^^^^^^ function or associated item not found in `std::sync::Arc<_>`

@TheBlueMatt
Collaborator

Should be resolved with #2107

@TheBlueMatt
Collaborator

Needs a rebase.

@wpaulino wpaulino force-pushed the anchor-revoked-aggregate-claim branch from ba8a280 to d5bd25c Compare March 20, 2023 17:01
Since we don't store `pending_claim_events` within `OnchainTxHandler` as
they'll be regenerated on restarts, we opt to implement `PartialEq`
manually such that the field is no longer considered.
The previous documentation was slightly incorrect: a `Claim` can also be
from the counterparty if they happened to claim the same exact set of
outputs as a claiming transaction we generated.
Since the claim events are stored internally within a HashMap, they will
be yielded in a random order once dispatched. Claim events may be
invalidated if a conflicting claim has confirmed on-chain and we need to
generate a new claim event; the randomized order could result in the
new claim event being handled prior to the previous. To maintain the
order in which the claim events are generated, we track them in a Vec
instead and ensure only one instance of a PackageId ever exists
within it.

This would have certain performance implications, but since we're
bounded by the total number of HTLCs in a commitment anyway, we're
comfortable with taking the cost.
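
As a rough illustration of the first commit message above, a manually implemented PartialEq that skips the regenerated field might look like this (a sketch with simplified stand-in fields, not the real OnchainTxHandler):

use std::collections::HashMap;

type PackageID = [u8; 32];

struct ClaimEvent; // stand-in; never compared

struct OnchainTxHandler {
    // Persisted state, compared as usual (u64 stands in for PackageTemplate).
    pending_claim_requests: HashMap<PackageID, u64>,
    // Regenerated on restart, so it must not affect equality.
    pending_claim_events: Vec<(PackageID, ClaimEvent)>,
}

impl PartialEq for OnchainTxHandler {
    fn eq(&self, other: &Self) -> bool {
        // Deliberately ignore `pending_claim_events`.
        self.pending_claim_requests == other.pending_claim_requests
    }
}
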
@wpaulino wpaulino force-pushed the anchor-revoked-aggregate-claim branch from d5bd25c to dc2b11f Compare March 20, 2023 18:32
@wpaulino
Contributor Author

error: unresolved link to `ChannelTypeFeatures::supports_zero_conf`
   --> lightning/src/util/events.rs:915:34
    |
915 |         /// Furthermore, note that if [`ChannelTypeFeatures::supports_zero_conf`] returns true on this type,
    |                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the type alias `ChannelTypeFeatures` has no associated item named `supports_zero_conf`

Not sure how this is only failing on our MSRV builds... I guess cargo doc didn't support redirecting on type aliases back then? Maybe we should just get rid of the cargo doc step?

@TheBlueMatt
Collaborator

Hum, that's annoying, uhhhh, yea, I guess let's only support cargo doc on, like, 1.63 or 1.59 if we can? but let's change the CI to actually test that, maybe in the check_commits run also do a cargo doc on anchors?

@wpaulino wpaulino force-pushed the anchor-revoked-aggregate-claim branch from dc2b11f to 881656b Compare March 20, 2023 23:46
// reorged out of the chain, then we'll attempt to re-spend the
// inputs once we see it.
#[cfg(anchors)] {
#[cfg(debug_assertions)] {
Contributor

I apologize for being dumb, but what's the benefit of cfg(debug_assertions) with an assert over debug_assert with no config flag? Is it just an optimization to not do the counting?

Contributor Author

Is it just an optimization to not do the counting?

Yeah, we only want to run this code with debug_assertions, which is active in tests.
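
A standalone sketch of the pattern under discussion (hypothetical counting logic, not the PR's actual check): a #[cfg(debug_assertions)] block lets the whole setup compile away in release builds, rather than just a single assertion expression.

fn check_unique_claim_ids(ids: &[u64]) {
    #[cfg(debug_assertions)]
    {
        use std::collections::HashSet;
        // All of this counting exists only in debug builds (and thus in tests);
        // a bare `debug_assert!` is limited to wrapping one expression.
        let unique: HashSet<u64> = ids.iter().copied().collect();
        assert_eq!(unique.len(), ids.len(), "duplicate pending claim IDs");
    }
    // Release builds compile the block above out entirely.
    let _ = ids;
}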

msg_events.into_iter().filter_map(|msg_event| {
match msg_event {
MessageSendEvent::BroadcastChannelUpdate { ref msg } => {
assert_eq!(msg.contents.flags & 2, 2);
Contributor

if there is an overlap between people working on this code and people who haven't memorized the meaning of the second bit in the channel update message, this needs a comment. However, I'm not sure such an overlap exists.

Contributor Author

Just look at the spec 😛 It's checking if the channel is marked disabled once closed.

Contributor

or… we could use a constant instead of a magic 2. I know it's just used in tests, but given it's a test utility and not an individual test, there might be some merit to not forcing readers to look up the spec. I won't block the PR over this though.
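
Something like the suggested constant could read as follows (a hypothetical name, not one in the codebase; per BOLT 7, bit 1 of channel_flags marks a channel as disabled):

/// Hypothetical constant for the BOLT 7 `channel_update` disable bit (bit 1).
const CHANNEL_FLAG_DISABLED: u8 = 1 << 1;

fn assert_channel_disabled(channel_flags: u8) {
    // Same check as `msg.contents.flags & 2 == 2`, but self-documenting.
    assert_eq!(channel_flags & CHANNEL_FLAG_DISABLED, CHANNEL_FLAG_DISABLED);
}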


let bob_persister: test_utils::TestPersister;
let bob_chain_monitor: test_utils::TestChainMonitor;
let bob_deserialized: ChannelManager<
Collaborator

nit: for future reference, we actually don't have to define the types here; maybe we should remove them in all the tests...
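
A tiny standalone illustration of the point (a generic String example, nothing from the test itself): the explicit annotation on a deferred binding is optional because the compiler infers the type from the later assignment.

fn main() {
    // Declared now, assigned later: the type is inferred at the assignment
    // site, so an annotation like `let value: String;` is not required.
    let value;
    value = String::from("inferred");
    println!("{value}");
}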

@TheBlueMatt TheBlueMatt merged commit 9f8e832 into lightningdevkit:main Mar 21, 2023
@wpaulino wpaulino deleted the anchor-revoked-aggregate-claim branch March 21, 2023 22:44