Add test for aggregated revoked HTLC claim on anchors channel #2034
Conversation
Force-pushed from 581d8b3 to 84ec2f6
// is txid of the initial claiming transaction and is immutable until outpoint is
// post-anti-reorg-delay solved, confirmation_block is used to erase entry if
// block with output gets disconnected.
pending_claim_events: Vec<(PackageID, ClaimEvent)>,
Right, I think this is totally fine, but what's the worst case here? Can we get a combinatorial explosion from the ~800 HTLCs in a state? In practice we rarely combine HTLC claims, but let's say we have one HTLC claim which is soon and 800 which are a ways away, then we could combine them into one claim, and then the counterparty could peel them off by claiming on-chain one at a time? That's still only 800 entries, which is fine, but is there a way for them to somehow get N^2 behavior or worse?
As I was confirming the runtime cost here, I realized that `pending_claim_requests` keeps a bunch of stale entries if something like this ends up happening, because we don't clear the previous claim (now invalid since the counterparty claimed at least one of the same outputs as us). The reason for this is to merge the output the counterparty claimed back into its original claim if that spend is reorged out of the chain.
So if we aggregate the HTLCs to time out into a single claim, and the counterparty claims them one-by-one with the preimage, we'll end up with a new entry in `pending_claim_requests` per HTLC the counterparty claimed, but only one entry for the latest claim in `pending_claim_events`. The reason the events don't grow linearly in this case is that we can be more aggressive about removing them, since they do not represent the "main" state of a claim/request; they're simply a proxy to the consumer's claim.
- Claiming 483 HTLCs in a single claim:
  - 1 `pending_claim_requests` entry (483 HTLCs)
  - 1 `pending_claim_events` entry (483 HTLCs)
- Counterparty claims 1 HTLC:
  - 2 `pending_claim_requests` entries (first 483 HTLCs (stale), second 482 HTLCs)
  - 1 `pending_claim_events` entry (482 HTLCs)
- Counterparty claims another HTLC:
  - 3 `pending_claim_requests` entries (first 483 HTLCs (stale), second 482 HTLCs (stale), third 481 HTLCs)
  - 1 `pending_claim_events` entry (481 HTLCs)
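For what it's worth, here is a tiny standalone sketch of the growth argument above (plain counters, not LDK code; it assumes one aggregated timeout claim of 483 HTLCs that the counterparty peels off one at a time with the preimage):

```rust
fn main() {
    // One aggregated claim covering all 483 HTLCs, and the single event
    // handed to the consumer for it.
    let mut pending_claim_requests = 1usize;
    let mut pending_claim_events = 1usize;
    let mut remaining_outputs = 483usize;

    while remaining_outputs > 1 {
        // The counterparty claims one output: the old request goes stale but
        // is kept around for the reorg case, and a new, smaller request
        // replaces it.
        remaining_outputs -= 1;
        pending_claim_requests += 1;
        // The stale event is dropped and replaced, so this stays at 1.
        pending_claim_events = 1;
    }

    // Requests grow linearly with the number of HTLCs; events do not grow.
    assert_eq!(pending_claim_requests, 483);
    assert_eq!(pending_claim_events, 1);
}
```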
Ah! Indeed, I misread, yea, okay, so this doesn't even grow at all, cool. FWIW, I'm not at all worried about one-entry-per-HTLC (isn't it 966 - 483 max per direction?), only if we get to N^3 or something stupid do we need to start worrying about it.
I still think it's 483, as we shouldn't aggregate offered outputs (timeout path for us) and received outputs (preimage path for us) into a single `pending_claim_events` entry. The runtime cost reasoning sounds correct; it's at worst 482 `pending_claim_requests` entries + 1 `pending_claim_events` entry, IIUC.
Can document the above reasoning near `pending_claim_events`.
Force-pushed from fead0c6 to 1368a62
Codecov Report
@@           Coverage Diff            @@
##            main    #2034     +/-  ##
==========================================
- Coverage   91.33%   91.20%   -0.14%
==========================================
  Files         101      101
  Lines       48837    49489    +652
  Branches    48837    49489    +652
==========================================
+ Hits        44604    45134    +530
- Misses       4233     4355    +122
... and 36 files with indirect coverage changes
☔ View full report in Codecov by Sentry.
Feel free to squash, LGTM, I think, needs another review tho.
Force-pushed from 1368a62 to 1bfeea5
thanks for the additional test coverage, gonna look more
Thanks for the great test!
lightning/src/chain/onchaintx.rs
Outdated
@@ -248,15 +252,17 @@ pub struct OnchainTxHandler<ChannelSigner: WriteableEcdsaChannelSigner> {
	pub(crate) pending_claim_requests: HashMap<PackageID, PackageTemplate>,
	#[cfg(not(test))]
	pending_claim_requests: HashMap<PackageID, PackageTemplate>,

	// Used to track external events that need to be forwarded to the `ChainMonitor`.
The expected forwarding frequency to `ChainMonitor` could be documented more. IIRC, we generate a new `ClaimEvent` in `update_claims_view_from_requests`, which itself relies on two triggers: a new block being connected and an off-chain preimage release from forwarding. `ChainMonitor` could call `get_and_clear_pending_claim_events()` more often to generate `ClaimEvent::BumpHTLC` in reaction to the second trigger. However, in the absence of fee estimation based on ongoing mempool congestion, I don't think it matters.
It's not public so it can only follow the same frequency as `ChannelMonitor::get_and_clear_pending_events`, which we already require to be called pretty often (orders of magnitude less than the average block time).
Yeah, not public, though I still think those interfaces might be more exposed one day for dual-funding/splicing support, if `OnchainTxHandler` becomes its own trait :)
Force-pushed from 1bfeea5 to 5b5c036
Rebased and updated the test to claim HTLCs across two commitments per @ariard's request.
Assigned @arik-so as second reviewer.
Force-pushed from 5b5c036 to 3173fbc
@@ -1232,24 +1232,23 @@ macro_rules! check_warn_msg {

/// Check that a channel's closing channel update has been broadcasted, and optionally
this sentence hurts my head lol
MessageSendEvent::BroadcastChannelUpdate { ref msg } => {
	assert_eq!(msg.contents.flags & 2, 2);
	None
},
MessageSendEvent::HandleError { action: msgs::ErrorAction::SendErrorMessage { ref msg }, node_id: _ } => {
is there a point in making sure that the handle error message is odd?
Their order doesn't really matter – one message is at the gossip level, the other at the peer/channel level.
}

// Bob force closes by broadcasting his revoked state for each channel.
nodes[1].node.force_close_broadcasting_latest_txn(&chan_a.2, &nodes[0].node.get_our_node_id()).unwrap();
this kinda looks like it wants to be a lambda, seeing how the only thing that changes is whether it's chan_a or chan_b
Definitely could be. FWIW this is being simplified a bit in #2059, so I think we can leave it as is here.
Thanks for adding the test coverage extension on correct punishment of multiple revoked channels!
// end up producing an invalid transaction by double spending
// input(s) that already have a confirmed spend. If such spend is
// reorged out of the chain, then we'll attempt to re-spend the
// inputs once we see it.
We still re-generate the adequate `ClaimEvent` at L965 in `block_disconnected`, though I'm not sure we have test coverage for both the post-anchor bump-HTLC and bump-commitment cases? At least it doesn't seem so from a look at `monitor_tests`, `reorg_tests`, and `functional_tests`. This can be added in future PRs to avoid delaying this one further.
Yeah I definitely plan on improving the test coverage, this PR is mostly just about addressing the revoked commitment case.
Force-pushed from 3173fbc to ac90285
lightning/src/chain/onchaintx.rs
Outdated
@@ -846,10 +885,15 @@ impl<ChannelSigner: WriteableEcdsaChannelSigner> OnchainTxHandler<ChannelSigner>
	#[cfg(anchors)]
	OnchainClaim::Event(claim_event) => {
		log_info!(logger, "Yielding RBF-bumped onchain event to spend inputs {:?}", request.outpoints());
		self.pending_claim_events.insert(*first_claim_txid, claim_event);
		if let Some((existing_claim_idx, _)) = self.pending_claim_events.iter().enumerate()
Should this stop at the first find? Also, what's the utility of this rather verbose expression rather than a retain for inequality?
There should only be at most one entry per package ID. No reason we can't use `retain` though.
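For reference, a minimal sketch (with stand-in types, not the actual LDK code) of what the `retain`-based version discussed above could look like:

```rust
type PackageID = [u8; 32];

struct ClaimEvent; // stand-in for the real event enum

// Keep at most one event per package ID: drop any existing entry for this ID,
// then push the replacement, preserving generation order for all other IDs.
fn upsert_claim_event(
    pending_claim_events: &mut Vec<(PackageID, ClaimEvent)>,
    claim_id: PackageID,
    claim_event: ClaimEvent,
) {
    pending_claim_events.retain(|(id, _)| *id != claim_id);
    pending_claim_events.push((claim_id, claim_event));
}

fn main() {
    let mut events = Vec::new();
    upsert_claim_event(&mut events, [1; 32], ClaimEvent);
    upsert_claim_event(&mut events, [1; 32], ClaimEvent); // replaces, doesn't duplicate
    assert_eq!(events.len(), 1);
}
```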
Force-pushed from 2746dc2 to ba8a280
Also rebased on main again for the Windows CI fix.
Hmm, seeing an unrelated failure when running with the anchors flag on our MSRV:
Should be resolved with #2107
Needs a rebase.
Force-pushed from ba8a280 to d5bd25c
Since we don't store `pending_claim_events` within `OnchainTxHandler`, as they'll be regenerated on restarts, we opt to implement `PartialEq` manually such that the field is no longer considered.
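As an illustration of the approach (a minimal sketch with made-up field types, not the real `OnchainTxHandler` definition), a manual `PartialEq` that skips the regenerated field might look like:

```rust
struct OnchainTxHandlerLike {
    // Persisted state that should participate in equality checks.
    latest_height: u32,
    // Regenerated on restart, so deliberately excluded from `PartialEq`.
    pending_claim_events: Vec<u64>,
}

impl PartialEq for OnchainTxHandlerLike {
    fn eq(&self, other: &Self) -> bool {
        // Only compare the persisted fields; `pending_claim_events` is ignored.
        self.latest_height == other.latest_height
    }
}

fn main() {
    let a = OnchainTxHandlerLike { latest_height: 100, pending_claim_events: vec![1] };
    let b = OnchainTxHandlerLike { latest_height: 100, pending_claim_events: vec![] };
    // Equal despite the differing pending events.
    assert!(a == b);
    assert_ne!(a.pending_claim_events.len(), b.pending_claim_events.len());
}
```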
The previous documentation was slightly incorrect: a `Claim` can also be from the counterparty if they happened to claim the same exact set of outputs as a claiming transaction we generated.
Since the claim events are stored internally within a HashMap, they will be yielded in a random order once dispatched. Claim events may be invalidated if a conflicting claim has confirmed on-chain and we need to generate a new claim event; the randomized order could result in the new claim event being handled prior to the previous one. To maintain the order in which the claim events are generated, we track them in a Vec instead and ensure only one instance of a PackageId ever exists within it. This has certain performance implications, but since we're bounded by the total number of HTLCs in a commitment anyway, we're comfortable with taking the cost.
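A tiny standalone demonstration of the ordering point (plain std collections, not the actual `ClaimEvent` types):

```rust
use std::collections::HashMap;

// A HashMap yields its entries in an unspecified order, while a Vec yields
// them in the order they were generated.
fn main() {
    let generated = ["claim_a", "claim_b", "claim_c"];

    // Keyed storage: iteration order is unspecified, so a consumer could see
    // a replacement claim event before the stale one it supersedes.
    let by_map: HashMap<usize, &str> = generated.iter().copied().enumerate().collect();
    let _unordered: Vec<&str> = by_map.values().copied().collect();

    // Vec storage: events come back exactly in generation order.
    let by_vec: Vec<(usize, &str)> = generated.iter().copied().enumerate().collect();
    assert_eq!(by_vec.iter().map(|&(_, e)| e).collect::<Vec<_>>(), generated);
}
```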
Force-pushed from d5bd25c to dc2b11f
Not sure how this is only failing on our MSRV builds... I guess
Hum, that's annoying, uhhhh, yea, I guess let's only support
Force-pushed from dc2b11f to 881656b
// reorged out of the chain, then we'll attempt to re-spend the
// inputs once we see it.
#[cfg(anchors)] {
	#[cfg(debug_assertions)] {
I apologize for being dumb, but what's the benefit of cfg(debug_assertions) with an assert over debug_assert with no config flag? Is it just an optimization to not do the counting?
Is it just an optimization to not do the counting?
Yeah, we only want to run this code with `debug_assertions`, which is active in tests.
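A self-contained sketch (illustrative only, not the LDK code) of the trade-off being described: `debug_assert!` only skips the assertion itself in release builds, whereas wrapping the set-up work and the `assert!` together in `#[cfg(debug_assertions)]` compiles the whole (potentially expensive) block out of release builds.

```rust
fn check_no_duplicates(items: &[u64]) {
    // Silence the unused warning in release builds, where the block below is
    // compiled out entirely.
    let _ = items;
    #[cfg(debug_assertions)]
    {
        // The counting below only exists in debug builds (which is what tests
        // run under), mirroring the pattern in the diff above.
        let mut sorted = items.to_vec();
        sorted.sort_unstable();
        sorted.dedup();
        assert_eq!(sorted.len(), items.len(), "duplicate entries found");
    }
}

fn main() {
    check_no_duplicates(&[1, 2, 3]);
}
```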
msg_events.into_iter().filter_map(|msg_event| {
	match msg_event {
		MessageSendEvent::BroadcastChannelUpdate { ref msg } => {
			assert_eq!(msg.contents.flags & 2, 2);
if there is an overlap between people working on this code and people who haven't memorized the meaning of the second bit in the channel update message, this needs a comment. However, I'm not sure such an overlap exists.
Just look at the spec 😛 It's checking if the channel is marked disabled once closed.
or… we could use a constant instead of a magic 2. I know it's just used in tests, but given it's a test utility and not an individual test, there might be some merit to not forcing readers to look up the spec. I won't block the PR over this though.
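A sketch of that suggestion (the constant name is hypothetical, not an existing LDK or spec identifier; BOLT 7 defines bit 1 of `channel_flags` as the disable bit):

```rust
// Hypothetical named constant for the disable bit of `channel_flags` (bit 1
// per BOLT 7); `CHANNEL_DISABLED_BIT` is illustrative, not an LDK constant.
const CHANNEL_DISABLED_BIT: u8 = 1 << 1;

// Equivalent to the `flags & 2 == 2` check in the test utility above, but
// without the magic number.
fn channel_update_marks_disabled(channel_flags: u8) -> bool {
    channel_flags & CHANNEL_DISABLED_BIT == CHANNEL_DISABLED_BIT
}

fn main() {
    assert!(channel_update_marks_disabled(0b0000_0010));
    assert!(!channel_update_marks_disabled(0b0000_0001));
}
```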
let bob_persister: test_utils::TestPersister;
let bob_chain_monitor: test_utils::TestChainMonitor;
let bob_deserialized: ChannelManager<
nit: for future reference, we actually don't have to define the types here; maybe we should remove them in all the tests...
This PR adds a new test to cover the aggregated revoked HTLC claim edge case for channels featuring anchors outputs.
Along the way, we also fix a few issues to allow the anchors build to run, and we start running it in CI.
Fixes #1905.