-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Waitblockheight MPP bugs #3914
Waitblockheight MPP bugs #3914
Conversation
Quite a scare you gave me there: it seems like after the lightning/plugins/libplugin-pay.c Lines 1105 to 1109 in 723b722
What bothers me more is that the existing code works for most MPP payments, just not in combination with |
Ok, found the issue: there are three places we add the TLV payload, which may include the lightning/plugins/libplugin-pay.c Lines 1105 to 1109 in 723b722
Which is the only invocation that isn't mediated through the MPP modifiers. Using the root amount is correct always, however I still don't quite get why the MPP modifiers doesn't just overwrite the TLV field here: lightning/plugins/libplugin-pay.c Lines 2613 to 2617 in 723b722
and here lightning/plugins/libplugin-pay.c Lines 2763 to 2767 in 723b722
Still looking into that, but the fix should indeed just be using the |
@cdecker the reason why I think the
As you can see above, the sub-payments used to have a routehint, but after the This is particularly important since completely unpublished payees can only be accessed via the routehints they put in the invoice, so forgetting routehints should be strongly discouraged. With both my changes to the |
7152d76
to
1b86601
Compare
Hm, that is kind of bad tbh: without the We could probably add a signal to the To be honest I don't think the |
1b86601
to
b1f6976
Compare
But we do! See: lightning/plugins/libplugin-pay.c Lines 1797 to 1801 in 723b722
We only try the other ones if the current one is now banned by What we should do instead is what I suggest in #3894, i.e. give each sub-payment its own permutation of the order of routehints, so that each sub-payment tries a different routehint if possible. |
Hm, you're right, we can rotate the routehints without assuming the last retry was due to the routehint being bad. I like your idea of giving each split its own iteration order, but think that could also be achieved simply by offsetting the iteration randomly into the routehints list? Basically use Our current routehint then just becomes |
Nice find that we don't need to increment routehint-offset and retry counter in lock-step btw :-) |
hmm so something like |
That sounds reasonable to me, it'd mean that we always use the first non-excluded routehint, starting from a random position in the array. |
It is less optimal than making a strong effort to ensure that different sub-payments have as diverse ordering as possible, but is very simple to implement. |
b1f6976
to
fb24e00
Compare
Rebased to fix merge conflicts, also add changelogs. Will go put the randomization we suggested, please wait. |
@cdecker there is a good reason for us to advance the Basically, if my understanding of the code is right:
What do you think? |
The issue with the above is that when we exceed the constraints, we also exclude the most expensive channel. And the reason for that is that we used to have an actual route randomization, and sometimes it would allow really expensive channels due to randomization. So we need to factor in route randomization (once implemented) in our decision to advance or not advance the routehints pointer. If the reason for route rejection is due to route randomization going overboard, we should consider whether to advance the routehints pointer, or reduce the randomization, but not both (we should advance the routehints pointer first, and if that runs out, reduce randomization and reset the routehints pointer). @rustyrussell notice interaction with route randomization! |
fb24e00
to
2d43c79
Compare
Okay, went with "advance the routehint only if routing failed", and promote constraint violation to a "routing failed". @cdecker thoughts? I think constraint violation should be consider a routing failure. |
Yet another reason to advance the routehint: https://travis-ci.org/github/ElementsProject/lightning/jobs/716418486#L2348 In this scenario there exists two routehints, in this order: l3->l4->l5, and l3->l5, l5 is the payee. l4 happens to be down. This is what happens in the payer:
So on the first |
2d43c79
to
d701179
Compare
Fixed the above. Hopefully this is the final added fix. |
I now realized that the |
df42e6a
to
586ff15
Compare
586ff15
to
6c63829
Compare
Rebased. Also, bump |
… causes us to advance routehints too aggressively. The worst effect is that unpublished nodes are harder to pay, but even published ones make us do unnecessary work, since we are losing routehints from the published ones that could help us actually route better to them.
…utehints. Only advance through routehints if no route was found at all, or if the estimated capacity at the routehint is lower than the amount that we have to send through the routehint. Changelog-Fixed: pay: Be less aggressive with forgetting routehints.
Reported-by: ZmnSCPxj Signed-off-by: Christian Decker <@cdecker> Changelog-Fixed: pay: Correct a case where we put the sub-payment value instead of the *total* value in the `total_msat` field of a multi-part payment.
…prevent all future progress. Blockheight disagreement is signalled with a permanent failure at the end node, but is actually a transient failure.
6c63829
to
ce05d3f
Compare
Rebased on top of |
ACK ce05d3f |
So I stumbled on a bug where the payee has a longer chain than the payer while developing #3913.
I wrote this little test plus fix, but it looks like we still have issues with being too aggressive in returning a transient error permanently!
Here are logs on the payer, after the (partial fix) is done on the payer side:
And here are the relevant logs on the payee:
I find it very strange that payee still thinks there is some problem with the incoming payment. So it looks like we have yet another bug.... LOL
@cdecker for help at the payer side (I think it is a genuine bug as I find the logic of the
d->offset
fairly weird) and @rustyrussell for help at the payee side (I have no idea about how thehtlc_set
works, and why the source of the error is different in the initial case (where the payer genuinely has the wrong blockheight) then in the second case (where the payer should have advanced its view of the blockheight already)).