-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP]: HTLC Forwarder Events #3639
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main challenge is to decide how consistent the stream of data coming from the notifier is going to be and carefully document that in the code. The channel fitness subsystem needs to be able to deal with any inconsistencies that may occur because of for example restarts (replays), delays and out of order delivery (not sure if that last thing happens).
If there is a design possible that is completely safe, I'd prefer that. Is there current ForwardingLog
completely safe atm?
htlcnotifier/htlcnotifier.go
Outdated
|
||
// ForwardingEvent represents a HTLC that was forwarded through our node. | ||
type ForwardingEvent struct { | ||
// IncomingChannel is the channel that the HTLC arrived at our node on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd attach a timestamp as close to the origin as possible.
htlcnotifier/htlcnotifier.go
Outdated
// ForwardingEvent provides the details of the HTLC forward on our node | ||
// which was settled. | ||
ForwardingEvent | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nodes forward htlcs, but - on the same channels - may also send or receive payments for which they are the endpoint. For profitability calculation that takes into account balance, those events are important too. Will they be communicated over this same notifier channel, or do they go a different way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the insights side of things, I see our node's HTLCs as a separate concern to profitability. They capture "personal utility" gained from a channel, which could be another aspect to factor into scoring; routing nodes would prioritize profitability, non-routing would be more concerned with utility.
But from a htlcNotifier
, I think it makes sense to include them to keep the notifier generic. I'd probably go with a our_htlc
bool in the settle/fail event types (since these HTLCs can't be forwarded/have link failures, but could also go with another event type entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mainly thinking that we also need "our htlcs" to be able to reconstruct the channel balance at the time of an event.
htlcnotifier/interface.go
Outdated
|
||
// NotifySettleEvent is called when a HTLC we previously successfully | ||
// forwarded is settled. | ||
NotifySettleEvent(f ForwardingEvent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably need some way to correlate the forwarding event and the outcome (fail/settle). Payment hash is not enough, because it may be reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also surface the HTLC IDs if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at using circuit inKey
/outKey
to identify HTLCs?
Forward events can have the outKey
of the outgoing HTLC and they can be matched to settle/fail events.
I need to have a close look at failure scenarios here. I'm worried about some situation where we forward HTLC_1
with htlcID
=3, then experience some kind of failure and fail HTLC_2
with htlc_id
=3 then replay HTLC_1
with htlcID
=4. Althuogh I wouldn't think this can happen, because it would cause chaos for the switch.
What does safe mean here? It's possible that a panic or crash fault causes us to not write 100% of the data in the in-memory buffer. However, on a clean shutdown, we'll attempt to flush all items on to disk. Also note that this isn't just for the notifier, as there's considerable value to be had from exposing this information over RPC for routing node operators (frequently requested). As far as consistency, I wouldn't say we need very strong guarantees on any of this stuff, as it's all just advisory information that shouldn't be used in isolation to make channel termination or double-down decisions. |
I'd then call that not safe, because we can miss events. I was hoping that the replay on startup would re-offer the events to the forwarding log, in case it crashed. But with the flushing, the switch has probably already fully completed the htlc lifecycle before we write. Does this mean that we have an accounting gap there? |
If we want restart safety and assurances that we'll receive every HTLC that's almost a separate project making a persisted subscription of some sort. I see the fear of doing a lot of work then having to redo it, but I think that if we choose to go with a safer option we can just rip the subscribe backend out, which won't be too much work since we're not spending time coding that logic up now?
If the HTLCs are timestamped and uuid'ed with some combination of |
Can you make it so that the tlv payload is exposed for htlc (fail) events over rpc? It is needed for Whatsat 😬 |
For a small stake in this hot new business, anything is possible 😉 Assuming the use case of custom TLV types (unknown to lnd, that is), would TLV just be exposed as a |
Yes, just |
Also, it is only needed for htlcs that fail at the endpoint. It is not needed for intermediate nodes. To be clear, this is totally a nice to have. A use case to keep in the back of your mind only. |
Of course, but I'll keep it in mind nonetheless 😄 |
414fb05
to
a709915
Compare
Not ready for review yet, just backing up what I have so far. |
High level look at this PR I think it is going in the right direction 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realized I had this saved as a draft...really thought I submitted it...
htlcnotifier/htlcnotifier.go
Outdated
|
||
// FailReasonInsufficientBandwidth is used when our available balance is | ||
// less than the HTLC amount. | ||
FailReasonInsufficientBandwidth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps these should be defined in the switch, or use the existing error reasons in lnwire
? Otherwise, there's a bit of duplication amongst forwarding error types across the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the error reasons in lnwire are not suffiecient, we could define a custom error type that can act as a oneof between lnwire and custom codes
htlcswitch/interfaces.go
Outdated
// signal to the source of the HTLC, the policy consistency issue. | ||
CheckHtlcForward(payHash [32]byte, incomingAmt lnwire.MilliSatoshi, | ||
amtToForward lnwire.MilliSatoshi, | ||
incomingTimeout, outgoingTimeout uint32, | ||
heightNow uint32) lnwire.FailureMessage | ||
heightNow uint32) *linkFailure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A public method on a public interface shouldn't have a non-exported return type.
htlcswitch/link.go
Outdated
// linkFailure represents failure to forward a HTLC along the link. It contains | ||
// an appropriate lnwire message and the specific reason for failure. | ||
type linkFailure struct { | ||
message lnwire.FailureMessage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can just using the existing ForwardingError
struct here? It adheres to both lnwire.FailureMessage
and the error
interface. A new field for a failure type can be added, referencing the existing errors in the lnwire
package. For any new errors, we can just add them there as well, but take care to not leak out any non-official errors to the wide network which are only used for purposes of internal accountability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As is, many of the new errors are also just duplicates of existing errors that already live in lnwire
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ForwardingError
and using existing lnwire
errors would be good, since there are a lot of duplicates. What about defining the extra errors in the htlcnotifier
(rather than lnwire
) package which implement FailureMessage
and then mapping custom errors back to their broadcast-friendly error? Eg, insufficient balance -> temp channel failure.
htlcnotifier/interface.go
Outdated
|
||
// NotifySettleEvent is called when a HTLC we previously successfully | ||
// forwarded is settled. | ||
NotifySettleEvent(f ForwardingEvent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also surface the HTLC IDs if needed.
htlcnotifier/interface.go
Outdated
// Notifier represents a place through which HTLC events can be piped through. | ||
// | ||
// Concrete implementations of Notifier should be non-blocking. | ||
type Notifier interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit of redundancy across the package name + interface name.
htlcnotifier/interface.go
Outdated
|
||
// NotifyForwardingFailEvent is called when a HTLC we previously | ||
// successfully forwarded fails down the line. | ||
NotifyForwardingFailEvent(failType ForwardFailType, f ForwardingEvent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this use the ForwardingFailEvent
struct? Similar comment applies to the link failure message above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 updated in the more recent WIP draft, cleaned these up into individual event types with a common identifier struct (ID, to be renamed to something better) rather than trying to tack failures onto ForwardingEvent
when needed.
htlcnotifier/interface.go
Outdated
@@ -0,0 +1,21 @@ | |||
package htlcnotifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like this should be a sub-package of the switch.
a709915
to
2f044fc
Compare
htlcswitch/switch.go
Outdated
@@ -485,14 +485,18 @@ func (s *Switch) forward(packet *htlcPacket) error { | |||
return err | |||
} | |||
|
|||
var failure lnwire.FailureMessage | |||
var failure *ForwardingError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what bool
does the commit msg refer to?
const ( | ||
// FailureDetailNone is used when there are no further failure details | ||
// beyond lnwire's failure message. | ||
FailureDetailNone FailureDetail = iota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice solution 👍
2f044fc
to
999dab1
Compare
@carlaKC Is this still in Draft or should it be moved to Needs Review? |
999dab1
to
15302f7
Compare
This commit adds a HTLCNotifier package which HTLC events will be piped through to provide clients with subscriptions to HTLC level events. The events included are: - ForwardingEvents: When a HTLC is successfully forwarded through our node. - LinkFailEvents: When a HTLC fails on our outgoing or incoming link. - ForwardingFailEvents: When a HTLC we forwarded fails down the line and is removed. - SettleEvents: When a HTLC from a forward or receive on our node is settled.
In this commit, instances of the HTLCNotifier are added to the switch and link stucts.
This commit updates the CheckHtlcTransit and CheckHtlcForward functions on the ChannelLink interface to return ForwardErrors. This will allow returning more detailed information about HTLC forward/transit failures than the current set of lnwire FailureMessages provide. It also updates handlePacketForward to store full forwarding errors when attempting to forward a htlc.
In this commit, a link failure Forwarding Error is added to the htlc packet struct. This error is non-nil for failures which occur at our node, and will be used to notify specific information about the link failure.
This commit updates the switch and link code to send HTLC notifier events when we forward, settle and fail HTLCs, including our own sends and receives. A distinction is made between forwarding failures, which happen further down the line and have unknown failure reasons and link failures, which occur on our incoming or outgoing link so we have more specific failure reasons for them.
This commit exposes a subscription to HTLC events over rpc. For link falures, which are failures which occurred at our node, lnwire failure messages paired with more detailed failure reasons are used to classify failure reasons.
15302f7
to
e4d9bba
Compare
Draft PR replaced with #3781. |
First shot at a HTLC Forwarder events (as specified in #3420). PR is not ready for review at present, but I'd like to get some early feedback on this design.
Specifically:
canSendHTLC
withlinkFailure
FailureReason
andForwardFailType
- both feel like they belong inlink.go
but that creates a circular dep; perhaps a new package just with the errors?To dos: