[WIP]: HTLC Forwarder Events #3639

carlaKC · 2019-10-25T16:06:03Z

First shot at a HTLC Forwarder events (as specified in #3420). PR is not ready for review at present, but I'd like to get some early feedback on this design.

Specifically:

Bubbling errors up from canSendHTLC with linkFailure
Use of an interface to mock HTLCNotifier in switch tests
Placement of FailureReason and ForwardFailType - both feel like they belong in link.go but that creates a circular dep; perhaps a new package just with the errors?

To dos:

Figure out circuits & set payment hash for settle/fail events
Add timestamps to events (and elapsed time to settled/forward fail)
Add subscribe RPC
Tests can be refined, I just wanted to check functionality

joostjager

I think the main challenge is to decide how consistent the stream of data coming from the notifier is going to be and carefully document that in the code. The channel fitness subsystem needs to be able to deal with any inconsistencies that may occur because of for example restarts (replays), delays and out of order delivery (not sure if that last thing happens).

If there is a design possible that is completely safe, I'd prefer that. Is there current ForwardingLog completely safe atm?

joostjager · 2019-10-28T07:48:55Z

htlcnotifier/htlcnotifier.go

+
+// ForwardingEvent represents a HTLC that was forwarded through our node.
+type ForwardingEvent struct {
+	// IncomingChannel is the channel that the HTLC arrived at our node on.


I'd attach a timestamp as close to the origin as possible.

joostjager · 2019-10-28T07:51:13Z

htlcnotifier/htlcnotifier.go

+	// ForwardingEvent provides the details of the HTLC forward on our node
+	// which was settled.
+	ForwardingEvent
+}


Nodes forward htlcs, but - on the same channels - may also send or receive payments for which they are the endpoint. For profitability calculation that takes into account balance, those events are important too. Will they be communicated over this same notifier channel, or do they go a different way?

From the insights side of things, I see our node's HTLCs as a separate concern to profitability. They capture "personal utility" gained from a channel, which could be another aspect to factor into scoring; routing nodes would prioritize profitability, non-routing would be more concerned with utility.

But from a htlcNotifier, I think it makes sense to include them to keep the notifier generic. I'd probably go with a our_htlc bool in the settle/fail event types (since these HTLCs can't be forwarded/have link failures, but could also go with another event type entirely?

I was mainly thinking that we also need "our htlcs" to be able to reconstruct the channel balance at the time of an event.

joostjager · 2019-10-28T07:53:22Z

htlcnotifier/interface.go

+
+	// NotifySettleEvent is called when a HTLC we previously successfully
+	// forwarded is settled.
+	NotifySettleEvent(f ForwardingEvent)


Probably need some way to correlate the forwarding event and the outcome (fail/settle). Payment hash is not enough, because it may be reused.

We can also surface the HTLC IDs if needed.

Looking at using circuit inKey/outKey to identify HTLCs?
Forward events can have the outKey of the outgoing HTLC and they can be matched to settle/fail events.

I need to have a close look at failure scenarios here. I'm worried about some situation where we forward HTLC_1 with htlcID=3, then experience some kind of failure and fail HTLC_2 with htlc_id=3 then replay HTLC_1 with htlcID=4. Althuogh I wouldn't think this can happen, because it would cause chaos for the switch.

Roasbeef · 2019-10-29T00:23:51Z

Is there current ForwardingLog completely safe atm?

What does safe mean here? It's possible that a panic or crash fault causes us to not write 100% of the data in the in-memory buffer. However, on a clean shutdown, we'll attempt to flush all items on to disk.

Also note that this isn't just for the notifier, as there's considerable value to be had from exposing this information over RPC for routing node operators (frequently requested). As far as consistency, I wouldn't say we need very strong guarantees on any of this stuff, as it's all just advisory information that shouldn't be used in isolation to make channel termination or double-down decisions.

joostjager · 2019-10-29T06:41:02Z

What does safe mean here? It's possible that a panic or crash fault causes us to not write 100% of the data in the in-memory buffer. However, on a clean shutdown, we'll attempt to flush all items on to disk.

I'd then call that not safe, because we can miss events. I was hoping that the replay on startup would re-offer the events to the forwarding log, in case it crashed. But with the flushing, the switch has probably already fully completed the htlc lifecycle before we write. Does this mean that we have an accounting gap there?

carlaKC · 2019-10-29T06:44:18Z

I think the main challenge is to decide how consistent the stream of data coming from the notifier is going to be and carefully document that in the code.

If we want restart safety and assurances that we'll receive every HTLC that's almost a separate project making a persisted subscription of some sort. I see the fear of doing a lot of work then having to redo it, but I think that if we choose to go with a safer option we can just rip the subscribe backend out, which won't be too much work since we're not spending time coding that logic up now?

The channel fitness subsystem needs to be able to deal with any inconsistencies that may occur because of for example restarts (replays), delays and out of order delivery (not sure if that last thing happens).

If the HTLCs are timestamped and uuid'ed with some combination of inKey/outKey (soundness of that approach pending further investigation) and the subscription is appropriately documented, would you say that's sufficient?

joostjager · 2019-11-03T09:08:16Z

Can you make it so that the tlv payload is exposed for htlc (fail) events over rpc? It is needed for Whatsat 😬

carlaKC · 2019-11-04T07:04:58Z

Can you make it so that the tlv payload is exposed for htlc (fail) events over rpc? It is needed for Whatsat 😬

For a small stake in this hot new business, anything is possible 😉

Assuming the use case of custom TLV types (unknown to lnd, that is), would TLV just be exposed as a []byte which end users can parse if they understand them?

joostjager · 2019-11-04T08:09:11Z

Yes, just map<uint64, bytes> in the proto. We are also planning to do that for the settled invoices: #3670

joostjager · 2019-11-04T08:15:18Z

Also, it is only needed for htlcs that fail at the endpoint. It is not needed for intermediate nodes. To be clear, this is totally a nice to have. A use case to keep in the back of your mind only.

carlaKC · 2019-11-04T10:30:25Z

Also, it is only needed for htlcs that fail at the endpoint. It is not needed for intermediate nodes. To be clear, this is totally a nice to have. A use case to keep in the back of your mind only.

Of course, but I'll keep it in mind nonetheless 😄
It'll be an easy addition to a solution which flags HTLCs that terminate at our node.

carlaKC · 2019-11-11T15:34:48Z

Not ready for review yet, just backing up what I have so far.
Going to have to look at moving successful forward events onto switch level because outKey is set on link level.

halseth · 2019-11-12T10:52:20Z

High level look at this PR I think it is going in the right direction 👍

Roasbeef

Realized I had this saved as a draft...really thought I submitted it...

Roasbeef · 2019-10-29T00:29:10Z

htlcnotifier/htlcnotifier.go

+
+	// FailReasonInsufficientBandwidth is used when our available balance is
+	// less than the HTLC amount.
+	FailReasonInsufficientBandwidth


Perhaps these should be defined in the switch, or use the existing error reasons in lnwire? Otherwise, there's a bit of duplication amongst forwarding error types across the codebase.

If the error reasons in lnwire are not suffiecient, we could define a custom error type that can act as a oneof between lnwire and custom codes

Roasbeef · 2019-10-29T00:32:11Z

htlcswitch/interfaces.go

 	// signal to the source of the HTLC, the policy consistency issue.
 	CheckHtlcForward(payHash [32]byte, incomingAmt lnwire.MilliSatoshi,
 		amtToForward lnwire.MilliSatoshi,
 		incomingTimeout, outgoingTimeout uint32,
-		heightNow uint32) lnwire.FailureMessage
+		heightNow uint32) *linkFailure


A public method on a public interface shouldn't have a non-exported return type.

Roasbeef · 2019-10-29T00:35:01Z

htlcswitch/link.go

+// linkFailure represents failure to forward a HTLC along the link. It contains
+// an appropriate lnwire message and the specific reason for failure.
+type linkFailure struct {
+	message    lnwire.FailureMessage


Perhaps we can just using the existing ForwardingError struct here? It adheres to both lnwire.FailureMessage and the error interface. A new field for a failure type can be added, referencing the existing errors in the lnwire package. For any new errors, we can just add them there as well, but take care to not leak out any non-official errors to the wide network which are only used for purposes of internal accountability.

As is, many of the new errors are also just duplicates of existing errors that already live in lnwire.

Using ForwardingError and using existing lnwire errors would be good, since there are a lot of duplicates. What about defining the extra errors in the htlcnotifier (rather than lnwire) package which implement FailureMessage and then mapping custom errors back to their broadcast-friendly error? Eg, insufficient balance -> temp channel failure.

Roasbeef · 2019-10-29T00:37:02Z

htlcnotifier/interface.go

+
+	// NotifySettleEvent is called when a HTLC we previously successfully
+	// forwarded is settled.
+	NotifySettleEvent(f ForwardingEvent)


We can also surface the HTLC IDs if needed.

Roasbeef · 2019-10-29T00:37:42Z

htlcnotifier/interface.go

+// Notifier represents a place through which HTLC events can be piped through.
+//
+// Concrete implementations of Notifier should be non-blocking.
+type Notifier interface {


Bit of redundancy across the package name + interface name.

Roasbeef · 2019-10-29T00:38:41Z

htlcnotifier/interface.go

+
+	// NotifyForwardingFailEvent is called when a HTLC we previously
+	// successfully forwarded fails down the line.
+	NotifyForwardingFailEvent(failType ForwardFailType, f ForwardingEvent)


Shouldn't this use the ForwardingFailEvent struct? Similar comment applies to the link failure message above.

👍 updated in the more recent WIP draft, cleaned these up into individual event types with a common identifier struct (ID, to be renamed to something better) rather than trying to tack failures onto ForwardingEvent when needed.

Roasbeef · 2019-10-29T00:39:23Z

htlcnotifier/interface.go

@@ -0,0 +1,21 @@
+package htlcnotifier


Feels like this should be a sub-package of the switch.

halseth · 2019-11-25T15:19:43Z

htlcswitch/switch.go

@@ -485,14 +485,18 @@ func (s *Switch) forward(packet *htlcPacket) error {
 				return err
 			}

-			var failure lnwire.FailureMessage
+			var failure *ForwardingError


what bool does the commit msg refer to?

halseth · 2019-11-25T15:20:35Z

htlcswitch/htlcnotifier/htlcnotifier.go

+const (
+	// FailureDetailNone is used when there are no further failure details
+	// beyond lnwire's failure message.
+	FailureDetailNone FailureDetail = iota


Nice solution 👍

manreo · 2019-11-30T11:37:30Z

@carlaKC Is this still in Draft or should it be moved to Needs Review?

carlaKC · 2019-12-01T09:50:45Z

@carlaKC Is this still in Draft or should it be moved to Needs Review?

Still a WIP, working on some itests for this and then it will be good to go. Should be up early this week @mrmanpew

This commit adds a HTLCNotifier package which HTLC events will be piped through to provide clients with subscriptions to HTLC level events. The events included are: - ForwardingEvents: When a HTLC is successfully forwarded through our node. - LinkFailEvents: When a HTLC fails on our outgoing or incoming link. - ForwardingFailEvents: When a HTLC we forwarded fails down the line and is removed. - SettleEvents: When a HTLC from a forward or receive on our node is settled.

In this commit, instances of the HTLCNotifier are added to the switch and link stucts.

This commit updates the CheckHtlcTransit and CheckHtlcForward functions on the ChannelLink interface to return ForwardErrors. This will allow returning more detailed information about HTLC forward/transit failures than the current set of lnwire FailureMessages provide. It also updates handlePacketForward to store full forwarding errors when attempting to forward a htlc.

In this commit, a link failure Forwarding Error is added to the htlc packet struct. This error is non-nil for failures which occur at our node, and will be used to notify specific information about the link failure.

This commit updates the switch and link code to send HTLC notifier events when we forward, settle and fail HTLCs, including our own sends and receives. A distinction is made between forwarding failures, which happen further down the line and have unknown failure reasons and link failures, which occur on our incoming or outgoing link so we have more specific failure reasons for them.

This commit exposes a subscription to HTLC events over rpc. For link falures, which are failures which occurred at our node, lnwire failure messages paired with more detailed failure reasons are used to classify failure reasons.

carlaKC · 2019-12-02T07:25:01Z

Draft PR replaced with #3781.

joostjager reviewed Oct 28, 2019

View reviewed changes

joostjager mentioned this pull request Nov 3, 2019

Allow running on vanilla lnd (no fork) joostjager/whatsat#1

Closed

Roasbeef added enhancement Improvements to existing features / behaviour HTLC htlcswitch incomplete WIP PR, not fully finalized, but light review possible P3 might get fixed, nice to have rpc Related to the RPC interface v0.9.0 labels Nov 9, 2019

carlaKC force-pushed the 3420-htlcForwarding branch from 414fb05 to a709915 Compare November 11, 2019 15:30

Roasbeef requested changes Nov 13, 2019

View reviewed changes

joostjager added this to the 0.9.0 milestone Nov 18, 2019

carlaKC force-pushed the 3420-htlcForwarding branch from a709915 to 2f044fc Compare November 22, 2019 13:05

halseth reviewed Nov 25, 2019

View reviewed changes

carlaKC force-pushed the 3420-htlcForwarding branch from 2f044fc to 999dab1 Compare November 26, 2019 15:25

carlaKC force-pushed the 3420-htlcForwarding branch from 999dab1 to 15302f7 Compare December 1, 2019 14:39

carlaKC added 9 commits December 2, 2019 08:36

multi: Add HTLC Notifier interface to switch and link

dee1552

In this commit, instances of the HTLCNotifier are added to the switch and link stucts.

htlcswitch: Update sendHTLCError to take Forward Error

ffcc52b

htlcswitch: Add link failure to htlcpacket

74ac6d5

In this commit, a link failure Forwarding Error is added to the htlc packet struct. This error is non-nil for failures which occur at our node, and will be used to notify specific information about the link failure.

htlcswitch: Add failure detail to Forward Error

bf517e5

lnrpc: Add SubscribeHTLCEvents endpoint

947727c

This commit exposes a subscription to HTLC events over rpc. For link falures, which are failures which occurred at our node, lnwire failure messages paired with more detailed failure reasons are used to classify failure reasons.

lntest: Add HTLC Notifier event tests to existing itests

e4d9bba

carlaKC force-pushed the 3420-htlcForwarding branch from 15302f7 to e4d9bba Compare December 2, 2019 06:38

carlaKC closed this Dec 2, 2019

carlaKC removed incomplete WIP PR, not fully finalized, but light review possible v0.9.0 labels Dec 2, 2019

[WIP]: HTLC Forwarder Events #3639

[WIP]: HTLC Forwarder Events #3639

Conversation

carlaKC commented Oct 25, 2019

joostjager left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carlaKC Oct 29, 2019 • edited Loading

Choose a reason for hiding this comment

Roasbeef commented Oct 29, 2019 • edited Loading

joostjager commented Oct 29, 2019

carlaKC commented Oct 29, 2019

joostjager commented Nov 3, 2019

carlaKC commented Nov 4, 2019

joostjager commented Nov 4, 2019

joostjager commented Nov 4, 2019 • edited Loading

carlaKC commented Nov 4, 2019

carlaKC commented Nov 11, 2019

halseth commented Nov 12, 2019

Roasbeef left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

manreo commented Nov 30, 2019

carlaKC commented Dec 1, 2019

carlaKC commented Dec 2, 2019

carlaKC Oct 29, 2019 •

edited

Loading

Roasbeef commented Oct 29, 2019 •

edited

Loading

joostjager commented Nov 4, 2019 •

edited

Loading