Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP]: HTLC Forwarder Events #3639

Closed
wants to merge 9 commits into from

Conversation

carlaKC
Copy link
Collaborator

@carlaKC carlaKC commented Oct 25, 2019

First shot at a HTLC Forwarder events (as specified in #3420). PR is not ready for review at present, but I'd like to get some early feedback on this design.

Specifically:

  1. Bubbling errors up from canSendHTLC with linkFailure
  2. Use of an interface to mock HTLCNotifier in switch tests
  3. Placement of FailureReason and ForwardFailType - both feel like they belong in link.go but that creates a circular dep; perhaps a new package just with the errors?

To dos:

  1. Figure out circuits & set payment hash for settle/fail events
  2. Add timestamps to events (and elapsed time to settled/forward fail)
  3. Add subscribe RPC
  4. Tests can be refined, I just wanted to check functionality

Copy link
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main challenge is to decide how consistent the stream of data coming from the notifier is going to be and carefully document that in the code. The channel fitness subsystem needs to be able to deal with any inconsistencies that may occur because of for example restarts (replays), delays and out of order delivery (not sure if that last thing happens).

If there is a design possible that is completely safe, I'd prefer that. Is there current ForwardingLog completely safe atm?


// ForwardingEvent represents a HTLC that was forwarded through our node.
type ForwardingEvent struct {
// IncomingChannel is the channel that the HTLC arrived at our node on.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd attach a timestamp as close to the origin as possible.

// ForwardingEvent provides the details of the HTLC forward on our node
// which was settled.
ForwardingEvent
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodes forward htlcs, but - on the same channels - may also send or receive payments for which they are the endpoint. For profitability calculation that takes into account balance, those events are important too. Will they be communicated over this same notifier channel, or do they go a different way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the insights side of things, I see our node's HTLCs as a separate concern to profitability. They capture "personal utility" gained from a channel, which could be another aspect to factor into scoring; routing nodes would prioritize profitability, non-routing would be more concerned with utility.

But from a htlcNotifier, I think it makes sense to include them to keep the notifier generic. I'd probably go with a our_htlc bool in the settle/fail event types (since these HTLCs can't be forwarded/have link failures, but could also go with another event type entirely?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mainly thinking that we also need "our htlcs" to be able to reconstruct the channel balance at the time of an event.


// NotifySettleEvent is called when a HTLC we previously successfully
// forwarded is settled.
NotifySettleEvent(f ForwardingEvent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need some way to correlate the forwarding event and the outcome (fail/settle). Payment hash is not enough, because it may be reused.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also surface the HTLC IDs if needed.

Copy link
Collaborator Author

@carlaKC carlaKC Oct 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at using circuit inKey/outKey to identify HTLCs?
Forward events can have the outKey of the outgoing HTLC and they can be matched to settle/fail events.

I need to have a close look at failure scenarios here. I'm worried about some situation where we forward HTLC_1 with htlcID=3, then experience some kind of failure and fail HTLC_2 with htlc_id=3 then replay HTLC_1 with htlcID=4. Althuogh I wouldn't think this can happen, because it would cause chaos for the switch.

@Roasbeef
Copy link
Member

Roasbeef commented Oct 29, 2019

Is there current ForwardingLog completely safe atm?

What does safe mean here? It's possible that a panic or crash fault causes us to not write 100% of the data in the in-memory buffer. However, on a clean shutdown, we'll attempt to flush all items on to disk.

Also note that this isn't just for the notifier, as there's considerable value to be had from exposing this information over RPC for routing node operators (frequently requested). As far as consistency, I wouldn't say we need very strong guarantees on any of this stuff, as it's all just advisory information that shouldn't be used in isolation to make channel termination or double-down decisions.

@joostjager
Copy link
Contributor

What does safe mean here? It's possible that a panic or crash fault causes us to not write 100% of the data in the in-memory buffer. However, on a clean shutdown, we'll attempt to flush all items on to disk.

I'd then call that not safe, because we can miss events. I was hoping that the replay on startup would re-offer the events to the forwarding log, in case it crashed. But with the flushing, the switch has probably already fully completed the htlc lifecycle before we write. Does this mean that we have an accounting gap there?

@carlaKC
Copy link
Collaborator Author

carlaKC commented Oct 29, 2019

I think the main challenge is to decide how consistent the stream of data coming from the notifier is going to be and carefully document that in the code.

If we want restart safety and assurances that we'll receive every HTLC that's almost a separate project making a persisted subscription of some sort. I see the fear of doing a lot of work then having to redo it, but I think that if we choose to go with a safer option we can just rip the subscribe backend out, which won't be too much work since we're not spending time coding that logic up now?

The channel fitness subsystem needs to be able to deal with any inconsistencies that may occur because of for example restarts (replays), delays and out of order delivery (not sure if that last thing happens).

If the HTLCs are timestamped and uuid'ed with some combination of inKey/outKey (soundness of that approach pending further investigation) and the subscription is appropriately documented, would you say that's sufficient?

@joostjager
Copy link
Contributor

Can you make it so that the tlv payload is exposed for htlc (fail) events over rpc? It is needed for Whatsat 😬

@carlaKC
Copy link
Collaborator Author

carlaKC commented Nov 4, 2019

Can you make it so that the tlv payload is exposed for htlc (fail) events over rpc? It is needed for Whatsat 😬

For a small stake in this hot new business, anything is possible 😉

Assuming the use case of custom TLV types (unknown to lnd, that is), would TLV just be exposed as a []byte which end users can parse if they understand them?

@joostjager
Copy link
Contributor

Yes, just map<uint64, bytes> in the proto. We are also planning to do that for the settled invoices: #3670

@joostjager
Copy link
Contributor

joostjager commented Nov 4, 2019

Also, it is only needed for htlcs that fail at the endpoint. It is not needed for intermediate nodes. To be clear, this is totally a nice to have. A use case to keep in the back of your mind only.

@carlaKC
Copy link
Collaborator Author

carlaKC commented Nov 4, 2019

Also, it is only needed for htlcs that fail at the endpoint. It is not needed for intermediate nodes. To be clear, this is totally a nice to have. A use case to keep in the back of your mind only.

Of course, but I'll keep it in mind nonetheless 😄
It'll be an easy addition to a solution which flags HTLCs that terminate at our node.

@Roasbeef Roasbeef added enhancement Improvements to existing features / behaviour HTLC htlcswitch incomplete WIP PR, not fully finalized, but light review possible P3 might get fixed, nice to have rpc Related to the RPC interface v0.9.0 labels Nov 9, 2019
@carlaKC carlaKC force-pushed the 3420-htlcForwarding branch from 414fb05 to a709915 Compare November 11, 2019 15:30
@carlaKC
Copy link
Collaborator Author

carlaKC commented Nov 11, 2019

Not ready for review yet, just backing up what I have so far.
Going to have to look at moving successful forward events onto switch level because outKey is set on link level.

@halseth
Copy link
Contributor

halseth commented Nov 12, 2019

High level look at this PR I think it is going in the right direction 👍

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realized I had this saved as a draft...really thought I submitted it...


// FailReasonInsufficientBandwidth is used when our available balance is
// less than the HTLC amount.
FailReasonInsufficientBandwidth
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps these should be defined in the switch, or use the existing error reasons in lnwire? Otherwise, there's a bit of duplication amongst forwarding error types across the codebase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the error reasons in lnwire are not suffiecient, we could define a custom error type that can act as a oneof between lnwire and custom codes

// signal to the source of the HTLC, the policy consistency issue.
CheckHtlcForward(payHash [32]byte, incomingAmt lnwire.MilliSatoshi,
amtToForward lnwire.MilliSatoshi,
incomingTimeout, outgoingTimeout uint32,
heightNow uint32) lnwire.FailureMessage
heightNow uint32) *linkFailure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A public method on a public interface shouldn't have a non-exported return type.

// linkFailure represents failure to forward a HTLC along the link. It contains
// an appropriate lnwire message and the specific reason for failure.
type linkFailure struct {
message lnwire.FailureMessage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can just using the existing ForwardingError struct here? It adheres to both lnwire.FailureMessage and the error interface. A new field for a failure type can be added, referencing the existing errors in the lnwire package. For any new errors, we can just add them there as well, but take care to not leak out any non-official errors to the wide network which are only used for purposes of internal accountability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is, many of the new errors are also just duplicates of existing errors that already live in lnwire.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using ForwardingError and using existing lnwire errors would be good, since there are a lot of duplicates. What about defining the extra errors in the htlcnotifier (rather than lnwire) package which implement FailureMessage and then mapping custom errors back to their broadcast-friendly error? Eg, insufficient balance -> temp channel failure.


// NotifySettleEvent is called when a HTLC we previously successfully
// forwarded is settled.
NotifySettleEvent(f ForwardingEvent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also surface the HTLC IDs if needed.

// Notifier represents a place through which HTLC events can be piped through.
//
// Concrete implementations of Notifier should be non-blocking.
type Notifier interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of redundancy across the package name + interface name.


// NotifyForwardingFailEvent is called when a HTLC we previously
// successfully forwarded fails down the line.
NotifyForwardingFailEvent(failType ForwardFailType, f ForwardingEvent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this use the ForwardingFailEvent struct? Similar comment applies to the link failure message above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 updated in the more recent WIP draft, cleaned these up into individual event types with a common identifier struct (ID, to be renamed to something better) rather than trying to tack failures onto ForwardingEvent when needed.

@@ -0,0 +1,21 @@
package htlcnotifier
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this should be a sub-package of the switch.

@joostjager joostjager added this to the 0.9.0 milestone Nov 18, 2019
@carlaKC carlaKC force-pushed the 3420-htlcForwarding branch from a709915 to 2f044fc Compare November 22, 2019 13:05
@@ -485,14 +485,18 @@ func (s *Switch) forward(packet *htlcPacket) error {
return err
}

var failure lnwire.FailureMessage
var failure *ForwardingError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what bool does the commit msg refer to?

const (
// FailureDetailNone is used when there are no further failure details
// beyond lnwire's failure message.
FailureDetailNone FailureDetail = iota
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice solution 👍

@carlaKC carlaKC force-pushed the 3420-htlcForwarding branch from 2f044fc to 999dab1 Compare November 26, 2019 15:25
@manreo
Copy link
Contributor

manreo commented Nov 30, 2019

@carlaKC Is this still in Draft or should it be moved to Needs Review?

@carlaKC
Copy link
Collaborator Author

carlaKC commented Dec 1, 2019

@carlaKC Is this still in Draft or should it be moved to Needs Review?

Still a WIP, working on some itests for this and then it will be good to go. Should be up early this week @mrmanpew

@carlaKC carlaKC force-pushed the 3420-htlcForwarding branch from 999dab1 to 15302f7 Compare December 1, 2019 14:39
This commit adds a HTLCNotifier package which HTLC events will be
piped through to provide clients with subscriptions to HTLC level
events.

The events included are:
- ForwardingEvents: When a HTLC is successfully forwarded through our
node.
- LinkFailEvents: When a HTLC fails on our outgoing or incoming link.
- ForwardingFailEvents: When a HTLC we forwarded fails down the line
and is removed.
- SettleEvents: When a HTLC from a forward or receive on our node
 is settled.
In this commit, instances of the HTLCNotifier are added
to the switch and link stucts.
This commit updates the CheckHtlcTransit and CheckHtlcForward
functions on the ChannelLink interface to return ForwardErrors.
This will allow returning more detailed information  about HTLC
forward/transit failures than the current set of lnwire
FailureMessages provide. It also updates handlePacketForward
to store full forwarding errors when attempting to forward a
htlc.
In this commit, a link failure Forwarding Error
is added to the htlc packet struct. This error
is non-nil for failures which occur at our node,
and will be used to notify specific information
about the link failure.
This commit updates the switch and link code to send
HTLC notifier events when we forward, settle and fail
HTLCs, including our own sends and receives. A distinction
is made between forwarding failures, which happen further
down the line and have unknown failure reasons and link
failures, which occur on our incoming or outgoing link
so we have more specific failure reasons for them.
This commit exposes a subscription to HTLC events
over rpc. For link falures, which are failures which
occurred at our node, lnwire failure messages paired
with more detailed failure reasons are used to classify
failure reasons.
@carlaKC carlaKC force-pushed the 3420-htlcForwarding branch from 15302f7 to e4d9bba Compare December 2, 2019 06:38
@carlaKC carlaKC closed this Dec 2, 2019
@carlaKC
Copy link
Collaborator Author

carlaKC commented Dec 2, 2019

Draft PR replaced with #3781.

@carlaKC carlaKC removed incomplete WIP PR, not fully finalized, but light review possible v0.9.0 labels Dec 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvements to existing features / behaviour HTLC htlcswitch P3 might get fixed, nice to have rpc Related to the RPC interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants