[feature]: allow configuring standard grpc server keepalive settings #7727
Comments
Good point, I think this makes a lot of sense.
Wondering why you haven't just implemented all 6 of the options found in https://grpc.github.io/grpc/cpp/md_doc_keepalive.html (the lower-case versions of the variable names can be found here: https://github.com/grpc/grpc/blob/master/include/grpc/impl/grpc_types.h), using the same variable names? It looks like you've taken the options you've implemented from https://github.com/grpc/grpc-go/blob/master/keepalive/keepalive.go? It's a bit unclear to me what the mapping is between the parameters defined in there and in https://grpc.github.io/grpc/cpp/md_doc_keepalive.html. Do you know of any table that exists? It's likely that what is in … Also, the variable name … I like the default values that you've picked for …
The Golang library (AFAIK) only supports configuring the first 3 options in https://grpc.github.io/grpc/cpp/md_doc_keepalive.html (through https://github.com/grpc/grpc-go/blob/master/keepalive/keepalive.go). I omitted the other values in https://github.com/grpc/grpc-go/blob/master/keepalive/keepalive.go because I don't see how they would ever be useful for lnd.
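For reference, these are the three client-side knobs grpc-go does expose, paraphrased from keepalive.go (see that file for the authoritative definitions; the comments map each field to the corresponding C-core option name):

```go
package keepalive

import "time"

// ClientParameters, paraphrased from google.golang.org/grpc/keepalive.
type ClientParameters struct {
	// Ping the server after this much inactivity
	// (GRPC_ARG_KEEPALIVE_TIME_MS).
	Time time.Duration

	// Wait this long for a ping ack before closing the connection
	// (GRPC_ARG_KEEPALIVE_TIMEOUT_MS).
	Timeout time.Duration

	// Send pings even when there are no active RPCs
	// (GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS).
	PermitWithoutStream bool
}
```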
Thanks, fixed.
I also think it should be fine. I honestly don't understand how the initial 7200 seconds are at all useful in any context...
So is the golang version a re-implementation of the C version, or does it link to some C libraries? The documentation page for the C version suggests it is used by the golang version.
Agreed on those connection age parameters. However, one thing people might want though is …

Digging deeper, here is what I think the mapping is:
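My rough reading, pieced together from the two linked documents (an approximation, not an authoritative table):

- GRPC_ARG_KEEPALIVE_TIME_MS ↔ Time (keepalive.ClientParameters / keepalive.ServerParameters)
- GRPC_ARG_KEEPALIVE_TIMEOUT_MS ↔ Timeout (keepalive.ClientParameters / keepalive.ServerParameters)
- GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS ↔ PermitWithoutStream (keepalive.ClientParameters on the client, keepalive.EnforcementPolicy on the server)
- GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS ↔ MinTime (keepalive.EnforcementPolicy, server side)
- GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA ↔ no direct grpc-go equivalent
- GRPC_ARG_HTTP2_MIN_SENT_PING_INTERVAL_WITHOUT_DATA_MS ↔ no direct grpc-go equivalent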
Agreed, surprised I'm the first person to report this related to LND.
After studying the options further and doing more testing, it appears as though the default value of … If, on the client end, I set …
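For context, the relevant grpc-go defaults as best I can tell (worth verifying against the grpc-go source before relying on them):

```go
// grpc-go defaults, quoted from memory; verify against the source.
//
// Client side: keepalive is disabled entirely. A client never pings
// unless WithKeepaliveParams is supplied, so a silently dropped TCP
// connection is never detected.
//
// Server side (keepalive.ServerParameters):
//   Time    ~ 2 hours    // first server-initiated ping after 2h idle
//   Timeout ~ 20 seconds // wait for the ping ack before closing
//
// Server enforcement (keepalive.EnforcementPolicy):
//   MinTime             ~ 5 minutes // clients that ping more often are
//                                   // sent GOAWAY "too_many_pings"
//   PermitWithoutStream = false     // pings on idle connections also
//                                   // count as misbehavior
```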
It's a re-implementation, because we can't link into C libraries in … I do think that … So if I interpret your responses correctly, you're happy with the PR once I add …?
I haven't tested #7730, but as it is now it "looks" like it meets my expectations, and I think you've answered all of my questions. You can close this issue once the PR is merged.
Are there any examples of such use cases? I understand the motivation to address the requirement stated in the issue, but I'm failing to see specific use cases like this, where lnd is providing streaming updates to remote clients on unreliable wireless connections.
Distributed Charge is one example of an application that might be on cellular. Other examples are retail fuel pumps, oil and gas pipelines, water meters, candy and soda machines, toll roads, and parking meters. All of these things could be on a cellular or wired connection, depending on their location, but in places like Africa it's much more likely to be cellular.

Whenever an invoice is issued, you are going to want to subscribe to updates on invoices so that you can detect payment using the least amount of bandwidth (without needing to actively poll specific invoices). The machine out in the field will need to connect to the LND node and monitor the invoices in order to know when to deliver a physical product or service to the customer.

Recently I filed this issue because I was having a problem with reliability even on a fiber optic connection, so I brought up cellular as an even lower-quality connection that would suffer much more severely from this issue.
Is your feature request related to a problem? Please describe.
SubscribeInvoices hangs randomly.

SubscribeInvoices (and I believe other streaming grpc calls) is perfectly quiet on the network when no new invoice activity is happening. This is great for connection efficiency; however, in the case where the TCP connection is quietly interrupted somewhere between the client and the server, the client sits indefinitely (days on end) and does not realize anything went wrong. At that point, no new information can be obtained from the server. This situation can be commonplace in machine-to-machine applications where the lnd server is connected to from clients in remote parts of the world on unstable and potentially wireless internet connections.

Operating-system-level TCP keepalives exist, but they are not used by grpc (and I think their default (and minimum?) timeout of over 2 hours is far too slow).
grpc has its own keepalive option that runs over HTTP/2, but its default is also every two hours. Additionally, by default the server is the only side configured to use this keepalive. The client, by default, doesn't care if it never sees any keepalive messages, so it will keep a broken connection in an apparently open but idle state for an indefinite amount of time.
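For illustration, enabling keepalives on the client side in grpc-go looks roughly like this (the address and durations are made-up values, and creds stands in for the usual TLS and macaroon setup):

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/keepalive"
)

// dialLnd dials an lnd node with aggressive client-side keepalives.
func dialLnd(creds credentials.TransportCredentials) (*grpc.ClientConn, error) {
	return grpc.Dial(
		"localhost:10009",
		grpc.WithTransportCredentials(creds),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                30 * time.Second, // ping after 30s of silence
			Timeout:             10 * time.Second, // drop the conn if no ack within 10s
			PermitWithoutStream: true,             // ping even with no active RPC
		}),
	)
}
```

Note that against a server using grpc's stock enforcement policy, a client pinging this often would eventually be disconnected for sending too many pings, which is exactly the problem.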
Describe the solution you'd like
I'd like LND to allow the client to send it more frequent keepalive messages over grpc, and to allow the server to send more frequent keepalives as well. Currently, the lnd grpc server appears to use the grpc defaults (https://grpc.github.io/grpc/cpp/md_doc_keepalive.html).
If all the standard grpc options (https://grpc.github.io/grpc/cpp/md_doc_keepalive.html) can be exposed as settings in LND, I think that would be a good solution, because LND could then be configured to accept more frequent keepalives instead of terminating the connection when a client sends them too often.
For the client side, we can already do whatever we want, but as suggested above, if the server doesn't want to see that many keepalives, it doesn't work.
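On the server side, wiring up the standard grpc-go knobs would boil down to something like the following sketch (illustrative durations, not a proposal for lnd's defaults):

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// newServer builds a grpc server that both sends its own keepalives and
// tolerates frequent client pings.
func newServer() *grpc.Server {
	return grpc.NewServer(
		grpc.KeepaliveParams(keepalive.ServerParameters{
			Time:    30 * time.Second, // server-initiated ping after 30s idle
			Timeout: 10 * time.Second, // close the conn if the ping isn't acked
		}),
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime:             15 * time.Second, // accept client pings this frequent
			PermitWithoutStream: true,             // allow pings on idle connections
		}),
	)
}
```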
Describe alternatives you've considered
Tell the client to do regular and frequent keepalive pings, and then, when the lnd grpc server gets upset because they are coming in too quickly, lnd kills the connection and forces the client to reconnect. This works, but it is far from ideal: it causes the entire connection to be rebuilt very often, rather than only when the connection actually gets broken and needs to be re-established. This workaround uses a lot more bandwidth to solve a problem that happens in situations where bandwidth is usually already limited. In code, it amounts to a reconnect loop like the sketch below.
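A sketch of that reconnect loop (handleInvoice is a hypothetical application callback; lnrpc is lnd's generated package):

```go
package main

import (
	"context"
	"time"

	"github.com/lightningnetwork/lnd/lnrpc"
)

// resubscribe keeps rebuilding the SubscribeInvoices stream every time
// the server tears the connection down (e.g. after rejecting our
// too-frequent pings).
func resubscribe(ctx context.Context, client lnrpc.LightningClient,
	handleInvoice func(*lnrpc.Invoice)) {

	for {
		stream, err := client.SubscribeInvoices(ctx, &lnrpc.InvoiceSubscription{})
		if err != nil {
			time.Sleep(5 * time.Second) // crude backoff before retrying
			continue
		}
		for {
			invoice, err := stream.Recv()
			if err != nil {
				break // stream broken (e.g. GOAWAY): subscribe again
			}
			handleInvoice(invoice)
		}
	}
}
```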