Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose server support for http2 keep-alive #474

Closed
codermobster opened this issue Oct 14, 2020 · 8 comments · Fixed by #486
Closed

Expose server support for http2 keep-alive #474

codermobster opened this issue Oct 14, 2020 · 8 comments · Fixed by #486
Milestone

Comments

@codermobster
Copy link

Feature Request

Crates

tonic

Motivation

From what I understood from #307, HTTP2 keep alive support was exposed only to the client (unless I missed some way of doing so for the server).

However hyper also supports setting the keep alive interval and timeout for the server.

My motivation for this enhancement is well described here, which I quote:

KeepAlive provides a valuable benefit: periodically checking the health of the connection by sending an HTTP/2 PING to determine whether the connection is still alive. However it has another equally useful benefit: signaling liveness to proxies. Consider a client sending data to a server through a proxy. The client and server may be happy to keep a connection alive indefinitely, sending data as necessary. Proxies, on the other hand, are often quite resource constrained and may kill idle connections to save resources. Google Cloud Platform (GCP) load balancers disconnect apparently-idle connections after 10 minutes, and Amazon Web Services Elastic Load Balancers (AWS ELBs) disconnect them after 60 seconds

Proposal

I suggest that we only support setting http2_keep_alive_interval, with something along the lines of:

--- a/tonic/Cargo.toml
+++ b/tonic/Cargo.toml
@@ -64,7 +64,7 @@ prost-derive = { version = "0.6", optional = true }
 async-trait = { version = "0.1.13", optional = true }
 
 # transport
-hyper = { version = "0.13.4", features = ["stream"], optional = true }
+hyper = { version = "0.13.4", features = ["stream", "runtime"], optional = true }
 tokio = { version = "0.2.13", features = ["tcp"], optional = true }
 tower = { version = "0.3", optional = true}
 tower-make = { version = "0.3", features = ["connect"] }
--- a/tonic/src/transport/server/mod.rs
+++ b/tonic/src/transport/server/mod.rs
@@ -69,6 +69,7 @@ pub struct Server {
     init_stream_window_size: Option<u32>,
     init_connection_window_size: Option<u32>,
     max_concurrent_streams: Option<u32>,
+    http2_keepalive_interval: Option<Duration>,
     tcp_keepalive: Option<Duration>,
     tcp_nodelay: bool,
 }
@@ -221,6 +222,22 @@ impl Server {
         }
     }
 
+    /// Set whether HTTP2 Ping frames are enabled on accepted connections.
+    ///
+    /// If `None` is specified, HTTP2 keepalive is disabled, otherwise the duration
+    /// specified will be the time interval between HTTP2 Ping frames.
+    /// The timeout for receiving an acknowledgement of the keepalive ping is the default of
+    /// [hyper](https://docs.rs/hyper/0.13.4/hyper/server/struct.Builder.html#method.http2_keep_alive_timeout).
+    ///
+    /// Default is no HTTP2 keepalive (`None`)
+    ///
+    pub fn http2_keepalive_interval(self, http2_keepalive_interval: Option<Duration>) -> Self {
+        Server {
+            http2_keepalive_interval,
+            ..self
+        }
+    }
+
     /// Set whether TCP keepalive messages are enabled on accepted connections.
     ///
     /// If `None` is specified, keepalive is disabled, otherwise the duration
@@ -320,6 +337,7 @@ impl Server {
         let init_connection_window_size = self.init_connection_window_size;
         let init_stream_window_size = self.init_stream_window_size;
         let max_concurrent_streams = self.max_concurrent_streams;
+        let http2_keepalive_interval = self.http2_keepalive_interval;
         let timeout = self.timeout;
 
         let tcp = incoming::tcp_incoming(incoming, self);
@@ -336,7 +354,8 @@ impl Server {
             .http2_only(true)
             .http2_initial_connection_window_size(init_connection_window_size)
             .http2_initial_stream_window_size(init_stream_window_size)
-            .http2_max_concurrent_streams(max_concurrent_streams);
+            .http2_max_concurrent_streams(max_concurrent_streams)
+            .http2_keep_alive_interval(http2_keepalive_interval);
 
         if let Some(signal) = signal {
             server

This proposal leaves http2_keep_alive_timeout to be whichever hyper defaults to (currently 20 seconds).

Note that the code above has not been compiled or tested, I'm happy to submit a PR if you guys think this change is worth it.

The main drawback I see is that the runtime feature of hyper must be enabled for hyper::server::Builder::http2_keep_alive_interval to be used (link).

Alternatives

None that I can think of, but I welcome any suggestions :D

Note

Thanks in advance for your help and thank you for the great work on tonic, hats off!

@LucioFranco
Copy link
Member

I think this proposal seems reasonable 👍

@codermobster
Copy link
Author

Thanks! I'll prepare a PR with the changes as soon as I can.

Do you think it is worth implement this as a cargo feature?
Something like this:

--- a/tonic/Cargo.toml
+++ b/tonic/Cargo.toml
@@ -36,6 +36,7 @@ transport = [
 tls = ["transport", "tokio-rustls"]
 tls-roots = ["tls", "rustls-native-certs"]
 prost = ["prost1", "prost-derive"]
+server-http2-keepalive = ["transport", "hyper/runtime"]
 
 # [[bench]]
 # name = "bench_main"

@alce
Copy link
Collaborator

alce commented Oct 21, 2020

@codermobster I don't think it's necessary to feature-gate this. It is also not necessary to add hyper's runtime feature flag.

Why do you suggest to only support http2_keep_alive_interval? Wouldn't it be useful to also expose http2_keep_alive_timeout as part of this?

Anyway, I was playing with these settings, both in the client and server but I don't see any change in behavior, do you?. Maybe my expectations are off but when sniffing the traffic I detect no changes no matter which settings I use.

@codermobster
Copy link
Author

@codermobster I don't think it's necessary to feature-gate this.

Ok, deal.

It is also not necessary to add hyper's runtime feature flag.

Indeed, it's enabled by default. 👍

Why do you suggest to only support http2_keep_alive_interval? Wouldn't it be useful to also expose http2_keep_alive_timeout as part of this?

Sure, I guess I assumed that defaulting to hyper's setting of 20 seconds would be good enough.

Furthermore, since http2_keep_alive_timeout does not take an Option as an argument, it kind of forces us to have a default setting in tonic, of course here we can use the same 20 seconds that hyper does if you agree.

Anyway, I was playing with these settings, both in the client and server but I don't see any change in behavior, do you?. Maybe my expectations are off but when sniffing the traffic I detect no changes no matter which settings I use.

You got me! I totally took hyper for granted here and failed to find any open issues in github related to it.

Tried to tcpdump my way out of this one, here are the results of some tests with:

  • hyper version 0.13.8
  • modified examples/src/health_server.rs and tonic/transport/server/mod.rs for the keep alive settings
  • modified examples/src/health_server.rs to comment out tokio::spawn(twiddle_service_status(health_reporter.clone()));
  • used cargo run --bin heath-server
  • making a request with grpc_cli -proto_path ./tonic-health/proto -protofiles health.proto call --noremotedb localhost:50051 Watch 'service: "helloworld.Greeter"'
  • running tcpdump -i lo port 50051

5 seconds keepalive timeout and 20 seconds keep alive interval

# some output has been elided
# the two following packets are still output after initiating the connection
17:56:10.541558 IP6 localhost.50051 > localhost.csccredir: Flags [P.], seq 61:124, ack 395, win 512, options [nop,nop,TS val 3578121404 ecr 3578121402], length 63
17:56:10.541571 IP6 localhost.csccredir > localhost.50051: Flags [.], ack 124, win 512, options [nop,nop,TS val 3578121404 ecr 3578121404], length 0

17:56:30.543874 IP6 localhost.50051 > localhost.csccredir: Flags [P.], seq 124:141, ack 395, win 512, options [nop,nop,TS val 3578141406 ecr 3578121404], length 17
17:56:30.543901 IP6 localhost.csccredir > localhost.50051: Flags [.], ack 141, win 512, options [nop,nop,TS val 3578141406 ecr 3578141406], length 0

# ... eliding some more

The second batch of packets comes exactly 20 seconds after the first.
No more packets received afterwards.

5 seconds keepalive timeout and 30 seconds keep alive interval

# some output has been elided
# the two following packets are still output after initiating the connection
18:02:32.863479 IP6 localhost.50051 > localhost.43986: Flags [P.], seq 61:124, ack 395, win 512, options [nop,nop,TS val 3578503726 ecr 3578503724], length 63
18:02:32.863493 IP6 localhost.43986 > localhost.50051: Flags [.], ack 124, win 512, options [nop,nop,TS val 3578503726 ecr 3578503726], length 0

18:03:02.864626 IP6 localhost.50051 > localhost.43986: Flags [P.], seq 124:141, ack 395, win 512, options [nop,nop,TS val 3578533727 ecr 3578503726], length 17
18:03:02.864735 IP6 localhost.43986 > localhost.50051: Flags [.], ack 141, win 512, options [nop,nop,TS val 3578533727 ecr 3578533727], length 0

# ... eliding some more

The second batch of packets comes exactly 30 seconds after the first.
No more packets received afterwards.

20 seconds keepalive timeout and 10 seconds keep alive interval

# some output has been elided
# the two following packets are still output after initiating the connection
18:10:44.883423 IP6 localhost.50051 > localhost.48110: Flags [P.], seq 61:124, ack 395, win 512, options [nop,nop,TS val 3578995746 ecr 3578995744], length 63
18:10:44.883436 IP6 localhost.48110 > localhost.50051: Flags [.], ack 124, win 512, options [nop,nop,TS val 3578995746 ecr 3578995746], length 0

18:10:54.885248 IP6 localhost.50051 > localhost.48110: Flags [P.], seq 124:141, ack 395, win 512, options [nop,nop,TS val 3579005747 ecr 3578995746], length 17
18:10:54.885270 IP6 localhost.48110 > localhost.50051: Flags [.], ack 141, win 512, options [nop,nop,TS val 3579005747 ecr 3579005747], length 0

# ... eliding some more

The second batch of packets comes exactly 10 seconds after the first.
No more packets received afterwards.

Concluding

The above is surely not 100% conclusive, but it seems:

  • keep_alive_timeout makes no difference
  • after keep_alive_interval the first ping comes in
  • no more pings recorded after the first one.

Thanks @alce ! I'll prepare an issue in hyper with this info and link it here (credits to you of course!)

@alce
Copy link
Collaborator

alce commented Oct 22, 2020

Yeah, that's what I saw too. I would expect: PING <-> PING interval PING <-> PING interval PING -> ...timeout. I don't know what's going on there but I guess we'll need to figure that out first.

Keeping hyper's default timeout seems fine to me.

@pdcalado
Copy link
Contributor

I've opened a PR in hyper to address this issue: hyperium/hyper#2315
Cheers!

@pdcalado
Copy link
Contributor

Please correct me if I'm wrong, but I'm guessing the adequate approach now is to wait for a new version of hyper to be released (containing the fix) and have tonic depend on it, before going forward with a pull request here, right?
Thanks

@alce
Copy link
Collaborator

alce commented Oct 29, 2020

@pdcalado We don't need to wait for hyper, we could go ahead with this change as soon as you are ready. However, there won't be a new tonic release until there is a new hyper release and these settings won't do anything until then, of course.

pdcalado added a commit to pdcalado/tonic that referenced this issue Oct 30, 2020
Adds two methods to `transport::server::Server` for setting HTTP2 server
keepalive interval and timeout as exposed by `hyper::server::Builder`.

Fixes hyperium#474
@LucioFranco LucioFranco added this to the 0.4 milestone Nov 27, 2020
LucioFranco pushed a commit that referenced this issue Jan 5, 2021
…486)

Adds two methods to `transport::server::Server` for setting HTTP2 server
keepalive interval and timeout as exposed by `hyper::server::Builder`.

Fixes #474
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment