Higher memory usage in pre-0.12 version using HTTP2 compared to h2 branch #1491
Comments
Is this http1 or http2 connections?
Http2. I checked what was updated after the deployment: hyper went from the h2 branch to 0.12.0-pre.0 (HEAD), and tokio-core 0.1.16 was replaced with tokio 0.1.5. So the changes in the deployment are quite minimal, which is why the main culprit I can think of is hyper's new Pool implementation. We'll monitor how it looks after a couple of days, whether it settles into some stable memory usage or keeps growing.
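For context, here is a minimal sketch of the kind of client setup involved, assuming the hyper 0.12 `client::Builder` methods (`keep_alive`, `keep_alive_timeout`, `http2_only`); the exact names on the pre-0.12 master discussed here may differ slightly, so treat this as illustrative rather than the deployed configuration:

```rust
extern crate hyper;

use std::time::Duration;
use hyper::{Body, Client};

fn main() {
    // Illustrative only: a pooled client with a long keep-alive timeout,
    // speaking HTTP/2 only. The builder methods shown are the hyper 0.12
    // ones; the pre-0.12 master may name them differently.
    let client: Client<_, Body> = Client::builder()
        .keep_alive(true)
        .keep_alive_timeout(Duration::from_secs(600))
        .http2_only(true)
        .build_http();

    // Idle connections opened through `client` are parked in hyper's pool
    // until the keep-alive timeout expires.
    let _ = client;
}
```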
If at all possible, logs from
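For anyone wanting to capture the same kind of output: a minimal sketch of the logging setup, assuming the usual `log`/`env_logger` combination that hyper's log output goes through (the specific log targets requested above are missing from the quoted comment):

```rust
extern crate env_logger;

fn main() {
    // Route hyper's `log` output to stderr; the filter is chosen at runtime,
    // e.g. `RUST_LOG=hyper=trace ./my-service 2>&1 | tee hyper-trace.log`.
    env_logger::init();

    // ... build the client and run the service as usual ...
}
```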
OK, so I now have this running on a test machine with production traffic, logging with tee, for some hours.
First logs; if you need more trace points, please let me know. I could run this longer, but right now the production instances freed some memory, about a 10% drop, so at least they're not going to OOM. Something just seems to use more RAM than before... With a long keep-alive timeout, reusing connections:
With a short keep-alive timeout, creating new connections:
The only thing I can think of is that something is holding on to a reference to the h2 streams type, perhaps stuck in a future that never polls as finished. Then eventually Apple closes the connection, and that triggers the internal h2 streams type to clean itself up. I've pushed some minor cleanup to master, along with some clearer logs, if you want to take a look.
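To illustrate that hypothesis (this is not hyper's actual code): any long-lived value that captures a clone of a shared handle keeps the underlying allocation alive until that value is finally dropped, which matches memory only being released once the remote closes the connection. A minimal sketch using `Arc`, with a closure standing in for a pending future:

```rust
use std::sync::Arc;

// Stand-in for the h2 streams type: per-connection state with buffers.
struct Streams {
    buffers: Vec<u8>,
}

fn main() {
    let streams = Arc::new(Streams { buffers: vec![0u8; 1024 * 1024] });

    // A pending future (modeled here as a closure that is never called)
    // captures its own clone of the handle.
    let handle = Arc::clone(&streams);
    let parked = move || handle.buffers.len();

    // The caller is done with its handle...
    drop(streams);

    // ...but the megabyte of buffers stays resident, because `parked` still
    // owns a clone. It is only freed once `parked` itself is dropped, for
    // example when the connection is closed and the task holding it ends.
    drop(parked);
}
```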
So some notes on the systems we're running with Hyper:
I can try Hyper 0.12 in our h1 system, but I really doubt I'll see higher memory usage.
Oh, one more observation: when we get a jump in RAM usage, it happens when a dead connection gets a big spike of requests. The memory leak is about 60-100 MB, which is much more than a couple of dead connections would account for (there should be about 6 per customer). I'd say something is not cleaned up when we race the new connection in the pool, but I'm not sure.
What determines that it is dead? That the remote has closed the connection but the pool hasn't yet cleared out the handle? Or that it reached its expiration and was waiting for the idle interval to clean it up?
No, I mean I have a keep-alive timeout of 10 minutes, and we have some customers that don't use their connection during the night and then blast a massive amount of requests, so the pools open new connections and we have something like hundreds of thousands of concurrent requests polling for new connections. But like I said on IRC, I don't think this really needs to block 0.12. We don't OOM, the memory is freed eventually, and the difference from the h2 0.11 branch is just doubled RAM usage, which is maybe not nice, but we're talking about a few hundred megabytes in our case.
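To make that scenario concrete, here is a toy model (not hyper's pool) of the burst-after-idle pattern: with a single idle connection that has outlived its keep-alive deadline, every request in the burst dials a new connection, and the pool ends up retaining one idle connection per in-flight request once they are all checked back in:

```rust
use std::time::{Duration, Instant};

// Toy connection pool; this is NOT hyper's implementation, only an
// illustration of the burst-after-idle pattern described above.
struct Conn {
    idle_since: Instant,
}

struct Pool {
    idle: Vec<Conn>,
    keep_alive: Duration,
}

impl Pool {
    fn checkout(&mut self) -> Conn {
        while let Some(conn) = self.idle.pop() {
            if conn.idle_since.elapsed() < self.keep_alive {
                return conn; // reuse a still-live idle connection
            }
            // Expired: the handle is dropped here. If anything else still
            // held a reference to its internals, that memory would linger.
        }
        Conn { idle_since: Instant::now() } // no live idle conn: dial a new one
    }

    fn checkin(&mut self, mut conn: Conn) {
        conn.idle_since = Instant::now();
        self.idle.push(conn);
    }
}

fn main() {
    // A zero keep-alive stands in for "the 10-minute deadline has already
    // passed" for the one connection left in the pool overnight.
    let mut pool = Pool {
        idle: vec![Conn { idle_since: Instant::now() }],
        keep_alive: Duration::from_secs(0),
    };

    // The morning burst: every checkout misses, so each request dials a new
    // connection, and all of them are parked in the pool afterwards.
    let conns: Vec<Conn> = (0..1_000).map(|_| pool.checkout()).collect();
    for conn in conns {
        pool.checkin(conn);
    }
    println!("idle connections retained: {}", pool.idle.len()); // 1000
}
```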
The RAM usage was reduced by several commits before this, but this one is the best deal, and now the usage actually gets lower the longer the system runs. I consider this fixed now.
For reference, once it was noticed, I opened a PR to libstd to fix this easy-to-miss issue.
We've been using the h2 branch since its early stages, and a couple of days ago we upgraded to the pre-0.12 master. Something in the master branch uses more RAM than the old version, possibly the new pool. Our use case is many concurrent open connections, keeping them alive and using HTTP/2.
Here's an image showing the regression:
The deployment happened on 16 April 2018, around 11 am local time. The noticeable growth also happens when we hit the keep-alive timeout and then get lots of requests through a new connection, so the culprit might be how we dispose of the dead connections in the pool.