Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process does not exit after https request with keepAlive enabled (now default in Node 19) #47228

Open
jakebailey opened this issue Mar 23, 2023 · 18 comments · May be fixed by #52329
Open

Process does not exit after https request with keepAlive enabled (now default in Node 19) #47228

jakebailey opened this issue Mar 23, 2023 · 18 comments · May be fixed by #52329
Labels
confirmed-bug Issues with confirmed bugs. http Issues or PRs related to the http subsystem. https Issues or PRs related to the https subsystem.

Comments

@jakebailey
Copy link

Version

v19.8.1

Platform

Linux Jake-Framework 6.2.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 18 Mar 2023 01:06:36 +0000 x86_64 GNU/Linux

Subsystem

https

What steps will reproduce the bug?

Run a script containing:

const https = require("https");
const url = "https://github.com/dprint/dprint/releases/download/0.35.3/dprint-x86_64-unknown-linux-gnu.zip";
https.get(url, { method: "HEAD" }, console.log);

This will hang.

How often does it reproduce? Is there a required condition?

100% of the time; downgrade to Node 18 and it no longer hangs.

What is the expected behavior? Why is that the expected behavior?

No hang. Run this:

const https = require("https");
const url = "https://github.com/dprint/dprint/releases/download/0.35.3/dprint-x86_64-unknown-linux-gnu.zip";
https.get(url, { method: "HEAD", agent: new https.Agent({ keepAlive: false }) }, console.log);

And it will succeed.

What do you see instead?

Hang; the request succeeds and the response is given, but the process never exits.

Additional information

This is a change I saw in the Node 19 release blog post; keep-alive is enabled by default now, but, it seems to cause a hang.

@jakebailey jakebailey changed the title Process does not exit after https request with keepAlive enabled Process does not exit after https request with keepAlive enabled (now default in Node 19) Mar 23, 2023
@bnoordhuis bnoordhuis added confirmed-bug Issues with confirmed bugs. https Issues or PRs related to the https subsystem. labels Mar 23, 2023
@bnoordhuis
Copy link
Member

Confirmed. https.request(...).end() shows the same behavior.

@bnoordhuis
Copy link
Member

@nodejs/http

@mcollina mcollina added the http Issues or PRs related to the http subsystem. label Mar 23, 2023
@jakebailey
Copy link
Author

jakebailey commented Mar 23, 2023

I should clarify; in my testing where I found this bug (in dprint's npm install script), it does eventually exit at exactly 2 minutes. So there is a timeout of some sort, somewhere, if that provides any guesses as to what's going on.

(Of course, npm install taking an extra 2 minutes isn't so great 😄)

@ronag
Copy link
Member

ronag commented Mar 23, 2023

The sockets in the keepalive pool should probably be "unreffed". https://nodejs.org/api/net.html#socketunref

@bnoordhuis
Copy link
Member

http.Agent#keepSocketAlive() does just that:

socket.unref();

https.Agent inherits that behavior from http.Agent.

There's #47137 (comment) (tl;dr wrong C++ method called for TLS sockets) although that issue is more or less the exact opposite of what's reported here.

@jakebailey
Copy link
Author

Node 20 recently branched off, so this bug seems like it's about to become LTS and cause a lot of hangs.

Is there anything I can provide to help try and get this fixed? Any pointers as to what might be the problem in case I can go and try to fix it myself?

@jagadeeshmeesala
Copy link

let me know if I can pickup this bug?

@bnoordhuis
Copy link
Member

@jagadeeshmeesala go for it.

@jakebailey
Copy link
Author

jakebailey commented Apr 9, 2023

I've been trying to figure this one out. It turns out that unref (the entire keepSocketAlive function, in fact) is actually never called.

Manually calling socket.unref() earlier if the connection is keepAlive does the trick, but surely isn't the right solution.

@jakebailey
Copy link
Author

jakebailey commented Apr 9, 2023

If anyone else is taking a look, I've gotten this to reproduce in a test:

'use strict';

const common = require('../common');
const cp = require('child_process');
const http = require('http');

if (process.argv[2] === 'server') {
  process.on('disconnect', () => process.exit(0));

  const server = http.createServer((req, res) => {
    res.writeHead(200);
    res.end();
  });
  
  server.listen(0, () => {
    process.send(server.address().port);
  });
} else {
  const serverProcess = cp.fork(__filename, ['server'], { stdio: ['ignore', 'ignore', 'ignore', 'ipc'] });
  serverProcess.once('message', common.mustCall((port) => {
    serverProcess.channel.unref();
    serverProcess.unref();
    const agent = new http.Agent({ keepAlive: true });
    http.get({ host: common.localhostIPv4, port, agent }, common.mustCall());
  }));
  
  // If any sockets are left open, we'll hit the below instead of exiting.
  setTimeout(common.mustNotCall(), common.platformTimeout(3000)).unref();
}

The test also confirms that simply unref-ing the socket on creation if keepAlive is set is not the right move, because the callback passed to http.get is never called.

@jakebailey
Copy link
Author

jakebailey commented Apr 10, 2023

After hitting my head against this all weekend, I don't really know the way forward here. Someone has to call unref, but they also need to be aware that there are potentially outstanding requests or future requests that have yet to come in and so the process can't exit. I am not sure how to do this.

I would have thought unref would work, but maybe it's not becuase the eventual http response handling is a part of some callback chain that only the socket has a reference to and therefore whatever exits when there are no refs left doesn't know there's outstanding work? I am not an expert here. Maybe it does work, I'm just calling it in the wrong place.

@mcollina
Copy link
Member

@ShogunPanda wdyt? I think we should have the Agent call unref() once the socket enters the pool.

@ShogunPanda
Copy link
Contributor

@mcollina I'm not sure about it. I guess we shall not do this automatically but rather have the user opt-in this behavior. The reason being I'm expecting this change to have application exit at the wrong time if people are mismanaging events.

@jagadeeshmeesala
Copy link

@jagadeeshmeesala go for it.

started looking into this.

@chirag-droid
Copy link

I should clarify; in my testing where I found this bug (in dprint's npm install script), it does eventually exit at exactly 2 minutes. So there is a timeout of some sort, somewhere, if that provides any guesses as to what's going on.

(Of course, npm install taking an extra 2 minutes isn't so great 😄)

This bug is also happening to many github actions running or switching to node20. Silently slowing them down by approx. 2min

Jamie- added a commit to pexip/notify-slack-action that referenced this issue Feb 13, 2024
The default for keepAlive on https request changed from false to true in Node 19
nodejs/node#47228
@mweberxyz
Copy link

mweberxyz commented Mar 31, 2024

It looks like the behavior is different depending on the remote server (the Keep-Alive hint header, is the culprit maybe?)

Given test.js:

const http = require("http");
const url = process.argv.at(-1);
http.get(url, { method: "GET" }, function (res) {
	console.log({reqHeader: res.req._header, resHeaders: res.headers})
});

On my laptop (mac, m1, node 20):

Requests to neverssl.com cause node to exit after 5 seconds:

% time node test.js http://www.neverssl.com
{
  reqHeader: 'GET / HTTP/1.1\r\nHost: www.neverssl.com\r\nConnection: keep-alive\r\n\r\n',
  resHeaders: {
    date: 'Sat, 30 Mar 2024 23:55:17 GMT',
    server: 'Apache/2.4.58 ()',
    upgrade: 'h2,h2c',
    connection: 'Upgrade, Keep-Alive',
    'last-modified': 'Wed, 29 Jun 2022 00:23:33 GMT',
    etag: '"f79-5e28b29d38e93"',
    'accept-ranges': 'bytes',
    'content-length': '3961',
    vary: 'Accept-Encoding',
    'keep-alive': 'timeout=5, max=100',
    'content-type': 'text/html; charset=UTF-8'
  }
}
node test.js http://www.neverssl.com  0.05s user 0.05s system 1% cpu 5.435 total

Whereas requests to example.com doesn't exit for 180 seconds:

% time node test.js http://www.example.com
{
  reqHeader: 'GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: keep-alive\r\n\r\n',
  resHeaders: {
    'accept-ranges': 'bytes',
    age: '568521',
    'cache-control': 'max-age=604800',
    'content-type': 'text/html; charset=UTF-8',
    date: 'Sat, 30 Mar 2024 23:55:51 GMT',
    etag: '"3147526947"',
    expires: 'Sat, 06 Apr 2024 23:55:51 GMT',
    'last-modified': 'Thu, 17 Oct 2019 07:18:26 GMT',
    server: 'ECS (cha/8094)',
    vary: 'Accept-Encoding',
    'x-cache': 'HIT',
    'content-length': '1256'
  }
}
node test.js http://www.example.com  0.07s user 0.05s system 0% cpu 3:01.04 total

If I remove the if condition within _http-agent.js > installListeners > onTimeout, example.com works as expected and node exits after 5 seconds, but given @ronag 's TODO comment I am not sure if this is safe, or if additional logic needs to be added to prevent the socket from re-entering the free pool prior to destroying it.

Edit: Trying to resolve this making me go mad. I was just forgetting to resume the request.

@mweberxyz
Copy link

The problem is that the response is not consumed.

const https = require("https");
const url = "https://github.com/dprint/dprint/releases/download/0.35.3/dprint-x86_64-unknown-linux-gnu.zip";
https.get(url, { method: "HEAD" }, (res) => {
	console.log(res)
	res.resume()
});

exits right away.

MDN says:

Warning: A response to a HEAD method should not have a body. If it has one anyway, that body must be ignored

But the node http.ClientRequest docs say:

However, if a response event handler is added, then the data from the response object must be consumed

... without a carve out for HEAD.

Should HEAD requests always call resume on agent incoming response?

@mweberxyz
Copy link

mweberxyz commented Apr 1, 2024

Similarly to HEAD, what about 3xx's (or other requests with no body)? Should the socket return to freeSockets pool and get unref'd automatically?

Pulling out relevant parts of the dprint install script referenced by @jakebailey at the time:

const fs = require("fs");
const https = require("https");

const url = "https://github.com/dprint/dprint/releases/download/0.35.3/dprint-x86_64-unknown-linux-gnu.zip"

const zipFilePath = "./file.zip"

function downloadZipFile(url) {
  return new Promise((resolve, reject) => {
    const options = {};

    https.get(url, options, function(response) {
      if (response.statusCode != null && response.statusCode >= 200 && response.statusCode <= 299) {
        downloadResponse(response).then(resolve).catch(reject);
      } else if (response.headers.location) {
        downloadZipFile(response.headers.location).then(resolve).catch(reject);
      } else {
        reject(new Error("Unknown status code " + response.statusCode + " : " + response.statusMessage));
      }
    }).on("error", function(err) {
      try {
        fs.unlinkSync(zipFilePath);
      } catch (err) {
        // ignore
      }
      reject(err);
    });
  });

  function downloadResponse(response) {
    return new Promise((resolve, reject) => {
      const file = fs.createWriteStream(zipFilePath);
      response.pipe(file);
      file.on("finish", function() {
        file.close((err) => {
          if (err) {
            reject(err);
          } else {
            resolve(undefined);
          }
        });
      });
    });
  }
}

downloadZipFile(url)
  .then((done) => {
    console.log("done")
  });

hangs for about 5 seconds after "done" is printed.

But if the https.get response handler "consumes" the initial redirect response:

...
    https.get(url, options, function(response) {
      if(response.headers.location) {
        response.resume()
      }
      if (response.statusCode != null && response.statusCode >= 200 && response.statusCode <= 299) {
      ...

then node exits immediately after printing "done".

Edit: 3xx may return a response body. So either automatic resume would need to be based on Content-Length -- or it should be up to the caller to remember.

mweberxyz added a commit to mweberxyz/node that referenced this issue Apr 2, 2024
In accordance with https://www.rfc-editor.org/rfc/rfc9112#name-message-body-length: HEAD, 1xx, 204, and 304 responses cannot contain a message body.

If a socket will be kept-alive, resume the socket during parsing so that it may be returned to the free pool.

Fixes nodejs#47228
@mweberxyz mweberxyz linked a pull request Apr 2, 2024 that will close this issue
mweberxyz added a commit to mweberxyz/node that referenced this issue Apr 5, 2024
In accordance with https://www.rfc-editor.org/rfc/rfc9112#name-message-body-length: HEAD, 1xx, 204, and 304 responses cannot contain a message body.

If a socket will be kept-alive, resume the socket during parsing so that it may be returned to the free pool.

Fixes nodejs#47228
mweberxyz added a commit to mweberxyz/node that referenced this issue Apr 5, 2024
In accordance with https://www.rfc-editor.org/rfc/rfc9112#name-message-body-length: HEAD, 1xx, 204, and 304 responses cannot contain a message body.

If a socket will be kept-alive, resume the socket during parsing so that it may be returned to the free pool.

Fixes nodejs#47228
mweberxyz added a commit to mweberxyz/node that referenced this issue Apr 9, 2024
According to RFC9112 section 6.3.1: HEAD requests, and responses
with status 204 and 304 cannot contain a message body,

If a socket will be kept-alive, resume the socket during parsing
so that it may be returned to the free pool.

Fixes nodejs#47228
mweberxyz added a commit to mweberxyz/node that referenced this issue Apr 9, 2024
According to RFC9112 section 6.3.1: HEAD requests, and responses
with status 204 and 304 cannot contain a message body,

If a socket will be kept-alive, resume the socket during parsing
so that it may be returned to the free pool.

Fixes nodejs#47228
fiji-flo added a commit to fiji-flo/vscode-ripgrep that referenced this issue Dec 20, 2024
Running the postinstall script takes around 30s.
It looks like nodejs/node#47228 is the root cause.

Adding a `response.resume()` is a working workaround.
fiji-flo added a commit to fiji-flo/vscode-ripgrep that referenced this issue Dec 20, 2024
Running the postinstall script takes around 30s.
It looks like nodejs/node#47228 is the root cause.

Adding a `response.resume()` is a working workaround.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. http Issues or PRs related to the http subsystem. https Issues or PRs related to the https subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants