Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flaky test-http-server-keepalive-req-gc.js #29344

Closed
Trott opened this issue Aug 27, 2019 · 13 comments
Closed

flaky test-http-server-keepalive-req-gc.js #29344

Trott opened this issue Aug 27, 2019 · 13 comments
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. http Issues or PRs related to the http subsystem.

Comments

@Trott
Copy link
Member

Trott commented Aug 27, 2019

  • Version: 13.0.0-pre (master branch)
  • Platform: SmartOS
  • Subsystem: http

https://ci.nodejs.org/job/node-test-commit-smartos/nodes=smartos18-64/27655/console

04:56:09 not ok 965 parallel/test-http-server-keepalive-req-gc
04:56:09   ---
04:56:09   duration_ms: 0.345
04:56:09   severity: fail
04:56:09   exitcode: 1
04:56:09   stack: |-
04:56:09     Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
04:56:09         at Object.mustCall (/home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos18-64/test/common/index.js:337:10)
04:56:09         at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos18-64/test/parallel/test-http-server-keepalive-req-gc.js:18:36)
04:56:09         at Module._compile (internal/modules/cjs/loader.js:936:30)
04:56:09         at Object.Module._extensions..js (internal/modules/cjs/loader.js:947:10)
04:56:09         at Module.load (internal/modules/cjs/loader.js:789:32)
04:56:09         at Function.Module._load (internal/modules/cjs/loader.js:702:12)
04:56:09         at Function.Module.runMain (internal/modules/cjs/loader.js:999:10)
04:56:09         at internal/main/run_main_module.js:17:11
04:56:09   ...

Others:

https://ci.nodejs.org/job/node-test-commit-smartos/nodes=smartos18-64/27653/console

https://ci.nodejs.org/job/node-test-commit-smartos/nodes=smartos18-64/27651/console

https://ci.nodejs.org/job/node-test-commit-smartos/nodes=smartos18-64/27649/console

https://ci.nodejs.org/job/node-test-commit-smartos/nodes=smartos18-64/27648/console

https://ci.nodejs.org/job/node-test-commit-smartos/nodes=smartos18-64/27647/console

@Trott
Copy link
Member Author

Trott commented Aug 27, 2019

@nodejs/platform-smartos @nodejs/build

@Trott Trott added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Aug 27, 2019
@addaleax addaleax added smartos Issues and PRs related to the SmartOS platform. windows Issues and PRs related to the Windows platform. labels Aug 27, 2019
@cjihrig
Copy link
Contributor

cjihrig commented Aug 27, 2019

FWIW, this failure is definitely not specific to SmartOS. I just observed the same failures on macOS while developing a test for #29288.

@Trott Trott changed the title flaky test-http-server-keepalive-req-gc.js on SmartOS flaky test-http-server-keepalive-req-gc.js Aug 27, 2019
@Trott Trott removed smartos Issues and PRs related to the SmartOS platform. windows Issues and PRs related to the Windows platform. labels Aug 27, 2019
@Trott
Copy link
Member Author

Trott commented Aug 27, 2019

FWIW, this failure is definitely not specific to SmartOS. I just observed the same failures on macOS while developing a test for #29288.

I think @mcollina mentioned seeing it on macOS too. It's definitely occurring more often on SmartOS in CI, but that could very easily be more about (for example) resources that the hosts are provisioned with than OS....

@addaleax addaleax added the http Issues or PRs related to the http subsystem. label Aug 27, 2019
@addaleax
Copy link
Member

For those who can reproduce locally (I am unable on Linux): Could you maybe add this and share the resulting snapshot?

process.on('exit', (code) => code !== 0 && require('v8').writeHeapSnapshot());

@Trott
Copy link
Member Author

Trott commented Aug 27, 2019

For those who can reproduce locally (I am unable on Linux): Could you maybe add this and share the resulting snapshot?

process.on('exit', (code) => code !== 0 && require('v8').writeHeapSnapshot());

I'm able to reproduce reliably on macOS with this:

tools/test.py --repeat 192 parallel/test-http-server-keepalive-req-gc

For other people trying to get a heap snapshot: Adding the above exit handler didn't work, I think because common.mustCall() is failing in an exit handler itself. So I added this exit handler to mustCall() in the common module:

diff --git a/test/common/index.js b/test/common/index.js
index 98a2687222..b861ceb291 100644
--- a/test/common/index.js
+++ b/test/common/index.js
@@ -330,7 +330,10 @@ function runCallChecks(exitCode) {
     console.log(context.stack.split('\n').slice(2).join('\n'));
   });
 
-  if (failed.length) process.exit(1);
+  if (failed.length) {
+    require('v8').writeHeapSnapshot();
+    process.exit(1);
+  }
 }
 
 function mustCall(fn, exact) {

@addaleax
Copy link
Member

@Trott Could you share the heap snapshot? And thanks for fixing that :)

@Trott
Copy link
Member Author

Trott commented Aug 27, 2019

@Trott Could you share the heap snapshot? And thanks for fixing that :)

Yeah, trying to figure out where to upload them.

@Trott
Copy link
Member Author

Trott commented Aug 27, 2019

@Trott Could you share the heap snapshot? And thanks for fixing that :)

Yeah, trying to figure out where to upload them.

@addaleax I've emailed you a link to one of the snapshot files. Let me know if you need more.

@addaleax
Copy link
Member

@Trott Okay, it looks like the req object is not part of the heap snapshot, meaning it has likely been gc’ed, but we didn’t get the notification – Does adding a dummy setTimeout(() => {}, 100); after the global.gc() call help?

@addaleax
Copy link
Member

Alternatively, what would be a good platform to debug this on in CI?

@Trott
Copy link
Member Author

Trott commented Aug 27, 2019

@Trott Okay, it looks like the req object is not part of the heap snapshot, meaning it has likely been gc’ed, but we didn’t get the notification – Does adding a dummy setTimeout(() => {}, 100); after the global.gc() call help?

No. Nor does changing 100 to 1000 or even 10000 (although obviously that slows the test down a lot--but it still fails frequently).

@Trott
Copy link
Member Author

Trott commented Aug 27, 2019

I think I have a solution. PR coming momentarily.

@Trott
Copy link
Member Author

Trott commented Aug 27, 2019

Proposed fix in #29347. Works for me locally.

Trott added a commit to Trott/io.js that referenced this issue Aug 27, 2019
Use `server` to keep the event loop open until the `ongc` listener runs.

Fixes: nodejs#29344
@danbev danbev closed this as completed in 3ae6f5e Aug 28, 2019
BridgeAR pushed a commit that referenced this issue Sep 3, 2019
Use `server` to keep the event loop open until the `ongc` listener runs.

PR-URL: #29347
Fixes: #29344
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
BridgeAR pushed a commit that referenced this issue Sep 4, 2019
Use `server` to keep the event loop open until the `ongc` listener runs.

PR-URL: #29347
Fixes: #29344
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. http Issues or PRs related to the http subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants