Investigate flaky test-gc-http-client-onerror #23089
test-gc-http-client-onerror is resource-intensive. It times out a lot on CI. Move to sequential. Fixes: nodejs#23089
90891b4232e91dbd7a2e2077e4d23d16a374b41d introduced a race condition when accessing `slow_io_work_running`: it is increased and later decreased as part of the worker thread loop, but the two operations were guarded by different mutexes. This fixes the race condition by making sure both accesses are protected through the global `mutex` of `threadpool.c`. This fixes a number of flaky Node.js tests. Refs: libuv/libuv#1845 Refs: nodejs/reliability#18 Refs: nodejs/node#23089 Refs: nodejs/node#23067 Refs: nodejs/node#23066 Refs: nodejs/node#23219 PR-URL: libuv/libuv#2021 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
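To make the fix concrete, here is a simplified C sketch using POSIX threads. It is not libuv's actual `threadpool.c`: only the names `slow_io_work_running` and `mutex` come from the commit message, while `do_slow_work`, `run_one_slow_job`, `worker`, `run_workers`, and `max_observed` are hypothetical. The point it illustrates is that the increment and the decrement are both performed while holding the same global `mutex`. When the two updates are guarded by different mutexes, an increment on one thread and a decrement on another can overlap, one read-modify-write can be lost, and the counter drifts.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified model, NOT libuv's real threadpool.c. One global mutex guards
 * the count of workers currently running slow I/O jobs. */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int slow_io_work_running = 0;
static unsigned int max_observed = 0; /* hypothetical: peak value, for checking */

/* Hypothetical stand-in for a slow job such as a blocking DNS lookup. */
static void do_slow_work(void) {
  volatile unsigned int sink = 0;
  for (unsigned int i = 0; i < 1000; i++)
    sink += i;
}

static void run_one_slow_job(void) {
  pthread_mutex_lock(&mutex);
  slow_io_work_running++;
  if (slow_io_work_running > max_observed)
    max_observed = slow_io_work_running;
  pthread_mutex_unlock(&mutex);

  do_slow_work(); /* the job itself runs without holding the lock */

  /* The fix: decrement under the SAME global mutex as the increment,
   * not under some other (e.g. per-loop) mutex. */
  pthread_mutex_lock(&mutex);
  assert(slow_io_work_running > 0);
  slow_io_work_running--;
  pthread_mutex_unlock(&mutex);
}

static void *worker(void *arg) {
  size_t jobs = *(const size_t *)arg;
  for (size_t i = 0; i < jobs; i++)
    run_one_slow_job();
  return NULL;
}

/* Runs up to 16 workers, each doing `jobs` slow jobs, and returns the
 * counter value after all of them have finished. */
static unsigned int run_workers(size_t nthreads, size_t jobs) {
  pthread_t threads[16];
  size_t created = 0;
  if (nthreads > 16)
    nthreads = 16;
  for (; created < nthreads; created++)
    if (pthread_create(&threads[created], NULL, worker, &jobs) != 0)
      break;
  for (size_t i = 0; i < created; i++)
    pthread_join(threads[i], NULL);

  pthread_mutex_lock(&mutex);
  unsigned int result = slow_io_work_running;
  pthread_mutex_unlock(&mutex);
  return result;
}
```

Calling `run_workers(8, 2000)` should return 0, because every increment is matched by a decrement under the same lock. A C11 atomic counter would also avoid lost updates, but the commit's approach is to guard both accesses with the existing global `mutex`, and the sketch mirrors that.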
PR-URL: #23336 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Gus Caplan <me@gus.host> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com> Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com> Fixes: #23043 Fixes: #21773 Fixes: #16601 Fixes: #22999 Fixes: #23219 Fixes: #23066 Fixes: #23067 Fixes: #23089
Made its return on FreeBSD today: https://ci.nodejs.org/job/node-test-commit-freebsd/21777/nodes=freebsd11-x64/console
18:41:54 not ok 645 parallel/test-gc-http-client-onerror
18:41:54 ---
18:41:54 duration_ms: 0.534
18:41:54 severity: fail
18:41:54 exitcode: 1
18:41:54 stack: |-
18:41:54 We should do 500 requests
18:41:54 Done: 0/500
18:41:54 Collected: 0/170
18:41:54 /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/parallel/test-gc-http-client-onerror.js:51
18:41:54 throw err;
18:41:54 ^
18:41:54
18:41:54 Error: connect ECONNRESET 127.0.0.1:56948 - Local (127.0.0.1:57146)
18:41:54 at internalConnect (net.js:888:16)
18:41:54 at defaultTriggerAsyncIdScope (internal/async_hooks.js:294:19)
18:41:54 at GetAddrInfoReqWrap.emitLookup [as callback] (net.js:1035:9)
18:41:54 at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:63:10)
18:41:54 ...
Backport-PR-URL: #24103 PR-URL: #23336 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Gus Caplan <me@gus.host> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Refael Ackermann <refack@gmail.com> Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com> Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com> Fixes: #23043 Fixes: #21773 Fixes: #16601 Fixes: #22999 Fixes: #23219 Fixes: #23066 Fixes: #23067 Fixes: #23089
https://ci.nodejs.org/job/node-test-commit-freebsd/21934/nodes=freebsd11-x64/console
12:57:07 ok 687 parallel/test-fs-write-string-coerce
12:57:07 ---
12:57:07 duration_ms: 0.159
12:57:07 ...
12:57:08 not ok 688 parallel/test-gc-http-client-onerror
12:57:08 ---
12:57:08 duration_ms: 0.536
12:57:08 severity: fail
12:57:08 exitcode: 1
12:57:08 stack: |-
12:57:08 We should do 500 requests
12:57:08 Done: 0/500
12:57:08 Collected: 0/170
12:57:08 Done: 0/500
12:57:08 Collected: 0/340
12:57:08 /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/parallel/test-gc-http-client-onerror.js:51
12:57:08 throw err;
12:57:08 ^
12:57:08
12:57:08 Error: connect ECONNRESET 127.0.0.1:63673 - Local (127.0.0.1:64090)
12:57:08 at internalConnect (net.js:876:16)
12:57:08 at defaultTriggerAsyncIdScope (internal/async_hooks.js:294:19)
12:57:08 at GetAddrInfoReqWrap.emitLookup [as callback] (net.js:1023:9)
12:57:08 at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:63:10)
12:57:08 ...
https://ci.nodejs.org/job/node-test-commit-freebsd/22462/nodes=freebsd11-x64/console
test-digitalocean-freebsd11-x64-2
00:06:54 not ok 695 parallel/test-gc-http-client-onerror
00:06:54 ---
00:06:54 duration_ms: 0.541
00:06:54 severity: fail
00:06:54 exitcode: 1
00:06:54 stack: |-
00:06:54 We should do 500 requests
00:06:54 Done: 0/500
00:06:54 Collected: 0/170
00:06:54 Done: 0/500
00:06:54 Collected: 0/360
00:06:54 /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/parallel/test-gc-http-client-onerror.js:51
00:06:54 throw err;
00:06:54 ^
00:06:54
00:06:54 Error: connect ECONNRESET 127.0.0.1:20548 - Local (127.0.0.1:20918)
00:06:54 at internalConnect (net.js:859:16)
00:06:54 at defaultTriggerAsyncIdScope (internal/async_hooks.js:294:19)
00:06:54 at GetAddrInfoReqWrap.emitLookup [as callback] (net.js:1006:9)
00:06:54 at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:63:10)
00:06:54 ...
Stress test with: …
Stress test with: …
(EDIT: Unable to replicate on CI with those settings. May need to increase the number of runs and/or pass an explicitly large value to …)
https://ci.nodejs.org/job/node-test-commit-freebsd/22656/nodes=freebsd11-x64/
test-digitalocean-freebsd11-x64-2
00:06:29 not ok 695 parallel/test-gc-http-client-onerror
00:06:29 ---
00:06:29 duration_ms: 0.539
00:06:29 severity: fail
00:06:29 exitcode: 1
00:06:29 stack: |-
00:06:29 We should do 500 requests
00:06:29 Done: 0/500
00:06:29 Collected: 0/160
00:06:29 /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/parallel/test-gc-http-client-onerror.js:51
00:06:29 throw err;
00:06:29 ^
00:06:29
00:06:29 Error: connect ECONNRESET 127.0.0.1:14414 - Local (127.0.0.1:14625)
00:06:29 at internalConnect (net.js:859:16)
00:06:29 at defaultTriggerAsyncIdScope (internal/async_hooks.js:294:19)
00:06:29 at GetAddrInfoReqWrap.emitLookup [as callback] (net.js:1006:9)
00:06:29 at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:62:10)
00:06:29 ...
Seems like the ECONNRESET might come from a local firewall doing some throttling, considering the test makes 500 requests. Maybe there's a way to rewrite the test so that it doesn't use the 500 magic number?
https://ci.nodejs.org/job/node-test-commit-freebsd/23126/nodes=freebsd11-x64/console
test-digitalocean-freebsd11-x64-2
15:54:10 not ok 737 parallel/test-gc-http-client-onerror
15:54:10 ---
15:54:10 duration_ms: 0.541
15:54:10 severity: fail
15:54:10 exitcode: 1
15:54:10 stack: |-
15:54:10 /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/parallel/test-gc-http-client-onerror.js:50
15:54:10 throw err;
15:54:10 ^
15:54:10
15:54:10 Error: connect ECONNRESET 127.0.0.1:37684 - Local (127.0.0.1:37943)
15:54:10 at internalConnect (net.js:855:16)
15:54:10 at defaultTriggerAsyncIdScope (internal/async_hooks.js:294:19)
15:54:10 at GetAddrInfoReqWrap.emitLookup [as callback] (net.js:1002:9)
15:54:10 at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:62:10)
15:54:11 ...
Remove magic numbers (500, 10, 100) from the test. Instead, detect when GC has started and stop sending requests at that point. On my laptop, this results in 16 or 20 requests per run instead of 500. Fixes: nodejs#23089 PR-URL: nodejs#24943 Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Still there
Message: Error Message
fail (1)
Stacktrace
/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/parallel/test-gc-http-client-onerror.js:50
throw err;
^
Error: connect ECONNRESET 127.0.0.1:43891 - Local (127.0.0.1:44214)
at internalConnect (net.js:833:16)
at defaultTriggerAsyncIdScope (internal/async_hooks.js:294:19)
at GetAddrInfoReqWrap.emitLookup [as callback] (net.js:980:9)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:63:10)
https://ci.nodejs.org/job/node-test-commit-freebsd/24583/nodes=freebsd11-x64/console
test-digitalocean-freebsd11-x64-2
https://ci.nodejs.org/job/node-test-commit-freebsd/24946/nodes=freebsd11-x64/console
test-digitalocean-freebsd11-x64-2
21:20:39 not ok 743 parallel/test-gc-http-client-onerror
21:20:39 ---
21:20:39 duration_ms: 0.535
21:20:39 severity: fail
21:20:39 exitcode: 1
21:20:39 stack: |-
21:20:39 /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/parallel/test-gc-http-client-onerror.js:50
21:20:39 throw err;
21:20:39 ^
21:20:39
21:20:39 Error: connect ECONNRESET 127.0.0.1:50188 - Local (127.0.0.1:50411)
21:20:39 at internalConnect (net.js:840:16)
21:20:39 at defaultTriggerAsyncIdScope (internal/async_hooks.js:297:19)
21:20:39 at GetAddrInfoReqWrap.emitLookup [as callback] (net.js:987:9)
21:20:39 at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:63:10)
21:20:39 ...
Worker: https://ci.nodejs.org/computer/test-digitalocean-freebsd11-x64-2/
PR-URL: nodejs#27225 Refs: nodejs#26910 Refs: nodejs#27219 Refs: nodejs#26938 Refs: nodejs#23089 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Yongsheng Zhang <zyszys98@gmail.com>
Are console logs from the FreeBSD VMs/instances available, or is it possible to obtain output from e.g. …?
@nodejs/build ^^^^^
I don't have a direct answer to that, but @emaste, I assume you have access to your own FreeBSD machines; you may be able to reproduce the failure locally with something like …
You can also request temporary SSH access to the relevant machine to investigate. You can initiate such a request by opening an issue in https://github.com/nodejs/build. Slightly more information is available at https://github.com/nodejs/build/blob/master/doc/process/special_access_to_build_resources.md#temporary-access.
Has anybody from @nodejs/platform-freebsd tried to repro this as suggested in #23089 (comment)? Successfully or not?
I have not been able to reproduce this on my FreeBSD development systems yet. I am hoping to have more cycles today/tomorrow to find a way to reproduce this.
This hasn't failed a single time in the last 100 runs on FreeBSD. And I haven't noticed it in a long time. I'd be inclined to remove it from flaky status and close. We can re-open if it reappears, but I'd rather not use FreeBSD cycles on this if it's not a significant problem when perhaps we could be using those FreeBSD cycles to make our FreeBSD setup in CI completely awesome.
Proposed removal from status file and closing in #28429
The test has not failed on FreeBSD in the last 100 runs and appears to perhaps not be an issue anymore. Closes: #23089 test-gc-http-client-onerror: PASS,FLAKY PR-URL: #28429 Fixes: #23089 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Sam Roberts <vieuxtech@gmail.com> Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
https://ci.nodejs.org/job/node-test-commit-linux-containered/7325/nodes=ubuntu1604_sharedlibs_withoutintl_x64/console