-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a segfault by ensuring TLS Sockets are closed if the underlying stream closes #49327
Conversation
This fixes a potential segfault, among various other likely-related issues, which all occur because TLSSockets were not informed if their underlying stream was closed in many cases. This also significantly modifies an existing TLS test. With this change in place, that test no longer works, as it tries to mess with internals to trigger a race, and those internals are now cleaned up earlier. This test has been simplified to a more general TLS shutdown test.
e068064
to
06fc079
Compare
|
Currently the test fails only in a couple of environments (Mac & ASan) reliably, even though it passes reliably in all others. The failures are due to emitted errors on the client H2 stream (not the focus of this test). I'm hoping here that that's due to a timing different in those environments, and scheduling the shutdown like this instead will guarantee the order so that doesn't happen any more.
Thanks @aduh95! It seems this fails because the new segfault test throws a connection reset error on the client H2 stream. That's not a huge problem: it's the server that segfaults without this change, so the test is really interested in server behaviour here, not the client, and a handleable connection error event is far better than a segfault anyway. That said, it's weird that it's different for Mac & ASAN compared to the other cases, I wonder if it's just a timing issue? I've pushed a change to the tests, to see whether it's a timing problem, which moves the shutdown logic into timers started after the response is definitely fully received. Hopefully that will ensure the client session handles this more cleanly. The test still segfaults on Node 20.5.1 and passes reliably with this change (on my Linux machine). If this test still doesn't pass in the same cases, I think just ignoring client errors for this case is fine - it's an unusual shutdown regardless, and really we just want to check the server doesn't crash, so I don't want to get distracted too much by independent client issues (even if they might be worth investigating elsewhere later). |
Seems it was indeed a timing issue - all tests are now passing everywhere 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
I can see two CI failures here:
Both seem very unlikely to be related to this change, so I'm going to assume this is all good and done, and those will presumably pass later on (maybe once that node's disk space is cleaned up?). Thanks for the review @mcollina! I think this is ready to merge but let me know if there's anything else I should do to make that happen. |
Commit Queue failed- Loading data for nodejs/node/pull/49327 ✔ Done loading data for nodejs/node/pull/49327 ----------------------------------- PR info ------------------------------------ Title Fix a segfault by ensuring TLS Sockets are closed if the underlying stream closes (#49327) ⚠ Could not retrieve the email or name of the PR author's from user's GitHub profile! Branch pimterry:fix-tls-shutdown -> nodejs:main Labels tls, author ready Commits 2 - tls: ensure TLS Sockets are closed if the underlying wrap closes - Tweak new H2 shutdown test towards reliable ordering & client shutdown Committers 1 - Tim Perry PR-URL: https://github.com/nodejs/node/pull/49327 Reviewed-By: Matteo Collina ------------------------------ Generated metadata ------------------------------ PR-URL: https://github.com/nodejs/node/pull/49327 Reviewed-By: Matteo Collina -------------------------------------------------------------------------------- ℹ This PR was created on Fri, 25 Aug 2023 16:11:30 GMT ✔ Approvals: 1 ✔ - Matteo Collina (@mcollina) (TSC): https://github.com/nodejs/node/pull/49327#pullrequestreview-1600235070 ✘ This PR needs to wait 41 more hours to land (or 0 hours if there is one more approval) ✔ Last GitHub CI successful ℹ Last Full PR CI on 2023-08-30T19:32:01Z: https://ci.nodejs.org/job/node-test-pull-request/53650/ - Querying data for job/node-test-pull-request/53650/ ✔ Last Jenkins CI successful -------------------------------------------------------------------------------- ✔ Aborted `git node land` session in /home/runner/work/node/node/.ncuhttps://github.com/nodejs/node/actions/runs/6030922076 |
Commit Queue failed- Loading data for nodejs/node/pull/49327 ✔ Done loading data for nodejs/node/pull/49327 ----------------------------------- PR info ------------------------------------ Title Fix a segfault by ensuring TLS Sockets are closed if the underlying stream closes (#49327) ⚠ Could not retrieve the email or name of the PR author's from user's GitHub profile! Branch pimterry:fix-tls-shutdown -> nodejs:main Labels tls, author ready, commit-queue-squash Commits 2 - tls: ensure TLS Sockets are closed if the underlying wrap closes - Tweak new H2 shutdown test towards reliable ordering & client shutdown Committers 1 - Tim Perry PR-URL: https://github.com/nodejs/node/pull/49327 Reviewed-By: Matteo Collina ------------------------------ Generated metadata ------------------------------ PR-URL: https://github.com/nodejs/node/pull/49327 Reviewed-By: Matteo Collina -------------------------------------------------------------------------------- ℹ This PR was created on Fri, 25 Aug 2023 16:11:30 GMT ✔ Approvals: 1 ✔ - Matteo Collina (@mcollina) (TSC): https://github.com/nodejs/node/pull/49327#pullrequestreview-1600235070 ✘ This PR needs to wait 27 more hours to land (or 0 hours if there is one more approval) ✔ Last GitHub CI successful ℹ Last Full PR CI on 2023-08-30T23:03:30Z: https://ci.nodejs.org/job/node-test-pull-request/53650/ - Querying data for job/node-test-pull-request/53650/ ✔ Last Jenkins CI successful -------------------------------------------------------------------------------- ✔ Aborted `git node land` session in /home/runner/work/node/node/.ncuhttps://github.com/nodejs/node/actions/runs/6037572193 |
Landed in 048e0be |
@pimterry the
I can reproduce the issue on macOS by running
|
The following patch seems to fix the issue diff --git a/test/parallel/test-tls-socket-close.js b/test/parallel/test-tls-socket-close.js
index 667b291309..70af760d53 100644
--- a/test/parallel/test-tls-socket-close.js
+++ b/test/parallel/test-tls-socket-close.js
@@ -44,9 +44,9 @@ function connectClient(server) {
setImmediate(() => {
assert.strictEqual(netSocket.destroyed, true);
- assert.strictEqual(clientTlsSocket.destroyed, true);
setImmediate(() => {
+ assert.strictEqual(clientTlsSocket.destroyed, true);
assert.strictEqual(serverTlsSocket.destroyed, true);
tlsServer.close();
but I'm not sure if it invalidates some assumptions. |
Ah, sorry about that. I think that patch absolutely makes sense - there's no specific reason that it should be exactly one setImmediate to get to the destroyed state on the remote socket, it just needs to settle there eventually. I assume you're happy to PR that patch, but let me know if you'd prefer me to do it. |
Move the check for the destroyed state of the remote socket to the inner `setImmediate()`. Refs: nodejs#49327 (comment)
Done, thanks for the fast reply. |
This fixes a potential segfault, among various other likely-related issues, which all occur because TLSSockets were not informed if their underlying stream was closed in many cases. This also significantly modifies an existing TLS test. With this change in place, that test no longer works, as it tries to mess with internals to trigger a race, and those internals are now cleaned up earlier. This test has been simplified to a more general TLS shutdown test. PR-URL: #49327 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Move the check for the destroyed state of the remote socket to the inner `setImmediate()`. Refs: #49327 (comment) PR-URL: #49575 Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Move the check for the destroyed state of the remote socket to the inner `setImmediate()`. Refs: #49327 (comment) PR-URL: #49575 Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
This fixes a potential segfault, among various other likely-related issues, which all occur because TLSSockets were not informed if their underlying stream was closed in many cases. This also significantly modifies an existing TLS test. With this change in place, that test no longer works, as it tries to mess with internals to trigger a race, and those internals are now cleaned up earlier. This test has been simplified to a more general TLS shutdown test. PR-URL: nodejs#49327 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Move the check for the destroyed state of the remote socket to the inner `setImmediate()`. Refs: nodejs#49327 (comment) PR-URL: nodejs#49575 Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
* chore: upgrade to Node.js v20 * src: allow embedders to override NODE_MODULE_VERSION nodejs/node#49279 * src: fix missing trailing , nodejs/node#46909 * src,tools: initialize cppgc nodejs/node#45704 * tools: allow passing absolute path of config.gypi in js2c nodejs/node#49162 * tools: port js2c.py to C++ nodejs/node#46997 * doc,lib: disambiguate the old term, NativeModule nodejs/node#45673 * chore: fixup Node.js BSSL tests * nodejs/node#49492 * nodejs/node#44498 * deps: upgrade to libuv 1.45.0 nodejs/node#48078 * deps: update V8 to 10.7 nodejs/node#44741 * test: use gcUntil() in test-v8-serialize-leak nodejs/node#49168 * module: make CJS load from ESM loader nodejs/node#47999 * src: make BuiltinLoader threadsafe and non-global nodejs/node#45942 * chore: address changes to CJS/ESM loading * module: make CJS load from ESM loader (nodejs/node#47999) * lib: improve esm resolve performance (nodejs/node#46652) * bootstrap: optimize modules loaded in the built-in snapshot nodejs/node#45849 * test: mark test-runner-output as flaky nodejs/node#49854 * lib: lazy-load deps in modules/run_main.js nodejs/node#45849 * url: use private properties for brand check nodejs/node#46904 * test: refactor `test-node-output-errors` nodejs/node#48992 * assert: deprecate callTracker nodejs/node#47740 * src: cast v8::Object::GetInternalField() return value to v8::Value nodejs/node#48943 * test: adapt test-v8-stats for V8 update nodejs/node#45230 * tls: ensure TLS Sockets are closed if the underlying wrap closes nodejs/node#49327 * test: deflake test-tls-socket-close nodejs/node#49575 * net: fix crash due to simultaneous close/shutdown on JS Stream Sockets nodejs/node#49400 * net: use asserts in JS Socket Stream to catch races in future nodejs/node#49400 * lib: fix BroadcastChannel initialization location nodejs/node#46864 * src: create BaseObject with node::Realm nodejs/node#44348 * src: implement DataQueue and non-memory resident Blob nodejs/node#45258 * sea: add support for V8 bytecode-only caching nodejs/node#48191 * chore: fixup patch indices * gyp: put filenames in variables nodejs/node#46965 * build: modify js2c.py into GN executable * fix: (WIP) handle string replacement of fs -> original-fs * [v20.x] backport vm-related memory fixes nodejs/node#49874 * src: make BuiltinLoader threadsafe and non-global nodejs/node#45942 * src: avoid copying string in fs_permission nodejs/node#47746 * look upon my works ye mighty and dispair * chore: patch cleanup * [api] Remove AllCan Read/Write https://chromium-review.googlesource.com/c/v8/v8/+/5006387 * fix: missing include for NODE_EXTERN * chore: fixup patch indices * fix: fail properly when js2c fails in Node.js * build: fix js2c root_gen_dir * fix: lib/fs.js -> lib/original-fs.js * build: fix original-fs file xforms * fixup! module: make CJS load from ESM loader * build: get rid of CppHeap for now * build: add patch to prevent extra fs lookup on esm load * build: greatly simplify js2c modifications Moves our original-fs modifications back into a super simple python script action, wires up the output of that action into our call to js2c * chore: update to handle moved internal/modules/helpers file * test: update @types/node test * feat: enable preventing cppgc heap creation * feat: optionally prevent calling V8::EnableWebAssemblyTrapHandler * fix: no cppgc initialization in the renderer * gyp: put filenames in variables nodejs/node#46965 * test: disable single executable tests * fix: nan tests failing on node headers missing file * tls,http2: send fatal alert on ALPN mismatch nodejs/node#44031 * test: disable snapshot tests * nodejs/node#47887 * nodejs/node#49684 * nodejs/node#44193 * build: use deps/v8 for v8/tools Node.js hard depends on these in their builtins * test: fix edge snapshot stack traces nodejs/node#49659 * build: remove js2c //base dep * build: use electron_js2c_toolchain to build node_js2c * fix: don't create SafeSet outside packageResolve Fixes failure in parallel/test-require-delete-array-iterator: === release test-require-delete-array-iterator === Path: parallel/test-require-delete-array-iterator node:internal/per_context/primordials:426 constructor(i) { super(i); } // eslint-disable-line no-useless-constructor ^ TypeError: object is not iterable (cannot read property Symbol(Symbol.iterator)) at new Set (<anonymous>) at new SafeSet (node:internal/per_context/primordials:426:22) * fix: failing crashReporter tests on Linux These were failing because our change from node::InitializeNodeWithArgs to node::InitializeOncePerProcess meant that we now inadvertently called PlatformInit, which reset signal handling. This meant that our intentional crash function ElectronBindings::Crash no longer worked and the renderer process no longer crashed when process.crash() was called. We don't want to use Node.js' default signal handling in the renderer process, so we disable it by passing kNoDefaultSignalHandling to node::InitializeOncePerProcess. * build: only create cppgc heap on non-32 bit platforms * chore: clean up util:CompileAndCall * src: fix compatility with upcoming V8 12.1 APIs nodejs/node#50709 * fix: use thread_local BuiltinLoader * chore: fixup v8 patch indices --------- Co-authored-by: Keeley Hammond <vertedinde@electronjs.org> Co-authored-by: Samuel Attard <marshallofsound@electronjs.org>
* chore: upgrade to Node.js v20 * src: allow embedders to override NODE_MODULE_VERSION nodejs/node#49279 * src: fix missing trailing , nodejs/node#46909 * src,tools: initialize cppgc nodejs/node#45704 * tools: allow passing absolute path of config.gypi in js2c nodejs/node#49162 * tools: port js2c.py to C++ nodejs/node#46997 * doc,lib: disambiguate the old term, NativeModule nodejs/node#45673 * chore: fixup Node.js BSSL tests * nodejs/node#49492 * nodejs/node#44498 * deps: upgrade to libuv 1.45.0 nodejs/node#48078 * deps: update V8 to 10.7 nodejs/node#44741 * test: use gcUntil() in test-v8-serialize-leak nodejs/node#49168 * module: make CJS load from ESM loader nodejs/node#47999 * src: make BuiltinLoader threadsafe and non-global nodejs/node#45942 * chore: address changes to CJS/ESM loading * module: make CJS load from ESM loader (nodejs/node#47999) * lib: improve esm resolve performance (nodejs/node#46652) * bootstrap: optimize modules loaded in the built-in snapshot nodejs/node#45849 * test: mark test-runner-output as flaky nodejs/node#49854 * lib: lazy-load deps in modules/run_main.js nodejs/node#45849 * url: use private properties for brand check nodejs/node#46904 * test: refactor `test-node-output-errors` nodejs/node#48992 * assert: deprecate callTracker nodejs/node#47740 * src: cast v8::Object::GetInternalField() return value to v8::Value nodejs/node#48943 * test: adapt test-v8-stats for V8 update nodejs/node#45230 * tls: ensure TLS Sockets are closed if the underlying wrap closes nodejs/node#49327 * test: deflake test-tls-socket-close nodejs/node#49575 * net: fix crash due to simultaneous close/shutdown on JS Stream Sockets nodejs/node#49400 * net: use asserts in JS Socket Stream to catch races in future nodejs/node#49400 * lib: fix BroadcastChannel initialization location nodejs/node#46864 * src: create BaseObject with node::Realm nodejs/node#44348 * src: implement DataQueue and non-memory resident Blob nodejs/node#45258 * sea: add support for V8 bytecode-only caching nodejs/node#48191 * chore: fixup patch indices * gyp: put filenames in variables nodejs/node#46965 * build: modify js2c.py into GN executable * fix: (WIP) handle string replacement of fs -> original-fs * [v20.x] backport vm-related memory fixes nodejs/node#49874 * src: make BuiltinLoader threadsafe and non-global nodejs/node#45942 * src: avoid copying string in fs_permission nodejs/node#47746 * look upon my works ye mighty and dispair * chore: patch cleanup * [api] Remove AllCan Read/Write https://chromium-review.googlesource.com/c/v8/v8/+/5006387 * fix: missing include for NODE_EXTERN * chore: fixup patch indices * fix: fail properly when js2c fails in Node.js * build: fix js2c root_gen_dir * fix: lib/fs.js -> lib/original-fs.js * build: fix original-fs file xforms * fixup! module: make CJS load from ESM loader * build: get rid of CppHeap for now * build: add patch to prevent extra fs lookup on esm load * build: greatly simplify js2c modifications Moves our original-fs modifications back into a super simple python script action, wires up the output of that action into our call to js2c * chore: update to handle moved internal/modules/helpers file * test: update @types/node test * feat: enable preventing cppgc heap creation * feat: optionally prevent calling V8::EnableWebAssemblyTrapHandler * fix: no cppgc initialization in the renderer * gyp: put filenames in variables nodejs/node#46965 * test: disable single executable tests * fix: nan tests failing on node headers missing file * tls,http2: send fatal alert on ALPN mismatch nodejs/node#44031 * test: disable snapshot tests * nodejs/node#47887 * nodejs/node#49684 * nodejs/node#44193 * build: use deps/v8 for v8/tools Node.js hard depends on these in their builtins * test: fix edge snapshot stack traces nodejs/node#49659 * build: remove js2c //base dep * build: use electron_js2c_toolchain to build node_js2c * fix: don't create SafeSet outside packageResolve Fixes failure in parallel/test-require-delete-array-iterator: === release test-require-delete-array-iterator === Path: parallel/test-require-delete-array-iterator node:internal/per_context/primordials:426 constructor(i) { super(i); } // eslint-disable-line no-useless-constructor ^ TypeError: object is not iterable (cannot read property Symbol(Symbol.iterator)) at new Set (<anonymous>) at new SafeSet (node:internal/per_context/primordials:426:22) * fix: failing crashReporter tests on Linux These were failing because our change from node::InitializeNodeWithArgs to node::InitializeOncePerProcess meant that we now inadvertently called PlatformInit, which reset signal handling. This meant that our intentional crash function ElectronBindings::Crash no longer worked and the renderer process no longer crashed when process.crash() was called. We don't want to use Node.js' default signal handling in the renderer process, so we disable it by passing kNoDefaultSignalHandling to node::InitializeOncePerProcess. * build: only create cppgc heap on non-32 bit platforms * chore: clean up util:CompileAndCall * src: fix compatility with upcoming V8 12.1 APIs nodejs/node#50709 * fix: use thread_local BuiltinLoader * chore: fixup v8 patch indices --------- Co-authored-by: Keeley Hammond <vertedinde@electronjs.org> Co-authored-by: Samuel Attard <marshallofsound@electronjs.org>
This fixes a potential HTTP/2 segfault (#49307, #48567) caused by underlying TLS behaviour, and likely fixes various other apparently related issues (#46094, #35695).
I found this while investigating #48519, which this doesn't directly fix, but definitely relates to, as the JS Stream Sockets there have all sorts of issues directly caused by TLS interactions after the JSS is closed.
The underlying problem here is that TLS sockets which are created by passing an existing socket (either
tlsServer.emit('connection', socket)
ortls.createConnection({ ..., socket })
) do not listen to whether the underlying socket was closed elsewhere. This is easy to demo:This prints:
With this change, the last line is instead
false false
.Clearly the TLS socket should not be readable or writeable 100ms after it's definitely disconnected. This happens even if something is reading data - the data just stops, and the affected TLS socket (the client in this case) never closes.
These TLSSockets do seem to shut down correctly if the remote side closes the connection, but in any other scenario, the TLS connection will remain happily active like this even when the underlying socket is gone, until something more serious interacts with its internals, which then explode (either segfaults in normal cases, or the various
finishWrite
errors in the issues above in JS Stream Socket cases).The first segfault is definitely caused by this case, and I've added a new test for that.
I'm fairly confident this is the underlying issue for the other cases too - note that both cases use the manual socket handling pattern (with HTTP/2, but implicitly on top of TLS) - but neither has a reliable repro to confirm that.
This change notably significantly modifies an existing TLS test too. Unfortunately with this change in place, that test no longer works at all, as it tries to mess with internals to trigger a race condition that crashed in some Node v7 versions, and those internals are now cleaned up before they can be messed with. This test has been simplified to a more general TLS shutdown test instead.
There is some history to this - very similar code did used to exist!
@addaleax and @ronag will likely be interested in this, since they were involved at various stages of that.
I think there's a good case here that something explicitly handling this is necessary. Somehow, TLS sockets need to be informed if the underlying socket is closed. Note that this change as here is not specific to StreamWraps, unless some past changes. AFAICT this is required here for all manually constructed TLS cases - the new test here just passes a net.Socket directly and it still crashes on all current Node versions.