-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
investigate flaky test-tls-enable-trace-cli #27636
Comments
This happens on Linux frequently, but I can't seem to replicate it on macOS. |
/ping @cjihrig (because I'm pretty sure they've looked at this and might have data/info to add?) |
Not much to add. I also can't replicate it on macOS - I've run it tens of thousands of times at this point 😄 |
I'm running it in a loop on my linux laptop to see if I can repro. |
I got a failure within a couple minutes of continuous running, I'll try to fix this, but might not have time until Monday. |
It’s a bit easier to reproduce when running under valgrind (which kinda points to a timing race condition), but that might interfere with debugging. (edit: not the child process, only the test itself) |
Actually, this makes the bug show up pretty reliably for me: diff --git a/test/parallel/test-tls-enable-trace-cli.js b/test/parallel/test-tls-enable-trace-cli.js
index 5b7189af702d..ace9e6710ba9 100644
--- a/test/parallel/test-tls-enable-trace-cli.js
+++ b/test/parallel/test-tls-enable-trace-cli.js
@@ -40,6 +40,7 @@ child.on('exit', common.mustCall((code) => {
}));
function test() {
+ process.stderr.write('a'.repeat(1024*1024) + '\n')
const {
connect, keys
} = require(fixtures.path('tls-connect')); I think the reason might be that stderr refers to a fd that Node has made non-blocking, and which makes OpenSSL’s |
Yeah, I think we can’t use |
Thanks for diagnosing this. I'm OK with messages getting dropped if the pipe is overrun, because in my mind, this kind of logging is "best effort", similar to process._rawDebug's handling of failed fprintfs to stderr, and to I think I can fix the leak by storing the bio in the context, and cleaning up in the destructor, and change the test to assert on something that will be seen in the first records logged, so that overflow won't fail the test. @cjihrig @addaleax Would that be OK, or do you have another suggestion? |
@sam-github I think that is okay, yes. That being said, I think somebody already requested the ability to print the output to a non-stderr location, so maybe making this more flexible by providing a custom BIO instance would be a good idea anyway? |
It's not ideal, but I think it's better than not having this functionality. Ideally, this feature is used to diagnose a bad connection, and not turned on in production to log the details of thousands of connections. |
Hopelessly verbose in production. Mostly, I expect it to be used just to debug handshake problems, to answer the question of why node can't agree with its peer on a shared cipher suite. I didn't see the request for non-stderr location, where was it? If requested location is a specific file, then still wouldn't need a custom BIO, I'd just use a file BIO. Also, a custom BIO would still have the problem of what do when write fails with EAGAIN, what would you suggest? Would you want it to use uv_write, buffering all the written data in memory until it drains, with no backpressure? I'm not sure that's better than dropping. I have a longer-term idea of making enableTrace take an optional js function as an arg instead of just boolean, so that the message can be delivered to js code and it can do whatever it wants with it, but I think the current impl, while being a bit rigid, satisfies the common use cases. |
@sam-github Not saying I have it all figured out :)
I can’t find it rn – it might be mixing it up with something else that unconditionally printed to stderr?
I think this would be best. And if we can use that to replace the current solution by using e.g. |
Yes, pushing to js space is my long term goal, but enough people were needing it I thought it useful getting something out fast. |
I think there were two issues here, one is that when test output to stderr was dropped on buffer overflow, the test would fail. The other, less common, was that The solution to this is to wrap |
See #27841 |
Still open, pending #27841 |
Calls to TLS_trace might leave errors on the SSL error stack, which then get reported as SSL errors instead of being ignored. Wrap TLS_trace to keep the error stack unchanged. Fixes: nodejs#27636
Fixes: #27636 (comment) PR-URL: #27834 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>
I'm pretty sure this one is still flaky. (And if it's not, then it should be removed from parallel.status.) Re-opening. |
Calls to TLS_trace might leave errors on the SSL error stack, which then get reported as SSL errors instead of being ignored. Wrap TLS_trace to keep the error stack unchanged. Fixes: #27636 PR-URL: #27841 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Refael Ackermann <refack@gmail.com>
@Trott Is it flaky on a particular platform? I'm not sure where to look next. |
Most recent one I'm aware of is https://ci.nodejs.org/job/node-test-commit-linux/nodes=ubuntu1804-64/27985/console. test-digitalocean-ubuntu1804-x64-1 05:17:00 not ok 1906 parallel/test-tls-enable-trace-cli # TODO : Fix flaky test
05:17:00 ---
05:17:00 duration_ms: 0.176
05:17:00 severity: flaky
05:17:00 exitcode: 1
05:17:00 stack: |-
05:17:00 (node:24591) Warning: Enabling --trace-tls can expose sensitive data in the resulting log.
05:17:00 Sent Record
05:17:00 Header:
05:17:00 Version = TLS 1.0 (0x301)
05:17:00 Content Type = Handshake (22)
05:17:00 Length = 340
05:17:00 ClientHello, Length=336
05:17:00 client_version=0x303 (TLS 1.2)
05:17:00 Random:
05:17:00 gmt_unix_time=0xE8890755
05:17:00 random_bytes (len=28): 975E4A134E97A9B9447695FF37A02EA32131C5F18AFE04A136773F4F
05:17:00 session_id (len=32): 66B25369EB5CAB03C129B30305D2704C22EE4E6B6838A438C27DBB4C00A98C5B
05:17:00 cipher_suites (len=118)
05:17:00 {0x13, 0x02} TLS_AES_256_GCM_SHA384
05:17:00 {0x13, 0x03} TLS_CHACHA20_POLY1305_SHA256
05:17:00 {0x13, 0x01} TLS_AES_128_GCM_SHA256
05:17:00 {0xC0, 0x2F} TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
05:17:00 {0xC0, 0x2B} TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
05:17:00 {0xC0, 0x30} TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
05:17:00 {0xC0, 0x2C} TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
05:17:00 {0x00, 0x9E} TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
05:17:00 {0xC0, 0x27} TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
05:17:00 {0x00, 0x67} TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
05:17:00 {0xC0, 0x28} TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
05:17:00 {0x00, 0x6B} TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
05:17:00 {0x00, 0xA3} TLS_DHE_DSS_WITH_AES_256_GCM_SHA384
05:17:00 {0x00, 0x9F} TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
05:17:00 {0xCC, 0xA9} TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
05:17:00 {0xCC, 0xA8} TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
05:17:00 {0xCC, 0xAA} TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256
05:17:00 {0xC0, 0xAF} TLS_ECDHE_ECDSA_WITH_AES_256_CCM_8
05:17:00 {0xC0, 0xAD} TLS_ECDHE_ECDSA_WITH_AES_256_CCM
05:17:00 {0xC0, 0xA3} TLS_DHE_RSA_WITH_AES_256_CCM_8
05:17:00 {0xC0, 0x9F} TLS_DHE_RSA_WITH_AES_256_CCM
05:17:00 {0xC0, 0x5D} TLS_ECDHE_ECDSA_WITH_ARIA_256_GCM_SHA384
05:17:00 {0xC0, 0x61} TLS_ECDHE_RSA_WITH_ARIA_256_GCM_SHA384
05:17:00 {0xC0, 0x57} TLS_DHE_DSS_WITH_ARIA_256_GCM_SHA384
05:17:00 {0xC0, 0x53} TLS_DHE_RSA_WITH_ARIA_256_GCM_SHA384
05:17:00 {0x00, 0xA2} TLS_DHE_DSS_WITH_AES_128_GCM_SHA256
05:17:00 {0xC0, 0xAE} TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8
05:17:00 {0xC0, 0xAC} TLS_ECDHE_ECDSA_WITH_AES_128_CCM
05:17:00 {0xC0, 0xA2} TLS_DHE_RSA_WITH_AES_128_CCM_8
05:17:00 {0xC0, 0x9E} TLS_DHE_RSA_WITH_AES_128_CCM
05:17:00 {0xC0, 0x5C} TLS_ECDHE_ECDSA_WITH_ARIA_128_GCM_SHA256
05:17:00 {0xC0, 0x60} TLS_ECDHE_RSA_WITH_ARIA_128_GCM_SHA256
05:17:00 {0xC0, 0x56} TLS_DHE_DSS_WITH_ARIA_128_GCM_SHA256
05:17:00 {0xC0, 0x52} TLS_DHE_RSA_WITH_ARIA_128_GCM_SHA256
05:17:00 {0xC0, 0x24} TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384
05:17:00 {0x00, 0x6A} TLS_DHE_DSS_WITH_AES_256_CBC_SHA256
05:17:00 {0xC0, 0x23} TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
05:17:00 {0x00, 0x40} TLS_DHE_DSS_WITH_AES_128_CBC_SHA256
05:17:00 {0xC0, 0x0A} TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
05:17:00 {0xC0, 0x14} TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
05:17:00 {0x00, 0x39} TLS_DHE_RSA_WITH_AES_256_CBC_SHA
05:17:00 {0x00, 0x38} TLS_DHE_DSS_WITH_AES_256_CBC_SHA
05:17:00 {0xC0, 0x09} TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
05:17:00 {0xC0, 0x13} TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
05:17:00 {0x00, 0x33} TLS_DHE_RSA_WITH_AES_128_CBC_SHA
05:17:00 {0x00, 0x32} TLS_DHE_DSS_WITH_AES_128_CBC_SHA
05:17:00 {0x00, 0x9D} TLS_RSA_WITH_AES_256_GCM_SHA384
05:17:00 {0xC0, 0xA1} TLS_RSA_WITH_AES_256_CCM_8
05:17:00 {0xC0, 0x9D} TLS_RSA_WITH_AES_256_CCM
05:17:00 {0xC0, 0x51} TLS_RSA_WITH_ARIA_256_GCM_SHA384
05:17:00 {0x00, 0x9C} TLS_RSA_WITH_AES_128_GCM_SHA256
05:17:00 {0xC0, 0xA0} TLS_RSA_WITH_AES_128_CCM_8
05:17:00 {0xC0, 0x9C} TLS_RSA_WITH_AES_128_CCM
05:17:00 {0xC0, 0x50} TLS_RSA_WITH_ARIA_128_GCM_SHA256
05:17:00 {0x00, 0x3D} TLS_RSA_WITH_AES_256_CBC_SHA256
05:17:00
05:17:00 (node:24580) internal/test/binding: These APIs are for internal testing only. Do not use them.
05:17:00 assert.js:362
05:17:00 throw err;
05:17:00 ^
05:17:00
05:17:00 AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
05:17:00
05:17:00 assert(/Received Record/.test(stderr))
05:17:00
05:17:00 at ChildProcess.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1804-64/test/parallel/test-tls-enable-trace-cli.js:40:3)
05:17:00 at ChildProcess.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1804-64/test/common/index.js:374:15)
05:17:00 at ChildProcess.emit (events.js:200:13)
05:17:00 at maybeClose (internal/child_process.js:1028:16)
05:17:00 at Process.ChildProcess._handle.onexit (internal/child_process.js:283:5) {
05:17:00 generatedMessage: true,
05:17:00 code: 'ERR_ASSERTION',
05:17:00 actual: false,
05:17:00 expected: true,
05:17:00 operator: '=='
05:17:00 }
05:17:00 ... |
That's the V8 7.5 update, which has been a WIP for a while. Any chance it just needs to be rebased? |
Could be. #28002 will close this. |
....and....nope. https://ci.nodejs.org/job/node-test-commit-linux/27996/nodes=centos7-64-gcc6/console test-rackspace-centos7-x64-1 17:17:36 not ok 1855 parallel/test-tls-enable-trace-cli
17:17:36 ---
17:17:36 duration_ms: 0.359
17:17:36 severity: fail
17:17:36 exitcode: 1
17:17:36 stack: |-
17:17:36 (node:25033) Warning: Enabling --trace-tls can expose sensitive data in the resulting log.
17:17:36 Sent Record
17:17:36 Header:
17:17:36 Version = TLS 1.0 (0x301)
17:17:36 Content Type = Handshake (22)
17:17:36 Length = 340
17:17:36 ClientHello, Length=336
17:17:36 client_version=0x303 (TLS 1.2)
17:17:36 Random:
17:17:36 gmt_unix_time=0x850B1D53
17:17:36 random_bytes (len=28): 1E791588A43E8A2DD12650F8BD9B2C4BF662FC08BDC9141B229E97A1
17:17:36 session_id (len=32): 50F773FFEF6D99494B9768BEABF38C33435DAF896508EDE3B372EDC60F7C93E5
17:17:36 cipher_suites (len=118)
17:17:36 {0x13, 0x02} TLS_AES_256_GCM_SHA384
17:17:36 {0x13, 0x03} TLS_CHACHA20_POLY1305_SHA256
17:17:36 {0x13, 0x01} TLS_AES_128_GCM_SHA256
17:17:36 {0xC0, 0x2F} TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
17:17:36 {0xC0, 0x2B} TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
17:17:36 {0xC0, 0x30} TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
17:17:36 {0xC0, 0x2C} TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
17:17:36 {0x00, 0x9E} TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
17:17:36 {0xC0, 0x27} TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
17:17:36 {0x00, 0x67} TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
17:17:36 {0xC0, 0x28} TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
17:17:36 {0x00, 0x6B} TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
17:17:36 {0x00, 0xA3} TLS_DHE_DSS_WITH_AES_256_GCM_SHA384
17:17:36 {0x00, 0x9F} TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
17:17:36 {0xCC, 0xA9} TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
17:17:36 {0xCC, 0xA8} TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
17:17:36 {0xCC, 0xAA} TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256
17:17:36 Sent Record
17:17:36 Header:
17:17:36 Version = TLS 1.2 (0x303)
17:17:36 Content Type = ApplicationData (23)
17:17:36 Length = 19
17:17:36 Inner Content Type = Alert (21)
17:17:36 Level=warning(1), description=close notify(0)
17:17:36
17:17:36 Sent Record
17:17:36 Header:
17:17:36 Version = TLS 1.2 (0x303)
17:17:36 Content Type = ApplicationData (23)
17:17:36 Length = 266
17:17:36 Inner Content Type = Handshake (22)
17:17:36 NewSessionTicket, Length=245
17:17:36 ticket_lifetime_hint=7200
17:17:36 ticket_age_add=387409500
17:17:36 ticket_nonce (len=8): 0000000000000000
17:17:36 ticket (len=224): F31FCC7D488C5AC48F8DC958CA9A5F5D4C054F135F678859C6DEF999FD47F796F75BF309089D6BD08CCEE0A33BB959E2C2985BAE6B27AD02B73D41A86F63C51435CA38BBE0A30FB4EDC6EFD4D67A3F9030F804B167EC37A9541D630E29E227F4E928C09C4E82CCE1AB755797D3EE8F94A04E0999F76315CE713FBF09CC661CF61F7994CBA5DE152C8039C572BC8CA135112C1B4915CFEF7A9FDAC5D22B4273B6BE5C3AD66D1F6730F5FB68CBFAEFB6B0807E3E65C9A710F3DECD9DFB3EA28609436AF880882DEED50E989B7145CFAB0ACD84154929FF
17:17:36 (node:25022) internal/test/binding: These APIs are for internal testing only. Do not use them.
17:17:36 assert.js:362
17:17:36 throw err;
17:17:36 ^
17:17:36
17:17:36 AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
17:17:36
17:17:36 assert(/Received Record/.test(stderr))
17:17:36
17:17:36 at ChildProcess.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc6/test/parallel/test-tls-enable-trace-cli.js:40:3)
17:17:36 at ChildProcess.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc6/test/common/index.js:374:15)
17:17:36 at ChildProcess.emit (events.js:200:13)
17:17:36 at maybeClose (internal/child_process.js:1028:16)
17:17:36 at Process.ChildProcess._handle.onexit (internal/child_process.js:283:5) {
17:17:36 generatedMessage: true,
17:17:36 code: 'ERR_ASSERTION',
17:17:36 actual: false,
17:17:36 expected: true,
17:17:36 operator: '=='
17:17:36 }
17:17:36 ... |
I’d be surprised if this issue would have been fully addressed. |
The TLS trace data is best-effort, and enough can be dropped from pipe buffers that only the start of the trace is detected. Only assert on the first line of the trace, it should not get dropped, and it's enough to check that trace was enabled via CLI. PR-URL: #28043 Fixes: #27636 Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
The text was updated successfully, but these errors were encountered: