-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: improve test-child-process-fork-dgram #8697
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Commit message nit:
Maybe: |
* a message. In this case the test runner will timeout after 60 secs | ||
* and the test will fail. | ||
* message. When either the parent or child receives a message we close | ||
* the server on that side so the next message is guarranteed to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/guarranteed/guaranteed/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok will update commit message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhdawson The typo is in the test comment on line 12.
client.send( | ||
msg, | ||
0, | ||
msg.length, | ||
server.address().port, | ||
serverPort, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to add the serverPort variable? It only seems to be used here, and makes things a bit more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its needed because the server is closed before we have stopped sending. Therefore, you can't call server.address().port. I found this the hard way while building up the changeset.
So if we're now closing the child/parent server once they receive a message, the test should finish after two messages unless some are disappearing (this is UDP so I guess messages can fail). I haven't seen any dropped messages at all in any of the test runs, so having a 45s internal timeout (i.e. 45,000 messages) may be overkill? Changes LGTM if CI passes (probably also worth running a stress test on AIX as well to make sure). |
Also just to document, this fixes #8271 |
Already ran the stress run of 500 on AIX and all passed versus about at 1/10 failure before. CI run here: https://ci.nodejs.org/job/node-test-pull-request/4223/ |
In terms of the timeout as long as it is shorter than the 60s from test harness I don't think it really matters as we will normally finish fast anyway. |
Commit message updated. |
Lots of failures in the CI run, but none related to the test being changed. Several test tick processor related timeouts as well as checkouts failing across a number of platforms. |
Another CI run: https://ci.nodejs.org/job/node-test-pull-request/4225/ |
Ok second CI run only had failures (windows and smartos) related to test-tick-processor which were pre-existing and not related to the changes to this test. |
@mhdawson You missed a nit on L20, otherwise LGTM. @misterdjules LGTY? (also @AndreasMadsen , but feel free to ignore if you don't think you could usefully review). |
@gibfahn Could you explain why this comment isn't a concern here:
|
@AndreasMadsen I think the point is that due to the unreliable nature of the protocol, we have no grounds to expect that messages will reach both while both are listening. The system may route all messages to either child or parent. |
Previous implementation was subject to timing out as there was no guarantee that the parent or child would ever receive a message. Modified test so that once parent or child receives a message it closes its connection to the socket so we can guarantee the next messsage will be recieved by the other node. The test also left lingering processes when it timed out so pulled the timeout into the test itself so that it could properly cleanup in case of failure.
@gibfahn thats for the catch, think I got the nit this time :) |
LGTM |
There are 2 hung processes on test-osuosl-aix61-ppc64_be-1, one from Oct 3 and one from Oct 7 so I think the problem still exists unless it was fixed by something after Oct 7. |
@gibfahn I think the two get separate copies of the handle so one closing may not affect the other. It should however be possible to write a test that validates it one way or the other by making sure it either passes/fails after one side has done the close. |
Lastest process on test-osuosl-aix61-ppc64_be-1 is from Oct 8th. |
Annnnnddddd....now it's failing consistently on freebsd in CI. So, this is probably a thing again. I guess someone should write some code to confirm the speculation in #8697 (comment) that the two copies get separate handles and then, if so, this can get a rebase and land? Or am I mistaken? /cc @mhdawson @gibfahn |
@Trott Well we need to work out why it's failing on macOS 10.12 first I think |
Ooof. OK, for fun, or at least "fun", CI stress test across all available platforms: https://ci.nodejs.org/job/node-stress-single-test/999/ |
@gibfahn and I did discuss writing the test to confirm close behavior across processes. I think he'll be looking at that and the failures on macOS 10.12 soon. |
I think the
|
@santigimeno This might well be the reason that the test was failing on AIX as well. If it fixes the test for all platforms, then I think it is preferable over this PR. |
@santigimeno @gibfahn That unfortunately does not fix it on FreeBSD on our CI, but it is a good thing to add to the test anyway, IMO. And if it clears it up on AIX and/or macos, awesome. |
Perhaps this is a libuv issue ultimately? EDIT: Eh, or maybe not... |
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: nodejs#8697 Fixes: nodejs#8949 Fixes: nodejs#8271
Alternative fix: #9098 Haven't tested it yet on AIX. About to run CI. But if someone is doing these tests locally and whatnot, maybe give it a shot? That alternative definitely works on macos and works on FreeBSD. |
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: nodejs#8697 Fixes: nodejs#8949 Fixes: nodejs#8271 PR-URL: nodejs#9098 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
c133999
to
83c7a88
Compare
Closing since alternative has landed |
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: #8697 Fixes: #8949 Fixes: #8271 PR-URL: #9098 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: nodejs#8697 Fixes: nodejs#8949 Fixes: nodejs#8271 PR-URL: nodejs#9098 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: nodejs#8697 Fixes: nodejs#8949 Fixes: nodejs#8271 PR-URL: nodejs#9098 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: #8697 Fixes: #8949 Fixes: #8271 PR-URL: #9098 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: #8697 Fixes: #8949 Fixes: #8271 PR-URL: #9098 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Checklist
make -j4 test
(UNIX), orvcbuild test nosign
(Windows) passesAffected core subsystem(s)
test
Description of change
Previous implementation was subject to timing out as there
was no guarantee that the parent or child would ever
receive a message. Modified test so that once parent or
child receives a message it stops receiving them so that
we are guaranteed the next message will be received by the
other end.
The test also left lingering processes when it timed out
so pulled the timeout into the test itself so that it
could properly cleanup in case of failure.