Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[backport v10.x] test: fix pty test hangs on aix #28826

Closed
wants to merge 3 commits into from

Conversation

Trott
Copy link
Member

@Trott Trott commented Jul 23, 2019

Original PRs: #28600 #28516 #28469

Some pty tests persistently hung on the AIX CI buildbots. Fix that by
adding a helper script that properly sets up the pty before spawning
the script under test.

On investigation I discovered that the test runner hung when it tried
to close the slave pty's file descriptor, probably due to a bug in
AIX's pty implementation. I could reproduce it with a short C program.
The test runner also leaked file descriptors to the child process.

I couldn't convince python's subprocess.Popen() to do what I wanted
it to do so I opted to move the logic to a helper script that can do
fork/setsid/etc. without having to worry about stomping on state in
tools/test.py.

In the process I also uncovered some bugs in the pty module of the
python distro that ships with macOS 10.14, leading me to reimplement
a sizable chunk of the functionality of that module.

And last but not least, of course there are differences between ptys
on different platforms and the helper script has to paper over that.
Of course.

Really, this commit took me longer to put together than I care to admit.

Caveat emptor: this commit takes the hacky ^D feeding to the slave out
of tools/test.py and puts it in the *.in input files. You can also feed
other control characters to tests, like ^C or ^Z, simply by inserting
them into the corresponding input file. I think that's nice.

Fixes: nodejs/build#1820
Fixes: #28489

PR-URL: #28600
Reviewed-By: Richard Lau riclau@uk.ibm.com
Reviewed-By: Sam Roberts vieuxtech@gmail.com
Reviewed-By: Anna Henningsen anna@addaleax.net

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added test Issues and PRs related to the tests. tools Issues and PRs related to the tools directory. v10.x labels Jul 23, 2019
@Trott Trott force-pushed the backport-28600w branch from f55a97c to 91715fe Compare July 23, 2019 21:53
@nodejs-github-bot

This comment has been minimized.

@Trott
Copy link
Member Author

Trott commented Jul 23, 2019

Noticed more than one 10.x backport PRs with perma-fail AIX tests. Hoping backporting this fixes it....

@Trott
Copy link
Member Author

Trott commented Jul 23, 2019

/ping @bnoordhuis (original author) @BethGriggs (person I'm hoping will get this on v10.x-staging quickly if it ends up fixing the issues)

@nodejs-github-bot
Copy link
Collaborator

@Trott
Copy link
Member Author

Trott commented Jul 23, 2019

Looks like AIX tests all passed but then some build/JUnit stuff blew up. Trying again...

@nodejs-github-bot
Copy link
Collaborator

@Trott
Copy link
Member Author

Trott commented Jul 23, 2019

AIX tests passed again but then there was an Java/Jenkins failure. 😞

@nodejs-github-bot
Copy link
Collaborator

@Trott
Copy link
Member Author

Trott commented Jul 23, 2019

Trying again, but forcing a different AIX host to see if it makes a difference....

@sam-github
Copy link
Contributor

see: nodejs/build#1849 (comment)

@Trott
Copy link
Member Author

Trott commented Jul 23, 2019

OK, so this will have to also backport 1245aca to pass, it would seem.

@Trott
Copy link
Member Author

Trott commented Jul 23, 2019

OK, added a second backport commit per above. Let's try CI again....

@Trott Trott force-pushed the backport-28600w branch from 6ee1342 to c5f7616 Compare July 23, 2019 23:10
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@Trott
Copy link
Member Author

Trott commented Jul 24, 2019

OK, so tests are passing now and the build problems from the stringbytes tests are no longer a thing...but now there are leftover node processes after the tests and that's resulting in the suite be marked red....

@nodejs-github-bot
Copy link
Collaborator

@Trott Trott force-pushed the backport-28600w branch from c5f7616 to eecd8bd Compare July 24, 2019 04:35
@nodejs-github-bot
Copy link
Collaborator

@Trott
Copy link
Member Author

Trott commented Jul 24, 2019

Let's see if adding 3c9c89d from #28469 fixes it...

@Trott
Copy link
Member Author

Trott commented Jul 24, 2019

Hooray! That fixed it! (Will need to re-run because of a spurious failure on another platform, but AIX is green.)

@nodejs/releasers Is it OK to pile up three commits from three different PRs like this? I hope so...

@richardlau
Copy link
Member

Can you perhaps apply them in order? Or at least check against master because none of the pseudo-tty tests are skipped on AIX on master.

@nodejs-github-bot
Copy link
Collaborator

sam-github and others added 3 commits July 23, 2019 23:05
These tests seem to trigger failures in the entire CI job (not just the
test) on AIX. Skip them to see if that helps alleviate spurious failures
in node-test-commit-aix (and the upstream PR and commit test jobs).

See:
- nodejs/build#1820 (comment)
- nodejs/build#1847 (comment)

PR-URL: nodejs#28469
Backport-PR-URL: nodejs#28826
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Add SKIP status for more tests in stringbytes-external-exceed-max that
are failing on AIX.

PR-URL: nodejs#28516
Backport-PR-URL: nodejs#28826
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Some pty tests persistently hung on the AIX CI buildbots. Fix that by
adding a helper script that properly sets up the pty before spawning
the script under test.

On investigation I discovered that the test runner hung when it tried
to close the slave pty's file descriptor, probably due to a bug in
AIX's pty implementation. I could reproduce it with a short C program.
The test runner also leaked file descriptors to the child process.

I couldn't convince python's `subprocess.Popen()` to do what I wanted
it to do so I opted to move the logic to a helper script that can do
fork/setsid/etc. without having to worry about stomping on state in
tools/test.py.

In the process I also uncovered some bugs in the pty module of the
python distro that ships with macOS 10.14, leading me to reimplement
a sizable chunk of the functionality of that module.

And last but not least, of course there are differences between ptys
on different platforms and the helper script has to paper over that.
Of course.

Really, this commit took me longer to put together than I care to admit.

Caveat emptor: this commit takes the hacky ^D feeding to the slave out
of tools/test.py and puts it in the *.in input files. You can also feed
other control characters to tests, like ^C or ^Z, simply by inserting
them into the corresponding input file. I think that's nice.

Fixes: nodejs/build#1820
Fixes: nodejs#28489

PR-URL: nodejs#28600
Backport-PR-URL: nodejs#28826
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@Trott Trott force-pushed the backport-28600w branch from eecd8bd to dc4b64f Compare July 24, 2019 06:07
@Trott
Copy link
Member Author

Trott commented Jul 24, 2019

Can you perhaps apply them in order?

Rebased and applied in the order in which they landed.

@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot
Copy link
Collaborator

nodejs-github-bot commented Jul 24, 2019

Copy link
Contributor

@sam-github sam-github left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK to fast-track

Copy link
Member

@mhdawson mhdawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mhdawson
Copy link
Member

+1 to fast track

@Trott
Copy link
Member Author

Trott commented Jul 25, 2019

/ping @nodejs/backporters. AIX tests on v10.x-staging will be red until this lands.

@ZYSzys
Copy link
Member

ZYSzys commented Jul 25, 2019

+1 to fast track.

@ZYSzys ZYSzys added the fast-track PRs that do not need to wait for 48 hours to land. label Jul 25, 2019
BethGriggs pushed a commit that referenced this pull request Jul 25, 2019
These tests seem to trigger failures in the entire CI job (not just the
test) on AIX. Skip them to see if that helps alleviate spurious failures
in node-test-commit-aix (and the upstream PR and commit test jobs).

See:
- nodejs/build#1820 (comment)
- nodejs/build#1847 (comment)

PR-URL: #28469
Backport-PR-URL: #28826
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
BethGriggs pushed a commit that referenced this pull request Jul 25, 2019
Add SKIP status for more tests in stringbytes-external-exceed-max that
are failing on AIX.

PR-URL: #28516
Backport-PR-URL: #28826
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
BethGriggs pushed a commit that referenced this pull request Jul 25, 2019
Some pty tests persistently hung on the AIX CI buildbots. Fix that by
adding a helper script that properly sets up the pty before spawning
the script under test.

On investigation I discovered that the test runner hung when it tried
to close the slave pty's file descriptor, probably due to a bug in
AIX's pty implementation. I could reproduce it with a short C program.
The test runner also leaked file descriptors to the child process.

I couldn't convince python's `subprocess.Popen()` to do what I wanted
it to do so I opted to move the logic to a helper script that can do
fork/setsid/etc. without having to worry about stomping on state in
tools/test.py.

In the process I also uncovered some bugs in the pty module of the
python distro that ships with macOS 10.14, leading me to reimplement
a sizable chunk of the functionality of that module.

And last but not least, of course there are differences between ptys
on different platforms and the helper script has to paper over that.
Of course.

Really, this commit took me longer to put together than I care to admit.

Caveat emptor: this commit takes the hacky ^D feeding to the slave out
of tools/test.py and puts it in the *.in input files. You can also feed
other control characters to tests, like ^C or ^Z, simply by inserting
them into the corresponding input file. I think that's nice.

Fixes: nodejs/build#1820
Fixes: #28489

PR-URL: #28600
Backport-PR-URL: #28826
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@BethGriggs
Copy link
Member

Landed on v10.x-staging

@BethGriggs BethGriggs closed this Jul 25, 2019
@Trott Trott deleted the backport-28600w branch January 13, 2022 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fast-track PRs that do not need to wait for 48 hours to land. test Issues and PRs related to the tests. tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants