This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

AIX: ENOTCONN error thrown in spawnSync #9444

Closed
gireeshpunathil opened this issue Mar 20, 2015 · 10 comments

@gireeshpunathil
Member

On AIX we were seeing failures in the Node test:

simple/test-child-process-execsync.js

/bin/sh: iamabadcommand: not found.
child_process.js:1382
throw err;
^
Error: spawnSync ENOTCONN
at exports._errnoException (util.js:746:11)
at spawnSync (child_process.js:1321:20)
at execSync (child_process.js:1373:13)
....

This is also reproducible on OS X when the system is under heavy
load. The root cause is a race condition between the parent and the
child: a short-lived child (such as the bad command used in the test
case) closes its end of the pipe before the parent calls shutdown()
on its end of the pipe.
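
For illustration, here is a minimal sketch of the scenario the test exercises: running a command that does not exist through execSync and inspecting how the call fails. The helper name runBadCommand is hypothetical and not part of the test. The assumption is that on an unaffected build the thrown error carries the shell's non-zero exit status, whereas in the failure above it would carry the code ENOTCONN instead.

```javascript
const { execSync } = require('child_process');

// Hypothetical helper: run a command that does not exist and report
// how execSync failed. 'iamabadcommand' is the name from the test output.
function runBadCommand() {
  try {
    // stdio: 'pipe' keeps the shell's "not found" message out of our stderr.
    execSync('iamabadcommand', { stdio: 'pipe' });
    return { threw: false };
  } catch (err) {
    // err.status is the exit status of the shell that ran the command.
    // err.code is only set when the spawn itself failed (for example
    // 'ENOTCONN' in the failure reported here).
    return { threw: true, status: err.status, code: err.code };
  }
}
```

Because the race depends on how quickly the child exits relative to the parent, a single call is unlikely to trigger it; the reports here involve a system under heavy load.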

There is a pull request in io.js for the same problem:

nodejs/node#1214

Could this change be pulled into Node.js as well?
Thanks.

@mhdawson mhdawson added this to the 0.12.2 milestone Mar 20, 2015
@mhdawson mhdawson added the P-3 label Mar 20, 2015
@misterdjules

nodejs/node#1214 LGTM, thank you @gireeshpunathil!

@mhdawson mhdawson self-assigned this Mar 27, 2015
@mhdawson
Member

I'll create the pull request to move it across.

@misterdjules misterdjules modified the milestones: 0.12.3, 0.12.2 Apr 1, 2015
@mhdawson
Member

mhdawson commented Apr 6, 2015

Created pull request #14480

@mhdawson
Member

mhdawson commented Apr 6, 2015

I tried pulling the change in, but I saw the error below (on osx64), along with a failure on Linux which I could not reproduce in my local build with the change.

Gireesh, could you see if you can recreate this on OS X with and without the change? Since it is a test in the same set as the one we were trying to fix (test-child-process-xxx), it may be that the change has introduced an issue for this test.

  duration_ms: 0.604
  ...
not ok 33 - test-child-process-fork-getconnections.js
#/Development/jenkins/workspace/node-test-commit-unix/ea9f5537/test/simple/test-child-process-fork-getconnections.js:39
#          throw new Error('[c] closing by accident!');
#                ^
#Error: [c] closing by accident!
#    at Socket.<anonymous> (/Development/jenkins/workspace/node-test-commit-unix/ea9f5537/test/simple/test-child-process-fork-getconnections.js:39:17)
#    at Socket.emit (events.js:129:20)
#    at _stream_readable.js:908:16
#    at process._tickCallback (node.js:355:11)
#/Development/jenkins/workspace/node-test-commit-unix/ea9f5537/test/simple/test-child-process-fork-getconnections.js:57
#      throw new Error('child died unexpectedly!');
#            ^
#Error: child died unexpectedly!
#    at ChildProcess.<anonymous> (/Development/jenkins/workspace/node-test-commit-unix/ea9f5537/test/simple/test-child-process-fork-getconnections.js:57:13)
#    at ChildProcess.emit (events.js:110:17)
#    at Process.ChildProcess._handle.onexit (child_process.js:1074:12)
  ---
  duration_ms: 0.507
  ...

@gireeshpunathil
Member Author

@mdawsonibm, I am able to reproduce the reported error on OS X (but not on Linux), both WITH and WITHOUT this change.

The precondition for this failure is heavy load on the system.

In the test, 12 clients connect to a server. The sockets are passed to a forked child, and each socket is then closed through message passing between the parent and the child. The child registers a callback on each socket's 'end' event, which should never be called because the sockets are destroyed directly. For an unknown reason, this 'end' callback does fire, which causes the failure.

In short, this issue is pre-existing and not related to the proposed change.
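
The expectation described above can be sketched in a single process, without fork or socket passing: a server-side socket that is destroyed locally is expected to emit 'close' but not 'end'. This is a simplified illustration, not the test itself; the helper name observeDestroy and the use of a loopback listener on an ephemeral port are assumptions made for the example.

```javascript
const net = require('net');

// Hypothetical helper: destroy a server-side socket right after it is
// accepted and record which of 'end' / 'close' it emits.
function observeDestroy() {
  return new Promise((resolve, reject) => {
    const seen = [];
    const server = net.createServer((socket) => {
      socket.on('end', () => seen.push('end'));
      socket.on('error', () => {});
      socket.on('close', () => {
        seen.push('close');
        // Stop listening; the callback runs once no connections remain.
        server.close(() => resolve(seen));
      });
      // Destroy directly, as the test does, instead of calling end().
      socket.destroy();
    });
    server.on('error', reject);
    server.listen(0, '127.0.0.1', () => {
      const client = net.connect(server.address().port, '127.0.0.1');
      // The peer may see a reset; ignore it so the sketch does not crash.
      client.on('error', () => {});
      // Keep the client flowing so it notices the peer going away.
      client.resume();
    });
  });
}
```

Calling observeDestroy().then(console.log) prints the events recorded for the destroyed socket. The flaky test fails precisely when 'end' shows up where only 'close' is expected.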

@mhdawson
Member

OK, thanks. I will try to pull it in again.

mhdawson pushed a commit that referenced this issue Apr 13, 2015
This is a backport of ea37ac0

Original commit message:

  On AIX, OS X and the BSDs, calling shutdown() on one end of a pipe
  when the other end has closed the connection fails with ENOTCONN.

  The sequential/test-child-process-execsync test failed sporadically
  because of a race between the parent and the child where one closed
  its end of the pipe before the other got around to calling shutdown()
  on its end of the pipe.

  Libuv is not the right place to handle that because it can't tell if
  the ENOTCONN error is genuine but io.js can.

  Refs: libuv/libuv#268
  PR-URL: iojs#1214
  Reviewed-By: Bert Belder <bertbelder@gmail.com>

Fixes: #9444.

Reviewed-By: Julien Gilli <julien.gilli@joyent.com>
PR-URL: #14480
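
The idea in the commit message, that the caller rather than libuv decides whether ENOTCONN from shutdown() is a real failure, can be illustrated with a small sketch. This is not the code from the commit; the function name isFatalShutdownError and the use of string error codes are assumptions made purely for the example.

```javascript
// Hypothetical helper: decide whether an error code reported by shutdown()
// on a child's stdio pipe should be surfaced to the caller.
function isFatalShutdownError(code) {
  // No error code means shutdown() succeeded.
  if (code === undefined || code === null) {
    return false;
  }
  // On AIX, OS X and the BSDs, ENOTCONN here means the child already
  // closed its end of the pipe, so there is nothing left to shut down.
  if (code === 'ENOTCONN') {
    return false;
  }
  // Any other error is treated as genuine.
  return true;
}
```

With this rule, a short-lived child that exits before the parent calls shutdown() no longer turns a successful run into an error, while other failures are still reported.
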
@mhdawson
Member

Pull request d5b3224

Landed as d5b3224

@misterdjules

@mdawsonibm @gireeshpunathil Would you mind creating an issue about test-child-process-fork-getconnections.js being flaky on OS X, using your detailed explanation of the problem as the issue description?

Also see related issue in io.js: nodejs/node#1100.

@gireeshpunathil
Member Author

@misterdjules, done: #16805

@misterdjules

@gireeshpunathil Thank you!
