AIX - failure in parallel/test-cluster-disconnect-handles #7563

Closed
mhdawson opened this issue Jul 6, 2016 · 12 comments
Labels
cluster Issues and PRs related to the cluster subsystem. test Issues and PRs related to the tests.


@mhdawson
Member

mhdawson commented Jul 6, 2016

  • Version: master
  • Platform: AIX
  • Subsystem: cluster

https://ci.nodejs.org/job/node-test-commit-aix/247/nodes=aix61-ppc64/console

not ok 104 parallel/test-cluster-disconnect-handles
# Debugger listening on [::1]:12346
# 
# /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-cluster-disconnect-handles.js:86
#     throw ex;
#     ^
# AssertionError: worker did not exit normally
#     at Worker.worker.once.common.mustCall (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-cluster-disconnect-handles.js:32:12)
#     at Worker.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common.js:407:15)
#     at Worker.g (events.js:286:16)
#     at emitTwo (events.js:111:20)
#     at Worker.emit (events.js:191:7)
#     at ChildProcess.<anonymous> (cluster.js:380:14)
#     at ChildProcess.g (events.js:286:16)
#     at emitTwo (events.js:106:13)
#     at ChildProcess.emit (events.js:191:7)
#     at Process.ChildProcess._handle.onexit (internal/child_process.js:204:12)
  ---
  duration_ms: 0.876
  ...
@mhdawson mhdawson added the cluster Issues and PRs related to the cluster subsystem. label Jul 6, 2016
@mscdex mscdex added the test Issues and PRs related to the tests. label Jul 6, 2016
@gireeshpunathil
Member

  1. This is not related to the malloc(0) issue.
  2. In the failing case, the worker exit callback in the master receives null and SIGTERM as the worker's exit code and signal, as opposed to 0 and null in the passing case.
  3. If I change the IPv6 address to IPv4 for the worker-master communication, the issue is not seen.

Will debug further with hints 2 and 3 as my starting points.
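A minimal sketch of how hint 2 can be read: Node passes a (code, signal) pair to a cluster worker's 'exit' listener, and when the worker is terminated by a signal, code is null and signal holds the signal name. The helper name describeWorkerExit is hypothetical and is not part of the test.

```javascript
'use strict';

// Hypothetical helper (not part of test-cluster-disconnect-handles.js):
// turn the (code, signal) pair from a cluster worker's 'exit' event into
// a readable description. When a signal terminated the worker, Node
// reports code === null and signal set to the signal name.
function describeWorkerExit(code, signal) {
  if (signal) {
    return `killed by ${signal}`;
  }
  return code === 0 ? 'exited normally' : `exited with code ${code}`;
}

// Example wiring in the master (sketch):
//   worker.once('exit', (code, signal) => {
//     console.log(describeWorkerExit(code, signal));
//   });
```

With the values from the failing run, describeWorkerExit(null, 'SIGTERM') returns 'killed by SIGTERM', while the passing run's (0, null) yields 'exited normally'.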

@gireeshpunathil
Member

In the truss output, I noticed that the master fails to connect to the TCP server set up in the worker:
4063854: 16908951: connext(13, 0x0FFFFFFFFFFFB690, 28) Err#79 ECONNREFUSED
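truss prints raw errno values, and these differ between platforms (79 is ECONNREFUSED on AIX, as the line above shows, but not on Linux). A small sketch, assuming Node's os.constants.errno table, that maps a raw number back to its symbolic name on the machine where it runs; errnoName is a hypothetical helper.

```javascript
'use strict';
const os = require('os');

// Hypothetical helper: map a raw errno value (as printed by truss/strace)
// to its symbolic name using the current platform's errno table.
// Returns undefined if the number is unknown on this platform.
// Some platforms alias two names to one value (e.g. EAGAIN and
// EWOULDBLOCK); in that case the first matching name is returned.
function errnoName(num) {
  const table = os.constants.errno;
  return Object.keys(table).find((name) => table[name] === num);
}
```

On the AIX machine from this issue, errnoName(79) would be expected to return 'ECONNREFUSED'; on other platforms the same number may map to a different name or to none.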

@gireeshpunathil
Member

It turns out that the connect failure is not with the TCP server, but with the debugger.

I rethrew the error that arrived in the process's uncaught exception handler, leaving the worker alive:

bash-4.3$ ./node --expose_internals test/parallel/test-cluster-disconnect-handles.js
Debugger listening on [::1]:12346
/home/gireesh/aix/2016/july/node/test/parallel/test-cluster-disconnect-handles.js:76
throw ex;
^

Error: connect ECONNREFUSED :::12346
at Object.exports._errnoException (util.js:1007:11)
at exports._exceptionWithHostPort (util.js:1030:20)
at TCPConnectWrap.afterConnect [as oncomplete]

Check the worker status:

bash-4.3$ netstat -na | grep 12346
tcp6 0 0 ::1.12346 *.* LISTEN
bash-4.3$

Clearly, the master is not connecting to the address where the debug server (in the worker) is listening:

Debugger listening on [::1]:12346 (IPv6 loopback address)
Master connecting to => :::12346 (i.e. '::', the IPv6 unspecified address, on port 12346)
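The odd-looking :::12346 comes from how the connect error is built in the Node version used here: the address and port are joined with a plain ':', so '::' plus port 12346 prints as :::12346, whereas the debugger banner uses the bracketed [::1]:12346 form. A sketch of an unambiguous formatter built on net.isIPv6; formatEndpoint is a hypothetical helper.

```javascript
'use strict';
const net = require('net');

// Hypothetical helper: format an address/port pair unambiguously.
// IPv6 literals are wrapped in brackets (as in the debugger banner
// "[::1]:12346"); IPv4 addresses and host names are left as-is.
function formatEndpoint(address, port) {
  return net.isIPv6(address) ? `[${address}]:${port}` : `${address}:${port}`;
}
```

formatEndpoint('::', 12346) gives '[::]:12346' and formatEndpoint('::1', 12346) gives '[::1]:12346', which makes the mismatch between the two endpoints visible at a glance.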

@gireeshpunathil
Member

A simplified client-server test case passes on other platforms (such as OS X) but fails on AIX with:
events.js:160
throw er; // Unhandled 'error' event
^

Error: connect ECONNREFUSED :::45200
at Object.exports._errnoException (util.js:1007:11)
at exports._exceptionWithHostPort (util.js:1030:20)
at TCPConnectWrap.afterConnect [as oncomplete]
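
// s6.js: HTTP server bound only to the IPv6 loopback address ('::1')
var http = require('http');
var server = http.createServer(function(req, res) {
  res.end();
});
server.listen(45200, '::1', function() {
});

// c6.js: HTTP client connecting to the IPv6 unspecified address ('::')
var http = require('http');
var options = { port: 45200, host: '::', family: 6 };
// No 'error' listener, so a failed connect throws as shown above.
var req = http.get(options, function(res) {
  res.resume(); // drain the (empty) response body
});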
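A single-process sketch of s6.js and c6.js that makes the listen/connect address pair a parameter. It uses plain net sockets instead of http (a simplification of the original files), and probe is a hypothetical helper: it listens on listenHost on an ephemeral port, connects to connectHost on the same port, and resolves with 'connected' or with the failed connect's error code.

```javascript
'use strict';
const net = require('net');

// Hypothetical helper: listen on `listenHost`, then try `connectHost`.
// Resolves with 'connected' on success, or with the client's error code
// (e.g. 'ECONNREFUSED'). Rejects only if the server cannot bind at all
// (e.g. IPv6 unavailable when listenHost is '::1').
function probe(listenHost, connectHost) {
  return new Promise((resolve, reject) => {
    // Each accepted connection is closed immediately from the server side.
    const server = net.createServer((sock) => sock.end());
    server.once('error', reject);
    server.listen(0, listenHost, () => {
      const { port } = server.address();
      const client = net.connect({ host: connectHost, port });
      client.once('connect', () => {
        client.end();
        server.close(() => resolve('connected'));
      });
      // 'on' rather than 'once': a late reset after connect must not crash.
      client.on('error', (err) => {
        server.close(() => resolve(err.code));
      });
    });
  });
}
```

Going by this thread, probe('::1', '::') would be expected to resolve to 'ECONNREFUSED' on AIX (Windows reportedly fails with EADDRNOTAVAIL instead), while probe('::1', '::1') should connect on any host with IPv6 loopback support.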

@gireeshpunathil
Member

This patch resolves the issue, but I am digging deeper to get better insight:


--- a/test/parallel/test-cluster-disconnect-handles.js
+++ b/test/parallel/test-cluster-disconnect-handles.js
@@ -92,7 +92,7 @@ if (cluster.isMaster) {
     debugger;
   };
   if (common.hasIPv6)
-    server.listen(cb);
+    server.listen(0, '::1', cb);
   else
     server.listen(0, common.localhostIPv4, cb);
   process.on('disconnect', process.exit);
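The patch above has the server listen on an explicit loopback address instead of relying on the default bind. A standalone sketch of the same choice outside Node's test harness: hasIPv6 here is a hypothetical approximation of common.hasIPv6 (it only checks os.networkInterfaces() for an IPv6 entry), not the harness's actual implementation, and '127.0.0.1' stands in for common.localhostIPv4.

```javascript
'use strict';
const os = require('os');

// Hypothetical approximation of the test harness's common.hasIPv6:
// true if any network interface reports an IPv6 address. Most Node
// versions report family as the string 'IPv6'; a few Node 18 releases
// reported the number 6, so both are accepted.
function hasIPv6() {
  const ifaces = os.networkInterfaces();
  return Object.keys(ifaces).some((name) =>
    ifaces[name].some((info) => info.family === 'IPv6' || info.family === 6));
}

// Mirror the patched test: listen on the loopback address of the
// available family, and have clients connect to that same address.
function loopbackHost() {
  return hasIPv6() ? '::1' : '127.0.0.1';
}

// Usage (sketch):
//   const host = loopbackHost();
//   server.listen(0, host, () => {
//     const { port } = server.address();
//     net.connect({ host, port });
//   });
```

The design point is that server and client share one concrete address, so the test no longer depends on whether a platform treats '::' and '::1' as interchangeable.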

@gireeshpunathil
Member

https://tools.ietf.org/html/rfc4291#section-2.5.2

This section states that the unspecified address ('::') must never be assigned to any node and must not be used as the destination address of IPv6 packets. However, a server socket created with default options binds to the unspecified address rather than to the loopback address (::1):

> var k = require('net').createServer().listen();
undefined
> k.address();
{ address: '::', family: 'IPv6', port: 37986 }
> 

thoughts?
/cc @mhdawson @bnoordhuis
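The REPL session above can be turned into a small check. Node's net documentation says that listen() without a host binds to the unspecified IPv6 address ('::') when IPv6 is available and to '0.0.0.0' otherwise, while an explicit host is used as given. boundAddress is a hypothetical helper.

```javascript
'use strict';
const net = require('net');

// Hypothetical helper: report the address a server is bound to when it
// listens on an ephemeral port, with or without an explicit host.
function boundAddress(host) {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', reject);
    const onListening = () => {
      const { address } = server.address();
      server.close(() => resolve(address));
    };
    if (host === undefined) {
      server.listen(0, onListening);
    } else {
      server.listen(0, host, onListening);
    }
  });
}
```

For example, boundAddress() is expected to resolve to '::' on an IPv6-capable host (as in the REPL output), while boundAddress('::1') resolves to '::1', the address a local client should then connect to.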

@gibfahn
Member

gibfahn commented Jul 8, 2016

@gireeshpunathil Might be related to #7288 if server.listen is defaulting to localhost, which might not be defined as ::1 on the boxes you are testing on.

@gireeshpunathil
Member

Thanks @gibfahn, will check that.

@gireeshpunathil
Member

@gibfahn, I checked what you suggested, but it looks like that is not the cause:

In the case you mentioned, the issue is resolving the loopback host name to '::1', which requires that mapping to be present in /etc/hosts.

In this case, the issue is lack of interchangeability between '::' and '::1'

I could find no documentation that supports using them interchangeably, yet most platforms (AIX excepted) appear to allow it. Specifically, a server socket bound to '::1' accepts connections from clients that connect to '::', and vice versa.

This does not happen on AIX. A server socket bound to '::' is reachable only by a client that connects to '::', and likewise for '::1': AIX apparently treats them as distinct addresses.

My inference is that AIX strictly follows the IPv6 specification with respect to the unspecified address ('::') and the loopback address ('::1'), while the test case relies on the behavior exhibited by other platforms, and hence it fails on AIX.

My proposal is to fix it in the test case to make it work in all platforms including AIX. Will wait for comments before raising a PR. Thanks.

@gibfahn
Member

gibfahn commented Jul 12, 2016

@gireeshpunathil Sorry, wrong PR! Should have been #7291

Looks like the same problem on Windows

@gireeshpunathil
Member

Thanks @gibfahn, that is a perfect match. Can you please run the sample test case from my earlier comment on Windows and see whether it fails as well? My laptop is a new Mac and I have yet to figure out how to connect to a remote Windows test machine. Thanks in advance.

@gireeshpunathil
Member

Checked on Windows: connecting to the unspecified address ('::') does not work there either; the client throws EADDRNOTAVAIL.

Working on a PR

evanlucas pushed a commit that referenced this issue Jul 19, 2016
The test case fails in AIX due to the mixed-use of unspecified
and loopback addresses. This is not a problem in most platforms
but fails in AIX. (In Windows too, but does not manifest as the
test is omitted in Windows for a different reason).

There exists no documented evidence which supports the mixed use
of unspecified and loopback addresses.

While AIX strictly follows the IPV6 specification with respect to
unspecified address ('::') and loopback address ('::1'), the test
case latches on to the behavior exhibited by other platforms,
and hence it fails in AIX.

The proposed fix is to make it work in all platforms including
AIX by using the loopback address for the client to connect,
as that is the address at which the server listens.

Fixes: #7563
PR-URL: #7702
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
evanlucas pushed a commit that referenced this issue Jul 20, 2016
MylesBorins pushed a commit that referenced this issue Sep 30, 2016
rvagg pushed a commit that referenced this issue Oct 18, 2016
MylesBorins pushed a commit that referenced this issue Oct 26, 2016