Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: investigate flaky - parallel/test-readline-interface #14674

Closed
refack opened this issue Aug 7, 2017 · 10 comments
Closed

test: investigate flaky - parallel/test-readline-interface #14674

refack opened this issue Aug 7, 2017 · 10 comments
Assignees
Labels
arm Issues and PRs related to the ARM platform. flaky-test Issues and PRs related to the tests with unstable failures on the CI. readline Issues and PRs related to the built-in readline module. test Issues and PRs related to the tests.

Comments

@refack
Copy link
Contributor

refack commented Aug 7, 2017

  • Version: master
  • Platform: pi1-raspbian-wheezy
  • Subsystem: readline
not ok 155 parallel/test-readline-interface
  ---
  duration_ms: 3.59
  severity: fail
  stack: |-
    assert.js:42
      throw new errors.AssertionError({
      ^
    
    AssertionError [ERR_ASSERTION]: 2 === 1
        at Timeout.common.mustCall (/home/iojs/build/workspace/node-test-binary-arm/test/parallel/test-readline-interface.js:349:14)
        at Timeout._onTimeout (/home/iojs/build/workspace/node-test-binary-arm/test/common/index.js:507:15)
        at ontimeout (timers.js:471:11)
        at tryOnTimeout (timers.js:305:5)
        at Timer.listOnTimeout (timers.js:265:5)
  ...

Refs: #13497
https://ci.nodejs.org/job/node-test-binary-arm/9466/RUN_SUBSET=5,label=pi1-raspbian-wheezy/
/cc @nodejs/platform-arm @addaleax @jasnell @Azard @princejwesley

@refack refack added arm Issues and PRs related to the ARM platform. flaky-test Issues and PRs related to the tests with unstable failures on the CI. readline Issues and PRs related to the built-in readline module. test Issues and PRs related to the tests. labels Aug 7, 2017
@addaleax
Copy link
Member

addaleax commented Aug 7, 2017

If somebody has the time, it might be helpful to insert a couple debug statements to check the order in which events happen and then kick of a stress test until this reproduces. Or maybe just increasing the timeouts/using platformTimeout might work?

@Azard
Copy link
Contributor

Azard commented Aug 7, 2017

I support use platformTimeout , 200ms and 500ms maybe too short for raspbian parallel test.

@Trott
Copy link
Member

Trott commented Aug 7, 2017

This is already at the top of my short list of things to deal with. daf5596 was a precursor for making it more modular. Definitely flaky under load when resources are constrained.

@Trott Trott self-assigned this Aug 7, 2017
@Trott
Copy link
Member

Trott commented Aug 7, 2017

It's flaky even if I eliminate all the timers etc. and just have a test that looks like this:

'use strict';
const common = require('../common');
const assert = require('assert');
const readline = require('readline');
const internalReadline = require('internal/readline');
const EventEmitter = require('events').EventEmitter;
const inherits = require('util').inherits;
const { Writable, Readable } = require('stream');

function FakeInput() {
  EventEmitter.call(this);
}
inherits(FakeInput, EventEmitter);
FakeInput.prototype.resume = () => {};
FakeInput.prototype.pause = () => {};
FakeInput.prototype.write = () => {};
FakeInput.prototype.end = () => {};

[false].forEach(function(terminal) {
  {
    const fi = new FakeInput();
    const rli = new readline.Interface(
      { input: fi, output: fi, terminal: terminal }
    );
    const expectedLines = ['foo', 'bar', 'baz', 'bat'];
    let callCount = 0;
    rli.on('line', function(line) {
      assert.strictEqual(line, expectedLines[callCount]);
      callCount++;
    });
    expectedLines.forEach(function(line) {
      fi.emit('data', `${line}\r`);
      fi.emit('data', '\n');
    });
    assert.strictEqual(callCount, expectedLines.length);
    rli.close();
  }
});

Running it with this:

tools/test.py -j 96 --repeat 96 test/parallel/test-readline-interface.js

… still shows failures like this:

AssertionError [ERR_ASSERTION]: '' === 'bar'

This means that line events are being emitted but with the data erroneously set to an empty string.

The question is: Is this a feature (under resource constraints, stuff starts getting dropped) or is it actually a bug in readline because the data should never be dropped?

@Trott
Copy link
Member

Trott commented Aug 7, 2017

The question is: Is this a feature (under resource constraints, stuff starts getting dropped) or is it actually a bug in readline because the data should never be dropped?

@nodejs/streams maybe?

@addaleax
Copy link
Member

addaleax commented Aug 7, 2017

@Trott I don’t think data is getting dropped, it’s that the delay between processing \r and \n is so long that readline thinks they are actually separate line endings.

I do think it’s problematic that this happens even for synchronously emitted input. Maybe we should be using process.binding('timer_wrap').Timer.now() instead of Date.now()?

@Trott
Copy link
Member

Trott commented Aug 8, 2017

@Trott I don’t think data is getting dropped, it’s that the delay between processing \r and \n is so long that readline thinks they are actually separate line endings.

😮 Thank you. That makes way more sense.

@Trott
Copy link
Member

Trott commented Aug 8, 2017

I do think it’s problematic that this happens even for synchronously emitted input. Maybe we should be using process.binding('timer_wrap').Timer.now() instead of Date.now()?

@addaleax Brilliant. That doesn't solve all the test's problems, but it solves most of them (and the primary remaining issue looks like it was solved by @Azard in another PR so we're really close...).

@addaleax
Copy link
Member

addaleax commented Aug 8, 2017

@Trott Did you try that out? If it works, great 👍 Feel free to do a PR yourself if you have the time :)

@Trott
Copy link
Member

Trott commented Aug 8, 2017

@Trott Did you try that out? If it works, great 👍 Feel free to do a PR yourself if you have the time :)

#14681

@Trott Trott mentioned this issue Aug 8, 2017
2 tasks
Trott pushed a commit to Trott/io.js that referenced this issue Aug 8, 2017
Previous unit test delay is too short for parallel test on raspberry pi,
it will fail sometimes.  This PR use common.platformTimeout and widen
the time gap.

PR-URL: nodejs#14677
Ref: nodejs#14674
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Trott added a commit to Trott/io.js that referenced this issue Aug 8, 2017
Using Date.now() introduces problems when operating under load or
otherwise with constrained resources. Use Timer.now() to mitigate.

The problem was identified in `test-readline-interface` where under
heavy load, `\r` and `\n` were received so far apart that they were
treated as separate line endings rather than a single line ending.
Switching to `Timer.now()` prevented this from happening.

Refs: nodejs#14674
Trott added a commit to Trott/io.js that referenced this issue Aug 8, 2017
Two test cases in `test-readline-interface` are sensitive to resource
constraints (probably due to `\r` and `\n` not arriving within the
appropriate delay to be treated as a single line ending). Move those
tests to `sequential`.

Fixes: nodejs#14674
Trott added a commit to Trott/io.js that referenced this issue Aug 10, 2017
Using Date.now() introduces problems when operating under load or
otherwise with constrained resources. Use Timer.now() to mitigate.

The problem was identified in `test-readline-interface` where under
heavy load, `\r` and `\n` were received so far apart that they were
treated as separate line endings rather than a single line ending.
Switching to `Timer.now()` prevented this from happening.

PR-URL: nodejs#14681
Refs: nodejs#14674
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
@Trott Trott closed this as completed in c7a9a2e Aug 10, 2017
addaleax pushed a commit that referenced this issue Aug 10, 2017
Previous unit test delay is too short for parallel test on raspberry pi,
it will fail sometimes.  This PR use common.platformTimeout and widen
the time gap.

PR-URL: #14677
Ref: #14674
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
addaleax pushed a commit that referenced this issue Aug 10, 2017
Using Date.now() introduces problems when operating under load or
otherwise with constrained resources. Use Timer.now() to mitigate.

The problem was identified in `test-readline-interface` where under
heavy load, `\r` and `\n` were received so far apart that they were
treated as separate line endings rather than a single line ending.
Switching to `Timer.now()` prevented this from happening.

PR-URL: #14681
Refs: #14674
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
addaleax pushed a commit that referenced this issue Aug 10, 2017
Two test cases in `test-readline-interface` are sensitive to resource
constraints (probably due to `\r` and `\n` not arriving within the
appropriate delay to be treated as a single line ending). Move those
tests to `sequential`.

PR-URL: #14681
Fixes: #14674
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
MylesBorins pushed a commit that referenced this issue Sep 19, 2017
Previous unit test delay is too short for parallel test on raspberry pi,
it will fail sometimes.  This PR use common.platformTimeout and widen
the time gap.

PR-URL: #14677
Ref: #14674
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
apapirovski added a commit to apapirovski/node that referenced this issue Feb 4, 2018
The readline module wants a truthy time while using Timer.now() doesn't
necessarily guarantee that early on in the process' life. It also
doesn't actually resolve the timing issues experienced in an earlier
issue. Instead, this PR fixes the related tests and moves them back
to parallel.

Refs: nodejs#14674
BridgeAR pushed a commit to BridgeAR/node that referenced this issue Feb 7, 2018
The readline module wants a truthy time while using Timer.now() doesn't
necessarily guarantee that early on in the process' life. It also
doesn't actually resolve the timing issues experienced in an earlier
issue. Instead, this PR fixes the related tests and moves them back
to parallel.

Refs: nodejs#14674

PR-URL: nodejs#18563
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
MylesBorins pushed a commit that referenced this issue Feb 21, 2018
The readline module wants a truthy time while using Timer.now() doesn't
necessarily guarantee that early on in the process' life. It also
doesn't actually resolve the timing issues experienced in an earlier
issue. Instead, this PR fixes the related tests and moves them back
to parallel.

Refs: #14674

PR-URL: #18563
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
MylesBorins pushed a commit that referenced this issue Feb 21, 2018
The readline module wants a truthy time while using Timer.now() doesn't
necessarily guarantee that early on in the process' life. It also
doesn't actually resolve the timing issues experienced in an earlier
issue. Instead, this PR fixes the related tests and moves them back
to parallel.

Refs: #14674

PR-URL: #18563
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
MylesBorins pushed a commit that referenced this issue Feb 21, 2018
The readline module wants a truthy time while using Timer.now() doesn't
necessarily guarantee that early on in the process' life. It also
doesn't actually resolve the timing issues experienced in an earlier
issue. Instead, this PR fixes the related tests and moves them back
to parallel.

Refs: #14674

PR-URL: #18563
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
gibfahn pushed a commit that referenced this issue Apr 13, 2018
The readline module wants a truthy time while using Timer.now() doesn't
necessarily guarantee that early on in the process' life. It also
doesn't actually resolve the timing issues experienced in an earlier
issue. Instead, this PR fixes the related tests and moves them back
to parallel.

Refs: #14674

PR-URL: #18563
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
MayaLekova pushed a commit to MayaLekova/node that referenced this issue May 8, 2018
The readline module wants a truthy time while using Timer.now() doesn't
necessarily guarantee that early on in the process' life. It also
doesn't actually resolve the timing issues experienced in an earlier
issue. Instead, this PR fixes the related tests and moves them back
to parallel.

Refs: nodejs#14674

PR-URL: nodejs#18563
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arm Issues and PRs related to the ARM platform. flaky-test Issues and PRs related to the tests with unstable failures on the CI. readline Issues and PRs related to the built-in readline module. test Issues and PRs related to the tests.
Projects
None yet
Development

No branches or pull requests

4 participants