node.js cluster, workers stdouts overlay each other on some hosts #12724

Closed
TheRoSS opened this issue Apr 28, 2017 · 9 comments

@TheRoSS

TheRoSS commented Apr 28, 2017

Hi. I've encountered very strange behaviour while redirecting workers' stdout to a file.
On hosts host-1-ok and host-2-ok my code always works as expected, but on host-3-fail the stdout streams from the workers overlay each other.

I cannot understand why, or what I can do to fix this problem on host-3-fail. Even if my issue is not a core bug, please help me understand what is going on and how to solve it.

The problem details are here:
http://stackoverflow.com/questions/43663065/node-js-cluster-redirecting-the-childs-stdout-to-file-breaks-data

@mscdex
Contributor

mscdex commented Apr 28, 2017

FWIW v0.10.x and v0.12.x are no longer supported.

@cjihrig
Contributor

cjihrig commented Apr 28, 2017

If you're using Node 0.10.47, 0.10.48, or 0.12.7, you're going to need to update. Those release lines are EOL.

@TheRoSS
Author

TheRoSS commented Apr 28, 2017

I just tried v6.3.0 and v7.9.0; nothing changed.

But the main question is why the code always works well on host1 and host2 but always fails on host3.

@cjihrig
Contributor

cjihrig commented Apr 28, 2017

So you have multiple cluster worker processes reading files at the same time and sending the output to the same place? I think it's expected that things could be interleaved.

@TheRoSS
Author

TheRoSS commented Apr 28, 2017

If it were a multithreaded C++ program, I would expect interleaving, because it has no locking out of the box. But from the node documentation I see that cluster workers send their output to the parent, and the parent synchronously writes it to a file or the console (the page describing the 'process' API).

So I expect no interleaving in node.
Even more: why is there no interleaving on host1 and host2 at all?

If the absence of interleaving on host1 and host2 is just an unexpected side effect, then please give me some hints on how to do my task the correct way with node.

@sam-github
Contributor

sam-github commented Apr 28, 2017

"But from the node documentation I see that cluster workers send their output to parent"

They do not do that. The reading and writing your cluster workers are doing happens in completely independent threads (EDIT: I mean processes, which are more independent than threads); there is no synchronization. What docs did you see that make you think cluster workers do I/O through their parent?

var cluster = require("cluster");
var fs = require("fs");

var workerId = cluster.worker && cluster.worker.id || 1;
var stream = fs.createReadStream("./data" + workerId + ".log");

stream.on("end", function () {
    process.exit();
});

stream.pipe(process.stdout);

This cluster code is the equivalent of cat data1.log >> data.log & cat data2.log >> data.log&.

The code has several problems:

  • you assume that lines of output will equal lines of input, but you aren't reading atomically by line or writing atomically by line, so this assumption is wrong (the reads and writes will be in terms of arbitrarily sized buffers, with no likelihood of falling on line boundaries)
  • you exit after reading all data, but before writing all data, as @mscdex pointed out on Stack Overflow
  • the output ordering is indeterminate: it depends on I/O scheduling and process scheduling, neither of which is guaranteed, and it can vary seemingly randomly based on kernel version, memory, other processing, etc. I don't think it's a coincidence that the two hosts with the same kernel version schedule more consistently with each other than the host with a different kernel does
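The first two problems listed above can be illustrated with a minimal sketch. It assumes a worker that buffers incoming chunks, writes only whole lines, and disconnects instead of calling process.exit(). The helper name createLineSplitter is hypothetical and not part of Node.js; the data file naming follows the snippet quoted above.

```javascript
const cluster = require("cluster");
const fs = require("fs");
const { StringDecoder } = require("string_decoder");

// Hypothetical helper: collects arbitrary chunks and calls onLine
// once per complete line, so every write carries exactly one line.
function createLineSplitter(onLine) {
    const decoder = new StringDecoder("utf8");
    let pending = "";
    return {
        push(chunk) {
            pending += typeof chunk === "string" ? chunk : decoder.write(chunk);
            const parts = pending.split("\n");
            pending = parts.pop(); // keep the trailing partial line for later
            parts.forEach((line) => onLine(line + "\n"));
        },
        flush() {
            pending += decoder.end();
            if (pending.length > 0) {
                onLine(pending + "\n");
                pending = "";
            }
        }
    };
}

if (cluster.isWorker) {
    const splitter = createLineSplitter((line) => process.stdout.write(line));
    const input = fs.createReadStream("./data" + cluster.worker.id + ".log");
    input.on("data", (chunk) => splitter.push(chunk));
    input.on("end", () => {
        splitter.flush();
        // No process.exit(): disconnect and let the process end on its own,
        // so pending writes are not cut short.
        cluster.worker.disconnect();
    });
}
```

This only ensures that each write call carries a whole line. Whether a single write lands contiguously in a shared file still depends on how the file was opened and on the operating system, and the ordering between workers remains indeterminate.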

@Fishrock123
Contributor

You should output to separate files.
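A minimal sketch of that suggestion, assuming each worker writes to its own file instead of the shared stdout. The output naming scheme (out1.log, out2.log, ...) and the helper outputPathFor are hypothetical; the input naming follows the snippet quoted earlier in the thread.

```javascript
const cluster = require("cluster");
const fs = require("fs");

// Hypothetical naming scheme: one output file per worker id.
function outputPathFor(workerId) {
    return "./out" + workerId + ".log";
}

if (cluster.isWorker) {
    const id = cluster.worker.id;
    const input = fs.createReadStream("./data" + id + ".log");
    const output = fs.createWriteStream(outputPathFor(id));

    // 'finish' fires once pipe() has ended the output stream and all
    // data has been flushed, so it is safe to disconnect here.
    output.on("finish", () => cluster.worker.disconnect());
    input.on("error", (err) => {
        console.error(err.message);
        process.exitCode = 1;
        output.end();
    });
    input.pipe(output);
}
```

With one file per worker there is nothing to interleave; the files can be concatenated afterwards if a single log is needed.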

@TheRoSS
Author

TheRoSS commented May 3, 2017

  • I simplified the example to make it as obvious as possible. My program reads and writes lines atomically, but this does not affect the result, so I omitted the excess code
  • as mentioned in https://nodejs.org/api/process.html#process_process_stdout "process.stdout and process.stderr differ from other Node.js streams in important ways: ... Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or Unix: Files: synchronous on Windows and Linux". So I think all data will be written to the file before process.exit. But you are right, it would be better to use disconnect instead of process.exit
  • I also think it's not a coincidence. I want to understand what affects it, and what I can tune to make it work on Ubuntu 14.04. Is there a right way to code stdout redirection of child processes?
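Since the quoted documentation says the behaviour depends on what the stream is connected to, it may help to check what a worker's stdout actually is on each host. The following is a rough sketch using fs.fstatSync; the helper name describeFd is hypothetical, and results on Windows may differ.

```javascript
const fs = require("fs");

// Hypothetical helper: reports what kind of object a file descriptor
// refers to, based on the fs.Stats type checks.
function describeFd(fd) {
    const stat = fs.fstatSync(fd);
    if (stat.isFile()) return "file";
    if (stat.isFIFO()) return "pipe";
    if (stat.isSocket()) return "socket";
    if (stat.isCharacterDevice()) return "character device";
    return "other";
}
```

Calling describeFd(1) inside a worker and logging the result to stderr shows whether stdout is the redirected file itself or a pipe to the master, which is the distinction the documentation draws.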

@TheRoSS
Author

TheRoSS commented May 17, 2017

I found a solution.
To fix the problem, one must explicitly set piping mode in the master code:

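var cluster = require("cluster");

cluster.setupMaster({
    exec: "./worker.js",
    silent: true    // changed: workers get piped stdio instead of inheriting the master's stdout
});

for (var i = 0; i < 2; i++) {
    var worker = cluster.fork();
    // changed: forward each worker's output through the master
    worker.process.stdout.pipe(process.stdout);
    worker.process.stderr.pipe(process.stderr);
}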

I think the reason the same code behaves differently on different versions of Ubuntu lies in the Linux kernel's file handle buffer implementation.
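A possible variation on the master code above, offered as a sketch rather than a confirmed fix: plain pipe() forwards chunks as they arrive, so a line can still be split across chunks from different workers. Using the standard readline module, the master can forward only whole lines. The function names forwardLines and startMaster are hypothetical.

```javascript
const cluster = require("cluster");
const readline = require("readline");

// Hypothetical helper: re-emit a stream to sink one complete line at a time.
function forwardLines(source, sink) {
    const rl = readline.createInterface({ input: source });
    rl.on("line", (line) => sink.write(line + "\n"));
    return rl;
}

// Assumes a ./worker.js script, as in the master code above.
function startMaster(workerCount) {
    cluster.setupMaster({ exec: "./worker.js", silent: true });
    for (let i = 0; i < workerCount; i++) {
        const worker = cluster.fork();
        forwardLines(worker.process.stdout, process.stdout);
        forwardLines(worker.process.stderr, process.stderr);
    }
}
```

Calling startMaster(2) would mirror the loop above. Because all writes then come from the single master process, lines from different workers may still alternate, but each line stays intact.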
