regression: data truncation with stream.writev in IPC channel #24992
Do you know the root cause of the ENOBUFS error?
@addaleax - libuv does not throw ENOBUFS for writes AFAIK (as it already has the write buffers with it), only for reads. This is where it is thrown: Lines 106 to 107 in a1b283c
Here is the sequence of events that led to this scenario IIUC.
IMO, either
OK, I am able to confirm the above theory: if I comment out the fast-path block in _stream_writable (Lines 504 to 519 in a3801e9), then I get the full data. So the inference and the fix path remain intact: what we need is the ability to manage chunks whose cumulative byte count exceeds INT_MAX.
Good analysis @gireeshpunathil! I stumbled upon some bad EAGAIN behavior recently in a userland module that I maintain (pinojs/sonic-boom#17).
Backpressure/flow control should be kicking in at this point, and well-written applications should limit memory usage, shouldn't they? (There are a lot of cases where it's not possible to apply backpressure correctly, such as with logging.)
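For reference, a minimal sketch of what respecting backpressure on the writer side looks like (the destination file and the sizes here are arbitrary examples, not taken from the reports in this thread):

const { createWriteStream } = require('fs');

// Honour write()'s return value: false means the internal buffer has passed
// highWaterMark, so wait for 'drain' instead of buffering without bound.
async function writeAll(stream, chunk, times) {
  for (let i = 0; i < times; i++) {
    if (!stream.write(chunk))
      await new Promise((resolve) => stream.once('drain', resolve));
  }
  stream.end();
}

// e.g. write 1GB in 1KB chunks without letting memory grow unboundedly
writeAll(createWriteStream('/tmp/out.txt'), 'x'.repeat(1024), 1024 * 1024);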
We are not in agreement. I think this is a problem related to how I propose:
@gireeshpunathil Thanks, this makes sense! I think I agree with you about our options, but just to make sure we’re in agreement: We would either
Right? @mcollina I think you’re talking about a different kind of issue; this isn’t really related to stdio specifically, or premature exits?
@mcollina The problem is that, when we provide
@mcollina I have libuv/libuv#390 on my backlog … once that is implemented, I think a clean solution would be to move stdio streams to a new, separate loop when Node exits, and to spin that loop until it exits just for stdio?
I'm -1 to this approach, as we will keep accepting data from a misbehaving source. The source should slow down, otherwise memory usage will grow without control. At some point, the process will crash due to out of memory.
Then we will still error/lose data, just at a different point. It will not solve the issue.
I think it is, check the example - it's using console.log.
Agree. We just need to invoke the libuv writers multiple times instead of once, slicing the data appropriately. The POLLOUT watchers that libuv inserts take care of draining those at the appropriate times? If the data grows indefinitely and leads to OOM at that stage, that is fine.
Agree, but dispatch the rest of the data in subsequent call(s)? In other words, call the native Writev only with as much data as it can handle.
An approach that I was trying is here: gireeshpunathil/libuv@e6e3702
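To illustrate the slicing idea itself in plain JS (a sketch only; the threshold constant and function name are made up for illustration, and this is not the code from the commit above):

// Group buffered chunks into batches whose cumulative size stays under a cap,
// so each batch can be handed to a separate vectored write; libuv's POLLOUT
// handling then drains the handle between batches. Assumes no single chunk
// exceeds the cap.
const MAX_BATCH_BYTES = 0x7fffffff; // INT_MAX

function sliceBatches(chunks) {
  const batches = [];
  let current = [];
  let size = 0;
  for (const chunk of chunks) {
    if (size + chunk.length > MAX_BATCH_BYTES && current.length > 0) {
      batches.push(current); // current batch is full, start a new one
      current = [];
      size = 0;
    }
    current.push(chunk);
    size += chunk.length;
  }
  if (current.length > 0)
    batches.push(current);
  return batches;
}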
It is their problem. They are creating an application that is writing a huge amount of data to standard output without any backpressure controls. The root cause of the problem is that
Actually, it will end up in an OOM situation if you increase the amount of data being sent enough. Moreover, this will cause a huge memory usage spike that the user would not predict. If we do not slow the producer down, the process is going to crash anyway.
I think d85d120 just exposed an issue that has always been there. The buffering issue with pipes and
My point is that for streams, this is totally normal and expected. For
OK, here is a version that does not use console:

const n = require('net')
const KB = 1024
const MB = KB * KB

// server: hold off consuming for 2 seconds so the incoming data queues up,
// then count how many bytes actually arrive
n.createServer((m) => {
  var count = 0
  setTimeout(() => {
    m.on('data', c => count += c.length)
    m.on('end', () => console.log(count))
  }, 2000)
}).listen(32000, () => {
  // client: write 1 MB x 1 KB chunks = 1 GB into the connection
  const p = n.connect(32000, () => {
    const data = Buffer.alloc(KB).fill('x').toString()
    for (var i = 0; i < MB; i++)
      p.write(data)
    p.end()
  })
})

$ node noipc
I do not see any issue at all with that example (#24992 (comment)), and I do not think that is a bug. An application that manipulates streams without handling backpressure is going to have major issues in any case, and this is just one of many. I consider the data truncation with
On a side note, I did solve this issue in pino (pinojs/pino#566) by using
@mcollina - so who/what defines a backpressure limit? Is it a string whose expanded byte length is INT_MAX? And who controls it? If I increase

From the VM perspective, it should define how much data it can handle under which abstractions, and the limitations should be documented under the corresponding APIs. Here,

From an end-user perspective, the inability to transport 1GB of data (that was accumulated over time due to reasons outside the program's control) looks like a bug to me. And we can actually fix it! I have come up with something that works: #25044. But given the sensitivity of the code area, I would love more feedback and viewpoints. Thanks!
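To make that concrete, an application-level cap could look something like this (a sketch only; the 64 MB figure and the helper are arbitrary choices for illustration, not anything Node.js defines):

// Refuse to queue more than an application-chosen number of bytes;
// the caller has to wait for 'drain' (or drop data) when this returns false.
const SOFT_LIMIT = 64 * 1024 * 1024; // 64 MB, arbitrary

function guardedWrite(stream, chunk) {
  if (stream.writableLength + chunk.length > SOFT_LIMIT)
    return false;
  return stream.write(chunk);
}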
Vectored writes that contain large string data (accumulated from small strings over time due to congestion in the stream) fail with ENOBUFS if the cumulative chunk size is more than INT_MAX. Under backpressure situations, failure is justified in JS land with a heap OOM, as well as in native land with libuv resource exhaustion etc., but not in the stream wrap that sits in the middle and merely facilitates the transport between layers. Detect the large-data situation, and split those writes at the right boundaries. Carry out intermediary writes through dummy write_wrap objects to avoid multiple callbacks to the requestor. Fixes: nodejs#24992
I think that an end-user trying to append 1GB of data to a Stream synchronously has a bug in their code. We are not in agreement on whether Node.js should keep buffering until it crashes with OOM or should emit a proper error if a stream does not use backpressure. I do not think this disagreement is solvable with further discussion.
Closing based on @mcollina's feedback. I have experimented with a 'workaround' on the JS side, as opposed to the C++ side attempted in #25044, that detects the large-data condition and skips the _writev fast path:

diff --git a/lib/_stream_writable.js b/lib/_stream_writable.js
index 570c4c4..035cf25 100644
--- a/lib/_stream_writable.js
+++ b/lib/_stream_writable.js
@@ -28,6 +28,7 @@
 module.exports = Writable;
 Writable.WritableState = WritableState;
+const MAX_CHAR = 536870911; // INT_MAX / 4
 const internalUtil = require('internal/util');
 const Stream = require('stream');
 const { Buffer } = require('buffer');
@@ -500,8 +501,17 @@ function onwriteDrain(stream, state) {
 function clearBuffer(stream, state) {
   state.bufferProcessing = true;
   var entry = state.bufferedRequest;
+  var count = 0;
+  var fast = true;
+  while (entry) {
+    var chunk = entry.chunk;
+    count += state.objectMode ? 1 : chunk.length;
+    if (count >= MAX_CHAR) { fast = false; break; }
+    entry = entry.next;
+  }
+  entry = state.bufferedRequest;
-  if (stream._writev && entry && entry.next) {
+  if (fast && stream._writev && entry && entry.next) {
     // Fast case, write everything using _writev()
     var l = state.bufferedRequestCount;
     var buffer = new Array(l);
My feedback is related to writing to a stream. However, I think the console.log behavior is a bug when a pipe is used.
Here is my take on solving this issue: #25638.
ping @mcollina and @gireeshpunathil ... any further thoughts on what we should do here?
My assessment is that it cannot be solved without side effects. So maybe close it and revisit if this becomes problematic in real-world scenarios?
I was working on a potential solution for the 'data truncation in child standard streams upon process exit' problem, and came across this scenario:

Expected data: 1074790400 bytes (1GB + 1MB of newline chars that console.log adds). Actual data obtained on UNIX is much less, and is arbitrary in each run. Windows works fine.

The behavior is well understood in the presence of process.exit, but in the absence of it, the process is supposed to drain all the data while there are active requests in the event loop. That is not happening here.

Tracing back, I landed at d85d120, which introduced vectored writes in pipe streams. With this in place, when the channel experiences congestion, it throws UV_ENOBUFS, which is passed to the calling code unhandled. Further, commit 1d6b729 inadvertently caused this error to not be visible.

Subsequently there have been a lot of changes in stream_base.cc and stream_wrap.cc around DoTryWrite and Writev, which makes it difficult to pinpoint any particular place where this can be rectified.

Looking at the Writev implementation, it does not handle the case where the data size is more than INT_MAX. In my opinion this should be fixed, as vectored write is an internal mechanism that should hide itself from the JS API. Also, it looks like it does not leverage libuv's write-when-you-can capability (uv__write)?

/cc @nodejs/streams @nodejs/child_process

Happy to do any further assistance / testing.
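A rough reconstruction of the scenario above, as a sketch (not the original snippet; the structure is inferred from the description, and the numbers match the 1074790400-byte figure):

// sketch: the parent counts the bytes arriving on the child's piped stdout,
// while the child writes 1M lines of 1KB each via console.log
if (process.argv[2] === 'child') {
  const line = Buffer.alloc(1024).fill('x').toString();
  for (let i = 0; i < 1024 * 1024; i++)
    console.log(line); // 1GB of payload + 1MB of newlines
} else {
  const { spawn } = require('child_process');
  const child = spawn(process.execPath, [__filename, 'child'],
                      { stdio: ['ignore', 'pipe', 'inherit'] });
  let count = 0;
  child.stdout.on('data', (c) => { count += c.length; });
  child.stdout.on('end', () => console.log('received %d bytes', count));
}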