
Add guide: "Backpressuring in Streams" #1109

Merged (6 commits) on Apr 22, 2017

Conversation

@jalafel (Contributor) commented Jan 23, 2017

This was an issue opened on the nodejs/node repository before guides were transferred directly to nodejs/nodejs.org.

This PR contains a first go at a guide on backpressure, and it is missing some vital points that I do not feel equipped to answer. I have left some empty spaces for them! I would appreciate anyone's input on the content, structure, and any additions to the guide.

Some points to expand on would be the problems that streams pose which are solved by backpressure. I think I hit most of the points, but I don't think I'm versed enough to go into detail.

There is also space for an example app, but again, I am not familiar with the current best practices for building streams and implementing your own backpressure.

Thanks!

Refs: nodejs/node#10766

@lpinca (Member) commented Jan 23, 2017

cc: @nodejs/streams

@mcollina (Member) left a comment

The work is solid. I think the intro should be changed to explain what the problem is that streams solve.

data and the _consumer_.

In Node.js the source is a [`Readable`][] stream and the consumer is the
[`Writable`][] stream (both of these may be interchanged with a [`Duplex`][]
Member

or a Transform

help expand your understanding while reading this guide and for following along
with examples.

## The Problem with Streams
Member

There is not a problem with streams. There is a problem that streams solve, which is backpressure, also called flow control in other communities. There are other solutions to backpressure. Unix pipes and TCP sockets solve this as well, and they are the root of the following. Also, streams can be used for pull-based data handling.

The usual demo I do to show the problem is zipping a very large data file both in one go and with streams. While the first one crashes, the second one can work with data of any size (the original credit goes to @davidmarkclements for that example). That shows the problem.
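For readers following along, a minimal sketch of that demo might look like this (file names are placeholders; only the streaming variant keeps memory bounded):

```
const fs = require('fs');
const zlib = require('zlib');

// In one go: the whole file (and its compressed copy) must fit in memory,
// so a sufficiently large file crashes the process.
fs.readFile('REALLY_BIG_FILE.x', (err, data) => {
  if (err) throw err;
  zlib.gzip(data, (err, compressed) => {
    if (err) throw err;
    fs.writeFile('REALLY_BIG_FILE.x.gz', compressed, (err) => {
      if (err) throw err;
    });
  });
});

// With streams: data flows in small chunks and backpressure keeps memory bounded.
fs.createReadStream('REALLY_BIG_FILE.x')
  .pipe(zlib.createGzip())
  .pipe(fs.createWriteStream('REALLY_BIG_FILE.x.gz'));
```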

Contributor Author

Thanks for this review @mcollina. So essentially there is no separation between streams & backpressure? As in ... data handling is tricky, but streams solve it with backpressure?

Member

As in "data handling is tricky, and streams is the solution that Node has adopted".

Contributor Author

Syntactically, I am still a bit confused. Wikipedia describes backpressure in IT as

the build-up of data behind an I/O switch if the buffers are full and incapable of receiving any more data; the transmitting device halts the sending of data packets until the buffers have been emptied and are once more capable of storing information.

In this instance .. is backpressure the problem or the I/O switch that streams implements?

Member

In this instance .. is backpressure the problem or the I/O switch that streams implements?

Backpressure itself is the problem, and the backpressure handling mechanisms in the streams classes are our solution to it. :)

+------------+ add chunk to queue |
| <--^-------------------<
+============+
```
Member

This diagram is not exactly clear. Why is everything going back to pipe()?

Contributor Author

Good question. I assumed every event emitter was being fed back into pipe, and then pipe would delegate what to do next. Is that incorrect?

Member

@jessicaquynh it is incorrect. pipe sets up some closures, which keep working, but the method is not invoked anymore. The diagram should represent what pipe sets up, but not show pipe at all: https://github.com/nodejs/node/blob/master/lib/_stream_readable.js#L473-L613


* If the write queue is busy, [`.write()`][] will return false.
* If the data chunk is too large, [`.write()`][] will return false (the limit
is indicated by the variable, [`highWaterMark`][]).
Member

If we are building a Writable directly, this is not needed, as it is handled by the stream state machine. However, it is needed when using a Writable directly; see my example in the Node docs: https://github.com/nodejs/node/blob/master/doc/api/stream.md#writablewritechunk-encoding-callback
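For illustration, a hedged sketch of writing to a Writable directly while respecting backpressure, along the lines of the stream docs example (the destination and data are placeholders):

```
const fs = require('fs');

const writable = fs.createWriteStream('dest.txt'); // placeholder destination
let i = 0;

function writeSome() {
  let ok = true;
  while (i < 1000000 && ok) {
    // write() returns false once the internal buffer passes highWaterMark
    ok = writable.write(`line ${i++}\n`);
  }
  if (i < 1000000) {
    // Back off and resume only after the buffer has drained
    writable.once('drain', writeSome);
  } else {
    writable.end();
  }
}

writeSome();
```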

@eljefedelrodeodeljefe (Contributor)

@jessicaquynh can you incorporate Matteo's comments?

@jalafel (Contributor Author) commented Feb 20, 2017

@eljefedelrodeodeljefe yes definitely! sorry I've been mia on this issue .. will submit changes tomorrow night!

@TimothyGu (Member) left a comment

Nice work, and I'm excited to read the currently unfinished sections.

One thing I noticed is the emphasis on writable streams. Writing readable streams that stop when asked to do so is just as important as teaching the writable stream to say stop when needed IMO.


Once the queue is finished, backpressure will allow data to be sent again.
The space in memory that was being used will free itself up and prepare for the
next glob of data.
Member

glob → blob / batch


# Backpressuring in Streams

The purpose of this guide will describe what backpressure is and the
Member

will → is to


This effectively allows a fixed amount of memory to be used at any given
time for a [`.pipe()`][] function. There will be no memory leakage, no
indefinite buffering, and the garbage collector will only have to deal with
Member

indefinite → infinite


## Build a WritableStream

Let's extend the prototypical function of [`.write()`][]:
Member

We are not extending .write() here, but extending stream.Writable with a custom ._write() method, which is then used internally by .write().
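A minimal sketch of that distinction, using a hypothetical MyWritable class:

```
const { Writable } = require('stream');

// Subclass stream.Writable and implement _write(); the public .write()
// method calls _write() internally and manages the backpressure state.
class MyWritable extends Writable {
  _write(chunk, encoding, callback) {
    process.stdout.write(chunk); // do something with the chunk
    callback();                  // pass an Error here to signal a failed write
  }
}

const ws = new MyWritable();
ws.write('hello ');
ws.end('world\n');
```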

value for what system is running the application. In instances where you might
want to raise that value, go for it, but do so with caution!

## Build a WritableStream
Member

"Writable Stream" (note the space) for consistency

var fs = require('fs');

var inputFile = fs.createReadStream('REALLY_BIG_FILE.x');
var outputFile = fs.createWriteStream('REALLY_BIG_FILE_DEST.x');
Member

const

var inputFile = fs.createReadStream('REALLY_BIG_FILE.x');
var outputFile = fs.createWriteStream('REALLY_BIG_FILE_DEST.x');

// Secretly the stream is saying: "whoa, whoa! hang on, this is way too much!"
Member

It's not immediately clear that writing to the disk is slower than reading. Either mention it explicitly ("on a computer where writing is slower than reading"), or use a slower writable stream (http? zlib?)

Member

Writing to the disk is slower than reading from it. It might not be clear for a reader though, so I'm 👍 in adding a zlib compression before writing down.

one area in memory!

So, if backpressure is so important, why have you (probably) not heard of it?
Well the answer is simple: Node.js does all of this automatically for you.
Member

Make it be known somewhere that only .pipe() does this automatically (readable.on('data', buf => writable.write(buf)); doesn't use backpressure, for example). I know in the next section you are looking more closely at .pipe(), but it seems to be assumed that the reader knows all this is talking about .pipe().
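The contrast described above, sketched out (stream names are placeholders; the anti-pattern is left commented out):

```
const fs = require('fs');

const readable = fs.createReadStream('source.txt');       // placeholder source
const writable = fs.createWriteStream('destination.txt'); // placeholder destination

// No backpressure: the return value of write() is ignored, so chunks pile up
// in the Writable's internal buffer whenever the destination is slower.
// readable.on('data', (buf) => {
//   writable.write(buf);
// });

// With backpressure: .pipe() pauses the Readable when write() returns false
// and resumes it once 'drain' is emitted.
readable.pipe(writable);
```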

an isolate application, both [`Readable`][] and [`Writable`][] streams
should be present. If you are writing an application meant to accept a
[`Readable`][] stream, or pipes to a [`Writable`][] stream from another app,
you may omit this detail.
Member

Not sure what "application" / "app" means here. A standalone program? A library? A function?

## Example App

Since [Node.js v0.10][], the [`Streams`][] class has offered the ability to
overwrite the functionality of [`.read()`][] or [`.write()`][] by using the
Member

I wouldn't say "overwrite", but rather "modify the behavior"

@mcollina (Member)

One thing I noticed is the emphasis on writable streams. Writing readable streams that stop when asked to do so is just as important as teaching the writable stream to say stop when needed IMO.

As we have a backpressure (or flow-control) mechanism in place, the maximum amount of data that can flow is controlled by downstream, hence the focus on Writable is correct. However this is not explicit. We should also note that calling readable.push whenever data is ready is as bad as readable.on('data', data => writable.write(data)).

@jalafel (Contributor Author) commented Feb 26, 2017

Sorry it took so long to update the guide. Hope I interpreted the change requests correctly. Ultimately, I am still not 100% sure what constitutes best practice for building streams.

Would it be fair to say that best practice is also non-contradictory practice? As in, "you have a good stream if you did not do A, B, C"?

@mcollina (Member)

Sorry it took so long to update the guide. Hope I interpreted the change requests correctly. Ultimately, I am still not 100% sure what constitutes best practice for building streams.

I am almost always dropping 50-70% of the "best practices" for Streams. There are certain rules to follow; all the rest can be overruled. However, I would not recommend that anyone try writing something like https://github.com/mcollina/syncthrough without a great deal of experience in the internals.

Would it be fair to say that best practice is also non-contradictory practice? As in, "you have a good stream if you did not do A, B, C"?

Yes! The golden rule of streams is to always respect backpressure: A) never push() if you are not asked, and B) never call write() after it returns false but wait for 'drain' instead. The latter holds more power for me and helps me reason about things better.

The third rule: C) streams change between different Node versions and the libraries you use. Be careful and test things.

@mcollina (Member) left a comment

Wow, this is great work. We are getting close, thanks so much for the effort in this. There is one thing that is a must to fix, and the rest is bonus content.

+------------+ add chunk to queue |
| <---^---------------------<
+============+
```
Member

This diagram is making good progress! But I think you should split pipe() into its own step of the diagram, and not make it part of the "loop". The event handlers are, so we should show the distinction.

The syntax for events, e.g. drain(), is not common throughout the docs, so we should probably use on('drain', cb) instead.

also cc @lrlna who did some visualization on this in the past on paper.

Contributor Author

Wondering if by this you meant something like:

                                                     +===================+
                         +-->  Piping functions  +--->   src.pipe(dest)  |
                         x     are set up during     |===================|
                         x     the .pipe method.     |  Event callbacks  |
+===============+        x                           |-------------------|
|   Your Data   |        x     They exist outside    | .on('close', cb)  |
+=======+=======+        x     the data flow, but    | .on('data', cb)   |
        |                x     importantly attach    | .on('drain', cb)  |
        |                x     events, and their     | .on('unpipe', cb) |
+-------v-----------+    x     respective callbacks. | .on('error', cb)  |
|  Readable Stream  +----+                           | .on('finish', cb) |
+-^-------^-------^-+    |                           | .on('end', cb)    |
  ^       |       ^      |                           +-------------------+
  |       |       |      |                           
  |       ^       |      |                                              
  ^       ^       ^      |    +-------------------+         +=================+
  ^       |       ^      +---->  Writable Stream  +--------->  .write(chunk)  |
  |       |       |           +-------------------+         +=======+=========+

Or even have more explicit steps come in?

Member

Yes, something like that is way better.


hi! this is so so good, i love it!

in terms of the diagram, i find the loop a little weird to get around and understand, although i definitely see where you're coming from.

What about making it a bit more linear? I am not good with replicating the fancy thing you have going, but something like this on paper?

[image: stream-backpressure sketch]

an isolate application, both [`Readable`][] and [`Writable`][] streams
should be present. If you are writing an application meant to accept a
[`Readable`][] stream, or pipes to a [`Writable`][] stream from another
component, you may omit this detail.
Member

We should add a paragraph or a section here to discuss streams pipelines. What happens when a transform is in place?

@jalafel (Contributor Author) commented Feb 28, 2017

Heh, this might be a rhetorical question, but I will answer just in case ... when Transform is in the pipeline, it takes in the Readable output, does the ._transform(), and pipes that into the destination Writable? Do you mean to explicate this process in the guide?

Edit: Jw, do Transform Streams or Duplex Streams have special conditions to respect backpressure as well?

Member

A Transform has both an outgoing and an incoming highWaterMark. Most people write a Transform to set up pipelines, so it's important to explain how it all plays out when you chain multiple streams.

Contributor Author

So, these are the true facts to say about a Transform stream:

It is important because it's often used to set up pipelines. In it, there are _write() and _read() calls that must be implemented with the same care as with Readable and Writable (with respect to the return values of .push() and .write()). And also the HWM may change along the pipeline.

Member

It is not correct. Transform implementors should add _transform, as it is a Duplex that implements both _read and _write, i.e. Transform has those methods implemented.
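A hedged sketch of how that plays out when chaining a custom Transform into a pipeline (file names and the transform itself are placeholders):

```
const { Transform } = require('stream');
const fs = require('fs');

// Only _transform() is supplied; the Duplex machinery underneath already
// implements _read() and _write(), and each side has its own highWaterMark.
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  }
});

fs.createReadStream('source.txt')          // placeholder file names
  .pipe(upperCase)
  .pipe(fs.createWriteStream('dest.txt'));
```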


}
}
```
Member

We need an equivalent section on 'Write a Readable Stream'.

We should mention that [readable-stream](https://github.com/nodejs/readable-stream) is the best way to build a stream application across multiple versions of Node.js, and the browser too! This will make your application have a stable base, and an implementation that changes with its own lifecycle compared to node core.
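For illustration, the switch amounts to changing the require() target (a sketch, assuming the readable-stream package is installed from npm):

```
// npm install readable-stream
const { Readable, Writable, Transform } = require('readable-stream');

// Streams are built exactly as with require('stream'), but the implementation
// is now pinned by package.json instead of by the Node.js version in use.
```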

Member

As a general question, is the primarily Node.js-based Readable stream still the best option in the browser? In my experience, the most common cause of using Node.js Readable in the browser is as a shim when creating a Browserify bundle. At the same time, readable-stream IIRC depends on additional polyfills like one for the events module. Plus, I know WHATWG has been working on their version of streams.

Regarding using the userland module in Node.js, definitely. You may also want to link to @rvagg's blog post

Member

readable-stream is the way to go in the browser. readable-stream exports Readable & co, so all that is written here is valid there as well.

By using streams in the browser, I mean mostly to leverage code already written elsewhere for the purpose, i.e. the streams ecosystem.

const inp = fs.createReadStream('The.Matrix.1080p.mkv');
const out = fs.createWriteStream('The.Matrix.1080p.mkv.gz');

inp.pipe(gzip).pipe(out);
Member

we should add a note here and refer to pump. It's a must-have, as this example does not handle errors correctly.

Contributor Author

Does the vital difference between chaining pipes together vs. pump have to do with error handling? As in, chaining pipes together can be erroneous because there is no error handling? Heh, sorry for my ignorance!

Member

Yes. The problem is that in case of errors the underlying file descriptor on the other side might not get closed.
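A hedged sketch of the pump suggestion, reusing the file names from the example above (assuming the pump package is installed from npm):

```
const fs = require('fs');
const zlib = require('zlib');
const pump = require('pump'); // npm install pump

// pump() forwards errors and destroys every stream in the chain when one
// fails, so no file descriptor is left open on the other side.
pump(
  fs.createReadStream('The.Matrix.1080p.mkv'),
  zlib.createGzip(),
  fs.createWriteStream('The.Matrix.1080p.mkv.gz'),
  (err) => {
    if (err) {
      console.error('Pipeline failed', err);
    } else {
      console.log('Pipeline succeeded');
    }
  }
);
```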

@TimothyGu (Member) left a comment

Great to see how this is coming along. Apologies for the number of nits, just want to make sure those don't get forgotten when this is applied.

```

While that will take a few minutes to compress, in another shell we may run
a script that takes Node.js' module [`Zlib`][], that wraps around another compression tool, [`gzip(1)`][].
Member

"zlib" is how the documentation refers to itself

inp.pipe(gzip).pipe(out);
```

While the first `zip` function will ultimately fail, Node will be able to handle
Member

Not sure what the "zip function" is referring to. The command run in the shell?

Contributor Author

Yes! I was referencing the example mcollina was describing. I hope I did it right, should this be more explicit?

Member

The "function" was throwing me off a bit. And also why would it fail?

Contributor Author

I am not entirely sure. My layman's understanding of its failure is that it happens because of the lack of a backpressure mechanism. There is a lot of data being compressed and the .zip file gets corrupted. But if @mcollina might be able to shed a technical light on this, it'd be much appreciated!

Member

The reason why I asked is that I'm fairly certain zip will not encounter this problem. Unlike Node.js, zip and other UNIX utilities operate synchronously. What that means is that they only read() as much as they need at one time, and immediately write() it out afterwards. Since these two system calls are synchronous, there would not be any backpressure problems since by definition they only appear in an asynchronous system.

Contributor Author

Interesting. So to be clear, synchronous systems do not encounter backpressure buildup like async does?

I also did test out the zip tool and it does, in fact, corrupt the large file, though I am not sure why, if those are the mechanics behind data flow. (Though gzip on my UNIX command line does not corrupt the large file.)

Any thoughts? My only guess could be because the filetype I am using is video!

* A very overworked garbage collector
* Memory exhaustion

## Overall Dip in System Performance
Member

Remove this heading maybe. The following paragraph seems to directly follow the list above.

`1.52 gb`.

Without streams in place to delegate the backpressure, there is a memory space
with an entire degree of magnitude greater that is being allocated. That is a
Member

I think "order of magnitude" is the more common expression

Contributor Author

Tbh I went back to edit this many times and couldn't figure out why it sounded so off .. hehe thanks!

```

The maximum byte size occupied by virtual memory turns out to be approximately
`1.52 gb`.
Member

Ditto


## Example App

Since [Node.js v0.10][], the [`Streams`][] class has offered the ability to
Member

Stream class



colleagues and friends.

Be sure to read up more on [`Streams`][] for other [`EventEmitters`][] to help
improve your knowledge when building applications with Node.js.
Member

Not sure what "for other EventEmitters" is trying to say...

// stream is ready or not.
readable.on('data', data =>
writable.write(data);
);
Member

Might want to add braces around writable.write(data) just to make it clear that we don't care about the return value of write().

is full (which will vary across different machines). Node.js allows you to set
your own custom [`highWaterMark`][], but commonly, the default is the optimal
value for what system is running the application. In instances where you might
want to raise that value, go for it, but do so with caution!
Member

This section only deals with rules for writing to a Writable stream from outside the stream. We also need rules for writing to a Readable stream from inside that Readable stream (by respecting the return value of this.push()).
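A sketch of that rule from inside a hypothetical Readable implementation:

```
const { Readable } = require('stream');

class Counter extends Readable {
  constructor(options) {
    super(options);
    this._n = 0;
  }

  _read() {
    // Keep pushing only while downstream has room: push() returning false
    // means "stop until _read() is called again".
    while (this._n < 1000) {
      if (!this.push(`${this._n++}\n`)) {
        return;
      }
    }
    this.push(null); // signal end of stream
  }
}

new Counter().pipe(process.stdout);
```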

@jalafel (Contributor Author) commented Apr 7, 2017

Sorry for the static on this! I took a look back on this, realized some grammar was off and fixed some formatting.

Added in a few pointers mentioned earlier in the thread, though they didn't necessarily close off all the reviews.

Skimmed out the example application because I felt that was redundant, since there is a rich supply of examples found in the documentation, and I didn't want to reinvent the wheel.

I only have one question and that's in regards to the implementation of a Writable stream. Are there any mistakes one can make in _write() or _writev() that would damage backpressure? They seem like pretty foolproof functions to me.

Should the guide extend its scope and talk about creating a custom writable stream from scratch then? As in:

MyWritable.prototype.write = function(chunk, enc, cb) {
  // code that respects return value here
}

util.inherits(MyWritable, Stream);

Thanks for all the help! Hope to land this soon!

@mcollina (Member) commented Apr 7, 2017

@jessicaquynh _writev goes in combination with .cork(), and it is especially useful if we are writing to a stream directly. The biggest mistake is this pattern:

ws.cork()
ws.write('hello ')
ws.write('world ')
ws.uncork()

ws.cork()
ws.write('from ')
ws.write('Matteo')
ws.uncork()

This will cause two transfers into the C++ realm, making the use of cork() a bit useless. The correct pattern is:

ws.cork()
ws.write('hello ')
ws.write('world ')
process.nextTick(doUncork, ws)

ws.cork()
ws.write('from ')
ws.write('Matteo')
process.nextTick(doUncork, ws)

// as a global function
function doUncork (stream) {
  stream.uncork()
}

cork() can be called as many times as we want; we just need to be careful to call uncork() the same number of times to make it flow again.

@jalafel (Contributor Author) commented Apr 7, 2017

@mcollina Thank you for your input and help! I've added the changes to the writable section!

@mcollina (Member) left a comment

LGTM with a nit

is full (which will vary across different machines). Node.js allows you to set
your own custom [`highWaterMark`][], but commonly, the default is the optimal
value for what system is running the application. In instances where you might
want to raise that value, go for it, but do so with caution!
Member

I think you should write the defaults here; it usually helps people.
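For reference, the defaults being asked for are 16384 bytes (16 KB) for stream.Readable and stream.Writable in byte mode, and 16 objects in objectMode (fs.createReadStream uses a larger 64 KB default). A sketch of overriding the value for a single stream:

```
const { Writable } = require('stream');

// Raise the threshold at which write() starts returning false from the
// default 16 KB to 1 MB for this particular stream.
const ws = new Writable({
  highWaterMark: 1024 * 1024,
  write(chunk, encoding, callback) {
    callback();
  }
});
```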

@mcollina (Member)

LGTM

@fhemberger fhemberger merged commit 8491693 into nodejs:master Apr 22, 2017
@fhemberger (Contributor)

Great, thank you!
