Proposal: add _flush to Writable streams #112

markstos · 2015-02-12T16:57:20Z

@TomFrost made a good case for this pre-fork:

nodejs/node-v0.x-archive#7631

There was also significant discussion previously in this issue:
nodejs/node-v0.x-archive#7348

@TomFrost also implemented a workaround that is published on npmjs.org:

https://www.npmjs.com/package/flushwritable

Perhaps it will be helpful in considering the issue.

I'm one of the other developers from the previous thread that would also find it to be useful.

calvinmetcalf · 2015-02-12T17:48:21Z

This is something I've needed, and I've had to write my own half-baked
implementation. I'd be happy to champion this (do we do that?)

domenic · 2015-02-12T17:51:41Z

+1, got one of these in WHATWG streams already (under the name "close", although now I am contemplating whether flush is nicer).

calvinmetcalf · 2015-02-12T17:56:46Z

The tricky thing is that transform flush is after writable half has
finished but before the readable one is finished. For writable ones we'd
need it to be earlier so either a breaking change or a different name like
close.

chrisdickinson · 2015-02-12T20:04:20Z

@calvinmetcalf I'm a little confused by:

> The tricky thing is that transform flush is after writable half has finished but before the readable one is finished.

My impression is that we would preserve and extend this behavior to Writables as well. Am I misunderstanding?

I am also curious: what is the order of events?

emit prefinish
_flush
emit finish

OR:

_flush
emit prefinish
emit finish

vkurchatkin · 2015-02-12T20:07:45Z

> My impression is that we would preserve and extend this behavior to Writables as well. Am I misunderstanding?

There is no point in flushing a writable after finish. The whole idea is to defer finish.

calvinmetcalf · 2015-02-12T20:21:02Z

@chrisdickinson like @vkurchatkin said, we want to defer finish. Adding a new function (e.g. calling it close) that fires earlier would likely be the only backwards-compatible way to do it; we could leave _flush in but deprecate it.

vkurchatkin · 2015-02-12T20:27:28Z

@calvinmetcalf I was thinking about _finish. I want _close to be used to close underlying resource after finish.

sonewman · 2015-02-12T21:16:52Z

I am not that opinionated about this. Is there any reason why we couldn't move it from transform?

calvinmetcalf · 2015-02-12T21:21:32Z

@vkurchatkin I was assuming closing resources would be a use case for this

@sonewman because the transform one happens after finish is emitted and we'd want it to delay finish being emitted.

mafintosh · 2015-02-12T22:21:48Z

Why not call it ._end?
Similar to how .write calls ._write internally, .end would call ._end before emitting finish.

calvinmetcalf · 2015-02-12T22:39:06Z

@mafintosh and then simply end when passing it to the simplified constructor

mafintosh · 2015-02-12T22:45:04Z

@calvinmetcalf yea! and transforms would support both flush and end (different use cases). i know this would simplify a lot of my code.

sonewman · 2015-02-12T22:52:29Z

@calvinmetcalf prefinish is emitted before finish

function prefinish(stream, state) {
  if (!state.prefinished) {
    state.prefinished = true;
    stream.emit('prefinish');
  }
}

function finishMaybe(stream, state) {
  var need = needFinish(stream, state);
  if (need) {
    if (state.pendingcb === 0) {
      prefinish(stream, state);
      state.finished = true;
      stream.emit('finish');
    } else
      prefinish(stream, state);
  }
  return need;
}

lib/_stream_writable.js#L464-L482

sonewman · 2015-02-12T22:53:46Z

Also, I don't think it would be right for _flush to be called after finish, since it is usually where you would push some last bit of protocol data...

Having a close or ended event or implementation method, could be useful, but you wouldn't want to accept data during this time.

I do think we should start addressing these stream lifecycle things as part of a stream strategy similar to WHATWG underlyingSource or @chrisdickinson flows strategy idea.

sonewman · 2015-02-12T23:05:46Z

The thing is, there is always going to be more, and more(,...and more,...) new things people are going to want, to instrument their stream in some specific way. Open up a model for the underlying sources and those things are easy for anyone to instrument to any requirement.

To be clear, I am not saying any of these ideas are bad, as they are generic and useful.

But IMO going forward we need to open up this cycle/internals for people to customise at their will, meaning the base of streams can be simple and avoid continuous scope creep.

@sonewman - gets off soap box

calvinmetcalf · 2015-02-12T23:07:48Z

sorry it's that finish can be called before _flush is done if it's async. I seem to remember testing what would happen if you did delay it and it causing quite a few node transform streams to break.

sonewman · 2015-02-12T23:15:32Z

@calvinmetcalf hmmm, interesting. I can see how that would be true, since _flush is called in response to an event (prefinish), and finish is then triggered synchronously afterwards. If we moved _flush to the writable stream, we could remove the internal dependency on that event completely (although we would not be able to remove the event itself, since it is used by http).

mafintosh · 2015-02-12T23:15:51Z

the prefinish event isn't super useful since you can't do anything async before finish happens

mafintosh · 2015-02-12T23:17:00Z

@sonewman @calvinmetcalf moving the flush function to writable streams is a major change since the flush function currently is called before the readable part of the stream ends (not the writable).

calvinmetcalf · 2015-02-12T23:17:47Z

so _flush is part of the public api and when it is called is documented. This was why I was suggesting a new function name.

sonewman · 2015-02-12T23:22:47Z

I think I am +1 on doing this, it seems like that would actually be useful.

@mafintosh prefinish is emitted in lib/_stream_writable.js#L467. Forgive me if I am wrong, but I don't see why it would have any effect on the readable side of a transform stream.

mafintosh · 2015-02-12T23:36:02Z

@sonewman my point is that the ._flush function used in transform streams is used to do stuff before the readable part of a stream ends - not the writable.

var stream = require('stream')

var transform = stream.Transform({
  transform: function (data, enc, cb) {
    cb(null, data)
  },
  flush: function(cb) {
    setTimeout(function () {
      transform.push('world')
      cb()
    }, 1000)
  }
})

transform.on('data', function (data) {
  console.log(data.toString())
})
transform.on('finish', function() {
  console.log('(finish)')
})
transform.on('end', function() {
  console.log('(end)')
})
transform.write('hello')
transform.end()

running the above will result in

hello
(finish)
world
(end)

mafintosh · 2015-02-12T23:44:46Z

which is why i think having it called something like end would be useful. this would allow you to do async stuff before finish is emitted.

var stream = require('stream')

var transform = stream.Transform({
  transform: function (data, enc, cb) {
    cb(null, data)
  },
  end: function (cb) {
    setTimeout(function() {
      transform.push('before finish')
      cb()
    }, 1000)
  },
  flush: function (cb) {
    setTimeout(function () {
      transform.push('world')
      cb()
    }, 1000)
  }
})

transform.on('data', function (data) {
  console.log(data.toString())
})
transform.on('finish', function() {
  console.log('(finish)')
})
transform.on('end', function() {
  console.log('(end)')
})
transform.write('hello')
transform.end()

-->

hello
before finish
(finish)
world
(end)

sonewman · 2015-02-13T00:48:18Z

@mafintosh whether something comes after _flush is probably a separate discussion.

But the point I am trying to make is that _flush when called, should allow for asynchronicity e.g.:

function finishWritable(stream, state) {
  if (state.pendingcb === 0) {
    state.finished = true;
    stream.emit('finish');
  }
}

function prefinish(stream, state) {
  if (!state.prefinished) {
    state.prefinished = true;
    stream.emit('prefinish');

    if ('function' === typeof stream._flush)
      stream._flush(() => finishWritable(stream, state));
    else
      finishWritable(stream, state);
  }
}

function finishMaybe(stream, state) {
  var need = needFinish(stream, state);
  if (need)
    prefinish(stream, state);

  return need;
}

chrisdickinson · 2015-02-13T04:40:16Z

@vkurchatkin Ah, I see. Transform currently hooks onto prefinish, but flush does not block finish.

@mafintosh I'm -1 on calling it "end". Between "end" events, "end" methods, and the "finish" events, the term is a bit overloaded as it stands.

@sonewman I agree that the proposed flush should be able to "block" writables from emitting finish. This is currently incompatible with Transform#_flush, but what if we took advantage of the simplified stream constructor interface to give the old-style TransformSubclass.prototype._flush = <fn> different semantics from new-style Transform({flush: (cb) => {}}) creation?

chrisdickinson · 2015-02-13T04:48:32Z

Arg, yes, flush as a shorthand option has already been released. Changing that behavior would be breaking, though probably only minimally so due to how recently introduced it was.

Alternatively, Transform can have an "oddball" flush that cannot block "finish", while all other writable finish functions would be able to.

mafintosh · 2015-02-13T05:12:01Z

@chrisdickinson having simple constructor flush and .prototype._flush do different things seems weird though. couldn't flush on Transforms just block both finish and end?

is there a use case for adding ._flush on Readable streams as well?

chrisdickinson · 2015-02-13T05:50:58Z

@mafintosh Re: delaying finish until flush completes, quoting @calvinmetcalf:

> sorry it's that finish can be called before _flush is done if it's async. I seem to remember testing what would happen if you did delay it and it causing quite a few node transform streams to break.

If we use the simplified constructor API to introduce this behavior, we can sidestep that breakage but still introduce the functionality for all Writables (duplexes and transforms included). It is a little weird that new Transform({flush: (cb) => {}}) will behave differently from t = new Transform; t._flush = () => {}, but we should strongly discourage the latter pattern for all streams going forward.

I just noticed this when rereading the thread, but one of the originally linked issues suggested this feature for the purposes of batch writes – specifically, so that a batching writer can be made aware of the end of the stream and output the rest of its data before close. However, we already have a mechanism for batching writes – writev, cork, and uncork – which should make a userland batching writestream superfluous.

Are there other valid uses for this feature outside of batch writes?
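For readers unfamiliar with the batching mechanism mentioned above, here is a minimal sketch of cork, uncork, and writev working together. The `batches` array and the `sink` name are illustrative only; the sketch assumes a Node.js version whose Writable constructor accepts the `write` and `writev` options.

```javascript
const { Writable } = require('stream');

// Illustrative record of what the sink received, one entry per call.
const batches = [];

const sink = new Writable({
  // Used when only a single chunk is waiting.
  write(chunk, encoding, callback) {
    batches.push([chunk.toString()]);
    callback();
  },
  // Used when several chunks are buffered: they arrive together.
  writev(chunks, callback) {
    batches.push(chunks.map(({ chunk }) => chunk.toString()));
    callback();
  }
});

// The caller has to opt in explicitly: writes made while corked are
// buffered and handed to writev in one call once the stream is uncorked.
sink.cork();
sink.write('a');
sink.write('b');
sink.write('c');
sink.uncork();

console.log(JSON.stringify(batches));
```

Note that the batching only happens because the caller corks and uncorks the stream itself.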

mafintosh · 2015-02-13T06:38:59Z

I have a couple of use cases myself:

In pumpify (turns multiple writable/readable streams into a single one) I don't want to emit finish until the last stream in the pipeline has emitted finish after all writes are done. Since we don't have a flush function on writable streams I do a bunch of hacks to do this.

In content-addressable-blob-store we write to a tmp file that we want to move somewhere before emitting finish after all writes are done.
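The temp-file use case can be sketched with the `final` hook that this proposal eventually became (see the end of the thread). This is only an illustration, assuming Node.js 8 or later; `createAtomicWriter` and the file names are hypothetical and are not taken from content-addressable-blob-store.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Writable } = require('stream');

// Hypothetical helper: append to tmpPath, then move the file into place
// before 'finish' is emitted.
function createAtomicWriter(tmpPath, finalPath) {
  return new Writable({
    write(chunk, encoding, callback) {
      fs.appendFile(tmpPath, chunk, callback);
    },
    // Runs after all writes have completed; 'finish' waits for the callback.
    // In this sketch, ending without writing anything makes the rename fail,
    // and that error is passed on through the callback.
    final(callback) {
      fs.rename(tmpPath, finalPath, callback);
    }
  });
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'final-demo-'));
const tmpPath = path.join(dir, 'blob.tmp');
const finalPath = path.join(dir, 'blob');
const writer = createAtomicWriter(tmpPath, finalPath);

writer.on('finish', () => {
  // By the time 'finish' fires, the file is already at its final location.
  console.log(fs.readFileSync(finalPath, 'utf8'));
});
writer.end('hello');
```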

mafintosh · 2015-02-13T06:43:26Z

@chrisdickinson @calvinmetcalf we could just break prototype._flush and bump the major version \o/

calvinmetcalf · 2015-02-13T11:26:31Z

@chrisdickinson writev, cork, and uncork don't actually do the trick because they must be explicitly called by the caller, so they don't work in cases like when the stream is being piped to.

Besides batch writes, there is also closing underlying resources, which is why the name _close came up. While _end is a bit confusing due to the event, it does fit with the pattern of write, writev, and read.

@mafintosh I totally don't think it's worth a major version bump, I'd call it __flush before bumping the major.
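To illustrate the piping case raised in this comment: a batching sink that is piped to has no caller to cork or uncork it, so it needs a hook to emit its last partial batch. The sketch below uses the `final` option that later landed (Node.js 8 or later assumed); `createBatcher` and `onBatch` are hypothetical names.

```javascript
const { Readable, Writable } = require('stream');

// Hypothetical batching sink: hands `size` items at a time to onBatch.
function createBatcher(size, onBatch) {
  let pending = [];
  return new Writable({
    objectMode: true,
    write(item, encoding, callback) {
      pending.push(item);
      if (pending.length < size) return callback();
      const batch = pending;
      pending = [];
      onBatch(batch, callback);
    },
    // Called once the source has ended and every write has completed,
    // so the leftover partial batch goes out before 'finish'.
    final(callback) {
      if (pending.length === 0) return callback();
      onBatch(pending, callback);
    }
  });
}

const batches = [];
const batcher = createBatcher(2, (batch, done) => {
  batches.push(batch);
  done();
});

// A plain object-mode source, piped in with no cork/uncork calls.
const source = new Readable({ objectMode: true, read() {} });
[1, 2, 3, 4, 5].forEach((n) => source.push(n));
source.push(null);

batcher.on('finish', () => console.log(JSON.stringify(batches)));
source.pipe(batcher);
```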

sonewman · 2015-02-13T21:16:19Z

Hmm, I don't know. I can't see how finish would be emitted asynchronously in a transform stream, but perhaps I am missing something. I guess it could break any modules which rely on a transform emitting finish before it flushes...

chrisdickinson · 2015-03-16T19:06:22Z

Noting this here: it seems like there's interest in flush for the purposes of net.Socket cleanup in core (per @indutny in the io.js IRC channel.)

indutny · 2015-03-16T19:17:20Z

I will expand my thoughts:

We have a problem with TLS streams in core at the moment. Current implementation calls socket.shutdown() on finish event, and in many cases does not wait for the completion before destroying the socket. This thing is totally fine, unless you are doing TLS, because TLS sends additional packet on shutdown() and prematurely closing the socket will lead to errors, or non-graceful destruction of connection.

_flush would help there a lot.

indutny · 2015-03-16T19:19:54Z

Yeah, and my use case needs callback.

chrisdickinson · 2015-03-16T20:36:53Z

@indutny Curious: if there's a write going out during flush, would you expect flush to be called again once that write has completed?

indutny · 2015-03-16T20:41:39Z

@chrisdickinson nope. In case of TLS, this write is happening internally in C++.

indutny · 2015-03-16T20:45:02Z

Just in case, I'm going to stub out some implementation and share it with you guys.

indutny · 2015-03-16T20:56:37Z

See nodejs/node#1164

indutny · 2015-03-16T20:56:53Z

So far I haven't reunified it with transform, so it is called __flush as suggested here.

indutny · 2015-03-17T01:21:50Z

Finished the PR there, PTAL.

mcollina · 2017-05-19T09:38:50Z

This is currently being proposed in nodejs/node#12828.

calvinmetcalf · 2017-05-24T19:16:21Z

merged into node as final
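For reference, a minimal sketch of the ordering discussed throughout this thread, using the `final` option as merged. It assumes Node.js 8 or later; the `events` array is only there to record the order.

```javascript
const { Writable } = require('stream');

// Illustrative log of the order in which things happen.
const events = [];

const writable = new Writable({
  write(chunk, encoding, callback) {
    events.push('write ' + chunk.toString());
    callback();
  },
  // Asynchronous work here delays 'finish' until the callback is invoked.
  final(callback) {
    setTimeout(() => {
      events.push('final done');
      callback();
    }, 100);
  }
});

writable.on('finish', () => {
  events.push('finish');
  console.log(events.join('\n'));
});

writable.write('hello');
writable.end();
```

Here 'finish' is recorded only after the timeout inside `final` has completed.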

This was referenced Feb 12, 2015

Implement stream.Writable._flush nodejs/node-v0.x-archive#7631

Closed

[stream] Standardize a way for writable streams to do work after all data has been consumed. nodejs/node#821

Closed

markstos mentioned this issue Feb 13, 2015

Standardize a way for writable streams to do work after all data has been consumed nodejs/node-v0.x-archive#7348

Closed

calvinmetcalf mentioned this issue Feb 24, 2015

io.js readable-streams WG meeting #106

Closed

chrisdickinson added the wg-agenda label Mar 3, 2015

chrisdickinson mentioned this issue Mar 3, 2015

io.js readable-streams WG meeting #2 #120

Closed

calvinmetcalf mentioned this issue Aug 6, 2015

stream: add _end method to write streams (equivilent to _flush) nodejs/node#2314

Closed

mcollina added enhancement and removed wg-agenda labels May 19, 2017

calvinmetcalf closed this as completed May 24, 2017
