A binary data "streams+" API & implementations via data producers, data consumers, and pull flow.
The name? BLOB — Matteo Collina.
Bytes Over Buffers — Thomas Watson
This is a Node.js strategic initiative aiming to improve Node.js streaming data interfaces, both internally within Node.js core and, hopefully, as future public APIs.
The following modules contain usable components (sources, sinks, or transforms) and are published to npm.
- The status codes enum: bob-status (npm)
- A file system source: fs-source (npm)
- A file system sink: fs-sink (npm)
- A zlib transform: zlib-transform (npm)
- A crc32 transform: crc-transform (npm)
- Header for the C++ api: bob-base (npm)
The following modules are not published but are 'functional'.
- A TCP socket "duplex": in "socket"
- A TCP server of "duplex" sockets: also in "socket"
The following files serve as the API's reference:
- The Status Enum - Status codes
- A Source - The data provider
- A Sink - The data consumer
- A Passthrough - A good example of the whole API
- A Verify Passthrough - A typechecking API enforcement passthrough
- A Buffered Transform - An example of buffering
- bob.h - The C++ header in 'bob-base'
The composition of the classes looks like this:
const { Stream } = require('bob-streams')

// Source, Transform, and Sink stand in for any components that implement the API.
const source = new Source(/* args */)
const xform = new Transform(/* args */)
const sink = new Sink(/* args */)

const stream = new Stream(source, xform, sink)
stream.start(error => {
  // The stream is finished when this is called.
})
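To make the wiring concrete, here is a minimal, self-contained sketch of what a helper like `Stream` might do. It is not the helper shipped in `bob-streams`: `MiniStream`, `ArraySource`, `CollectSink`, the sink's `start(callback)` method, and the local `Status` values are all assumptions made for illustration. Only the `bindSource` / `bindSink` / `next` / `pull` shapes are taken from this document.

```javascript
'use strict'

// Assumed status values for illustration; the real enum lives in bob-status.
const Status = { error: -1, end: 0, continue: 1 }

// Hypothetical in-memory source: hands out one chunk per pull.
// For brevity it assumes every chunk fits into the consumer's buffer.
class ArraySource {
  constructor (chunks) {
    this.chunks = chunks.slice()
    this.sink = null
  }

  bindSink (sink) {
    this.sink = sink
  }

  pull (error, buffer) {
    if (error) return this.sink.next(Status.error, error, buffer, 0)
    if (this.chunks.length === 0) return this.sink.next(Status.end, null, buffer, 0)
    // Copy the next chunk into the buffer supplied by the consumer.
    const bytes = this.chunks.shift().copy(buffer, 0)
    this.sink.next(Status.continue, null, buffer, bytes)
  }
}

// Hypothetical sink: owns the buffer and collects a copy of each chunk.
class CollectSink {
  constructor () {
    this.source = null
    this.buffer = Buffer.alloc(64)
    this.received = []
    this.callback = null
  }

  bindSource (source) {
    source.bindSink(this)
    this.source = source
    return this
  }

  start (callback) {
    this.callback = callback
    this.source.pull(null, this.buffer)
  }

  next (status, error, buffer, bytes) {
    if (status === Status.error) return this.callback(error)
    if (status === Status.end) return this.callback(null)
    this.received.push(Buffer.from(buffer.subarray(0, bytes)))
    this.source.pull(null, this.buffer)
  }
}

// Hypothetical stand-in for the Stream helper: binds each component to the
// one before it, then starts the last one (the sink).
class MiniStream {
  constructor (...components) {
    this.sink = components.reduce((upstream, component) => {
      component.bindSource(upstream)
      return component
    })
  }

  start (callback) {
    this.sink.start(callback)
  }
}

const sink = new CollectSink()
const source = new ArraySource([Buffer.from('hello '), Buffer.from('world')])
const stream = new MiniStream(source, sink)

let finished = false
stream.start(error => {
  if (error) throw error
  finished = true
  console.log(Buffer.concat(sink.received).toString()) // prints "hello world"
})
```

Because every component only knows its direct neighbours, transforms can be placed between the source and the sink without changing either end.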
An entire passthrough could look like this:
class PassThrough {
  bindSource (source) {
    source.bindSink(this)
    this.source = source
    return this
  }

  bindSink (sink) {
    this.sink = sink
  }

  next (status, error, buffer, bytes) {
    this.sink.next(status, error, buffer, bytes)
  }

  pull (error, buffer) {
    this.source.pull(error, buffer)
  }
}
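The same four methods are enough to build a transform that observes data on its way through. The sketch below is a hypothetical `ByteCounter` driven by stub endpoints; the stubs and the local `Status` values are assumptions for illustration and are not part of the published modules.

```javascript
'use strict'

// Assumed status values for illustration; the real enum lives in bob-status.
const Status = { error: -1, end: 0, continue: 1 }

// Hypothetical passthrough variant: forwards every call and tallies bytes.
class ByteCounter {
  constructor () {
    this.source = null
    this.sink = null
    this.total = 0
  }

  bindSource (source) {
    source.bindSink(this)
    this.source = source
    return this
  }

  bindSink (sink) {
    this.sink = sink
  }

  next (status, error, buffer, bytes) {
    if (status === Status.continue) this.total += bytes
    this.sink.next(status, error, buffer, bytes)
  }

  pull (error, buffer) {
    this.source.pull(error, buffer)
  }
}

// Stub source: answers the first pull with 3 bytes, the second with end.
const stubSource = {
  sink: null,
  sent: false,
  bindSink (sink) { this.sink = sink },
  pull (error, buffer) {
    if (this.sent) return this.sink.next(Status.end, null, buffer, 0)
    this.sent = true
    const bytes = buffer.write('abc')
    this.sink.next(Status.continue, null, buffer, bytes)
  }
}

// Stub sink: records each status it sees and keeps pulling until the end.
const seen = []
const stubSink = {
  source: null,
  buffer: Buffer.alloc(16),
  bindSource (source) {
    source.bindSink(this)
    this.source = source
    return this
  },
  next (status, error, buffer, bytes) {
    seen.push(status)
    if (status === Status.continue) this.source.pull(null, this.buffer)
  }
}

const counter = new ByteCounter()
counter.bindSource(stubSource)
stubSink.bindSource(counter)
stubSink.source.pull(null, stubSink.buffer)

console.log(counter.total, seen) // 3 [ 1, 0 ]
```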
The following files serve as API extension references:
- extension-stop - Tell a source to stop.
  - Useful for dealing with timeouts on network APIs.
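As a rough illustration of the idea, the sketch below gives a source a `stop()` method and has the consumer call it once it has waited long enough. The method name, its semantics (answer the next pull with an end status), and the local `Status` values are assumptions for this example; the extension-stop reference file is the authority on the actual shape.

```javascript
'use strict'

// Assumed status values for illustration; the real enum lives in bob-status.
const Status = { error: -1, end: 0, continue: 1 }

// Hypothetical source that would produce data forever unless stopped.
class TickSource {
  constructor () {
    this.sink = null
    this.stopped = false
  }

  bindSink (sink) {
    this.sink = sink
  }

  // Assumed extension method: ask the source to wind down.
  stop () {
    this.stopped = true
  }

  pull (error, buffer) {
    if (error) return this.sink.next(Status.error, error, buffer, 0)
    if (this.stopped) return this.sink.next(Status.end, null, buffer, 0)
    const bytes = buffer.write('tick')
    this.sink.next(Status.continue, null, buffer, bytes)
  }
}

const log = []
const sink = {
  source: null,
  buffer: Buffer.alloc(8),
  bindSource (source) {
    source.bindSink(this)
    this.source = source
    return this
  },
  next (status, error, buffer, bytes) {
    if (status === Status.error) return log.push('error')
    if (status === Status.end) return log.push('end')
    log.push(buffer.toString('utf8', 0, bytes))
    // Stand-in for a timeout firing: give up after three chunks.
    if (log.length === 3) this.source.stop()
    this.source.pull(null, this.buffer)
  }
}

sink.bindSource(new TickSource())
sink.source.pull(null, sink.buffer)
console.log(log) // [ 'tick', 'tick', 'tick', 'end' ]
```

In a real network sink the call to `stop()` would more likely come from a timer than from a chunk count.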
High-level timeline:
- Prototype separate from core entirely.
- Move into nodejs org once JS & C++ APIs are significantly prototyped.
- Begin transitioning Node.js internals once the APIs and perf are proved.
- If an internal transition works out well, begin planning public APIs.
All of these steps necessitate the buy-in of many stakeholders, both in Node.js core and the greater Node.js ecosystem. This is a long-term project by necessity and design.
Some collective goals for this initiative:
- Both performance and ease-of-use are key.
- Implementable in a performant and usable way for both JS and C++.
- Browser portability is preferable.
As a preface, "protocol" refers to a system with "producer / source" and "consumer / sink" endpoints.
The Protocol itself must be simple:
- Pull-based: The consumer requests ("pulls") data from the producer.
- Binary-only: Data is binary buffers only; "object mode" and string encodings are not supported at the protocol level.
- Stateless: The protocol must not require state to be maintained out-of-band.
  - Non-normative: While the protocol itself does not require out-of-band state, actual operations almost always do.
  - Minimize state assumed between calls.
- One-to-one: The protocol assumes a one-to-one relationship between producer and consumer.
- Timing agnostic: The protocol makes no timing (sync or async) assumptions.
- No buffering: The protocol must not require buffering (although specific implementations might).
  - Non-normative: While the protocol itself does not require buffering, starting sources almost always do (including transforms).
- In-line errors and EOF: Errors, data, and EOF ("end") should flow through the same call path.
A consumer:
- Should make no assumptions about when data will be received (sync or async).
- Should own any preallocated memory (the buffer).
- Must never make more than one data request upstream at the same time.
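The last three rules (no timing assumptions, owning the buffer, and one request at a time) apply to the consuming side, and they combine naturally with the in-line handling of errors and EOF. The sketch below shows one way a sink might follow them. `GuardedSink`, `FlakySource`, and the local `Status` values are hypothetical names chosen for this example, not part of any published module.

```javascript
'use strict'

// Assumed status values for illustration; the real enum lives in bob-status.
const Status = { error: -1, end: 0, continue: 1 }

// Hypothetical consumer that follows the rules listed above.
class GuardedSink {
  constructor (size) {
    this.source = null
    this.buffer = Buffer.alloc(size) // the sink owns the preallocated buffer
    this.pulling = false // true while a request is in flight
    this.events = []
  }

  bindSource (source) {
    source.bindSink(this)
    this.source = source
    return this
  }

  pull () {
    if (this.pulling) throw new Error('a pull is already in flight')
    this.pulling = true
    this.source.pull(null, this.buffer)
  }

  // Single entry point for data, end, and errors. It may be invoked
  // synchronously inside pull() or at any later time.
  next (status, error, buffer, bytes) {
    this.pulling = false
    if (status === Status.error) return this.events.push('error:' + error.message)
    if (status === Status.end) return this.events.push('end')
    this.events.push('data:' + buffer.toString('utf8', 0, bytes))
    this.pull()
  }
}

// Hypothetical source that produces one chunk and then fails.
class FlakySource {
  constructor () {
    this.sink = null
    this.calls = 0
  }

  bindSink (sink) {
    this.sink = sink
  }

  pull (error, buffer) {
    this.calls++
    if (this.calls === 1) {
      const bytes = buffer.write('ok')
      return this.sink.next(Status.continue, null, buffer, bytes)
    }
    this.sink.next(Status.error, new Error('read failed'), buffer, 0)
  }
}

const sink = new GuardedSink(8).bindSource(new FlakySource())
sink.pull()
console.log(sink.events) // [ 'data:ok', 'error:read failed' ]

// A source that never answers leaves the first request in flight,
// so a second pull is refused.
const silent = new GuardedSink(8).bindSource({ bindSink () {}, pull () {} })
silent.pull()
let rejected = false
try {
  silent.pull()
} catch (err) {
  rejected = true
}
console.log(rejected) // true
```

Setting the in-flight flag before calling upstream matters: a synchronous source calls `next()` before `pull()` returns, and the flag must already be set for the reset inside `next()` to leave the sink in a consistent state.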
Please see performance.md for profiling results & information.
Current results estimate a 30% decrease in CPU time in bad cases and up to an 8x decrease in good cases. This should correlate with overall throughput, but the correspondence may not be exact.
API reference examples sit in the top-level directory and are prefixed by `reference-`.
These are functional and tested when practical, notably `reference-verify`, `reference-passthrough`, and `verify-buffered-transform`.
Other helpers, such as `Stream()`, reside in the `/helpers/` and `/tests/helpers` directories.
All useful and usable components in this repo are exported from `index.js` with the `bob-streams` npm module.
Functional sources, sinks, and so on can be found in their own npm modules. See [Published Modules](#published-modules).
npm install && npm test
The addons are presently very out-of-date.
You must have a local install of Node master @ ~ 694ac6de5ba2591c8d3d56017b2423bd3e39f769
npm i node-gyp
node-gyp rebuild --nodedir=your/local/node/dir -C ./addons/passthrough
node-gyp rebuild --nodedir=your/local/node/dir -C ./addons/fs-sink
node-gyp rebuild --nodedir=your/local/node/dir -C ./addons/fs-source