async / stream support #195
Conversation
Optimize the call to file(name) : instead of using filter (in O(n)), fetch the entry directly (in O(1)).
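A minimal sketch of the change this commit describes, assuming (as JSZip does internally) that entries live in a plain object keyed by filename. The function names and data are illustrative, not JSZip's actual code:

```javascript
// Entries stored in a plain object keyed by name.
var files = {
    "hello.txt": { name: "hello.txt", dir: false },
    "img/logo.png": { name: "img/logo.png", dir: false }
};

// Before: O(n) -- scan every entry looking for a matching name.
function fileByFilter(name) {
    var matches = Object.keys(files)
        .map(function (key) { return files[key]; })
        .filter(function (entry) { return entry.name === name; });
    return matches.length ? matches[0] : null;
}

// After: O(1) -- fetch the entry directly by its key.
function fileByKey(name) {
    return files[name] || null;
}
```

Both functions return the same entry; the second just skips the linear scan.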
The index went too far, causing the optimizer to drop the compiled code.
Because of the try/catch, the JIT compiler can't do its magic. This patch moves some of the hot code into separate functions : those functions can be compiled, the try/catch remains in the main loop.
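The pattern this commit describes can be sketched as follows. The function names and the per-chunk work are invented for the example; only the placement of the try/catch matters:

```javascript
// Hot path: no try/catch, so the JIT of the time could optimize it.
function processChunk(chunk) {
    var sum = 0;
    for (var i = 0; i < chunk.length; i++) {
        sum += chunk[i];
    }
    return sum;
}

// Main loop: keeps the try/catch, stays unoptimized, but is called rarely.
function processAll(chunks) {
    var total = 0;
    for (var i = 0; i < chunks.length; i++) {
        try {
            total += processChunk(chunks[i]);
        } catch (e) {
            // in JSZip this is where the error would be forwarded upward
            throw e;
        }
    }
    return total;
}
```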
When loading a zip file with unicode path/comment, we were combining two costly operations : read data as (binary) string and then convert the binary string to a real string. This patch skips the first string conversion.
The previous code used a closure to keep a reference to the compressed content and pointers to the beginning/end of the file in this content. Then we had some complicated code to extract the content and handle the corner cases. This new version adds two new ("private" with a _) methods to the ZipObject instances : _compress and _decompress. A ZipObject instance will have two states : compressed (with a CompressedObject) and decompressed (with a string, uint8array, etc). With this patch, no more closure : for each file we create a sub{string,array} for the data and then create a CompressedObject.
The transformation uint8array -> string (to check against the signature) can be costly if repeated for a lot of entries. This commit uses the reader to check the signature, speeding things with a Uint8ArrayReader.
The separation between the generation of the CompressedObjects and the actual use of them will help for the next commit, adding generateAsync().
The "content is empty" check can be done at the insertion of the content and not when compressing it.
This commit addresses the timeout issue. The current API is synchronous : if JSZip takes too much time to finish its task, the page crashes (it freezes during the task anyway). This commit does the following :
- rewrite the code into workers which can be asynchronous
- add the needed public methods
- add nodejs stream support
- break the compatibility with existing code

Workers
-------
A worker is like a nodejs stream but with some differences. On the good side :
- it works on IE 6-9 without any issue / polyfill
- it weighs less than the full dependencies bundled with browserify
- it forwards errors (no need to declare an error handler EVERYWHERE)

On the bad side : to get sync AND async methods on the public API without duplicating a lot of code, this class has an `isSync` attribute and some if/then to choose between doing stuff now, or using an async callback. It is dangerously close to releasing Zalgo (see http://blog.izs.me/post/59142742143/designing-apis-for-asynchrony for more).

A chunk is an object with 2 attributes : `meta` and `data`. The former is an object containing anything (`percent` for example), see each worker for more details. The latter is the real data (String, Uint8Array, etc).

Public API
----------
Each method generating data (generate, asText, etc) gains a stream sibling : generateStream, asTextStream, etc. This will need a solid discussion because I'm not really satisfied with this.

Nodejs stream support
---------------------
With this commit, `file(name, data)` accepts a nodejs stream as data. It also adds an `asNodejsStream()` on the StreamHelper.

Breaking changes
----------------
The undocumented JSZip.compressions object changed : the object now returns workers to do the job, the previous methods are not used anymore. Not broken yet, but the `checkCRC32` option (when loading a zip file, it synchronously checks the crc32 of every file) will need to be replaced.
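For illustration, a chunk as described in this commit message could look like the following (the values and the `currentFile` key are made up for the example):

```javascript
// A chunk carries free-form metadata plus the actual payload.
var chunk = {
    meta: { percent: 12.5, currentFile: "hello.txt" }, // anything, per worker
    data: "some compressed bytes"                      // String, Uint8Array, etc
};
```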
This commit updates the saucelabs dependencies to the latest and updates the configuration :
- wait up to 10 minutes, IE 6 is slow
- launch max 3 parallel jobs
- add IE6, IE11, safari 7, safari 8
Grunt-browserify added a `banner` option in v3.2.0, use it. Also ignore the lib/nodejs/* files : they only make sense with streams.
When paused, a nodejs stream CAN and WILL emit "end" events.
Keeping the sync methods will be too much trouble, I'll remove them. That would give:

zip.file("content.txt").async("uint8array", function onComplete(){...});

zip.file("content.txt").async({
    type: "uint8array",
    onUpdate: function () {...} // allow progress update notifications
}, function onComplete(){...});

zip.file("content.txt").stream("uint8array").on("data", ...); // our simple stream
zip.file("content.txt").nodeStream("uint8array").on("data", ...); // nodejs stream

zip.generateAsync({...}, function onComplete(){...});
zip.generateStream({...}).on("data", ...);
zip.generateNodeStream({...}).on("data", ...);

The old methods will throw an exception like "This method has been removed in JSZip 3.0, please check the upgrade guide.". For the async operations, ES6 Promises look like the right choice (instead of manually adding a callback) but that means adding more bytes in the build. @Stuk Does this sound reasonable ?
Is this going to be part of the stable release some time? (Or any other async (webworker) support?)
I'm still working on it... When I have enough time. I may have some time this weekend to continue working on the async
Let me know if I can help, I could actually make it work in a webworker really easily by just adding some wrapper methods on top of jszip.min.js like:

var jsZip; // keep the instance between messages
self.onmessage = function (message) {
    var messageData = message.data;
    var command = messageData.command;
    var args = messageData.args;
    var requestId = messageData.requestId;
    switch (command) {
        case "load":
            jsZip = new JSZip();
            jsZip.load(args[0]);
            self.postMessage({
                requestId: requestId
            });
            break;
        case "parse":
            // ... etc
    }
};

This way I didn't need to touch the actual JSZip code at all, it just allows calling its functions in a webworker.
This commit removes most of the sync methods (everything but the load method).

The new getters are :
- zip.async(type) : es6 promises
- zip.stream(type) : nodejs Stream3
- zip.internalStream(type) : our internal stream (useful to create wrappers)

The new generate methods are :
- zip.generateAsync() : es6 promises
- zip.generateStream() : nodejs Stream3
- zip.generateInternalStream() : our internal stream

With the "stream" keyword, everyone will think of the nodejs stream : it's safer to use "internalStream" for our own. The old sync versions will throw an exception.

Also fix the stream version we use : we use Stream3, not the stream offered by the current node runtime.

Also changed in generate : prefer "binarystring" over "string". That way, JSZip#generate will be consistent with ZipObject#async. "string" is still supported because a zip as text doesn't make sense.
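As an illustration of why the internal stream is "useful to create wrappers", the sketch below accumulates a minimal, stand-in event stream into an es6 Promise, which is roughly the relationship between the internal stream and the async getter. `MiniStream` and `accumulate` are invented for this example, not JSZip's real classes:

```javascript
// A tiny stand-in for an internal stream: replays chunks through
// "data"/"end"/"error" handlers once resumed.
function MiniStream(chunks) {
    this.handlers = { data: [], end: [], error: [] };
    this.chunks = chunks;
}
MiniStream.prototype.on = function (evt, fn) {
    this.handlers[evt].push(fn);
    return this;
};
MiniStream.prototype.resume = function () {
    var self = this;
    setImmediate(function () {
        self.chunks.forEach(function (c) {
            self.handlers.data.forEach(function (fn) { fn(c); });
        });
        self.handlers.end.forEach(function (fn) { fn(); });
    });
};

// Wrapper: drain the stream into a single string, exposed as a Promise.
function accumulate(stream) {
    return new Promise(function (resolve, reject) {
        var parts = [];
        stream
            .on("data", function (c) { parts.push(c); })
            .on("error", reject)
            .on("end", function () { resolve(parts.join("")); });
        stream.resume();
    });
}
```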
This commit removes the sync "load()" method. It has been replaced by "loadAsync()", which returns a Promise.
I've updated the pull request with async methods and I've updated the description. I've updated my gh-pages with the latest API documentation. @andrewvarga doesn't
Looks great, async is very necessary for large files.
JSZip 2.5.0
setTimeout has a 4ms minimum delay while on a recent browser a chunk takes ~0.2ms of processing. Removing these 4ms makes a huge difference. Main downside: the polyfill globally declares setImmediate.
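A sketch of the scheduling change, with invented names: each chunk is handed to the next event-loop turn via setImmediate instead of setTimeout(fn, 0), avoiding the clamped ~4ms per chunk (node has setImmediate natively; browsers need the polyfill mentioned above):

```javascript
// Process chunks one per event-loop turn, without the setTimeout clamp.
function processChunksAsync(chunks, onChunk, onEnd) {
    var i = 0;
    function step() {
        if (i >= chunks.length) {
            onEnd();
            return;
        }
        onChunk(chunks[i++]);
        setImmediate(step); // was: setTimeout(step, 0) -- ~4ms per chunk
    }
    setImmediate(step);
}
```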
Up-to-date with v2.5.0 (I moved the branch
Browsers are also adding streams, see [1], [2], [3]. While the spirit is the same, the API is different from nodejs streams. To avoid future confusion, I prefer renaming `generateStream` to `generateNodeStream` and `stream` to `nodeStream` (we already have `nodebuffer` as type).
[1]: https://streams.spec.whatwg.org/
[2]: https://blog.wanderview.com/blog/2015/06/19/intent-to-implement-streams-in-firefox/
[3]: https://www.chromestatus.com/features/5804334163951616
When I migrated the tests from `asText()` to `async("string")`, I typed a lot of `async("text")`. I strongly suspect that other users will have the issue too while migrating.
I've prefixed stream methods with "node": browsers are also adding streams, see 1, 2, 3. While the spirit is the same, the API is different from nodejs streams. To avoid future confusion, I prefer renaming
Should `ZipFileWorker` be put in its own file?
Yes.
Done.
Split the generate.js file into generate/index.js and generate/ZipFileWorker.js.
The error event wasn't sent upward, the parent workers were left paused (or running). Also make ".pause()" and ".resume()" return a boolean, extending the default behavior is easier with it.
The module `asap` doesn't update the global scope. The only downside: we can't cancel the async call anymore. Instead, we now let the async call happen and return early if the stream is paused or finished.
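A sketch of this "let the call happen, return early" pattern, with invented names: once the callback is scheduled through asap/setImmediate it can no longer be cancelled, so the callback re-checks the worker's state before doing any work:

```javascript
// A worker that drains a queue one item per async tick.
function QueueWorker(items) {
    this.paused = false;
    this.finished = false;
    this.processed = [];
    this.queue = items.slice();
}
QueueWorker.prototype.pause = function () {
    this.paused = true;
};
QueueWorker.prototype.resume = function () {
    this.paused = false;
    this._tick();
};
QueueWorker.prototype._tick = function () {
    var self = this;
    setImmediate(function () {
        // the scheduled call always fires: bail out if the state changed
        if (self.paused || self.finished) {
            return;
        }
        self.processed.push(self.queue.shift());
        if (self.queue.length) {
            self._tick();
        } else {
            self.finished = true;
        }
    });
};
```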
👍 is this ready to be merged then?
damned, this is amazing, seriously. v2.5 just failed and crashed after 1 minute or so on a few hundred files weighing between 10KB and 500KB each, and this new version just did the whole job in half a second! Unless there are some bugs (which I have not found in my few basic tests), it would make sense for it to become the new official version. Small info: grunt (which I had to use to produce the dist v3.0) is somehow outdated with its uglify dependency and throws an error. Updating uglify to the latest version simply fixed the problem.
This pull request replaces #141.
The current API is synchronous : if JSZip takes too much time to finish its task, the page crashes (it freezes during the task anyway). This commit does the following :
- rewrite the code into workers which can be asynchronous
- add the needed public methods
- add nodejs stream support
- break the compatibility with existing code
Note : I created a branch jszip_v3 which I target : a change this big can't go into the current v2.
Workers

A worker is like a nodejs stream but with some differences. On the good side :
- it works on IE 6-9 without any issue / polyfill
- it weighs less than the full dependencies bundled with browserify
- it forwards errors (no need to declare an error handler everywhere)

On the bad side :
- it adds new code to maintain, fix, optimize

A chunk is an object with 2 attributes : `meta` and `data`. The former is an object containing anything (`percent` for example), see each worker for more details. The latter is the real data (String, Uint8Array, etc).
Public API

I've updated my gh-pages with the latest API documentation.

Getters are replaced by `nodeStream(type)` and `async(type)`, `generate()` is replaced by `generateNodeStream` and `generateAsync`. The "async" methods return a Promise, the "stream" methods return a Stream3 (from nodejs) if available. They have an update callback to get access to metadata like the name of the current file being compressed or the operation progress (with `percent`).

With this pull request, the previous sync methods now throw exceptions.
Nodejs stream support

`file(name, data)` now accepts a nodejs stream as data. The "nodeStream" methods generate streams.

Breaking changes

The undocumented JSZip.compressions object changed : the object now returns workers to do the job, the previous methods are not used anymore.
A lot of sync methods have been removed.