-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Native zip support #45434
Comments
You should probably also enumerate what zip file features you think are in or out of scope. To name a few: large archive support, encryption, bzip2/lzma compression, etc. |
I'd tend to see the following: Filesystem access (👎 out of scope)Node already has primitives allowing to access the filesystem, so the native archive implementation shouldn't open/write files directly. It should only accept and output raw buffers, leaving it up to the user to retrieve them (usually with A post-mvp follow-up might be to make it work with Arbitrary compression (👍 in scope)Generating data would require to provide the API with pre-compressed buffers (and algorithm), whereas reading data would yield the compressed data - that users would then uncompress themselves. This would also make it possible for users to generate stable archives (same input => same output), which is more difficult if the compression is left to Node. Would it be worthwhile to have a fast path for deflate, which is the most common compression out there? Stat / External attributes (👍 in scope)The API should offer a way to set the files' stat, including external attributes (required for symlink support). Encryption (🔮 post-mvp scope)It should be technically fairly easy to do via libzip. Main consideration would be how basic we want the first iteration to be (if we want to start small with only the features that 95% of consumers would use, then encryption might not be needed). Large Archives (🔮 post-mvp scope)Zip is limited to 32 bits in a couple of places, but libzip supports Zip 64 which doesn't have those problems. Same thing as the previous point, I'd tend to see it as something that we could expose, but that I'd see as a second iteration (although unlike encryption it probably doesn't impact the public API much). Splitting a single archive into multiple files ( |
I just discovered this issue. Here’s what I’m currently doing; I’d love to be able to do this with import { readFile } from 'node:fs/promises'
import jszip from 'jszip'
const code = await readFile('./index.js', 'utf8')
const zip = new jszip()
zip.file('index.js', code, { unixPermissions: 755 })
const zipBuffer = await zip.generateAsync({
type: 'nodebuffer',
compression: 'DEFLATE',
platform: 'UNIX',
}) |
Another approach is by spawning a native shell tool from Linux. In this way, we do not have to install any third-party js dependencies. first, install the zip tool (e.g. debian), then, in the code, simply call https://packages.debian.org/bullseye/zip |
that will only work for linux/unix , and only assuming the tool is installed - which is not the case for many ditros |
I started a prototype implementation over at #45651 |
-1 I know this is an unpopular opinion, but I'm not convinced this should live in node core. Sure it's a common file format, but then again so are tar, rar, 7z, zstd, xz, and others, which also don't belong in node core. I feel like adding such modules to node core is further leading to feature creep. I get that other platforms like PHP and such may have zip modules, but they also include a ton of other modules that make them "kitchen sink" platforms, which I would hate to see node.js become. |
I understand the general concern, and I wouldn't want to increase the Node.js surface if the value wasn't well identified. But while some modules can happen in userland, I think archive support (whether it's zip or something else) is important:
To summarize, I think an archive manipulation library would have a reasonable amount of practical use cases, for a comparatively low cost. It may not be a feature commonly used by end-users, but I expect tooling authors to benefit from it in ways that will still benefit end-users.
I don't see it as a slippery slope situation - this feature request is about providing efficient support for a use case that cannot be implemented efficiently today. Once this use case is fulfilled, regardless which format is picked, you will then be able to point out to future requesters that Node.js already supports an archive format well - which isn't the case today. As for why zip, I went with it for a couple of reasons:
Other formats may have better compression, but since the goal of this work is first and foremost to address use cases, I went with the format that offered a good compromise between popularity and feature set. |
I'm pretty sure we've heard this argument for just about any proposed feature. In this issue alone, the same thing can be said about tar, xz, etc., which is the slippery slope that @mscdex mentioned.
This can, again, be said about everything that's best implemented as a native C++ addon right now. Native addons are a burden, and WebAssembly can be a meaningful alternative, but we can't add everything to Node.js that would otherwise require a native addon.
I might be mistaken but I generally consider Linux-like operating systems the primary domain of Node.js, maybe with the exception of Electron apps. And isn't zip terrible at preserving unix file attributes, for example? I don't see how we could possibly reject requests for tar or other archive formats by pointing to zip. |
I'm not aware of other Node.js C++ addons that would pretend to be used in the critical path of loading an entire Node.js application. Please consider the arguments I exposed as a whole - I'm sure we can find reasons why any single one wouldn't be a good enough reason if alone, but once taken all together I find they make a lot more sense.
Symlinks can be represented just fine, mtime as well, and chmod equally. It doesn't support uid / gid / dev, but I never found that a limitation - at least not one I saw when I used it in my tools, which are used in production across all systems. And since you mention Electron apps, ASAR archives don't contain any of these metadata, with the exception of the +x bit. That seems to satisfy their needs just fine. To be clear, I'm not rejecting tar, or any other format. As I said, since the goal of this work is first and foremost to address use cases, I went with the format that offered a good compromise between popularity and feature set. Random access is a very important feature, as it's what supports the use cases I described. |
For me this feels like a missing feature that we should add. We have utilities for all sorts of compression and decompression algorithms in |
I believe the original reason for adding a zlib binding was for the purposes of compressing HTTP responses, not just for the sake of having common compression formats available. Now you might argue the relevancy of deflate for HTTP responses in modern day HTTP servers, but that was most likely why it was originally added (back in 2011). If zlib had some kind of built-in support for zip, that would be one thing and would mean we wouldn't need to pull in an additional dependency and such, but that's not the case today. |
I should add that there is minizip in the zlib source code, but I'm not entirely sure of the quality/reliability of that code (since it lives in the contrib/ dir) and/or if its feature set/performance would suffice for everyone interested in zip support. Additionally I'm not sure if the Chromium team would want to keep the minizip files up to date since I presume they're not actively using them? Otherwise we'd have to pull in minizip changes from the madler/zlib repo. |
And the reason for adding an archive binding is for the purpose of efficiently storing and transferring multiple files inside portable packages, not just for the sake of having common archive formats available 🙂
The zlib project only covers the deflate algorithm, not file archives or other compression algorithms1, so the minizip in their contrib folder has little to no maintenance; its very website suggests minizip-ng instead (or libzip, which is used in both Chrome and the PHP-Zip library). Footnotes
|
HTTP was a core feature of node from the beginning, storing and transferring multiple files inside a portable package was not.
For the record, I wasn't keen on adding brotli either, for similar reasons as zip. |
If the experimental SEA implementation moves forward to the point where Node.js internally needs support for any specific archive format, be it ASAR, tar, or even zip, that is still not the same as adding a public API for a specific archive format. If there is a requirement for user code to access archive members when packaged as some SEA, that also does not necessarily require a general-purpose API for said archive format. |
SEA-related experiments are a use case, not the use case. This issue is about the general need, not an Node.js-internal one. |
Specifically, it was added so npm could do its thing. (Hi, node's living memory here!) |
IMHO, the fact that archives (tar) or members of archives (zip) are commonly compressed does not mean that supporting compression is any justification for supporting archives. |
I think the average user probably doesn’t grasp the distinction between compressed data and a compressed archive. I was one such user who thought that We don’t have clear criteria (that I’m aware of) for what belongs in core versus not, and lately we’ve been relaxing a bit from the project’s earlier “small core” devotion. It’s obviously always debatable, but this does have several aspects that usually lean in favor of inclusion: it’s something that benefits from native compilation, it’s related to existing APIs, and it covers a use case commonly handled in scripts (where importing third-party dependencies is cumbersome). Yes Node scripts could call their platforms’ Ultimately it’s a subjective call whether it’s useful enough, and I think we usually settle decisions like those via votes since that’s a hard question to reach consensus on. |
I don't know if this really is a common misconception. I don't remember seeing many feature requests related to this. (
Scripts are an interesting use case. Again, I have no data to base this claim on, but I would assume that Node.js scripts much more often run on non-Windows platforms than on Windows. On non-Windows platforms, Consider, for example, the scripts that produce Node.js builds -- the Node.js 19.2.0 download directory contains 19 This does not seem in line with @arcanis's earlier argument:
|
FWIW Windows 10 got |
This comment was marked as off-topic.
This comment was marked as off-topic.
There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment. For more information on how the project manages feature requests, please consult the feature request management document. |
I still want to bring it to the finish line; last blocker so far is a memory leak reported by Asan which I found difficult to investigate. I'll make another attempt in the coming weeks. |
@arcanis I don't think the memory leak is the main issue. There simply is no consensus that this should be added. |
I think both pros and cons have been clearly stated, yes. As you say, consensus on yes/no hasn't been reached yet. My understanding is that in cases like this whether to merge it or not will be a TSC decision. Given that the major concerns you and other have raised are around scope and maintainability, I'd prefer to avoid theoretical back and forth, and have a working prototype showing exactly what's the envisioned scope and maintenance burden At the very least, this might prove I'm committed to maintain it myself on the long term, and be enough to address some concerns on this side. Worst case, I lose some of my time, but I believe this feature is worth giving a real try, and seeing the upvotes, I'm not alone 🙂 (In any case, my previous comment was mostly intended to avoid stale close, not to add new elements to the discussion) |
There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment. For more information on how the project manages feature requests, please consult the feature request management document. |
I would very much like to see this feature. |
There has been no activity on this feature request for 5 months. To help maintain relevant open issues, please add the
never-stale
|
🪄 unstale: The basic feature set outlined in the OP would cover my needs and would allow me to drop a native addon dependency. |
There has been no activity on this feature request for 5 months. To help maintain relevant open issues, please add the
never-stale
|
To remove stale... |
What is the problem this feature will solve?
This issue is to test the water a little bit.
Node supports zlib natively, which takes care of the compression / decompression aspect of sharing files, but it doesn’t include any primitive allowing to manipulate file archives of any kind. It can be implemented in userland (and in a project of mine we’re doing this by compiling the libzip to wasm), but it’s significantly slower (both in runtime, boot time, and size).
Being able to read file archives is especially useful when coupled with the ESM loaders, since it then becomes possible to implement loading modules straight from packaged vendors - a more performant alternative to bundled packages which doesn’t require to precompile the code and allows distributing it along with non-JS files.
What is the feature you are proposing to solve the problem?
I’d like to investigate whether it’d make sense to implement a basic archive manipulation library in Node.js. This isn’t unheard of, multiple interpreters having similar capabilities, sometimes even supporting multiple formats:
The API of this basic archive library would be to define, but generally we can assume some simple requirements:
I’d be interested to explore an implementation myself.
What alternatives have you considered?
No response
@nodejs/zlib
The text was updated successfully, but these errors were encountered: