Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zstd command-line tool should accept options from the environment #1423

Closed
thiagomacieira opened this issue Nov 21, 2018 · 9 comments
Closed

Comments

@thiagomacieira
Copy link

Like gzip with $GZIP and xz with $XZ_OPT . Quoting xz's manpage:

   XZ_OPT This is for passing options to xz when it  is  not  possible  to  set  the  options
          directly  on the xz command line.  This is the case e.g. when xz is run by a script
          or tool, e.g. GNU tar(1):

                 XZ_OPT=-2v tar caf foo.tar.xz foo

          Scripts may use XZ_OPT e.g. to set script-specific default compression options.  It
          is  still recommended to allow users to override XZ_OPT if that is reasonable, e.g.
          in sh(1) scripts one may use something like this:

                 XZ_OPT=${XZ_OPT-"-7e"}
                 export XZ_OPT
@Cyan4973
Copy link
Contributor

This is debatable.
The environment variables have been curtailed within gzip because they can be abused and become a security hazard. I don't know the details, but it's something like it becomes possible to make gzip erase files, replace files or change rights to files, all transparently, piggy-backing on user commands through environment variables.

Given that we have no strong scenario where environment variables are needed, but we already know this feature caused security risks in the past, we prefer to not go into this direction.

@thiagomacieira
Copy link
Author

thiagomacieira commented Nov 21, 2018

Understood, but this implies that GNU tar should not receive a short-hand switch to compress as .tar.zst, like -z for gzip, -j for .tar.bz2 and -J for .tar.xz. Compression should always be done with a pipeline or by using "-Izstd -19".

@Cyan4973
Copy link
Contributor

You mean, in order to pass a specific compression level since the short-hand will only use the default one ?
Fair point.

I believe we could consider supporting an environment variable limited to compression level only.
This would reduce opportunity to do strange things through combination of commands bundled into the environment variables.

@Cyan4973
Copy link
Contributor

Support for ZSTD_CLEVEL environment variable
has been added by @yijinfb in #1463 .

It makes it possible to override the default compression level.

@Cyan4973 Cyan4973 mentioned this issue Dec 27, 2018
@dkasak
Copy link

dkasak commented Apr 17, 2019

What about using a dictionary when decompressing a tar archive?

@Cyan4973
Copy link
Contributor

This is not recommended.

Dictionaries are effective when shared across a lot of small data.

A tar archive is like a single big stream of bytes. Dictionary efficiency will be underwhelming, since it will save a few KB, which is likely a tiny amount compared to the archive. On the other hand, the requirement to have the dictionary available at decompression time will be a deployment challenge.

@dllud
Copy link

dllud commented Oct 20, 2019

Hi @Cyan4973, thanks for the great work around here.

What about setting the number of parallel threads used for compression?
export XZ_OPT="-T 0" does the trick for xz.

I see you opted for environment variables that are limited to a specific argument. Could you add a new one for the number of threads, say ZSTD_THREADS?

BTW, ZSTD_CLEVEL should be documented on the zstd man page.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Oct 22, 2019

ZSTD_CLEVEL should be documented on the zstd man page.

You are right. It has been recently added to the man page (within dev branch), it should be present in next release.

What about setting the number of parallel threads used for compression?

That could be an interesting option.

Our concerns is that environment arguments can be used to silently alter program behavior, potentially leading to some form of denial, such as resource exhaustion.

When it comes to multithreading, the memory budget is basically multiplied by the number of threads. 2 threads is unlikely to be more dangerous than one, but what about 200 ?

I believe that if we can have a good answer to this risk, it could unlock this candidate feature.

@dllud
Copy link

dllud commented Oct 22, 2019

Good to know it is already documented and thanks for the detailed reply.

Unfortunately I am unable to provide a good answer to the "risk" you mention because I do not perceive it as a risk. It seems to me that you are trying to protect the user from himself. If something else but the user is setting such environment variable, then he probably has much greater concerns other than resource exhaustion.

Silently altering zstd behaviour is precisely what I would like to have, in order to be able to use it comfortably from tar, as can be done with other compression programs. As is, I have to use -Izstd -T0 or -Izstdmt, similar to what @thiagomacieira mentions. Otherwise I would be wasting resources everytime I use it on my machines.

zstd is a command line tool, meaning it is mostly used by professionals or enthusiasts. Like many other professional tools, it can do bad things, but only if you use it improperly. That's why we have manuals. Limiting the flexibility of a professional tool in order to make it just a little less dangerous to the novice (that doesn't care to read the manual) is a tradeoff that I am not fond of.
(Yes, zstd will certainly be used by novice users through some sort of GUI. In that event it is up to the GUI programmer, whom is a professional or enthusiast, to make the right decisions.)

Summing up, IMHO it would suffice to add a warning to the manual stating that multi-threading will lead to memory exhaustion.

Anyway, as an half-baked measure to do damage control to the unwary user, instead of ZSTD_THREADS you could have a boolean variable, ZSTD_MT or ZSTD_MULTITHREADED, that adds -T 0 when set to true. Most machines tend to have a good balance between available CPU cores and installed memory. Seems safe to assume that -T 0 will leave most machines way bellow the swap line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants