-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security] CVE-2007-4559: tarfile: Add absolute_path option to tarfile, disabled by default #73974
Comments
I noticed that "python3 -m tarfile -x archive.tar" uses absolute paths by default, whereas the UNIX tar command doesn't by default. The UNIX tar command requires to add explicitly --absolute-paths (-P) option. I suggest to add a boolean absolute_path option to tarfile, disabled by default. Example to create such archive. See that tar also removes "/" by default and requires to pass explicitly -P: $ cd $HOME
# /home/haypo
$ echo TEST > test
$ tar -cf test.tar /home/haypo/test
tar: Removing leading `/' from member names
$ rm -f test.tar
$ tar -P -cf test.tar /home/haypo/test
$ rm -f test Extracting such archive using tar is safe *by default*: $ mkdir z
$ cd z
$ tar -xf ~/test.tar
tar: Removing leading `/' from member names
$ find
.
./home
./home/haypo
./home/haypo/test Extracting such archive using Python is unsafe: $ python3 -m tarfile -e ~/test.tar
$ cat ~/test
TEST
$ pwd
/home/haypo/z Python creates files outside the current directory which is unsafe, wheras tar doesn't. |
The CLI was added in bpo-13477. I didn’t see any discussion of traversal attacks there, so maybe it was overlooked. Perhaps there should also be a warning, like with the Tarfile.extract and “extractall” methods. However I did see one of the goals was to keep the CLI simple, which I agree with. I would suggest that the CLI get this proposed behaviour by default (matching the default behaviour of modern “tar” commands), with no option to restore the current less-robust behaviour. To implement it, I suggest to fix the module internals first: bpo-21109 and/or bpo-17102. FWIW BSD calls the option “--absolute-paths” (plural paths) <https://www.freebsd.org/cgi/man.cgi?tar%281%29#OPTIONS\>, while Gnu calls it “--absolute-names” <https://www.gnu.org/software/tar/manual/html_chapter/tar_6.html#SEC121\>. Both these options disable other checks, such as for parent directories (..) and external symbolic link targets, so I think the term “absolute” is too specific. But please use at least replace the underscore with a dash or hyphen: “--absolute-path”, not “--absolute_path”. |
Questions:
|
Which versions are affected by this CVE? |
To be honest, whenever path is non-empty, I would expect it to be regarded "safe" to the extent that any final path must be contained inside the path user gives. So essentially passing relative paths would be fine as long as it does not escape the directory given by the user of the API. So eg I give path "/", then tarfile is allowed to unpack anywhere. I give path "", then there is no checking whatsoever (for backwards compat reasons). When I give path "something-else", then trying to unpack outside "something-else" raises an exception. |
Another possibility would be raising audit event for the path names so you can write an audit hook that prevents unpacking unsafe paths. This implies no API changes. |
@alittlesir everything everywhere. Documentation says you must check paths to be safe before unpacking. |
As a commercial python developer I would greatly appreciate this to become part of CPython. So while "technically correct" python implementation was deemed "non-security issue" (#45385), the fact that it resoluted in thousands of vulnerable projects seems to indicate that in retrospect, it is an issue. Despite documentation warning. Like great, we have documentation which says to watch out, but here we are. People are not reading it. And all that for "being technically correct" and in support of a feature that hardly every anyone wants or needs. Alternative non-breaking (and I really wonder which projects would be broken here) would be to have a subclass that does the protecting by default, in similar fashion as tarsafe project (https://github.com/beatsbears/tarsafe , but with different implementation, as the one there seems quite inefficient). |
I would prefer secure by default. This is implicitly used by various API's like shutil.unpack_archive which does not check TarInfo items are safe paths. Even standard library is not following tarfile module documentation. |
I wrote a comment back in 2014 on the various security aspects of tarfile: #65308 (comment) |
@gustaebel yes. I was mostly talking about other higher level tools in standard library that build on top of tarfile not being remotely secure either. I suggested adding audit event for unpacked TarInfo, that would easily allow gluing security on top without dropping all these nice things standard library has gathered. As said, shutil.unpack_archive will automatically call into insecure code in tarfile behind your back after sniffing that it needs tarfile based on file extension. This is not great design. |
I'd like to point out that PyPI itself is at risk, given that PEP-458 for signed package metadata has still not been implemented a decade after it was proposed. In short, every PyPI package should be considered "untrusted". While |
@jorhett the funny thing there is that this issue is about tarfile when in fact zipfile has the exact same vulnerability class considered by the CVE and PyPI has now converged around wheels that are essentially ZIP packages. Back when the CVE was filed tarfile was still used more. |
The not-funny thing is that PyPI packing is not cryptographically verifiable, there are no published instructions for validating PyPI packages, and yet Python maintainers consider it the user's responsibility to inspect the code before using it. I do this for a living, and Python is the hardest to evaluate. It has the least verifications/validations of origin possible. So we have to build a VM and diff before/after to be certain of what it does. Yeah, all the people using Python plugins in their tools are responsible for taking this kind of effort 🤦 |
See also issue #94531 "Document zlib, gzip, bz2 and tarfile known vulnerabilities". |
it seems to have a big impact. when are you going to fix this cve? @vstinner |
@Denelvo I rather suggest everyone take a look at the actual code in Spyder IDE that is the basis for the exploit (from this blog article from Trellix).
To quote the pickle documentation:
This is a security vulnerability in Spyder IDE. |
Kind of yes but also if you have code that doesn't sanitize unpacking paths, then unrelated code cannot trust any path in entire hard disk anymore for safe unpickling. And you could even simpler attack write on Linux .bashrc file into user home for arbitrary code execution afterwards. |
Making tarfile safe by default was never expected to solve all software problems such as unsafe use of the pickle module. Can we move on to discussing what the PR should look like? How about the following default behaviour:
At minimum, the tarfile module should provide a |
I don't think that's enough: https://gist.github.com/Kasimir123/9bdc38f2eac9a19f83ef6cc52c7ce82e?permalink_comment_id=4358347#gistcomment-4358347 |
Digging deeper in the GNU How does GNU |
Note that the symlink trick can be applied also to file symlinks; the tar file created by:
will try to write passwd when extracted by
(needs some more work, like a test, plus I suppose |
The tarfile.extractall() command is vulnerable to path traversal, which may be exploited by adding a member with an "../" path to the tarball. In our case, this might open up the possibility of malicious data injection to someone that doesn't normally have access to the Open edX cluster, but does have write access to the S3 bucket. In that case, bad things could happen upon extraction of a thus-crafted archive, during an automated restore. This shouldn't have particularly wide-ranging implications since the only filesystem affected by such an attack would be the restore job's container, which is by definition short-lived. And an attacker with access to the S3 bucket could already do far greater damage to the Open edX installation by simply modifying the MongoDB or MySQL data contained in the tarball. Still, it does not hurt to use a safer (if slightly slower) approach that is provided by the tarsafe module. References: python/cpython#73974 https://mail.python.org/pipermail/python-dev/2007-August/074290.html https://nvd.nist.gov/vuln/detail/CVE-2007-4559
The tarfile.extractall() command is vulnerable to path traversal, which may be exploited by adding a member with an "../" path to the tarball. In our case, this might open up the possibility of malicious data injection to someone that doesn't normally have access to the Open edX cluster, but does have write access to the S3 bucket. In that case, bad things could happen upon extraction of a thus-crafted archive, during an automated restore. This shouldn't have particularly wide-ranging implications since the only filesystem affected by such an attack would be the restore job's container, which is by definition short-lived. And an attacker with access to the S3 bucket could already do far greater damage to the Open edX installation by simply modifying the MongoDB or MySQL data contained in the tarball. Still, it does not hurt to use a safer (if slightly slower) approach that is provided by the tarsafe module. References: python/cpython#73974 https://mail.python.org/pipermail/python-dev/2007-August/074290.html https://nvd.nist.gov/vuln/detail/CVE-2007-4559
FWIW, there is a spam bot PR campaign by @TrellixVulnTeam and @Kasimir123 (they should be banned) that has been going on since September and spams a large number of python projects with junk PRs with incorrect patches. Examples include: |
Hi All, My name is Jonathan Leitschuh, I'm a Senior Software Security Researcher for Project Alpha Omega at the Open Source Security Foundation under the Linux Foundation. I recently attended a talk by @Kasimir123 at ShmooCon about this vulnerability and this work. They recently opened 61,895 pull requests across the OSS ecosystem to fix this vulnerability as the Python maintainers have taken no action to fix this severe vulnerability after 6 years. To be clear, vulnerabilities like TAR-slip (aka Zip-slip) can lead to remote code execution by allowing an attacker to write code into executable file locations. In the right contexts, this vulnerability has a CVSSv3.1 score of 10/10 (AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H). While I believe that fixing the simlink problem is important. I believe that this should be broken down into two issues:
Given the first is far more likely to be exploited in the real world, I propose fixing it first, then focusing on the second problem. This will ensure that end-users are protected against the most likely risk sooner, while the less likely risk discussion continues. To provide an incentive: I would also like to note that any developer who resolves this security issue in the python standard library would most likely be eligible for a multi-thousand dollar financial reward under the Secure Open Source Rewards (https://sos.dev) project which is also operated out of the Open Source Security Foundation. |
FWIW, I'm reading up on this issue, and I plan to post a proposal on Discourse this week to unblock things. |
@encukou We have multiple proposals in the comments here and the linked issues. The problem is that no proposal has received positive feedback from the core cpython team to motivate one of us go to the next step of implementing it and making a PR. Since a previous PR fixing the issue has been rejected, rewards from a new PR (financial or not) are uncertain. If you make yet another proposal please link to it from here. |
Yeah, it would be unfair if I ended up getting money for this, outside my normal salary. |
My proposal: https://discuss.python.org/t/23149 |
PEP-706, which adds Python 3.12, and security updates to some earlier releases, will allow users to avoid CVE-2007-4559 by changing their code/settings. |
@encukou have I missed something or do these changes now break shutil.unpack_archive abstraction layer for zipfile vs tarfile? |
Yes. Sadly, in Python 3.12 and 3.13 you'll get a warning about changing behaviour if you use shutil.unpack_archive with a tarball. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: