-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use extractall(filter=) Python 3.12+ feature #96
Conversation
@@ -44,7 +45,14 @@ def checked_members(): | |||
yield member | |||
|
|||
try: | |||
tar_file.extractall(path=dest_dir, members=checked_members()) | |||
if HAS_TAR_FILTER: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*puts tar filter on*
I'm not sure I understand why the tests fail.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should adopt tar_filter, but I may be not fully aware what other limits it implements, my POSIX foo is pretty weak.
https://github.com/python/cpython/blob/54a05a46002ee1c9211f299df38f444f16866ef5/Lib/tarfile.py#L769-L772 seems pretty close to our checked_members()
function though.
conda_package_streaming/extract.py
Outdated
tar_file.extractall( | ||
path=dest_dir, | ||
members=checked_members(), | ||
filter=tarfile.fully_trusted_filter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From reading https://docs.python.org/3/library/tarfile.html#tarfile-extraction-filter closely:
The filter argument to TarFile.extract() or extractall() can be:
the string 'fully_trusted': Honor all metadata as specified in the archive. Should be used if the user trusts the archive completely, or implements their own complex verification.
the string 'tar': Honor most tar-specific features (i.e. features of UNIX-like filesystems), but block features that are very likely to be surprising or malicious. See tar_filter() for details.
the string 'data': Ignore or block most features specific to UNIX-like filesystems. Intended for extracting cross-platform data archives. See data_filter() for details.
None (default): Use TarFile.extraction_filter.
filter=tarfile.fully_trusted_filter, | |
filter="fully_trusted", |
I'm aware that _get_filter_function
will check for the callable internally, but that's an implementation detail and we should use the standard API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stringly typed API, yuck. I agree that tar filter is probably the correct one. The tests must be wrong. Obviously when thinking about changing the extracted permissions I want to be careful. Maybe proceed by comparing permissions before and after using a large sample of real packages?
conda_package_streaming/extract.py
Outdated
try: | ||
tar_file.extractall(path=dest_dir, members=checked_members()) | ||
if HAS_TAR_FILTER: | ||
tar_file.extractall( | ||
path=dest_dir, | ||
members=checked_members(), | ||
filter=tarfile.fully_trusted_filter, | ||
) | ||
else: | ||
tar_file.extractall(path=dest_dir, members=checked_members()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth considering the partial here to reduce the size of the try block and avoid some duplication?
if HAS_TAR_FILTER:
extractall = functools.partial(tar_file.extractall, filter="fully_trusted")
else:
extractall = functools.partial(tar_file.extractall)
try:
extractall(path=dest_dir, members=checked_members())
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or alternatively:
kwargs = {"filter": "fully_trusted"} if HAS_TAR_FILTER else {}
try:
tar_file.extractall(
...,
**kwargs,
)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combining fully_trusted with our existing filter is the least risky option.
@@ -36,7 +36,7 @@ def __str__(self): | |||
|
|||
|
|||
class TarfileNoSameOwner(tarfile.TarFile): | |||
def __init__(self, *args, umask=UMASK, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change causing our test to fail?
I've concluded that the umask handling is causing most of the test failures here (where we attempt to grab umask on the fly instead of as a default function argument). Will separate that change and the tar-filter bit into two PR's |
ac38f94
to
dd4e79e
Compare
Now that the tests seem to have been worked out, we could use a better filter than fully_trusted... |
Thanks |
This commit addresses some warnings issued by Python 3.12+ with regards to the behavior of `tarfile.extractall()`. Currently that method gets called in two places: the URL and GitHub provider. Both essentially trust the archive to not be malicious at the moment: tarfiles can be crafted to overwrite other parts of the system and have strange links or even device files... [PEP 706] tries to fix this potential security vulnerability in a large amount of code written today by restricting changing the behavior of the aforementioned method in Python 3.14. At the moment, a warning is issued to apply a suitable filter parameter. This commit uses such a filter, if the python version running the code supports it. The implementation is based on conda/conda-package-streaming#96, which is a pull request fixing the same thing. The solution of adding the new filter argument only if supported is elegant and backwards-compatible. The `data`-filter was chosen, since the archives this project deals with are typically exactly that: an archive of plain old directories with regular files in them. Applying this commit reduces the number of warnings reported by the test suite from five down to zero. The previous output was: ```log =================================================== warnings summary =================================================== tests/test_coremanager.py::test_export tests/test_coremanager.py::test_export tests/test_coremanager.py::test_export tests/test_provider.py::test_github_provider /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/github.py:44: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. t.extractall(cache_root) tests/test_provider.py::test_url_provider /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/url.py:47: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. t.extractall(local_dir) -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ``` [PEP 706]: https://peps.python.org/pep-0706/
This commit addresses some warnings issued by Python 3.12+ with regards to the behavior of `tarfile.extractall()`. Currently that method gets called in two places: the URL and GitHub provider. Both essentially trust the archive to not be malicious at the moment: tarfiles can be crafted to overwrite other parts of the system and have strange links or even device files... [PEP 706] tries to fix this potential security vulnerability in a large amount of code written today by restricting changing the behavior of the aforementioned method in Python 3.14. At the moment, a warning is issued to apply a suitable filter parameter. This commit uses such a filter, if the python version running the code supports it. The implementation is based on conda/conda-package-streaming#96, which is a pull request fixing the same thing. The solution of adding the new filter argument only if supported is elegant and backwards-compatible. The `data`-filter was chosen, since the archives this project deals with are typically exactly that: an archive of plain old directories with regular files in them. Applying this commit reduces the number of warnings reported by the test suite from five down to zero. The previous output was: ```log =================================================== warnings summary =================================================== tests/test_coremanager.py::test_export tests/test_coremanager.py::test_export tests/test_coremanager.py::test_export tests/test_provider.py::test_github_provider /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/github.py:44: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. t.extractall(cache_root) tests/test_provider.py::test_url_provider /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/url.py:47: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. t.extractall(local_dir) -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ``` It's a bit unfortunate, that the same change needed to be performed in two places, but I wanted to keep the diff as minimal as possible and did not want to introduce a new module or similar. [PEP 706]: https://peps.python.org/pep-0706/
This commit addresses some warnings issued by Python 3.12+ with regards to the behavior of `tarfile.extractall()`. Currently that method gets called in two places: the URL and GitHub provider. Both essentially trust the archive to not be malicious at the moment: tarfiles can be crafted to overwrite other parts of the system and have strange links or even device files... [PEP 706] tries to fix this potential security vulnerability in a large amount of code written today by restricting changing the behavior of the aforementioned method in Python 3.14. At the moment, a warning is issued to apply a suitable filter parameter. This commit uses such a filter, if the python version running the code supports it. The implementation is based on conda/conda-package-streaming#96, which is a pull request fixing the same thing. The solution of adding the new filter argument only if supported is elegant and backwards-compatible. The `data`-filter was chosen, since the archives this project deals with are typically exactly that: an archive of plain old directories with regular files in them. Applying this commit reduces the number of warnings reported by the test suite from five down to zero. The previous output was: ```log =================================================== warnings summary =================================================== tests/test_coremanager.py::test_export tests/test_coremanager.py::test_export tests/test_coremanager.py::test_export tests/test_provider.py::test_github_provider /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/github.py:44: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. t.extractall(cache_root) tests/test_provider.py::test_url_provider /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/url.py:47: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. t.extractall(local_dir) -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ``` It's a bit unfortunate, that the same change needed to be performed in two places, but I wanted to keep the diff as minimal as possible and did not want to introduce a new module or similar. [PEP 706]: https://peps.python.org/pep-0706/
Description
Fix #87
Code coverage testing might be tricky on this one?
Did we pick the correct filter? Can we backport it ourselves to older Python, since the Python filter is more sophisticated than our internal version?
(The tests are failing because data_filter has been backported to Python < 3.12)