Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use extractall(filter=) Python 3.12+ feature #96

Merged
merged 3 commits into from
Oct 7, 2024
Merged

Conversation

dholth
Copy link
Contributor

@dholth dholth commented Jul 31, 2024

Description

Fix #87

Code coverage testing might be tricky on this one?

Did we pick the correct filter? Can we backport it ourselves to older Python, since the Python filter is more sophisticated than our internal version?

(The tests are failing because data_filter has been backported to Python < 3.12)

@dholth dholth requested review from jezdez and a team July 31, 2024 20:00
jezdez
jezdez previously approved these changes Aug 8, 2024
@@ -44,7 +45,14 @@ def checked_members():
yield member

try:
tar_file.extractall(path=dest_dir, members=checked_members())
if HAS_TAR_FILTER:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*puts tar filter on*

@jezdez
Copy link
Member

jezdez commented Aug 8, 2024

I'm not sure I understand why the tests fail..

Copy link
Member

@jezdez jezdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should adopt tar_filter, but I may be not fully aware what other limits it implements, my POSIX foo is pretty weak.

https://github.com/python/cpython/blob/54a05a46002ee1c9211f299df38f444f16866ef5/Lib/tarfile.py#L769-L772 seems pretty close to our checked_members() function though.

tar_file.extractall(
path=dest_dir,
members=checked_members(),
filter=tarfile.fully_trusted_filter,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading https://docs.python.org/3/library/tarfile.html#tarfile-extraction-filter closely:

The filter argument to TarFile.extract() or extractall() can be:

  • the string 'fully_trusted': Honor all metadata as specified in the archive. Should be used if the user trusts the archive completely, or implements their own complex verification.

  • the string 'tar': Honor most tar-specific features (i.e. features of UNIX-like filesystems), but block features that are very likely to be surprising or malicious. See tar_filter() for details.

  • the string 'data': Ignore or block most features specific to UNIX-like filesystems. Intended for extracting cross-platform data archives. See data_filter() for details.

  • None (default): Use TarFile.extraction_filter.

Suggested change
filter=tarfile.fully_trusted_filter,
filter="fully_trusted",

I'm aware that _get_filter_function will check for the callable internally, but that's an implementation detail and we should use the standard API.

Copy link
Contributor Author

@dholth dholth Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stringly typed API, yuck. I agree that tar filter is probably the correct one. The tests must be wrong. Obviously when thinking about changing the extracted permissions I want to be careful. Maybe proceed by comparing permissions before and after using a large sample of real packages?

@jezdez jezdez requested a review from a team September 3, 2024 14:35
Comment on lines 47 to 55
try:
tar_file.extractall(path=dest_dir, members=checked_members())
if HAS_TAR_FILTER:
tar_file.extractall(
path=dest_dir,
members=checked_members(),
filter=tarfile.fully_trusted_filter,
)
else:
tar_file.extractall(path=dest_dir, members=checked_members())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth considering the partial here to reduce the size of the try block and avoid some duplication?

        if HAS_TAR_FILTER:
            extractall = functools.partial(tar_file.extractall, filter="fully_trusted")
        else:
            extractall = functools.partial(tar_file.extractall)
        try:
            extractall(path=dest_dir, members=checked_members())
         ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or alternatively:

kwargs = {"filter": "fully_trusted"} if HAS_TAR_FILTER else {}

try:
    tar_file.extractall(
        ...,
        **kwargs,
    )
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combining fully_trusted with our existing filter is the least risky option.

@@ -36,7 +36,7 @@ def __str__(self):


class TarfileNoSameOwner(tarfile.TarFile):
def __init__(self, *args, umask=UMASK, **kwargs):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change causing our test to fail?

@dholth
Copy link
Contributor Author

dholth commented Oct 2, 2024

I've concluded that the umask handling is causing most of the test failures here (where we attempt to grab umask on the fly instead of as a default function argument). Will separate that change and the tar-filter bit into two PR's

@dholth dholth force-pushed the 87-tarfile-filter branch from ac38f94 to dd4e79e Compare October 2, 2024 20:18
@dholth
Copy link
Contributor Author

dholth commented Oct 2, 2024

Now that the tests seem to have been worked out, we could use a better filter than fully_trusted...

@dholth dholth enabled auto-merge (squash) October 3, 2024 19:38
@dholth dholth merged commit d54c487 into main Oct 7, 2024
9 checks passed
@dholth dholth deleted the 87-tarfile-filter branch October 7, 2024 20:12
@dholth
Copy link
Contributor Author

dholth commented Oct 7, 2024

Thanks

jfrimmel added a commit to jfrimmel/fusesoc that referenced this pull request Jan 17, 2025
This commit addresses some warnings issued by Python 3.12+ with regards
to the behavior of `tarfile.extractall()`. Currently that method gets
called in two places: the URL and GitHub provider. Both essentially
trust the archive to not be malicious at the moment: tarfiles can be
crafted to overwrite other parts of the system and have strange links or
even device files...

[PEP 706] tries to fix this potential security vulnerability in a large
amount of code written today by restricting changing the behavior of the
aforementioned method in Python 3.14. At the moment, a warning is issued
to apply a suitable filter parameter. This commit uses such a filter, if
the python version running the code supports it.
The implementation is based on conda/conda-package-streaming#96, which
is a pull request fixing the same thing. The solution of adding the new
filter argument only if supported is elegant and backwards-compatible.

The `data`-filter was chosen, since the archives this project deals with
are typically exactly that: an archive of plain old directories with
regular files in them.

Applying this commit reduces the number of warnings reported by the test
suite from five down to zero. The previous output was:
```log
=================================================== warnings summary ===================================================
tests/test_coremanager.py::test_export
tests/test_coremanager.py::test_export
tests/test_coremanager.py::test_export
tests/test_provider.py::test_github_provider
  /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/github.py:44: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
    t.extractall(cache_root)

tests/test_provider.py::test_url_provider
  /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/url.py:47: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
    t.extractall(local_dir)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
```

[PEP 706]: https://peps.python.org/pep-0706/
jfrimmel added a commit to jfrimmel/fusesoc that referenced this pull request Jan 17, 2025
This commit addresses some warnings issued by Python 3.12+ with regards
to the behavior of `tarfile.extractall()`. Currently that method gets
called in two places: the URL and GitHub provider. Both essentially
trust the archive to not be malicious at the moment: tarfiles can be
crafted to overwrite other parts of the system and have strange links or
even device files...

[PEP 706] tries to fix this potential security vulnerability in a large
amount of code written today by restricting changing the behavior of the
aforementioned method in Python 3.14. At the moment, a warning is issued
to apply a suitable filter parameter. This commit uses such a filter, if
the python version running the code supports it.
The implementation is based on conda/conda-package-streaming#96, which
is a pull request fixing the same thing. The solution of adding the new
filter argument only if supported is elegant and backwards-compatible.

The `data`-filter was chosen, since the archives this project deals with
are typically exactly that: an archive of plain old directories with
regular files in them.

Applying this commit reduces the number of warnings reported by the test
suite from five down to zero. The previous output was:
```log
=================================================== warnings summary ===================================================
tests/test_coremanager.py::test_export
tests/test_coremanager.py::test_export
tests/test_coremanager.py::test_export
tests/test_provider.py::test_github_provider
  /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/github.py:44: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
    t.extractall(cache_root)

tests/test_provider.py::test_url_provider
  /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/url.py:47: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
    t.extractall(local_dir)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
```

It's a bit unfortunate, that the same change needed to be performed in
two places, but I wanted to keep the diff as minimal as possible and did
not want to introduce a new module or similar.

[PEP 706]: https://peps.python.org/pep-0706/
jfrimmel added a commit to jfrimmel/fusesoc that referenced this pull request Jan 17, 2025
This commit addresses some warnings issued by Python 3.12+ with regards
to the behavior of `tarfile.extractall()`. Currently that method gets
called in two places: the URL and GitHub provider. Both essentially
trust the archive to not be malicious at the moment: tarfiles can be
crafted to overwrite other parts of the system and have strange links or
even device files...

[PEP 706] tries to fix this potential security vulnerability in a large
amount of code written today by restricting changing the behavior of the
aforementioned method in Python 3.14. At the moment, a warning is issued
to apply a suitable filter parameter. This commit uses such a filter, if
the python version running the code supports it.
The implementation is based on conda/conda-package-streaming#96, which
is a pull request fixing the same thing. The solution of adding the new
filter argument only if supported is elegant and backwards-compatible.

The `data`-filter was chosen, since the archives this project deals with
are typically exactly that: an archive of plain old directories with
regular files in them.

Applying this commit reduces the number of warnings reported by the test
suite from five down to zero. The previous output was:
```log
=================================================== warnings summary ===================================================
tests/test_coremanager.py::test_export
tests/test_coremanager.py::test_export
tests/test_coremanager.py::test_export
tests/test_provider.py::test_github_provider
  /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/github.py:44: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
    t.extractall(cache_root)

tests/test_provider.py::test_url_provider
  /home/jfrimmel/git/fusesoc/.tox/py3/lib/python3.13/site-packages/fusesoc/provider/url.py:47: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
    t.extractall(local_dir)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
```

It's a bit unfortunate, that the same change needed to be performed in
two places, but I wanted to keep the diff as minimal as possible and did
not want to introduce a new module or similar.

[PEP 706]: https://peps.python.org/pep-0706/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

tarfile filters DeprecationWarning
5 participants