Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: perform a single hash validation of wheel files #8027

Conversation

ralbertazzi
Copy link
Contributor

Re-introduce wheel content file validation in a more performant way, aka computing (and validating) hash once while installing the wheel (and verifying the wheel at the end of the installation process).

Related to #7983 and #7987

@dimbleby
Copy link
Contributor

I do not like the amount of pypa/installer that is being duplicated by poetry already, and this isn't helping!

Are you sure you wouldn't prefer to fix up your MR in that project - I see that the pipeline is currently failing - and wait for it to be released? Then just use pypa/installer instead of re-implementing it?

Copy link
Member

@radoering radoering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have copied my comments from the other PR so they will not be forgotten.

I agree that we should prefer a solution in installer if possible. Another approach to avoid calculating the hashes twice might be to add a "use hashes from RECORD file" parameter to the install() function that can be set if validation was successful.

By the way, what's the performance penalty for calculating the hashes twice for a big wheel like torch?

Comment on lines +110 to +112
computed_record = next(
record for _, record in records if record.path == item.filename
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't always work. I tested a large project and got a StopIteration for some packages.

You can reproduce it by running poetry add fonttools for example.

What's even worth: The package is installed, but the RECORD file is not written.

poetry run pip uninstall fonttools now throws ERROR: Cannot uninstall fonttools 4.39.4, RECORD file not found. Hint: The package was installed by Poetry 1.6.0.dev0. and poetry run pip install fonttools gives Requirement already satisfied.

You can only "manually" uninstall the package.

Comment on lines +133 to +134
self._validate_hash_and_size(records)
return super().finalize_installation(scheme, record_file_path, records)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self._validate_hash_and_size(records)
return super().finalize_installation(scheme, record_file_path, records)
try:
self._validate_hash_and_size(records)
finally:
return super().finalize_installation(scheme, record_file_path, records)

I wonder if we should do something like that to make sure the RECORD file is written despite of bugs in _validate_hash_and_size(). However, when I tried this, the exception was swallowed, which not good either. Thus, we would have to put more effort here...

@ralbertazzi
Copy link
Contributor Author

By the way, what's the performance penalty for calculating the hashes twice for a big wheel like torch?

It's almost double. Here's the benchmark while installing https://download.pytorch.org/whl/cu118/torch-2.0.0%2Bcu118-cp39-cp39-linux_x86_64.whl (2.1 GB) performed on Poetry 1.5.1 and pypa/installer@5f281b8 (ie with the hash in-memory computation fix)

  • validate_contents = true -> 97 s
  • validate_contants = false -> 55 s
  • installer.modern-installation false -> 51 s

cc @Secrus as now maintainer of installer.

I agree this is not the best place to fix it, it was a quick hack in case we wanted to keep the feature in 1.5.1. I'll open an issue on installer.

@ralbertazzi
Copy link
Contributor Author

pypa/installer#189

@ralbertazzi ralbertazzi deleted the feat/wheel-one-hash-computation branch May 30, 2023 07:37
Copy link

github-actions bot commented Mar 3, 2024

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants