Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling RECORD files with invalid entries #6198

Open
pradyunsg opened this issue Jan 25, 2019 · 9 comments · May be fixed by #11762
Open

Handling RECORD files with invalid entries #6198

pradyunsg opened this issue Jan 25, 2019 · 9 comments · May be fixed by #11762
Labels
state: needs discussion This needs some more discussion

Comments

@pradyunsg
Copy link
Member

Let's remember to perhaps create a new issue regarding how to handle invalid RECORD lines like this (if there isn't already an issue). (It's a somewhat similar discussion to #5913, regarding a different way in which RECORD files can be problematic.)

Originally posted by @cjerdonek in #6191 (comment)

@pradyunsg pradyunsg added the state: needs discussion This needs some more discussion label Jan 25, 2019
@cjerdonek
Copy link
Member

See also:

@zooba
Copy link
Contributor

zooba commented Feb 6, 2019

I'd like to implement #4705, and if there's a broader task to handle a variety of validation issues then I can do those as well.

One thing I am very keen to have though is a way to validate the RECORD file before reading it (essentially, support for RECORD.jws or RECORD.p7s as described in the spec). We'd use this as part of our production builds to ensure that all wheels being installed are not only coming from our internal servers, but have been signed with our certificates.

My ideal overall workflow is:

  • deploy Python+pip+some extension in an independently managed way (i.e. pip doesn't have to solve the bootstrap issue here)
  • configure a site setting that always requires full metadata verification
  • pip extracts only the .dist-info directory
  • implement metadata verification in our extension (e.g. uses Windows cert store for validation, pings internal servers for CRL lists, etc.)
  • once verified, pip extracts only files that are listed in RECORD with matching hashes - fails otherwise (without having written the offending file to disk)
  • continue as normal

I know we've had the discussion before about extension vs. vendoring, and I believe this is another case where we really don't want pip to have to include knowledge about our internal systems :) But I see a couple of valid options here:

  • by default, only extract files that match hashes in RECORD (since pip does not check hashes in wheel RECORD contents during installation #4705 is marked as a bug, I assume this is an okay default)
  • offer a configuration option like dist_info_verifier = my_private_helper:verify (entrypoint style, though not a regular entrypoint) that will 👍/👎 the entire .dist-info directory before extracting anything else
  • design an optional internal package that distributors can add to pip's _vendor directory that, if present, will do the verification unconditionally

The last option seems pretty straightforward to implement and pushes most of the work onto the people who want to add this (i.e. me). The second option could have a default that verifies hashes and can optionally be replaced by something more thorough, which would also allow an "insecure" mode of not even checking hashes.

Thoughts?

@pfmoore
Copy link
Member

pfmoore commented Feb 6, 2019

I don't have a general opinion on the proposal, but I will say that with regard to #4705 I would consider correct behaviour to be to fail the install completely if a hash fails to validate. I'm not keen on the idea of extracting the subset of records whose hashes are OK.

I don't follow your use case, but if it depends on being able to part-install wheels that have files which fail their hash check, then I'm not sure it's something we should be supporting.

@zooba
Copy link
Contributor

zooba commented Feb 6, 2019

I don't follow your use case, but if it depends on being able to part-install wheels that have files which fail their hash check, then I'm not sure it's something we should be supporting.

You're right. I was trying to imply that if a file doesn't validate, it should never hit the disk. If no files hit the disk at all, that's also fine (I happen to know that right now it gets extracted in a code path that has no special knowledge of the source of the ZIP file - I should have ignored that context and specified a more pure requirement.)

In short: yes, fail the install completely if a hash doesn't match. But please do it before extracting the files, not after extracting. (Open question: other file types that don't have RECORD? Arbitrary verification through an extension would be great, even if pip can't do its own RECORD verification when there is no RECORD.)

@dstufft
Copy link
Member

dstufft commented Feb 6, 2019

Wheel signatures are kind of a bad misfeature and I'm not really very enthusiastic about adding code to support them. Does TUF not solve your use case?

I am +1 on checking hashes before files touch disk if possible.

@zooba
Copy link
Contributor

zooba commented Feb 6, 2019

Does TUF not solve your use case?

If it does, it would seem to be by accident as it solves 100 other use cases :( But I suspect "verify the RECORD file is signed using the WinVerifyTrust API" is not one of their goals, though it is one of mine.

One use case I don't need solved is how to let arbitrary people publish their own signed wheels. I have control over that, so I can force my users to sign with a particular certificate, and can manage the public key being on the target machine myself (indeed, this is already managed for me, which is why I have a requirement to use specific Windows APIs to verify).

The alternative for me is to implement my own frontend that can do it. Again, since I control the source of our packages, this is a viable option - I only have to install wheels if that's all I want to do. But there are plenty of other people with similar needs out there and our frontend would not help them (because it'd likely never get released, tbh).

Either way, happy to help with the hash checking on files, even if I end up building a private installer. It raises the bar a little bit for messing with wheels between build/install.

@dstufft
Copy link
Member

dstufft commented Feb 8, 2019

@zooba So I've been thinking about this case (the WinVerifyTrust api) and I remembered this ticket: #1035. The scope of that ticket was a bit wider than what you proposed here, but it would allow you to do what you wanted here (as well as enable other people to do other things).

Does something like that seem reasonable? If so we can re-open that ticket there and I think that accepting a generic "pluggable file verifier" is probably a reasonable thing to do.

@zooba
Copy link
Contributor

zooba commented Feb 8, 2019

@dstufft #1035 (after accounting for your feedback and ignoring the side discussions) looks very much like what I have in mind. (Though frankly I'd prefer to not reopen that issue and avoid notifying everyone who was involved that they should come and rant about GPG/TUF/etc. again...)

I'm also okay with deliberately narrowing the scope, for example:

  • A configurable wheel verifier specified as --verifier "<import module>:<callable>" (or ini file/env/etc.) that takes an open ZipFile object (and dist name/logger/cmdline context) and either returns silently or raises an exception; in the latter case, the wheel is rejected

The default could then be a verifier that checks the hashes in RECORD, which fills that need while also allowing an override without a specific new option. Exceptions are extensible into the future if we see some value in a "try alternate source" result, though I personally don't have a need for that.

If/when all sdist installs are going via wheel, I assume this would still kick in. But maybe there's some context we can pass in to indicate that it's a local build?

For our internal use, I'd even consider having the verifier ping our internal inventory system so we know that a package is being used (tracking everything that gets installed in a 100k person company is hard ;) ). So yeah, there's a lot of value in this kind of hook.

@dstufft
Copy link
Member

dstufft commented Feb 8, 2019

@zooba I think it's fair to open another issue then if you'd rather not have that one re-opened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state: needs discussion This needs some more discussion
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants