Handling RECORD files with invalid entries #6198

pradyunsg · 2019-01-25T02:02:52Z

Let's remember to perhaps create a new issue regarding how to handle invalid RECORD lines like this (if there isn't already an issue). (It's a somewhat similar discussion to #5913, regarding a different way in which RECORD files can be problematic.)

Originally posted by @cjerdonek in #6191 (comment)

cjerdonek · 2019-01-25T02:38:23Z

See also:

People are distributing invalid wheels because pip is sloppy about checking them; pip should be pickier #3513 ("People are distributing invalid wheels because pip is sloppy about checking them")
pip does not check hashes in wheel RECORD contents during installation #4705 ("pip does not check hashes in wheel RECORD contents during installation")
Handling RECORD files with duplicate entries #5913 ("Handling RECORD files with duplicate entries")

zooba · 2019-02-06T18:07:14Z

I'd like to implement #4705, and if there's a broader task to handle a variety of validation issues then I can do those as well.

One thing I am very keen to have though is a way to validate the RECORD file before reading it (essentially, support for RECORD.jws or RECORD.p7s as described in the spec). We'd use this as part of our production builds to ensure that all wheels being installed are not only coming from our internal servers, but have been signed with our certificates.

My ideal overall workflow is:

deploy Python+pip+some extension in an independently managed way (i.e. pip doesn't have to solve the bootstrap issue here)
configure a site setting that always requires full metadata verification
pip extracts only the .dist-info directory
implement metadata verification in our extension (e.g. uses Windows cert store for validation, pings internal servers for CRL lists, etc.)
once verified, pip extracts only files that are listed in RECORD with matching hashes - fails otherwise (without having written the offending file to disk)
continue as normal

I know we've had the discussion before about extension vs. vendoring, and I believe this is another case where we really don't want pip to have to include knowledge about our internal systems :) But I see a couple of valid options here:

by default, only extract files that match hashes in RECORD (since pip does not check hashes in wheel RECORD contents during installation #4705 is marked as a bug, I assume this is an okay default)
offer a configuration option like dist_info_verifier = my_private_helper:verify (entrypoint style, though not a regular entrypoint) that will 👍/👎 the entire .dist-info directory before extracting anything else
design an optional internal package that distributors can add to pip's _vendor directory that, if present, will do the verification unconditionally

The last option seems pretty straightforward to implement and pushes most of the work onto the people who want to add this (i.e. me). The second option could have a default that verifies hashes and can optionally be replaced by something more thorough, which would also allow an "insecure" mode of not even checking hashes.

Thoughts?

pfmoore · 2019-02-06T19:31:54Z

I don't have a general opinion on the proposal, but I will say that with regard to #4705 I would consider correct behaviour to be to fail the install completely if a hash fails to validate. I'm not keen on the idea of extracting the subset of records whose hashes are OK.

I don't follow your use case, but if it depends on being able to part-install wheels that have files which fail their hash check, then I'm not sure it's something we should be supporting.

zooba · 2019-02-06T19:38:32Z

I don't follow your use case, but if it depends on being able to part-install wheels that have files which fail their hash check, then I'm not sure it's something we should be supporting.

You're right. I was trying to imply that if a file doesn't validate, it should never hit the disk. If no files hit the disk at all, that's also fine (I happen to know that right now it gets extracted in a code path that has no special knowledge of the source of the ZIP file - I should have ignored that context and specified a more pure requirement.)

In short: yes, fail the install completely if a hash doesn't match. But please do it before extracting the files, not after extracting. (Open question: other file types that don't have RECORD? Arbitrary verification through an extension would be great, even if pip can't do its own RECORD verification when there is no RECORD.)

dstufft · 2019-02-06T19:39:42Z

Wheel signatures are kind of a bad misfeature and I'm not really very enthusiastic about adding code to support them. Does TUF not solve your use case?

I am +1 on checking hashes before files touch disk if possible.

zooba · 2019-02-06T19:48:04Z

Does TUF not solve your use case?

If it does, it would seem to be by accident as it solves 100 other use cases :( But I suspect "verify the RECORD file is signed using the WinVerifyTrust API" is not one of their goals, though it is one of mine.

One use case I don't need solved is how to let arbitrary people publish their own signed wheels. I have control over that, so I can force my users to sign with a particular certificate, and can manage the public key being on the target machine myself (indeed, this is already managed for me, which is why I have a requirement to use specific Windows APIs to verify).

The alternative for me is to implement my own frontend that can do it. Again, since I control the source of our packages, this is a viable option - I only have to install wheels if that's all I want to do. But there are plenty of other people with similar needs out there and our frontend would not help them (because it'd likely never get released, tbh).

Either way, happy to help with the hash checking on files, even if I end up building a private installer. It raises the bar a little bit for messing with wheels between build/install.

dstufft · 2019-02-08T02:48:32Z

@zooba So I've been thinking about this case (the WinVerifyTrust api) and I remembered this ticket: #1035. The scope of that ticket was a bit wider than what you proposed here, but it would allow you to do what you wanted here (as well as enable other people to do other things).

Does something like that seem reasonable? If so we can re-open that ticket there and I think that accepting a generic "pluggable file verifier" is probably a reasonable thing to do.

zooba · 2019-02-08T18:23:51Z

@dstufft #1035 (after accounting for your feedback and ignoring the side discussions) looks very much like what I have in mind. (Though frankly I'd prefer to not reopen that issue and avoid notifying everyone who was involved that they should come and rant about GPG/TUF/etc. again...)

I'm also okay with deliberately narrowing the scope, for example:

A configurable wheel verifier specified as --verifier "<import module>:<callable>" (or ini file/env/etc.) that takes an open ZipFile object (and dist name/logger/cmdline context) and either returns silently or raises an exception; in the latter case, the wheel is rejected

The default could then be a verifier that checks the hashes in RECORD, which fills that need while also allowing an override without a specific new option. Exceptions are extensible into the future if we see some value in a "try alternate source" result, though I personally don't have a need for that.

If/when all sdist installs are going via wheel, I assume this would still kick in. But maybe there's some context we can pass in to indicate that it's a local build?

For our internal use, I'd even consider having the verifier ping our internal inventory system so we know that a package is being used (tracking everything that gets installed in a 100k person company is hard ;) ). So yeah, there's a lot of value in this kind of hook.

dstufft · 2019-02-08T22:22:05Z

@zooba I think it's fair to open another issue then if you'd rather not have that one re-opened.

pradyunsg added the state: needs discussion This needs some more discussion label Jan 25, 2019

hoechenberger mentioned this issue Jan 29, 2019

Travis currently not building psychopy/psychopy#2238

Closed

zooba mentioned this issue Feb 9, 2019

Pluggable wheel verifier #6248

Open

SpecLad linked a pull request Jan 29, 2023 that will close this issue

Do basic syntax checks on the RECORD file when installing a wheel #11762

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handling RECORD files with invalid entries #6198

Handling RECORD files with invalid entries #6198

pradyunsg commented Jan 25, 2019

cjerdonek commented Jan 25, 2019

zooba commented Feb 6, 2019

pfmoore commented Feb 6, 2019

zooba commented Feb 6, 2019

dstufft commented Feb 6, 2019

zooba commented Feb 6, 2019

dstufft commented Feb 8, 2019

zooba commented Feb 8, 2019

dstufft commented Feb 8, 2019

Handling RECORD files with invalid entries #6198

Handling RECORD files with invalid entries #6198

Comments

pradyunsg commented Jan 25, 2019

cjerdonek commented Jan 25, 2019

zooba commented Feb 6, 2019

pfmoore commented Feb 6, 2019

zooba commented Feb 6, 2019

dstufft commented Feb 6, 2019

zooba commented Feb 6, 2019

dstufft commented Feb 8, 2019

zooba commented Feb 8, 2019

dstufft commented Feb 8, 2019