Implement lock file support (Umbrella). #1400

Closed
jsirois opened this issue Aug 16, 2021 · 4 comments

@jsirois
Member

jsirois commented Aug 16, 2021

Pex is generally used to produce an application binary, and in ecosystems that support them, such binaries traditionally get a lockfile so that the binary can be rebuilt reproducibly at a later date. Even when Pex is used for other purposes, consumers (namely Pants) want to be able to lock resolves done via Pex. This is, in fact, the primary motivation here.

For the motivating Pants case, and in general, Pex should be able to both produce a lock and consume one. The most robust lock would include all the information needed to exactly reproduce: first, the user source code (via -D) and the runtime options encoded in PEX-INFO; second (possibly), the Pex runtime packaged in .bootstrap/; and third, the third-party code (the distributions installed in .deps/). The motivating Pants case only cares about third-party locking, which is the most complex case, so the rest of this issue describes only the requirements for that.

  1. The lock should contain enough information to ensure a resolve is bit-for-bit identical to a prior resolve using the same lock (on the same machine; see requirement 2).
  2. There should be a mode to produce a lock that works on any machine and under any Python compatible with the --python / --interpreter-constraint / --platform combination used to build the PEX, even though the lock is generated using just a single, arbitrary Pex-compatible interpreter.
  3. The lock file should be external to a PEX file so it can be saved separately. (If you already have a PEX file in hand, it is a locked resolve and can simply be copied to reproduce its resolve!)

That's about it for hard requirements. It probably makes sense for the lock to be human-readable, but that's clearly not required. All known prior art does this, though (Cargo, npm, Pipfile, Poetry, PDM, ...). It also probably makes sense to keep the nascent PEP-665 in mind. All that's actually needed, though, is to output a requirements file using --hash and appropriate environment markers to achieve 1-3 above *. Pip will then do the rest and ensure all resolved dists are bit-for-bit identical.
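As a rough illustration of the --hash plus environment marker approach, the following Python sketch renders one pinned project as a line in a pip hash-checking requirements file. The `LockedRequirement` type and both helper functions are hypothetical (they are not part of Pex), and the project name, version, and artifact bytes in the example are placeholders.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class LockedRequirement:
    """One pinned project in a hypothetical requirements.txt-style lock."""

    project_name: str
    version: str
    # sha256 hex digests of every artifact (wheels and/or sdist) the lock accepts.
    sha256_hashes: List[str] = field(default_factory=list)
    # Optional PEP 508 environment marker, e.g. 'sys_platform == "darwin"'.
    marker: Optional[str] = None


def sha256_of(artifact_bytes: bytes) -> str:
    """Return the hex sha256 digest that follows '--hash=sha256:' in a requirements file."""
    return hashlib.sha256(artifact_bytes).hexdigest()


def to_requirements_line(req: LockedRequirement) -> str:
    """Render '<name>==<version>[; <marker>] --hash=sha256:<digest> ...'.

    In hash-checking mode pip requires every requirement to be pinned and to
    carry at least one --hash; listing several hashes means any one of them
    is acceptable.
    """
    if not req.sha256_hashes:
        raise ValueError(f"{req.project_name} has no hashes; pip would reject it")
    line = f"{req.project_name}=={req.version}"
    if req.marker:
        line += f"; {req.marker}"
    # Sort the hashes so re-locking an unchanged resolve emits identical text.
    for digest in sorted(req.sha256_hashes):
        line += f" --hash=sha256:{digest}"
    return line


# Placeholder example: the project, version and "wheel" bytes are made up.
example = LockedRequirement(
    "example-dist", "1.0", [sha256_of(b"placeholder wheel bytes")], 'sys_platform == "darwin"'
)
print(to_requirements_line(example))
```

Sorting the hashes is a design choice: it keeps the lock file stable across runs, so lock diffs only show real changes.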

* As far as I can tell, 1 is not achievable in a guaranteed way once you allow 2. 1 should nearly always still hold, but technically you could resolve with a certain version of Pip one day and with a new version of Pip the next, and as long as the new version only selects distributions locked by the first version of Pip, the resolve will complete successfully. That new resolve, though, could select more, fewer, or simply different subsets of the 2-style lockfile, and you have no way of knowing except by comparing the results of the two resolves by hand. To underscore the issue: you don't even need two versions of Pip. Certain dists could simply be deleted from PyPI between resolves 1 and 2, so that the first time you get, say, a platform-specific wheel for lxml and the next time you get a slower, pure-Python lxml with different code.
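The footnote's point can be modeled in a few lines: a platform-agnostic lock accepts several artifact hashes for one pinned project, so whichever accepted artifact the index happens to serve will pass the hash check. This is a hypothetical simulation, not Pip's actual selection logic; the filenames, placeholder digests, and the `pick_artifact` helper are all made up.

```python
from typing import Dict, List, Optional

# Hypothetical 2-style lock entry for one pinned project: every artifact hash
# the lock accepts. These are placeholders, not real sha256 digests.
LOCKED_HASHES = {"sha256:aaaa", "sha256:bbbb"}


def pick_artifact(available: Dict[str, str], preference: List[str]) -> Optional[str]:
    """Simulate an installer choosing one artifact for a pinned project.

    `available` maps filename -> hash currently served by the index, and
    `preference` is the installer's ordering of filenames (platform-specific
    first). Returns the first preferred file that passes the lock's hash
    check, or None if none does.
    """
    for filename in preference:
        digest = available.get(filename)
        if digest is not None and digest in LOCKED_HASHES:
            return filename
    return None


# Installer preference: the platform-specific wheel first, then the generic one.
PREFERENCE = ["lxml-platform.whl", "lxml-generic.whl"]

# Resolve 1: both locked artifacts are on the index.
day_1 = {"lxml-platform.whl": "sha256:aaaa", "lxml-generic.whl": "sha256:bbbb"}
# Resolve 2: the platform wheel was deleted from the index in the meantime.
day_2 = {"lxml-generic.whl": "sha256:bbbb"}

print(pick_artifact(day_1, PREFERENCE), pick_artifact(day_2, PREFERENCE))
```

Both resolves "succeed" against the same lock, yet they install different code; only comparing the two results reveals the difference.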

@jsirois
Member Author

jsirois commented Aug 16, 2021

Some gotchas to avoid in either the initial implementation or with follow-ups:

  1. Bifurcated resolves: The poetry export (and maybe lock) command does not handle dependencies with "bifurcated requirements". python-poetry/poetry#4381
  2. Environment marker explosion: Cannot add "pyobjc~=7.2; sys_platform == 'darwin'" pdm-project/pdm#449
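To make "bifurcated requirements" concrete, here is a hypothetical sketch of a lock that pins the same project at two different versions under disjoint Python version ranges, plus a lookup that picks the one pin applying to a given interpreter. The `Pin` type, the project name `example-dist`, and its versions are assumptions, and the (major, minor) ranges are a simplified stand-in for real `python_version` environment markers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PyVersion = Tuple[int, int]  # (major, minor) Python version


@dataclass(frozen=True)
class Pin:
    project_name: str
    version: str
    # The pin applies when min_python <= interpreter < max_python; a simplified
    # stand-in for a python_version environment marker.
    min_python: PyVersion
    max_python: PyVersion


def select_pin(pins: List[Pin], project_name: str, python: PyVersion) -> Optional[Pin]:
    """Return the single pin for project_name that applies to this interpreter."""
    matches = [
        pin
        for pin in pins
        if pin.project_name == project_name and pin.min_python <= python < pin.max_python
    ]
    if len(matches) > 1:
        raise ValueError(f"overlapping pins for {project_name}: the lock is ambiguous")
    return matches[0] if matches else None


def is_bifurcated(pins: List[Pin], project_name: str) -> bool:
    """A project is bifurcated if the lock pins it at more than one version."""
    return len({pin.version for pin in pins if pin.project_name == project_name}) > 1


# Hypothetical split: older interpreters need 1.0, newer ones get 2.0.
PINS = [
    Pin("example-dist", "1.0", (3, 6), (3, 7)),
    Pin("example-dist", "2.0", (3, 7), (4, 0)),
]
```

A lock format or exporter that keeps only one entry per project name would silently drop one branch of such a split.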

@jsirois
Member Author

jsirois commented Aug 16, 2021

There is one known case that is impossible to handle: when a #2-style ("platform agnostic") resolve needs to traverse an sdist. The sdist may need to be (partially) built to extract its Python version and dependency metadata (e.g., by executing python setup.py egg_info). If the Python needed by the setup.py does not exist on the machine generating the lock, the lock must fail. This should be a rare problem, since it seems ~all modern sdists already include a PKG-INFO file, which contains that metadata and can simply be read.
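Reading that metadata without running setup.py can be sketched with the standard library alone. This assumes a gzipped tarball sdist whose top-level `<name>-<version>/PKG-INFO` uses the core metadata header format; `read_sdist_metadata` is a hypothetical helper, not a Pex or Pip API.

```python
import io
import tarfile
from email.parser import HeaderParser
from typing import Any, Dict


def read_sdist_metadata(sdist_bytes: bytes) -> Dict[str, Any]:
    """Parse selected core metadata fields from an sdist's top-level PKG-INFO.

    Absent fields come back as None. In particular, requires_dist=None means
    "not recorded" (a build may still be needed), not "no dependencies".
    """
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as tf:
        # Only <name>-<version>/PKG-INFO, not nested *.egg-info/PKG-INFO copies.
        member = next(
            (
                m
                for m in tf.getmembers()
                if m.isfile() and m.name.count("/") == 1 and m.name.endswith("/PKG-INFO")
            ),
            None,
        )
        if member is None:
            raise ValueError("no top-level PKG-INFO found in sdist")
        fileobj = tf.extractfile(member)
        assert fileobj is not None  # regular file members always yield a file object
        raw = fileobj.read().decode("utf-8")
    # HeaderParser only parses the headers; the long description body is left
    # as an unparsed payload.
    headers = HeaderParser().parsestr(raw)
    return {
        "name": headers.get("Name"),
        "version": headers.get("Version"),
        "requires_python": headers.get("Requires-Python"),
        "requires_dist": headers.get_all("Requires-Dist"),
    }
```

Requires-Dist is an optional core metadata field, so treating its absence as "unknown" rather than "empty" is the safer choice when deciding whether the lock can proceed without a build.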

@jsirois
Member Author

jsirois commented Aug 17, 2021

This work seems separable into the following task graph:

  1. Platform dependent locks with requirements.txt compatible output: Implement platform specific requirements.txt lock file support. #1401
  2. a. Platform agnostic locks with requirements.txt compatible output: Implement platform agnostic requirements.txt lock file support. #1402
     b. PEP-665 compatible output: Consider implementing PEP-665 lock file format support. #1403
  3. a. Bifurcated resolve handling: Implement Bifurcated Resolve Handling #1404
     b. Environment marker explosion handling: Handle Platform Agnostic Lock Environment Marker Explosion #1405
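The problem behind task 3b can be illustrated with a deliberately naive strategy, assumed here only for illustration and not taken from Pex or PDM: compute a transitive dependency's marker as the OR, over every path from the root, of the AND of the edge markers along that path. With a chain of "diamonds", the number of paths (and therefore clauses) doubles with each layer even though the markers are highly redundant. The graph shape, node names, and helpers are hypothetical, and the walk assumes an acyclic graph.

```python
from typing import Dict, List, Tuple

# Hypothetical dependency graph: node -> [(child, marker on that edge)].
# An empty marker string means the edge is unconditional.
Graph = Dict[str, List[Tuple[str, str]]]


def naive_marker_clauses(graph: Graph, root: str, target: str) -> List[List[str]]:
    """Collect one clause per root->target path: the edge markers along that
    path, all of which must hold (an AND). The target's full marker is the OR
    of these clauses; nothing is simplified or deduplicated."""
    clauses: List[List[str]] = []

    def walk(node: str, path_markers: List[str]) -> None:
        if node == target:
            clauses.append(path_markers)
            return
        for child, marker in graph.get(node, []):
            walk(child, (path_markers + [marker]) if marker else path_markers)

    walk(root, [])
    return clauses


def render(clauses: List[List[str]]) -> str:
    """Render the OR-of-ANDs as one PEP 508-style marker string."""
    if any(not clause for clause in clauses):
        return ""  # some path is unconditional, so the dependency always applies
    return " or ".join("(" + " and ".join(f"({m})" for m in clause) + ")" for clause in clauses)


def diamond_chain(layers: int) -> Graph:
    """Build root -> {a0, b0} -> n0 -> {a1, b1} -> n1 -> ... with marked fan-out edges."""
    graph: Graph = {}
    prev = "root"
    for i in range(layers):
        graph[prev] = [(f"a{i}", 'sys_platform == "darwin"'), (f"b{i}", 'sys_platform == "linux"')]
        graph[f"a{i}"] = [(f"n{i}", "")]
        graph[f"b{i}"] = [(f"n{i}", "")]
        prev = f"n{i}"
    return graph
```

For example, three layers already yield 8 clauses of 3 markers each, and ten layers yield 1024; avoiding this growth means simplifying or deduplicating markers rather than concatenating them.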

Of these tasks, Pants only needs 1, 2a and 3a *.

* As far as I can tell, Pants doesn't actually need all of 3a; it needs only the subset of 3a where the bifurcation is in the top-level requirements themselves, not in interior nodes. That said, handling all of 3a is needed for correctness in all locks, so it should probably be implemented fully.

@jsirois
Member Author

jsirois commented Jul 14, 2024

I think that with the addition of --project in #2455, released in https://github.com/pex-tool/pex/releases/tag/v2.8.0, this issue and the associated project can be closed.

Projects
Status: Done
Development

No branches or pull requests

1 participant