Add unified lockfile, pip interoperability #124

Merged
merged 62 commits into from
Nov 29, 2021

Conversation

jvansanten
Contributor

@jvansanten jvansanten commented Nov 11, 2021

This PR does two major things that I have personally found useful:

  1. Adds support for pip dependencies (see "Support solving for dependencies that are only installable via pip", #122)
  2. Adds a structured, multi-platform lock file (see also "Allow customization and extension of lockfile metadata headers", #106, as well as "New lock file format", mamba-org/mamba#1209)

Along the way, it factors details of conda invocation and environment solving into dedicated submodules for easier reuse. This is particularly important for the PyPI solver, which uses poetry internally, but could potentially be replaced with something else.

Point 2 conflicts with existing work by @maresb in #106, which I had missed when I started on this, and so may have unintentionally duplicated. I would be happy to merge these efforts, as it appears that we made different choices, both for good reasons. In the following I will try to explain my choices.

My long-term goal for this is to be able to automatically update conda environments (including pip deps) used in CI with Renovate. For that I need to be able to:

  1. Solve for pip dependencies layered on top of the conda env.
  2. Update the current solution to use a newer version of a single target package, without upgrading packages outside the dependency graph of the target.
  3. Install different subsets of dependencies without re-solving (e.g. development deps in CI, main deps in production).
  4. Be able to specify all configuration info (such as channels and target platforms) in the source files.

Requirement 1 is addressed by including an (optional) pypi solver using poetry as described in #122.

Requirement 2 could be partially addressed using the existing env and explicit formats, but is significantly easier when the lock file can be parsed in a structured way, especially if the parser has to be written again in TypeScript for Renovate. Requirement 3 implies an intermediate file containing all dependencies that can be filtered at install time.

Requirement 4 is addressed by adding support for some extra sections (platforms in environment.yaml, tool.conda-lock.platforms and tool.conda-lock.channels in pyproject.toml) in source files.
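
As a rough illustration of the pyproject.toml side, the sketch below pulls the two keys named above out of an already-parsed document. The helper name `read_lock_config` and the empty-list fallbacks are assumptions made for this example, not the code in this PR.

```python
from typing import Any, Dict, List, Tuple


def read_lock_config(pyproject: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (platforms, channels) from a parsed pyproject.toml mapping.

    Hypothetical helper: missing keys fall back to empty lists here, which
    may differ from the defaults conda-lock actually applies.
    """
    section = pyproject.get("tool", {}).get("conda-lock", {})
    platforms = list(section.get("platforms", []))
    channels = list(section.get("channels", []))
    return platforms, channels


# Dict equivalent of a pyproject.toml containing a [tool.conda-lock] table.
example = {
    "tool": {
        "conda-lock": {
            "platforms": ["linux-64", "osx-64"],
            "channels": ["conda-forge"],
        }
    }
}
platforms, channels = read_lock_config(example)
```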

The new lock file is conda-lock.toml, for no better reason than that it is straightforward to extract information from TOML files with awk. YAML would also be fine. The lock contents are mostly a flat list of packages of the form:

[[package]]
name = "photospline"
version = "2.0.7"
manager = "conda"
platform = "linux-64"
url = "https://conda.anaconda.org/conda-forge/linux-64/photospline-2.0.7-py39ha552708_0.tar.bz2"
hash = "a8fcc6d1c0e2525159bd83643b2e636c"
optional = false
category = "main"

[package.dependencies]
cfitsio = ">=3.470,<3.471.0a0"
libgcc-ng = ">=9.3.0"
libstdcxx-ng = ">=9.3.0"
numpy = ">=1.19.5,<2.0a0"
python = ">=3.9,<3.10.0a0"
python_abi = "3.9.* *_cp39"
suitesparse = ">=5.7.2,<6.0a0"

This unified lock file is created implicitly by every invocation of conda-lock lock. If -k explicit or -k env is specified, the contents are also converted into either explicit or env formats for use with conda env create. There is also a conda-lock render command that produces an explicit or env lock file for each platform, filtering by platform, optional, and category. conda-lock install -f conda-lock.toml internally renders for the current platform, then passes the rendered content on to conda-lock.
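
To make the filtering step concrete, here is a minimal sketch of rendering an explicit file from the flat package list. The field names mirror the `[[package]]` entry above; the `LockedPackage` class, the `render_explicit` function, and the assumption that `hash` is an MD5 digest suitable for conda's `url#md5` explicit syntax are all illustrative and not taken from the PR.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LockedPackage:
    # Field names follow the [[package]] entry; the class itself is hypothetical.
    name: str
    version: str
    manager: str
    platform: str
    url: str
    hash: str
    optional: bool = False
    category: str = "main"


def render_explicit(
    packages: Iterable[LockedPackage],
    platform: str,
    categories: Iterable[str] = ("main",),
    include_optional: bool = False,
) -> str:
    """Filter by platform, category, and optional flag, then emit an explicit spec."""
    wanted = set(categories)
    lines: List[str] = ["@EXPLICIT"]
    for pkg in packages:
        if pkg.manager != "conda" or pkg.platform != platform:
            continue
        if pkg.category not in wanted:
            continue
        if pkg.optional and not include_optional:
            continue
        # Assumes the stored hash is an MD5 digest.
        lines.append(f"{pkg.url}#{pkg.hash}")
    return "\n".join(lines) + "\n"


lock = [
    LockedPackage(
        name="photospline",
        version="2.0.7",
        manager="conda",
        platform="linux-64",
        url="https://conda.anaconda.org/conda-forge/linux-64/photospline-2.0.7-py39ha552708_0.tar.bz2",
        hash="a8fcc6d1c0e2525159bd83643b2e636c",
    ),
    # Placeholder entry for a second platform; URL and hash are made up.
    LockedPackage(
        name="photospline",
        version="2.0.7",
        manager="conda",
        platform="osx-64",
        url="https://example.invalid/osx-64/photospline.tar.bz2",
        hash="0" * 32,
    ),
]
rendered = render_explicit(lock, platform="linux-64")
```

Because every entry carries its own platform and category, selecting a subset is a plain filter over the list and never needs the solver.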

Finally, conda-lock lock --update TARGET extracts the previous solution from the lock file, uses it to populate the metadata of a fake conda env, and updates the target package and any dependencies that need to be bumped. The procedure is similar for updating pypi dependencies.
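
One way to picture the "only touch the target's dependency graph" requirement is to compute the set of packages reachable from the target and hold everything else fixed. The sketch below does that over a plain name-to-dependencies mapping; the function names and the toy graph are hypothetical, and the PR itself delegates the actual update to the solver rather than to code like this.

```python
from typing import Dict, Iterable, Set


def dependency_closure(target: str, deps: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the target plus everything reachable through its dependencies."""
    seen: Set[str] = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(deps.get(name, ()))
    return seen


def packages_to_hold(target: str, deps: Dict[str, Iterable[str]]) -> Set[str]:
    """Locked packages outside the target's dependency graph, which should not move."""
    return set(deps) - dependency_closure(target, deps)


# Toy graph for illustration only; pytest is not part of the example in this PR.
graph = {
    "photospline": ["cfitsio", "numpy", "python"],
    "numpy": ["python"],
    "cfitsio": [],
    "python": [],
    "pytest": ["python"],
}
held = packages_to_hold("photospline", graph)
```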

What do you think?

In the absence of an external interface to pip's resolver (see e.g.
pypa/pip#7819), this uses Poetry's
resolution logic to convert pip requirements from environment.yaml
to either transitive dependencies (in the case of env output) or
direct references (in the case of explicit output).  In explicit
mode these are emitted as comment lines that `conda-lock install`
can unpack and pass to `pip install` inside of the target environment.

Dependencies marked with `source = "pypi"` are delegated to the pip
section of the generated conda env; all others are assumed to be
available from conda channels.

Pip only for now; conda update support is slightly trickier

Significantly faster and less memory-hungry than fetching the entire
index, at least for conda-forge.

conda-lock now records its solution in conda-lock.toml, in a form
roughly inspired by poetry.lock.  Each entry has a platform and a
category (e.g.  "main" or "dev"), which allows you to extract a
solution for a target platform and extras set without re-solving.
This can be done either with `conda-lock render`, creating an
environment or explicit lockfile that can be installed with vanilla
conda, or `conda-lock install` to render and install in one go.

This makes it possible for `conda-lock` to take all its configuration
from the source file.

Do not attempt to combine locked dependencies for different
platforms, as they can legitimately have different versions (e.g.
libgfortran5 for linux-64 is 6 major versions ahead of osx-64).
Instead, allow exactly one platform, url, and hash per item.

conda_lock/src_parser/pyproject_toml.py (outdated, resolved)
conda_lock/src_parser/pyproject_toml.py (outdated, resolved)
conda_lock/src_parser/pyproject_toml.py (resolved)
conda_lock/conda_lock.py (outdated, resolved)
conda_lock/conda_lock.py (outdated, resolved)


@contextmanager
def fake_conda_environment(locked: Iterable[LockedDependency], platform: str):
Collaborator


Neat solution

Collaborator


We could also potentially make use of $PREFIX/conda-meta/pinned (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html#preventing-packages-from-updating-pinning) if we want to prevent movement.

Contributor Author


I had forgotten about explicit pinning. It looks like conda and micromamba can be convinced to apply minimal updates, but I did end up resorting to pinning to make minimal updates work with mamba. What I really want is a kind of advisory pinning, i.e. "do not update this unless it prevents you from updating the actual target," but that probably only exists deep in the libsolv configuration that mamba uses.
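
For reference, hard pinning through `conda-meta/pinned` could look roughly like the sketch below: every locked package except the update target gets an exact-version line. The helper name `write_pinned` and the example versions are made up for illustration, and this is plain pinning, not the advisory behaviour described above.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def write_pinned(prefix: Path, versions: Dict[str, str], free: Iterable[str]) -> Path:
    """Write $PREFIX/conda-meta/pinned, pinning everything not listed in `free`."""
    free_names = set(free)
    meta = prefix / "conda-meta"
    meta.mkdir(parents=True, exist_ok=True)
    pinned = meta / "pinned"
    # One match spec per line, e.g. "numpy ==1.19.5".
    lines = [
        f"{name} =={version}"
        for name, version in sorted(versions.items())
        if name not in free_names
    ]
    pinned.write_text("\n".join(lines) + "\n")
    return pinned


# Illustrative versions only; photospline is the package allowed to move.
with tempfile.TemporaryDirectory() as tmp:
    path = write_pinned(
        Path(tmp),
        {"photospline": "2.0.7", "numpy": "1.19.5", "python": "3.9.7"},
        free=["photospline"],
    )
    pinned_text = path.read_text()
```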

Collaborator


yeah thats probably a far future thing. Maybe a @wolfv question

Collaborator


For weird reasons I sometimes cannot comment under a comment in github. Anyways, @mariusvniekerk I didn't quite get the

yeah thats probably a far future thing. Maybe a @wolfv question

What did you mean exactly? :)

I think that was referring to #124 (comment), which got chopped up between review rounds.

Yeah. It's not necessary for this change. There is probably some work needed on the mamba side to better support that for our purposes.

conda_lock/conda_solver.py (outdated, resolved)
conda_lock/invoke_conda.py (resolved)
conda_lock/pypi_solver.py (outdated, resolved)
conda_lock/pypi_solver.py (outdated, resolved)
@mariusvniekerk
Collaborator

Tests are failing due to assumptions of a py39 stdlib. Python 3.7 is still under support for conda-forge so we have to keep that one at least as a minimal python version requirement.

@jvansanten
Contributor Author

Tests are failing due to assumptions of a py39 stdlib. Python 3.7 is still under support for conda-forge so we have to keep that one at least as a minimal python version requirement.

Right, that was me being lazy. Will clean up.

@maresb
Contributor

maresb commented Nov 14, 2021

This looks very interesting. I'm very interested in what @mariusvniekerk would see as a good way forward between our two PRs and the discussion of lockfile formats. Whatever the direction is, I hope we can make some rich metadata for the lockfile format. Would it make sense for me to finish out my PR?

@jvansanten
Contributor Author

I'm reasonably happy with this now. It took a lot of incremental commits to get the tests to pass on Windows, so squashing is probably advisable.

This makes it possible for 3rd-party tools to find a single,
canonically-named lock file and use that to find the arbitrarily-named
source files it was created from.

in an attempt to get relative paths to work on Windows

PosixPath.resolve() happily returns an absolute path to a nonexistent
file if it has no parents, but WindowsPath.resolve() just stops if
strict=False.  This causes os.path.commonpath to choke on the
resulting relative path.
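
A platform-independent way around this, sketched below, is to make both paths absolute with `os.path.abspath`, which never consults the filesystem, before relating them. The function name `relative_source_path` is hypothetical and this is not necessarily the fix these commits applied.

```python
import os
from pathlib import Path


def relative_source_path(source: Path, lockfile: Path) -> str:
    """Path of `source` relative to the directory containing `lockfile`.

    os.path.abspath does not touch the filesystem, so it yields an absolute
    path on every platform even when the files do not exist yet.
    """
    source_abs = os.path.abspath(source)
    lock_dir = os.path.dirname(os.path.abspath(lockfile))
    # Normalise separators so the stored path looks the same on Windows.
    return Path(os.path.relpath(source_abs, lock_dir)).as_posix()


same_dir = relative_source_path(Path("envs/environment.yml"), Path("conda-lock.toml"))
parent_dir = relative_source_path(Path("environment.yml"), Path("locks/conda-lock.toml"))
```

Note that `os.path.relpath` still raises `ValueError` on Windows when the two paths live on different drives, so a caller may want to fall back to the absolute path in that case.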
@mariusvniekerk
Collaborator

🎆 Wow congratulations on slaying the beast

@maresb
Contributor

maresb commented Nov 19, 2021

Ya, this looks extremely impressive.

@mariusvniekerk
Collaborator

@jvansanten Feel free to squash whatever you want to squash. Going to go over this thing this afternoon.

conda_lock/src_parser/environment_yaml.py (resolved)
conda_lock/src_parser/lockfile.py (outdated, resolved)
conda_lock/conda_lock.py (outdated, resolved)
conda_lock/conda_lock.py (resolved)



conda_lock/pypi_solver.py (outdated, resolved)
@wolfv

wolfv commented Nov 23, 2021

For weird reasons I sometimes cannot comment under a comment in github. Anyways, @mariusvniekerk I didn't quite get the

yeah thats probably a far future thing. Maybe a @wolfv question

What did you mean exactly? :)

@jvansanten
Contributor Author

For weird reasons I sometimes cannot comment under a comment in github. Anyways, @mariusvniekerk I didn't quite get the

yeah thats probably a far future thing. Maybe a @wolfv question

What did you mean exactly? :)

I think that was referring to #124 (comment), which got chopped up between review rounds.
