Support solving for dependencies that are only installable via pip #122

Closed
jvansanten opened this issue Nov 2, 2021 · 2 comments

@jvansanten
Contributor

jvansanten commented Nov 2, 2021

The problem

I sometimes find myself in a situation where I would like to install the bulk of my dependencies via conda, since some of them are expensive-to-build extensions, but also have some dependencies that are only available on PyPI, or as source releases e.g. on GitHub. For example, this project depends on sncosmo (lots of Cython, takes ~15 minutes to compile from source in CI), but also on a random personal project whose name collides with an unrelated package on PyPI, and so would be impractical to upstream into conda-forge.

While I can always lock the conda deps and then pip install the remainder by hand, it would be attractive to be able to lock both in one go, such that dependencies installed in CI never change behind my back.

See also: #115

Sketch of a solution

  1. Split dependencies into conda and pip categories when parsing sources. For environment.yml the pip dependencies would be in the pip section, whereas for a poetry pyproject.toml, they would be those marked with {source = "pypi"}.
  2. Solve conda environment as usual.
  3. Solve pip dependencies based on the locked conda environment.
  4. Emit explicit pip dependencies into generated lock file. For environment.yml output these could be === version specifiers in the pip section, while for explicit output they could be in the form of a requirements.txt embedded in a commented-out section.
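
Step 1 of the sketch above could look roughly like the following for the environment.yml case. This is only an illustration, not the prototype's code: `split_dependencies` is a hypothetical name, and the input is assumed to be the mapping a YAML parser would return for an environment.yml, where the pip section appears as a `{"pip": [...]}` entry inside `dependencies`.

```python
from typing import Any, Dict, List, Tuple


def split_dependencies(env: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Split a parsed environment.yml mapping into (conda_specs, pip_specs).

    Hypothetical helper: plain string entries are treated as conda specs,
    and the nested {"pip": [...]} entry supplies the pip specs.
    """
    conda_specs: List[str] = []
    pip_specs: List[str] = []
    for entry in env.get("dependencies", []):
        if isinstance(entry, str):
            conda_specs.append(entry)
        elif isinstance(entry, dict):
            pip_specs.extend(entry.get("pip", []))
    return conda_specs, pip_specs


# Example input, shaped like the result of parsing an environment.yml.
example_env = {
    "name": "demo",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.9",
        "sncosmo",
        "pip",
        {"pip": ["multidict==5.1.0"]},
    ],
}

conda_specs, pip_specs = split_dependencies(example_env)
print(conda_specs)
print(pip_specs)
```

The conda list would then go to the conda solver as usual (step 2), and the pip list to whichever pip-style solver is available (step 3).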

Prototype implementation

I've been playing with a prototype for personal use that you can find here. The biggest hurdle here is that there is no good candidate for a pip-style dependency resolver. pip's own solver is stashed in pip._internal, making it fairly clear that it is not intended to have any kind of stable interface. poetry has its own solver that also makes no claims about interface stability, but also doesn't go to any trouble to discourage 3rd-party use, so I decided to use it for now. The implementation largely follows the sketch above. The pip dependencies are emitted into explicit lock files prefixed with # pip , e.g.

# pip multidict @ https://files.pythonhosted.org/packages/20/ac/ed5b78f2fb5420882cff10fddccd9c2d1edda2ac155ada2fe21dc15d48d3/multidict-5.1.0-cp39-cp39-manylinux2014_x86_64.whl \
# pip      --hash=sha256:ecc771ab628ea281517e24fd2c52e8f31c41e66652d07599ad8818abaad38cda

conda-lock install consumes these lines, synthesises a temporary requirements.txt out of them, and pip installs them in the target env. Alternatively, you can just cat conda-linux-64.lock | awk '/^# pip/ {print substr($0,7)}' > requirements.txt && pip install --no-deps -r requirements.txt. Since these are URL dependencies, pip does not have to solve anything. The only way it can fail is if the target URL is unavailable, or the hash does not match. These are fairly easy to diagnose.
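
For anyone who prefers not to use awk, the same extraction can be sketched in Python. This is a rough equivalent of the one-liner above rather than what conda-lock install does internally; `extract_pip_requirements` is a hypothetical name, and the conda URL in the sample lock text is a made-up placeholder. Unlike the awk pattern, it requires the trailing space after `# pip`.

```python
PIP_PREFIX = "# pip "


def extract_pip_requirements(lock_text: str) -> str:
    """Return the requirements.txt content embedded in an explicit lock file.

    Keeps only lines that start with "# pip " and strips that prefix,
    mirroring awk's substr($0, 7).
    """
    lines = [
        line[len(PIP_PREFIX):]
        for line in lock_text.splitlines()
        if line.startswith(PIP_PREFIX)
    ]
    return "\n".join(lines) + ("\n" if lines else "")


# Sample lock text; the conda URL is a placeholder, the pip lines are
# the ones shown in the example above.
lock_text = "\n".join(
    [
        "@EXPLICIT",
        "https://conda.anaconda.org/conda-forge/linux-64/placeholder-1.0-0.tar.bz2",
        "# pip multidict @ https://files.pythonhosted.org/packages/20/ac/"
        "ed5b78f2fb5420882cff10fddccd9c2d1edda2ac155ada2fe21dc15d48d3/"
        "multidict-5.1.0-cp39-cp39-manylinux2014_x86_64.whl \\",
        "# pip      --hash=sha256:"
        "ecc771ab628ea281517e24fd2c52e8f31c41e66652d07599ad8818abaad38cda",
    ]
)

requirements = extract_pip_requirements(lock_text)
print(requirements, end="")
```

The output can be written to requirements.txt and installed with pip install --no-deps -r requirements.txt, as in the shell version.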

The introduced poetry dependency is optional; if poetry is not installed, pip dependencies are ignored, and the same warning is printed as before.

In addition, conda-lock lock gains an --update option that lets you update the solution in a previously generated lock file, updating only the specified packages and their dependencies. This is potentially useful for testing incremental dependency updates, e.g. those generated by Dependabot or Renovate.

Open questions

I'm mildly satisfied with the solution I've come up with, but would like to gather some feedback before filing a PR, namely:

  1. Is this a useful feature?
  2. Is an optional dependency on poetry a dealbreaker? Is there a better way to solve for pip dependencies?
@wietsedv

wietsedv commented Dec 9, 2021

I would really love this feature to be released. I have actually built a proof-of-concept Conda manager (https://github.com/wietsedv/coman), mainly to add pip support, but I will abandon that project once conda-lock supports this.

I have three concerns that you might have addressed already (I did not look deeply into your PR).

  1. Relying on poetry is maybe not a perfect solution since poetry is more than just a dependency solver, but it would not be a dealbreaker for me. The obvious alternative would be https://github.com/jazzband/pip-tools. Not sure how poetry and pip-tools work under the hood exactly, so poetry might still be the best solution.
  2. How does this solution deal with version conflicts between PyPI and conda? If, for instance, a conda dependency and a pip dependency both require package X, I guess only conda.X gets installed? But what if pypi.X and conda.X are different versions, either due to version constraints or because the conda package is simply outdated (conda-forge can lag behind PyPI)?
  3. If a pip dependency has a subdependency X, is an attempt made to install that package through conda? Or just pypi? Conda would of course have my preference.

I'm sorry if these concerns have been addressed already. Good work in any case.

@jvansanten
Contributor Author

Thanks! The concerns you point out are only partially addressed in the PR. In the spirit of the rest of conda-lock I've tried to handle the 90% case, deferring (hopefully rare) edge cases to the future. Going point by point:

  1. I agree that poetry is a little heavy-weight; I simply picked it because I was moderately familiar with it and the interface appeared somewhat stable. In the implementation from #124 (Add unified lockfile, pip interoperability), solve_specs_for_arch calls a function solve_pypi that internally uses poetry if installed. I could imagine adding a version that instead uses pip-tools (or a possible pip-aware version of the mamba solver that I've tried to provoke @wolfv into writing), and allowing advanced users to specify which implementation to target, similarly to how they can choose between mainline conda, mamba, and micromamba now.
  2. You probably guessed the answer already: it doesn't deal with conflicts at all. Specifically, it deals with them similarly to the way that conda would if given an environment.yml with a pip section. The poetry solver gets a specification of all Python packages installed by conda, and adds any pip-specific deps (and their sub-deps, recursively). If the solution requires a version different from what conda solved for, it will be overwritten. One thing I did not notice until now is that dependency info is missing entirely: the poetry solver sees the python side of the conda environment, but does not get any information about the interdependencies, so would not even know if it did end up with a conflict. This is not great, but is possible to fix in a future release without changing the behavior in cases that are currently treated correctly.
  3. Sub-dependencies of pip dependencies will be installed with pip, as they would be in a direct conda install from an environment.yml with a pip section. As with conda, you can circumvent this for specific heavy-weight deps by adding them directly to the dependencies section of environment.yml, or the tool.conda-lock.dependencies section of pyproject.toml. In principle you could also imagine doing this iteratively: solve conda specs, solve pip, extract conda-packaged equivalents, solve conda again, solve pip, and so on until the conda solution converges. The tricky part is the stopping condition. For direct pip dependencies, the user has a way to communicate that a package has no conda equivalent, and so should be installed with pip. For sub dependencies, you would have to decide whether the conda solution has failed expectedly (you've hit a dependency that may or may not be packaged with conda) or unexpectedly (you have a genuine conflict). This seems hard to get right, and since I don't personally need this behavior (yet), I didn't try to implement it.
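
To make the iterative idea from point 3 concrete, here is a toy sketch of the loop. It is not implemented in the PR; every name in it (`solve_until_converged`, the stub solvers, the tiny package indexes, `mytool`) is hypothetical, and packages are reduced to bare names without versions. It also sidesteps the hard part described above: the stub conda solver never fails, so the question of expected versus unexpected failure does not arise.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Index = Dict[str, Set[str]]

# Toy package indexes mapping a package name to its direct dependencies.
CONDA_INDEX: Index = {
    "python": set(),
    "multidict": {"python"},
    "aiohttp": {"python", "multidict"},
}
PYPI_INDEX: Index = {
    "mytool": {"aiohttp"},  # stand-in for a pip-only personal project
    "aiohttp": {"multidict"},
    "multidict": set(),
}


def closure(roots: Iterable[str], index: Index) -> FrozenSet[str]:
    """Return the roots plus all of their transitive dependencies."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(index[name])
    return frozenset(seen)


def solve_conda(specs: FrozenSet[str]) -> FrozenSet[str]:
    """Stub conda solver: everything reachable in the conda index."""
    return closure(specs, CONDA_INDEX)


def solve_pip(specs: FrozenSet[str], installed: FrozenSet[str]) -> FrozenSet[str]:
    """Stub pip solver: what pip must add on top of the conda packages."""
    return closure(specs, PYPI_INDEX) - installed


def solve_until_converged(
    conda_specs: Iterable[str],
    pip_specs: Iterable[str],
    has_conda_equivalent: Callable[[str], bool],
    max_rounds: int = 10,
) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Alternate conda and pip solves until no pip sub-dependency can move to conda."""
    conda_wanted = set(conda_specs)
    pip_wanted = frozenset(pip_specs)
    for _ in range(max_rounds):
        conda_solution = solve_conda(frozenset(conda_wanted))
        pip_solution = solve_pip(pip_wanted, conda_solution)
        # Direct pip deps stay with pip: the user asked for that explicitly.
        promotable = {
            name
            for name in pip_solution
            if name not in pip_wanted and has_conda_equivalent(name)
        }
        if promotable <= conda_wanted:
            return conda_solution, pip_solution
        conda_wanted |= promotable
    raise RuntimeError("conda/pip solutions did not converge")


conda_solution, pip_solution = solve_until_converged(
    conda_specs={"python"},
    pip_specs={"mytool"},
    has_conda_equivalent=lambda name: name in CONDA_INDEX,
)
print(sorted(conda_solution))
print(sorted(pip_solution))
```

In this toy case the loop converges in two rounds, leaving only the direct pip dependency for pip to install. With real solvers, a failed conda solve after promoting a sub-dependency would need the expected/unexpected distinction discussed above.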

The only issues standing in the way of a release from my perspective are:

  • The lock file format. It's versioned, so is possible to change in the future, but it's always better not to have to. I think the discussion surrounding it (which, for reasons, has ended up in mamba-org/mamba#1209, "New lock file format") has nearly converged, though.
  • Poetry extra: right now conda-lock depends on poetry with no version constraint. For future-proofing, it would probably be better to explicitly depend on the 1.1 series in case internals break with the upcoming 1.2 release.

@mariusvniekerk do you see any other blockers?
