Vendoring Python Wheels as Artifacts #1439

Open
staticfloat opened this issue Oct 10, 2019 · 2 comments

Comments

@staticfloat

We have a pretty good Python interop story, but it lacks many of the reproducibility guarantees that Pkg3 has. In particular, when Python packages come from the system, or even from Conda.jl, they are managed separately from the Julia dependencies, so the Julia and Python packages can drift out of sync and break. To resolve this, I propose a technique for creating Python "virtual environments" that give us fully controlled Python installations.

System design constraints:

  • pkg> instantiate should "just work". No matter how much time has passed, you must be able to get back the same Python packages you had before, so that PyCall, IJulia, etc. can all "just work" far into the future, no matter how much breaking progress the Python ecosystem experiences.

  • Upgrading Python packages should be simple. Not via pkg> upgrade, but through some relatively simple mechanism.

  • Isolation from the system. System Python packages should neither interfere with nor aid these packages in any way.

Reading this list of design constraints, you might think that this sounds an awful lot like what I've been working on for JLL packages/Pkg Artifacts, and you would be correct. At least I'm consistent in the kinds of ideas I come up with. Since Artifacts are the 'marteau du jour' (hammer of the day), as it were, let's recklessly apply them here and see what kind of system we can create:

  • Bundle a Python interpreter as an artifact, e.g. Python_jll. Not too difficult.

  • Translate Python packages into artifacts. Something like translate_py_pkg(name::String, version = nothing) would query PyPI's JSON API for the list of available versions, then generate an Artifacts.toml entry for that Python package by downloading, extracting, and tree-hashing it.

    • Pure-source Python packages are usually tarballs
    • Wheels are zipballs (we'll need .zip support for this.)
    • Explicitly do not support any kind of Python package that is neither pure-source nor a wheel. Anything else probably requires arbitrary code execution upon download.
  • Once Python packages are being downloaded as artifacts, we set PYTHONPATH appropriately before loading libPython or invoking python, so that these packages are found.

  • Future invocations of the Julia package manager will see these binary blobs that are attached to the current project, and will properly re-instantiate them from PyPI.
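The translation step could look roughly like the following Python sketch. It is a hypothetical stand-in for the proposed translate_py_pkg, not code from Pkg: it assumes the JSON document has already been fetched from PyPI's JSON API (`https://pypi.org/pypi/<name>/json`), keeps only wheels and tarball sdists, and renders an Artifacts.toml-style stanza. The `git-tree-sha1` value is passed in because computing it requires extracting the archive, which is not shown.

```python
from typing import Optional

# Suffixes accepted for pure-source releases; .zip sdists are skipped here,
# since the proposal only counts on tarballs for pure-source packages.
TARBALL_SUFFIXES = (".tar.gz", ".tar.bz2")


def supported_release_files(payload: dict, version: Optional[str] = None) -> list:
    """Pick the wheel and tarball files for one release from a parsed PyPI JSON payload."""
    if version is None:
        # Default to the version PyPI lists as current in the "info" section.
        version = payload["info"]["version"]
    chosen = []
    for f in payload["releases"].get(version, []):
        name = f["filename"]
        is_wheel = f["packagetype"] == "bdist_wheel" and name.endswith(".whl")
        is_tarball = f["packagetype"] == "sdist" and name.endswith(TARBALL_SUFFIXES)
        # Everything else (eggs, installers, zip sdists) is rejected outright.
        if is_wheel or is_tarball:
            chosen.append({"filename": name, "url": f["url"], "sha256": f["digests"]["sha256"]})
    return chosen


def render_artifact_entry(artifact_name: str, release_file: dict, tree_hash: str) -> str:
    """Render one Artifacts.toml-style stanza; artifact_name must be a bare TOML key."""
    return (
        f"[{artifact_name}]\n"
        f'git-tree-sha1 = "{tree_hash}"\n'
        "\n"
        f"    [[{artifact_name}.download]]\n"
        f'    url = "{release_file["url"]}"\n'
        f'    sha256 = "{release_file["sha256"]}"\n'
    )
```

Wheels are zip archives, so a wheel entry like this only becomes usable once artifact downloads can unpack .zip files, as the list above points out.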

There's some subtlety here related to the implicit Python compiler ABI. In particular, on Windows, Python builds assume MSVC, which is fine until you start compiling C++ code: Julia and its binary dependencies are built with MinGW GCC, whose C++ ABI is incompatible with MSVC's, so it's highly unlikely that Python wheels containing C++ code will link properly against Julia. This has never worked and probably never will, though, so we don't lose much here. C and Fortran code should interoperate just fine, so we should be okay for roughly 95% of what we want to do; if you want to do something more complicated, you can always spin up a properly compiled Python interpreter and communicate with it over a socket.
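Whether a given wheel is exposed to this ABI question at all can be read from its filename. The sketch below (helper names are hypothetical) splits a wheel filename into the tags defined by PEP 427, `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`, and treats only `none`/`any` wheels as pure Python. It does not expand compressed tag sets such as `py2.py3` and does not validate the tags themselves.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: str  # empty string when the optional build tag is absent
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, ver, py, abi, plat = parts
        build = ""
    elif len(parts) == 6:
        dist, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(dist, ver, build, py, abi, plat)


def has_compiled_code(tags: WheelTags) -> bool:
    """Only 'none'/'any' wheels are pure Python; anything else is tied to an ABI or platform."""
    return not (tags.abi == "none" and tags.platform == "any")
```

For example, a filename such as `numpy-1.17.2-cp37-cp37m-win_amd64.whl` carries the `cp37m` ABI tag and a Windows platform tag, so it would fall on the compiled side of this check, while `six-1.12.0-py2.py3-none-any.whl` would not.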

@kdheepak

This is very interesting and could be extremely useful in ensuring that we have reproducible scientific code when using Python.

Have you considered using conda in addition to wheels? There are numerous conda packages that do not have a wheel equivalent. Historically, at least, conda has supported scientific computing packages better than PyPI, although wheels have improved this a lot for users in recent years. Conda appears to have a similar API as well, but I'm not sure whether you would need to bundle a CondaPython_jll instead.

Also, I'm not sure exactly how wheels deal with non-Python dependencies. My understanding is that they vendor all dependencies in the wheel itself. So suppose that, in Julia, one wanted to interface with a Python package that uses the C API to talk to native libraries that depend on Boost, ZeroMQ, or other non-Python libraries: are you suggesting building those non-Python dependencies as separate dependency_jll packages, or taking them directly from the wheel?
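One quick way to see what a particular wheel vendors is to look inside it: a wheel is a zip archive, and any bundled native code (extension modules as well as vendored shared libraries) sits in it as ordinary files. The Python sketch below uses hypothetical helper names and a made-up archive layout standing in for a real download; matching on file suffixes is only a heuristic.

```python
import io
import zipfile

# File-name endings that indicate compiled native code inside a wheel.
NATIVE_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def native_files(wheel_file) -> list:
    """List archive members that look like extension modules or vendored shared libraries."""
    found = []
    with zipfile.ZipFile(wheel_file) as zf:
        for name in zf.namelist():
            base = name.rsplit("/", 1)[-1]
            # Versioned Linux libraries look like libfoo.so.5, so match ".so." as well.
            if base.endswith(NATIVE_SUFFIXES) or ".so." in base:
                found.append(name)
    return sorted(found)


def build_demo_wheel(members: dict) -> io.BytesIO:
    """Build an in-memory zip standing in for a downloaded wheel (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf


# Hypothetical layout: one extension module plus one vendored ZeroMQ build.
demo = build_demo_wheel({
    "demo/__init__.py": "",
    "demo/_core.cpython-37m-x86_64-linux-gnu.so": b"\x7fELF",
    "demo.libs/libzmq-1a2b3c4d.so.5.2.1": b"\x7fELF",
    "demo-1.0.dist-info/METADATA": "Name: demo\n",
})
print(native_files(demo))
```

If a wheel turns out to carry its own copy of a library like ZeroMQ, that copy is what the Python package will load, regardless of any separately built JLL.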

@tkf

tkf commented Oct 17, 2019

Note that a subset of the "System design constraints" is already achievable with a combination of pipenv and PYCALL_JL_RUNTIME_PYTHON: JuliaPy/PyCall.jl#578. However, this does not let us change or record the exact version of libpython.

For PyCall and its downstream packages, a fundamental building block we need is package options for configuring libpython per Julia environment: #458, JuliaLang/Juleps#38 (see also my proof-of-concept implementation in #1378).

  • Translate python packages into artifacts.

pip does a very good job of caching wheels across different environments. IIUC it uses content-addressable storage, just like Artifacts. It would be unfortunate for pip and Julia to duplicate the cache and waste download time. Why not just use pip download?

  • Once python packages are being downloaded as artifacts, we set PYTHONPATH appropriately

Why not use venv? Tweaking PYTHONPATH is not a good practice.
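For comparison, here is a minimal Python sketch of the venv route using only the standard library. The helper names and the `.pth` approach are assumptions for illustration, not something proposed in the thread: it creates an environment that ignores system site-packages, then exposes extracted artifact directories through a `.pth` file in that environment's site-packages, which the interpreter reads at startup, instead of through PYTHONPATH.

```python
import os
import subprocess
import venv


def make_isolated_env(env_dir: str) -> str:
    """Create a venv that ignores system site-packages and return its interpreter path."""
    venv.create(
        env_dir,
        system_site_packages=False,  # the default, spelled out: no system packages leak in
        symlinks=(os.name != "nt"),  # mirror the `python -m venv` CLI default per platform
        with_pip=False,              # keep the sketch fast; packages come from artifacts instead
    )
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def add_artifact_dirs(python_exe: str, artifact_dirs: list, pth_name: str = "julia_artifacts.pth") -> str:
    """Expose extracted artifact directories to the venv via a .pth file (hypothetical helper)."""
    # Ask the venv's own interpreter where its site-packages directory lives.
    purelib = subprocess.run(
        [python_exe, "-c", "import sysconfig; print(sysconfig.get_paths()['purelib'])"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    pth_path = os.path.join(purelib, pth_name)
    # Each line naming an existing directory is appended to sys.path at interpreter startup.
    with open(pth_path, "w") as fh:
        fh.write("\n".join(artifact_dirs) + "\n")
    return pth_path
```

Inside such an environment `sys.prefix` differs from `sys.base_prefix`, and its site-packages (including any `.pth` entries) is picked up automatically, so nothing has to be spliced into PYTHONPATH for every process that starts Python.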

Also, I think one missing step is resolution of Python package dependencies. Dependency resolution for Python packages is a hard problem because (IIRC) there is no central repository recording the entire dependency graph; you have to download a package to figure out its dependencies. Re-implementing this sounds like a lot of duplicated effort.
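To make the "download first" point concrete: a wheel declares its direct dependencies in the `Requires-Dist` headers of its `.dist-info/METADATA` file, which sits inside the archive (and for sdists the metadata may not be available without running the package's build code), so a resolver has to fetch an archive before it can take the next step through the graph. The Python sketch below uses hypothetical helper names and an in-memory stand-in for a downloaded wheel; it does not evaluate environment markers or pick versions, which is the hard part.

```python
import email.parser
import io
import zipfile


def wheel_requirements(wheel_file) -> list:
    """Return the Requires-Dist entries from a wheel's .dist-info/METADATA."""
    with zipfile.ZipFile(wheel_file) as zf:
        # The metadata lives in a top-level <name>-<version>.dist-info directory.
        candidates = [n for n in zf.namelist()
                      if n.count("/") == 1 and n.endswith(".dist-info/METADATA")]
        if len(candidates) != 1:
            raise ValueError("expected exactly one .dist-info/METADATA in the wheel")
        text = zf.read(candidates[0]).decode("utf-8")
    # Core metadata is a block of RFC 822-style headers; Requires-Dist may repeat.
    headers = email.parser.Parser().parsestr(text, headersonly=True)
    return headers.get_all("Requires-Dist") or []


def build_demo_wheel(members: dict) -> io.BytesIO:
    """Build an in-memory zip standing in for a downloaded wheel (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf


demo = build_demo_wheel({
    "demo/__init__.py": "",
    "demo-1.0.dist-info/METADATA": (
        "Metadata-Version: 2.1\n"
        "Name: demo\n"
        "Version: 1.0\n"
        "Requires-Dist: six (>=1.12)\n"
        'Requires-Dist: numpy; extra == "fast"\n'
        "\n"
        "Long description.\n"
    ),
})
print(wheel_requirements(demo))
```

Each returned requirement names another project whose archive must in turn be downloaded and inspected, which is why resolution cannot be done from an index listing alone.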

I think installing Python with Python_jll is a good idea. But, IMHO, a better direction for Python packages would be integration with a Python package manager like Pipenv or Poetry, both of which have interfaces very similar to Pkg.jl's. Adding some kind of hooks to Pkg.jl operations sounds like a better solution. See also: "Structured, Exchangeable lock file format (requirements.txt 2.0?)" in the Packaging category of Discussions on Python.org.

3 participants