
v0.10.4 with new build strategy #43

Merged
merged 49 commits into from
Sep 16, 2022

Conversation


@ngam ngam commented Aug 28, 2022

fixes #38

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.


ngam commented Aug 28, 2022

@conda-forge-admin, please rerender.


ngam commented Aug 28, 2022

@mkitti and @MilesCranmer this works as expected. But have a look inside the logs...


DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/MultivariatePolynomials/1bIGc/test/utils.jl.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/StructArrays/w2GaP/src/structarray.jl.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/InverseFunctions/clEOM/src/test.jl.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/Zygote/qGFGD/docs/src/internals.md.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/IRTools/017wp/src/passes/relooper.jl.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/StaticArraysCore/dagUH/Project.toml.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/Transducers/HBMTc/benchmark/bench_sum_transpose.jl.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/StaticArrays/68nRv/src/MMatrix.jl.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/Bijections/IWrOY/.travis.yml.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/PyCall/ygXW2/src/startup.jl.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/clones/418375808345381708/refs/tags/v0.3.3.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/JSON/NeJ9k/test/parser/null.jl.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/StatsAPI/y7Ydc/src/regressionmodel.jl.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/StatsBase/XgjIN/docs/src/deviation.md.  Including it as-is.
DEBUG:conda_build.noarch_python:Don't know how to handle file: share/julia/packages/Metatheory/CduBp/src/Library.jl.  Including it as-is.

also @ocefpaf


ngam commented Aug 28, 2022

This is obviously just a draft; I will rework it completely so that we do this more properly. Some issues to address:

  • no longer noarch
  • how reliable are the packaged julia artifacts?

@ngam ngam mentioned this pull request Aug 28, 2022

ngam commented Aug 29, 2022

@mkitti please see how it is structured: if you download the artifacts from https://artprodeus21.artifacts.visualstudio.com/A910fa339-c7c2-46e8-a579-7ea247548706/84710dde-1620-425b-80d0-4cf5baca359d/_apis/artifact/cGlwZWxpbmVhcnRpZmFjdDovL2NvbmRhLWZvcmdlL3Byb2plY3RJZC84NDcxMGRkZS0xNjIwLTQyNWItODBkMC00Y2Y1YmFjYTM1OWQvYnVpbGRJZC81NTk1ODkvYXJ0aWZhY3ROYW1lL2NvbmRhX2FydGlmYWN0c18yMDIyMDgyOC42LjFfbGludXhfNjRf0/content?format=zip you will find this package under "broken"; unzip it to see how it got structured.

The test still fails with pycall not being installed correctly....


mkitti commented Aug 29, 2022

The depot we package should only contain two directories.

  1. packages
  2. artifacts

That's why I created a temporary directory for the depot and then only copied those two directories.
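The temporary-depot approach described above can be sketched roughly as follows. This is an illustrative stand-in, not the recipe's actual code: `SCRATCH_DEPOT` and `PKG_DEPOT` are hypothetical names, and throwaway `mktemp` directories stand in for the real build prefix.

```shell
# Populate a scratch depot at build time, then ship only the two
# directories we care about (packages/ and artifacts/).
SCRATCH_DEPOT="$(mktemp -d)"
PKG_DEPOT="$(mktemp -d)/depot"

# Stand-in for a depot populated by the Julia package manager.
mkdir -p "${SCRATCH_DEPOT}/packages" "${SCRATCH_DEPOT}/artifacts" \
         "${SCRATCH_DEPOT}/logs" "${SCRATCH_DEPOT}/compiled"

# Copy only the directories worth packaging into the depot we ship.
mkdir -p "${PKG_DEPOT}"
cp -r "${SCRATCH_DEPOT}/packages" "${SCRATCH_DEPOT}/artifacts" "${PKG_DEPOT}/"

ls "${PKG_DEPOT}"   # only packages/ and artifacts/ remain
```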

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe) and found some lint.

Here's what I've got...

For recipe:

  • Non noarch packages should have python requirement without any version constraints.
  • Non noarch packages should have python requirement without any version constraints.

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

conda-forge-webservices[bot] and others added 2 commits August 29, 2022 17:14

@mkitti mkitti left a comment


We need to add ${PREFIX}/share/SymbolicRegression.jl/depot to the JULIA_DEPOT_PATH environment variable with : separators.

recipe/build.sh Outdated
@@ -8,6 +8,6 @@ ${PYTHON} -c 'import pysr; pysr.install()'

ls ${PREFIX}/share/julia

rm -rf ${PREFIX}/share/julia/!(packages|artifacts)
rm -rf '${PREFIX}/share/julia/!(packages|artifacts)'
Contributor


This line does not seem to do what you want.

+ rm -rf '${PREFIX}/share/julia/!(packages|artifacts)'
+ ls /home/conda/feedstock_root/build_artifacts/pysr_1661794169440/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/share/julia
registries
scratchspaces
stdlib
test
artifacts
base
base.cache
clones
compiled
conda
environments
julia-config.jl
logs
packages
prefs
registries
scratchspaces
stdlib
test
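The quoting is the likely culprit here: single quotes suppress both the `${PREFIX}` expansion and the `!(...)` pattern, so `rm` receives a literal string and deletes nothing; even unquoted, `!(...)` only works in bash with extglob enabled. A more portable way to express "delete everything except `packages/` and `artifacts/`" uses `find`. Sketch only; `PREFIX` below is a throwaway directory for illustration, not the build prefix:

```shell
# Create a throwaway prefix with a mix of directories, as in the build depot.
PREFIX="$(mktemp -d)"
mkdir -p "${PREFIX}/share/julia/packages" "${PREFIX}/share/julia/artifacts" \
         "${PREFIX}/share/julia/logs" "${PREFIX}/share/julia/compiled"

# Delete every top-level entry except packages/ and artifacts/,
# without relying on bash's extglob option.
find "${PREFIX}/share/julia" -mindepth 1 -maxdepth 1 \
  ! -name packages ! -name artifacts -exec rm -rf {} +

ls "${PREFIX}/share/julia"   # only packages/ and artifacts/ remain
```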

Contributor


Suggested change
rm -rf '${PREFIX}/share/julia/!(packages|artifacts)'
mkdir -p "${PREFIX}/share/SymbolicRegression.jl/depot"
mv "${PREFIX}/share/julia/packages" "${PREFIX}/share/SymbolicRegression.jl/depot/packages"
mv "${PREFIX}/share/julia/artifacts" "${PREFIX}/share/SymbolicRegression.jl/depot/artifacts"
rm -rf "${PREFIX}/share/julia"

This probably should be moved into a SymbolicRegression.jl-feedstock, and we should probably try to have an independent PyCall.jl-feedstock.

Contributor Author


This probably should be moved into a SymbolicRegression.jl-feedstock, and we should probably try to have an independent PyCall.jl-feedstock.
Yeah... but that defeats the point right?

Okay, I understand what you're getting at, but I want to make it work without having to create an alternative depot; we already have a predefined depot. The larger problem here is that Pkg was designed for "projects" while our whole setup is for "envs"; the two are interchangeable to an extent, but they're quite different in how they're fundamentally set up.

My question is, why is it not working now? What are we missing?

Contributor


Note that Julia depots stack just like Julia environments stack. The mechanism is used for system installed packages, which is kind of what we are doing.

We probably need the environments folder: share/julia/environments/pysr-0.10.1 and/or need to activate that environment.

Basically we need to satisfy
https://github.com/MilesCranmer/PySR/blob/c3dc203265816db4df8d38e7ea29419eafdb2778/pysr/julia_helpers.py#L87

The last expression here should return true.

    from julia.core import JuliaInfo, UnsupportedPythonError
    info = JuliaInfo.load(julia="julia")
    info.is_pycall_built()

is_pycall_built is from here:
https://github.com/JuliaPy/pyjulia/blob/87c669e2729f9743fe2ab39320ec9b91c9300a96/src/julia/juliainfo.py#L134-L135


@mkitti mkitti Aug 29, 2022


Looks like there is a share/julia/conda which is important.

$ ls -R share/julia/conda
share/julia/conda:
3  deps.jl

share/julia/conda/3:

$ cat share/julia/conda/deps.jl 
const ROOTENV = "/home/mkitti/mambaforge/envs/pysr/share/julia/conda/3"
const MINICONDA_VERSION = "3"
const USE_MINIFORGE = true
const CONDA_EXE = "/home/mkitti/mambaforge/envs/pysr/share/julia/conda/3/bin/conda"

Without that folder, PyCall.jl cannot precompile because Conda.jl cannot precompile.

julia> using PyCall
[ Info: Precompiling PyCall [438e738f-606a-5dbb-bf0a-cddfbfd45ab0]
ERROR: LoadError: ArgumentError: Path to conda environment is not valid: /home/mkitti/mambaforge/envs/pysr/share/julia/conda/3
Stacktrace:
 [1] prefix(path::String)
   @ Conda ~/mambaforge/envs/pysr/share/pysr_depot/packages/Conda/x2UxR/src/Conda.jl:42
 [2] top-level scope
   @ ~/mambaforge/envs/pysr/share/pysr_depot/packages/Conda/x2UxR/src/Conda.jl:47
 [3] include
   @ ./Base.jl:418 [inlined]
 [4] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::String)
   @ Base ./loading.jl:1318
 [5] top-level scope
   @ none:1
 [6] eval
   @ ./boot.jl:373 [inlined]
 [7] eval(x::Expr)
   @ Base.MainInclude ./client.jl:453
 [8] top-level scope
   @ none:1
in expression starting at /home/mkitti/mambaforge/envs/pysr/share/pysr_depot/packages/Conda/x2UxR/src/Conda.jl:1
ERROR: LoadError: Failed to precompile Conda [8f4d0f93-b110-5947-807f-2305c1781a2d] to /home/mkitti/mambaforge/envs/pysr/share/julia/compiled/v1.7/Conda/jl_gZtz9C.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, ignore_loaded_modules::Bool)
    @ Base ./loading.jl:1466
  [3] compilecache(pkg::Base.PkgId, path::String)
    @ Base ./loading.jl:1410
  [4] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:1120
  [5] require(uuidkey::Base.PkgId)
    @ Base ./loading.jl:1013
  [6] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:997
  [7] include
    @ ./Base.jl:418 [inlined]
  [8] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
    @ Base ./loading.jl:1318
  [9] top-level scope
    @ none:1
 [10] eval
    @ ./boot.jl:373 [inlined]
 [11] eval(x::Expr)
    @ Base.MainInclude ./client.jl:453
 [12] top-level scope
    @ none:1
in expression starting at /home/mkitti/mambaforge/envs/pysr/share/pysr_depot/packages/PyCall/ygXW2/src/PyCall.jl:1
ERROR: Failed to precompile PyCall [438e738f-606a-5dbb-bf0a-cddfbfd45ab0] to /home/mkitti/mambaforge/envs/pysr/share/julia/compiled/v1.7/PyCall/jl_QiBAvQ.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, ignore_loaded_modules::Bool)
   @ Base ./loading.jl:1466
 [3] compilecache(pkg::Base.PkgId, path::String)
   @ Base ./loading.jl:1410
 [4] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:1120
 [5] require(uuidkey::Base.PkgId)
   @ Base ./loading.jl:1013
 [6] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:997

Contributor


That deps.jl was generated without our recent activate.sh modifications. With the modifications, we should end up with the following, which looks better.

const ROOTENV = "/home/mkitti/mambaforge/envs/pysr"
const MINICONDA_VERSION = "3"
const USE_MINIFORGE = true
const CONDA_EXE = "/home/mkitti/mambaforge/bin/conda"

recipe/build.sh Outdated

rm -rf '${PREFIX}/share/julia/!(packages|artifacts)'

ls ${PREFIX}/share/julia
Contributor


Suggested change
ls ${PREFIX}/share/julia
ls ${PREFIX}/share/SymbolicRegression.jl/depot

@@ -0,0 +1 @@
export JULIA_DEPOT_PATH="${JULIA_DEPOT_PATH}:${PREFIX}/share/SymbolicRegression.jl/depot"
Contributor


Appending the depot is not going to do anything if we do not put anything there.

(pysr) $ ls share/julia
base  base.cache  environments  julia-config.jl  logs  stdlib  test

(pysr) $ ls share/SymbolicRegression.jl/depot
artifacts  conda  environments  packages

The one thing we may need to do is add PyCall.jl to the environment. That is

julia -e "using Pkg; Pkg.add(\"PyCall\")"

This should not download anything. It should just add PyCall to Project.toml.
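For context, the depot-path export is usually paired with a deactivate script that restores the previous value. A sketch of such a pair, assuming conventional `activate.d`/`deactivate.d` file names and illustrative values for `CONDA_PREFIX` and the pre-activation `JULIA_DEPOT_PATH` (not taken from this PR):

```shell
CONDA_PREFIX="/opt/conda/envs/pysr"    # illustrative value
JULIA_DEPOT_PATH="${HOME}/.julia"      # illustrative pre-activation value

# --- etc/conda/activate.d/pysr-depot.sh (hypothetical file name) ---
# Remember the old value, then append the packaged depot with a ':' separator.
_OLD_JULIA_DEPOT_PATH="${JULIA_DEPOT_PATH:-}"
JULIA_DEPOT_PATH="${JULIA_DEPOT_PATH:+${JULIA_DEPOT_PATH}:}${CONDA_PREFIX}/share/SymbolicRegression.jl/depot"
export JULIA_DEPOT_PATH
echo "${JULIA_DEPOT_PATH}"

# --- etc/conda/deactivate.d/pysr-depot.sh (hypothetical file name) ---
# Restore the pre-activation value on deactivation.
JULIA_DEPOT_PATH="${_OLD_JULIA_DEPOT_PATH}"
unset _OLD_JULIA_DEPOT_PATH
```

The `${JULIA_DEPOT_PATH:+...:}` guard avoids emitting a leading `:` when the variable was previously empty.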

Contributor


Maybe we should add PyCall.jl to the @pysr-0.10.1 environment instead of the default one named after the conda environment.

Contributor Author


Yes, I got confused. I was looking for share/SymbolicRegression.jl/depot but couldn't find it, so I figured I might as well add it for now.

But this still doesn't answer my question... or maybe adding PyCall is the answer? I thought it was already added? I will add it now and test. I will also add you to my fork


mkitti commented Aug 29, 2022

How do you want to populate the new depot? I don't see code to do that at the moment.


ngam commented Aug 29, 2022

How do you want to populate the new depot? I don't see code to do that at the moment.

For now, I am simply populating the main julia depot, i.e. under share/julia. Or am I misunderstanding things? It seems to be working insofar as it is packaging everything. There is a minor issue of cleaning this up eventually, but that's a separate matter; first I want to get it to work. From the standpoint of conda-forge, it doesn't matter where the depot is as long as it is under $PREFIX. The only thing I want is for julia, etc. to be aware of all these things we are packaging.

My worry is that during the packaging step, a lot of patching and editing is happening that may be corrupting these packaged julia entities.


ngam commented Aug 29, 2022

e.g.

INFO:conda_build.build:Packaging pysr
Packaging pysr-0.10.1-py310hff52083_0
INFO:conda_build.build:Packaging pysr-0.10.1-py310hff52083_0
compiling .pyc files...
number of files: 4069
WARNING :: get_rpaths_raw()=[] and patchelf=['$ORIGIN'] disagree for /home/conda/feedstock_root/build_artifacts/pysr_1661815576484/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/share/julia/artifacts/f3bb2ffc2ce484352a9fabe39f3ac6516c14f259/lib/libLLVMExtra-13.so :: 
patchelf: open: Permission denied
WARNING :: get_rpaths_raw()=[] and patchelf=[''] disagree for /home/conda/feedstock_root/build_artifacts/pysr_1661815576484/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/share/julia/artifacts/abf4b5086b4eb867021118c85b2cc11a15b764a9/lib/libopenspecfun.so.1.4 :: 
patchelf: open: Permission denied
WARNING :: Failed to get_static_lib_exports(/home/conda/feedstock_root/build_artifacts/pysr_1661815576484/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/lib/libbz2.a)
Unknown format


ngam commented Aug 29, 2022

I don't see the difference between creating a new depot and keeping the default depot. Maybe I am not following your logic.


ngam commented Aug 29, 2022

PyCall is definitely under $PREFIX, but the check doesn't seem to see it...

share/julia/compiled/v1.8/PyCall/GkzkC_AoYtv.ji (binary): Patching


mkitti commented Sep 12, 2022

Why can this not be done in an activate script?

You want to run the Julia package manager every time someone activates an environment?

Perhaps you mean a post-link script?

In that case, I'm taking the advice from here:
https://docs.conda.io/projects/conda-build/en/latest/resources/link-scripts.html

Post-link and pre-unlink scripts should:

  • Be avoided whenever possible.
  • Not touch anything other than the files being installed.
  • Not write anything to stdout or stderr, unless an error occurs.
  • Not depend on any installed or to-be-installed conda packages.
  • Depend only on simple system tools such as rm, cp, mv, and ln.

If this doesn't apply, we could certainly take that approach as we started to in #41.


isuruf commented Sep 12, 2022

You want to run the Julia package manager every time someone activates an environment?

Only if the package manager has not yet been run in that environment.
Post-link scripts apply to a single conda package, so they don't give an overview of the whole environment.


cjdoris commented Sep 12, 2022

What guarantees that Pkg will use bundled versions of the packages? If one of the dependencies gets updated in the future, then Julia will end up installing it from the internet instead of from the stacked depot, right? And if you're OK with that, then what was the point of bundling the dependencies in the first place?


cjdoris commented Sep 12, 2022

Also how do you deal with packages that may have incompatible mutual dependencies? e.g. if you release packages A and B, which depend on package C, how do you ensure that C is compatible with both versions of A and B that are installed? The logical answer seems to be that you need C to be a Conda package too. The conclusion of which is that you need to release all Julia packages you depend on as individual Conda packages, if you want to ensure compatible versions. This seems like a huge undertaking, similar in scope to conda-forge itself.

@MilesCranmer MilesCranmer mentioned this pull request Sep 12, 2022

mkitti commented Sep 12, 2022

Part of the point here is to deliver the PySR software ready-to-use, in a solution that has been fully tested by conda-forge. By invoking the Julia package manager at conda package build time, we can control the versions included, pinning the dependencies as needed. The status quo before this pull request is that the user has to take additional deliberate steps to make this package function. As such, Julia package management lies completely outside of conda-forge's control, and the specific configuration that the user ends up with may never have been tested together.

I concur that packaging individual Julia packages would be ideal. To support PySR, though, this would require adding ~100 individual packages, which is a tremendous barrier to getting started. What I'm trying to sort out here is a way to get started while delivering a practical solution.

What guarantees that Pkg will use bundled versions of the packages? If one of the dependencies gets updated in the future, then Julia will end up installing it from the internet instead of from the stacked depot, right? And if you're OK with that, then what was the point of bundling the dependencies in the first place?

We package a complete environment with a Project.toml and Manifest.toml, so that specifies the exact package versions to use. That complete environment is the default used by PySR, and it is "installed" after conda finishes. As time goes on, this package will be rebuilt with new Julia dependencies. The point of bundling the dependencies in the first place is that conda and conda-forge now manage those files, and the bundled solution has been tested as part of the CI here. There is some appeal in that. In my understanding, this is what conda-forge is about and what separates it from just using conda with any other channel: conda-forge aims to deliver a completely tested solution built from source.

Also how do you deal with packages that may have incompatible mutual dependencies? e.g. if you release packages A and B, which depend on package C, how do you ensure that C is compatible with both versions of A and B that are installed? The logical answer seems to be that you need C to be a Conda package too. The conclusion of which is that you need to release all Julia packages you depend on as individual Conda packages, if you want to ensure compatible versions. This seems like a huge undertaking, similar in scope to conda-forge itself.

I discussed this above. We are currently addressing an N = 2 situation; the other conda package is https://github.com/conda-forge/xbitinfo-feedstock . By bundling the packages at build time, we have a chance to reconcile the package versions. I've done the analysis in this case, and we only need to pin two package versions: Compat.jl and DocStringExtensions.jl. All other packages are compatible at their latest versions.

The next targets for packaging are those at the intersection of these two packages.

Let's consider another question for a second. Why should conda-forge package pure Python or R packages? Why not leave that to pip or poetry? I think this is exactly analogous to the Julia package situation. If I can just use the Julia package manager in an activate script, why not use pip in an activate script? That might work in another conda channel, but my understanding is that is not how conda-forge works. Maybe I'm wrong.


mkitti commented Sep 12, 2022

@ngam I bumped this to version v0.11.0 and build 1, anticipating that #47 will be merged before this one. Please update and resolve conflicts as needed.


ngam commented Sep 13, 2022

Should we rebase and start a new PR, @isuruf and @cjdoris?

I just want to add one point to the discussion. In my mind, a super important part of conda-forge is relocatability. Let me illustrate. Both @MilesCranmer and I are affiliated with the same academic institution, where our HPC compute nodes have no internet access (login nodes do). Say I want to use this great and powerful package (pysr): I get it from conda-forge on my login node, then get an allocation to run some interesting code on the compute nodes (our HPC uses slurm). But it will give me a silent error about not having the necessary deps, wasting my time and causing confusion (now think of an average user who expects it to just work, not us maintainers and developers).

For me personally, this is my main motivation here. Every time conda create -n some_env some_pkg ... gets triggered, I want some_env to be able to just work without any further modification. Not only that, I also want it to work across login nodes and compute nodes. I think this is one of the greatest and best things about conda-forge. Without this PR, this package (and others like it) require the user to do an extra step. I simply don't think that extra step should exist.

If I understand correctly, the user can simply do another round of pysr.install() if they want right? Or they can just do whatever they want using Pkg as well. Here, we simply want to get the user something that just works out-of-the-box because we could and should.

Edit: this relocatability argument extends to making containers as well.


ngam commented Sep 13, 2022

Why can this not be done in an activate script?

An activate script only moves the problem around by a tiny bit and doesn't seem to be recommended in this case; moreover, see my long-running obsession with login vs compute nodes: #43 (comment)


ngam commented Sep 13, 2022

Let's consider another question for a second. Why should conda-forge package pure Python or R packages? Why not leave that to pip or poetry? I think this is exactly analogous to the Julia package situation. If I can just use the Julia package manager in an activate script, why not use pip in an activate script? That might work in another conda channel, but my understanding is that is not how conda-forge works. Maybe I'm wrong.

Exactly. We have really good tools and we should use them to great effect. Otherwise, just use pip or clone the repo, etc. We are not trying to be just another pip, because there is no need; pip is great at its thing.


cjdoris commented Sep 14, 2022

To clear up any confusion, I am already pro this proposal. Having one package manager be able to resolve both Python and Julia (and other) dependencies for us sounds great. I'm just trying to figure out how it works in practice and if it is scalable to many packages.

We package a complete environment with a Project.toml and Manifest.toml, so that specifies the exact package versions to use. That complete environment is the default used by PySR and it is "installed" after conda finishes. As time goes on this package will be rebuilt with new Julia dependencies.

Is this literally the Manifest created when you build PySR? What happens when we use Conda to install another Julia package with its own Manifest? It's not possible to instantiate two Manifests, do you merge them somehow?

A few smaller questions:

  • As discussed, all Julia dependencies will ultimately need to be added, namely 100s or 1000s of packages. Are there plans/ambitions to automate this? Otherwise it is a huge undertaking.
  • Currently you package all packages/artifacts in the depot. But if (in the future) all Julia dependencies are their own Conda packages, you only actually need whichever packages/artifacts are new in the depot.
  • Is there a reason you don't install the packages/artifacts into the existing depot? The Julia distro in conda-forge already creates its own depot in the Conda environment, and these packages/artifacts will have distinct paths, so this should work fine I believe. In a world where every Julia package is a Conda package, we probably don't want 100s of depots stacked up. Thinking about it, separate depots is probably necessary in the current setup just because there are a few common dependencies, so there would indeed be a few file clashes, but this problem would go away in a world where every Julia package is a Conda package.


cjdoris commented Sep 14, 2022

Thinking out loud, just as a point of comparison, there is also my JuliaPkg Python package for managing Julia dependencies. This is the default way that JuliaCall manages its dependencies. It basically provides a way for a Python package to declare any Julia dependencies it needs, and then JuliaPkg will ensure those dependencies are met.

Some pros of JuliaPkg compared to the conda-forge proposal:

  • It's very simple, just a pure-python package, which doesn't require Conda.
  • It is based on Pkg, Julia's package manager, so should be very reliable and non-surprising.
  • There is no separate registry of packages required (whereas the conda-forge way requires registering all Julia packages again).

Some cons:

  • It IS possible to get into an inconsistent state using JuliaPkg, because two Python packages may declare incompatible Julia dependencies. But this will raise an error when dependencies are resolved, leaving it to the user to fix things (unlike say Pip which can just install an inconsistent set of packages) so at least it is not a silent error.
    • However I don't anticipate this will be a problem very often in practice because JuliaPkg/JuliaCall is mostly intended for writing wrapper packages, which will only normally have one dependency (the Julia package being wrapped).
    • It can also be solved by creating Python packages such as juliapkg-SymbolicRegression which just declare a JuliaPkg dependency on SymbolicRegression at a particular version. Then the Python package manager can ensure all the Julia packages are compatible too. This is a similar complexity of effort to creating a Conda package for every Julia package.
  • There is a post-install step (juliapkg.resolve()) to resolve/install Julia dependencies, whereas the conda-forge way has everything already installed.
  • If a Python package using JuliaPkg depends on some Julia package which itself depends on some Python package, then the Python->Python dependency is not visible to the Python package manager, so will be missed, which will cause some runtime error. Similarly for Julia->Python->Julia dependency chains. The Conda way does not have this issue.
    • Again, it can be solved by having Python packages for each Julia package, because then these can also declare their own Python dependencies.

So if we instead just used JuliaPkg everywhere, with the addition of juliapkg-X Python packages for every Julia package X to declare a dependency on X at a specific version, then we get something very similar to the current proposal. This has one key down-side, namely there is a post-install step (juliapkg.resolve()) required before severing the internet connection. But a large up-side is that this works in any Python package manager (Pip/Poetry/Python/etc).


mkitti commented Sep 14, 2022

@conda-forge-admin, please rerender


mkitti commented Sep 14, 2022

Is this literally the Manifest created when you build PySR? What happens when we use Conda to install another Julia package with its own Manifest? It's not possible to instantiate two Manifests, do you merge them somehow?

Yes, this is literally the manifest we used to build. We put it in a dedicated environment called @pysr-{version}, which is the default environment that PySR will try to use. The other Python package also creates its own environment, so we don't actually merge them on the user's machine.

The current plan is to reconcile the two environments by pinning packages while building the conda-forge packages, such that the Julia packages are at the same versions. At the moment, this only requires pinning two packages.

  • As discussed, all Julia dependencies will ultimately need to be added, namely 100s or 1000s of packages. Are there plans/ambitions to automate this, otherwise it is a huge undertaking.

Ultimately, I think automation is the key. conda-forge seems to have a tradition of automating as much as possible with GitHub bots, so there is precedent. I'm just not sure conda-forge is ready for me to stage 100 packages at once. That's why I'm taking a more evolutionary approach: start with a small number of packages with larger bundles, then focus on the intersection of the current packages. PyCall.jl and PythonCall.jl would definitely be among the first Julia packages, I think.

  • Is there a reason you don't install the packages/artifacts into the existing depot? The Julia distro in conda-forge already creates its own depot in the Conda environment, and these packages/artifacts will have distinct paths, so this should work fine I believe. In a world where every Julia package is a Conda package, we probably don't want 100s of depots stacked up. Thinking about it, separate depots is probably necessary in the current setup just because there are a few common dependencies, so there would indeed be a few file clashes, but this problem would go away in a world where every Julia package is a Conda package.

The existing depot, which we moved from ~/.julia to $CONDA_PREFIX/share/julia, is still under the user's control. Given the lack of Julia packages in conda-forge, I would expect the user to have installed some Julia packages directly rather than through conda. Rather than deal with a potential collision of files, Julia's depot stacking allows us to bypass this issue entirely. Separation of "user" and "system" installed packages is the intended use case of depot stacking. In this case, conda-forge is playing the role of the "system".

Earlier I had proposed moving towards a common conda-forge managed depot, but this requires us to eliminate the overlap in packages. In principle this is possible, but it will require additional conda packages to implement.
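Depot stacking works because Julia searches the depots listed in JULIA_DEPOT_PATH in order and writes only to the first one. A minimal sketch of how a launcher might stack the user's depot in front of the conda-managed one (the paths are illustrative assumptions, not the feedstock's actual code):

```python
import os

# Julia searches depots in order: the first entry is the writable "user"
# depot; later entries (here, the conda-forge one) act as "system" depots
# whose packages are found but never mutated.
user_depot = os.path.expanduser("~/.julia")
conda_depot = os.path.join(
    os.environ.get("CONDA_PREFIX", "/opt/conda"), "share", "julia"
)

# os.pathsep matches Julia's depot separator (":" on Unix, ";" on Windows).
os.environ["JULIA_DEPOT_PATH"] = os.pathsep.join([user_depot, conda_depot])
print(os.environ["JULIA_DEPOT_PATH"])
```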

Thinking out loud, just as a point of comparison, there is also my JuliaPkg Python package for managing Julia dependencies. This is the default way that JuliaCall manages its dependencies. It basically provides a way for a Python package to declare any Julia dependencies it needs, and then JuliaPkg will ensure those dependencies are met.

This is a nice approach. Conda-forge could use JuliaPkg at build time to deliver the initial package state. Thus, JuliaPkg is complementary to the approach taken here.

@cjdoris

cjdoris commented Sep 14, 2022

Thanks, this is all really helpful.

The current plan is to reconcile the two environments by pinning packages while building the conda-forge packages such that the Julia packages are at the same version. At the moment, this is only requires pinning two packages.

OK so you install one environment per Conda package, giving you a consistent set of Julia packages. Is there a mechanism to merge these into a single environment, so that it's possible to access a bunch of unrelated Julia packages in a single session? I think maybe that's what you meant in the quote above but I'm not sure. How does it work?

@mkitti
Contributor

mkitti commented Sep 14, 2022

OK so you install one environment per Conda package, giving you a consistent set of Julia packages. Is there a mechanism to merge these into a single environment, so that it's possible to access a bunch of unrelated Julia packages in a single session? I think maybe that's what you meant in the quote above but I'm not sure. How does it work?

One could use environment stacking:

insert!(LOAD_PATH, 2, "@pysr-0.11.0")
insert!(LOAD_PATH, 2, "@xbitinfo-0.4.3")

See https://docs.julialang.org/en/v1/manual/code-loading/#Environments

That still requires us to ensure consistency of the dependencies. In the short term, this means pinning packages in the depots. In the long term, we split out the mutual dependencies into their own packages.

@MilesCranmer
Contributor

Is this PR good to go?

I tried to unpin Julia in #50 but am having some issues (which I assume are solved in this PR), so it would be great if I could merge this.

Cheers,
Miles

@mkitti
Contributor

mkitti commented Sep 15, 2022

My current plan is to attend the conda-forge core meeting next week to discuss this.

That said, I don't think the world will end if you merge this. The worst-case scenario is that we'll have to back out the changes later. Let me take a look at #50.

@ngam
Contributor Author

ngam commented Sep 16, 2022

This is your feedstock @MilesCranmer, feel free to merge

@MilesCranmer
Contributor

Thanks, sounds good to me! I am eager to hear the results of the discussion.

Since it seems like the current package (even before recent versions, possibly) is failing on macOS, and this one seems to fix those issues, I will go ahead and merge now. I think it will also allow us to study the stability of the build strategy, which you could discuss at the meeting, too. Happy to back out the changes if a different strategy is decided on!

@MilesCranmer MilesCranmer merged commit c1325e8 into conda-forge:main Sep 16, 2022
@MilesCranmer
Contributor

Just tested it locally - works perfectly! Super cool to not have to do any extra install steps.

Since this fixes the Manifest during build, it would be interesting in the future to think about packaging sysimages created by PackageCompiler.jl for reduced latency.

@cjdoris

cjdoris commented Sep 16, 2022

Nice! I'd be grateful (@mkitti, @ngam) if you could keep me in the loop as this stuff develops.

@mkitti
Contributor

mkitti commented Sep 16, 2022

Since this fixes the Manifest during build, it would be interesting in the future to think about packaging sysimages created by PackageCompiler.jl for reduced latency.

Shipping a system image is interesting, although a fully optimized approach would still involve building the system image on the user's machine. I've noticed that custom system images can often take advantage of advanced features like AVX-512 if built on the user's machine.

Nice! I'd be grateful (@mkitti, @ngam) if you could keep me in the loop as this stuff develops.

We should take a look at replacing pyjulia with juliacall upstream. It would also be nice to be less dependent on environment variables for communication; for example, setting the initial Julia environment via the JLOptions struct rather than the JULIA_DEPOT_PATH environment variable.
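To illustrate the fragility of the env-var handshake being described, here is a small sketch (the variable name is real; the depot path is an illustrative assumption):

```python
import os

# JULIA_DEPOT_PATH is read once, when the Julia runtime initializes, so it
# must be set before the first import of pyjulia/juliacall; changing it
# afterwards has no effect on the running session. An in-process API (e.g.
# via JLOptions) would remove this ordering constraint.
os.environ["JULIA_DEPOT_PATH"] = os.path.expanduser("~/.julia")

# Only after this point would one start Julia, e.g. (not executed here):
#   from juliacall import Main as jl
print(os.environ["JULIA_DEPOT_PATH"])
```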

The next feedstock that we may visit is xbitinfo / BitInformation.jl. I would like to have the conversation with conda-forge core first, though, before proceeding further.
https://github.com/conda-forge/xbitinfo-feedstock

@ngam
Contributor Author

ngam commented Sep 17, 2022

I would like to have the conversation with conda-forge core first, though, before proceeding further.

Thanks for doing this. This seems to be the quickest way to get things moving. I would attend with you but I am trying to remain anonymous lol, and I trust you are a better messenger anyway :)

@ngam
Contributor Author

ngam commented Sep 17, 2022

Thinking out loud: while @mkitti talks to core, @cjdoris, we could possibly look into taking care of your packages with a similar setup. I need at least a week before I have sustained free time and a clear mind, though.

Eventually, we may want to create some sort of automated tool to do these tricks on Pkg, and you may be the right person to lead that effort...

Development

Successfully merging this pull request may close these issues.

let's improve this recipe (and set a standard for python packages calling julia packages)
7 participants