
Improve the installation of nnpdf packages #1773

Closed
scarlehoff opened this issue Jul 11, 2023 · 15 comments
Labels
devtools Build, automation and workflow

Comments

@scarlehoff
Member

scarlehoff commented Jul 11, 2023

Currently both validphys and n3fit are completely free of C++ dependencies. Moreover, the C++ part of the code is no longer used (some people are still using evolven3fit, but it is no longer a requirement).

The only external dependency that is not available as a Python package is lhapdf, which for the moment would need to be installed by the user. However, on some of the systems we run on, lhapdf is either available by default or provided as an official package for your favourite Linux (or macOS) flavour, so I think we can sidestep the issue for the moment.

So the idea would be to write a pyproject.toml file (a sketch follows the list below) such that it:

  • installs n3fit
  • copies the data and theory db into the package data
  • installs all dependencies other than lhapdf and pandoc
  • installs validphys
  • then vp-get will create the share/NNPDF/ structure as needed
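
A minimal sketch of what such a pyproject.toml could look like; the package-data layout and the version placeholder below are illustrative assumptions, not a final design:

# Sketch only: the dependency list is taken from the requirements below;
# the datafiles location is an assumption for illustration.
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "nnpdf"
version = "0.0.0"  # placeholder; in practice written during the build step
dependencies = [
    "tensorflow", "psutil",        # n3fit
    "eko",                         # evolven3fit
    "pineappl", "fiatlux",         # validphys
    "reportengine", "validobj", "prompt_toolkit",
    "hyperopt", "seaborn",
    # lhapdf and pandoc deliberately left out: installed by the user
]

[tool.setuptools.package-data]
validphys = ["datafiles/**"]  # data and theory db shipped inside the package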

libnnpdf, evolven3fit, etc., can all be skipped. Anyone who needs any part of them can still use conda (for the time being we will have to maintain both), but this gives people an extra option.

Some notes:

  1. in principle the list of dependencies is not very different from the conda one (https://github.com/NNPDF/nnpdf/blob/master/conda-recipe/meta.yaml); only the C++-specific ones need to be dropped: swig, apfel, libarchive, etc.

  2. Python is not allowed to write to share during installation, so the data and profile should be shipped as package data (both in the conda installation and in the pip package), leaving share for the things that are "vp-downloaded". This is also more consistent with how other programs do it, and it ensures that installation and data are always in sync, as we intended (see the first sketch after this list).

  3. since validphys needs pandoc, we might want to start with a package that only works for n3fit (which is what needs to run on clusters and might benefit the most from an unmanaged pip installation); validphys would still be installed, but if pandoc is not available it would fail when creating reports, ideally with a nice error rather than a bad crash (see the second sketch after this list).
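
On point 2, a minimal sketch of how the installed code could then locate data shipped as package data; the validphys.datafiles subpackage name is a hypothetical example:

# Sketch: reading data shipped as package data (subpackage name is hypothetical).
from importlib.resources import files

datafiles = files("validphys.datafiles")
theory_db = datafiles / "theory.db"
with theory_db.open("rb") as f:
    magic = f.read(16)  # e.g. sanity-check the SQLite header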
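
On point 3, the nice error could be as simple as an up-front check for the executable; a rough sketch, not reportengine's actual checker:

# Sketch of a friendly failure when pandoc is missing (illustrative only).
import shutil

def require_pandoc() -> str:
    """Return the path to pandoc, or fail with a readable message."""
    pandoc = shutil.which("pandoc")
    if pandoc is None:
        raise RuntimeError(
            "pandoc was not found on PATH. It is only needed to build "
            "validphys reports; install it (e.g. from https://pandoc.org) "
            "to enable report generation."
        )
    return pandoc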

cc @APJansen I've added you here since you already worked quite a bit on things adjacent to this when you had to install on macOS, so I'd be grateful if you could chime in (or even take on this task if you wish, since you already have some experience with the installation!)

A minimal requirements.txt file with the packages needed for the fit to start running:

# n3fit
tensorflow
psutil
# evolven3fit
eko
# validphys
pineappl
fiatlux
reportengine
validobj
prompt_toolkit
## hyperopt
hyperopt
seaborn

Only LHAPDF is missing from this list.
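
For illustration, with such a file an unmanaged installation would reduce to something like the following (assuming lhapdf is already available on the system):

python -m venv nnpdf-env
source nnpdf-env/bin/activate
pip install -r requirements.txt   # everything except lhapdf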

scarlehoff added the devtools (Build, automation and workflow) label on Jul 11, 2023
@Zaharid
Contributor

Zaharid commented Jul 11, 2023

The only external dependency that is not available as a Python package is lhapdf, which for the moment would need to be installed by the user. However, on some of the systems we run on, lhapdf is either available by default or provided as an official package for your favourite Linux (or macOS) flavour, so I think we can sidestep the issue for the moment.

Note that this hugely increases the surface area for hard-to-diagnose user-specific problems that the current setup seeks to minimize, and it would make the corresponding documentation harder to write. A hard requirement is for this to work on Linux university clusters, where one cannot install packages and in general cannot assume that anything is installed in the right way in the right place. I'd also note that this introduces a problem we don't currently have.

Note also that even in Python we have some build steps (in particular writing the version information, which I like very much as it is currently done), so those would need to be replicated.

note 2: not sure if python is allowed to write to share during installation, but it would be equally possible to convert the data into package-data and leave share for the things that are "vp-downloaded".

I think this is my biggest concern: it is not clear to me how you go about having reasonable "editable" and environment-dependent data folders that can be found by other tools. This is basically the concern expressed here: https://gitlab.com/hepcedar/lhapdf/-/merge_requests/12#note_944200133. For example, ideally we would like the PDFs downloaded by vp to be in the folder where lhapdf (the original from hepforge) would find them.
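
For concreteness, the LHAPDF Python bindings expose their search path, so keeping the two in sync amounts to something like the sketch below (the vp download directory shown is a hypothetical example):

# Sketch: making vp-downloaded PDFs visible to lhapdf (paths are illustrative).
import os
import lhapdf

print(lhapdf.paths())  # directories lhapdf currently searches for PDF sets

# Hypothetical vp-managed location; alternatively export LHAPDF_DATA_PATH.
vp_pdf_dir = os.path.expanduser("~/.local/share/NNPDF/pdfsets")
lhapdf.pathsPrepend(vp_pdf_dir)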

In terms of cost-benefit, it seems to me that doing #1690 would alleviate a lot of the issues with the current setup more easily than figuring out all the details here, especially if we go further, such as disabling the C++ build stuff by default.

@scarlehoff
Member Author

A hard requirement is for this to work on Linux university clusters, where one cannot install packages and in general cannot assume that anything is installed in the right way in the right place. I'd also note that this introduces a problem we don't currently have.

In terms of cost-benefit

There is a secondary problem with conda (which afaik only affects me): because of licensing concerns it might be blocked at CERN. So for me this solves a problem that I currently have: I can create an environment on the node where I run, do pip install nnpdf, and download the theory there.

ideally we would like the PDFs downloaded by vp to be in the folder where lhapdf (the original from hepforge) would find them.

This is already the case, isn't it?

@Zaharid
Contributor

Zaharid commented Jul 11, 2023

There is a secondary problem with conda (which afaik only affects me): because of licensing concerns it might be blocked at CERN. So for me this solves a problem that I currently have: I can create an environment on the node where I run, do pip install nnpdf, and download the theory there.

I guess that would also be more simply solved by using the conda-forge channel, i.e. #1487?

This is already the case, isn't it?

It is now. I am not sure how one would go about making sure it works in the setup you described.

@scarlehoff
Member Author

I guess that would also be more simply solved by using the conda-forge channel

Given my experience every time we change something in conda, I challenge the "simply" :P

It is now. I am not sure how one would go about making sure it works in the setup you described.

In the same way? lhapdf would need to be installed either way.

@Zaharid
Contributor

Zaharid commented Jul 11, 2023

I guess that would also be more simply solved by using the conda-forge channel

Given my experience every time we change something in conda, I challenge the "simply" :P

Fair enough. But then again, "simply" is relative to rewriting the whole setup and the associated docs.

@Zaharid
Contributor

Zaharid commented Jul 11, 2023

Just to note that another thing to be figured out in this framework would be how to install pandoc properly.

@scarlehoff
Member Author

Worst-case scenario, we limit the package to n3fit. Still, let me update the OP.

@APJansen
Collaborator

APJansen commented Aug 11, 2023

I think this would make it a lot easier to install, so I would welcome it. To implement this myself, though, I don't know enough about the nnpdf infrastructure, nor about poetry. What I can of course do is provide lists of versions that worked for me, and test things. We also have a new colleague, Carlos @Cmurilochem, joining this project on our side, who also has an M1 Mac and installed nnpdf this week.

(To keep it short, I'm only listing the packages that were installed explicitly, not their dependencies. For the pip ones there is no way to do this, so I just listed the ones that I think are relevant.)

Here is mine:

name: nnpdf-dev
channels:
  - apple
  - https://packages.nnpdf.science/private
  - https://packages.nnpdf.science/public
  - defaults
  - conda-forge
dependencies:
  - clangxx_osx-arm64=14.0.6
  - cmake=3.26.4
  - eko=0.12.2
  - gsl=2.7.1
  - ipython=8.12.0
  - libarchive=3.6.2
  - libsqlite=3.42.0
  - pkg-config=0.29.2
  - pybind11=2.10.4
  - python=3.9.16
  - sqlite=3.41.2
  - swig=4.0.2
  - tensorflow-deps=2.9.0
  - yaml-cpp=0.7.0
  - pip:
      - apfel==3.0.7
      - fiatlux==0.1.2
      - hyperopt==0.2.7
      - numpy==1.25.1
      - pineappl==0.6.0
      - tensorflow-macos==2.9.2
      - tensorflow-metal==0.5.0
      - validobj==1.0
prefix: /Users/aronjansen/miniconda3/envs/nnpdf-dev

@alecandido
Member

alecandido commented Aug 11, 2023

@APJansen the problem would be close to a solution (thanks to the various refactorings in vp and on the theory side), except for the Pandoc problem.

The only realistic solution I see for the time being would be to install it dynamically (essentially, adding installation scripts to vp/reportengine for some platforms, on top of the checker). In the longer term, we could drop Pandoc and use a different Markdown-to-HTML compiler, like Python-Markdown (see the sketch below). For templating, Jinja is already being used, so there is no problem there.

https://github.com/NNPDF/reportengine/blob/79eec2e33c5aab3998fd58e9c3cf84bd2c2696e7/src/reportengine/report.py#L293-L310
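
For reference, a minimal sketch of what replacing that pandoc invocation with Python-Markdown could look like (the markdown package is assumed installed; template wiring is elided):

# Sketch of a pure-Python Markdown-to-HTML step (illustrative replacement only).
import markdown

def md_to_html(md_text: str) -> str:
    """Convert report Markdown to an HTML fragment without calling pandoc."""
    return markdown.markdown(md_text, extensions=["tables", "fenced_code"])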

@Cmurilochem
Collaborator

Cmurilochem commented Aug 11, 2023

As a follow-up to @APJansen's suggestion, I also share here my complete conda (version 22.9.0) environment.yml file, though I suspect that reproducing it properly is platform dependent. I managed to install the package this week with the kind help of @APJansen. I have an M1 MacBook Pro running macOS Monterey 12.2.1.

name: nnpdf-dev
channels:
  - apple
  - https://packages.nnpdf.science/private
  - https://packages.nnpdf.science/public
  - defaults
  - conda-forge
dependencies:
  - blas=1.0=openblas
  - bzip2=1.0.8=h620ffc9_4
  - c-ares=1.19.0=h80987f9_0
  - ca-certificates=2023.05.30=hca03da5_0
  - clang-14=14.0.6=default_h1b80db6_1
  - cmake=3.22.1=hae769c0_0
  - expat=2.4.9=hc377ac9_0
  - grpcio=1.42.0=py39h95c9599_0
  - gsl=2.7.1=h3af6ccd_1
  - h5py=3.6.0=py39h7fe8675_0
  - hdf5=1.12.1=h160e8cb_2
  - icu=68.1=hc377ac9_0
  - krb5=1.19.4=h8380606_0
  - libarchive=3.6.1=he3a3bf9_0
  - libclang-cpp14=14.0.6=default_h1b80db6_1
  - libcurl=7.84.0=hc6d1d07_0
  - libcxx=14.0.6=h848a8c0_0
  - libedit=3.1.20221030=h80987f9_0
  - libev=4.33=h1a28f6b_1
  - libffi=3.4.4=hca03da5_0
  - libgfortran=5.0.0=11_3_0_hca03da5_28
  - libgfortran5=11.3.0=h009349e_28
  - libiconv=1.16=h1a28f6b_2
  - libllvm14=14.0.6=h4b41812_0
  - libnghttp2=1.46.0=h95c9599_0
  - libopenblas=0.3.21=h269037a_0
  - libsqlite=3.39.4=h76d750c_0
  - libssh2=1.10.0=h449679c_2
  - libuv=1.44.2=h80987f9_0
  - libxml2=2.9.14=h8c5e841_0
  - libzlib=1.2.12=h03a7124_3
  - llvm-openmp=14.0.6=hc6e5704_0
  - lz4-c=1.9.4=h313beb8_0
  - ncurses=6.4=h313beb8_0
  - openssl=1.1.1v=h1a28f6b_0
  - pcre=8.45=hc377ac9_0
  - pip=23.2.1=py39hca03da5_0
  - pkg-config=0.29.2=h1a28f6b_0
  - python=3.9.13=hbdb9e5c_2
  - readline=8.2=h1a28f6b_0
  - rhash=1.4.1=hf27765b_1
  - setuptools=68.0.0=py39hca03da5_0
  - sqlite=3.39.3=h1058600_0
  - swig=4.0.2=hc377ac9_4
  - tensorflow-deps=2.9.0=0
  - tk=8.6.12=hb8d0fd4_0
  - wheel=0.38.4=py39hca03da5_0
  - xz=5.4.2=h80987f9_0
  - yaml-cpp=0.7.0=hc377ac9_1
  - zlib=1.2.12=h5a0b063_3
  - zstd=1.5.2=h8574219_0
  - pip:
    - absl-py==1.4.0
    - astunparse==1.6.3
    - blessings==1.7
    - cachetools==5.3.1
    - certifi==2023.7.22
    - charset-normalizer==3.2.0
    - cloudpickle==2.2.1
    - contourpy==1.1.0
    - curio==1.6
    - cycler==0.11.0
    - echo==0.8.0
    - eko==0.13.5
    - flatbuffers==1.12
    - fonttools==4.42.0
    - future==0.18.3
    - gast==0.4.0
    - google-auth==2.22.0
    - google-auth-oauthlib==0.4.6
    - google-pasta==0.2.0
    - hyperopt==0.2.7
    - idna==3.4
    - importlib-metadata==6.8.0
    - importlib-resources==6.0.1
    - jinja2==3.1.2
    - keras==2.9.0
    - keras-preprocessing==1.1.2
    - kiwisolver==1.4.4
    - libclang==16.0.6
    - llvmlite==0.40.1
    - lz4==4.3.2
    - markdown==3.4.4
    - markupsafe==2.1.3
    - matplotlib==3.7.2
    - networkx==3.1
    - numba==0.57.1
    - numpy==1.24.4
    - oauthlib==3.2.2
    - opt-einsum==3.3.0
    - packaging==23.1
    - pandas==2.0.3
    - pillow==10.0.0
    - pineappl==0.6.1
    - prompt-toolkit==3.0.39
    - protobuf==3.19.6
    - psutil==5.9.5
    - py4j==0.10.9.7
    - pyasn1==0.5.0
    - pyasn1-modules==0.3.0
    - pygments==2.16.1
    - pyparsing==3.0.9
    - python-dateutil==2.8.2
    - pytz==2023.3
    - pyyaml==6.0.1
    - qtpy==2.3.1
    - reportengine==0.31
    - requests==2.31.0
    - requests-oauthlib==1.3.1
    - rsa==4.9
    - ruamel-yaml==0.17.32
    - ruamel-yaml-clib==0.2.7
    - scipy==1.11.1
    - seaborn==0.12.2
    - six==1.15.0
    - tensorboard==2.9.1
    - tensorboard-data-server==0.6.1
    - tensorboard-plugin-wit==1.8.1
    - tensorflow-estimator==2.9.0
    - tensorflow-macos==2.9.2
    - tensorflow-metal==0.5.0
    - termcolor==2.3.0
    - tqdm==4.65.2
    - typing-extensions==4.7.1
    - tzdata==2023.3
    - urllib3==1.26.16
    - validobj==1.0
    - wcwidth==0.2.6
    - werkzeug==2.3.6
    - wrapt==1.15.0
    - zipp==3.16.2
prefix: /Users/murilo/opt/anaconda3/envs/nnpdf-dev

I did some experiments with it and generated a fresh conda environment using the above env file with

conda env create -f environment.yml

With this env, my installation of nnpdf was successful after installing:

Although this does not solve the issue, it may be useful in the near term for Mac users.

@scarlehoff
Member Author

Note that several of these tools, while currently necessary to install the whole code in the repository, are not needed for a fit.

In particular, apfel (and the whole C++ library) is not needed anymore, so cmake, C compilers, gfortran, etc. are no longer needed.
pandoc should be avoidable, as the fit itself should not activate any part that depends on it. If the user needs to generate reports, they can install pandoc themselves.

However, changing the installation method means that the data needs to be installed as part of the Python project (this is currently done by the cmake installation, which would also no longer be needed).

Essentially, LHAPDF is the only library that is used during the fit and is not available as a Python package, so it would need to be installed manually by the user.
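
For completeness, a manual LHAPDF installation on a cluster without root access typically looks like the following (6.X.Y is a placeholder; pick the current release):

# 6.X.Y is a placeholder version, not a recommendation
wget "https://lhapdf.hepforge.org/downloads/?f=LHAPDF-6.X.Y.tar.gz" -O LHAPDF-6.X.Y.tar.gz
tar xf LHAPDF-6.X.Y.tar.gz && cd LHAPDF-6.X.Y
./configure --prefix=$HOME/.local
make && make install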

@Zaharid
Contributor

Zaharid commented Aug 22, 2023

FWIW I would suggest looking at this thing

https://prefix.dev/docs/pixi/overview

in a year or so (at any rate, after it has gained a build command). It has the potential to be less disruptive than the suggestions being made here while addressing the actual problems.

@alecandido
Member

All of n3fit, validphys, and reportengine are pure Python packages, and their dependencies are all Python packages as well (pure or not), apart from a few (those discussed above).

Keeping the complications of a generic package manager for the sake of a couple of dependencies (which might not be packaged elsewhere either, e.g. LHAPDF) is not worth it to me.

However, I'm also interested in something similar, and soon enough I will start packaging all our projects (i.e. the pineline) for Nix (because Nix itself comes with some other benefits).
But I will not advertise it as the main installation tool, since pip is already perfect for this purpose (it's ubiquitous, well-maintained, and has a broad community).

@scarlehoff
Member Author

I think that by using pdfflow (#923) and lhapdf-management (I need to break that away from the lhapdf repository) n3fit will be completely free of unavailable packages. After all, we don't do anything with lhapdf that is not covered by pdfflow.

@scarlehoff
Member Author

Closed with #1861 and #1864
