Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move spack-stack repos into spack submodule, use two-level namespace for openmpi and mpich, add package py-pyhdf, use only one macOS reference config #131

Merged

Conversation

climbfuji
Copy link
Collaborator

@climbfuji climbfuji commented May 3, 2022

Description

This PR has once again grown bigger than intended ... it:

  • Finalizes the move of the spack-stack repos into the spack submodule
    • This requires adding the projections/name translations for ecmwf-atlas to atlas in the common module config
  • Enables use of two-level namespace for openmpi and mpich (the former required making changes to the package config in spack, for the latter this was already implemented)
    • Also need to export OMPI_MCA_rmaps_base_oversubscribe=1 in the openmpi module so that one can use more tasks than CPU cores (for unit testing)
  • Fixes several problems with CI builds on Linux and macOS
  • Adds the existing hdf and new py-pyhdf packages
    • Requires setting the default provider for jpeg to libjpeg-turbo (already part of our spack-stack)
  • Updates one of the macOS reference configs to macos-monterey-apple-clang-openmpi-reference (use apple-clang instead of (llvm) clang) and removes the macOS reference config for easier maintenance
    • This reference config now builds openmpi-4.1.3 built by spack with two-level namespace enabled
    • Note that I tested the same config with mpich-3.4.2 built by spack with two-level namespace enabled
  • Adds the capability to create containers with spack, all built into the existing scripts
    • create-env.py is renamed to create.py environment ... for creating local environments; Github unfortunately doesn't see this and thinks it is a new file and the old one got deleted (use a graphical diff tool locally to compare)
    • Containers are built using create.py container ...
    • One example is added: configs/containers/docker-ubuntu-gcc-openmpi.yaml
  • Comments out git-lfs in the common package config, because pinning a version there isn't needed (we don't build git-lfs, but require it due to the dependencies on go/go-bootstrap, which are difficult to build) and having it in there breaks the container builds
  • Updates and consolidates the documentation in README.md, config/sites/README.md and config/common/README.md into a single (top-level) README.md
    • Note: In my next PR, I will add the capability to create HTML (and possibly LaTeX/PDF) documentation using sphinx

Dependencies

Issues

Testing

  • CI tests all pass - see https://github.com/climbfuji/spack-stack/actions (Note: currently some still running, but they all passed yesterday)
  • macOS Monterey with apple-clang@13.1.6 and openmpi-4.1.3 built by spack for several applications (including full ctests for the fv3-jedi bundle)
  • macOS Monterey with apple-clang@13.1.6 and mpich-3.4.2 built by spack for several applications (including full ctests for the fv3-jedi bundle)
  • Container build of docker-ubuntu-gcc-openmpi config on macOS

@climbfuji climbfuji linked an issue May 3, 2022 that may be closed by this pull request
@climbfuji climbfuji force-pushed the feature/two_level_namespace_containers branch from 778b950 to ca884be Compare May 4, 2022 06:24
@climbfuji climbfuji force-pushed the feature/two_level_namespace_containers branch 5 times, most recently from bd23e62 to 35e8786 Compare May 6, 2022 06:07
@climbfuji climbfuji force-pushed the feature/two_level_namespace_containers branch from 35e8786 to 7bf7d24 Compare May 6, 2022 06:15
@climbfuji
Copy link
Collaborator Author

This looks great.

Comments out git-lfs in the common package config, because it isn't needed (we don't build git-lfs but require it due to the dependencies on go/go-bootstrap, which are difficult to build) and having it in there breaks the container builds

JediBaseEnv has a dependency on git-lfs. Are you saying that it shouldn't have that?

Thanks for looking at this PR! Ready for "review" soon.

JediBaseEnv does have a dependency on git-lfs, yes, but we can't pin a version in packages.yaml (or have any entry in there), because of the way we piece together configs/common/packages.yaml and the docker config (config/container/*). It doesn't allow having two entries in the packages section (spack containerize ignores one of the two, which in the current order means the one from the container config).

We could fix that by stripping out the packages: section identifier and combine the common packages with the container packages in a different order, in which case the containerize command will ignore the entry in the common packages, i.e. it won't make a difference at all. Same for all regular sites, because we install it as an external package (go and go-bootstrap are really tricky to build).

@climbfuji climbfuji force-pushed the feature/two_level_namespace_containers branch from 88ffdc3 to 8d5e566 Compare May 6, 2022 08:28
@climbfuji climbfuji force-pushed the feature/two_level_namespace_containers branch from 8d5e566 to 65bf01f Compare May 6, 2022 08:40
# without having to rerun all other CI tests. Later, should consider
# using https://github.com/jurplel/install-qt-action for all runners
# and get rid of the macOS homebrew installation/package config
#sudo apt-get install qt5-default qttools5-dev-tools
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note. These are not yet used in spack, there was a problem during the concretization step when I added it manually as an external spack. I worked around the long build times using package-level plus install-level parallelism, see further down.

@@ -650,3 +650,5 @@ def substitute_config_vars(config_str):
with open(python_module_file, 'w') as f:
f.write(module_content)
logging.info(" ... writing {}".format(python_module_file))

logging.info("Metamodule generation completed successfully in {}".format(meta_module_dir))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice to have, because one can copy & paste the meta_module_dir for the module use statement.

@climbfuji climbfuji marked this pull request as ready for review May 6, 2022 14:09
@climbfuji climbfuji requested a review from kgerheiser May 6, 2022 14:09
@climbfuji
Copy link
Collaborator Author

@srherbener @mer-a-o FYI

kgerheiser pushed a commit to JCSDA/spack that referenced this pull request May 6, 2022
* Add option to compile `openmpi` without flat namespace (aka two-level namespaces) - corresponding flags borrowed from `mpich`

* Rename `nceplibs-bundle` to `nceplibs-bundle-env`

* Add new package `py-pyhdf`, add it to `jedi-base-env`

* Bug fix for building `hdf` on Ubuntu 20.04 aarch64 - see also spack#30522

* Get timeout for web requests with `urllib` from spack config, same as for `curl` (see corresponding PR for the authoritative spack repo spack#30468)

See JCSDA/spack-stack#131 for the corresponding changes in `spack-stack` and for testing.
@@ -225,11 +284,26 @@ runs:

- name: build-env
shell: bash
env:
OMP_NUM_THREADS: 16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this help? The runner only has 2 vcpus, and what uses OpenMP? Does Spack use it internally?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenMP threads are used for the package-level build parallelism (config:build_jobs). It does help, though not a lot. At least it decreases the build time from 6hrs+ to 5hrs 25min with ubuntu/intel for jedi-ewok - good enough for the moment.

openmpi:
environment:
set:
'OMPI_MCA_rmaps_base_oversubscribe': '1'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this was always one of the reasons I used MPICH. Having it encoded into the environment is good.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem with mpich on macOS is that once the ufs-weather-model update to use mpi_f08 is made, we can't use mpich on macOS anymore, because it doesn't build mpi_f08 on macOS. But openmpi does. So I am glad that there is a solution to (a) the oversubscribe problem and (b) the two-level namespace problem.

@climbfuji
Copy link
Collaborator Author

@kgerheiser Submodule pointer updated, .gitmodules reverted - ready for review and merging, if ok.

@kgerheiser kgerheiser merged commit 928b1fc into JCSDA:develop May 6, 2022
@climbfuji climbfuji deleted the feature/two_level_namespace_containers branch November 11, 2022 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants