libaccessom2: Build failure when using Spack v0.21 and nci-openmpi #61

Closed
harshula opened this issue Feb 5, 2024 · 25 comments

@harshula (Collaborator) commented Feb 5, 2024

While testing Spack v0.21 on Gadi, the following build failure occurred when using nci-openmpi:

[gadi test-v0.21-nci-openmpi]$ spack install access-om2 ^netcdf-c@4.7.4 ^netcdf-fortran@4.5.2 ^parallelio@2.5.2 ^nci-openmpi@4.0.2 %intel@19.0.5.281

CMake Error in /[...]/tmp/spack-stage/spack-stage-libaccessom2-master-q7h257hjh2fom5c7zgc6annmrzce2pm5/spack-build-q7h257h/CMakeFiles/CMakeTmp/CMakeLists.txt:
  Imported target "MPI::MPI_Fortran" includes non-existent path

    "/apps/openmpi/4.0.2/include/Intel:-I/apps/openmpi/4.0.2/include/Intel"

  in its INTERFACE_INCLUDE_DIRECTORIES.  Possible reasons include:

  * The path was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and references files it does not
  provide.


See build log for details:
  /[...]/tmp/spack-stage/spack-stage-libaccessom2-master-q7h257hjh2fom5c7zgc6annmrzce2pm5/spack-build-out.txt

==> Warning: Skipping build of mom5-master-5vptfbmf7kqenqdzmdeuv6udjmitb6bo since libaccessom2-master-q7h257hjh2fom5c7zgc6annmrzce2pm5 failed
==> Warning: Skipping build of access-om2-latest-lzin4rut23ztsvtlq44xc5lre6sddnt4 since mom5-master-5vptfbmf7kqenqdzmdeuv6udjmitb6bo failed
==> Warning: Skipping build of cice5-master-3rqpsj5jag2qvezxs6wlrm5l3ounhwmd since libaccessom2-master-q7h257hjh2fom5c7zgc6annmrzce2pm5 failed
==> Error: access-om2-latest-lzin4rut23ztsvtlq44xc5lre6sddnt4: Package was not installed
==> Error: Installation request failed.  Refer to reported errors for failing package(s).

Testing Spack v0.21 in a Docker image with openmpi succeeds:

[root@d038f8bf660a /]# spack --version
0.21.1 (e30fedab102f9281a220fb4fae82e3f8c43a82ac)

[root@d038f8bf660a /]# spack install access-om2 ^netcdf-c@4.7.4 ^netcdf-fortran@4.5.2 ^parallelio@2.5.2 ^openmpi@4.0.2 %intel@2021.2.0

[+] /opt/release/linux-rocky8-x86_64/intel-2021.2.0/mom5-master-jqnmk4kpfy6qkpdrmi7puxg22dfxpj7r
==> Installing access-om2-latest-uutnryfwqdj7xjhfw2srbfyleckoshiw [37/37]
==> No binary for access-om2-latest-uutnryfwqdj7xjhfw2srbfyleckoshiw found: installing from source
==> No patches needed for access-om2
==> access-om2: Executing phase: 'install'
==> access-om2: Successfully installed access-om2-latest-uutnryfwqdj7xjhfw2srbfyleckoshiw
  Stage: 0.00s.  Install: 0.00s.  Post-install: 0.07s.  Total: 0.24s
[+] /opt/release/linux-rocky8-x86_64/intel-2021.2.0/access-om2-latest-uutnryfwqdj7xjhfw2srbfyleckoshiw

[root@d038f8bf660a /]# spack find
-- linux-rocky8-x86_64 / gcc@8.5.0 ------------------------------
gmake@4.4.1  intel-oneapi-compilers@2021.2.0  patchelf@0.17.2

-- linux-rocky8-x86_64 / intel@2021.2.0 -------------------------
access-om2@latest                   json-fortran@8.3.0    nghttp2@1.57.0
autoconf@2.69                       krb5@1.20.1           numactl@2.0.14
automake@1.16.5                     libaccessom2@master   oasis3-mct@master
bison@3.8.2                         libedit@3.1-20210216  openmpi@4.0.2
bzip2@1.0.8                         libevent@2.1.12       openssh@9.5p1
ca-certificates-mozilla@2023-05-30  libiconv@1.17         openssl@3.1.3
cice5@master                        libpciaccess@0.17     parallelio@2.5.2
cmake@3.24.2                        libsigsegv@2.14       perl@5.26.3
curl@8.4.0                          libtool@2.4.7         pigz@2.7
datetime-fortran@1.7.0              libxcrypt@4.4.35      pkgconf@1.9.5
diffutils@3.9                       libxml2@2.10.3        pmix@4.2.2
findutils@4.9.0                     m4@1.4.19             tar@1.34
gettext@0.22.3                      mom5@master           util-macros@1.19.3
gmake@4.4.1                         ncurses@6.4           xz@5.4.1
hdf5@1.14.3                         netcdf-c@4.7.4        zlib-ng@2.1.4
hwloc@2.9.1                         netcdf-fortran@4.5.2  zstd@1.5.5
==> 51 installed packages
harshula self-assigned this Feb 5, 2024
@harshula (Collaborator, Author) commented Feb 5, 2024

Testing Spack v0.21 on Gadi using openmpi succeeds in building libaccessom2:

[gadi test-v0.21-openmpi]$ spack install access-om2 ^netcdf-c@4.7.4 ^netcdf-fortran@4.5.2 ^parallelio@2.5.2 ^openmpi@4.0.2 %intel@19.0.5.281

[gadi test-v0.21-openmpi]$ spack find
-- linux-rocky8-x86_64_v4 / intel@19.0.5.281 --------------------
access-om2@latest                   krb5@1.20.1           oasis3-mct@master
autoconf@2.69                       libaccessom2@master   openmpi@4.0.2
automake@1.16.5                     libedit@3.1-20210216  openssh@9.5p1
bison@3.8.2                         libevent@2.1.12       openssl@3.1.3
bzip2@1.0.8                         libiconv@1.17         parallelio@2.5.2
ca-certificates-mozilla@2023-05-30  libpciaccess@0.17     perl@5.26.3
cice5@master                        libsigsegv@2.14       pigz@2.7
cmake@3.24.2                        libtool@2.4.7         pkgconf@1.9.5
datetime-fortran@1.7.0              libxcrypt@4.4.35      pmix@4.2.2
diffutils@3.9                       libxml2@2.10.3        tar@1.34
findutils@4.9.0                     m4@1.4.19             util-macros@1.19.3
gettext@0.22.3                      mom5@master           xz@5.4.1
gmake@4.4.1                         ncurses@6.4           zlib-ng@2.1.4
hdf5@1.14.3                         netcdf-c@4.7.4        zstd@1.5.5
hwloc@2.9.1                         netcdf-fortran@4.5.2
json-fortran@8.3.0                  numactl@2.0.14
==> 46 installed packages

==> Installing access-om2-latest-sdntrjn6funnpxiggzmj5tbgaqhxowqr [46/46]
==> No binary for access-om2-latest-sdntrjn6funnpxiggzmj5tbgaqhxowqr found: installing from source
==> No patches needed for access-om2
==> access-om2: Executing phase: 'install'
==> access-om2: Successfully installed access-om2-latest-sdntrjn6funnpxiggzmj5tbgaqhxowqr
  Stage: 0.00s.  Install: 0.00s.  Post-install: 0.78s.  Total: 2.98s
[+] /[...]/test-v0.21-openmpi/release/linux-rocky8-x86_64_v4/intel-19.0.5.281/access-om2-latest-sdntrjn6funnpxiggzmj5tbgaqhxowqr


@harshula (Collaborator, Author) commented Feb 5, 2024

Modifying nci-openmpi to set OMPI_FCFLAGS instead of appending to it is a workaround:

diff --git a/packages/nci-openmpi/package.py b/packages/nci-openmpi/package.py
index 3b1006a..38b2c84 100644
--- a/packages/nci-openmpi/package.py
+++ b/packages/nci-openmpi/package.py
@@ -34,7 +34,7 @@ class NciOpenmpi(Package):
         elif self.spec.satisfies("%gcc"):
             finc_path = join_path(self.prefix.include, "GNU")
             flib_path = join_path(self.prefix.lib, "GNU")
-        env.append_path("OMPI_FCFLAGS", "-I" + finc_path)
+        env.set("OMPI_FCFLAGS", "-I" + finc_path)
         env.append_path("OMPI_LDFLAGS", "-L" + self.prefix.lib + " -L" + flib_path)
 
     # The following is reproduced from the builtin openmpi spack package
[gadi]$ spack install access-om2 ^netcdf-c@4.7.4 ^netcdf-fortran@4.5.2 ^parallelio@2.5.2 ^nci-openmpi@4.0.2 %intel@19.0.5.281

==> Installing access-om2-latest-k6qc7etkb7w3r55xrlpxkfnukvbxpxzu [16/16]
==> No binary for access-om2-latest-k6qc7etkb7w3r55xrlpxkfnukvbxpxzu found: installing from source
==> No patches needed for access-om2
==> access-om2: Executing phase: 'install'
==> access-om2: Successfully installed access-om2-latest-k6qc7etkb7w3r55xrlpxkfnukvbxpxzu
  Stage: 0.00s.  Install: 0.00s.  Post-install: 0.34s.  Total: 2.09s
[+] /[...]/test-v0.21-nci-openmpi/release/linux-rocky8-x86_64_v4/intel-19.0.5.281/access-om2-latest-k6qc7etkb7w3r55xrlpxkfnukvbxpxzu

@aidanheerdegen (Member)

The original CMake error for v0.21:

"/apps/openmpi/4.0.2/include/Intel:-I/apps/openmpi/4.0.2/include/Intel"

implies

OMPI_FCFLAGS="-I/apps/openmpi/4.0.2/include/Intel:-I/apps/openmpi/4.0.2/include/Intel"

I believe the error was caused by nci-openmpi using env.append_path. This method is intended for PATH-like variables, so it defaults to a ":" separator.

Instead, it should have used env.append_flags, which defaults to a space (" ") separator.

So the error seems to occur because OMPI_FCFLAGS is now being populated in v0.21. Using append_flags would not have caused an error in this case, though the include flag and path would still be duplicated.
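
A minimal sketch of the difference, using Spack's EnvironmentModifications helpers (the flag values are illustrative, not taken from the failing build):

    from spack.util.environment import EnvironmentModifications

    env = EnvironmentModifications()

    # append_path() is meant for PATH-like variables, so repeated values are
    # joined with ":" -- producing the malformed entry seen in the CMake error:
    #   "-I/apps/openmpi/4.0.2/include/Intel:-I/apps/openmpi/4.0.2/include/Intel"
    env.append_path("OMPI_FCFLAGS", "-I/apps/openmpi/4.0.2/include/Intel")
    env.append_path("OMPI_FCFLAGS", "-I/apps/openmpi/4.0.2/include/Intel")

    # append_flags() joins values with a space, which is what a compiler flag
    # variable expects; a duplicated "-I" flag would then be harmless.
    env.append_flags("OMPI_FCFLAGS", "-I/apps/openmpi/4.0.2/include/Intel")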

@harshula (Collaborator, Author) commented Feb 7, 2024

The problem to solve is finding out why OMPI_FCFLAGS does NOT contain the path under Spack v0.20, whereas under v0.21 it does.

@aidanheerdegen (Member)

Does compiling with Spack v0.20 still work with the existing package definition? If not, my guess is there was some change to the NCI wrappers that started populating OMPI_FCFLAGS.

@harshula (Collaborator, Author) commented Feb 7, 2024

You tested the recently built access-om2 with Spack v0.20 (https://forum.access-hive.org.au/t/testing-spack-model-component-builds-for-access-om2/1567/11); that's why this issue is specifically about Spack v0.21.

@harshula (Collaborator, Author) commented Feb 7, 2024

Last week, and again today to double-check, I tested NOT setting OMPI_FCFLAGS in nci-openmpi. In both instances, libaccessom2 failed to build.

2024/01/29

diff --git a/packages/nci-openmpi/package.py b/packages/nci-openmpi/package.py
index 3b1006a..80e2e2b 100644
--- a/packages/nci-openmpi/package.py
+++ b/packages/nci-openmpi/package.py
@@ -34,7 +34,7 @@ class NciOpenmpi(Package):
         elif self.spec.satisfies("%gcc"):
             finc_path = join_path(self.prefix.include, "GNU")
             flib_path = join_path(self.prefix.lib, "GNU")
-        env.append_path("OMPI_FCFLAGS", "-I" + finc_path)
+        #env.set("OMPI_FCFLAGS", "-I" + finc_path)
         env.append_path("OMPI_LDFLAGS", "-L" + self.prefix.lib + " -L" + flib_path)
 
     # The following is reproduced from the builtin openmpi spack package
1 error found in build log:
     5     -- Detecting Fortran compiler ABI info - done
     6     -- Check for working Fortran compiler: /[...]/test-v0.21-nci-openmpi/spack/lib/spack/env/intel/ifort - skipped
     7     ---- PROJECT_VERSION: '2.0.202212'
     8     ---- FQDN: gadi-login-04.gadi.nci.org.au
     9     ---- NUMBER_OF_LOGICAL_CORES: 48
     10    -- Could NOT find MPI_Fortran (missing: MPI_Fortran_F77_HEADER_DIR MPI_Fortran_MODULE_DIR) (found version "3.1")
  >> 11    CMake Error at /apps/cmake/3.24.2/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
     12      Could NOT find MPI (missing: MPI_Fortran_FOUND) (found version "3.1")
     13    Call Stack (most recent call first):
     14      /apps/cmake/3.24.2/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
     15      /apps/cmake/3.24.2/share/cmake-3.24/Modules/FindMPI.cmake:1835 (find_package_handle_standard_args)
     16      CMakeLists.txt:34 (find_package)

See build log for details:
  /[...]/tmp/spack-stage/spack-stage-libaccessom2-master-alfppnrq65u25lkgvx37arfe745ck6ng/spack-build-out.txt

2024/02/07

diff --git a/packages/nci-openmpi/package.py b/packages/nci-openmpi/package.py
index 3b1006a..4bfe945 100644
--- a/packages/nci-openmpi/package.py
+++ b/packages/nci-openmpi/package.py
@@ -34,7 +34,6 @@ class NciOpenmpi(Package):
         elif self.spec.satisfies("%gcc"):
             finc_path = join_path(self.prefix.include, "GNU")
             flib_path = join_path(self.prefix.lib, "GNU")
-        env.append_path("OMPI_FCFLAGS", "-I" + finc_path)
         env.append_path("OMPI_LDFLAGS", "-L" + self.prefix.lib + " -L" + flib_path)
 
     # The following is reproduced from the builtin openmpi spack package
[gadi]$ spack --version
0.21.1 (e30fedab102f9281a220fb4fae82e3f8c43a82ac)

[gadi]$ spack install access-om2 ^netcdf-c@4.7.4 ^netcdf-fortran@4.5.2 ^parallelio@2.5.2 ^nci-openmpi@4.0.2 %intel@19.0.5.281


1 error found in build log:
     5     -- Detecting Fortran compiler ABI info - done
     6     -- Check for working Fortran compiler: /[...]/test-v0.21-nci-openmpi-2/spack/lib/spack/env/intel/ifort - skipped
     7     ---- PROJECT_VERSION: '2.0.202212'
     8     ---- FQDN: gadi-login-04.gadi.nci.org.au
     9     ---- NUMBER_OF_LOGICAL_CORES: 48
     10    -- Could NOT find MPI_Fortran (missing: MPI_Fortran_F77_HEADER_DIR MPI_Fortran_MODULE_DIR) (found version "3.1")
  >> 11    CMake Error at /apps/cmake/3.24.2/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
     12      Could NOT find MPI (missing: MPI_Fortran_FOUND) (found version "3.1")
     13    Call Stack (most recent call first):
     14      /apps/cmake/3.24.2/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
     15      /apps/cmake/3.24.2/share/cmake-3.24/Modules/FindMPI.cmake:1835 (find_package_handle_standard_args)
     16      CMakeLists.txt:34 (find_package)

See build log for details:
  /[...]/tmp/spack-stage/spack-stage-libaccessom2-master-alfppnrq65u25lkgvx37arfe745ck6ng/spack-build-out.txt

@harshula (Collaborator, Author) commented Feb 8, 2024

It appears that nci-openmpi's setup_run_environment() is run twice on Spack v0.21, whereas it is only run once on Spack v0.20. This also explains why not setting OMPI_FCFLAGS fails.
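
A hedged sketch of why set() tolerates the double invocation while append_path() does not (same illustrative path as above, not the exact runtime sequence):

    from spack.util.environment import EnvironmentModifications

    env = EnvironmentModifications()
    for _ in range(2):  # setup_run_environment() effectively runs twice on v0.21
        env.append_path("OMPI_FCFLAGS", "-I/apps/openmpi/4.0.2/include/Intel")
    # applying this yields the duplicated, ":"-joined value that breaks CMake

    env = EnvironmentModifications()
    for _ in range(2):
        env.set("OMPI_FCFLAGS", "-I/apps/openmpi/4.0.2/include/Intel")
    # applying this yields a single well-formed flag: the second set() overwrites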

@micaeljtoliveira (Collaborator)

Sorry, a very naive question: is there any reason for not using the Gadi OpenMPI as an external package? I'm asking because we've been doing that for the COSIMA Spack instance without any issues up to now, but maybe there's something I missed?

@harshula (Collaborator, Author) commented Feb 9, 2024

Hi @micaeljtoliveira, how are you informing Spack-built packages that use openmpi's compiler wrappers to modify LDFLAGS/FCFLAGS to include Gadi's non-standard directory paths? e.g., for Intel compilers, LDFLAGS needs to include -L/apps/openmpi/4.0.2/lib/Intel and FCFLAGS needs to include -I/apps/openmpi/4.0.2/include/Intel.

@micaeljtoliveira (Collaborator)

@harshula I'm not. That's usually the job of the MPI compiler wrapper, no?

@harshula (Collaborator, Author) commented Feb 9, 2024

Hi @micaeljtoliveira, I'll try a build without nci-openmpi, using:

  openmpi:
    externals:
    - spec: openmpi@4.0.2
      prefix: /apps/openmpi/4.0.2
      modules:
      - openmpi/4.0.2
    buildable: false

Does that look correct? Let's see what happens.

@harshula (Collaborator, Author) commented Feb 9, 2024

It appears that nci-openmpi's setup_run_environment() is run twice on Spack v0.21, whereas it is only run once on Spack v0.20.

Using git bisect, I found that the change of behaviour appears to have been introduced in spack/spack#35737 .

@harshula (Collaborator, Author) commented Feb 9, 2024

Hi @micaeljtoliveira,

That's usually the job of the MPI compiler wrapper, no?

This is what happens if I do NOT use nci-openmpi:

-- Could NOT find MPI_Fortran (missing: MPI_Fortran_F77_HEADER_DIR MPI_Fortran_MODULE_DIR) (found version "3.1")
CMake Error at /apps/cmake/3.24.2/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_Fortran_FOUND) (found version "3.1")
Call Stack (most recent call first):
  /apps/cmake/3.24.2/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /apps/cmake/3.24.2/share/cmake-3.24/Modules/FindMPI.cmake:1835 (find_package_handle_standard_args)
  CMakeLists.txt:34 (find_package)

Have a look at CMakeLists.txt and the SPD and see if there's a problem:
https://github.com/ACCESS-NRI/libaccessom2/blob/master/CMakeLists.txt
https://github.com/ACCESS-NRI/spack-packages/blob/main/packages/libaccessom2/package.py

@micaeljtoliveira (Collaborator)

Have a look at CMakeLists.txt and the SPD and see if there's a problem:

Yes, there's an issue. You explicitly need to tell Spack to use the MPI compiler wrappers by adding something like this to the package.py file:

    def cmake_args(self):
        return [
            self.define("CMAKE_C_COMPILER", self.spec["mpi"].mpicc),
            self.define("CMAKE_CXX_COMPILER", self.spec["mpi"].mpicxx),
            self.define("CMAKE_Fortran_COMPILER", self.spec["mpi"].mpifc),
        ]

This is a very common pattern in Spack SPDs (and a bit silly in my opinion, because if one says that MPI is required, Spack should use the wrappers automatically...).

I've tested the above snippet in the COSIMA Spack instance and it seems to work fine.

@harshula (Collaborator, Author)

Hi @micaeljtoliveira, compilation succeeds when cmake_args() is added. We'll also do a runtime test. Thanks! The aforementioned code appears to tell CMake where to find the MPI compilers. Is the problem in CMake's find_package(MPI)?

@micaeljtoliveira (Collaborator)

The aforementioned code appears to tell CMake where to find the MPI compilers. Is the problem in CMake's find_package(MPI)?

Hi @harshula, I don't think this is a CMake problem, but rather a feature. My understanding is that find_package(MPI)'s job is not to find out which compiler wrapper to use, just to make sure the MPI libraries and include files are correctly set.

Note that one usually also needs to explicitly pass exactly the same information to autotools.
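
For example, an AutotoolsPackage SPD would typically pass the same wrappers via configure_args(); a sketch following the common Spack pattern, not code from this repository:

    def configure_args(self):
        # Point configure at the MPI compiler wrappers, mirroring the
        # cmake_args() snippet above; self.spec["mpi"] is the MPI dependency.
        return [
            "CC={0}".format(self.spec["mpi"].mpicc),
            "CXX={0}".format(self.spec["mpi"].mpicxx),
            "FC={0}".format(self.spec["mpi"].mpifc),
        ]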

@harshula (Collaborator, Author)

Hi @micaeljtoliveira, the runtime tests were successful. I definitely prefer to use openmpi directly because it simplifies supporting both Gadi and CI. While debugging this issue, I found that CMAKE_PREFIX_PATH contained /apps/openmpi/4.0.2 and PATH contained /apps/openmpi/4.0.2/bin. Should the search for the MPI compiler happen in CMakeLists.txt?

@harshula (Collaborator, Author)

The change in Spack v0.21 that resulted in this issue is being addressed upstream in spack/spack#42700 and in spack-packages in #65. Next, I'll make the change in libaccessom2's SPD.

@micaeljtoliveira (Collaborator)

Should the search for the MPI compiler happen in CMakeLists.txt?

@harshula I've been investigating the behaviour of CMake+MPI a bit more, and it seems that CMake is supposed to figure out how to correctly compile an MPI program without using the wrappers (that being one of the jobs of find_package(MPI)). This does not seem to work correctly on Gadi because of the "special" way compilers and MPI libraries are installed and used there.

The way to circumvent this problem on Gadi when using CMake directly is simply to set the compilers explicitly with the corresponding CMake options (CMAKE_Fortran_COMPILER and CMAKE_C_COMPILER). When using Spack, this is slightly more complicated, as Spack doesn't easily let you do that in the spack.yaml files, so one needs to do it either by adding a new MPI package, as was done with nci-openmpi, or by modifying the SPDs as I suggested.

I guess in the end there's no "correct" way of doing this, as NCI will not change the way they install their software anytime soon, so I would argue for whatever is more convenient for us.

@harshula (Collaborator, Author)

Notes
Loading the CMake module is not a workaround that can replace setting CMAKE_*_COMPILER:

diff --git a/common/gadi/packages.yaml b/common/gadi/packages.yaml
index 28bb638..19e1e3c 100644
--- a/common/gadi/packages.yaml
+++ b/common/gadi/packages.yaml
@@ -12,6 +12,7 @@ packages:
     externals:
     - spec: cmake@3.24.2
       prefix: /apps/cmake/3.24.2
+      modules: [cmake/3.24.2]
     buildable: false
   openmpi:
     externals:

@harshula (Collaborator, Author)

Hi @micaeljtoliveira,

The libaccessom2 build succeeds on Gadi, even without the MPI compiler wrappers being passed explicitly to CMake, when extra_attributes is added to packages.yaml, e.g.:

    - spec: openmpi@4.1.5
      prefix: /apps/openmpi/4.1.5
      extra_attributes:
        environment:
          prepend_path:
            CMAKE_PREFIX_PATH: /apps/openmpi/4.1.5/include/Intel
      modules: [openmpi/4.1.5]

@micaeljtoliveira (Collaborator)

@harshula That's interesting. It just confirms what I wrote above regarding the way paths are set by the environment modules on Gadi. In any case, I guess setting the prefix is slightly cleaner than setting CMAKE_Fortran_COMPILER and CMAKE_C_COMPILER.

@harshula (Collaborator, Author)

Hi @micaeljtoliveira,

Upstream Spack removed the explicit passing of the MPI compiler wrappers to CMake in the FMS SPD (spack/spack@d21aa1c#diff-f04fae00f6c65e2738850eed4ec088e76c09a1fe79d8516cfd51af8dd0e89aa4L115-L118). This could be the direction Spack is moving in.

I had added a revert of the FMS change to our fork of Spack v0.22. I'm not planning on keeping that revert when we move to Spack v0.23.
