Update WCOSS2 build to spack-stack/1.6.0 #1350

Closed
Tracked by #1342
RussTreadon-NOAA opened this issue Oct 29, 2024 · 14 comments · Fixed by #1435

Comments

@RussTreadon-NOAA
Contributor

EIB and NCO have installed a test version of spack-stack/1.6.0 on Cactus. This version of spack-stack has been used to build GDASApp at 764f58c. g-w CI was run for JEDI based DA. All jobs in all streams successfully ran to completion. The Cactus spack-stack/1.6.0 installation remains experimental. It is not ready for general use.

This issue is opened to document the use of spack-stack on WCOSS2.

@RussTreadon-NOAA
Contributor Author

Work for this issue will be done in feature/wcoss2_spack-stack.

Commit 43cb9b7 to feature/wcoss2_spack-stack updated modulefiles/GDAS/wcoss2.intel.lua to use spack-stack/1.6.0.

@RussTreadon-NOAA
Contributor Author

Note: Acorn build is not functional due to error in Acorn installation of boost/1.79.0. Acorn installation of boost/1.79.0 uses the Cactus path for the boost/1.79.0 lib and include directories. WCOSS2 ticket #2024102910000015 opened reporting this error.

@RussTreadon-NOAA
Contributor Author

Resolution of GDASApp issues #1331 and #1336 depends upon resolution of this issue.

@RussTreadon-NOAA
Contributor Author

spack-stack/1.6.0 may not be implemented on WCOSS2 given the push to move to spack-stack/1.9.0. See GDASApp issue #1283.

@RussTreadon-NOAA
Contributor Author

Merge develop @ 76208eb into feature/wcoss2_spack-stack. Done at bd19676.

Rerun ./build.sh -v -f on Dogwood, Cactus, and Acorn in working copy of feature/wcoss2_spack-stack @ bd19676.

The Dogwood and Cactus builds run to completion.

The Acorn build aborts while configuring soca with the error message

-- Adding bundle project soca
-- ---------------------------------------------------------
-- [soca] (1.8.0)
-- Feature TESTS enabled

...

-- Found OpenMP: TRUE (found version "5.0") found components: Fortran 
CMake Error at soca/CMakeLists.txt:34 (find_package):
  By not providing "Findfms.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "fms", but
  CMake did not find one.

  Could not find a package configuration file provided by "fms" (requested
  version 2023.3.0) with any of the following names:

    fmsConfig.cmake
    fms-config.cmake

  Add the installation prefix of "fms" to CMAKE_PREFIX_PATH or set "fms_DIR"
  to a directory containing one of the above files.  If "fms" provides a
  separate development package or SDK, be sure it has been installed.


-- Configuring incomplete, errors occurred!
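The CMake message above names the two config files it looks for. A minimal sketch for checking whether either file exists anywhere under the directories listed in `CMAKE_PREFIX_PATH` is shown here. The helper name `find_package_configs` is hypothetical, and the recursive search is a simplification that is broader than CMake's actual search procedure, so a hit here does not guarantee that `find_package` will succeed.

```python
import os
from pathlib import Path


def find_package_configs(package, prefixes):
    """Return package config files for `package` found under the given prefixes.

    Looks for <package>Config.cmake and <lowercase package>-config.cmake,
    the two file names listed in the CMake error message.
    NOTE: this walks each prefix recursively, which is a simplification of
    CMake's real search procedure.
    """
    names = {f"{package}Config.cmake", f"{package.lower()}-config.cmake"}
    hits = []
    for prefix in prefixes:
        root = Path(prefix)
        if not root.is_dir():
            continue  # skip prefixes that do not exist on this machine
        hits.extend(sorted(p for p in root.rglob("*.cmake") if p.name in names))
    return hits


if __name__ == "__main__":
    # Read the prefixes from the environment after loading the modulefile.
    env_value = os.environ.get("CMAKE_PREFIX_PATH", "")
    prefixes = [p for p in env_value.split(os.pathsep) if p]
    matches = find_package_configs("fms", prefixes)
    if matches:
        for path in matches:
            print(path)
    else:
        print("no fms config file found under CMAKE_PREFIX_PATH")
```

Running this on Acorn and on Cactus after loading the respective modulefile would show whether the loaded environment exposes an fms installation at all.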

The only difference between wcoss2.intel.lua and acorn.intel.lua is the path to spack-stack/1.6.0. wcoss2.intel.lua has

prepend_path("MODULEPATH", "/apps/ops/test/spack-stack-1.6.0-nco/envs/nco-intel-19.1.3.304/install/modulefiles/Core")

whereas acorn.intel.lua has

prepend_path("MODULEPATH", "/lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")
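A comparison like the one above can be scripted. The following is a minimal sketch that extracts the `prepend_path("MODULEPATH", ...)` entries from two modulefiles and reports the entries unique to each. The function names are hypothetical, and only `MODULEPATH` lines written in exactly this double-quoted form are considered; any other difference between the files is ignored.

```python
import re

# Matches lines of the form: prepend_path("MODULEPATH", "/some/path")
MODULEPATH_RE = re.compile(r'prepend_path\(\s*"MODULEPATH"\s*,\s*"([^"]+)"\s*\)')


def modulepaths(lua_text):
    """Return every MODULEPATH entry prepended in the modulefile text."""
    return MODULEPATH_RE.findall(lua_text)


def modulepath_diff(text_a, text_b):
    """Return (entries only in a, entries only in b), each sorted."""
    a, b = set(modulepaths(text_a)), set(modulepaths(text_b))
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    # The two lines quoted in this comment, used as stand-ins for the files.
    wcoss2 = ('prepend_path("MODULEPATH", "/apps/ops/test/spack-stack-1.6.0-nco'
              '/envs/nco-intel-19.1.3.304/install/modulefiles/Core")')
    acorn = ('prepend_path("MODULEPATH", "/lfs/h1/emc/nceplibs/noscrub/spack-stack'
             '/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")')
    only_wcoss2, only_acorn = modulepath_diff(wcoss2, acorn)
    print("only in wcoss2.intel.lua:", only_wcoss2)
    print("only in acorn.intel.lua:", only_acorn)
```

In practice the two strings would be replaced by the contents of the modulefiles, for example `Path("modulefiles/GDAS/wcoss2.intel.lua").read_text()`.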

Acorn /lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.6.0/envs contains several environments.

 russ.treadon@alogin01:/lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.6.0/envs> ls -l
total 40
drwxrwsr-x 7 alexander.richert nceplibs 4096 Nov 27 22:55 esmf-8.6.1-mapl-2.40.3.1-addon
-rw-rw-r-- 1 alexander.richert nceplibs  175 Jan 25  2024 README
drwxrwsr-x 6 alexander.richert nceplibs 4096 Aug 30 17:25 ue-esmf-8.6.1-mapl-2.46.2
drwxrwsr-x 7 alexander.richert nceplibs 4096 Aug 30 21:41 ue-esmf-8.6.1-mapl-2.46.3
drwxrwsr-x 6 alexander.richert nceplibs 4096 Nov 27 22:35 ufswm-env
drwxrwsr-x 7 alexander.richert nceplibs 4096 Aug 21 19:15 unified-env
drwxrwsr-x 7 alexander.richert nceplibs 4096 Sep 20 18:20 unified-env-fms-2024.01
drwxr-sr-x 4 hang.lei          nceplibs 4096 Aug 21 19:03 upp-addon-env
drwxrwsr-x 7 alexander.richert nceplibs 4096 Aug 28 05:15 upp-env
drwxrwsr-x 7 alexander.richert nceplibs 4096 Sep 17 17:25 upp-esmf-8.6.1-mapl-2.46.3

Three questions:

  1. Which of these Acorn spack-stack/1.6.0 environments is equivalent to the Dogwood and Cactus /apps/ops/test/spack-stack-1.6.0-nco/envs/nco-intel-19.1.3.304?
  2. When will the Dogwood / Cactus /apps/ops/test/spack-stack-1.6.0-nco/envs/nco-intel-19.1.3.304 be promoted to /apps/ops/prod?
  3. If the Dogwood and Cactus builds work do I need to be concerned about the Acorn build failure? In other words, must we get a successful Acorn build before Dogwood / Cactus /apps/ops/test/spack-stack-1.6.0-nco/envs/nco-intel-19.1.3.304 can be promoted to /apps/ops/prod?
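One way to narrow down the first question is to check which environments ship an fms module at all. The sketch below scans each environment under an `envs` directory for `*.lua` modulefiles inside a directory named after the package. The function name is hypothetical, and the layout it relies on (`<env>/install/modulefiles/.../<package>/<version>.lua`) is an assumption extrapolated from the `install/modulefiles/Core` paths quoted above; it has not been verified against the Acorn installation.

```python
from pathlib import Path


def envs_with_module(envs_root, package):
    """Map environment name -> sorted module versions found for `package`.

    ASSUMPTION: modulefiles live under <env>/install/modulefiles and each
    package has a directory holding one <version>.lua file per version.
    """
    found = {}
    for env in sorted(Path(envs_root).iterdir()):
        modroot = env / "install" / "modulefiles"
        if not modroot.is_dir():
            continue  # plain files such as README, or envs without modules
        versions = sorted({
            lua.stem
            for pkg_dir in modroot.rglob(package)
            if pkg_dir.is_dir()
            for lua in pkg_dir.glob("*.lua")
        })
        if versions:
            found[env.name] = versions
    return found


if __name__ == "__main__":
    root = "/lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.6.0/envs"
    if Path(root).is_dir():
        for name, versions in envs_with_module(root, "fms").items():
            print(name, versions)
```

An environment that reports no fms version, or only a version other than the one soca requests, would reproduce the configure error shown earlier.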

We need to build GDASApp with spack-stack on WCOSS2 in order for all JEDI aerosol and marine DA jobs to successfully run to completion.

@DavidHuber-NOAA, do you know who I should contact to answer the above questions?

@DavidHuber-NOAA
Contributor

I would recommend reaching out to @AlexanderRichert-NOAA and/or Wei Wei at NCO.

@RussTreadon-NOAA
Contributor Author

Progress on this issue has become critical now that both Cactus and Dogwood have been updated.

As documented in g-w issue #3100, GDASApp no longer builds on Dogwood or Cactus using hpc-stack. The build using the test installation of spack-stack/1.6.0 successfully runs to completion.

WCOSS2 Ticket #2024111410000051 has been updated with this information.

@AlexanderRichert-NOAA, is it acceptable to update, at least in the interim, modulefiles/GDAS/wcoss2.intel.lua to use the test installation of spack-stack/1.6.0 on Cactus and Dogwood?

At present there is no other way to build and run GDASApp on WCOSS2. This is a problem. GFS v17 exercises the land and marine DA components of GDASApp.

@RussTreadon-NOAA
Contributor Author

@DavidNew-NOAA, @CoryMartin-NOAA, @danholdaway, @guillaumevernieres:

Following the system upgrade, we cannot build GDASApp develop on Cactus or Dogwood using hpc-stack. The build using spack-stack/1.6.0 works, but this is an unofficial installation.

@danholdaway
Contributor

Thanks for tracking this, @RussTreadon-NOAA. It seems sensible to me to just point to 1.6.0 for now so we can keep testing other developments.

@CoryMartin-NOAA
Contributor

I agree. I was just talking with @CatherineThomas-NOAA. @RussTreadon-NOAA, what do I need to do to build GDASApp on WCOSS? Just use feature/wcoss2_spack-stack?

@RussTreadon-NOAA
Contributor Author

I've been working on this over the past two weeks. There have been issues with WCOSS2 spack-stack and system libraries and modules following the WCOSS2 upgrades. spack-stack issues seem to be resolved as of yesterday (1/7). System issues remain but I have patches we can use until sysadmins fix the underlying problems.

Today I installed a fresh clone of g-w develop along with the current head of GDASApp develop. A modified wcoss2.intel.lua is being used to build g-w on Cactus. The GDASApp build is still running. I'll run g-w CI once the GDASApp build completes.

@RussTreadon-NOAA
Contributor Author

I'll update feature/wcoss2_spack-stack to the current head of GDASApp develop and commit the spack-stack changes. I'll open GDASApp and g-w PRs once g-w CI successfully completes on Cactus.

@CoryMartin-NOAA
Contributor

Thanks @RussTreadon-NOAA

@RussTreadon-NOAA
Contributor Author

WCOSS2 g-w CI

Install g-w develop at 673470a on Cactus with changes to sorc/gdas.cd from feature/wcoss2_spack-stack at 0ecc40b.

Run the following g-w CI configurations on Cactus with the following results:

/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48_ATM_test
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103231200        Done    Jan 08 2025 16:52:15    Jan 08 2025 19:42:36
202103231800        Done    Jan 08 2025 16:52:15    Jan 08 2025 19:59:12
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48mx500_3DVarAOWCDA_test
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103241800        Done    Jan 08 2025 16:52:17    Jan 08 2025 17:10:22
202103250000        Done    Jan 08 2025 16:52:17    Jan 08 2025 18:05:27
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48mx500_hybAOWCDA_test
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103241800        Done    Jan 08 2025 16:52:18    Jan 08 2025 17:10:28
202103250000        Done    Jan 08 2025 16:52:18    Jan 08 2025 18:10:25
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48_S2SWA_gefs_test
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103231200        Done    Jan 08 2025 16:52:23    Jan 08 2025 19:10:50
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48_S2SW_test
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103231200        Done    Jan 08 2025 16:52:26    Jan 08 2025 18:33:43
202103231800        Done    Jan 08 2025 16:52:26    Jan 08 2025 18:40:33
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96_atm3DVar_test
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201800        Done    Jan 08 2025 16:52:29    Jan 08 2025 17:10:38
202112210000        Done    Jan 08 2025 16:52:29    Jan 08 2025 19:15:42
202112210600        Done    Jan 08 2025 16:52:29    Jan 08 2025 18:55:32
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_hybatmaerosnowDA_test
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201200        Done    Jan 08 2025 16:52:31    Jan 08 2025 17:15:47
202112201800        Done    Jan 08 2025 16:52:31    Jan 08 2025 19:15:46
202112210000        Done    Jan 08 2025 16:52:31    Jan 08 2025 19:09:21
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_hybatmDA_test
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202112201800        Done    Jan 08 2025 16:52:33    Jan 08 2025 17:10:48
202112210000        Done    Jan 08 2025 16:52:33    Jan 08 2025 19:15:50
202112210600        Done    Jan 08 2025 16:52:33    Jan 08 2025 19:05:36
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_ufs_hybatmDA_test
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202402231800        Done    Jan 08 2025 16:52:37    Jan 08 2025 17:10:53
202402240000        Done    Jan 08 2025 16:52:37    Jan 08 2025 19:57:12
202402240600        Done    Jan 08 2025 16:52:37    Jan 08 2025 20:10:37
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96_S2SWA_gefs_replay_ics_test
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202011010000        Done    Jan 08 2025 16:52:40    Jan 08 2025 17:57:08

All jobs in each configuration successfully ran to completion.
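A check like the one summarized above can be automated. The following is a minimal sketch that parses cycle summaries in the layout shown above (an experiment path, a header line, then one line per cycle) and lists any cycle whose state is not `Done`. The function names are hypothetical, and the parser assumes 12-digit cycle stamps followed by a single-word state, as in the listings in this comment.

```python
import re
from pathlib import PurePosixPath

# A cycle line starts with a 12-digit stamp (YYYYMMDDHHMM) and a state word.
CYCLE_RE = re.compile(r"^\s*(\d{12})\s+(\S+)")


def parse_cycle_summary(text):
    """Return {experiment name: [(cycle, state), ...]} from summary text."""
    results = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("/"):
            # An experiment directory line starts a new block.
            current = PurePosixPath(stripped).name
            results[current] = []
            continue
        match = CYCLE_RE.match(line)
        if match and current is not None:
            results[current].append((match.group(1), match.group(2)))
    return results


def not_done(results):
    """Return (experiment, cycle, state) for every cycle not in state Done."""
    return [
        (exp, cycle, state)
        for exp, cycles in results.items()
        for cycle, state in cycles
        if state != "Done"
    ]


if __name__ == "__main__":
    sample = """\
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C48_ATM_test
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202103231200        Done    Jan 08 2025 16:52:15    Jan 08 2025 19:42:36
202103231800        Done    Jan 08 2025 16:52:15    Jan 08 2025 19:59:12
"""
    parsed = parse_cycle_summary(sample)
    print(parsed)
    print("unfinished cycles:", not_done(parsed))
```

An empty list from `not_done` corresponds to the "all cycles Done" result reported here. It says nothing about individual job states inside a cycle.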

The following g-w CI configurations use components from GDASApp:

  • C48mx500_3DVarAOWCDA
  • C48mx500_hybAOWCDA
  • C96C48_hybatmaerosnowDA
  • C96C48_ufs_hybatmDA


4 participants