The global workflow WCDA test is failing on WCOSS #1331

Closed
Tracked by #1342
guillaumevernieres opened this issue Oct 14, 2024 · 13 comments · Fixed by #1435

@guillaumevernieres
Contributor

Make it work!

@AndrewEichmann-NOAA
Collaborator

What's the priority of this?

@guillaumevernieres
Contributor Author

What's the priority of this?

High. It is next in line after you finish ingesting @givelberg's new obs.

@RussTreadon-NOAA
Contributor

Cactus spack-stack/1.6.0 test

@DavidHuber-NOAA pointed me at a test installation of spack-stack/1.6.0 on Cactus. I modified GDASApp modulefiles/GDAS/wcoss2.intel.lua to use spack-stack/1.6.0, successfully built GDASApp, and then set up and ran C48mx500_3DVarAOWCDA in PSLOT = prwcda_pr2978. As reported in GDASApp issue #1336, all marine DA jobs successfully ran to completion:

202103241800       gdas_prepoceanobs                   159595816           SUCCEEDED                   0         1         183.0
202103241800      gdas_marineanlinit                   159596100           SUCCEEDED                   0         1          30.0
202103241800         gdas_marinebmat                   159595817           SUCCEEDED                   0         1          53.0
202103241800       gdas_marineanlvar                   159596602           SUCCEEDED                   0         1          80.0
202103241800     gdas_marineanlchkpt                   159596888           SUCCEEDED                   0         1          42.0
202103241800     gdas_marineanlfinal                   159597178           SUCCEEDED                   0         1          33.0
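Checking each line of rocotostat output by eye gets tedious for longer runs. A minimal sketch, assuming the listing has been saved to a text file and that STATE is the fourth whitespace-separated field, as in the listing above; `all_tasks_succeeded` is a hypothetical helper, not part of global-workflow or Rocoto:

```bash
# Hypothetical helper (not part of global-workflow): succeed only if every
# task line in a saved rocotostat listing reports SUCCEEDED.
# Assumes task lines start with a 12-digit cycle (YYYYMMDDHHMM) and that
# STATE is the 4th whitespace-separated field.
all_tasks_succeeded() {
  awk '
    $1 ~ /^[0-9]+$/ && length($1) == 12 && NF >= 4 {
      ntask++
      if ($4 != "SUCCEEDED") {
        # Report each task that has not (yet) succeeded.
        print "not succeeded: " $2 " (" $4 ")"
        nbad++
      }
    }
    # Non-zero exit if no task lines were found or any task did not succeed.
    END { exit (ntask == 0 || nbad > 0) }
  ' "$1"
}
```

Given a saved listing such as stat.txt (a hypothetical file name), `all_tasks_succeeded stat.txt && echo "all marine DA jobs succeeded"` prints the message only when every task line reports SUCCEEDED.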

Tagging @guillaumevernieres and @AndrewEichmann-NOAA for awareness.

@RussTreadon-NOAA
Contributor

Opened g-w issue #3039 to request that the missing GDA files be copied from Hera to WCOSS2.

@RussTreadon-NOAA
Contributor

FYI: It was necessary to edit g-w env/WCOSS2.env in order for all marine DA jobs to run.

--- a/env/WCOSS2.env
+++ b/env/WCOSS2.env
@@ -109,17 +109,15 @@ elif [[ "${step}" = "marinebmat" ]]; then
     export APRUNCFP="${launcher} -n \$ncmd --multi-prog"
     export APRUN_MARINEBMAT="${APRUN_default}"
 
-elif [[ "${step}" = "ocnanalrun" ]]; then
+elif [[ "${step}" = "marineanlvar" ]]; then
 
     export APRUNCFP="${launcher} -n \$ncmd --multi-prog"
+    export APRUN_MARINEANLVAR="${APRUN_default}"
 
-    export APRUN_OCNANAL="${APRUN_default}"
-
-elif [[ "${step}" = "ocnanalchkpt" ]]; then
+elif [[ "${step}" = "marineanlchkpt" ]]; then
 
     export APRUNCFP="${launcher} -n \$ncmd --multi-prog"
-
-    export APRUN_OCNANAL="${APRUN_default}"
+    export APRUN_MARINEANLCHKPT="${APRUN_default}"
 
 elif [[ "${step}" = "ocnanalecen" ]]; then

@guillaumevernieres
Contributor Author

Thanks @RussTreadon-NOAA. We haven't tested or implemented anything marine DA related on WCOSS2 yet. @AndrewEichmann-NOAA is going to start on this within the next few days.

@RussTreadon-NOAA
Contributor

@guillaumevernieres and @AndrewEichmann-NOAA, we could be proactive with g-w PR #2944 and replace ocnanalrun and ocnanalchkpt with their marine counterparts in WCOSS2.env. This would bring WCOSS2.env in line with the other ${machine}.env files.
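Before making that change, a quick search shows which machine env files still branch on the retired step names. A minimal sketch, assuming the env files live in a single directory (env/ in a g-w clone); `find_stale_marine_steps` is a hypothetical helper, and the two step names are taken from the diff above:

```bash
# Hypothetical helper: list the *.env files in a directory that still
# branch on the retired step names ocnanalrun / ocnanalchkpt.
# Exit status follows grep: 0 if any stale file was found, 1 if none.
find_stale_marine_steps() {
  local envdir="${1:-env}"
  # Match the quoted step names as they appear in lines such as
  #   elif [[ "${step}" = "ocnanalrun" ]]; then
  grep -lE '"(ocnanalrun|ocnanalchkpt)"' "${envdir}"/*.env
}
```

Run from the top of a g-w clone as `find_stale_marine_steps env`. If the other ${machine}.env files already use the marine step names, only env/WCOSS2.env should be reported.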

@AndrewEichmann-NOAA
Collaborator

@RussTreadon-NOAA, I suspect it would go through review faster as a separate PR.

@RussTreadon-NOAA
Contributor

g-w issue #3039 has been closed. With that issue resolved, the C48mx500_3DVarAOWCDA g-w CI successfully ran to completion.

 /lfs/h2/emc/ptmp/russ.treadon/EXPDIR/prwcda_pr2978
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202103241200        Done    Oct 28 2024 17:22:03    Oct 28 2024 17:37:32
202103241800        Done    Oct 28 2024 17:22:03    Oct 29 2024 14:17:48

As a reminder, two changes are required before we can activate C48mx500_3DVarAOWCDA on WCOSS2:

  1. update env/WCOSS2.env as noted above
  2. build WCOSS2 GDASApp using spack-stack/1.6.0 or, if available, a later spack-stack version.

@RussTreadon-NOAA
Contributor

This issue can be closed once both of the above items are resolved.

@AndrewEichmann-NOAA
Collaborator

When trying to build GDASApp with the gw-ci tests, I get the following (with an otherwise unmodified CMakeCache.txt):

-- ---------------------------------------------------------
-- Adding bundle project gsibec
-- ---------------------------------------------------------
-- [gsibec] (1.2.1)
-- Feature TESTS enabled
-- Could NOT find MPI_Fortran (missing: MPI_Fortran_WORKS) 
CMake Error at /apps/spack/cmake/3.20.2/intel/19.1.3.304/utnbptm3hrf7gppztidueu4jogfgemut/share/cmake-3.20/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_Fortran_FOUND Fortran)
Call Stack (most recent call first):
  /apps/spack/cmake/3.20.2/intel/19.1.3.304/utnbptm3hrf7gppztidueu4jogfgemut/share/cmake-3.20/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /apps/spack/cmake/3.20.2/intel/19.1.3.304/utnbptm3hrf7gppztidueu4jogfgemut/share/cmake-3.20/Modules/FindMPI.cmake:1742 (find_package_handle_standard_args)
  gsibec/CMakeLists.txt:37 (find_package)

module list returns the following:

Currently Loaded Modules:
  1) PrgEnv-intel/8.2.0   6) intel/19.1.3.304  11) boost/1.79.0      16) qhull/2020.2   21) nco/5.0.6
  2) cmake/3.20.2         7) hdf5/1.10.6       12) gsl-lite/v0.40.0  17) eckit/1.24.4   22) gsl/2.7
  3) craype/2.7.17        8) netcdf/4.7.4      13) sp/2.4.0          18) fckit/0.11.0   23) prod_util/2.0.14
  4) cray-pals/1.2.2      9) udunits/2.2.28    14) python/3.8.6      19) atlas/0.35.0   24) bufr/12.0.1
  5) git/2.29.0          10) eigen/3.4.0       15) ecbuild/3.7.0     20) nccmp/1.8.9.0  25) GDAS/wcoss2.intel

This is a considerably smaller list than on Hera, even after running module use /lfs/h2/emc/da/noscrub/andrew.eichmann/gw/develop/global-workflow/sorc/gdas.cd/modulefiles/ ; module load GDAS/wcoss2. Is that expected?

@RussTreadon-NOAA
Contributor

@AndrewEichmann-NOAA, I cloned g-w develop at 5bb3f86 with GDASApp develop at 34f894f in sorc/gdas.cd. Executing build_all.sh -uk triggered the GDASApp build. logs/build_gdas.log lists more modules than you have:

Currently Loaded Modules:
  1) craype-x86-rome     (H)   8) cray-pals/1.2.2    15) eigen/3.4.0       22) eckit/1.24.4      29) bufr/12.0.1
  2) libfabric/1.11.0.0. (H)   9) git/2.29.0         16) boost/1.79.0      23) fckit/0.11.0      30) fms-C/2023.04
  3) craype-network-ofi  (H)  10) intel/19.1.3.304   17) gsl-lite/v0.40.0  24) atlas/0.35.0      31) esmf-C/8.6.0
  4) envvar/1.0               11) cray-mpich/8.1.12  18) sp/2.4.0          25) nccmp/1.8.9.0     32) GDAS/wcoss2.intel
  5) PrgEnv-intel/8.2.0       12) hdf5/1.10.6        19) python/3.8.6      26) nco/5.0.6
  6) cmake/3.20.2             13) netcdf/4.7.4       20) ecbuild/3.7.0     27) gsl/2.7
  7) craype/2.7.17            14) udunits/2.2.28     21) qhull/2020.2      28) prod_util/2.0.14

The GDASApp cmake configure completed and now source code is being compiled. This test is being performed in /lfs/h2/emc/da/noscrub/russ.treadon/git/global-workflow/test on Cactus.
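Comparing two module listings by hand is error-prone. A minimal sketch, assuming each `module list` output has been saved to a text file; `loaded_modules` and `missing_modules` are hypothetical helpers. Applied to the two listings in this thread, it would flag cray-mpich/8.1.12 among the modules present in the build log but absent from the interactive session. Since cray-mpich provides MPI on WCOSS2's Cray systems, its absence plausibly explains why CMake's FindMPI could not find MPI_Fortran.

```bash
# Hypothetical helpers for comparing saved Lmod "module list" outputs.
# Requires bash (uses process substitution).

# Print the module names in one listing, one per line, sorted bytewise.
# Entries look like "N) name[/version]", optionally followed by "(H)".
loaded_modules() {
  grep -oE '[0-9]+\) +[^ ]+' "$1" \
    | sed -E 's/^[0-9]+\) +//' \
    | LC_ALL=C sort -u
}

# Print modules that appear in listing $2 but not in listing $1.
missing_modules() {
  LC_ALL=C comm -13 <(loaded_modules "$1") <(loaded_modules "$2")
}
```

For example, `missing_modules my_modules.txt build_gdas_modules.txt` (hypothetical file names) lists what the interactive session is missing relative to the build log. The same sort order is used on both sides so that `comm` sees consistently sorted input.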

@RussTreadon-NOAA
Contributor

Update
g-w CI C48mx500_3DVarAOWCDA successfully runs to completion on WCOSS2 (Dogwood) using g-w develop at 7ff942e IF GDASApp is built using a test version of spack-stack/1.6.0.

Once the official release of spack-stack is installed on WCOSS2, we can update modulefiles/GDAS/wcoss2.intel.lua to use spack-stack. g-w CI C48mx500_3DVarAOWCDA can then be activated.


3 participants