Upgrade spack-stack from version 1.5.1 to 1.6.0 #2093

Merged
zach1221 merged 58 commits into ufs-community:develop
Jun 5, 2024

Conversation

Collaborator

@RatkoVasic-NOAA RatkoVasic-NOAA commented Jan 16, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full RT suite (compared to current baselines) on either Hera/Derecho/Hercules AND have committed the log to my PR branch.
  • Add list of all failed regression tests in "Regression Tests" section.

PR Information

Change modulefiles to upgrade spack-stack libraries from version 1.5.1 to 1.6.0

Description

Change modulefiles to upgrade spack-stack libraries from version 1.5.1 to 1.6.0

Commit Message

Solves issue #2091

  • UFSWM:
    • Updates modulefiles to spack-stack version 1.6.0
    • Updates Hera GNU version to 13.3

Priority

  • Critical Bugfix (This PR contains a critical bug fix and should be prioritized.)
  • High (This PR contains a feature or fix needed for a time-sensitive project (e.g., retrospectives, implementations))
  • Normal

Blocking Dependencies

Git Issues Fixed By This PR

#2091

Changes

Subcomponent (with links)

  • AQM
  • CDEPS
  • CICE
  • CMEPS
  • CMakeModules
  • FV3
  • GOCART
  • HYCOM
  • MOM6
  • NOAHMP
  • WW3
  • stochastic_physics
  • none

Input data

  • No changes are expected to input data.
  • Changes are expected to input data:
    • New input data.
    • Updated input data.

Regression Tests:

  • No changes are expected to any regression test.
  • Changes are expected to the following tests:

Libraries

  • Not Needed
  • Needed
    • Create separate issue in JCSDA/spack-stack asking for update to library. Include library name, library version.
    • Add issue link from JCSDA/spack-stack following this item

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
    • Completed
  • opnReqTest
    • N/A
    • Log attached to comment

@RatkoVasic-NOAA RatkoVasic-NOAA self-assigned this Jan 16, 2024
@RatkoVasic-NOAA
Collaborator Author

RatkoVasic-NOAA commented Jan 16, 2024

For now

  • Hera (Intel & GNU)
  • Orion (Intel)
  • Hercules (Intel & GNU)
  • Gaea-C5 (Intel)
  • Jet (Intel)

are done.

@jkbk2004
Collaborator

jkbk2004 commented Jan 17, 2024

Should we bring in the change from NOAA-GFDL/GFDL_atmos_cubed_sphere#299?

@zach1221
Collaborator

Hi, @RatkoVasic-NOAA . I'm doing some preliminary testing against this PR, across the HPCs. Can you sync up your branch for me, please?

@RatkoVasic-NOAA
Collaborator Author

> Hi, @RatkoVasic-NOAA . I'm doing some preliminary testing against this PR, across the HPCs. Can you sync up your branch for me, please?

Done

@junwang-noaa
Collaborator

@Hang-Lei-NOAA would you please let us know which module files to use on Acorn for spack-stack 1.6.0, and also the hpc-stack version for WCOSS2? We can test the hpc-stack version on Acorn. Thanks.

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Jan 24, 2024 via email

@DavidHuber-NOAA
Collaborator

@RatkoVasic-NOAA I have a draft PR to upgrade the UPP to spack-stack 1.6.0 (NOAA-EMC/UPP#856). There are a few more updates to go in, but if it is successful, could we update the UPP hash?

@RatkoVasic-NOAA
Collaborator Author

@DavidHuber-NOAA great! You can open a PR against this PR's branch with the UPP changes.

@JiliDong-NOAA
Contributor

I have a PR to update the UPP hash to fix missing reflectivity when using inline post:

#2113

If there is a plan to update UPP in this PR soon, I will close mine. Three RTs fail with the UPP update. These failures are expected, and the baseline changes come from:

  1. REFZI and REFZR are switched to missing values as expected, similar to what the Thompson microphysics scheme does.
  2. Missing values are handled differently for reflectivity when switching to the correct block: previously, missing values were reset to dbzmin; they are now kept as is.

@DavidHuber-NOAA
Collaborator

@JiliDong-NOAA Yes, I am planning on pushing an updated hash for the UPP shortly to capture its upgrade to spack-stack v1.6.0.

@JiliDong-NOAA
Contributor

> @JiliDong-NOAA Yes, I am planning on pushing an updated hash for the UPP shortly to capture its upgrade to spack-stack v1.6.0.

Sounds good. Thanks.

@souopgui

souopgui commented Feb 2, 2024

I ran the RT on S4 with BL_DATE=20240111:
RegressionTests_s4.log

You can ignore cpld_control_gfsv17_iau_intel and cpld_control_c48_intel; they also failed in the baseline, and I have not investigated them yet.

The following two failed:

  • rap_control_dyn64_phy32_debug_intel
  • cpld_control_nowave_noaero_p8_intel
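
The triage described above (set aside tests that already fail in the baseline, then report what remains) amounts to a set difference. A minimal Python sketch, assuming the failed-test names have already been pulled out of the logs by hand; the function and variable names are hypothetical, and the test names are only those quoted in the comment above:

```python
def new_failures(run_failures, baseline_failures):
    """Return tests that failed in this run but not in the baseline, sorted by name."""
    return sorted(set(run_failures) - set(baseline_failures))


# Failures reported for the S4 run in the comment above.
run_failures = [
    "cpld_control_gfsv17_iau_intel",
    "cpld_control_c48_intel",
    "rap_control_dyn64_phy32_debug_intel",
    "cpld_control_nowave_noaero_p8_intel",
]

# Tests that, per the same comment, also failed in the baseline.
baseline_failures = [
    "cpld_control_gfsv17_iau_intel",
    "cpld_control_c48_intel",
]

regressions = new_failures(run_failures, baseline_failures)
print(regressions)
```

Keeping the baseline failures as an explicit list makes it easy to see which failures are attributable to the branch under test and which were already present.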

@DeniseWorthen
Collaborator

Where are we on this PR? As far as I know, we can't run coupled GNU tests on Hercules until we move to 1.6, right? Since Hercules is our only other GNU platform, this means all developer testing needs to happen on Hera, which is very slow right now.

@jkbk2004
Collaborator

jkbk2004 commented Feb 6, 2024

Can we keep updating the checklist at #2091?

@zach1221
Collaborator

@RatkoVasic-NOAA I tested the 1.6.0 installation on Gaea-C5, and all the tests passed using the installation path (/ncrc/proj/epic/spack-stack/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core). Could you sync up this branch so that it is reflected in the Gaea modulefile?

@DeniseWorthen
Collaborator

  • Modifying the MPI_Init_thread calls in two routines: ./WW3/model/src/ww3_multi.F90 and ./WW3/model/src/ww3_shel.F90
    to CALL MPI_INIT_THREAD( MPI_THREAD_SERIALIZED, THRLEV, IERR_MPI)
    from CALL MPI_INIT_THREAD( MPI_THREAD_FUNNELED, THRLEV, IERR_MPI)

These two programs are used for the standalone ww3 and are unused for the coupled model.
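
For context on the change quoted above: `MPI_Init_thread` takes the thread-support level the program requests and returns the level the library actually provides. The MPI standard defines four levels whose values increase monotonically (SINGLE < FUNNELED < SERIALIZED < MULTIPLE). With FUNNELED only the main thread makes MPI calls, while SERIALIZED allows any thread to make them, one at a time. A minimal Python sketch of that ordering and of the check a caller would make on the returned level; the integer values and the `is_sufficient` helper are illustrative assumptions, not taken from WW3 or from any MPI implementation:

```python
from enum import IntEnum


class ThreadLevel(IntEnum):
    # Illustrative values; the MPI standard only guarantees the ordering.
    SINGLE = 0
    FUNNELED = 1
    SERIALIZED = 2
    MULTIPLE = 3


def is_sufficient(required, provided):
    """True when the provided thread level meets or exceeds the requested one."""
    return provided >= required


old_request = ThreadLevel.FUNNELED    # level requested before the change
new_request = ThreadLevel.SERIALIZED  # level requested after the change

# The new request asks for a strictly higher level of thread support.
print(new_request > old_request)

# A library that provides MULTIPLE satisfies the new request; one that
# provides only FUNNELED does not.
print(is_sufficient(new_request, ThreadLevel.MULTIPLE))
print(is_sufficient(new_request, ThreadLevel.FUNNELED))
```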

@natalie-perlin
Collaborator

>   • Modifying the MPI_Init_thread calls in two routines: ./WW3/model/src/ww3_multi.F90 and ./WW3/model/src/ww3_shel.F90
>     to CALL MPI_INIT_THREAD( MPI_THREAD_SERIALIZED, THRLEV, IERR_MPI)
>     from CALL MPI_INIT_THREAD( MPI_THREAD_FUNNELED, THRLEV, IERR_MPI)
>
> These two programs are used for the standalone ww3 and are unused for the coupled model.

Thank you, Denise, that's very important to know (in case my suspicions are valid)!

@zach1221
Collaborator

zach1221 commented Jun 5, 2024

Looks like @natalie-perlin was right: cpld_debug_p8_gnu and cpld_control_p8_gnu are now passing on Hercules but failing on Hera. I'm now getting the same "OSC pt2pt component does not support MPI_THREAD_MULTIPLE" error message.
/scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_RT/rt_3687442/cpld_debug_p8_gnu/err

The rest of the new hera gnu baselines are completed and final matching should be finished soon.

@DeniseWorthen
Collaborator

DeniseWorthen commented Jun 5, 2024

I believe this must be an issue related to MAPL, since all non-aerosol cpld and datm tests are working for Hera+GNU. I took @zach1221's run directory, turned off CHM, and it is now running.

@zach1221
Collaborator

zach1221 commented Jun 5, 2024

I added the two cases to issue 2263, just in case.

@DeniseWorthen
Collaborator

@zach1221 I'm confused about the reference to 2263. I just ran the control_c48_gnu on Hera using the SS16 branch and it does not hang. Did you find that the control_c48_gnu test on Hera was still failing? It seems to me that the only Hera/GNU tests which are still failing are those which include aerosols (MAPL and the like).

@zach1221
Collaborator

zach1221 commented Jun 5, 2024

> @zach1221 I'm confused about the reference to 2263. I just ran the control_c48_gnu on Hera using the SS16 branch and it does not hang. Did you find that the control_c48_gnu test on Hera was still failing? It seems to me that the only Hera/GNU tests which are still failing are those which include aerosols (MAPL and the like).

I've updated the title to remove reference to the c48 case, as it is passing.

@zach1221
Collaborator

zach1221 commented Jun 5, 2024

I've turned off the cpld_debug_p8 and cpld_control_p8 gnu cases on Hera, in rt.conf. So, we may proceed with the PR, as they're running and passing on Hercules. Natalie and I can continue to look into the issue on Hera.

@natalie-perlin
Collaborator

> /scratch1/NCEPDEV/stmp2/Zachary.Shrader/FV3_RT/rt_3687442/cpld_debug_p8_gnu/err

Sounds good.

What is CHM that @DeniseWorthen mentioned?

@natalie-perlin
Collaborator

> I've turned off the cpld_debug_p8 and cpld_control_p8 gnu cases on Hera, in rt.conf. So, we may proceed with the PR, as they're running and passing on Hercules. Natalie and I can continue to look into the issue on Hera.

@zach1221 - It would be very helpful to see the logs for these tests on Hercules. Please let me know if there are working directories I can take a look at!

@zach1221
Collaborator

zach1221 commented Jun 5, 2024

> > I've turned off the cpld_debug_p8 and cpld_control_p8 gnu cases on Hera, in rt.conf. So, we may proceed with the PR, as they're running and passing on Hercules. Natalie and I can continue to look into the issue on Hera.
>
> @zach1221 - It would be very helpful to see the logs for these tests on Hercules - please let me know if there are working directories to take a look!

@natalie-perlin Sure, here is another run directory:
/work2/noaa/stmp/zshrader/stmp/zshrader/FV3_RT/rt_2847879

@zach1221 zach1221 merged commit 485ccdf into ufs-community:develop Jun 5, 2024
3 checks passed
Labels
  • Baseline Updates: Current baselines will be updated.
  • Ready for Commit Queue: The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.