The grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot test fails to run on Cheyenne GNU following the update to the ufs-weather-model hash #827

Closed
MichaelLueken opened this issue Jun 8, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@MichaelLueken
Collaborator

MichaelLueken commented Jun 8, 2023

Expected behavior

Since the run_fcst and run_post steps successfully pass, the plot_allvars step should also successfully complete.

Current behavior

Following the update of the SRW App's ufs-weather-model hash to e403bb4 and decreasing the DT_ATMOS value from 180 to 150 to correct CFL violations in the specify_template_filenames, grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2, and GST_release_public_v1 WE2E tests, the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot test is now failing in the plot_allvars step. To move forward with updating the weather model hash, it was decided to force DT_ATMOS = 180 for this test.
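For intuition on why lowering DT_ATMOS relieves CFL violations, here is a minimal sketch of the textbook one-dimensional advective Courant number, C = u * dt / dx. The 25 km spacing is taken from the grid name; the 100 m/s wind speed is a hypothetical value chosen only for illustration, and the function name is made up for this sketch. The model's own stability behavior is more involved than this single ratio, so treat it as an approximation rather than the check the model performs.

```python
def courant_number(wind_speed_ms: float, dt_s: float, dx_m: float) -> float:
    """Textbook 1-D advective Courant number: C = u * dt / dx."""
    return wind_speed_ms * dt_s / dx_m


DX_M = 25_000.0        # nominal grid spacing, from "RRFS_CONUS_25km"
WIND_SPEED_MS = 100.0  # ASSUMPTION: hypothetical strong wind, for illustration only

# Compare the two DT_ATMOS values discussed in this issue.
for dt_atmos in (180.0, 150.0):
    c = courant_number(WIND_SPEED_MS, dt_atmos, DX_M)
    print(f"DT_ATMOS = {dt_atmos:.0f} s -> Courant number = {c:.2f}")
```

Under these assumptions, going from 180 to 150 scales the Courant number by 150/180, which is the margin the shorter time step buys.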

Machines affected

Currently, this test has only been run on Cheyenne GNU. Additional machine and compiler options will be tested moving forward.

Steps To Reproduce

  1. Clone the authoritative develop branch on Cheyenne -
    git clone git@github.com:ufs-community/ufs-srweather-app.git
  2. Build the SRW App using the GNU compiler and submit the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot WE2E test -
    ./run_WE2E_tests.py -t=grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot -m=cheyenne -c=gnu -a=
  3. See that the test fails in the plot_allvars step.
[Image: plot_allvars failure message]
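When repeating step 2 across several machine and compiler combinations, it can help to assemble the command programmatically. The sketch below only builds the argument list using the `-t`, `-m`, `-c`, and `-a` flags shown in the steps above; the helper name `build_we2e_command` and the `MY_ACCOUNT` placeholder are hypothetical, and the real account value is not given in this issue.

```python
import shlex


def build_we2e_command(test_name: str, machine: str, compiler: str, account: str) -> list[str]:
    """Build the argv list for run_WE2E_tests.py (flags as shown in this issue)."""
    return [
        "./run_WE2E_tests.py",
        f"-t={test_name}",
        f"-m={machine}",
        f"-c={compiler}",
        f"-a={account}",
    ]


cmd = build_we2e_command(
    "grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot",
    machine="cheyenne",
    compiler="gnu",
    account="MY_ACCOUNT",  # ASSUMPTION: placeholder; substitute your own account
)
# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting concerns.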
@MichaelLueken added the bug label Jun 8, 2023
@MichaelLueken changed the title from "The grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot test fails to run on Cheyenne GNU following the update to the ufs-weather-modle hash" to "The grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot test fails to run on Cheyenne GNU following the update to the ufs-weather-model hash" Jun 8, 2023
@MichaelLueken
Collaborator Author

The grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot test was run using the Intel compiler on Cheyenne and passed. While testing on other machines is still required, it looks like the issue might be due to the compiler used.

@natalie-perlin You were able to get the UFS-WM RTs to pass on Cheyenne using GNU 11.2.0. Are there any modifications that would need to be made to allow the UFS-WM to be built with GNU 11.2.0, or were the modifications made to the HPC-stack build? Thanks!

@natalie-perlin
Collaborator

@MichaelLueken - please note that the RTs passed on Cheyenne only using the 10.1 or 12.1 compiler (the more recent build and tests). All the RTs do not pass using the 11.2 compiler. That was the reason the UFS-WM did not update the Cheyenne GNU to 11.2.

@MichaelLueken
Collaborator Author

@natalie-perlin Thank you very much for the clarification! I will attempt to run the test using the 12.1 compiler and see if it passes with a more up-to-date compiler (and one that works with the UFS-WM).

@natalie-perlin
Collaborator

natalie-perlin commented Jun 9, 2023

@MichaelLueken, a few things I noticed on Cheyenne:

  • compiling now fails with intel
  • compiling with gnu/11.2.0 goes with no issues
  • compiling with gnu/12.1.0 needs building openblas for this stack (will be done asap)

@MichaelLueken
Collaborator Author

@natalie-perlin I was able to build the SRW App on Cheyenne Intel without issue using the current develop branch with:

./devbuild.sh -p=cheyenne -c=intel all

It looks like the gnu/12.1.0 stack still needs openblas to be built. I will comment out the openblas line for now and continue with testing the gnu/12.1.0 compiler.

@natalie-perlin
Collaborator

@MichaelLueken - I have just built openblas on Cheyenne gnu/12.1.0, and the SRW compiled successfully.

@natalie-perlin
Collaborator

Openblas is now one of the modules installed as part of the local copy of the hpc-stack, and can now be built quickly.

@natalie-perlin
Collaborator

Fresh checkout of the SRW this afternoon successfully compiles with Intel on Cheyenne - problem resolved

@MichaelLueken
Collaborator Author

The earlier passing results with GNU compiler versions 10.1.0 and 12.1.0 on Cheyenne were not valid. Those tests still had the DT_ATMOS value set to 180, rather than 150. With DT_ATMOS = 150, both compiler versions still lead to a failure in the plot_allvars script.

Will now begin working through the changes in PR #1731, which is the PR that began causing issues for this test.

@MichaelLueken
Collaborator Author

In sorc/ufs-weather-model/FV3/ccpp/physics/physics/samfdeepcnv.f, if the cq parameter is increased from 1.0 to 1.3 (the value it had before the changes made in PR #1731), then the plot_allvars script runs without issue at DT_ATMOS = 150, and the tests that were failing at DT_ATMOS = 180 pass with DT_ATMOS = 180 once again.

I will run additional tests to ensure that this doesn't break other RRFS_CONUS_25km tests, and will then open an issue in the ccpp-physics repository to discuss whether this value can be changed back (or whether a namelist option can be added so that high-resolution grids use 1.3 rather than 1.0).
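For anyone wanting to repeat the cq experiment, the edit can be scripted as a text substitution. The sketch below is only an illustration: the sample Fortran line is a made-up stand-in (not copied from samfdeepcnv.f), the helper name `set_cq` is hypothetical, and the regular expression assumes cq is assigned a plain numeric literal exactly once. Check the actual source before applying anything like this.

```python
import re


def set_cq(source_text: str, new_value: float) -> str:
    """Replace the numeric literal assigned to cq, e.g. 'cq=1.0' -> 'cq=1.3'.

    Raises ValueError unless exactly one assignment is found, so an
    unexpected source layout fails loudly instead of being silently edited.
    """
    pattern = re.compile(r"(\bcq\s*=\s*)[0-9.]+", re.IGNORECASE)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(new_value), source_text)
    if count != 1:
        raise ValueError(f"expected exactly one cq assignment, found {count}")
    return new_text


# ASSUMPTION: hypothetical line for demonstration, not the real source text.
sample = "      parameter(cq=1.0, other=2.0)\n"
print(set_cq(sample, 1.3), end="")
```

Failing on zero or multiple matches is deliberate, since a silent no-op would make a rerun at DT_ATMOS = 150 look like evidence about cq when nothing had changed.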

@MichaelLueken
Collaborator Author

Cheyenne is no longer available, and this test is now run in the comprehensive tests on all platforms and in the coverage tests on Hercules. Closing obsolete issue.
