CFL violations and issues plotting output from changes made in PR #65 (3a306a4) for RRFS_CONUS_25km grid #82

Open
MichaelLueken opened this issue Jun 14, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@MichaelLueken

Description

While updating the UFS-WM to the latest hash (e403bb4), the SRW App's WE2E tests started failing with the following error message:

FATAL from PE 1: compute_qs: saturation vapor pressure table overflow, nbad= 1

Ultimately, we were able to get around this issue by decreasing DT_ATMOS from 180 to 150. However, this change caused the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot WE2E test to fail. To make this test work again, DT_ATMOS was set back to 180 for this specific test.

Trying to identify the issue that led to the failures, I attempted to change the cq parameter values in mfpbltq.f, mfscuq.f, samfdeepcnv.f, samfshalcnv.f, and satmedmfvdifq.f from 1.0 back to 1.3. Changing this value in samfdeepcnv.f allowed the RRFS_CONUS_25km tests to run using the original DT_ATMOS value of 180. Further, this change corrected the issue seen in the plotting WE2E test.

Unfortunately, I'm not familiar enough with CCPP to know what the cq parameter is used for. Is there a reason that it was reduced from 1.3 to 1.0 in the noted routines as part of PR #65? Would it be possible to set this value back to 1.3 for samfdeepcnv.f, or maybe add a namelist variable so that the value can be set at the application level?

Tagging @grantfirl, @JongilHan66, and @Qingfu-Liu since these individuals are either the PR owner or worked closely with the changes made in PR #65 for HR2.

Steps to Reproduce

  1. Clone the SRW App on Hera: git clone git@github.com:ufs-community/ufs-srweather-app.git
  2. cd ufs-srweather-app
  3. ./manage_externals/checkout_externals
  4. ./devbuild.sh -p=hera
  5. module use $PWD/modulefiles
  6. module load wflow_hera
  7. conda activate workflow_tools
  8. vi ush/predef_grid_params.yaml, then find RRFS_CONUS_25km and change DT_ATMOS from 150 to 180
  9. cd tests/WE2E
  10. ./run_WE2E_tests.py -t=grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 -m=hera -a=<insert account here>
  11. See the noted error message in the description in the log/run_fcst* log file. A copy of the error in the log file has been added to the end of this issue as well.
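
To confirm the failure from step 11 without reading the log by hand, the forecast log can be scanned for the fatal message. This is a minimal sketch: the helper name `find_qs_overflows` is hypothetical, and the pattern is modeled only on the single error line quoted in the description, so the whitespace in real logs may differ.

```python
import re

# Pattern modeled on the one error line quoted in this issue. Spacing in real
# logs is an assumption, so whitespace is matched loosely.
QS_OVERFLOW = re.compile(
    r"FATAL from PE\s*(\d+):\s*compute_qs: saturation vapor pressure table overflow,"
    r"\s*nbad=\s*(\d+)"
)


def find_qs_overflows(lines):
    """Return a (pe, nbad) tuple for every compute_qs overflow line found."""
    hits = []
    for line in lines:
        match = QS_OVERFLOW.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


# Sample input: one unrelated line plus the message quoted in the description.
sample = [
    "some unrelated log line",
    "FATAL from PE 1: compute_qs: saturation vapor pressure table overflow, nbad= 1",
]
print(find_qs_overflows(sample))
```

For a real run, pass an open file object for the relevant log/run_fcst* file instead of `sample`.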

Additional Context

  • Issues were encountered on UCAR's Cheyenne (with Intel compilers) and Hera (both Intel and GNU).
  • Compiler versions: Intel 2022.1 on Cheyenne; Intel 2022.1.2 and GNU 9.2.0 on Hera.
  • The test noted above uses the FV3_GFS_v15p2 SDF, while the noted failure of the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot test uses the FV3_GFS_v17_p8 SDF.
  • The issues became apparent in PR #65 (PBL and Convection and Microphysics update for HR2) in the ccpp-physics repo, PR #1731 in the ufs-weather-model repo (the PR that brought the changes made in PR #65 into the UFS-WM), and PR #799 in the SRW App, where the UFS-WM hash was updated.

Output

grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_run_fcst_log
@MichaelLueken MichaelLueken added the bug Something isn't working label Jun 14, 2023
@JongilHan66
Collaborator

cq=1.3 implies that the entrainment rates in updrafts for moisture and tracers are about 30% larger than the rate for temperature. The reason for cq=1.3 was to increase CAPE by reducing mass flux transport for moisture and tracers. I changed it back to cq=1.0 (the value in the current operational GFSv16) because of the lack of physical justification for cq=1.3. I don't think changing the cq value can cause a numerical instability.
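
As a rough illustration of what a 30% larger entrainment rate means, here is a toy entraining-plume sketch. It is not the samfdeepcnv.f code: it assumes cq simply multiplies a constant fractional entrainment rate, and the values of `EPS_T` and `Z` are made up for illustration.

```python
import math


def updraft_excess(excess0, eps, z):
    """Excess of an updraft property over a constant environment after rising
    z metres with a constant fractional entrainment rate eps (1/m).

    Solves d(excess)/dz = -eps * excess, i.e. simple exponential dilution.
    """
    return excess0 * math.exp(-eps * z)


EPS_T = 1.0e-4  # assumed entrainment rate for temperature, 1/m (illustrative)
Z = 5000.0      # assumed ascent depth, m (illustrative)

for cq in (1.0, 1.3):
    eps_q = cq * EPS_T  # moisture/tracer entrainment rate scaled by cq (assumption)
    remaining = updraft_excess(1.0, eps_q, Z)
    print(f"cq={cq}: eps_q/eps_T={eps_q / EPS_T:.1f}, "
          f"remaining moisture excess fraction={remaining:.3f}")
```

In this toy setup, the larger cq dilutes the updraft's moisture excess faster, so less moisture is carried upward.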

@RatkoVasic-NOAA

I ran a couple of tests:

  1. Compiled with DEBUG options. It failed immediately in post_fv3.F90, line 4645.
  2. Turned off inline post; it ran successfully for all 6 hours (still in DEBUG mode).
  3. Compiled back with optimization (inline post OFF), and it failed the same way after 60 steps in compute_qs.

I doubt that this is due to CFL; rather, the longer time step revealed some other underlying bug. We might also have two issues here: one for QS, and the other is why it failed in post:

11: forrtl: error (75): floating point exception
11: Image              PC                Routine            Line        Source
11: ufs_model          00000000096573AB  Unknown               Unknown  Unknown
11: libpthread-2.17.s  00002B3EB88FF630  Unknown               Unknown  Unknown
11: ufs_model          000000000227A6B1  post_fv3_mp_set_p        4645  post_fv3.F90
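
For context on the CFL question raised in this comment, a naive advective Courant number C = u * dt / dx can be compared for the two DT_ATMOS values. This is only a back-of-the-envelope sketch: the 100 m/s wind speed is an assumed value, the 25 km spacing is the nominal value taken from the grid name, DT_ATMOS is treated as seconds, and FV3 sub-cycles its dynamics within DT_ATMOS, so this is not the model's actual stability limit.

```python
def courant_number(speed_ms, dt_s, dx_m):
    """Naive 1-D advective Courant number: C = u * dt / dx."""
    return speed_ms * dt_s / dx_m


DX_M = 25_000.0   # nominal grid spacing from the grid name RRFS_CONUS_25km
WIND_MS = 100.0   # assumed strong jet-level wind speed (illustrative)

for dt in (150.0, 180.0):
    c = courant_number(WIND_MS, dt, DX_M)
    print(f"DT_ATMOS={dt:.0f} s -> naive Courant number {c:.2f}")
```

Under these assumptions both values stay below 1, which is at least consistent with the suspicion that something other than a plain CFL violation is involved.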
