Add GFSv17 IAU restart tests #2306

Open
junwang-noaa opened this issue Jun 3, 2024 · 17 comments · May be fixed by #2404

@junwang-noaa
Collaborator

Description

GFSv17 runs with IAU turned on, so the model should be able to restart with that configuration. A GFSv17 IAU restart test (cpld_restart_control_gfsv17_iau) based on cpld_control_gfsv17_iau needs to be added to the RT suite.

Solution

Alternatives

Related to

@junwang-noaa junwang-noaa added the enhancement New feature or request label Jun 3, 2024
@junwang-noaa
Collaborator Author

The RT test cpld_control_gfsv17_iau in ufs-weather-model generates consistent restart files for all the components. The test starts at 2021032206 with fhrot=3 and a total forecast length of 30 hr; the model produces forecast results from 2021032212 to 2021032312 (2021-03-23-43200). Below is the list of restart files:

ufs.cpld.ww3.r.2021-03-23-43200

RESTART/
20210323.120000.MOM.res.nc            20210323.120000.fv_core.res.tile6.nc     20210323.120000.phy_data.tile1.nc
20210323.120000.ca_data.tile1.nc      20210323.120000.fv_srf_wnd.res.tile1.nc  20210323.120000.phy_data.tile2.nc
20210323.120000.ca_data.tile2.nc      20210323.120000.fv_srf_wnd.res.tile2.nc  20210323.120000.phy_data.tile3.nc
20210323.120000.ca_data.tile3.nc      20210323.120000.fv_srf_wnd.res.tile3.nc  20210323.120000.phy_data.tile4.nc
20210323.120000.ca_data.tile4.nc      20210323.120000.fv_srf_wnd.res.tile4.nc  20210323.120000.phy_data.tile5.nc
20210323.120000.ca_data.tile5.nc      20210323.120000.fv_srf_wnd.res.tile5.nc  20210323.120000.phy_data.tile6.nc
20210323.120000.ca_data.tile6.nc      20210323.120000.fv_srf_wnd.res.tile6.nc  20210323.120000.sfc_data.tile1.nc
20210323.120000.fv_core.res.nc        20210323.120000.fv_tracer.res.tile1.nc   20210323.120000.sfc_data.tile2.nc
20210323.120000.fv_core.res.tile1.nc  20210323.120000.fv_tracer.res.tile2.nc   20210323.120000.sfc_data.tile3.nc
20210323.120000.fv_core.res.tile2.nc  20210323.120000.fv_tracer.res.tile3.nc   20210323.120000.sfc_data.tile4.nc
20210323.120000.fv_core.res.tile3.nc  20210323.120000.fv_tracer.res.tile4.nc   20210323.120000.sfc_data.tile5.nc
20210323.120000.fv_core.res.tile4.nc  20210323.120000.fv_tracer.res.tile5.nc   20210323.120000.sfc_data.tile6.nc
20210323.120000.fv_core.res.tile5.nc  20210323.120000.fv_tracer.res.tile6.nc

ufs.cpld.cpl.r.2021-03-23-43200.nc

iced.2021-03-23-43200.nc

A test run directory can be seen on hera at:
/scratch1/NCEPDEV/stmp2/Jun.Wang/FV3_RT/rt_906282/cpld_control_gfsv17_iau_intel

@junwang-noaa
Collaborator Author

@aerorahul FYI.

@junwang-noaa
Collaborator Author

I checked the IAU test generated from global-workflow. The test case is at /scratch1/NCEPDEV/climate/Jessica.Meixner/IAUcheckpointrestarts/iau01/TMP/RUNDIRS/testiau/gfsfcst.2021032418/fcst.1642156

In the run directory we have the following in restart/FV3_RESTART:

[Jun.Wang@hfe01 RESTART]$ ls 20210329.180000*
20210329.180000.ca_data.tile1.nc      20210329.180000.fv_core.res.tile6.nc     20210329.180000.phy_data.tile1.nc
20210329.180000.ca_data.tile2.nc      20210329.180000.fv_srf_wnd.res.tile1.nc  20210329.180000.phy_data.tile2.nc
20210329.180000.ca_data.tile3.nc      20210329.180000.fv_srf_wnd.res.tile2.nc  20210329.180000.phy_data.tile3.nc
20210329.180000.ca_data.tile4.nc      20210329.180000.fv_srf_wnd.res.tile3.nc  20210329.180000.phy_data.tile4.nc
20210329.180000.ca_data.tile5.nc      20210329.180000.fv_srf_wnd.res.tile4.nc  20210329.180000.phy_data.tile5.nc
20210329.180000.ca_data.tile6.nc      20210329.180000.fv_srf_wnd.res.tile5.nc  20210329.180000.phy_data.tile6.nc
20210329.180000.coupler.res           20210329.180000.fv_srf_wnd.res.tile6.nc  20210329.180000.sfc_data.tile1.nc
20210329.180000.fv_core.res.nc        20210329.180000.fv_tracer.res.tile1.nc   20210329.180000.sfc_data.tile2.nc
20210329.180000.fv_core.res.tile1.nc  20210329.180000.fv_tracer.res.tile2.nc   20210329.180000.sfc_data.tile3.nc
20210329.180000.fv_core.res.tile2.nc  20210329.180000.fv_tracer.res.tile3.nc   20210329.180000.sfc_data.tile4.nc
20210329.180000.fv_core.res.tile3.nc  20210329.180000.fv_tracer.res.tile4.nc   20210329.180000.sfc_data.tile5.nc
20210329.180000.fv_core.res.tile4.nc  20210329.180000.fv_tracer.res.tile5.nc   20210329.180000.sfc_data.tile6.nc
20210329.180000.fv_core.res.tile5.nc  20210329.180000.fv_tracer.res.tile6.nc

in restart/MOM6_RESTART:

20210329.150000.MOM.res.nc

in restart/CICE_RESTART

cice_model.res.2021-03-29-54000.nc

in restart/CMEPS_RESTART

 ufs.cpld.cpl.r.2021-03-29-54000.nc
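
The listings above use two naming styles: a YYYYMMDD.HHMMSS prefix (FV3, MOM6) and a YYYY-MM-DD-SSSSS seconds-of-day suffix (CICE, CMEPS, WW3). As a rough sketch, the stamps can be decoded and grouped by valid time to see which components disagree. The helper names `valid_time` and `group_by_valid_time` are hypothetical and not part of any UFS tool; the file names are taken from the listing above.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Prefix style, e.g. 20210329.180000.fv_core.res.nc
FMS_STAMP = re.compile(r"^(\d{8})\.(\d{6})\.")
# Suffix style with seconds of day, e.g. ufs.cpld.cpl.r.2021-03-29-54000.nc
SEC_STAMP = re.compile(r"(\d{4})-(\d{2})-(\d{2})-(\d{5})(?:\.nc)?$")


def valid_time(filename):
    """Return the valid time encoded in a restart file name, or None."""
    m = FMS_STAMP.match(filename)
    if m:
        return datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")
    m = SEC_STAMP.search(filename)
    if m:
        year, month, day, seconds = (int(g) for g in m.groups())
        return datetime(year, month, day) + timedelta(seconds=seconds)
    return None


def group_by_valid_time(filenames):
    """Map each decoded valid time to the file names that carry it."""
    groups = defaultdict(list)
    for name in filenames:
        groups[valid_time(name)].append(name)
    return dict(groups)


files = [
    "20210329.180000.fv_core.res.nc",
    "20210329.150000.MOM.res.nc",
    "cice_model.res.2021-03-29-54000.nc",
    "ufs.cpld.cpl.r.2021-03-29-54000.nc",
]

# More than one group means the components wrote restarts at different times.
for when, names in sorted(group_by_valid_time(files).items(), key=lambda kv: str(kv[0])):
    print(when, names)
```

For these four names the sketch yields two groups: the FV3 file at 18:00 and the MOM6, CICE and CMEPS files at 15:00 (54000 s).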

To make the restart files consistent, we need to change the following settings from:
in ufs.configure:

restart_n = 12

in model_configure:

restart_interval:        12 24 36 48 60 72 84 96 108 120

to:
in ufs.configure:

restart_n = 15

in model_configure:

restart_interval:        12 27 42 57 72 87 102 117

At this time, the atmosphere history files may not reproduce when the model restarts at fh=27, 57, 87 and 117. New updates are required to let the marine/CMEPS components have flexible restart output times; that will completely resolve the issue.
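
The arithmetic behind the new restart_interval list can be sketched as follows. This assumes, based only on the file stamps above (atmosphere at 18:00, other components at 15:00), that restart_n counts hours from the model start while the restart_interval hours are offset by a 3-hour IAU window; that interpretation is an inference, not something stated by the model documentation. The function name `fv3_restart_hours` and the 120-hour upper bound are illustrative.

```python
def fv3_restart_hours(restart_n, iau_offset, fhmax):
    """List atmosphere restart hours that line up with a fixed coupled interval.

    restart_n  -- restart frequency in hours, counted from the model start
    iau_offset -- assumed shift in hours between the two hour counts
    fhmax      -- largest atmosphere restart hour to emit
    """
    hours = []
    k = 1
    while k * restart_n - iau_offset <= fhmax:
        hours.append(k * restart_n - iau_offset)
        k += 1
    return hours


# restart_n = 15 with a 3-hour offset gives the list proposed above.
print(fv3_restart_hours(15, 3, 120))
```

Under these assumptions the call returns 12, 27, 42, 57, 72, 87, 102 and 117, which matches the restart_interval line proposed for model_configure.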

@junwang-noaa
Collaborator Author

@aerorahul FYI.

@aerorahul
Contributor

@junwang-noaa Thanks.
I think this is straightforward and do-able.
I might have questions about what those should be when doing a restart from failure or a second segment. I can ask them elsewhere.

@junwang-noaa
Collaborator Author

junwang-noaa commented Aug 1, 2024

A GFSv17 IAU restart test, cpld_restart_gfsv17_iau, has been added. Unfortunately, the test does not reproduce the results from the cpld_control_gfsv17_iau control test. Further debugging shows that the test reproduces in atm-only mode and in s2s mode (atm-ocn-ice), but not in coupled gfsv17 mode (atm-ocn-ice-wav). In the coupled gfsv17 mode, the coupling fields (e.g. wavImp_Sw_pstokes, wavImp_Sw_z0) that WW3 sends to ocn and atm do not match between the control run and the restart run.

[Jun.Wang@hfe12 cpld_control_gfsv17_iau_intel_rst]$ nccmp -d ufs.cpld.cpl.hi.atm.2021-03-22-68400.nc ../cpld_control_gfsv17_iau_intel/ufs.cpld.cpl.hi.atm.2021-03-22-68400.nc
[Jun.Wang@hfe12 cpld_control_gfsv17_iau_intel_rst]$ nccmp -d ufs.cpld.cpl.hi.ice.2021-03-22-68400.nc ../cpld_control_gfsv17_iau_intel/ufs.cpld.cpl.hi.ice.2021-03-22-68400.nc
[Jun.Wang@hfe12 cpld_control_gfsv17_iau_intel_rst]$ nccmp -d ufs.cpld.cpl.hi.ocn.2021-03-22-68400.nc ../cpld_control_gfsv17_iau_intel/ufs.cpld.cpl.hi.ocn.2021-03-22-68400.nc
[Jun.Wang@hfe12 cpld_control_gfsv17_iau_intel_rst]$ nccmp -d ufs.cpld.cpl.hi.wav.2021-03-22-68400.nc ../cpld_control_gfsv17_iau_intel/ufs.cpld.cpl.hi.wav.2021-03-22-68400.nc
DIFFER : VARIABLE : wavImp_Sw_pstokes_x1 : POSITION : [0,0,0] : VALUES : -1.37339e-16 <> -1.37338e-16

For the field wavImp_Sw_z0 that WW3 sends to atm, the diff shows:

108c108
<     1.97232438949868e-05, 1.04722093965393e-05, 9.44941621128237e-06,
---
>     1.97232475329656e-05, 1.04722093965393e-05, 9.44941621128237e-06,
110,118c110,118
<     2.25347548621357e-06, 1.74139472619572e-06, 1.11799658952805e-06,
<     4.90223101223819e-05, 2.61735185631551e-05, 5.37803043698659e-06,
---
>     2.25347548621357e-06, 1.74139449882205e-06, 1.11799658952805e-06,
>     4.90223101223819e-05, 2.61735185631551e-05, 5.37802816324984e-06,
...
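
To put the size of these differences in context, the relative difference of each mismatched pair in the diff above can be computed directly. This is only a sketch for judging magnitude: the labels are made up, the machine-epsilon values are reference scales, and nothing here identifies the cause of the mismatch.

```python
import sys

# Mismatched value pairs copied from the diff output above (first file, second file).
pairs = {
    "z0 sample 1": (1.97232438949868e-05, 1.97232475329656e-05),
    "z0 sample 2": (1.74139472619572e-06, 1.74139449882205e-06),
    "z0 sample 3": (5.37803043698659e-06, 5.37802816324984e-06),
}


def relative_difference(a, b):
    """Absolute difference scaled by the larger magnitude (0.0 if both are zero)."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale


FLOAT64_EPS = sys.float_info.epsilon  # about 2.2e-16
FLOAT32_EPS = 2.0 ** -23              # about 1.2e-07, shown only as a reference scale

for label, (a, b) in pairs.items():
    rel = relative_difference(a, b)
    print(f"{label}: rel={rel:.3e}  "
          f"rel/eps64={rel / FLOAT64_EPS:.1e}  rel/eps32={rel / FLOAT32_EPS:.2f}")
```

All three relative differences fall between 1e-7 and 1e-6, far larger than double-precision round-off from a single operation.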

@junwang-noaa
Collaborator Author

The two run directories are at:
/scratch1/NCEPDEV/stmp2/Jun.Wang/FV3_RT/rt_3917402/cpld_control_gfsv17_iau_intel
/scratch1/NCEPDEV/stmp2/Jun.Wang/FV3_RT/rt_3917402/cpld_control_gfsv17_iau_intel_rst

The coupling fields in ufs.cpld.cpl.hi.wav.2021-03-22-68400.nc show that the import fields going to ww3 are identical in the control and restart run, but the coupled fields coming out of ww3 are different.

@MatthewMasarik-NOAA would you please take a look? Thanks

@MatthewMasarik-NOAA
Collaborator

Hi @junwang-noaa, thank you for letting us know about this issue. We've added this to our task list and are in the process of prioritizing tasking for our group given our staffing for the next few months. I'll post back again shortly.

@MatthewMasarik-NOAA
Collaborator

Hi @junwang-noaa, I spoke with @sbanihash last week, who is now on leave through August. She advised to wait until she returned from leave for one of us to look into it.

@junwang-noaa
Collaborator Author

@MatthewMasarik-NOAA Thanks for letting me know.

@sbanihash

@junwang-noaa just wanted to let you know that we still haven't had time to dig into this but it is on our to-do list. Thank you for your patience.

@sbanihash

@junwang-noaa Wanted to let you know that I haven't had time to dig into this yet. I know you were also asking for any updates in the Monday meeting, so wanted to let you know that it is on my list, but haven't had time to get to it yet. Thanks for your patience.

@JessicaMeixner-NOAA
Collaborator

Trying to assess the issues here. I've tried using #2404 but am having trouble running the tests after trying to update. Is there something else I should be using to address this? I can use the workflow or something else if needed.

@NickSzapiro-NOAA
Collaborator

@junwang-noaa, I've started looking at this and think I get to the same place with diffs first originating internal to WW3. I don't know if you were planning to continue looking at anything in particular. @JessicaMeixner-NOAA I'm not sure what was going wrong for you but this branch may be helpful:
https://github.com/NickSzapiro-NOAA/ufs-weather-model/tree/gfsv17_iau_restart with the my_rt.conf

In short, cpld_control_gfsv17 restart reproduces control but IAU does not from FHROT=18. This is particularly confusing as I think IAUs are off before the IAU restart test even begins, right? If so, then this is just a normal restart test. Looking at the mediator history files written every coupling step, differences between cpld_control_gfsv17_iau_intel and cpld_restart_gfsv17_iau_intel first occur in:

ufs.cpld.cpl.hi.wav.2021-03-23-03600.nc then ufs.cpld.cpl.hi.atm.2021-03-23-04320.nc then ufs.cpld.cpl.hi.ice.2021-03-23-05040.nc then ufs.cpld.cpl.hi.ocn.2021-03-23-07200.nc

Note that ocn may not actually be last as it is in a 3600s slow coupling loop. Output is currently on hercules at /work2/noaa/stmp/nszapiro/stmp/nszapiro/FV3_RT/rt_2957580/
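
The search for the first differing mediator history file per component can be sketched as below. The function `first_diffs` is hypothetical. Its default comparison is a byte-wise file compare, which is an assumption standing in for a real data comparison: netCDF files can differ in metadata while holding identical data, so a callable that wraps `nccmp -d` could be passed as `same` instead.

```python
import filecmp
import re
from pathlib import Path

# Matches names such as ufs.cpld.cpl.hi.wav.2021-03-23-03600.nc
HIST = re.compile(r"^ufs\.cpld\.cpl\.hi\.(\w+)\.(\d{4}-\d{2}-\d{2})-(\d{5})\.nc$")


def first_diffs(ctrl_dir, rst_dir, same=None):
    """Return {component: first history file name that differs between the runs}.

    same(a, b) must return True when two files are considered identical.
    """
    if same is None:
        # Assumption: byte-wise equality as a stand-in for a data comparison.
        same = lambda a, b: filecmp.cmp(a, b, shallow=False)

    found = []
    for path in Path(rst_dir).iterdir():
        m = HIST.match(path.name)
        if m:
            component, date, seconds = m.group(1), m.group(2), int(m.group(3))
            found.append(((date, seconds), component, path))

    first = {}
    # Walk the restart-run files in time order and keep the earliest mismatch.
    for _stamp, component, path in sorted(found, key=lambda t: (t[0], t[1])):
        twin = Path(ctrl_dir) / path.name
        if component in first or not twin.exists():
            continue
        if not same(path, twin):
            first[component] = path.name
    return first
```

Calling `first_diffs(control_dir, restart_dir)` on two run directories returns one entry per component that ever differs, which gives the same kind of ordering as the wav, atm, ice, ocn sequence listed above.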

Via nccmp, we see that WAV receives the same variables b4b but sends different variables to CMEPS after the first hour of simulation:

nccmp -d -S -q -f -g -B --Attribute=checksum --warn=format /work2/noaa/stmp/nszapiro/stmp/nszapiro/FV3_RT/rt_2957580/cpld_control_gfsv17_iau_intel//ufs.cpld.cpl.hi.wav.2021-03-23-03600.nc /work2/noaa/stmp/nszapiro/stmp/nszapiro/FV3_RT/rt_2957580/cpld_restart_gfsv17_iau_intel//ufs.cpld.cpl.hi.wav.2021-03-23-03600.nc
...
Variable             Group  Count          Sum      AbsSum          Min         Max       Range         Mean      StdDev
wavImp_Sw_pstokes_x1 /     229696  0.000215089 0.000861298 -2.99793e-05 0.000114741  0.00014472  9.36408e-10 3.35939e-07
wavImp_Sw_pstokes_x2 /     193534 -0.000500737  0.00226996  -0.00027287 8.95131e-05 0.000362383 -2.58733e-09 8.18138e-07
wavImp_Sw_pstokes_x3 /     198004 -0.000461796   0.0012822 -0.000152376 3.84662e-05 0.000190842 -2.33226e-09 5.07879e-07
wavImp_Sw_pstokes_y1 /     227735 -5.37705e-05 0.000797042 -0.000145033 3.70741e-05 0.000182107  -2.3611e-10 3.37732e-07
wavImp_Sw_pstokes_y2 /     197392  0.000577232  0.00186699 -7.07683e-06 0.000204533  0.00021161  2.92429e-09 6.63731e-07
wavImp_Sw_pstokes_y3 /     202856  0.000329009 0.000921099 -1.06767e-05  0.00012587 0.000136547  1.62189e-09 3.44511e-07
wavImp_Sw_z0         /      35580  3.87704e-06 9.38305e-06 -2.00816e-07  2.4871e-06 2.68791e-06  1.08967e-10 1.49519e-08

For maps, I'm plotting scatter points where |diff|>1.E-16 . These diffs occur globally, e.g., ncdiff of wavImp_Sw_pstokes_x1_gl_ufs.cpld.cpl.hi.wav.2021-03-23-03600.nc:
Image
There are also suspicious diffs over land in this global, unstructured wave mesh, e.g., over Antarctica in ncdiff of wavImp_Sw_z0_sh_ufs.cpld.cpl.hi.wav.2021-03-23-03600.nc:
Image

I should say that I turned off post (DOPOST=.false.) as it was segfaulting when making post_fname=./GFSPRS.GrbF15 with:

144:  3 0x00000000040d6005 fdlvl_()  /work/noaa/nems/nszapiro/tasks/gfsv17_iau_restart/ufs-weather-model/FV3/upp/sorc/ncep_post.fd/FDLVL.f:397
144:  4 0x0000000003fead1f miscln_()  /work/noaa/nems/nszapiro/tasks/gfsv17_iau_restart/ufs-weather-model/FV3/upp/sorc/ncep_post.fd/MISCLN.f:822
144:  5 0x0000000003ca3c12 process_()  /work/noaa/nems/nszapiro/tasks/gfsv17_iau_restart/ufs-weather-model/FV3/upp/sorc/ncep_post.fd/PROCESS.f:112
144:  6 0x0000000002cdc18f post_fv3_mp_post_run_fv3_()  /work/noaa/nems/nszapiro/tasks/gfsv17_iau_restart/ufs-weather-model/FV3/io/post_fv3.F90:206
144:  7 0x0000000002c9a962 module_wrt_grid_comp_mp_wrt_run_()  /work/noaa/nems/nszapiro/tasks/gfsv17_iau_restart/ufs-weather-model/FV3/io/module_wrt_grid_comp.F90:2036

Among the oddities, this restart test starts at fhrot=18, so I don't know why post is trying to make F15.

Maybe it's helpful to share this update. I also wonder whether 1) I understand the test correctly and 2) there are suggestions for where to dig in WW3.

@junwang-noaa
Collaborator Author

@NickSzapiro-NOAA Thanks for looking into this issue. I think you are getting the error that I had with the previous version. I don't have time to continue working on this issue. If you can share your branch that is updated to the latest ufs weather model develop branch with wave group, that will be great. Thanks

@JessicaMeixner-NOAA
Collaborator

Thanks @NickSzapiro-NOAA - I'll try to circle back to this next week from your branch.

@JessicaMeixner-NOAA
Collaborator

Just wanted to ping here and say I am starting to look at this now, no updates yet though.

Projects
Status: In Progress