
[develop] Verification upgrades and bug fixes #973

Merged
merged 73 commits into from
Jan 16, 2024

Conversation

gsketefian
Collaborator

@gsketefian gsketefian commented Nov 15, 2023

DESCRIPTION OF CHANGES:

This PR cleans up and simplifies the verification tasks in the SRW App. Main changes:

  • For each METplus tool that is run on APCP, combine APCP01h and APCPgt01h METplus configuration (conf) files into one so that different behavior/code/scripts are not required for 01h vs. >01h.
  • For APCP01h verification, use the NetCDF files generated by the PcpCombine_[fcst|obs] tasks instead of the original GRIB2 files. Note that for 01h accumulation, all the PcpCombine_[fcst|obs] tasks do is convert from GRIB2 to NetCDF format (unlike for >01h, for which the 01h accumulations must be summed to obtain the 03h, 06h, etc. accumulations). This change is needed because currently, the NetCDF files for APCP01h are created by the PcpCombine_[fcst|obs] tasks but are not actually used by downstream tasks such as GridStat.
  • Change thresholds in METplus conf files to use "ge", "gt", etc. as opposed to ">=", ">", etc. This way, thresholds can also easily be used in file and variable names.
  • For clarity, change the SFC and UPA verification field group names to ADPSFC and ADPUPA, respectively.
  • Change behavior of ASNOW verification to be more similar to that of APCP.
  • Bug fix in GridStat_ensprob_ASNOW.conf. There was an inadvertent shift in the threshold values used in the forecast field array names with respect to the threshold values specified for the observations. The fix makes the forecast and obs thresholds match.
  • Add METplus logging level control in the main SRW App config file.
  • For clarity, rename some verification variables as needed (in the main SRW App config file, in the rocoto workflow xml, in ex-scripts, etc).
  • Clean up comments in METplus config files and make these files more similar to each other where possible.
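The threshold renaming described above can be sketched as a simple operator-to-letter-code mapping. This is a hypothetical illustration, not code from the SRW App; `to_met_threshold` and `OP_TO_CODE` are names invented here. The point is that letter codes like `ge20.0` are safe to embed in file and variable names, whereas `>=20.0` is not:

```python
# Hypothetical sketch (not actual SRW App code): convert comparison-operator
# thresholds to MET-style letter codes, e.g. ">=20.0" -> "ge20.0".
OP_TO_CODE = {">=": "ge", ">": "gt", "<=": "le", "<": "lt", "==": "eq", "!=": "ne"}

def to_met_threshold(thresh: str) -> str:
    """Rewrite a threshold like '>=0.508' as 'ge0.508'."""
    # Try longer operators first so ">=" is matched before ">".
    for op in sorted(OP_TO_CODE, key=len, reverse=True):
        if thresh.startswith(op):
            return OP_TO_CODE[op] + thresh[len(op):]
    return thresh  # already in letter-code form

print(to_met_threshold(">=20.0"))  # ge20.0
print(to_met_threshold("<0.508"))  # lt0.508
```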

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • hercules.intel
  • cheyenne.intel
  • cheyenne.gnu
  • derecho.intel
  • gaea.intel
  • gaeac5.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

The set of fundamental WE2E tests as well as all the verification tests were run on Hera with Intel. All completed successfully. The fundamental tests are:

grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_GFS_v16
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16

The verification tests are:

MET_ensemble_verification
MET_ensemble_verification_only_vx
MET_ensemble_verification_only_vx_time_lag
MET_ensemble_verification_winter_wx
MET_verification
MET_verification_only_vx
MET_verification_winter_wx

Manual regression tests were also run on the following WE2E tests:

MET_verification_winter_wx [aka custom_ESGgrid_Great_Lakes_snow_8km]
MET_ensemble_verification_only_vx
MET_ensemble_verification_winter_wx

All had minor expected differences in results relative to the develop branch. There was a major difference in output (stat files) from the run_MET_GridStat_vx_ensprob_ASNOW06h task of the MET_ensemble_verification_winter_wx test, but that is due to the bug fix in GridStat_ensprob_ASNOW.conf regarding the mismatch between forecast and obs thresholds (and is thus expected).

DEPENDENCIES:

None

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@michelleharrold @JeffBeck-NOAA @willmayfield

…iable fieldname_in_MET_filedir_names (as it is with APCP).
… ">=", ">", "<=", and "<" to "ge", "gt", "le", and "lt".
… variables are grouped together, (2) commented-out settings of FCST_VAR<n>_... and OBS_VAR<n>_... are removed, (3) FCST_VAR<n>_OPTIONS and OBS_VAR<n>_OPTIONS come last, and (4) each element in options is on a separate line.
…tool names to template METplus conf files as variables; add new workflow variable that specifies the METplus verbosity level.
…ks for APCP (i.e. not just for accumulation > 1 hour but also for accumulation = 1 hour) use obs and forecast files that have been processed by the PcpCombine tool of METplus. These files are all in NetCDF format and are generated from GRIB2 input files (both for obs and forecast). This involves:

* Renaming certain workflow variables for verification so their role/purpose is clearer.
* Modifying the names of the arrays for APCP variables in the NetCDF files.
* Changing the names of the levels associated with APCP variables in the NetCDF files (e.g. "A1" instead of "A01").
* For consistency, making changes for accumulated snow (ASNOW) analogous to the above changes for APCP.
* In ex-scripts for vx tasks, removing if-statement that treats the case of APCP accumulation period > 1 hour separately and performing the same steps for both accumulation = 1 hour and > 1 hour.
…should be replaced with [OBS|FCST]_PCP_COMBINE_INPUT_DATATYPE, which means they are relevant only to METplus conf files that run PcpCombine. Remove all use of these variables in conf files that do not run PcpCombine, and in the ones that do, replace them with the appropriate new variables (some already have the new variables).
…s for combining METplus conf files for APCP01h and APCPgt01h (inadvertently left out of a previous commit).
…f files no longer differentiate between 1 hour accumulation and > 1 hour accumulation.
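As the changes above describe, PcpCombine simply converts GRIB2 to NetCDF for 01h accumulation, while longer accumulations (03h, 06h, etc.) are built by summing the hourly fields. A minimal sketch of that summation logic follows; it is illustrative only (the actual work is done by METplus's PcpCombine tool, and `combine_accum` is a hypothetical helper):

```python
# Illustrative only: METplus's PcpCombine performs this internally.
# Sum consecutive 01h accumulations into longer, non-overlapping accumulations.
def combine_accum(hourly: list[float], accum_hrs: int) -> list[float]:
    """Return accumulations over each non-overlapping accum_hrs window."""
    if len(hourly) % accum_hrs != 0:
        raise ValueError("forecast length not divisible by accumulation period")
    return [sum(hourly[i:i + accum_hrs]) for i in range(0, len(hourly), accum_hrs)]

# Six hourly totals -> two 03h accumulations.
print(combine_accum([1.0, 2.0, 0.0, 5.0, 3.0, 1.0], 3))  # [3.0, 9.0]
```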
@MichaelLueken MichaelLueken added bug Something isn't working enhancement New feature or request labels Nov 16, 2023
gsketefian added a commit to gsketefian/ufs-srweather-app that referenced this pull request Nov 22, 2023
…h (that is used for PR ufs-community#973 into ufs-srweather-app) into the feature/nep_nmep branch.
Collaborator

@JeffBeck-NOAA JeffBeck-NOAA left a comment


Thanks for your work on improving METplus within the SRW App!

Collaborator

@RatkoVasic-NOAA RatkoVasic-NOAA left a comment


Did a couple of tests on Hercules:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16                COMPLETE              29.85
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP                 COMPLETE              13.24
MET_verification_only_vx                                           COMPLETE               0.27
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot     COMPLETE              32.35
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0         COMPLETE              21.12
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE              96.83

Approved.

Collaborator

@mkavulich mkavulich left a comment


Looks like some great simplifying and cleanup changes...love to see a reduction of almost 3000 lines! 👍

I have a few questions, but since they aren't major and mostly aren't specifically related to these changes, I won't hold up this PR.

OBS_VAR1_THRESH = >=20.0
OBS_VAR1_OPTIONS = censor_thresh = lt-20.0; censor_val = -20.0; cnt_thresh = [ >15 ]; cnt_logic = UNION; convert(x) = x * 3280.84 * 0.001;
OBS_VAR1_THRESH = ge20
OBS_VAR1_OPTIONS = censor_thresh = lt-20.0;
Collaborator


Are these copy-pasted from the REFC values? I suppose it does make sense to censor negative values since they don't make physical sense (outside of really edge cases that I'm not sure ever realistically take place), but maybe that should be lt0.0?

Collaborator Author

@gsketefian gsketefian Jan 9, 2024


@mkavulich This is the same as in the original file except each individual option is on a separate line to make the options easier to read and not have one super-long line. The splitting of the options onto separate lines is also in anticipation of the next PR, which jinja-fies things a lot. I didn't change any values here, just kept whatever we've inherited from the time Michelle/Jamie/Evan put these files together.


#FCST_VAR8_NAME = RETOP_L0_ENS_FREQ_ge50
FCST_VAR8_NAME = {{fieldname_in_met_output}}_L0_ENS_FREQ_ge50
Collaborator


We should maybe consider GE0 verification here too...though I can't remember if this variable is above sea level or above ground level

Collaborator Author


@mkavulich What is GE0?


#FCST_VAR3_NAME = RETOP_L0_ENS_FREQ_ge40
FCST_VAR3_NAME = {{fieldname_in_met_output}}_L0_ENS_FREQ_ge40
OBS_VAR2_THRESH = ge30
Collaborator


Is there a reason that all of these different thresholds are considered separate variables here, vs with wind for example where different thresholds are specified for the same variable?

Collaborator Author

@gsketefian gsketefian Jan 9, 2024


@mkavulich I noticed this as well but decided not to explore it in too much depth for the purposes of this PR. So I don't understand things completely yet, but here are a couple of hints:

  1. The wind case(s) you noticed are in the deterministic PointStat files, i.e. PointStat_(ADP)[SFC|UPA].conf, whereas this file is for processing of ensemble probabilistic fields. You can also see the thresholds grouped together for deterministic RETOP processing in GridStat_REFC.conf, where there are lines like this:

    FCST_VAR1_THRESH = {{field_thresholds}}
    

    and {{field_thresholds}} is a list of values set in the ex-script, e.g. set to 'ge20, ge30, ge40, ge50' for RETOP.

  2. Here, a specific forecast array named {{fieldname_in_met_output}}_L0_ENS_FREQ_ge30 is being read in from the NetCDF file generated by GenEnsProd (where {{fieldname_in_met_output}} is equal to RETOP). Note that the string ge30 is hard-coded in this name. If OBS_VAR2_THRESH were set to a list of thresholds instead, then we would somehow need to specify a list of array names for FCST_VAR2_NAME (and some of the other variables like OBS_VAR2_NAME would have to become lists as well). I don't know if METplus can handle that. Maybe it can, but I didn't explore further.

Maybe @michelleharrold has a better answer. If there is a more compact way to express what we want done, I'd use it.
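To illustrate the two styles being compared, here is a hedged sketch in METplus conf syntax (the variable numbering and values are illustrative, not copied from the actual conf files). Deterministic GridStat can attach a whole list of thresholds to one variable, whereas the ensemble-probabilistic files must name one GenEnsProd output array per threshold:

```ini
# Deterministic style (as in GridStat_REFC.conf): one variable, many thresholds.
FCST_VAR1_NAME = RETOP
FCST_VAR1_THRESH = ge20, ge30, ge40, ge50

# Ensemble-probabilistic style: the threshold is baked into each array name
# written by GenEnsProd, so each threshold becomes its own VAR<n> entry.
FCST_VAR1_NAME = RETOP_L0_ENS_FREQ_ge20
FCST_VAR2_NAME = RETOP_L0_ENS_FREQ_ge30
```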

# "ASNOW" may be added to this list in order to include
# the related verification tasks in the workflow.
# accumulated snow (ASNOW) is often not of interest in non-winter cases
# and because observation files for ASNOW are not availabe on NOAA HPSS
Collaborator


Suggested change
# and because observation files for ASNOW are not availabe on NOAA HPSS
# and because observation files for ASNOW are not available on NOAA HPSS

Collaborator Author


Done, thanks!

@gsketefian
Collaborator Author

Looks like some great simplifying and cleanup changes...love to see a reduction of almost 3000 lines! 👍

I have a few questions, but since they aren't major and mostly aren't specifically related to these changes, I won't hold up this PR.

I didn't realize that info was available (easily?). Where can one see the line number change for a PR? There will be a much larger reduction of lines in my next PR :)

@MichaelLueken
Collaborator

@gsketefian -

At the top of the PR, on the far right-hand side, there are green numbers with a plus and red numbers with a minus. The green plus signifies the number of added lines in a PR, while the red minus represents the number of lines removed.

For this PR, I see the following in the top right side:

+1,394 −4,133

so there were 1,394 added lines, and 4,133 removed lines in this PR.

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label Jan 10, 2024
@MichaelLueken
Collaborator

The WE2E coverage tests were manually run on Derecho and all successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_ESGgrid_IndianOcean_6km                                     COMPLETE              23.77
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot     COMPLETE              38.17
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16                COMPLETE              44.85
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR           COMPLETE              29.32
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta    COMPLETE              17.71
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR                COMPLETE              40.76
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_  COMPLETE              24.76
pregen_grid_orog_sfc_climo                                         COMPLETE              15.86
specify_template_filenames                                         COMPLETE              15.10
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             250.30

@gsketefian
Collaborator Author

@gsketefian -

At the top of the PR, on the far right-hand side, there are green numbers with a plus and red numbers with a minus. The green plus signifies the number of added lines in a PR, while the red minus represents the number of lines removed.

For this PR, I see the following in the top right side:

+1,394 −4,133

so there were 1,394 added lines, and 4,133 removed lines in this PR.

Oh right, thanks @MichaelLueken!

@gsketefian
Collaborator Author

@JeffBeck-NOAA @RatkoVasic-NOAA @mkavulich Thanks for the reviews!

@MichaelLueken
Collaborator

@gsketefian - All of the tests passed, with the exception of two tests on Jet:

  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h - Failed in both make_ics and make_lbcs with terminate called after throwing an instance of 'std::bad_alloc' error messages. I will attempt to relaunch these failed jobs manually.
  • get_from_HPSS_ics_RAP_lbcs_RAP - Failed in make_lbcs with srun: error: s3: task 23: Bus error (core dumped). I will attempt to relaunch this failed job manually.

The Jenkins workspace on Jet can be found: /mnt/lfs1/NAGAPE/epic/role.epic/jenkins/workspace/fs-srweather-app_pipeline_PR-973/jet/expt_dirs.

@gsketefian
Collaborator Author

gsketefian commented Jan 10, 2024

@MichaelLueken Thanks for the update Mike. The PR doesn't touch the make_[ics|lbcs] tasks, so hopefully those are just one-time jet-specific issues.

@MichaelLueken
Collaborator

The two tests that had failed on Jet - get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h and get_from_HPSS_ics_RAP_lbcs_RAP - have successfully completed following the use of rocotorewind and rocotoboot:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community                                                          COMPLETE              41.46
custom_ESGgrid                                                     COMPLETE              50.50
custom_ESGgrid_Great_Lakes_snow_8km                                COMPLETE              36.93
custom_GFDLgrid                                                    COMPLETE              32.32
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018         COMPLETE              30.57
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h     COMPLETE              50.94
get_from_HPSS_ics_RAP_lbcs_RAP                                     COMPLETE              19.08
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR                 COMPLETE             243.68
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot     COMPLETE              60.16
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2        COMPLETE              20.82
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta       COMPLETE             531.87
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR       COMPLETE              18.01
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1136.34

@MichaelLueken
Collaborator

@gsketefian - Given that @christinaholtNOAA's PR #994 was approved and tested first, I merged that PR first. Changes were made to the ex-scripts to transition to UW's CLI tool, which introduced conflicts in these scripts in your branch. Please merge the current authoritative develop into your feature/vx_upgrades branch as soon as possible and address the conflicts in the ex-scripts; then I will complete the merge of this PR. Thank you very much!

@MichaelLueken
Collaborator

@gsketefian -

While attempting to run one last batch of verification tests, specifically @mkavulich's new MET_ensemble_verification_winter_wx WE2E verification test, I found that VX_FIELDS in tests/WE2E/test_configs/verification/config.MET_ensemble_verification_winter_wx.yaml needs to be updated to use VX_FIELDS: [ "APCP", "REFC", "RETOP", "ADPSFC", "ADPUPA", "ASNOW" ], rather than VX_FIELDS: [ "APCP", "REFC", "RETOP", "SFC", "UPA", "ASNOW" ]. Once this minor modification is made and my final tests are complete, I will move forward with merging this PR. Thanks!

Collaborator

@willmayfield willmayfield left a comment


I ran the MET_verification_winter_wx case and changed the forecast length out to 36h to check that the 24h snow vx tasks run correctly--they do! And the obs/fcst fields look reasonably similar when I turned on ncpairs. Nice work Gerard! :-)

Collaborator

@MichaelLueken MichaelLueken left a comment


@gsketefian -

Several issues have been identified with the new winter weather verification WE2E test, MET_ensemble_verification_winter_wx, that @mkavulich has introduced as part of PR #997.

  • The VX_FIELDS in tests/WE2E/test_configs/verification/config.MET_ensemble_verification_winter_wx.yaml needs to be updated to use VX_FIELDS: [ "APCP", "REFC", "RETOP", "ADPSFC", "ADPUPA", "ASNOW" ], rather than VX_FIELDS: [ "APCP", "REFC", "RETOP", "SFC", "UPA", "ASNOW" ]
  • The run_MET_GenEnsProd_vx_ASNOW06h task is failing. I see the following in the log/metplus.log.GenEnsProd_ASNOW06h logfile:
WARNING:
WARNING: process_ensemble() -> ensemble field "ASNOW/A06" not found in file "/work/noaa/epic/mlueken/ufs-srweather-app/expt_dirs/MET_ensemble_verification_winter_wx/2022020300/mem001/metprd/PcpCombine_fcst/srw.t00z.prslev.f006.rrfs_conuscompact_25km_ASNOW_a06h.nc"
WARNING:
WARNING:
WARNING: process_ensemble() -> ensemble field "ASNOW/A06" not found in file "/work/noaa/epic/mlueken/ufs-srweather-app/expt_dirs/MET_ensemble_verification_winter_wx/2022020300/mem002/metprd/PcpCombine_fcst/srw.t00z.prslev.f006.rrfs_conuscompact_25km_ASNOW_a06h.nc"
WARNING:
WARNING:
WARNING: process_ensemble() -> ensemble field "ASNOW/A06" not found in file "/work/noaa/epic/mlueken/ufs-srweather-app/expt_dirs/MET_ensemble_verification_winter_wx/2022020300/mem003/metprd/PcpCombine_fcst/srw.t00z.prslev.f006.rrfs_conuscompact_25km_ASNOW_a06h.nc"
WARNING:
WARNING:
WARNING: process_ensemble() -> ensemble field "ASNOW/A06" not found in file "/work/noaa/epic/mlueken/ufs-srweather-app/expt_dirs/MET_ensemble_verification_winter_wx/2022020300/mem004/metprd/PcpCombine_fcst/srw.t00z.prslev.f006.rrfs_conuscompact_25km_ASNOW_a06h.nc"
WARNING:
WARNING:
WARNING: process_ensemble() -> ensemble field "ASNOW/A06" not found in file "/work/noaa/epic/mlueken/ufs-srweather-app/expt_dirs/MET_ensemble_verification_winter_wx/2022020300/mem005/metprd/PcpCombine_fcst/srw.t00z.prslev.f006.rrfs_conuscompact_25km_ASNOW_a06h.nc"
WARNING:
WARNING:
WARNING: process_ensemble() -> ensemble field "ASNOW/A06" not found in file "/work/noaa/epic/mlueken/ufs-srweather-app/expt_dirs/MET_ensemble_verification_winter_wx/2022020300/mem006/metprd/PcpCombine_fcst/srw.t00z.prslev.f006.rrfs_conuscompact_25km_ASNOW_a06h.nc"
WARNING:
WARNING:
WARNING: process_ensemble() -> ensemble field "ASNOW/A06" not found in file "/work/noaa/epic/mlueken/ufs-srweather-app/expt_dirs/MET_ensemble_verification_winter_wx/2022020300/mem007/metprd/PcpCombine_fcst/srw.t00z.prslev.f006.rrfs_conuscompact_25km_ASNOW_a06h.nc"
WARNING:
WARNING:
WARNING: process_ensemble() -> ensemble field "ASNOW/A06" not found in file "/work/noaa/epic/mlueken/ufs-srweather-app/expt_dirs/MET_ensemble_verification_winter_wx/2022020300/mem008/metprd/PcpCombine_fcst/srw.t00z.prslev.f006.rrfs_conuscompact_25km_ASNOW_a06h.nc"
WARNING:
WARNING:
WARNING: process_ensemble() -> ensemble field "ASNOW/A06" not found in file "/work/noaa/epic/mlueken/ufs-srweather-app/expt_dirs/MET_ensemble_verification_winter_wx/2022020300/mem009/metprd/PcpCombine_fcst/srw.t00z.prslev.f006.rrfs_conuscompact_25km_ASNOW_a06h.nc"
WARNING:
WARNING:
WARNING: process_ensemble() -> ensemble field "ASNOW/A06" not found in file "/work/noaa/epic/mlueken/ufs-srweather-app/expt_dirs/MET_ensemble_verification_winter_wx/2022020300/mem010/metprd/PcpCombine_fcst/srw.t00z.prslev.f006.rrfs_conuscompact_25km_ASNOW_a06h.nc"
WARNING:
ERROR  :
ERROR  : process_ensemble() -> 0 of 10 (0) fields found for "ASNOW/A06" does not meet the threshold specified by "ens.ens_thresh" (0.05) in the configuration file.
ERROR  :

Both of these issues will need to be corrected with the newly added MET_ensemble_verification_winter_wx test before I can move forward with merging this PR in.

@gsketefian
Collaborator Author

@MichaelLueken I encountered those problems as well with test MET_ensemble_verification_winter_wx. Several ASNOW tasks were failing, and, besides the change to config.MET_ensemble_verification_winter_wx.yaml that you pointed out, it was for the most part a matter of adding the accumulation to the variable name in the ASNOW METplus conf files, e.g. changing

FCST_VAR1_NAME = {{fieldname_in_met_output}}

to

FCST_VAR1_NAME = {{fieldname_in_met_output}}_{{accum_hh}}

I made this change in GenEnsProd_ASNOW.conf, EnsembleStat_ASNOW.conf, GridStat_ensmean_ASNOW.conf, and GridStat_ensprob_ASNOW.conf.

However, I also found a stealthy bug in GridStat_ensprob_ASNOW.conf that changes results (and which @willmayfield will probably be interested in). The issue was an inadvertent shift in the threshold values used in the forecast field array names with respect to the threshold values specified for the observations. For example, for VAR2, the buggy code is

FCST_VAR2_NAME = {{fieldname_in_met_output}}_{{accum_hh}}_A{{accum_no_pad}}_ENS_FREQ_gt0.0
...
OBS_VAR2_THRESH = ge0.508

What it should be is:

FCST_VAR2_NAME = {{fieldname_in_met_output}}_{{accum_hh}}_A{{accum_no_pad}}_ENS_FREQ_ge0.508
...
OBS_VAR2_THRESH = ge0.508

The thresholds for the obs and forecasts were therefore not matching. Although the run_MET_GridStat_vx_ensprob_ASNOW06h task succeeds in the develop branch, I think the results are incorrect. I believe I've fixed the issue. @willmayfield, if you're interested in taking a look at the results of this test (after I push my latest changes), please let me know and we can wait for you to take a look before merging.
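One way to avoid this kind of fcst/obs threshold drift is to generate both settings from a single threshold list so they cannot fall out of sync. This is a hypothetical sketch, not code from this PR (`ensprob_pairs` is a name invented here); the upcoming jinja-fication may address this differently:

```python
# Hypothetical sketch: derive matched FCST_VAR<n>_NAME / OBS_VAR<n>_THRESH
# pairs from a single threshold list, so the two can never be shifted
# relative to each other.
def ensprob_pairs(field: str, accum_hh: str, accum_no_pad: str,
                  thresholds: list[str]) -> list[str]:
    lines = []
    for n, thresh in enumerate(thresholds, start=1):
        lines.append(f"FCST_VAR{n}_NAME = "
                     f"{field}_{accum_hh}_A{accum_no_pad}_ENS_FREQ_{thresh}")
        lines.append(f"OBS_VAR{n}_THRESH = {thresh}")
    return lines

for line in ensprob_pairs("ASNOW", "06", "6", ["gt0.0", "ge0.508"]):
    print(line)
```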

I'm rerunning the test now to make sure it works from scratch and will then push my fixes.
Thanks,
Gerard

@gsketefian
Collaborator Author

@MichaelLueken @willmayfield I reran the MET_ensemble_verification_winter_wx with my newest version, and it was successful. I've also done regression tests on this test as well as MET_ensemble_verification_only_vx and custom_ESGgrid_Great_Lakes_snow_8km. All have only expected differences in the vx output.

Please feel free to retest and merge. Thanks.

Collaborator

@MichaelLueken MichaelLueken left a comment


@gsketefian - Thank you very much for addressing the failures in the new test and finding a new bug in the configuration files! I was able to successfully run the MET_ensemble_verification_winter_wx WE2E test:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
MET_ensemble_verification_winter_wx_20240112093529                 COMPLETE             158.23
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             158.23

Reapproving PR and retesting following the latest changes.

@willmayfield - Please let me know if you are okay with these changes at your earliest convenience. Thanks!

@MichaelLueken
Collaborator

@gsketefian - Here is the current update on the retesting for this PR:

The WE2E coverage tests on Gaea have completed successfully:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community_20240112103959                                           COMPLETE              23.22
custom_ESGgrid_NewZealand_3km_20240112104004                       COMPLETE              64.46
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              34.92
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240112104  COMPLETE              31.97
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024011210  COMPLETE              33.87
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson  COMPLETE             357.80
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024011  COMPLETE              33.36
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_20  COMPLETE             363.78
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot_202  COMPLETE              10.55
nco_ensemble_20240112104015                                        COMPLETE              78.47
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom  COMPLETE             351.98
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1384.38

The WE2E coverage tests on Gaea C5 have completed successfully:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community_20240112104016                                           COMPLETE              43.13
custom_ESGgrid_NewZealand_3km_20240112104024                       COMPLETE              48.67
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              27.85
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240112104  COMPLETE              30.65
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024011210  COMPLETE              31.93
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson  COMPLETE             313.32
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024011  COMPLETE              30.43
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_20  COMPLETE             272.79
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot_202  COMPLETE              16.73
nco_ensemble_20240112104043                                        COMPLETE              96.57
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom  COMPLETE             304.58
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1216.65

The WE2E coverage tests on Hera GNU have completed successfully:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Central_Asia_3km_20240112155348                     COMPLETE              36.65
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200_202401  COMPLETE              12.85
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_20240112155352              COMPLETE              20.08
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024011215  COMPLETE              45.85
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202  COMPLETE              30.48
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240112155  COMPLETE              20.99
long_fcst_20240112155402                                           COMPLETE              95.20
MET_verification_only_vx_20240112155405                            COMPLETE               0.25
MET_ensemble_verification_only_vx_time_lag_20240112155410          COMPLETE               8.98
nco_grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_202  COMPLETE              63.53
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             334.86

The WE2E coverage tests on Hera Intel have completed successfully:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Peru_12km_20240112155349                            COMPLETE              18.60
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200_2024011  COMPLETE               6.77
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2  COMPLETE             789.24
get_from_HPSS_ics_HRRR_lbcs_RAP_20240112155354                     COMPLETE              14.18
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               6.55
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              13.08
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP_20240112155405  COMPLETE              10.46
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2_20240  COMPLETE               7.13
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_202401  COMPLETE             240.04
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_20240112  COMPLETE             343.84
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR_202401121  COMPLETE             332.25
pregen_grid_orog_sfc_climo_20240112155414                          COMPLETE               8.33
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1790.47

The WE2E coverage tests on Hercules have completed successfully:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE_202  COMPLETE               7.23
grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_202  COMPLETE              10.36
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202  COMPLETE              27.77
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              16.63
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024011209  COMPLETE              25.20
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240112091  COMPLETE              52.97
grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_  COMPLETE              13.31
grid_RRFS_NA_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240112091331  COMPLETE              68.37
grid_SUBCONUS_Ind_3km_ics_NAM_lbcs_NAM_suite_GFS_v16_202401120913  COMPLETE              29.07
MET_verification_only_vx_20240112091333                            COMPLETE               0.23
specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS_20240112091334               COMPLETE               7.74
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             258.88

The tests are still running on both Jet and Orion.

@MichaelLueken

The WE2E coverage tests have successfully passed on Jet:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
community_20240112203333                                           COMPLETE              19.12
custom_ESGgrid_20240112203338                                      COMPLETE              27.94
custom_ESGgrid_Great_Lakes_snow_8km_20240112203339                 COMPLETE              18.86
custom_GFDLgrid_20240112203344                                     COMPLETE              19.11
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202401  COMPLETE              11.38
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20  COMPLETE              52.60
get_from_HPSS_ics_RAP_lbcs_RAP_20240112203349                      COMPLETE              17.85
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240112203350  COMPLETE             247.62
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              50.02
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              16.22
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024  COMPLETE             521.74
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR_2024  COMPLETE              11.71
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1014.17

Still awaiting completion on Orion.

@willmayfield commented Jan 16, 2024

@MichaelLueken @gsketefian I tried it again and everything worked fine! I'm good with the changes.

I was worried that something was wrong with these results, but I now know the problem was the model/physics producing unrealistic results on this test case, not something caused by this PR.

@MichaelLueken

The WE2E coverage tests have successfully passed on Orion:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_SF_1p1km_20240113115145                             COMPLETE             170.58
deactivate_tasks_20240113115150                                    COMPLETE               1.35
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me  COMPLETE             918.85
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_  COMPLETE             262.32
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20240  COMPLETE             141.35
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta_202401131  COMPLETE              16.29
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240113115  COMPLETE             409.75
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_  COMPLETE              30.79
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_2  COMPLETE             280.11
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0_202401  COMPLETE              15.15
nco_20240113115203                                                 COMPLETE               7.87
2020_CAD_20240113115205                                            COMPLETE              35.60
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            2290.01

Given @willmayfield's continued approval after retesting these changes, I will now move forward with merging this PR.

@MichaelLueken MichaelLueken merged commit bbe78ca into ufs-community:develop Jan 16, 2024
2 of 4 checks passed
@gsketefian (Author)

@willmayfield @MichaelLueken Thanks for working on this!

Labels: bug (Something isn't working), enhancement (New feature or request), run_we2e_coverage_tests (Run the coverage set of SRW end-to-end tests)