Determine list of WE2E tests for `fundamental` test set #277

gsketefian · 2022-05-25T19:26:01Z

Description

The purpose of this issue is to propose and discuss a set of WE2E tests to run for each PR into the ufs-srweather-app and/or regional_workflow repositories. These are referred to as "fundamental" tests in the SRW App's Code Management Document.

Solution

A spreadsheet has been created that contains the complete set of WE2E tests (as of 20220525) with various parameters (e.g. grid on which the forecast runs, forecast length, time step, etc.) listed for each. There are a total of 80 unique tests, and the rows highlighted in blue (of which there are 57) are the ones proposed to be included in the fundamental test set.

Note that there are 4 categories of WE2E tests:

grids_extrn_mdls_suites_community (contains 41 tests)
grids_extrn_mdls_suites_nco (contains 3 tests)
release_SRW_v1 (contains 1 test)
wflow_features (contains 37 tests; two are duplicates (i.e. symlinks to other tests), so really only 35 unique tests)

These categories correspond to subdirectories under the ufs-srweather-app/regional_workflow/tests/WE2E/test_configs directory.
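Because the duplicates are symlinks, the per-category counts can be checked directly on disk. The following is a minimal sketch, assuming each test is a file (or symlink) sitting directly inside its category subdirectory; the function name `count_tests` is hypothetical and is not part of the workflow scripts.

```python
import os

def count_tests(test_configs_dir):
    """Return {category: (total_entries, unique_entries)} per subdirectory.

    Symlinks count toward the total but not toward the unique tests.
    """
    counts = {}
    for category in sorted(os.listdir(test_configs_dir)):
        cat_dir = os.path.join(test_configs_dir, category)
        if not os.path.isdir(cat_dir):
            continue  # skip stray files next to the category directories
        entries = [os.path.join(cat_dir, name) for name in os.listdir(cat_dir)]
        total = len(entries)
        unique = sum(1 for path in entries if not os.path.islink(path))
        counts[category] = (total, unique)
    return counts
```

For the layout described above, the wflow_features entry would be expected to come back as (37, 35).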

As a first cut, the following procedure is used to determine which tests from the four categories to include in the set of fundamental tests.

From the category grids_extrn_mdls_suites_community, include the tests that use the 25km grids (RRFS_CONUS_25km, RRFS_CONUScompact_25km, and CONUS_25km_GFDLgrid) as well as those that use the (very small in extent) 3km grid over Indianapolis (SUBCONUS_Ind_3km). This means excluding tests that use the 13km and 3km grids over the CONUS, Alaska, and North America as well as the mid-sized 3km sub-CONUS grid (RRFS_SUBCONUS_3km). This results in a total of 20 tests from the grids_extrn_mdls_suites_community category to include in the fundamental set of tests.

Similarly, from the category grids_extrn_mdls_suites_nco, include only tests that run on the 25km grid. The only test in this category (out of a total of 3) that runs on such a grid is nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR. Thus, there is only one test from this category to include in the fundamental set of tests.

From the release_SRW_v1 category, include the test GST_release_public_v1 (which is the only test in this category) since it is the only long-term test in the complete set of WE2E tests (runs out to 48 hours). (We might want to rename this category or just move this test into one of the others.)

Finally, from the wflow_features category, include all tests except template_vars and nco_inline_post since these are just symlinks to (i.e. alternate names for) the deactivate_tasks test in wflow_features and the nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR test in grids_extrn_mdls_suites_nco, respectively, both of which are already included in the fundamental set of tests. This results in a total of 35 tests from this category to include in the fundamental test set. (Note that for platforms that do not have access to NOMADS, the test get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio should not be run.)

With the above proposed approach, the total number of WE2E tests to be included in the fundamental tests is 20+1+35+1=57. The combined relative cost of these fundamental tests is about 113 units, where 1 unit of cost corresponds to running a 6-hour forecast on the RRFS_CONUS_25km grid with a time step of 40 sec. For comparison, the relative cost of running the full set of WE2E tests (of which there are 80) is 1622. (The relative cost of each WE2E test is listed in the spreadsheet.)
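The selection procedure above can be written as a single predicate on the category and test name. This is only a sketch of the rules as described in this issue: it assumes that test names in grids_extrn_mdls_suites_community start with `grid_<GRID_NAME>_`, and the names `FUNDAMENTAL_GRIDS`, `SYMLINK_TESTS`, and `is_fundamental` are hypothetical.

```python
# Grids whose tests are kept from grids_extrn_mdls_suites_community.
FUNDAMENTAL_GRIDS = (
    "RRFS_CONUS_25km",
    "RRFS_CONUScompact_25km",
    "CONUS_25km_GFDLgrid",
    "SUBCONUS_Ind_3km",
)

# Symlinked duplicates that are dropped from wflow_features.
SYMLINK_TESTS = {"template_vars", "nco_inline_post"}

def is_fundamental(category, test_name):
    """Apply the first-cut selection rules to one test."""
    if category == "grids_extrn_mdls_suites_community":
        # Assumes the naming pattern grid_<GRID_NAME>_ics_...
        return any(test_name.startswith(f"grid_{g}_") for g in FUNDAMENTAL_GRIDS)
    if category == "grids_extrn_mdls_suites_nco":
        return "_25km_" in test_name
    if category == "release_SRW_v1":
        return True
    if category == "wflow_features":
        return test_name not in SYMLINK_TESTS
    return False

# Tally quoted above: community + nco + wflow_features + release.
assert 20 + 1 + 35 + 1 == 57
```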

Notes on "cost":
To calculate the relative cost of a test, first calculate the absolute cost as follows:

  abs_cost = nx*ny*num_time_steps*num_fcsts

where nx and ny are the number of points in the horizontal (x and y) directions, num_time_steps is the number of time steps, and num_fcsts is the number of forecasts the test runs (this will be greater than 1 for tests that include multiple starting dates and/or run ensembles). Note that this cost calculation does not differentiate between different physics suites. Relative cost is then calculated using

  rel_cost = abs_cost/abs_cost_ref

where abs_cost_ref is the cost of running a single (num_fcsts = 1) 6-hour forecast on the RRFS_CONUS_25km grid (with nx = 219 and ny = 131) with a time step of 40 sec (so that num_time_steps = 6*3600/40 = 540), i.e.

  abs_cost_ref = 219*131*540*1 = 15,492,060
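As a cross-check, the cost formulas can be evaluated in a few lines. This is a sketch, not the script introduced in regional_workflow; the function signatures are assumptions, and the integer division assumes the time step divides the forecast length evenly.

```python
def abs_cost(nx, ny, fcst_len_hrs, dt_sec, num_fcsts=1):
    """Absolute cost: nx * ny * num_time_steps * num_fcsts."""
    # Assumes dt_sec divides the forecast length (in seconds) evenly.
    num_time_steps = fcst_len_hrs * 3600 // dt_sec
    return nx * ny * num_time_steps * num_fcsts

# Reference: one 6-hour forecast on RRFS_CONUS_25km with a 40 sec time step.
ABS_COST_REF = abs_cost(nx=219, ny=131, fcst_len_hrs=6, dt_sec=40)

def rel_cost(nx, ny, fcst_len_hrs, dt_sec, num_fcsts=1):
    """Cost relative to the reference forecast."""
    return abs_cost(nx, ny, fcst_len_hrs, dt_sec, num_fcsts) / ABS_COST_REF

print(ABS_COST_REF)                             # 15492060
print(rel_cost(219, 131, 6, 40, num_fcsts=2))   # 2.0
```

The second call shows that a 2-member ensemble on the reference configuration costs 2 units, since the cost scales linearly with num_fcsts.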

Related To

PR #776 in regional_workflow introduced the calculation of relative cost for the WE2E tests. PR #278 provides the necessary update to the documentation.


mkavulich · 2022-07-06T15:36:22Z

@gsketefian Thanks for your effort to put together a first draft of this list. For ease of conversation I have replicated the list you described above in its entirety here:

grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_HRRR
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta
grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_HRRR
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
GST_release_public_v1
community_ensemble_008mems
community_ensemble_2mems
custom_ESGgrid
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_TRUE
custom_GFDLgrid
deactivate_tasks
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019101818
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2020022518
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2020022600
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2021010100
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019101818
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2020022518
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2020022600
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021010100
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2021062000
get_from_HPSS_ics_GSMGFS_lbcs_GSMGFS
get_from_HPSS_ics_HRRR_lbcs_RAP
get_from_HPSS_ics_RAP_lbcs_RAP
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio
inline_post
MET_ensemble_verification
MET_verification
nco_ensemble
pregen_grid_orog_sfc_climo
specify_DOT_OR_USCORE
specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE
specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS
specify_RESTART_INTERVAL
specify_template_filenames
subhourly_post_ensemble_2mems
subhourly_post

I would like to propose a slightly reduced list of tests, in two parts, the first being one that has (in my opinion) no drawbacks, with the second being potentially controversial.

Can definitely remove

In my opinion, these four tests can be removed without reducing any coverage of grids, suites, or basic functionality in the test suite:

grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
community_ensemble_008mems

This would result in significant core hour savings, reducing the cost from 113 to 76 "units" as described above; a 33% savings!

Should be removed, but open to debate

In my opinion, we should remove the tests that correspond to physics suites and features that are no longer supported/deprecated. These can still be tested in the comprehensive suite, but we shouldn't waste time and resources testing these for every single change to the code:

grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_TRUE
custom_GFDLgrid
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019101818
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2020022518
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2020022600
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021010100
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio
get_from_HPSS_ics_GSMGFS_lbcs_GSMGFS

This would result in another ~16 "unit" savings; in total, almost 50% savings from the original proposed list.
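The percentages quoted in this comment follow from the unit counts. A quick check, taking the "~16" figure at face value (it is approximate, so the second result is too); the helper name `savings_pct` is just for illustration:

```python
def savings_pct(before, after):
    """Percentage reduction in cost going from `before` to `after`."""
    return 100.0 * (before - after) / before

original = 113                     # cost of the originally proposed list
after_first = 76                   # after the "can definitely remove" cuts
after_second = after_first - 16    # "~16" units is an approximate figure

print(round(savings_pct(original, after_first)))    # 33
print(round(savings_pct(original, after_second)))   # 47
```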

Other thoughts

Once this list is finalized and adopted, I would recommend we put a file tests.fundamental (or similar) in the regional_workflow/tests/WE2E directory, for ease of use.

gsketefian · 2022-07-06T16:28:12Z

@mkavulich All that sounds good to me. Note that ~6 new tests have been added to the set since I created the list, so I need to update the list and I/we have to figure out which of the new ones need to go into the fundamental set. Also, when running the comprehensive set, the script that does it needs to update that set on the fly since any given PR might be changing the set (add/delete tests, rename/move tests, update the tests.fundamental file, etc).

mkavulich · 2022-07-06T20:41:57Z

Okay, so after some offline discussion, we have a proposed path forward:

How the "fundamental test suite" can be run

The test config files for those tests considered "fundamental" will be modified to include a "FUNDAMENTAL_TEST=TRUE" variable (or similar). The run_WE2E_tests.sh script will be modified to look for this variable, and users can run the script with the "--fundamental=true" flag (or similar) to automatically run the proper set of tests. Because some tests only work on certain platforms (e.g. those tests that retrieve data from NOAA HPSS), the config file variable can have the appropriate platform-specific logic to avoid running those tests that will always fail on a given platform.
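To illustrate the flag lookup, here is a rough sketch of how a script might decide whether a config file is marked as fundamental. It assumes the variable ends up being spelled `FUNDAMENTAL_TEST` and is set with a plain shell-style assignment; neither the final name nor this helper (`is_marked_fundamental`) exists in the workflow at the time of writing, and platform-specific logic is not handled here.

```python
import re

# Matches an uncommented assignment such as FUNDAMENTAL_TEST="TRUE".
_FLAG_RE = re.compile(r"""^[ \t]*FUNDAMENTAL_TEST=["']?(\w+)""", re.MULTILINE)

def is_marked_fundamental(config_text):
    """Return True if the config text sets FUNDAMENTAL_TEST to TRUE."""
    match = _FLAG_RE.search(config_text)
    return match is not None and match.group(1).upper() == "TRUE"

example = 'FCST_LEN_HRS="6"\nFUNDAMENTAL_TEST="TRUE"\n'
print(is_marked_fundamental(example))   # True
```

A commented-out assignment (a line starting with `#`) is not matched, so disabling the flag by commenting it out behaves as expected.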

List of fundamental tests

grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_HRRR
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta
grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_HRRR
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
GST_release_public_v1 (hera only)
community_ensemble_2mems
custom_ESGgrid
deactivate_tasks
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200 (hera, jet only)
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019101818 (hera, jet only)
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2020022518 (hera, jet only)
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2020022600 (hera, jet only)
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2021010100 (hera, jet only)
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2021062000 (hera, jet only)
get_from_HPSS_ics_HRRR_lbcs_RAP (hera, jet only)
get_from_HPSS_ics_RAP_lbcs_RAP (hera, jet only)
inline_post
MET_ensemble_verification
MET_verification
nco_ensemble
pregen_grid_orog_sfc_climo
specify_DOT_OR_USCORE
specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE
specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS (hera only)
specify_RESTART_INTERVAL
specify_template_filenames

Several tests from the above list have been removed because they are known failures, or for other reasons that I specified in a previous comment.

gsketefian added the enhancement label May 25, 2022

JeffBeck-NOAA mentioned this issue Jun 1, 2022

Update documentation for SRW APP ufs-community/UFS_UTILS#656

Merged

mkavulich self-assigned this Jul 6, 2022

mkavulich mentioned this issue Jul 7, 2022

Replace deprecated NCAR python environment with conda on Cheyenne #326

Merged

mkavulich mentioned this issue Jul 22, 2022

Remove redundant hdf5 module that causes problems with Cheyenne build #332

Merged

mkavulich mentioned this issue Sep 20, 2022

Review and update WE2E testing suite ufs-community/regional_workflow#484

Closed

mkavulich closed this as completed Feb 9, 2023

mkavulich mentioned this issue Feb 9, 2023

Overhaul and consolidate WE2E tests, identify needed additional tests #587

Closed
