Determine list of WE2E tests for fundamental test set #277

Closed
gsketefian opened this issue May 25, 2022 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@gsketefian
Collaborator

gsketefian commented May 25, 2022

Description

The purpose of this issue is to propose and discuss a set of WE2E tests to run for each PR into the ufs-srweather-app and/or regional_workflow repositories. These are referred to as "fundamental" tests in the SRW App's Code Management Document.

Solution

A spreadsheet has been created that contains the complete set of WE2E tests (as of 20220525), with various parameters (e.g. the grid on which the forecast runs, forecast length, time step, etc.) listed for each. There are 80 tests in total, and the 57 rows highlighted in blue are the ones proposed for inclusion in the fundamental test set.

Note that there are 4 categories of WE2E tests:

  • grids_extrn_mdls_suites_community (contains 41 tests)
  • grids_extrn_mdls_suites_nco (contains 3 tests)
  • release_SRW_v1 (contains 1 test)
  • wflow_features (contains 37 tests; two are duplicates (i.e. symlinks to other tests), so really only 35 unique tests)

These categories correspond to subdirectories under the ufs-srweather-app/regional_workflow/tests/WE2E/test_configs directory.

As a first cut, the following procedure is used to determine which tests from the four categories to include in the set of fundamental tests.

From the category grids_extrn_mdls_suites_community, include the tests that use the 25km grids (RRFS_CONUS_25km, RRFS_CONUScompact_25km, and CONUS_25km_GFDLgrid) as well as those that use the (very small in extent) 3km grid over Indianapolis (SUBCONUS_Ind_3km). This means excluding tests that use the 13km and 3km grids over the CONUS, Alaska, and North America as well as the mid-sized 3km sub-CONUS grid (RRFS_SUBCONUS_3km). This results in a total of 20 tests from the grids_extrn_mdls_suites_community category to include in the fundamental set of tests.
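
The grid-based selection described above could be sketched roughly as follows. This is only an illustration: the helper name `select_by_grid` is hypothetical, the excluded test names in the example are made up, and the sketch assumes that every test name embeds its grid as `grid_<grid name>_ics_...`, as the names in this issue appear to do.

```python
# Grids proposed for the fundamental set (names taken from the paragraph above).
FUNDAMENTAL_GRIDS = (
    "RRFS_CONUS_25km",
    "RRFS_CONUScompact_25km",
    "CONUS_25km_GFDLgrid",
    "SUBCONUS_Ind_3km",
)


def select_by_grid(test_names, grids=FUNDAMENTAL_GRIDS):
    """Return the tests whose name starts with 'grid_<grid>_ics_' for an allowed grid.

    Matching on the full prefix (including '_ics_') keeps a grid such as
    RRFS_SUBCONUS_3km from being mistaken for SUBCONUS_Ind_3km.
    """
    prefixes = tuple(f"grid_{grid}_ics_" for grid in grids)
    return [name for name in test_names if name.startswith(prefixes)]


candidates = [
    "grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16",
    "grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16",  # hypothetical name
    "grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_HRRR",
    "grid_RRFS_SUBCONUS_3km_ics_HRRR_lbcs_RAP_suite_HRRR",  # hypothetical name
]
selected = select_by_grid(candidates)
print(selected)
```

Only the first and third candidates survive the filter; the 13km and mid-sized 3km sub-CONUS entries are dropped.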

Similarly, from the category grids_extrn_mdls_suites_nco, include only tests that run on the 25km grid. The only test in this category (out of a total of 3) that runs on such a grid is nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR. Thus, there is only one test from this category to include in the fundamental set of tests.

From the release_SRW_v1 category, include the test GST_release_public_v1 (which is the only test in this category) since it is the only long-term test in the complete set of WE2E tests (runs out to 48 hours). (We might want to rename this category or just move this test into one of the others.)

Finally, from the wflow_features category, include all tests except template_vars and nco_inline_post, since these are just symlinks to (i.e. alternate names for) the deactivate_tasks test in wflow_features and the nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR test in grids_extrn_mdls_suites_nco, respectively, both of which are already included in the fundamental set of tests. This results in a total of 35 tests from this category to include in the fundamental test set. (Note that for platforms that do not have access to NOMADS, the test get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio should not be run.)
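
Since the duplicates are symlinks, they could in principle be detected automatically rather than listed by hand. The following is a minimal sketch under that assumption; the function name `split_real_and_aliases` is hypothetical, and the example uses bare test names as file names, which is a simplification of however the config files are actually named.

```python
import os
import tempfile


def split_real_and_aliases(config_dir):
    """Separate directory entries into real files and symlinked aliases.

    Returns (real, aliases), where aliases maps each symlink name to the
    base name of the entry it points to.
    """
    real, aliases = [], {}
    for entry in sorted(os.listdir(config_dir)):
        path = os.path.join(config_dir, entry)
        if os.path.islink(path):
            aliases[entry] = os.path.basename(os.readlink(path))
        else:
            real.append(entry)
    return real, aliases


# Build a throwaway directory that mimics the template_vars alias.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "deactivate_tasks"), "w").close()
    os.symlink("deactivate_tasks", os.path.join(tmp, "template_vars"))
    real, aliases = split_real_and_aliases(tmp)

print(real)
print(aliases)
```

Counting only the `real` entries of each category directory would give the number of unique tests.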

With the above proposed approach, the total number of WE2E tests to be included in the fundamental tests is 20+1+1+35=57 (listing the four categories in the order discussed above). The combined relative cost of these fundamental tests is about 113 units, where 1 unit of cost corresponds to running a 6-hour forecast on the RRFS_CONUS_25km grid with a time step of 40 sec. For comparison, the relative cost of running the full set of WE2E tests (of which there are 80) is 1622 units. (The relative cost of each WE2E test is listed in the spreadsheet.)

Notes on "cost":
To calculate the relative cost of a test, first calculate the absolute cost as follows:

  abs_cost = nx*ny*num_time_steps*num_fcsts

where nx and ny are the number of points in the horizontal (x and y) directions, num_time_steps is the number of time steps, and num_fcsts is the number of forecasts the test runs (this will be greater than 1 for tests that include multiple starting dates and/or run ensembles). Note that this cost calculation does not differentiate between different physics suites. Relative cost is then calculated using

  rel_cost = abs_cost/abs_cost_ref

where abs_cost_ref is the cost of running a single (num_fcsts = 1) 6-hour forecast on the RRFS_CONUS_25km grid (with nx = 219 and ny = 131) with a time step of 40 sec (so that num_time_steps = 6*3600/40 = 540), i.e.

  abs_cost_ref = 219*131*540*1 = 15,492,060
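
The two formulas above translate directly into a few lines of Python. This is a sketch rather than the code from the PR mentioned below: the function and argument names are chosen for illustration, and it assumes the time step divides the forecast length evenly.

```python
def abs_cost(nx, ny, dt_sec, fcst_len_hrs, num_fcsts=1):
    """Absolute cost: nx * ny * num_time_steps * num_fcsts."""
    # Assumes dt_sec divides the forecast length (in seconds) evenly.
    num_time_steps = fcst_len_hrs * 3600 // dt_sec
    return nx * ny * num_time_steps * num_fcsts


# Reference: one 6-hour forecast on RRFS_CONUS_25km (219 x 131) with dt = 40 sec.
ABS_COST_REF = abs_cost(nx=219, ny=131, dt_sec=40, fcst_len_hrs=6)


def rel_cost(nx, ny, dt_sec, fcst_len_hrs, num_fcsts=1):
    """Relative cost: absolute cost divided by the reference cost."""
    return abs_cost(nx, ny, dt_sec, fcst_len_hrs, num_fcsts) / ABS_COST_REF


print(ABS_COST_REF)                              # 15492060
print(rel_cost(219, 131, 40, 6))                 # 1.0
print(rel_cost(219, 131, 40, 6, num_fcsts=2))    # 2.0
```

The last line uses made-up settings (two forecasts on the reference grid) simply to show that the cost scales linearly with num_fcsts.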

Related To

PR #776 in regional_workflow introduced the calculation of relative cost for the WE2E tests. PR #278 provides the necessary update to the documentation.

@mkavulich
Collaborator

@gsketefian Thanks for your effort to put together a first draft of this list. For ease of conversation I have replicated the list you described above in its entirety here:

  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional
  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp
  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR
  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta
  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
  • grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp
  • grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
  • grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16
  • grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_HRRR
  • grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta
  • grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
  • grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
  • grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
  • grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
  • grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
  • grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
  • grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
  • grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
  • grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_HRRR
  • grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
  • nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
  • GST_release_public_v1
  • community_ensemble_008mems
  • community_ensemble_2mems
  • custom_ESGgrid
  • custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE
  • custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_TRUE
  • custom_GFDLgrid
  • deactivate_tasks
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019101818
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2020022518
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2020022600
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2021010100
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019101818
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2020022518
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2020022600
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021010100
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2021062000
  • get_from_HPSS_ics_GSMGFS_lbcs_GSMGFS
  • get_from_HPSS_ics_HRRR_lbcs_RAP
  • get_from_HPSS_ics_RAP_lbcs_RAP
  • get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio
  • inline_post
  • MET_ensemble_verification
  • MET_verification
  • nco_ensemble
  • pregen_grid_orog_sfc_climo
  • specify_DOT_OR_USCORE
  • specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE
  • specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS
  • specify_RESTART_INTERVAL
  • specify_template_filenames
  • subhourly_post_ensemble_2mems
  • subhourly_post

I would like to propose a slightly reduced list of tests, in two parts, the first being one that has (in my opinion) no drawbacks, with the second being potentially controversial.

Can definitely remove

In my opinion, these four tests can be removed without reducing any coverage of grids, suites, or basic functionality in the test suite:

  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp
  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR
  • grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
  • community_ensemble_008mems

This would result in significant core hour savings, reducing the cost from 113 to 76 "units" as described above; a 33% savings!

Should be removed, but open to debate

In my opinion, we should remove the tests that correspond to physics suites and features that are no longer supported/deprecated. These can still be tested in the comprehensive suite, but we shouldn't waste time and resources testing these for every single change to the code:

  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional
  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
  • grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp
  • grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
  • grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16
  • grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
  • custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE
  • custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_TRUE
  • custom_GFDLgrid
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019101818
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2020022518
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2020022600
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021010100
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio
  • get_from_HPSS_ics_GSMGFS_lbcs_GSMGFS

This would result in another ~16 "unit" savings; in total, almost 50% savings from the original proposed list.
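
For reference, the savings quoted in this comment can be checked with a short calculation. The figures (113, 76, and roughly 16 units) are the ones given above; the second result is approximate because the 16-unit figure is itself approximate.

```python
original_cost = 113          # cost of the originally proposed fundamental set
after_first_cut = 76         # after removing the "can definitely remove" tests
after_second_cut = after_first_cut - 16  # after the further ~16 unit reduction


def pct_savings(before, after):
    """Percentage reduction in cost going from `before` to `after`."""
    return 100.0 * (before - after) / before


first = round(pct_savings(original_cost, after_first_cut), 1)
total = round(pct_savings(original_cost, after_second_cut), 1)
print(first)   # 32.7
print(total)   # 46.9
```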

Other thoughts

Once this list is finalized and adopted, I would recommend we put a file tests.fundamental (or similar) in the regional_workflow/tests/WE2E directory, for ease of use.
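
As a rough idea of how such a file might be consumed, here is a small sketch. The file format is an assumption made for illustration only (one test name per line, blank lines and `#` comments ignored, repeated names dropped); nothing in this issue specifies it, and `read_test_list` is a hypothetical helper.

```python
def read_test_list(text):
    """Parse a test list: one name per line, '#' starts a comment."""
    tests = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        # Skip blank/comment-only lines and keep the first occurrence of each name.
        if name and name not in tests:
            tests.append(name)
    return tests


example = """\
# fundamental WE2E tests (hypothetical file contents)
custom_ESGgrid
deactivate_tasks

inline_post  # trailing comments are ignored
deactivate_tasks
"""
fundamental = read_test_list(example)
print(fundamental)  # ['custom_ESGgrid', 'deactivate_tasks', 'inline_post']
```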

@gsketefian
Collaborator Author

@mkavulich All that sounds good to me. Note that ~6 new tests have been added to the set since I created the list, so I need to update the list and I/we have to figure out which of the new ones need to go into the fundamental set. Also, when running the comprehensive set, the script that does it needs to update that set on the fly since any given PR might be changing the set (add/delete tests, rename/move tests, update the tests.fundamental file, etc).

@mkavulich
Collaborator

Okay, so after some offline discussion, we have a proposed path forward:

How the "fundamental test suite" can be run

The test config files for those tests considered "fundamental" will be modified to include a "FUNDAMENTAL_TEST=TRUE" variable (or similar). The run_WE2E_tests.sh script will be modified to look for this variable, and users can run the script with the "--fundamental=true" flag (or similar) to automatically run the proper set of tests. Because some tests only work on certain platforms (e.g. those tests that retrieve data from NOAA HPSS), the config file variable can have the appropriate platform-specific logic to avoid running those tests that will always fail on a given platform.
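
To make the idea concrete, the check for such a variable might look something like the sketch below. It is written in Python purely for illustration (the actual script is a shell script), the helper `is_fundamental` is hypothetical, and the variable name follows the "FUNDAMENTAL_TEST" placeholder above, which may well change. Note that a static text scan like this cannot evaluate platform-specific shell logic inside a config file; that part would need the config to be sourced by the shell.

```python
import re

# Matches a line such as  FUNDAMENTAL_TEST="TRUE"  or  FUNDAMENTAL_TEST=FALSE,
# optionally followed by a trailing comment.
_FLAG_RE = re.compile(
    r'^\s*FUNDAMENTAL_TEST\s*=\s*"?(\w+)"?\s*(?:#.*)?$',
    re.MULTILINE,
)


def is_fundamental(config_text):
    """Return True if the config text sets FUNDAMENTAL_TEST to TRUE."""
    match = _FLAG_RE.search(config_text)
    return match is not None and match.group(1).upper() == "TRUE"


flagged = 'PREDEF_GRID_NAME="RRFS_CONUS_25km"\nFUNDAMENTAL_TEST="TRUE"\n'
unflagged = 'PREDEF_GRID_NAME="RRFS_CONUS_25km"\n'
print(is_fundamental(flagged))    # True
print(is_fundamental(unflagged))  # False
```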

List of fundamental tests

  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta
  • grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
  • grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_HRRR
  • grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta
  • grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
  • grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
  • grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
  • grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
  • grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
  • grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_HRRR
  • grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
  • nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
  • GST_release_public_v1 (hera only)
  • community_ensemble_2mems
  • custom_ESGgrid
  • deactivate_tasks
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200 (hera, jet only)
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019101818 (hera, jet only)
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2020022518 (hera, jet only)
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2020022600 (hera, jet only)
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2021010100 (hera, jet only)
  • get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2021062000 (hera, jet only)
  • get_from_HPSS_ics_HRRR_lbcs_RAP (hera, jet only)
  • get_from_HPSS_ics_RAP_lbcs_RAP (hera, jet only)
  • inline_post
  • MET_ensemble_verification
  • MET_verification
  • nco_ensemble
  • pregen_grid_orog_sfc_climo
  • specify_DOT_OR_USCORE
  • specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE
  • specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS (hera only)
  • specify_RESTART_INTERVAL
  • specify_template_filenames

Several tests from the above list have been removed because they are known failures, or for other reasons that I specified in a previous comment.
