-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[develop] Consolidate verification tasks using retrieve_data.py #864
[develop] Consolidate verification tasks using retrieve_data.py #864
Conversation
9238f29
to
7a26db0
Compare
7a26db0
to
e41abf2
Compare
@MichaelLueken This branch has been rebased with the latest changes from develop, so should be in its final state pending reviewer comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall these changes look good to me!
I do have some concerns about adding the 2020_CAD
test to comprehensive.cheyenne
and the new custom_ESGgrid_Great_Lakes_snow_8km
to comprehensive
.
The 2020_CAD
test only has data staged on Cheyenne. All other machines are able to pull data from AWS, but that wasn't possible on Cheyenne. Since staged data isn't available on all machines, there is no USE_USER_STAGED_EXTRN_FILES
in the test's config file. In order to add this test to comprehensive.cheyenne
, the staged data on Cheyenne would need to be propagated to the rest of the machines and USE_USER_STAGED_EXTRN_FILES: true
would need to be added. It should be fine to add this test to comprehensive
, but should be removed from comprehensive.cheyenne
(unless the data is added to the rest of the Tier-1 platforms).
The new custom_ESGgrid_Great_Lakes_snow_8km
test requires access to HPSS to pull the necessary verification data. While this test has been successfully run on both Hera and Jet, Gaea doesn't have the necessary hpss
module that can be loaded to pull the data from HPSS. Would adding this test to both the coverage.jet
and coverage.hera.gnu.com
(or coverage.hera.intel.nco
) suites and removing it from comprehensive
be the better path forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my concern regarding the 2020_CAD
test in the comprehensive.cheyenne
file! Approving these changes now.
It seems that some of the tests are failing on Hera. The following tests succeeded: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP While the following tests failed: custom_ESGgrid_Great_Lakes_snow_8km On the develop branch all tests succeeded except for custom_ESGgrid_Great_Lakes_snow_8km. Since this last one is a new config file, I simply copied it and tried to use it on the develop branch. In all the cases where the run did not succeed, the code stopped while executing the following tasks: get_obs_ccpa, get_obs_nohrsc, get_obs_mrms, get_obs_ndas. |
@VanderleiVargas-NOAA can you point me to your tests on Hera? I have run those tests on Hera (with Intel) and they passed on my end. |
the outputs are at the SRW app that I used is at |
…sion may have bugs, I am committing so I can test on another platform since my Jet "fairshare" is apparently off-the-charts low
…e tarball contains a full path, we need to lstrip the leading "/" before globbing
All tasks now working for simple test case! Next up is to try all WE2E vx tests
…ctory, we need to create it prior to calling the script
obs for tests MET_verification, MET_verification_only_vx, and MET_ensemble_verification (modified to pull rather than use staged data) - Remove old j-jobs and exscripts - Cleanup and update mrms_pull_topofhour.py to be more pythonic - Move to obs dir prior to pulling data to avoid potential conflicts from temporarily extracted files from HPSS - Move many comments from config_defaults.yaml to more appropriate location in new data pulling/staging script; add more details about data quirks
…verification tests now pass when retrieving from HPSS!
- Remove usage of "*_vrfy" functions due to issues mentioned in Issue 861 (ufs-community#861) - Remove extra "mkdir" statement where it shouldn't be for MRMS - Use correct variable (vhh_noZero) for checking valid hour for NDAS
- Remove stanzas eliminating the "task_get_obs*" tasks; these should now be run every time (if using staged data, they will check that the data exists) - Modify tasks RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP (MET_verification) and MET_ensemble_verification_only_vx_time_lag to retrieve obs data from HPSS - Fix MET_ensemble_verification_only_vx_time_lag test by correcting variable NET to NET_default (addresses issue 851) - Add new verification task custom_ESGgrid_Eastern_NA_snow_8km that will have winter weather verification tasks (for subsequent PR)
- Don't retrieve obs data from HPSS for grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP; too many platforms use this test - Rename custom_ESGgrid_Eastern_NA_snow_8km to custom_ESGgrid_Great_Lakes_snow_8km; add to comprehensive and Jet coverage test - Add MET_ensemble_verification_only_vx to comprehensive for Cheyenne and Orion, as well as to coverage suite via Hera/GNU - Move MET_verification_only_vx coverage from Hera/GNU to Cheyenne/GNU - Add new mysterious "2020_CAD" test to all comprehensive lists, not just Orion (what's up with that?), sort alphabetically
… didn't account for templating behavior
- Remove stand-alone j-jobs and exscripts for NOHRSC - Add NOHRSC_obs stanza to parm/data_locations.yml - Update documentation and comments to include NOHRSC obs - Add "ASNOW" (accumulated snow) verification to custom_ESGgrid_Great_Lakes_snow_8km -
…m comprehensive.cheyenne
…E testing purposes, similar to TEST_ALT_EXTRN_MDL_SYSBASEDIR_ICS
23dc10f
to
6b97050
Compare
@MichaelLueken The conflicts have been resolved and I commented out the two tests we discussed from the coverage suite files. Let me know if you need anything else from my end. Edit: to ensure the conflict resolution didn't break anything major I ran the verification and fundamental tests on Hera/Intel |
The Cheyenne Intel WE2E coverage tests were run on Hera and all successfully passed:
The Cheyenne GNU WE2E coverage tests were run on Hera and all successfully passed:
Manual testing of MET_ensemble_verification_only_vx_time_lag on Hera GNU:
The Gaea WE2E coverage tests were manually run and all successfully passed:
The Jet WE2E coverage tests were manually run and all successfully passed:
Manual testing of custom_ESGgrid_Great_Lakes_snow_8km on Jet:
The Hera and Orion Jenkins tests successfully passed. Now moving forward with merging this work. |
…sts, remove unsupported domains (#871) Script improvements * run_WE2E_tests.py * Adds ability to run a group of all tests in a subdirectory of test_configs * Replace --use_cron_to_relaunch with a --launch argument that can take the values "python", "cron", or "none" (the last of which will create the experiments but not run them) * monitor_jobs.py * Adds --mode flag, which if specified as "advance" will only run rocotorun once for each experiment, then quit. * generate_FV3LAM_wflow.py and set_FV3nml_sfc_climo_filenames.py * Add --debug flag that will give more verbose output. No longer read "VERBOSE" variable from config.yaml for this script. * Make most prints "debug only" * get_crontab_contents.py * Overhaul script to remove global variables, add arguments as needed * Fixes bug where submitting jobs from cron won't work if your crontab is empty * Change functionality so that the script will remove the specified line from crontab if "--remove" flag is provided, otherwise it prints the crontab contents New tests * Several new custom domain tests are added in a new custom_grids directory. The new custom domains were chosen to span a variety of locations, terrain types, and dates. Aside from custom_ESGgrid_Great_Lakes_snow_8km, these are basic tests that don't use many non-default settings, and so are good candidates for testing new SRW capabilities in the future. See the "Documentation" section for more details. * A new "long forecast" test (108 hours) starting at 2023060112 that retrieves FV3GFS grib2 input data from AWS (and so can not be run on Cheyenne). Additional changes * Removes unsupported domains: * EMC_AK * EMC_HI * EMC_PR * EMC_GU * GSL_HAFSV0.A_25km * GSL_HAFSV0.A_13km * GSL_HAFSV0.A_3km * GSD_HRRR_AK_50km * For HPSS tests, ensure they all are explicitly set to look for HPSS data only * Test custom_ESGgrid_Great_Lakes_snow_8km (introduced in [develop] Consolidate verification tasks using retrieve_data.py #864) moved from verification to custom_tests, with a symbolic link (test_configs/verification/config.MET_verification_winter_wx.yaml) left behind. Also the ICs/LBCs are now from RAP output, retrieved from HPSS (since the observation data requires HPSS access anyway this is fine) * Make new comprehensive test file for Gaea (symlink to Orion's file) since that machine also does not have HPSS access * Various comment fixes
DESCRIPTION OF CHANGES:
As described in Issue #850, the scripts for retrieving observation data for verification are quite convoluted and hard-to-maintain due to the constant changing of naming conventions on HPSS. We can leverage the functionality of the
retrieve_data.py
script to greatly simplify and consolidate this code, and that's what this PR achieves.In addition, many other bug fixes and improvements are included:
mrms_pull_topofhour.py
; now has amain()
function for importing, uses proper command-line arguments with argparse, uses f-strings to simplify print statements, and raises an exception if no observation file is found instead of failing silentlyretrieve_data.py
where filenames in tarballs containing full paths are not properly extracted--data_type
argument inretrieve_data.py
: the data type is customizable via the config file (which is also customizable), so we shouldn't hard-code arbitrary choices. Missing data types are gracefully handled later anyway.custom_ESGgrid_Great_Lakes_snow_8km
that specifies a custom domain for a snowstorm in February 2023, re-using data that was staged for the AQM test. This is part of an effort to expand testing (finishing Issue Overhaul and consolidate WE2E tests, identify needed additional tests #587) that will be completed in a subsequent PR, including explicit tests for winter precipitation verification.Fixes bug described in Issue New test experiment MET_ensemble_verification_only_vx_time_lag can not be created due to invalid variable "NET" #851.TestMET_ensemble_verification_only_vx_time_lag
now works on most platforms, not just hard-coded to HeraType of change
TESTS CONDUCTED:
Ran all verification tests (including new one) on Jet and Hera successfully. Also ran all tests by retrieving data from HPSS, rather than using staged data; all passed. Ran Fundamental tests on Jet as well.
Note: After rebasing latest changes, ran fundamental and all verification tests on Hera; all passed.
DEPENDENCIES:
[develop] Add verification of snowfall accumulation #853 should be merged prior to this PR; those changes are included in this PRDOCUMENTATION:
ISSUE:
MET_verification_only_vx
,MET_ensemble_verification_only_vx
, andMET_ensemble_verification_only_vx_time_lag
fail on most platforms despite data being staged #863Resolves New test experiment MET_ensemble_verification_only_vx_time_lag can not be created due to invalid variable "NET" #851Fixed by [develop] Bug fix for Issue #851 (renamingNET
toNET_default
) #868CHECKLIST