PCP Combine tasks fail in NCO mode #688

Closed
mkavulich opened this issue Mar 21, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@mkavulich
Collaborator

Expected behavior

PCP Combine tasks should succeed.

Current behavior

Running a PCP Combine task results in failure when running in NCO mode. The problem appears to be due to the exregional_run_met_pcpcombine.sh script not looking for forecast files in the correct location:

========================================================================
Entering script:  "exregional_run_met_pcpcombine.sh"
In directory:     "/scratch2/BMC/fv3lam/kavulich/UFS/workdir/test_develop/ufs-srweather-app/scripts"

This is the ex-script for the task that runs the METplus PcpCombine
that combines hourly accumulated precipitation (APCP) data to generate
files containing multi-hour accumulated precipitation (e.g. 3-hour, 6-
hour, 24-hour).  The input files can come from either observations or
a forecast.
========================================================================
Initial (i.e. before filtering for missing files) set of forecast hours
is:
  fhr_array = ( "3" "6" )

The file (fp) for the current forecast hour (fhr; relative to the cycle
date cdate) is missing:
  fhr = "1"
  cdate = "2019061500"
  fp = "/scratch2/BMC/fv3lam/kavulich/UFS/workdir/test_develop/expt_dirs/fundamental/MET_ensemble_verification/2019061500/mem001/postprd/rrfs.t00z.prslev.f001.rrfs_conus_25km.grib2"
Excluding the current forecast hour from the list of hours passed to the
METplus configuration file.

The file (fp) for the current forecast hour (fhr; relative to the cycle
date cdate) is missing:
  fhr = "4"
  cdate = "2019061500"
  fp = "/scratch2/BMC/fv3lam/kavulich/UFS/workdir/test_develop/expt_dirs/fundamental/MET_ensemble_verification/2019061500/mem001/postprd/rrfs.t00z.prslev.f004.rrfs_conus_25km.grib2"
Excluding the current forecast hour from the list of hours passed to the
METplus configuration file.

Final (i.e. after filtering for missing files) set of foreast hours is
(written as a single string):
  fhr_list = ""

FATAL ERROR:
ERROR:
  From script:  "exregional_run_met_pcpcombine.sh"
  Full path to script:  "/scratch2/BMC/fv3lam/kavulich/UFS/workdir/test_develop/ufs-srweather-app/scripts/exregional_run_met_pcpcombine.sh"
The list of forecast hours for which to run METplus is empty:
  FHR_LIST = []
Exiting with nonzero status.
End exregional_run_met_pcpcombine.sh at Tue Mar 21 17:17:15 UTC 2023 with error code 1 (time elapsed: 00:00:03)
FATAL ERROR:
ERROR:
  From script:  "JREGIONAL_RUN_MET_PCPCOMBINE"
  Full path to script:  "/scratch2/BMC/fv3lam/kavulich/UFS/workdir/test_develop/ufs-srweather-app/jobs/JREGIONAL_RUN_MET_PCPCOMBINE"
Call to ex-script corresponding to J-job "JREGIONAL_RUN_MET_PCPCOMBINE" failed.
Exiting with nonzero status.
End JREGIONAL_RUN_MET_PCPCOMBINE at Tue Mar 21 17:17:15 UTC 2023 with error code 1 (time elapsed: 00:00:03)
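
The log above suggests the following pattern: the script builds a candidate path for each forecast hour, drops hours whose file is missing, and aborts when nothing is left. The Python sketch below illustrates that pattern only; it is not the actual ex-script logic (which is a shell script), and the function names and the path template are hypothetical, chosen for illustration. If the template points at a directory layout that does not match the run environment (as appears to happen here in NCO mode), every hour is excluded and the final check fails.

```python
from pathlib import Path
from typing import Iterable, List


def filter_forecast_hours(fhrs: Iterable[int], template: str) -> List[int]:
    """Return only the forecast hours whose input file exists.

    `template` is a hypothetical path pattern containing `{fhr}`,
    e.g. ".../postprd/rrfs.t00z.prslev.f{fhr:03d}.rrfs_conus_25km.grib2".
    """
    kept = []
    for fhr in fhrs:
        fp = Path(template.format(fhr=fhr))
        if fp.is_file():
            kept.append(fhr)
        else:
            # Mirrors the "Excluding the current forecast hour" message.
            print(f"Missing file for fhr={fhr}: {fp}; excluding this hour.")
    return kept


def require_nonempty(fhr_list: List[int]) -> List[int]:
    """Abort when filtering has removed every forecast hour."""
    if not fhr_list:
        raise RuntimeError(
            "The list of forecast hours for which to run METplus is empty"
        )
    return fhr_list
```

With a template that resolves to the wrong directory, `filter_forecast_hours` returns an empty list and `require_nonempty` raises, which corresponds to the `FHR_LIST = []` failure reported above.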

Machines affected

Tested on Hera, but the failure should appear on all platforms.

Steps To Reproduce

Running the MET_verification WE2E test with --run_envir=nco is an easy way to replicate this failure:

./run_WE2E_tests.py -t MET_verification -m hera --account=fv3lam --run_envir=nco -q
Checking that all tests are valid
Will run 1 tests:
/scratch2/BMC/fv3lam/kavulich/UFS/workdir/test_develop/ufs-srweather-app/tests/WE2E/test_configs/verification/config.MET_verification.yaml
Calling workflow generation function for test MET_verification

Workflow for test MET_verification successfully generated in
/scratch2/BMC/fv3lam/kavulich/UFS/workdir/test_develop/expt_dirs/MET_verification

calling function that monitors jobs, prints summary
Writing information for all experiments to WE2E_tests_20230321172852.yaml
Checking tests available for monitoring...
Starting experiment MET_verification running
Updating database for experiment MET_verification
Setup complete; monitoring 1 experiments
Use ctrl-c to pause job submission/monitoring
03/21/23 17:34:41 UTC :: FV3LAM_wflow.xml :: Cycle 201906150000, Task run_MET_PcpCombine_fcst_APCP03h, jobid=43122423, in state DEAD (FAILED), ran for 11.0 seconds, exit status=256, try=2 (of 2)
03/21/23 17:34:41 UTC :: FV3LAM_wflow.xml :: Cycle 201906150000, Task run_MET_PcpCombine_fcst_APCP06h, jobid=43122424, in state DEAD (FAILED), ran for 10.0 seconds, exit status=256, try=2 (of 2)
Experiment MET_verification is DEAD;will no longer monitor.
All 1 experiments finished in 0:07:42.588701
Calculating core-hour usage and printing final summary
----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
MET_verification                                                   DEAD                  12.76
----------------------------------------------------------------------------------------------------
Total                                                              DEAD                  12.76

Detailed summary written to WE2E_summary_20230321173634.txt

All experiments are complete
Summary of results available in WE2E_tests_2023032117

@JeffBeck-NOAA
Collaborator

@gsketefian, are you aware of this bug? I'm guessing this should be an easy fix, similar to the other "vx in NCO mode" changes that were necessary in the XML.

@mkavulich
Collaborator Author

@JeffBeck-NOAA I just noticed that #683 includes a fix for this bug, so I assume he is aware :)

@JeffBeck-NOAA
Collaborator

@mkavulich, great! Thanks for catching that.

@gsketefian
Collaborator

Well, I know that vx tasks don't work in NCO mode in the current develop branch, probably for multiple reasons. #683 should fix all of them (I successfully tested all vx tasks in NCO mode). But before I can merge it, I have to update it with the latest develop, most importantly to make sure it works with the new rocoto XML generation system. I'm not sure exactly when I will get to it, though; maybe late this week.

@MichaelLueken
Collaborator

@mkavulich @gsketefian Now that PR #683 has been merged into develop, this test passes and the PCP Combine tasks no longer fail. Closing this issue.
