[develop] Bugfix find task function in setup.py #745

Merged


danielabdi-noaa
Collaborator

@danielabdi-noaa danielabdi-noaa commented Apr 26, 2023

DESCRIPTION OF CHANGES:

There seems to be a bug in dict_find, and I am not sure how it has worked until now.

  • When a task is found but sits at a deeper (nested) level of the dictionary, it returns False
  • In some cases it returns None instead of False
  • The logic for task_make_lbcs is wrong
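
The first two symptoms are typical of a recursive dictionary search that either returns the result of the first nested dictionary it visits, or falls off the end of the function without an explicit return. The following is a minimal sketch of a search that avoids both problems. It is not the code merged in this PR: the name `find_task`, the sample `tasks` tree, and its key names are hypothetical, and it assumes every key is a string.

```python
def find_task(tree, substring):
    """Return True if any key, at any depth of `tree`, contains `substring`.

    Hypothetical helper for illustration; assumes all keys are strings.
    """
    if not isinstance(tree, dict):
        return False
    for key, value in tree.items():
        if substring in key:
            return True
        # Recurse into nested dictionaries, but stop early only on a hit.
        # Returning the nested result unconditionally would skip the
        # remaining sibling keys and report False too soon.
        if isinstance(value, dict) and find_task(value, substring):
            return True
    # Explicit False so callers never receive None.
    return False


# Hypothetical task tree with tasks nested one level down.
tasks = {
    "task_make_grid": {},
    "metatask_run_ensemble": {
        "task_make_ics_mem#mem#": {},
        "task_make_lbcs_mem#mem#": {},
    },
}

# Each flag is looked up under its own task name. Looking up
# "task_make_ics" for the lbcs flag is the kind of mix-up the third
# bullet describes.
run_make_ics = find_task(tasks, "task_make_ics")
run_make_lbcs = find_task(tasks, "task_make_lbcs")
```

Returning a strict boolean in every branch keeps checks such as `if not run_make_lbcs:` unambiguous, whatever the depth at which the task is defined.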

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

DOCUMENTATION:

ISSUE:

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

Collaborator

@MichaelLueken MichaelLueken left a comment


@danielabdi-noaa These changes look good to me! Thanks for correcting run_make_lbcs by renaming task_make_ics to task_make_lbcs.

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests label (Run the coverage set of SRW end-to-end tests) Apr 28, 2023
@MichaelLueken
Collaborator

@danielabdi-noaa As of this time, there is the expected failure of the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR test on Hera Intel (due to issue #688).

I'm also noting a failure on Cheyenne GNU, in the nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson_mynn_lam3km test. This test is failing in post-processing. Unfortunately, there is no error message in the post log files. The log prints:

Starting post-processing for fhr = 000 hr...

and then stops. If you would like to look at the output, please see:

/glade/scratch/epicufsrt/jenkins/workspace/fs-srweather-app_pipeline_PR-745/expt_dirs/nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson_mynn_lam3km

I'll try manually running this test and see if it passes or fails. I'll let you know what I find.

@MichaelLueken
Collaborator

@danielabdi-noaa Following up, manual reruns of the nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson_mynn_lam3km test are failing as well. I am seeing the following in the log files for the post in these runs:

Launching J-job (jjob_fp) for task "run_post" ...
  jjob_fp = "/glade/scratch/mlueken/ufs-srweather-app/jobs/JREGIONAL_RUN_POST"

/glade/scratch/mlueken/ufs-srweather-app/ush/job_preamble.sh: line 22: [: -eq: unary operator expected

"ln_vrfy" operation returned with a message.  This command was
issued from the script in file:

  ""

Message from "ln_vrfy" function's "ln" operation:
  ln: target '/glade/scratch/mlueken/ufs-srweather-app/expt_dirs/nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson_mynn_lam3km/log/make_lbcs_mem000_2019061500.log' is not a directory

I'm also seeing:

Starting post-processing for fhr = 002 hr...
/glade/scratch/mlueken/ufs-srweather-app/ush/bash_utils/print_msg.sh: line 192: BASH_SOURCE[1]: unbound variable
FATAL ERROR:
ERROR:
  From script:  ""
  Full path to script:  ""
Call to executable to run post for forecast hour 002 returned with non-
zero exit code.
Exiting with nonzero status.
FATAL ERROR:
ERROR:
  From script:  "JREGIONAL_RUN_POST"
  Full path to script:  "/glade/scratch/mlueken/ufs-srweather-app/jobs/JREGIONAL_RUN_POST"
Call to ex-script corresponding to J-job "JREGIONAL_RUN_POST" failed.
Exiting with nonzero status.

I'll try running the test separately on another platform to see if it is an issue only on Cheyenne or if it is also present on other machines.

I'm now trying to determine whether this is a problem with the test itself or whether it is related to issue #652. I'll keep looking for any other issues with this test.

@MichaelLueken
Collaborator

The nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson_mynn_lam3km test completed successfully on Hera. I'll retry the test on Cheyenne Monday morning, but the fact that it passes on other machines suggests the failure is the one documented in issue #652.

Collaborator

@christinaholtNOAA christinaholtNOAA left a comment


LGTM.

@MichaelLueken
Collaborator

The rerun of the nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson_mynn_lam3km test on Cheyenne this morning once again failed for some of the run_post tasks. This is likely due to issue #652, so I will move forward with merging this work now.

@MichaelLueken MichaelLueken merged commit 55337a7 into ufs-community:develop May 1, 2023
willmayfield pushed a commit to willmayfield/ufs-srweather-app that referenced this pull request May 1, 2023
## DESCRIPTION OF CHANGES: 
* Modifications to `run_WE2E_tests.sh`:
  * Add examples to help/usage statement
* Modifications to `check_expts_status.sh`:
  * Add arguments list that can be processed by `process_args`
  * Add new optional arguments:  `num_log_lines`, `verbose`
  * Include a help/usage message

## TESTS CONDUCTED:
* Ran `run_WE2E_tests.sh --help` from the command line and got the expected help message.
* Ran `check_expts_status.sh --help` from the command line and got the expected help message.
* Used `run_WE2E_tests.sh` to run a set of 2 WE2E tests -- works as expected.
* Used `check_expts_status` to check on the status of the 2 tests run above and got the expected status message.
 
## DEPENDENCIES:
PR #[241](ufs-community#241)

## DOCUMENTATION:
A lot of this PR is documentation in the scripts.  There is an accompanying documentation PR #[241](ufs-community#241) into ufs-srweather-app.
Labels
bug: Something isn't working
run_we2e_coverage_tests: Run the coverage set of SRW end-to-end tests
4 participants