Update wrapper scripts to work on all supported systems #878

Closed
gspetro-NOAA opened this issue Aug 8, 2023 · 3 comments · Fixed by #883
Labels
bug Something isn't working

Comments

@gspetro-NOAA
Collaborator

gspetro-NOAA commented Aug 8, 2023

Currently, the wrapper scripts no longer run as-is on Hera (and possibly other RDHPCS systems).

Expected behavior

Running the workflow with the standalone wrapper scripts, following the documentation's instructions for them, should let users run the experiment to completion on any supported system.

Current behavior

When I run the workflow by following the documentation's instructions for standalone scripts, the initial tasks all fail with error messages on Hera.

Machines affected

Running the make_grid wrapper script on Hera led to a variety of errors. I have not yet tested on other systems.

Steps To Reproduce

After generating the workflow and running ./run_make_grid.sh, the following errors came up:

+ /scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID: line 108: /source_util_funcs.sh: No such file or directory
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID: line 109: source_config_for_task: command not found
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID: line 110: /job_preamble.sh: No such file or directory
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID: line 129: -f: command not found
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID: line 145: print_info_msg: command not found
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID: line 155: check_for_preexist_dir_file: command not found
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID: line 156: mkdir_vrfy: command not found
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID: line 165: mkdir_vrfy: command not found
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID: line 174: /exregional_make_grid.sh: No such file or directory
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID: line 176: print_err_msg_exit: command not found
touch: cannot touch ‘/make_grid_task_complete.txt’: Read-only file system
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID: line 208: job_postamble: command not found

After updating all wrapper scripts to add the 'a' flag (so that they use set -xa) and rerunning ./run_make_grid.sh, the following error appears:

========================================================================
Entering script:  "JREGIONAL_MAKE_GRID"
In directory:     "/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs"

This is the J-job script for the task that generates grid files.
========================================================================

Specified directory or file (dir_or_file) already exists:
  dir_or_file = "/scratch2/NAGAPE/epic/Gillian.Petro/expt_dirs/standalone/grid"
Moving (renaming) preexisting directory or file to:
  old_dir_or_file = "/scratch2/NAGAPE/epic/Gillian.Petro/expt_dirs/standalone/grid_old001"

========================================================================
Entering script:  "exregional_make_grid.sh"
In directory:     "/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/scripts"

This is the ex-script for the task that generates grid files.
========================================================================
/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/scripts/exregional_make_grid.sh: line 63: ulimit: stack size: cannot modify limit: Operation not permitted
End exregional_make_grid.sh at Mon Aug  7 22:13:29 UTC 2023 with error code 1 (time elapsed: 00:00:00)
FATAL ERROR: 
ERROR:
  From script:  "JREGIONAL_MAKE_GRID"
  Full path to script:  "/scratch2/NAGAPE/epic/Gillian.Petro/ufs-srweather-app/jobs/JREGIONAL_MAKE_GRID"
Call to ex-script corresponding to J-job "JREGIONAL_MAKE_GRID" failed.
Exiting with nonzero status.
End JREGIONAL_MAKE_GRID at Mon Aug  7 22:13:29 UTC 2023 with error code 1 (time elapsed: 00:00:02)
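
For context on the set -xa change: the first batch of errors shows paths such as /source_util_funcs.sh, which suggests that the directory variable in front of them was empty inside the J-job. A likely explanation is that the wrapper defined the experiment variables without exporting them, so the child process never saw them. The a flag (allexport) marks every variable assigned afterwards for export. Below is a minimal sketch of that behavior; DEMO_USHdir and the temporary file are hypothetical stand-ins, not names taken from the app:

```shell
#!/usr/bin/env bash
# DEMO_USHdir and the temporary file are hypothetical, for illustration only.
vars_file="$(mktemp)"
echo 'DEMO_USHdir=/path/to/ush' > "$vars_file"

# Prints the value that a child process sees for DEMO_USHdir.
child_view() {
  bash -c 'echo "${DEMO_USHdir:-<empty>}"'
}

# Case 1: plain sourcing. The variable exists in this shell but is not exported.
without_a="$( . "$vars_file"; child_view )"

# Case 2: set -a (allexport) before sourcing. The assignment is exported.
with_a="$( set -a; . "$vars_file"; child_view )"

rm -f "$vars_file"
echo "child sees without set -a: ${without_a}"
echo "child sees with set -a:    ${with_a}"
```

In the first case the child reports an empty value, which matches the shape of the "/source_util_funcs.sh: No such file or directory" errors above.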

Detailed Description of Fix (optional)

Likely, we will need to update the wrapper scripts and/or their documentation to implement the changes from #847 that allow Jenkins tests to run the wrappers successfully.

gspetro-NOAA added the bug label Aug 8, 2023
@MichaelLueken
Collaborator

@gspetro-NOAA -

To correct the ulimit: stack size: cannot modify limit: Operation not permitted failure, the ush/machine/hera.yaml file needs to be modified. The line:

PRE_TASK_CMDS: '{ ulimit -s unlimited; ulimit -a; }'

needs to be changed to:

PRE_TASK_CMDS: '{ ulimit -S -s unlimited; ulimit -a; }'

This is the trick that @natalie-perlin and @BruceKropp-Raytheon found to work on Hera.
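
A possible reason the -S form behaves differently: without -S or -H, bash's ulimit builtin sets both the soft and the hard limit, and an unprivileged user cannot raise a hard limit. With -S, only the soft limit is changed. The sketch below is a hypothetical probe, not part of the app, that tries both forms in subshells so the calling shell's limits stay unchanged. The results depend on the system and on the user's privileges:

```shell
#!/usr/bin/env bash
# Hypothetical probe: report the stack limits, then try both ulimit forms.
# Each attempt runs in a subshell, so this shell's own limits are untouched.
echo "soft stack limit: $(ulimit -S -s)"
echo "hard stack limit: $(ulimit -H -s)"

# Form 1: no -S/-H, so bash tries to set the soft AND the hard limit.
if ( ulimit -s unlimited ) 2>/dev/null; then
  both_result="ok"
else
  both_result="failed"
fi

# Form 2: -S restricts the change to the soft limit.
if ( ulimit -S -s unlimited ) 2>/dev/null; then
  soft_result="ok"
else
  soft_result="failed"
fi

echo "ulimit -s unlimited:    ${both_result}"
echo "ulimit -S -s unlimited: ${soft_result}"
```

If the first form reports "failed" and the second reports "ok", the system behaves like Hera in this respect.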

@gspetro-NOAA
Collaborator Author

gspetro-NOAA commented Aug 8, 2023

@MichaelLueken Is this something that will work on all systems? If so, could I just tell users in the docs to make that adjustment in their config file when running without Rocoto?

@MichaelLueken
Collaborator

@gspetro-NOAA - Unfortunately, I can't say whether this will work on all systems or not. This is only being used to allow the wrapper scripts to run on Hera. Additional testing will be required to see if this will work on other systems.

MichaelLueken linked a pull request on Aug 24, 2023 that will close this issue