Abstract common j-job tasks #1230
Takes all of the tasks that are common to all j-jobs and abstracts them out into a shared script that is sourced by each job:
- Set and create $DATA
- Call setpdy and set $cycle
- Set pid, pgmout, and pgmerr
- Source config files
- Source machine environment file

The common j-job header script is called by passing a list of config files to source:
```
${HOMEgfs}/ush/jjob_header.sh [config1 [config2 [...]]]
```
Some pre j-job rocoto entry scripts (`jobs/rocoto/*`) are currently doing much more than they should be. These sometimes required extra finagling, usually pre-calling the jjob header in the rocoto script before it does something.
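As a rough illustration of the config-sourcing step listed above, the loop might look something like the sketch below. The function name `source_configs`, the explicit directory argument, and the `config.<name>` file naming are assumptions made for this example, not the actual contents of `jjob_header.sh`.

```shell
# Hypothetical sketch: source_configs is an illustrative name, and the
# "config.<name>" layout is an assumption, not taken from jjob_header.sh.
source_configs() {
  local config_dir="$1"
  shift
  local config config_file
  for config in "$@"; do
    config_file="${config_dir}/config.${config}"
    # Fail loudly if a requested config file is missing
    if [ ! -f "${config_file}" ]; then
      echo "FATAL: config file ${config_file} not found" >&2
      return 1
    fi
    # Source in the current shell so the settings persist for the job
    . "${config_file}" || return 1
  done
}
```

A caller would pass the directory followed by the config names, for example `source_configs "${EXPDIR}" base fcst`, where `EXPDIR` is likewise assumed here.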
The ocean analysis jobs appear to need a persistent $DATA directory, so the j-jobs have had their previous settings for $DATA restored. Additionally, the j-job header now wipes any existing $DATA directory if the variable $WIPE_DATA is set to "YES", which is the default. To allow the persistent $DATA for the ocean analysis jobs, the RUN and POST j-jobs set $WIPE_DATA to "NO".
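A minimal sketch of the wipe-by-default behavior described above, assuming a helper named `prepare_data_dir` (a hypothetical name; the real header may organize this differently):

```shell
# Hypothetical sketch: prepare_data_dir is an illustrative name only.
prepare_data_dir() {
  local data_dir="$1"
  # Refuse to operate on an empty path
  if [ -z "${data_dir}" ]; then
    echo "FATAL: no data directory given" >&2
    return 1
  fi
  # WIPE_DATA defaults to "YES"; jobs needing a persistent directory set "NO"
  if [ "${WIPE_DATA:-YES}" = "YES" ]; then
    rm -rf "${data_dir}"
  fi
  mkdir -p "${data_dir}"
}
```

With this shape, a job that must keep its working directory only needs to export `WIPE_DATA="NO"` before the header runs.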
The gempak j-job was always adding the CPU ranks to the CFP command file. These are only needed (or allowed) on slurm, which prevented the job from running properly on WCOSS2. Following the paradigm of existing jobs, now the ranks are only added if `$CFP_MP` is "YES".
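The conditional rank prefix could be sketched as follows. The helper name `add_cfp_command` and its argument order are made up for illustration, and the default of "NO" for `$CFP_MP` is an assumption; only the `$CFP_MP` check itself comes from the description above.

```shell
# Hypothetical sketch: add_cfp_command and its arguments are illustrative.
add_cfp_command() {
  local cmdfile="$1"
  local rank="$2"
  shift 2
  if [ "${CFP_MP:-NO}" = "YES" ]; then
    # Prefix the command with its rank only when CFP_MP is "YES"
    printf '%s\n' "${rank} $*" >> "${cmdfile}"
  else
    printf '%s\n' "$*" >> "${cmdfile}"
  fi
}
```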
Force-pushed from 92cb4e7 to 0d45879.
Fix one minor thing to shut the linter up even though the line was not touched in this PR.
First wave of testing passes on all machines. Pulled in develop changes and running a last test on Orion where I'm going to compare output against develop.
These were meant to be part of 0d45879
Output confirmed identical to develop. Had to revert PR #1158 locally on both to get the analysis to run successfully, at least with the chosen date.
In spirit, this is a neat feature.
However, there are quite a few cases where standardization likely hurts the transparency of the jobs.
The j-job header is altered to require the specification of the job name to pass to the environment script rather than using the inherent job name, since the ECFlow names are different and some j-jobs call the environment script with different names. J-job header arguments are now passed in with options, with `-e` setting the job name to provide the environment script (required) and `-c` passing the list of config files (the former positional arguments). A few jobs not yet in the dev workflow were not previously calling the environment script. For those, I made up new job names as placeholders (the environment scripts accept unrecognized names). Also added error messages for fatal errors and expanded documentation a bit more.
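One way the option handling described above could be written is with `getopts`, as in the sketch below. The function name `parse_header_args`, the variable names `env_job` and `configs`, and the choice to pass the config list to `-c` as a single space-separated string are all assumptions for this example rather than details confirmed by the PR.

```shell
# Hypothetical sketch: function and variable names are illustrative, and
# passing the config list as one quoted string is an assumption.
parse_header_args() {
  OPTIND=1      # reset so the function can be called more than once
  env_job=""
  configs=""
  local option
  while getopts "c:e:" option; do
    case "${option}" in
      c) configs="${OPTARG}" ;;   # e.g. "base fcst"
      e) env_job="${OPTARG}" ;;   # job name handed to the environment script
      *) echo "FATAL: unrecognized option" >&2
         return 1 ;;
    esac
  done
  # The job name is mandatory, so fail with a non-zero status without it
  if [ -z "${env_job}" ]; then
    echo "FATAL: job name for the environment file must be given with -e" >&2
    return 1
  fi
  return 0
}
```

Called as `parse_header_args -e "anal" -c "base anal"`, this leaves `env_job` set to `anal` and `configs` set to `base anal`; omitting `-e` returns a non-zero status.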
The ocean analysis jobs were accepting a previous value of `$DATA` if it was already set, which is not necessary.
The previous commit moving the jjob_header after checking the cycle didn't work because variables from the config files are needed. Instead, moved the header back where it was and now delete `$DATA` if we are aborting due to the cycle.
There was no non-zero exit in the jjob header after printing the fatal error message when the job name for the environment file is not provided with `-e`.
Testing complete after addressing reviewer comments. Output still identical.
Changes look great.
A lot of duplicate lines have been eliminated.
I like that the header takes in named arguments, which improves readability.
A few inline comments could use some explaining.
* develop: Correct issue in linking final restart files (NOAA-EMC#1285) Remove execute permissions from config files (NOAA-EMC#1281) Make needed updates to run forecast from GEFS (NOAA-EMC#1203) Remove unnecessary variables which reference to nemsio (NOAA-EMC#1259) Create analysis files for early-cycle EnKF by default (NOAA-EMC#1237) Don't wipe $DATA before running ocean bmat (NOAA-EMC#1280) More marine DA j-jobs (NOAA-EMC#1270) Update UFS-DA atmospheric prep script to be consistent with GDASApp update (NOAA-EMC#1265) Add new jjob for ocean analysis bmat (NOAA-EMC#1239) Retire ecf/versions in develop (NOAA-EMC#1267) Deploy documentation to RTD (NOAA-EMC#1264) Temporarily disable failing pytest (NOAA-EMC#1263) Remove incorrect/misleading comments in config.base (NOAA-EMC#1261) Add initial Sphinx documentation (NOAA-EMC#1258) Remove nemsio support (NOAA-EMC#1255) Increase wallclock for diag jobs (NOAA-EMC#1216) Use correct resources for GFS gempak (NOAA-EMC#1214) Abstract common j-job tasks (NOAA-EMC#1230) Add missing mkgfsawps.x link (NOAA-EMC#1218) Fix post sounding job (NOAA-EMC#1212) Revert "Use fracoro data for all new UFS applications (NOAA-EMC#1182)" (NOAA-EMC#1240) Use fracoro data for all new UFS applications (NOAA-EMC#1182) Revert "Merge GFS v16.3 operational GSI changes into develop branch. (NOAA-EMC#1158)" (NOAA-EMC#1238) Add more user defined parameters for the marine DA (NOAA-EMC#1235) Update pytests action version and run sequentially (NOAA-EMC#1236) Add utility to compare Fortran namelists (NOAA-EMC#1234) Updates for pygw (NOAA-EMC#1231) Merge GFS v16.3 operational GSI changes into develop branch. (NOAA-EMC#1158) Move member up in directory hierarchy (NOAA-EMC#1201) Enable staging ics for cycled experiments. 
(NOAA-EMC#1199) Add tests for configuration.py (NOAA-EMC#1192) Replace ocnanal_${CDATE}} with ${RUN}ocnanal_${cyc} (NOAA-EMC#1191) define NET and RUN in the Rocoto XML to accurately mimic the ecf in ecflow (NOAA-EMC#1193) Fix checking for restart files (NOAA-EMC#1186) Fix 'DEBUG' option in build_ufs.sh (NOAA-EMC#1188) Update archive job memory request value for R&Ds (NOAA-EMC#1183) Reorder post so all flux files are generated when running offline (NOAA-EMC#1181) Stop checking for restarts on non-GFS CDUMPs (NOAA-EMC#1179) Add missing jobids in some pre-job scripts (NOAA-EMC#1176) Remove existing directory if it exists when getic runs (NOAA-EMC#1165) Add logging decorator, test and test for yaml_file (NOAA-EMC#1178) fix coding norm check in `hosts.py` (NOAA-EMC#1174) Fix some bugs and make other changes so ctest in GDASApp works (NOAA-EMC#1172) Support for the GDASApp testing in containers (NOAA-EMC#1151) ATM 3DVAR with and without IAU (NOAA-EMC#1113) Enable checking for python norms and fix violating code (NOAA-EMC#1168) Enforce decimal math in atmos post (NOAA-EMC#1171) Update marine DA j-jobs to new format (NOAA-EMC#1149) Add utility to manipulate files en masse (NOAA-EMC#1166) add action to run pytests (NOAA-EMC#1167) Pin `differential-shellcheck` to `v3` tag (NOAA-EMC#1162) Add a task base class and basic logger (NOAA-EMC#1160) Recursively convert dict to AttrDict when making an AttrDict (NOAA-EMC#1154) move configuration.py to pygw. Use it from there. 
return AttrDict after sourcing configs (NOAA-EMC#1153) JEDI based Marine DA tasks (NOAA-EMC#1134) Allow customizations based on user/configuration (NOAA-EMC#1146) First step towards making j-jobs consistent in use from ecflow and rocoto (NOAA-EMC#1120) enable APP=S2SWA on WCOSS2 (NOAA-EMC#1142) Fix typo in .shellcheckrc Remove prod_envir module load from WCOSS2 (NOAA-EMC#1138) Link staged GSI fix files instead of cloning them from gerrit (NOAA-EMC#1132) Address shellcheck warnings in env files (NOAA-EMC#1136) Adds group size and nmem for GEFS (NOAA-EMC#1127) Remove unnecessary sCDATE assignment in forecast_predet.sh (NOAA-EMC#1133) Convert archive jobs to proper j-jobs (NOAA-EMC#1115) Update C48 forecast to run with one thread (NOAA-EMC#1131) Improved error messages from atmos analysis (NOAA-EMC#1125) Update MODULEPATH for Orion (NOAA-EMC#1126) MPMD variable updates and fix (NOAA-EMC#1124) Introduce FHMAX_ENKF_GFS to extending ensemble forecast capabilities (NOAA-EMC#1122) Update R&D launcher commands for tasks and multi-prog (NOAA-EMC#1112) Correct crtm path in UFS DA atmospheric analysis scripts (NOAA-EMC#1111) Correct syntax in remaining sorc scripts (NOAA-EMC#1105) Add GSI background error covariance as an option for UFS DA variational assimilation (NOAA-EMC#1104) Add Early Cycle EnKF workflow (NOAA-EMC#1022) Correct errors with gdas and monitoring symlinks (NOAA-EMC#1101) Fixed gfs-utils links (NOAA-EMC#1099) Fix build scripts and bring into compliance (NOAA-EMC#1096) Feature/updates for gdas app (NOAA-EMC#1091) Change GLDAS USE_CFP to NO on Hera (NOAA-EMC#1094) Resource updates to support WCOSS2 (NOAA-EMC#1070) Set COMPILER in link for detect machine (NOAA-EMC#1092) gfs utils update (NOAA-EMC#1088) GFS-UTILS update for build and ush scripts (NOAA-EMC#1082) Update UFS version to 2022 Oct 19 (NOAA-EMC#1083) Use more cycledefs for task control (NOAA-EMC#1078) removing superfluous EFSOI-specific files from develop (NOAA-EMC#1079) Update UFS to Sept 9 version 
(NOAA-EMC#1073) Modify default file location for monitor data when using rocoto (NOAA-EMC#1065) Fix companion ocean resolution for C48 (NOAA-EMC#1066) Add trailing slash for gldas topo path (NOAA-EMC#1064) Limit number of CPU for post (NOAA-EMC#1061) Fix eupd trace (NOAA-EMC#1057) Port to S4 (NOAA-EMC#1023) Update to obsproc.v1.0.2 and prepobs.v1.0.1 (NOAA-EMC#1049) Add GDAS to the partial build list (NOAA-EMC#1050) Fix group number being treated as octal in gdas arch (NOAA-EMC#1053) Remove trace from link script (NOAA-EMC#1046) Update gfs-utils hash to 3a609ea (NOAA-EMC#1048) Fix link script usage statement (NOAA-EMC#1045) Replace preamble variable commands with functions (NOAA-EMC#1012) Implement fix reorg and remove gfs-utils code (NOAA-EMC#1009) Rename post scripts (NOAA-EMC#1038) Fix missing @ symbol with COMINsyn in config.base (NOAA-EMC#1039) WCOSS2 run support and script/config updates (NOAA-EMC#1030) Remove base_svn from Hera and Orion hosts files (NOAA-EMC#1036) initial commit for incoming yaml work (NOAA-EMC#1029) Fix radiance verification failing to find diag files (NOAA-EMC#1031) Supported resolutions on platforms and defaults for mode (NOAA-EMC#1026) Add GLDAS scripts & fix GLDAS job (NOAA-EMC#1018) Update GSI Monitor for radmon fix Correct shell linter config (NOAA-EMC#1013) Correct diagnostic file handling in ush/ozn_xtrct.sh (NOAA-EMC#1016) Add shell linter Github action for pull requests (NOAA-EMC#1007) Build updates for WCOSS2 (NOAA-EMC#1002) Update UFS_UTILS tag to `ufs_utils_1_8_0` (NOAA-EMC#1001) Fix preamble id (NOAA-EMC#996) Add missing "atmos" into job dependencies (NOAA-EMC#998) Bugfix in arch.sh to remove hardwired "htar" (NOAA-EMC#992) Add in stubs for aerosol DA tasks + bugfix for setup_expt where cycled and ATMA are used (NOAA-EMC#990) Add GSI monitor scripts (NOAA-EMC#969) Fix product generation at some fcst hrs (NOAA-EMC#988) Add initial config files for global aerosol DA (NOAA-EMC#986) Update diag table to remove wav-ocn coupling 
fields (NOAA-EMC#979) use a robust Findwgrib2.cmake to find wgrib2 built w/ native wgrib2 build (NOAA-EMC#970) Externals.cfg was stale and had drifted off (NOAA-EMC#965) Fix post comparison with zero-padded numbers (NOAA-EMC#964)
Description
Takes all of the tasks that are common to all j-jobs and abstracts them out into a shared script that is sourced by each job:
- Set and create $DATA
- Call setpdy and set $cycle
- Set pid, pgmout, and pgmerr
- Source config files
- Source machine environment file

The common j-job header script is called by passing the job name for the `${machine}.env` files using the `-e` option, and a list of config files to source with the `-c` option. The job name argument (`-e`) is mandatory, and the config list is optional but recommended to always use as well.

Some pre j-job rocoto entry scripts (`jobs/rocoto/*`) are currently doing much more than they should be. These sometimes required extra finagling, usually pre-calling the jjob header in the rocoto script before it does something.

Refs: #1069
Type of change
How Has This Been Tested?
Post sounding (bufr) and gempak jobs complete successfully
AWIPS and WAFS failing for unrelated reasons
GDAS App jobs not tested
Some j-jobs are not called in the development workflow. TODOs and corresponding issues have been added to these.
Checklist
Dependencies
#1212
#1214
#1216
#1218