A tool for cycling forecast and data assimilation experiments with the MPAS-Atmosphere model and the MPAS-JEDI data assimilation package.
#login to Cheyenne
mkdir -p /fresh/path/for/submitting/experiments
cd /fresh/path/for/submitting/experiments
module load git
git clone https://github.com/NCAR/MPAS-Workflow
#modify configuration as needed in scenarios/ and config/
source env-setup/cheyenne.csh
#OR
source env-setup/cheyenne.sh
./drive.csh
#OR
./run.csh {{runConfig}}
#OR
./test.csh
It is required to set the work and run directories in $HOME/.cylc/global.rc as follows:
[hosts]
[[localhost]]
work directory = /glade/scratch/USERNAME/cylc-run
run directory = /glade/scratch/USERNAME/cylc-run
[[[batch systems]]]
[[[[pbs]]]]
job name length maximum = 236
USERNAME must be filled in with your user-name. It is possible to choose different locations for
the cylc work and run directories, as long as you also modify cylcWorkDir in drive.csh. It is
recommended to set job name length maximum to a large value.
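As a rough illustration of the settings above, the following Python sketch renders the global.rc text for a given user name. The function name render_global_rc and its scratch_root argument are hypothetical and not part of MPAS-Workflow; only the keys and values are taken from the example above.

```python
def render_global_rc(username, scratch_root="/glade/scratch"):
    """Return global.rc text with the cylc work and run directories for username."""
    # Assumption: both directories share one location, as in the example above.
    cylc_run = f"{scratch_root}/{username}/cylc-run"
    lines = [
        "[hosts]",
        "[[localhost]]",
        f"work directory = {cylc_run}",
        f"run directory = {cylc_run}",
        "[[[batch systems]]]",
        "[[[[pbs]]]]",
        "job name length maximum = 236",
    ]
    return "\n".join(lines) + "\n"


# Print the text for review; nothing is written to disk.
print(render_global_rc("USERNAME"), end="")
```

The sketch only prints the text, so the result can be reviewed before it is copied into $HOME/.cylc/global.rc by hand.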
The files under the config/ and scenarios/ directories describe the configuration for the
entire workflow. Some files are designed to be modified by users, and others mostly by developers.
- config/builds.csh: describes the build directories for critical applications
- config/scenario.csh: selection of a particular experiment scenario
For many csh scripts located under config/ and config/applications, there is a one-to-one
correspondence with yaml files located under scenarios/base/. For those configuration components,
the csh script is used to parse the yaml and/or the identically named section of the scenario
yaml file (e.g., scenarios/3dvar_OIE120km_WarmStart.yaml). The base scenario yamls contain
the default values and documentation for each user-configurable setting. Those base
configuration sections are as follows:
- scenarios/base/experiment.yaml: experiment naming conventions
- scenarios/base/job.yaml: account and queue selection
- scenarios/base/model.yaml: model mesh settings
- scenarios/base/observations.yaml: observation source data
- scenarios/base/workflow.yaml: cylc task selection and date bounds
- scenarios/base/ensvariational.yaml
- scenarios/base/forecast.yaml
- scenarios/base/hofx.yaml
- scenarios/base/initic.yaml
- scenarios/base/rtpp.yaml
- scenarios/base/variational.yaml
- scenarios/base/verifyobs.yaml
- scenarios/base/verifymodel.yaml
While users can directly modify those base yamls to achieve their desired configuration,
it is recommended to modify one of the existing full scenarios located directly under scenarios/
or create a new scenario by copying one of the default scenarios to a new file. Doing so allows
each user to easily distinguish their custom experimental settings from the GitHub HEAD branch,
while being able to merge recent repository changes without conflict. Users may select a
particular scenario, including a custom one of their own making, within config/scenario.csh.
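The relationship between the base defaults and a scenario can be pictured as a lookup with a fallback. The Python sketch below assumes both yaml files have already been parsed into dictionaries and that a value in the scenario takes precedence over the base default. The function resolve_setting and the example keys and values are hypothetical; this is not the parsing code used by the config/*.csh scripts.

```python
def resolve_setting(base, scenario, section, key):
    """Return the scenario value for section/key, else the base default."""
    scenario_section = scenario.get(section) or {}
    if key in scenario_section:
        return scenario_section[key]
    return base[section][key]


# Hypothetical contents; the real keys are documented in scenarios/base/*.yaml.
base = {"job": {"queue": "default_queue", "account": "DEFAULT_ACCOUNT"}}
scenario = {"job": {"queue": "custom_queue"}}

print(resolve_setting(base, scenario, "job", "queue"))    # set by the scenario
print(resolve_setting(base, scenario, "job", "account"))  # falls back to base
```

Keeping custom values in a separate scenario file, as recommended above, means the base dictionary in this picture never needs to change.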
Modifications to these scripts are not necessary for typical users. However, there are edge cases outside the design envelope of MPAS-Workflow for which they will need to be extended and/or refactored. It is best practice to discuss such modifications that benefit multiple users via GitHub issues, and then submit pull requests.
- generateExperiment.csh: produces config/experiment.csh, which is a global description of the
  workflow file structure and file-naming conventions used across multiple applications
- config/environment.csh: run-time environment used across compiled executables and python scripts
- config/modeldata.csh: static model-space data files, including fixed ensemble forecast members
  for deterministic experiments, first guess files for the first cycle of an experiment, surface
  variable update files (sst and xice), and common static.nc files to be used across all cycles.
- config/obsdata.csh: static observation-space data file structure; soon to be replaced by
  the observations configuration section and observations.csh
- config/tools.csh: initializes python tools for workflow task management
If a developer wishes to add a new configuration key beyond the currently available options, the
recommended procedure is to add the key, default value, and description in one of the base
yaml files, then parse the option in the corresponding config/*.csh file. Developers
are referred to the many existing examples, and it is recommended to discuss additional options
to be merged back into the GitHub repository via GitHub issues.
Configuration aspects that are unique to MPAS-Atmosphere
- config/mpas/geovars.yaml: list of templated geophysical variables (GeoVars) that MPAS-JEDI can
  provide to UFO; identical to mpas-jedi/test/testinput/namelists/geovars.yaml, but duplicated here
  so that users can modify it at run-time as needed.
- config/mpas/variables.csh: model/analysis variables used to generate yaml files for MPAS-JEDI
  applications
E.g., namelist.atmosphere, streams.atmosphere, and stream_list.atmosphere.*
- config/mpas/forecast/*: tasks derived from forecast.csh
- config/mpas/hofx/*: tasks derived from HofX.csh
- config/mpas/initic/*.csh: GenerateColdStartIC.csh and UngribColdStartIC.csh
- config/mpas/rtpp/*: RTPPInflation.csh
- config/mpas/variational/*: Variational-type tasks derived from either Variational.csh or
  EnsembleOfVariational.csh
The application-specific yaml stubs provide a base set of options that are common across most
experiments. Parts of those stubs are automatically populated via the workflow. Advanced
users or developers are encouraged to modify the application-specific yamls directly to suit
their needs.
- config/jedi/applications/*.yaml: MPAS-JEDI application-specific yaml templates. These will be
  further populated by scripts templated on PrepJEDI.csh and/or PrepVariational.csh.
- config/jedi/ObsPlugs/variational/*.yaml: observation yaml stubs that get plugged into Variational
  jedi/applications yamls, e.g., 3dvar.yaml, 3denvar.yaml, and 3dhybrid.yaml
- config/jedi/ObsPlugs/hofx/*.yaml: same, but for jedi/applications/hofx.yaml
drive.csh creates a new cylc suite file, then runs it. Users need not modify this file. Developers
who wish to add new cylc tasks, or modify the relationships between tasks, will modify drive.csh
and/or the files in the include directory:
- include/criticalpath.rc: controls all elements of the critical path for all 4 CriticalPathType
  options and 2 InitializationType options. Allows for re-use of include/forecast.rc and
  include/da.rc according to the user selections. Those latter two scripts describe all the
  intra-forecast and intra-da dependencies, respectively, independent of tasks in other categories.
- include/verification.rc: describes the dependencies between HofX, Verify*, Compare*, and other
  kinds of tasks that produce verification statistics files. It includes dependencies on forecast
  and da tasks that produce the data to be verified. Multiple aspects of verification are
  controlled via the workflow section of the scenario configuration. Full descriptions of all
  verification options are available in scenarios/base/workflow.yaml.
- include/tasks.rc: describes all cylc tasks that can be selected under the [[dependencies]]
  node of drive.csh, all of which are described in either criticalpath.rc, verification.rc, or
  files included therein.
See scenarios/base/workflow.yaml for user-selectable options that control drive.csh.
run.csh executes drive.csh or SetupWorkflow.csh for a set of pre-defined
scenarios, each of which must be described in a scenario configuration file (i.e.,
scenarios/*.yaml). The scenario set is selected in runs/*.yaml. One of those run
configurations is test.yaml. It is recommended to run the test scenario set (1) when
a new user first clones the MPAS-Workflow repository and (2) before submitting a GitHub pull request
to MPAS-Workflow. For example, execute the following from the command-line:
source env-setup/cheyenne.${YourShell}
./run.csh test
#OR, equivalently,
./test.csh
Most of the run configurations (runs/*.yaml) select only a single scenario, except for the
automated test. When only one scenario is selected, the user can achieve the same effect by
executing drive.csh, and the choice to use run.csh is a matter of personal preference.
These scripts serve as templates for multiple workflow components. The actual task scripts that
are selected via drive.csh are generated by performing sed substitution within SetupWorkflow.csh
and AppAndVerify.csh. Here we give a brief summary of the design and templating for each script.
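The templating idea can be sketched in a few lines of Python: a template script contains placeholder tokens, and each generated task script is a copy with those tokens replaced. The workflow itself performs this step with sed; the function render_template and the @...@ tokens below are hypothetical and chosen only for illustration.

```python
def render_template(template_text, substitutions):
    """Replace every placeholder token in template_text with its value."""
    rendered = template_text
    for token, value in substitutions.items():
        rendered = rendered.replace(token, str(value))
    return rendered


# Hypothetical placeholder tokens; the real template scripts define their own.
template = "set stateDir = @STATE_DIR@\nset statePrefix = @STATE_PREFIX@\n"

print(render_template(template, {"@STATE_DIR@": "CyclingFC", "@STATE_PREFIX@": "mpasout"}), end="")
```

Delimiting each token on both sides, as in this sketch, avoids accidental matches when one token name is a prefix of another.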
- CleanHofx.csh: used to generate CleanHofX*.csh scripts, which clean HofX working directories
  (e.g., Verification/fc/*) in order to reduce experiment disk resource requirements.
- CleanVariational.csh: used to generate CleanCyclingDA.csh, which cleans expensive and
  easily reproducible files from the CyclingDA working directories in order to reduce experiment
  disk resource requirements. This is more important for EDA experiments than for single-state
  deterministic cycling.
- EnsembleOfVariational.csh: used in the EDAInstance* cylc task; executes the mpasjedi_eda
  application. Similar to Variational.csh, except that the EDA is conducted in
  a single executable. Multiple EDAInstance* members with a small number of sub-members can
  be conducted simultaneously if it is beneficial to group members instead of running them all
  independently, as is done via DAMember* tasks. Users are referred to
  scenarios/base/variational.yaml for configuration information.
- forecast.csh: used to generate all forecast scripts, e.g., CyclingFC.csh and ExtendedMeanFC.csh,
  which execute mpas_atmosphere forecasts across a templated time range with state output at a
  templated interval. Takes Variational analyses or cold-start initial conditions (IC) as inputs.
- HofX.csh: used to generate all HofX* scripts, e.g., HofXBG.csh, HofXMeanFC.csh, and
  HofXEnsMeanBG.csh. Each of those executes the mpasjedi_hofx3d application. Templated w.r.t.
  the input state directory and prefix, allowing it to read any forecast state written through the
  MPAS-Atmosphere da_state stream.
- PrepJEDI.csh: substitutes commonly repeated sections in the yaml file for all MPAS-JEDI
  applications. Templated w.r.t. the application type (i.e., variational, hofx) and application
  name (e.g., 3denvar, hofx). Prepares namelist.atmosphere, streams.atmosphere, and
  stream_list.atmosphere.*. Links required static files and graph info files that describe MPI
  partitioning.
- PrepVariational.csh: further modifies the application yaml file(s) for the Variational task
- Variational.csh: used in the DAMember* cylc task; executes the mpasjedi_variational
  application. Templated w.r.t. the background state prefix and directory. Reads one output
  forecast state from a CyclingFCMember* task, as coded in SetupWorkflow.csh. Multiple instances
  can be launched in parallel to conduct an ensemble of data assimilations (EDA). See
  scenarios/base/variational.yaml for configuration information.
- verifyobs.csh: used to generate scripts that verify observation-database output from HofX and
  Variational-type tasks.
- verifymodel.csh: used to generate scripts that verify model forecast states with respect to GFS
  analyses.
These scripts are used as-is without sed substitution. They are copied to the experiment
workflow directory by SetupWorkflow.csh.
- GenerateColdStartIC.csh: generates cold-start IC files from GFS analyses
- GenerateABEInflation.csh: generates Adaptive Background Error Inflation (ABEI) factors based on
  all-sky IR brightness temperature H(x_mean) and H_clear(x_mean) from GOES-16 ABI and Himawari-8
  AHI
- GetWarmStartIC.csh: generates links to pre-generated warm-start IC files
- MeanBackground.csh: calculates the mean of ensemble background states
- MeanAnalysis.csh: calculates the mean of ensemble analysis states
- ObsToIODA.csh: converts BUFR and PrepBUFR observation files to IODANC format
- RTPPInflation.csh: performs Relaxation To Prior Perturbation (RTPP) inflation, taking as input two
  ensembles, one each of background states and analysis states

- AppAndVerify.csh: generates "Application" and "Verification" cylc-task shell scripts from the
  templated workflow task scripts via sed substitution
- getCycleVars.csh: defines cycle-specific variables, such as multiple formats of the valid date,
  and date-resolved directories
- SetupWorkflow.csh:
  - Generate the experiment directory
  - Copy the current config and scenarios directories to the experiment workflow directory so that a record is kept of all settings
  - Copy non-templated task scripts to the experiment directory
  - Generate cylc-task shell scripts via templated substitution of AppAndVerify.csh, followed by execution of the application-specific AppAndVerify*.csh scripts
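For readers unfamiliar with the RTPP inflation performed by RTPPInflation.csh, the method is commonly written as x_a' <- alpha * x_b' + (1 - alpha) * x_a', where x_b' and x_a' are a member's background and analysis perturbations from their respective ensemble means. The Python sketch below applies that standard form to small lists of numbers. It assumes this form and an illustrative parameter name alpha; it is not the implementation used by the workflow, and the function names are hypothetical.

```python
def ensemble_mean(members):
    """Return the element-wise mean of equally sized member state vectors."""
    count = len(members)
    return [sum(values) / count for values in zip(*members)]


def rtpp(background, analysis, alpha):
    """Relax analysis perturbations toward background perturbations."""
    xb_mean = ensemble_mean(background)
    xa_mean = ensemble_mean(analysis)
    inflated = []
    for xb, xa in zip(background, analysis):
        member = []
        for b, a, bm, am in zip(xb, xa, xb_mean, xa_mean):
            # Blend the two perturbations; the analysis mean is preserved.
            perturbation = alpha * (b - bm) + (1.0 - alpha) * (a - am)
            member.append(am + perturbation)
        inflated.append(member)
    return inflated


background = [[0.0, 2.0], [4.0, 6.0]]
analysis = [[1.0, 3.0], [2.0, 4.0]]
print(rtpp(background, analysis, 0.5))
```

With alpha = 0 the analysis ensemble is returned unchanged, and with alpha = 1 the analysis members take on the full background spread around the analysis mean.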
Each of these tools performs a useful part of the workflow that is otherwise cumbersome to achieve
via shell scripts. The argument definitions for each script can be retrieved by executing
python {{ScriptName}}.py --help
- advanceCYMDH: time-stepping used to figure out dates relative to an arbitrary input date
- getYAMLNode: retrieves a yaml node key or value from a yaml file
- memberDir: generates an ensemble member directory string, dependent on experiment- and
  application-specific inputs
- nSpaces: generates a string containing the number of spaces that are input. Used for
  controlling indentation of some yaml components
- substituteEnsembleBMembers: replaced by substituteEnsembleBTemplate
- substituteEnsembleBTemplate: generates and substitutes the ensemble background error
  covariance members from template configuration into application yamls that match *envar*
  and *hybrid*. See Variational.csh for the specific behavior.
- updateXTIME: updates the xtime variable in an MPAS-Atmosphere state file so that it can be read
  into the model as though it had the correct time stamp
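The kind of date arithmetic that advanceCYMDH handles can be sketched with the Python standard library. The sketch assumes a 10-digit YYYYMMDDHH date string and an offset in hours; the function name advance_cymdh and its signature are hypothetical, and the actual arguments of the tool are those reported by its --help option.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y%m%d%H"  # assumed 10-digit YYYYMMDDHH date string


def advance_cymdh(date_string, hours):
    """Return date_string shifted by hours (negative values step backward)."""
    valid = datetime.strptime(date_string, DATE_FORMAT)
    return (valid + timedelta(hours=hours)).strftime(DATE_FORMAT)


print(advance_cymdh("2018043018", 6))   # crosses a month boundary
print(advance_cymdh("2018010100", -6))  # crosses a year boundary backward
```

Delegating to datetime avoids hand-written handling of month lengths and leap years, which is the cumbersome part to reproduce in a shell script.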
Note for developers: for simple single-processor operations, the preferred practice in
MPAS-Workflow is to use python scripts. Developers are encouraged to try this approach before
writing source code for a compiled executable that is more onerous to build and maintain.
Single-node multi-processor tasks may also be carried out in python scripts, which is the current
practice in MPAS-Workflow verification. However, scalable multi-processor operations, especially
those dealing with complex operations on model state data, are often better handled by compiled
executables.
- Print a list of active suites
cylc scan
- Open an X-window GUI showing the status of all active suites.
cylc gscan
Double-click an individual suite in order to see detailed information or navigate between suites using the drop-down menus. From the GUI, it is easy to perform actions on the entire suite or individual tasks, e.g., hold, resume, kill, trigger. It is also possible to interrogate the real-time progress of the cylc tasks being executed and, in some cases, the next tasks that will be triggered. There are multiple views available, including a flow chart view that is useful for new users to learn the dependencies between tasks.
- Shut down a suite (SUITENAME) after killing all active tasks
cylc stop --kill SUITENAME
- Trigger all tasks in a suite with a particular STATUS (e.g., failed, submit-failed)
cylc trigger SUITENAME "*.*:STATUS"
- Useful c-shell aliases based on the above
alias cylcstopkill "cylc stop --kill \!:1"
# usage:
cylcstopkill SUITENAME
alias cylctriggerfailed "cylc trigger \!:1 '*.*:failed'"
# usage:
cylctriggerfailed SUITENAME
alias cylctriggerstatus "cylc trigger \!:1 '*.*:\!:2'"
# usage:
cylctriggerstatus SUITENAME STATUS
This workflow includes automated deletion of some intermediate files. That behavior can be modified
in scripts that look like Clean{{Application}}.csh. If data storage is still a problem, it is
recommended to remove the Cycling* directories of an experiment after all desired verification has
completed. The model- and observation-space statistical summary files in the Verification
directory are orders of magnitude smaller than the full model states and instrument feedback files.
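Before removing anything, it can help to list exactly which directories would be affected. The Python sketch below only lists candidates and deletes nothing. It assumes the Cycling* directories sit directly under the experiment directory; the function name find_cycling_dirs is hypothetical and not part of MPAS-Workflow.

```python
from pathlib import Path


def find_cycling_dirs(experiment_dir):
    """Return the sorted Cycling* directories directly under experiment_dir."""
    root = Path(experiment_dir)
    # Plain files that happen to match the pattern are skipped.
    return sorted(path for path in root.glob("Cycling*") if path.is_dir())


# Dry run: list candidates under the current directory for manual review.
for path in find_cycling_dirs("."):
    print(path)
```

Reviewing the printed list against the Verification directory first makes it easier to confirm that all desired verification has completed.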
Liu, Z., Snyder, C., Guerrette, J. J., Jung, B.-J., Ban, J., Vahl, S., Wu, Y., Trémolet, Y., Auligné, T., Ménétrier, B., Shlyaeva, A., Herbener, S., Liu, E., Holdaway, D., and Johnson, B. T.: Data Assimilation for the Model for Prediction Across Scales – Atmosphere with the Joint Effort for Data assimilation Integration (JEDI-MPAS 1.0.0): EnVar implementation and evaluation, Geosci. Model Dev. Discuss. [preprint], https://doi.org/10.5194/gmd-2022-133, in review, 2022
Oliver, H., Shin, M., Matthews, D., Sanders, O., Bartholomew, S., Clark, A., Fitzpatrick, B., van Haren, R., Hut, R., and Drost, N.: Workflow Automation for Cycling Systems, Computing in Science & Engineering, 21, 7–21, https://doi.org/10.1109/mcse.2019.2906593, 2019.
Skamarock, W. C., Klemp, J. B., Duda, M. G., Fowler, L. D., Park, S.-H., and Ringler, T. D.: A Multiscale Nonhydrostatic Atmospheric Model Using Centroidal Voronoi Tesselations and C-Grid Staggering, Monthly Weather Review, 140, 3090–3105, https://doi.org/10.1175/mwr-d-11-00215.1, 2012.
- Maryam Abdi-Oskouei
- Junmei Ban
- Ivette Hernandez Banos (ivette@ucar.edu)
- Jamie Bresch
- JJ Guerrette (guerrett@ucar.edu)
- Soyoung Ha
- BJ Jung
- Zhiquan Liu
- Chris Snyder
- Craig Schwartz
- Steven Vahl
- Yali Wu
- Yonggang Yu
These people have contributed any of the following: GitHub pull requests and review, data, scripts on which workflow tasks are templated, source code, or critical consultation.