ECE3-POSTPROC is a suite of post-processing tools for EC-Earth 3. It includes HIRESCLIM2, ECMEAN, TIMESERIES and AMWG. The last three require the output from the first one. A REPRODUCIBILITY TEST, which relies on ECMEAN output, is also provided.
The code has been ported to cca (ECMWF), rhino (KNMI), marconi (CNR), and marenostrum4 (BSC). See the Porting instructions below to include other machines.
You first need to get the code:
git clone https://github.com/plesager/ece3-postproc.git
and the dedicated data set available from ECMWF archive:
ec:/nm6/EC-EARTH/ECEARTH3.2b/INPUT/ece-post-proc.tar.gz
The general idea is that you pass the experiment name (EXP) and the years to process (typically a range, say, YEAR1 YEAR2) as input to the scripts on the command line.
For this to work as intended, the locations of the post-processing code and of the data used to run the EC-Earth model, as well as your platform, must be known. They are unlikely to change, so they are set as environment variables in your shell rc file (~/.bashrc or equivalent):
export ECE3_POSTPROC_TOPDIR=<dir where this file is>
export ECE3_POSTPROC_DATADIR=<dir where your ecearth init data (not run output!) are located>
export ECE3_POSTPROC_MACHINE=<name of your (HPC) machine>
The "name of your (HPC) machine" is used to retrieve the machine configuration. The list of available platforms is obtained with:
ls ./conf/
If yours is not present, you need to port the code. See the Porting section below.
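As a small sketch of that check, the helper below tests whether a configuration directory exists for a machine name. The function name `is_ported` is made up for this illustration and is not part of the package.

```shell
# Succeeds if a configuration directory exists for the given machine name.
# Arguments: top directory of ece3-postproc, machine name
# (is_ported is a hypothetical helper, not provided by the package)
is_ported() {
    [ -d "$1/conf/$2" ]
}
```

It could be called as `is_ported "${ECE3_POSTPROC_TOPDIR}" "${ECE3_POSTPROC_MACHINE}" || echo "platform not ported yet"`.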
Optionally, you can also set your HPC account:
export ECE3_POSTPROC_ACCOUNT=<HPC account>
If not set, your default account is used to submit jobs to the HPC. At ECMWF, this is the first one in the list you get with the command (on ecgate only): “account -l $USER”
If you want to temporarily change the HPC account, you can also use the command line option when calling each tool.
Finally, the code relies on $SCRATCH and $USER being defined. If either is not, define it. Many temporary data files, job scripts and their logs are written to $SCRATCH. $USER is used in a few job manager commands (SLURM or PBS), and should already be defined.
If you use a job scheduler (SLURM, PBSpro, …), there is a set of wrappers that let you submit jobs in parallel. If not, you can still run the wrapped scripts. The wrapper calls are described here [TODO: add description for case without wrappers]. All the calls are made in the script sub-directory of the package.
Each tool has its own specific setup. If the code has already been ported to your machine, you will have little to change. The settings define the needed executables/libraries (cdo, nco, netcdf,…), the location of auxiliary data and of run output, and where to save the results. All configurations are collected in:
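A minimal sketch for verifying this setup before calling any tool. The function name `check_ece3_env` is hypothetical (it is not part of the package); it only tests the three mandatory variables listed above plus $SCRATCH and $USER.

```shell
# Report which of the variables needed by ECE3-POSTPROC are unset or empty.
# Prints the missing names (one per line) and returns non-zero if any is missing.
# (check_ece3_env is a hypothetical helper, not provided by the package)
check_ece3_env() {
    missing=0
    for name in ECE3_POSTPROC_TOPDIR ECE3_POSTPROC_DATADIR ECE3_POSTPROC_MACHINE SCRATCH USER
    do
        # indirect lookup of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            missing=1
        fi
    done
    return $missing
}
```

For example: `check_ece3_env || echo "define the variables listed above first"`.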
./conf/<your-machine>/conf_<tool>_<your-machine>.sh
cd ${ECE3_POSTPROC_TOPDIR}/script
./hc.sh [-c] [-6] [-a account] [-u userexp] [-m months_per_leg] EXP YEAR1 YEAR2 REF
This will create a set of netcdf files with monthly global averages in the ${ECE3_POSTPROC_POSTDIR}/post directory, which is defined in the config file.
For more information about the script options, just call
./hc.sh -h
Upon success, ${ECE3_POSTPROC_POSTDIR}/postcheck_EXP_YYYY.txt files are created with some basic information. By repeating the command with the -c option, these files are printed. In case of a problem, the location of the log is printed.
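As an alternative to the -c option, the presence of the postcheck files can be tested directly. The sketch below assumes only the file naming given above. The function name `missing_postcheck_years` is made up, and the post directory passed as first argument must already be fully expanded for the experiment.

```shell
# List the years in [first year, last year] for which no postcheck file exists.
# Arguments: post directory, experiment name, first year, last year
# (missing_postcheck_years is a hypothetical helper, not provided by the package)
missing_postcheck_years() {
    postdir=$1; exp=$2; year=$3; last=$4
    while [ "$year" -le "$last" ]
    do
        [ -f "${postdir}/postcheck_${exp}_${year}.txt" ] || echo "$year"
        year=$((year + 1))
    done
}
```

An empty output means that all years in the range have been processed.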
In “./conf/<your-machine>/conf_hiresclim_<your-machine>.sh”, you can set some options (if you want daily or 6h output on top of the monthly ones, or the nemo_extra output for example). However the most important settings that you have to change are the templates for the model output and for the results location. For example:
export IFSRESULTS0='/scratch/ms/nl/${USER}/ECEARTH-RUNS/${EXPID}/output/ifs/${LEGNB}'
export NEMORESULTS0='/scratch/ms/nl/${USER}/ECEARTH-RUNS/${EXPID}/output/nemo/${LEGNB}'
export ECE3_POSTPROC_POSTDIR='/scratch/ms/nl/${USER}/ECEARTH-RUNS/${EXPID}/post'
The first two must include at least either ${year} or ${LEGNB} so that the correct data can be found, and must be single-quoted. For an example, see:
conf/rhino/conf_hiresclim_rhino.sh
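The single quotes keep ${USER}, ${EXPID} and ${LEGNB} literal when the config file is read, so that they can be filled in later, once the experiment and the leg are known. Here is a minimal sketch of that mechanism, assuming the delayed expansion is done with eval (how the scripts actually do it has not been checked here). The user, experiment name and leg number are made up.

```shell
# Template as in the config file: single quotes keep the variables unexpanded
IFSRESULTS0='/scratch/ms/nl/${USER}/ECEARTH-RUNS/${EXPID}/output/ifs/${LEGNB}'

# Hypothetical values, normally known only when a tool is called
USER=jdoe
EXPID=cca1
LEGNB=003

# Expand the template now that the variables are defined (assumed mechanism)
eval "IFSRESULTS=\"${IFSRESULTS0}\""
echo "${IFSRESULTS}"
```

With double quotes in the config file, the variables would instead be expanded when the file is sourced, i.e. before EXPID and LEGNB have their final values.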
The NEMO variable names expected by HIRESCLIM2 may differ from those found in EC-Earth3 output. If needed, you can change the variable names from EC-Earth in conf_hiresclim_<your-machine>.sh. Note for example that for the ice thickness from the icemod file, you need:
export nm_icethic="sithic" # for EC-Earth (3.2.3 and above) default output
export nm_icethic="sithick" # for older version like PRIMAVERA
If you have IFS output with mixed time and/or levels output, hiresclim2 will not work unless you filter the output beforehand. A clue that you need filtering is when you have errors like:
cdo setdate: Started child process "settime,00:00:00 -timmean icmgg2df_195001 (pipe1.1)".
cdo(2) settime: Started child process "timmean icmgg2df_195001 (pipe2.1)".
Warning (cgribexScanTimestep2) : Record 144 (id=133.128 lev1=1 lev2=0) timestep 2: Parameter not defined at timestep 1!
cdo(3) timmean: Open failed on >icmgg2df_195001<
Unsupported file structure
Filtering can be done with grib_filter, and is readily available in hiresclim. You can activate a pass through grib_filter by uncommenting the lines in the conf_hiresclim_<your-machine>.sh file that start with:
FILTERGG2D
FILTERGG3D
FILTERGGSH
As written now in some of the conf_hiresclim_<platform>.sh files, these filters are activated if you set CMIP6=1 in the config file. Going one step further, on some platforms, calling hc.sh with the -6 option automatically sets CMIP6=1 in the config file (see the cca and rhino platforms for example).
./ecm.sh [-a account] [-r rundir] [-u USERexp] [-c] [-y] [-p] EXP YEAR1 YEAR2
The options are the same as for hiresclim2. For details, call
./ecm.sh -h
Output tables with Performance Indices and mean global fluxes are found in:
${ECE3_POSTPROC_DIAGDIR}/table/${EXPID}
and a one-line summary is found in:
${ECE3_POSTPROC_DIAGDIR}/table/globtable.txt
${ECE3_POSTPROC_DIAGDIR}/table/gregory.txt
If the option -y was used, you also get yearly global means available in:
${ECE3_POSTPROC_DIAGDIR}/table/yearly_fldmean_${exp}.txt
and its subset
${ECE3_POSTPROC_DIAGDIR}/table/gregory_${exp}.txt
which has only the three variables needed for a Gregory plot.
The default output directory ${ECE3_POSTPROC_DIAGDIR} is set in the
$ECE3_POSTPROC_TOPDIR/conf/${ECE3_POSTPROC_MACHINE}/conf_ecmean_${ECE3_POSTPROC_MACHINE}.sh
config file.
You can quickly check for success by executing the command again with the -c option. It will print the summary line from the globtable.txt and gregory.txt files, if they exist. For more insight, have a look at the submitted scripts and logs, which are in $SCRATCH/tmp_ece3_ecmean.
EC-Mean creates a climatology from the experiment to derive the performance indices. The climatology is by default in the same directory as the HIRESCLIM2 output:
${ECE3_POSTPROC_POSTDIR}/clim-${YEAR1}-${YEAR2}
and is not removed, since it can be used for other purposes (notably the reproducibility test).
amwg.sh [-a account] [-r altdir] [-u USERexp] EXP YEAR1 YEAR2
ts.sh [-l] [-a account] [-u userexp] [-r POSTDIR] [-c] EXP
It will create and store time-series plots of several variables. It is smart enough to update an existing series, i.e. you can run it several times during an on-going run to monitor it.
Timeseries for one experiment EXP will be in the diagnostic dir ${ECE3_POSTPROC_DIAGDIR}/timeseries/EXP, as two netCDF files and two html pages (one for atmosphere and one for ocean).
The output can be copied to a remote machine through ssh and scp. See the remote variables RHOST, RUSER and WWW_DIR_ROOT in the config file. If the remote machine can be reached only from the login node, run the script on that node with the -l flag to get scp to work. Note that running on the login node, if allowed at all, slows down the computation of the timeseries. It is then better to switch the copy off (with an empty RHOST) for the computation, and afterwards either switch it on and re-run just to copy to the remote, or write your own dedicated script or bash function.
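A dedicated copy function could look like the sketch below. It reuses the RHOST, RUSER and WWW_DIR_ROOT variables of the config file and the local timeseries directory described above. The function name `push_timeseries`, the DRYRUN switch and the remote "timeseries" sub-directory are assumptions made for this illustration, and ECE3_POSTPROC_DIAGDIR is assumed to be already expanded.

```shell
# Copy the timeseries of one experiment to the remote web directory.
# Argument: experiment name
# Relies on RUSER, RHOST, WWW_DIR_ROOT and ECE3_POSTPROC_DIAGDIR being set.
# The remote "timeseries" sub-directory is an assumption.
# Set DRYRUN=1 to print the scp command instead of running it.
# (push_timeseries is a hypothetical helper, not provided by the package)
push_timeseries() {
    exp=$1
    src="${ECE3_POSTPROC_DIAGDIR}/timeseries/${exp}"
    dest="${RUSER}@${RHOST}:${WWW_DIR_ROOT}/timeseries/"
    if [ "${DRYRUN:-0}" = 1 ]; then
        echo scp -r "$src" "$dest"
    else
        scp -r "$src" "$dest"
    fi
}
```

Running it first with DRYRUN=1 from the login node shows the exact command before anything is transferred.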
The acceptance/reproducibility test consists of 3+1 steps:
- run an ensemble of 5 members
- run EC-mean to get the climatology and the Reichler & Kim (R&K) performance indices of each run
- cast the R&K indices into a format suitable for the next step
Several ensembles, corresponding to different setups (platform, compiler,…), must be run. Then a statistical comparison (4th step) is performed.
The acceptance/reproducibility test (4th step) relies on a set of scripts written in R. A few R packages are needed: s2dverification, ncdf4, RColorBrewer. If you do not control your environment and R and/or the packages are missing, it may be easier to work on another machine where you can easily install the packages. For example:
# define a personal R library location,
mkdir /usr/people/sager/Rlib
# and make sure that R is aware of it (put that one in your ~/.bashrc):
export R_LIBS=/usr/people/sager/Rlib/
# within R, install:
install.packages("s2dverification", lib="/usr/people/sager/Rlib/")
install.packages("ncdf4", lib="/usr/people/sager/Rlib/")
install.packages("RColorBrewer", lib="/usr/people/sager/Rlib/")
If you are not doing the test yourself, but only run an ensemble and EC-mean on its members, you do not need these R packages.
You must run 5 experiments for 20 years with perturbed initial conditions. Your experiment names should be made of 3 characters (the stem) followed by a number from 1 to 5. For example: cca1, cca2, cca3, cca4, cca5. The stem uniquely defines your ensemble. If you do not follow this format, collecting the R&K indices in a format suitable for the comparison scripts will be slightly more complicated but still feasible (see below). Your runs will differ by their initial conditions, which require some setup.
You can create these initial conditions on the fly by adding a call to the perturbation script in your classic/ece-*.sh.tmpl, i.e. by replacing (be sure that there is no space after each '\'):
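The naming convention is what lets a single run-script template serve all members, since the member number can be recovered from the experiment name. Here is a short sketch, with the stem cca taken from the example above. It uses the portable ${exp_name#???} form, which drops the first three characters just like the bash-only ${exp_name:3} used in the run-script changes of this section.

```shell
# Build the member names of an ensemble from its 3-character stem
stem=cca
nb_member=5
members=""
k=1
while [ "$k" -le "$nb_member" ]
do
    # append a space-separated name: cca1 cca2 ...
    members="${members}${members:+ }${stem}${k}"
    k=$((k + 1))
done
echo "$members"

# Recover the member number from an experiment name by dropping the stem
exp_name=cca3
member=${exp_name#???}
echo "$member"
```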
ln -s \
    ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMSHECE3INIT \
    ICMSH${exp_name}INIT
with
# apply AMIP perturbation to 3D temperature
${ECE3_POSTPROC_TOPDIR}/reproducibility/perturb_ifs_ic.py -s t \
    ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMSHECE3INIT \
    ICMSH${exp_name}INIT
If you are using the initial conditions from 1950 provided by BSC as laid out in the next section, you should use (5 lines to change):
Index: ece-esm.sh.tmpl
===================================================================
--- ece-esm.sh.tmpl (revision 5836)
+++ ece-esm.sh.tmpl (working copy)
@@ -25,7 +25,7 @@
 # config="ifs nemo lim3 rnfmapper xios:detached oasis lpjg:fdbck" # "Veg" : GCM+LPJ-Guess
 # config="ifs nemo lim3 rnfmapper xios:detached oasis tm5:chem,o3,ch4,aero" # "AerChem" : GCM+TM5
-config="ifs nemo lim3 rnfmapper xios:detached oasis lpjg:fdbck tm5:co2"
+config="ifs amip"
 # minimum sanity
 has_config amip nemo && error "Cannot have both nemo and amip in config!!"
@@ -493,13 +493,15 @@
 # Initial data
 ln -s \
-    ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMGGECE3INIUA \
+    <full-path-to-your-ic-dir>/atmos/ICMGGa0raINIUA \
     ICMGG${exp_name}INIUA
-ln -s \
-    ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMSHECE3INIT \
+# apply AMIP perturbation to 3D temperature
+${ECE3_POSTPROC_TOPDIR}/reproducibility/perturb_ifs_ic.py -s t \
+    <full-path-to-your-ic-dir>/atmos/ICMSHa0raINIT \
     ICMSH${exp_name}INIT
+
 rm -f ICMGG${exp_name}INIT
-cp ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMGGECE3INIT \
+cp <full-path-to-your-ic-dir>/atmos/ICMGGa0raINIT \
     ICMGG${exp_name}INIT
 # add bare_soil_albedo to ICMGG*INIT
Then, using your favorite method, run 5 experiments with a name that ends with 1,…,5.
UPDATE There is an alternative location of the BSC-1950 archive at ECMWF:
ec:/nm6/EC-EARTH/ECEARTH3.2b/INPUT/ece-data-reproducibility.tar.gz
It contains perturbed initial conditions for AMIP runs, which can be used directly by replacing:
# Initial data
ln -s \
    ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMGGECE3INIUA \
    ICMGG${exp_name}INIUA
ln -s \
    ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMSHECE3INIT \
    ICMSH${exp_name}INIT
rm -f ICMGG${exp_name}INIT
cp ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMGGECE3INIT \
    ICMGG${exp_name}INIT
with (assuming you unpacked the data in your EC-Earth ini_data_dir):
ln -s ${ini_data_dir}/ic/atmos/ICMGGa0raINIUA ICMGG${exp_name}INIUA
ln -s ${ini_data_dir}/ic/atmos/0${exp_name:3}/ICMSHa${exp_name:3}raINIT ICMSH${exp_name}INIT
rm -f ICMGG${exp_name}INIT
cp ${ini_data_dir}/ic/atmos/ICMGGa0raINIT ICMGG${exp_name}INIT
This may be a solution if you cannot install the grib_api module for python.
A perturbation script is also available for ocean restarts but has not been tested yet. However, you can use perturbed ocean restarts prepared beforehand. For example, with the following 1950 initial conditions provided by BSC, which are available through ftp, see https://dev.ec-earth.org/issues/447#note-1, and look like this once unpacked:
ic
├── atmos
│   ├── ICMGGa0raINIT
│   ├── ICMGGa0raINIUA
│   └── ICMSHa0raINIT
├── ice
│   └── a0ra_fc0_19491231_restart_ice.nc
└── ocean
    ├── a0ra_fc0_19491231_restart.nc
    ├── a0ra_fc1_19491231_restart.nc
    ├── a0ra_fc2_19491231_restart.nc
    ├── a0ra_fc3_19491231_restart.nc
    └── a0ra_fc4_19491231_restart.nc
You just need to submit 5 runs that start from these different restarts. What follows are some tips to help you streamline the process. Start by reorganizing the initial conditions so you can use the same script template in all your runtime dirs. For example, you can:
cd ic/ocean/
mkdir 0{1..5}
for k in {1..5}; do cd 0$k; ln -s ../a0ra_fc$((k-1))_19491231_restart.nc restart_oce.nc ; cd - ; done
for k in {1..5}; do cd 0$k; ln -s ../../ice/a0ra_fc0_19491231_restart_ice.nc restart_ice.nc ; cd - ; done
which gives you:
[2041] >>> tree ic
ic
├── atmos
│   ├── ICMGGa0raINIT
│   ├── ICMGGa0raINIUA
│   └── ICMSHa0raINIT
├── ice
│   └── a0ra_fc0_19491231_restart_ice.nc
└── ocean
    ├── 01
    │   ├── restart_ice.nc -> ../../ice/a0ra_fc0_19491231_restart_ice.nc
    │   └── restart_oce.nc -> ../a0ra_fc0_19491231_restart.nc
    ├── 02
    │   ├── restart_ice.nc -> ../../ice/a0ra_fc0_19491231_restart_ice.nc
    │   └── restart_oce.nc -> ../a0ra_fc1_19491231_restart.nc
    ├── 03
    │   ├── restart_ice.nc -> ../../ice/a0ra_fc0_19491231_restart_ice.nc
    │   └── restart_oce.nc -> ../a0ra_fc2_19491231_restart.nc
    ├── 04
    │   ├── restart_ice.nc -> ../../ice/a0ra_fc0_19491231_restart_ice.nc
    │   └── restart_oce.nc -> ../a0ra_fc3_19491231_restart.nc
    ├── 05
    │   ├── restart_ice.nc -> ../../ice/a0ra_fc0_19491231_restart_ice.nc
    │   └── restart_oce.nc -> ../a0ra_fc4_19491231_restart.nc
    ├── a0ra_fc0_19491231_restart.nc
    ├── a0ra_fc1_19491231_restart.nc
    ├── a0ra_fc2_19491231_restart.nc
    ├── a0ra_fc3_19491231_restart.nc
    └── a0ra_fc4_19491231_restart.nc
UPDATE There is an alternative location of the archive at ECMWF:
ec:/nm6/EC-EARTH/ECEARTH3.2b/INPUT/ece-data-reproducibility.tar.gz
It already has these links in place and contains perturbed initial conditions for AMIP runs.
Then modify your ece-esm.sh.tmpl template script to account for that data tree as follows (just 5 lines to change):
Index: ece-esm.sh.tmpl
===================================================================
--- ece-esm.sh.tmpl (revision 5836)
+++ ece-esm.sh.tmpl (working copy)
@@ -25,7 +25,7 @@
 # config="ifs nemo lim3 rnfmapper xios:detached oasis lpjg:fdbck" # "Veg" : GCM+LPJ-Guess
 # config="ifs nemo lim3 rnfmapper xios:detached oasis tm5:chem,o3,ch4,aero" # "AerChem" : GCM+TM5
-config="ifs nemo lim3 rnfmapper xios:detached oasis lpjg:fdbck tm5:co2"
+config="ifs nemo:start_from_restart lim3 rnfmapper xios:detached oasis"
 # minimum sanity
 has_config amip nemo && error "Cannot have both nemo and amip in config!!"
@@ -215,7 +215,7 @@
 # This is only needed if the experiment is started from an existing set of NEMO
 # restart files
-nem_restart_file_path=${start_dir}/nemo-rst
+nem_restart_file_path="<full-path-to-your-ic-dir>/ocean/0${exp_name:3}"
 nem_restart_offset=0
@@ -493,13 +493,13 @@
 # Initial data
 ln -s \
-    ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMGGECE3INIUA \
+    <full-path-to-your-ic-dir>/atmos/ICMGGa0raINIUA \
     ICMGG${exp_name}INIUA
 ln -s \
-    ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMSHECE3INIT \
+    <full-path-to-your-ic-dir>/atmos/ICMSHa0raINIT \
     ICMSH${exp_name}INIT
 rm -f ICMGG${exp_name}INIT
-cp ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMGGECE3INIT \
+cp <full-path-to-your-ic-dir>/atmos/ICMGGa0raINIT \
     ICMGG${exp_name}INIT
 # add bare_soil_albedo to ICMGG*INIT
Then, using your favorite method, run 5 experiments with a name that ends with 1,…,5.
For each of your 5 experiments, you need to run hiresclim2 followed by EC-mean to get their resulting climatology and their Reichler-Kim performance indices. For example, assuming your experiment runs from 1990 to 2009:
# Get monthly means
cd ${ECE3_POSTPROC_TOPDIR}/script
for k in {1..5}; do ./hc.sh cca${k} 1990 2009 1990; done
# Once the hc.sh jobs are finished, get climatology and PI
for k in {1..5}; do ./ecm.sh cca${k} 1990 2009; done
Then you need to gather the PI results into a format suitable for the R scripts:
cd ${ECE3_POSTPROC_TOPDIR}/reproducibility/
./collect_ens.sh [-t] STEM NB_MEMBER YEAR1 YEAR2
The -t option lets you collect both the PI indices and the climatology from each run into a tar file in your $SCRATCH. This is *useful for sharing and then being able to compare with other ensemble results*.
If your run names and/or EC-mean output do not follow the default settings, you can still collect the data without too much work. Indeed, collect_ens.sh is essentially one line of code that is easy to hack and run at the command line or in an ad hoc script:
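# variables for which a performance index is collected
var2d="t2m msl qnet tp ewss nsss SST SSS SICE T U V Q"
for var in ${var2d}
do
    # replace your-list-of-run-names with the names of your runs
    for rname in your-list-of-run-names
    do
        # take the last line of the table that starts with the variable name,
        # and append its second column to the ensemble file of that variable
        # (path_to_rk_tables must point to your EC-mean tables)
        grep "^${var} " ${path_to_rk_tables}/PI2_RK08_${rname}_${year1}_${year2}.txt | \
            tail -1 | \
            awk '{print $2}' >> ${EnsembleName}_${year1}_${year2}_${var}.txt
    done
done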
Once you have two ensembles processed, you can compare them. The output of both ensembles, collected in the previous step, should be gathered in a DATADIR, where:
# For run ${nb} of ensemble ${stem}, climatological data are expected in:
$DATADIR/${stem}${nb}/post/clim-${year1}-${year2}/
# For one ensemble, ${stem}, tables are expected in:
$DATADIR/${stem}/
If you use the -t option to collect all these data in a tar file (see previous step), DATADIR is just the directory where you unpack the archive. If not, it should not be difficult to re-organize your output with a few mkdir and mv calls.
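Whichever way the data were gathered, the layout can be verified before running the comparison. The sketch below only tests for the directories listed above. The function name `check_ensemble_layout` is made up for this illustration.

```shell
# Check that DATADIR holds the directories expected for one ensemble.
# Arguments: DATADIR, stem, number of members, first year, last year
# Prints the missing directories and returns non-zero if any is missing.
# (check_ensemble_layout is a hypothetical helper, not provided by the package)
check_ensemble_layout() {
    datadir=$1; stem=$2; nb_member=$3; year1=$4; year2=$5
    status=0
    # directory with the tables of the ensemble
    [ -d "${datadir}/${stem}" ] || { echo "missing ${datadir}/${stem}"; status=1; }
    # one climatology directory per member
    nb=1
    while [ "$nb" -le "$nb_member" ]
    do
        clim="${datadir}/${stem}${nb}/post/clim-${year1}-${year2}"
        [ -d "$clim" ] || { echo "missing $clim"; status=1; }
        nb=$((nb + 1))
    done
    return $status
}
```

For example, `check_ensemble_layout $DATADIR cca 5 1990 2009` should print nothing when the ensemble of the previous step is in place.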
With the data in place, the statistics package can be run:
./compare.sh -d $DATADIR stem1 stem2 start_year end_year nb_member
A PDF file with all generated plots is created in DATADIR/plots. That default location can be overridden at the command line with the -p option.
ec:/nm6/EC-EARTH/ECEARTH3.2b/INPUT/ece-post-proc.tar.gz
- add platform templates in a conf/<your_platform_name> directory (adapt
existing ones to your job scheduler)
conf/<your-machine>/hc_<your-machine>.tmpl
conf/<your-machine>/header_<your-machine>.tmpl
The job scheduler command to submit a job is set in the configuration scripts.
- add a configuration script for each tool:
conf/<your-machine>/conf_hiresclim_<your-machine>.sh
conf/<your-machine>/conf_timeseries_<your-machine>.sh
conf/<your-machine>/conf_ecmean_<your-machine>.sh
conf/<your-machine>/conf_amwg_<your-machine>.sh
TODO: combine those into two config files: one USER oriented (i.e. anything that changes with the experiment to process), and one for the machine (i.e. setup that should not change with the experiment/user).
- You must install nco, netcdf, python, cdo, and cdftools if missing.
- For CDFTOOLS you cannot use the light one that ships with barakuda.
- If the netCDF4 python module is not available, you cannot build
the 3D relative humidity. Set in your
./conf/<your-machine>/conf_hiresclim_<your-machine>.sh:
rh_build=0
- Some EC-Earth experiments put the water flux output from NEMO in
the SBC files instead of the grid_T files. Then you need
export use_SBC=1
in your ./conf/<your-machine>/conf_hiresclim_<your-machine>.sh config.
This is needed only if the NEMO output files are written per process. In that case, you need to do something along these lines:
cd <EC-EARTH-DIR>/sources/nemo-3.6/TOOLS/REBUILD_NEMO/
<F90-COMPILER> rebuild_nemo.f90 -o ../rebuild_nemo.exe -I<PATH-TO-NETCDF-INSTALLATION>/include -L<PATH-TO-NETCDF-INSTALLATION>/lib -lnetcdf -lnetcdff
Copied from a suite of post-processing tools from Jost (it/ccjh) on Monday, March 27, 2017. This project is a quick attempt at cleaning up the tools suite and making it easier to port. Added and adapted (Jan 2018) the code for the reproducibility test developed by Martin Ménégoz and Francois Massonnet.
Modified to work with the default ecearth-3 output tree. Removed the possibility to run somebody else's code (just clone it!) but can still process output from another user.
Improved the performance of HIRESCLIM2 with parallelization over the years. Can process monthly legged runs. Catch all errors with “set -e” everywhere. Try to be smart in dealing with and cleaning up temporary dirs, by using mktemp, …