To move output, restart and post-processed (hiresclim2, ECmean climatology) data from an EC-EARTH experiment to the ECFS tape archive. Output and restart data are moved and not just copied, since one of the goals is to free space - particularly with long and/or high resolution runs.
Utilities to retrieve and/or clean the ECFS archive, and a couple of simple manipulations of the raw output are also available.
You must set the correct top directory of your EC-Earth3 experiments and the ECFS path where to move the data in the config.cfg file. This setup is used for all the tools.
The backup_ece3.sh script should be submitted with sbatch (or srun). It takes two arguments: the 4-letter experiment id and the leg number to backup.
An example call for moving output/restart from the 3rd leg of the aGez experiment:
sbatch backup_ece3.sh aGez 3
Note that the restarts written at the end of the leg are backed up. Be sure that they are not needed anymore before submitting the backup script!
You can process several legs at once with a simple loop. See examples by calling the script without any argument:
./backup_ece3.sh
Few options are still hardcoded in the script. See the hardcoded options section at the top. By default all components output (IFS, NEMO, TM5) and restart (IFS, NEMO, OASIS) are put on tape if they exist. You probably do not need to change this, but you can skip some output or restart with the switches in the hardcoded options section. Note the special case of leg 0, which makes a tarball of several TM5 restarts and model logs over all legs.
The following SLURM options can be useful. You set them either when calling sbatch or add them to the backup script with a #SBATCH directive:
The default setup in the script is bckp.%j.out where %j will be replaced by the jobid. This can be overwritten at the command line. For example:
sbatch --output=bckp-aGez-003.out backup_ece3.sh aGez 3
If a directory is specified, be sure that it exists before submitting.
Your default ECMWF account (namely $EC_billing_account) is used, unless you specify another one at the command line. For example:
sbatch --account=spnldrij backup_ece3.sh aGez 3
The script sets a walltime of 6h (currently the default for the nf queue). You can overwrite it with the –time option:
sbatch --time=00:30:00 backup_ece3.sh aGez 3
By using a small value you can reduce queue time. Here are some estimates ([TODO] update for the new hardware in Bologna), but keep in mind that ECFS can be very slow at times.
- 3hr is enough for standard resolution
- 12hr is needed for the high resolution
- 5hr for high resolution
You can set the name of your job with –job-name=name. If waiting for another job to finish successfully before submitting the backup script, you can set a conditional dependency with –depend=afterok:jobid.
Output data are zipped (except ICMSH), restarts and postprocessed data are tar and zipped. They all end up in the archive (set in config.cfg):
ec:/${USER}/ECEARTH-RUNS/${exp}
Note that large files (>137G) are split into two or more pieces.
The retrieve-ece3.sh script let you retrieve restart and/or output files for a particular leg. They are put back into their original location (directories created if needed), allowing for rerun for example. The script works in a very similar was as backup_ece3.sh, except you have to specify which model you are interested in. For example:
sbatch -o "ifs nemo" -r oasis retrieve-ece3.sh aGez 3
will retrieve output from IFS and NEMO, and OASIS restarts, all from the 3rd leg of the aGez experiment.
You should not submit a job for the last two terminated legs if the experiment is still running, or obviously if you want to process your data. Remember that output of leg N is complete only when leg N+1 is finished unless it is the last leg. Remember also that ece2cmor3 requires data from the first leg for all of its IFS processing.
There are a few other scripts that can be useful. They all have few hardcoded settings at the top (typically the top directory of your EC-Earth experiments):
- extract_var.sh: easily extract a variable from the IFS raw output
- check_bckp.sh : list the dirs that are not empty, and give their size.
- split_2y.sh : to copy data from an experiment with 2-year legs to a new one with 1-year legs, and a new name if needed.
- rebuild_bckp.sh : to rebuild large files that have been split into 2 or more pieces when being backed up. Useful when retrieving files from tapes for additional work.