Installation notes
These are notes for installing DALES on different systems. (Work in progress; they may not be completely accurate, sufficient, or up to date. Please comment or fix if you find mistakes.)
General requirements for installing DALES (a quick way to check that these are available is sketched after the list):
- NetCDF
- MPI
- a Fortran compiler, e.g. gfortran
- cmake
- make
- doxygen, dot, latex (optional, for building the documentation)
- HYPRE library (optional, for iterative Poisson solver)
- FFTW3 library (optional, for higher performance)
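A minimal sketch for checking that the main tools are available (package names and module setup differ per system; the netCDF configuration script is only present once the netCDF development packages are installed):
gfortran --version
cmake --version
make --version
mpif90 --version   # shows which Fortran compiler the MPI wrapper uses
nf-config --all    # netCDF-Fortran configuration (library and include paths)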
On Fedora, or other systems using dnf, install the dependencies and build as follows:
sudo dnf install gcc-gfortran cmake netcdf-fortran-devel openmpi-devel
module load mpi/openmpi
git clone https://github.com/dalesteam/dales.git
cd dales
mkdir build
cd build
cmake ..
make
The command module load mpi/openmpi is also needed before launching DALES: it has to be run once in every new terminal (unless added to the shell initialization files).
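For example (a minimal sketch; the case directory and the namoptions file name are illustrative):
module load mpi/openmpi
cd ~/my-dales-case   # directory containing namoptions.001 and the other input files
mpirun -n 4 ~/dales/build/src/dales4 namoptions.001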
On macOS, use Homebrew (brew) to install the dependencies.
# first install homebrew
brew install gcc make cmake open-mpi netcdf git
# TODO - check which Fortran compiler mpif90 actually uses
git clone https://github.com/dalesteam/dales.git
cd dales
git checkout 4.3
mkdir build
export SYST=gnu-fast
cd build
cmake ..
make
The DALES executable is now at dales/build/src/dales4
Note that brew by default installs gfortran 10, which does not work directly with DALES versions older than 4.3 (one needs to add -std=legacy to the compiler options).
Other useful programs:
# ncview for viewing netCDF files
brew install ncview
On Ubuntu, install the dependencies with apt and build as follows:
sudo apt install git cmake gfortran netcdf-bin libnetcdf-dev libnetcdff-dev libopenmpi-dev libhypre-dev libfftw3-dev
# netcdf-bin is included for the ncdump tool, not directly needed
git clone https://github.com/dalesteam/dales.git
cd dales
mkdir build
export SYST=gnu-fast
cd build
cmake ..
make -j 4
DALES can be run on Windows in Ubuntu using WSL (Windows Subsystem for Linux): install Ubuntu in WSL, then follow the steps for Ubuntu above. Another route is to run DALES in Linux in a virtual machine; a VirtualBox image with DALES pre-installed is available.
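A minimal sketch of setting up Ubuntu in WSL (assuming a recent Windows 10 or 11 where the wsl command supports --install; run in an administrator PowerShell):
wsl --install -d Ubuntu
# after the first start of Ubuntu, continue with the Ubuntu steps above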
On a Raspberry Pi, install the dependencies with the following command, then compile as usual.
sudo aptitude install libnetcdf-dev mpich gfortran make cmake
Note: -march=native does not work on ARM with gfortran 4.9, which comes with the old Debian Jessie. In newer releases it is supposed to work (see the notes below).
Performance: 32 us / (grid point * time step) on a single core, 46 us / (grid point * time step) with 4 cores, on a Raspberry Pi 3B (benchmark: 60 s BOMEX, compiled with -O3, gfortran 4.9).
With -Ofast: 25 us / (grid point * time step).
With a newer Debian release (gcc/gfortran 8.3), install the dependencies and build as follows:
sudo apt install libnetcdf-dev libnetcdff-dev openmpi-bin gfortran make cmake
git clone https://github.com/dalesteam/dales.git
cd dales
mkdir build
export SYST=gnu-fast
cd build
cmake ..
make
Notes:
- with mpich instead of OpenMPI, DALES crashes in INIT_MPI.
- the gcc and gfortran shipped are version 8.3.0; -march=native and thus SYST=gnu-fast now work.
For Cartesius, see the system description and the batch usage instructions.
git clone https://github.com/dalesteam/dales
cd dales/
# git checkout to4.3_Fredrik
mkdir build
cd build
export SYST=gnu-fast
module load 2019
module load netCDF-Fortran/4.4.4-foss-2018b
module load CMake/3.12.1-GCCcore-7.3.0
module unload OpenMPI/3.1.1-GCC-7.3.0-2.30
module load OpenMPI/3.1.4-GCC-7.3.0-2.30
cmake ..
make VERBOSE=1 -j 4
The reason for replacing the default OpenMPI 3.1.1 with 3.1.4 is that 3.1.1 contains a bug which caused crashes on Lisa.
To compile with the optional HYPRE library, add/substitute the following:
module load Hypre/2.14.0-foss-2018b
cmake .. -DUSE_HYPRE=True -DHYPRE_LIB=/sw/arch/RedHatEnterpriseServer7/EB_production/2019/software/Hypre/2.14.0-foss-2018b/lib/libHYPRE.a
To build with the Intel compilers instead:
git clone https://github.com/dalesteam/dales
cd dales/
# git checkout to4.3_Fredrik
mkdir build
cd build
export SYST=lisa-intel
module load 2019
module load CMake
module load intel/2018b
module load netCDF-Fortran/4.4.4-intel-2018b
module load FFTW/3.3.8-intel-2018b # optional
module load Hypre/2.14.0-intel-2018b # optional
cmake ..
# todo: add optional FFTW and HYPRE flags
make VERBOSE=1 -j 4
An example job script:
#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -n 16 #total number of tasks, number of nodes calculated automatically
# Other useful SBATCH options
# #SBATCH -N 2 #number of nodes
# #SBATCH --ntasks-per-node=16
# #SBATCH --constraint=ivy # Runs only on Ivy Bridge nodes
# #SBATCH --constraint=haswell # Runs only on Haswell nodes (faster, AVX2)
module load 2019
module load netCDF-Fortran/4.4.4-foss-2018b
module load CMake/3.12.1-GCCcore-7.3.0
module unload OpenMPI/3.1.1-GCC-7.3.0-2.30
module load OpenMPI/3.1.4-GCC-7.3.0-2.30
# module load Hypre/2.14.0-foss-2018b
DALES=$HOME/dales/build/src/dales4
# cd somewhere - otherwise runs in same directory as submission
srun $DALES namoptions-hypre.001
Note that Cartesius contains both Haswell and Ivy Bridge nodes. Haswell nodes are faster and support AVX2 instructions. To get the full benefit of them, DALES should be compiled with AVX2 support, but the executable is then incompatible with the older node type, so request the Haswell node type in the job script. For consistent benchmarking, one should always request a specific node type in the job script.
For Lisa, see the user's guide.
Warning: don't use the 2018b module set; the OpenMPI version 3.1.1 included there has been found to cause crashes (see Quirks). The module set below works.
# -- load modules both for compilation and run script --
module load pre2019
module load foss/2017b
module load netCDF-Fortran/4.4.4-foss-2017b
module load cmake
# -- compile --
export SYST=gnu-fast
# enable aggressive optimization flags, to be added in Dales 4.2
cd dales
mkdir build
cd build
cmake ..
make
An example PBS job script:
#PBS -lnodes=2:ppn=16:cpu3
#PBS -lwalltime=2:00:00
module load eb
module load foss/2017b
module load netcdf/gnu/4.2.1-gf4.7a
# Path to the Dales program
DALES=~/dales/build2/src/dales.exe
EXPERIMENT=~/your_case_directory/
cd $EXPERIMENT
mpiexec $DALES namoptions.001
cpu3 in the job script specifies a particular CPU type (see the job requirements documentation). If omitted, the job may run on any available CPU type, which can confuse benchmarking by influencing the performance.
ppn is the number of processes per node. The Lisa nodes have 8 cores; with hyperthreading, 16 processes per node will fit. Hyperthreading seems beneficial for DALES.
mpiexec by default launches as many MPI tasks as there are slots available. Note that the number of tasks should be compatible with nprocx and nprocy in the namelist (specify 0 to determine them automatically). Also, itot must be divisible by nprocx, and jtot by nprocy.
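For example (a sketch; the namelist sections shown are those used in recent DALES namoptions files, and the values are illustrative), 16 MPI tasks could be split as a 4 x 4 decomposition:
&RUN
nprocx = 4   ! set both to 0 to let DALES determine the decomposition
nprocy = 4
/
&DOMAIN
itot = 128   ! divisible by nprocx
jtot = 128   ! divisible by nprocy
/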
These steps were tested on the TEMS test system in May 2021, using the git branches v4.3 (the current default branch) and to4.4_Fredrik.
module load prgenv/gnu
module load openmpi
module load cmake/3.19.5
module load netcdf4/4.7.4
module load fftw/3.3.9
export SYST=gnu-fast
git clone https://github.com/dalesteam/dales.git
cd dales
mkdir build
cd build
cmake .. # -DUSE_FFTW=True
make -j 4
Note: with the optional -DUSE_FFTW=True, FFTW is not found automatically. Edit the CMakeLists.txt or set environment variables; the lib and include paths can be found with module show fftw.
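One possible approach (a hedged sketch; whether it is picked up depends on how the FFTW lookup is done in CMakeLists.txt, and the path below is a placeholder to be replaced with the one reported by module show fftw):
export CMAKE_PREFIX_PATH=/usr/local/apps/fftw/3.3.9:$CMAKE_PREFIX_PATH   # placeholder path
cmake .. -DUSE_FFTW=True
make -j 4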
Sample job script. It starts DALES in the directory where the job was submitted; to run somewhere else, use the --chdir= option or add a cd command in the script.
#!/bin/bash
#SBATCH --job-name=dales
#SBATCH --qos=np
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --time=24:0:0
# other SBATCH options :
# --output=test-mpi.%j.out
# --error=test-mpi.%j.out
# --chdir=/scratch...
# --mem-per-cpu=100
# --account=<PROJECT-ID>
# modules here should match what was used during compile
module load prgenv/gnu
module load openmpi
module load cmake/3.19.5
module load netcdf4/4.7.4
module load fftw/3.3.9
NAMOPTIONS=namoptions.001
DALES=$HOME/dales/build/src/dales4
CASE=`pwd`
echo DALES $DALES
echo CASE $CASE
echo hostname `hostname`
# optionally edit nprocx, nprocy in namelist
#NX=8
#NY=16
#sed -i -r "s/nprocx.*=.*/nprocx = $NX/;s/nprocy.*=.*/nprocy = $NY/" $NAMOPTIONS
srun $DALES $NAMOPTIONS | tee output.txt
To compile with the Intel compiler instead, load the following modules:
module load prgenv/intel
module load intel-mpi
module load cmake/3.19.5
module load netcdf4/4.7.4
module load fftw/3.3.9
export SYST=lisa-intel
Quick single-node benchmarking shows GNU Fortran being about 13% faster than the Intel compiler; GNU 8.3 (the default) and 10.2 (the newest) perform very similarly.
Log in to cca (see the documentation). Note that the Fortran compiler on this machine is called ftn.
Here is an example of how to compile DALES with the Intel compiler. Make sure that the following lines (or something similar, depending on your own preferences) are part of your CMakeLists.txt file:
elseif("$ENV{SYST}" STREQUAL "ECMWF-intel")
set(CMAKE_Fortran_COMPILER "ftn")
set(CMAKE_Fortran_FLAGS "-r8 -ftz -extend_source" CACHE STRING "")
set(CMAKE_Fortran_FLAGS_RELEASE "-g -traceback -Ofast -xHost" CACHE STRING "")
set(CMAKE_Fortran_FLAGS_DEBUG "-traceback -fpe1 -O0 -g -check all" CACHE STRING "")
For compiling, set the system variable by typing
export SYST=ECMWF-intel
and load the right modules:
prgenvswitchto intel
module load netcdf4/4.4.1
Then proceed as usual (cmake & make).
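For example (a minimal sketch of these usual steps, assuming a fresh checkout of the repository):
git clone https://github.com/dalesteam/dales.git
cd dales
mkdir build
cd build
cmake ..
make -j 4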
Here is an overview of some very simple and very limited scaling tests on that machine, mostly to demonstrate the effect of spreading your job over several nodes and of using hyperthreading (the latter seems to be highly case-dependent, though). The test was done with a cumulus convection case with 36x144x296 grid points on a 3.6x14.4x17.9 km^3 domain, run for 4 hours with quite a few statistics etc. turned on.
- 1 node, hyperthreading on (i.e. 72 tasks per node): 11226 s
- 1 node, hyperthreading off (i.e. 36 tasks per node): 7079 s
- 2 nodes, hyperthreading on (i.e. 72 tasks per node): 8822 s
- 2 nodes, hyperthreading off (i.e. 36 tasks per node): 5370 s
Take-away message: Hyperthreading increases (!) run time by about 60 percent (in this case!) and scaling is clearly not linear when you use more than one node (i.e. when the program has to communicate over the network).
Jobs are scheduled using PBS. Here is an example job script:
#!/bin/ksh
#PBS -q np # <-- queue for parallel runs (alternatively use ns or nf)
#PBS -N jobname
#PBS -l EC_nodes=2 # <-- number of nodes (each has 36 CPUs)
#PBS -l EC_tasks_per_node=36 # <-- use the full node
#PBS -l EC_hyperthreads=1 # <-- hyperthreading (1: off, 2: on)
#PBS -l walltime=48:00:00 # <-- maximum of 48 h wall clock time per job
#PBS -m abe # <-- email notification on abort/begin/end
#PBS -M johndoe@email.com # <-- your email address
# load the same modules as during compilation
prgenvswitchto intel
module load netcdf4/4.4.1
cd /path/to/your/work/directory
aprun -N $EC_tasks_per_node -n $EC_total_tasks -j $EC_hyperthreads dales
Since the machine only allows for jobs of maximum 48 h wall clock time, you might have to re-submit your simulations several times (warm start) to get to the desired simulation time. There are basically two approaches to do this somewhat automatically (they both have pros and cons):
- Find a nice length of simulation that can be finished in, say, 1 day to leave a generous margin, and then schedule several of these jobs in sequence using
qsub -W depend=afterok:<PREVIOUS_JOBID> jobfile
This will start the following job once the previous one has finished successfully. (Don't forget to set lwarmstart, startfile and runtime correctly in the namoptions file; a filled-in example is shown at the end of this section.) This method has the advantage that it does not waste any computation time.
- Alternatively, let the simulation run as far as it gets within the 48 h wall time (and save init files very regularly) and submit a job that automatically figures out how to do the warm start. This method has the advantage that it minimises the number of output files and jobs that you have to run. For this, submit the following job with
qsub -W depend=afternotok:<FIRST_JOBID> jobfile
This will start the following job once the previous one has finished with a non-zero exit code (most likely because it ran out of time). Add these lines of code to your job file to automatically do the warm start based on the latest init files that DALES has created, and to adjust the run time in the namoptions accordingly:
Exp_dir=/path/to/your/work/directory   # <-- this is where you run the next 48 h
Warm_dir=/path/to/your/init/directory  # <-- this is where your init files are
cd $Exp_dir
# find out how many hours are completed
strlength=$(ls $Warm_dir/initd0* | tail -1 | wc -c)
cutstart=$((strlength-18))
cutend=$((cutstart+1))
hrsdone=$(ls $Warm_dir/initd0* | tail -1 | cut -c $cutstart-$cutend)
cutstart=$((cutstart+3))
cutend=$((cutend+3))
mindone=$(ls $Warm_dir/initd0* | tail -1 | cut -c $cutstart-$cutend)
# copy the init files to the work directory
cp $Warm_dir/init[sd]0${hrsdone}h${mindone}m* $Exp_dir/.
# adjust the namoptions file
cp $Exp_dir/namoptions.original $Exp_dir/namoptions
hrsdone=$(echo $hrsdone | sed 's/^0*//')   # remove leading 0s
mindone=$(echo $mindone | sed 's/^0*//')
secdone=$((hrsdone*3600+mindone*60))
sectodo=$((172800-secdone))   # <-- adjust your simulation time here (2 days here)
startfname=$(ls $Exp_dir | head -1)
sed -i "s/^startfile.*/startfile = '${startfname}'/" $Exp_dir/namoptions
sed -i "s/^runtime.*/runtime = ${sectodo}/" $Exp_dir/namoptions
# then continue with the usual stuff
Note that the directory needs to contain a namoptions.original file (basically a copy of the one from the previous simulation) in which lwarmstart is set to true and the lines for the startfile and runtime are present but empty, e.g.:
&RUN
iexpnr = 002
lwarmstart = .true.
startfile =
runtime =
/
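For reference, a hedged example of what the &RUN section looks like after the warm-start preparation above has filled it in (the startfile name is hypothetical and must match an init file actually produced by the previous run; runtime is the remaining simulation time in seconds):
&RUN
iexpnr = 002
lwarmstart = .true.
startfile = 'initd024h00mx000y000.002'   ! hypothetical init file name
runtime = 86400
/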
Tested 6.2.2020. On Eagle, compilation must be done in an interactive job, since the module system does not work on the login node. The module netcdf/4.4.1.1_impi-5.0.3_icc-15.0.3 provides a consistent set of netCDF with Fortran bindings. A more recent cmake than the default is also required.
# start interactive job
srun --pty -N1 --ntasks-per-node=1 -p fast -t 60 /bin/bash
git clone https://github.com/dalesteam/dales
cd dales
# git checkout to4.3_Fredrik # optionally check out another branch
mkdir build
cd build
export SYST=lisa-intel
export MODULEPATH=/home/plgrid-groups/plggvecma/.qcg-modules:$MODULEPATH
module load vecma/common/cmake
module load netcdf/4.4.1.1_impi-5.0.3_icc-15.0.3
cmake ..
make -j 4
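To run the compiled DALES on Eagle, a hedged sketch (partition, task count, time limit and case directory are all illustrative, and the same modules should be loaded as during compilation):
cd ~/my-dales-case   # directory containing namoptions.001 and the other input files
srun -N1 --ntasks-per-node=4 -p fast -t 60 ~/dales/build/src/dales4 namoptions.001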
Tested in May 2020.
git clone https://github.com/dalesteam/dales
module load hdf5/gcc5/1.10.1
module load netcdf/gcc5
module load netcdf-fortran/gcc5/4.4.4
module load openmpi/gcc5/3.0.0
export SYST=gnu-fast
cd dales
# git checkout to4.3 # optionally check out a branch
mkdir build
cd build
cmake ..
make
# load the same modules
# cd to the case directory
~/dales/build/src/dales4 namoptions.001