
These are notes for installing DALES on different systems. (Work in progress; they may not be completely accurate, sufficient, or up to date. Please comment or fix if you find mistakes.)

Requirements

General requirements for installing DALES:

  • NetCDF
  • MPI
  • a Fortran compiler, e.g. gfortran
  • cmake
  • make
  • doxygen, dot, latex (optional, for building the documentation)
  • HYPRE library (optional, for iterative Poisson solver)
  • FFTW3 library (optional, for higher performance)

Specific instructions per operating system

Fedora

sudo dnf install gcc-gfortran cmake netcdf-fortran-devel openmpi-devel
module load mpi/openmpi

git clone https://github.com/dalesteam/dales.git
cd dales
mkdir build
cd build
cmake ..
make

The command module load mpi/openmpi is also needed before launching DALES; it has to be run once in every new terminal, unless it is added to the shell initialization files.
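To avoid retyping it, the module load command can be appended to the shell startup file; a minimal sketch, assuming bash (the run command is only an example, the executable name and task count depend on your DALES version and case):

# make OpenMPI available in every new shell
echo 'module load mpi/openmpi' >> ~/.bashrc

# example run from a case directory (path and task count are illustrative)
mpirun -np 4 ~/dales/build/src/dales4 namoptions.001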

MacOS

Use brew to install the dependencies.

# first install homebrew

brew install gcc make cmake open-mpi netcdf git 
# TODO - check which Fortran compiler mpif90 actually uses

git clone https://github.com/dalesteam/dales.git
cd dales
git checkout 4.3
mkdir build
export SYST=gnu-fast
cd build
cmake ..
make

The DALES executable is now at dales/build/src/dales4. Note that brew by default installs gfortran 10, which does not work directly with DALES versions older than 4.3: one needs to add -std=legacy to the compiler options, as sketched below.
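For those older versions, one way to add the flag is to edit CMakeLists.txt and append -std=legacy to the Fortran flags of the SYST entry in use; a rough sketch (the existing flags differ per DALES version, so adapt as needed):

# from the top-level dales directory: in CMakeLists.txt, append -std=legacy to the
# Fortran flags of the SYST block you compile with, e.g.
#   set(CMAKE_Fortran_FLAGS "<existing flags> -std=legacy" CACHE STRING "")
# then rebuild from a clean build directory so the new flag is picked up
rm -rf build && mkdir build && cd build
cmake .. && make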

Other useful programs:

# ncview for viewing netCDF files
# cdo for processing netCDF data e.g. merging the 2D and 3D output files 
brew install cdo ncview   

Ubuntu (also Ubuntu running in WSL on Windows)

sudo apt install git cmake gfortran netcdf-bin libnetcdf-dev libnetcdff-dev libopenmpi-dev libhypre-dev libfftw3-dev
# netcdf-bin is included for the ncdump tool, not directly needed

git clone https://github.com/dalesteam/dales.git
cd dales
mkdir build
export SYST=gnu-fast
cd build
cmake ..
make -j 4

Windows

DALES can be run in Ubuntu using WSL (Windows Subsystem for Linux): follow the Install Ubuntu tutorial, then follow the steps for Ubuntu above. Another route is to run DALES in Linux in a virtual machine; a VirtualBox image with DALES pre-installed is available here.
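As a rough sketch, on a recent Windows 10 or 11 build WSL and Ubuntu can be installed from an administrator PowerShell (details vary with Windows version):

# installs WSL and the Ubuntu distribution; may require a reboot
wsl --install -d Ubuntu
# afterwards, open the Ubuntu terminal and follow the Ubuntu steps above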

Raspberry Pi 3B / Raspbian

Install dependencies with the following command, then compile as usual.

sudo aptitude install libnetcdf-dev mpich gfortran make cmake

Note: -march=native does not work on ARM with gfortran 4.9, which ships with the old Debian Jessie. In newer releases it is supposed to work.

Performance on a Raspberry Pi 3B (60 s of the BOMEX case, compiled with -O3, gfortran 4.9): 32 us / (grid point * time step) on a single core, 46 us / (grid point * time step) with 4 cores.

With -Ofast: 25 us / (grid point * time step).

Raspberry Pi 4B / Raspbian 10 (Buster)

sudo apt install libnetcdf-dev libnetcdff-dev openmpi-bin gfortran make cmake
git clone https://github.com/dalesteam/dales.git
cd dales
mkdir build
export SYST=gnu-fast
cd build
cmake ..
make

Notes:

  • with mpich instead of OpenMPI, DALES crashes in INIT_MPI.
  • the gcc and gfortran shipped are version 8.3.0; -march=native, and thus SYST=gnu-fast, now works.

Specific machines / clusters

Snellius

(Work in progress by FJ. Compiles OK; running a job has not been tested.)

module load 2021
module load foss/2021a
# module load FFTW/3.3.9-gompi-2021a # included in foss
module load netCDF-Fortran/4.5.3-gompi-2021a
module load CMake/3.20.1-GCCcore-10.3.0
# module load Hypre/2.21.0-foss-2021a # optional

mkdir build
cd build
export SYST=gnu-fast
cmake .. -DUSE_FFTW=True
make -j 8

job script:

#!/bin/bash                                                                                                                              
# uses 1/4 of a node                                                                                                                     
#SBATCH --nodes=1                                                                                                                        
#SBATCH --ntasks=32                                                                                                                      
#SBATCH --partition=thin                                                                                                                 
#SBATCH --time=01:00:00                                                                                                                  

# Other useful SBATCH options                                                                                                            
# #SBATCH --ntasks-per-node=16                                                                                                           

module load 2021
module load foss/2021a
module load netCDF-Fortran/4.5.3-gompi-2021a
module load CMake/3.20.1-GCCcore-10.3.0
# module load Hypre/2.21.0-foss-2021a # optional                                                                                         

NAMOPTIONS=namoptions-144.001
DALES=/home/janssonf/snellius/build/src/dales4.3
CASE=`pwd`
WORK=run-1

mkdir -p $WORK
cd $WORK
cp $CASE/{lscale.inp.001,$NAMOPTIONS,prof.inp.001,scalar.inp.001} ./

echo DALES $DALES
echo CASE $CASE
echo WORK $WORK
echo hostname `hostname`

srun $DALES $NAMOPTIONS | tee output.txt

Cartesius

See the Cartesius description and the Batch usage instructions.

Compilation - GNU Fortran

git clone https://github.com/dalesteam/dales
cd dales/
# git checkout to4.3_Fredrik

mkdir build
cd build

export SYST=gnu-fast
module load 2019
module load netCDF-Fortran/4.4.4-foss-2018b
module load CMake/3.12.1-GCCcore-7.3.0
module unload OpenMPI/3.1.1-GCC-7.3.0-2.30
module load OpenMPI/3.1.4-GCC-7.3.0-2.30

cmake ..

make VERBOSE=1 -j 4

The reason for replacing the default OpenMPI 3.1.1 with 3.1.4 is that 3.1.1 contains a bug which caused crashes on Lisa.

To compile with the optional HYPRE library, add/substitute the following:

module load Hypre/2.14.0-foss-2018b
cmake .. -DUSE_HYPRE=True -DHYPRE_LIB=/sw/arch/RedHatEnterpriseServer7/EB_production/2019/software/Hypre/2.14.0-foss-2018b/lib/libHYPRE.a

Compilation - Intel Fortran

git clone https://github.com/dalesteam/dales
cd dales/
# git checkout to4.3_Fredrik

mkdir build
cd build

export SYST=lisa-intel

module load 2019
module load CMake
module load intel/2018b
module load netCDF-Fortran/4.4.4-intel-2018b
module load FFTW/3.3.8-intel-2018b    # optional
module load Hypre/2.14.0-intel-2018b  # optional


cmake ..
# todo: add optional FFTW and HYPRE flags

make VERBOSE=1 -j 4
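As an untested sketch, the optional libraries could be enabled with the same flags as in the GNU build above, with the library path taken from module show Hypre (the path below is a placeholder):

# untested sketch: optional FFTW and HYPRE, flags as in the GNU build above
cmake .. -DUSE_FFTW=True \
         -DUSE_HYPRE=True -DHYPRE_LIB=<path to libHYPRE.a, see module show Hypre>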

Job script

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -n 16  #total number of tasks, number of nodes calculated automatically 

# Other useful SBATCH options
# #SBATCH -N 2  #number of nodes 
# #SBATCH --ntasks-per-node=16
# #SBATCH --constraint=ivy # Runs only on Ivy Bridge nodes
# #SBATCH --constraint=haswell # Runs only on Haswell nodes (faster, AVX2)

module load 2019
module load netCDF-Fortran/4.4.4-foss-2018b
module load CMake/3.12.1-GCCcore-7.3.0
module unload OpenMPI/3.1.1-GCC-7.3.0-2.30
module load OpenMPI/3.1.4-GCC-7.3.0-2.30
# module load Hypre/2.14.0-foss-2018b

DALES=$HOME/dales/build/src/dales4

# cd somewhere - otherwise runs in same directory as submission
srun $DALES namoptions-hypre.001

Tuning

Note that Cartesius contains both Haswell and Ivy Bridge nodes. The Haswell nodes are faster and support AVX2 instructions. To get their full benefit, DALES should be compiled with AVX2 support, but the executable is then incompatible with the older node type, so the Haswell node type must be requested in the job script. For consistent benchmarking, one should always request a specific node type.

Lisa

See the users guide.

Warning: do not use the 2018b module set; the OpenMPI 3.1.1 it includes has been found to cause crashes (see Quirks). The module set below works.

Compilation

# -- load modules both for compilation and run script --
module load pre2019
module load foss/2017b
module load netCDF-Fortran/4.4.4-foss-2017b
module load cmake


# -- compile --
# gnu-fast enables aggressive optimization flags (introduced with DALES 4.2)
export SYST=gnu-fast

cd dales
mkdir build
cd build
cmake ..
make

Job script

#PBS -lnodes=2:ppn=16:cpu3
#PBS -lwalltime=2:00:00

module load eb
module load foss/2017b
module load netcdf/gnu/4.2.1-gf4.7a

# Path to the Dales program
DALES=~/dales/build2/src/dales.exe
EXPERIMENT=~/your_case_directory/

cd $EXPERIMENT
mpiexec $DALES namoptions.001

cpu3 in the job script specifies a particular CPU type, see job requirements. If omitted, the job may run on any available CPU type, which makes benchmark results hard to compare.

ppn is the number of processes per node. The Lisa nodes have 8 cores; with hyperthreading, 16 processes per node fit. Hyperthreading seems beneficial for DALES.

mpiexec by default launches as many MPI tasks as there are slots available. Note that the number of tasks must be compatible with nprocx and nprocy in the namelist: their product should equal the number of MPI tasks (specify 0 to determine them automatically). Also, itot must be divisible by nprocx, and jtot by nprocy.
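As a concrete illustration (example numbers only), with 2 nodes and 16 processes per node there are 32 MPI tasks, which could be decomposed as nprocx = 4, nprocy = 8, provided itot is divisible by 4 and jtot by 8. The namelist can be edited by hand, or with a sed one-liner like the one used in the ECMWF job script further below:

# example only: set the decomposition for 32 tasks (4 x 8); use 0 to let DALES choose
sed -i -r "s/nprocx.*=.*/nprocx = 4/;s/nprocy.*=.*/nprocy = 8/" namoptions.001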

ECMWF Atos system 2021-

These steps were tested on the TEMS test system in May 2021 using the git branches v4.3 (current default branch) and to4.4_Fredrik.

module load prgenv/gnu
module load openmpi
module load cmake/3.19.5
module load netcdf4/4.7.4
module load fftw/3.3.9

export SYST=gnu-fast

git clone https://github.com/dalesteam/dales.git
cd dales
mkdir build
cd build
cmake ..  # -DUSE_FFTW=True
make -j 4

Note: with the optional -DUSE_FFTW=True, FFTW is not found automatically. Edit CMakeLists.txt or set environment variables; the lib and include paths can be found with module show fftw.
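As one possible workaround (a sketch, not verified: whether it is picked up depends on how CMakeLists.txt locates FFTW), the FFTW prefix reported by module show can be passed via the standard CMAKE_PREFIX_PATH variable:

module show fftw/3.3.9      # note the installation prefix in the output
cmake .. -DUSE_FFTW=True -DCMAKE_PREFIX_PATH=/path/to/fftw/3.3.9   # prefix is a placeholder
make -j 4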

Sample job script. It starts DALES in the directory where the job was submitted; to run somewhere else, use the --chdir= option or add a cd command to the script.

#!/bin/bash
#SBATCH --job-name=dales
#SBATCH --qos=np
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --time=24:0:0

# other SBATCH options :
#  --output=test-mpi.%j.out
#  --error=test-mpi.%j.out
#  --chdir=/scratch...
#  --mem-per-cpu=100
#  --account=<PROJECT-ID>

# modules here should match what was used during compile
module load prgenv/gnu
module load openmpi
module load cmake/3.19.5
module load netcdf4/4.7.4
module load fftw/3.3.9

NAMOPTIONS=namoptions.001
DALES=$HOME/dales/build/src/dales4
CASE=`pwd`

echo DALES $DALES
echo CASE $CASE
echo hostname `hostname`

# optionally edit nprocx, nprocy in namelist
#NX=8
#NY=16
#sed -i -r "s/nprocx.*=.*/nprocx = $NX/;s/nprocy.*=.*/nprocy = $NY/" $NAMOPTIONS

srun $DALES $NAMOPTIONS | tee output.txt

Intel compiler on TEMS

module load prgenv/intel
module load intel-mpi
module load cmake/3.19.5
module load netcdf4/4.7.4
module load fftw/3.3.9

export SYST=lisa-intel

Quick single-node benchmarking shows GNU Fortran being about 13% faster than Intel. GNU 8.3 (the default) and 10.2 (the newest) perform very similarly.

ECMWF Cray

Log in to cca (see the documentation).

Compilation

Note that the Fortran compiler on this machine is called ftn.

Here is an example of how to compile DALES with the Intel compiler. Make sure that the following lines (or something similar, depending on your own preferences) are part of your CMakeLists.txt file:

elseif("$ENV{SYST}" STREQUAL "ECMWF-intel")
 set(CMAKE_Fortran_COMPILER "ftn")
 set(CMAKE_Fortran_FLAGS "-r8 -ftz -extend_source" CACHE STRING "")
 set(CMAKE_Fortran_FLAGS_RELEASE "-g -traceback -Ofast -xHost" CACHE STRING "")
 set(CMAKE_Fortran_FLAGS_DEBUG "-traceback -fpe1 -O0 -g -check all" CACHE STRING "")

For compiling, set the system variable by typing

export SYST=ECMWF-intel

and load the right modules

prgenvswitchto intel
module load netcdf4/4.4.1

Then proceed as usual (cmake & make).

Scaling

Here is an overview of some very simple and limited scaling tests on this machine, mostly to demonstrate the effect of spreading a job over several nodes and of using hyperthreading (the latter seems to be highly case-dependent). The test used a cumulus convection case with 36x144x296 grid points on a 3.6x14.4x17.9 km^3 domain, run for 4 simulated hours with quite a few statistics turned on.

  • 1 node, hyperthreading on (i.e. 72 tasks per node): 11226 s
  • 1 node, hyperthreading off (i.e. 36 tasks per node): 7079 s
  • 2 nodes, hyperthreading on (i.e. 72 tasks per node): 8822 s
  • 2 nodes, hyperthreading off (i.e. 36 tasks per node): 5370 s

Take-away message: Hyperthreading increases (!) run time by about 60 percent (in this case!) and scaling is clearly not linear when you use more than one node (i.e. when the program has to communicate over the network).

Job script

Jobs are scheduled using PBS. Here is an example job script:

#!/bin/ksh

#PBS -q np			# <-- queue for parallel runs (alternatively use ns or nf)
#PBS -N jobname
#PBS -l EC_nodes=2		# <-- number of nodes (each has 36 CPUs)
#PBS -l EC_tasks_per_node=36	# <-- use the full node
#PBS -l EC_hyperthreads=1	# <-- hyperthreading (1: off, 2: on)
#PBS -l walltime=48:00:00	# <-- maximum of 48 h wall clock time per job
#PBS -m abe			# <-- email notification on abortion/start/end
#PBS -M johndoe@email.com	# <-- your email address

# load the same modules as during compilation
prgenvswitchto intel
module load netcdf4/4.4.1

cd /path/to/your/work/directory

aprun -N $EC_tasks_per_node -n $EC_total_tasks -j $EC_hyperthreads dales

Warm starts after 48 h

Since the machine only allows jobs of at most 48 h wall-clock time, you might have to re-submit your simulation several times (warm starts) to reach the desired simulation time. There are basically two approaches to do this somewhat automatically (both have pros and cons):

  1. Choose a simulation length that can be finished in, say, one day (leaving a generous margin), and then schedule several of these jobs in sequence using

    qsub -W depend=afterok:<PREVIOUS_JOBID> jobfile
    

    This will start the next job once the previous one has finished successfully. (Don't forget to set lwarmstart, startfile and runtime correctly in the namoptions file!) This method has the advantage that it does not waste any computation time.

  2. Alternatively, let the simulation run as far as it gets within 48 h wall time (and save init files very regularly) and submit a job that automatically figures out how to do the warm start. This method has the advantage that it minimises the number of output files and jobs that you have to run. For this, submit the following job with

    qsub -W depend=afternotok:<FIRST_JOBID> jobfile
    

    This will start the next job once the previous one has finished with a non-zero exit code (most likely because it ran out of wall-clock time). Add these lines to your job file to automatically do the warm start based on the latest init files that DALES has created, and to adjust the run time in the namoptions accordingly:

    Exp_dir=/path/to/your/work/directory	# <-- this is where you run the next 48 h
    Warm_dir=/path/to/your/init/directory	# <-- this is where your init files are
    
    cd $Exp_dir
    
    # find out how many hours are completed
    strlength=$(ls $Warm_dir/initd0* | tail -1 | wc -c)
    cutstart=$((strlength-18))
    cutend=$((cutstart+1))
    hrsdone=$(ls $Warm_dir/initd0* | tail -1 | cut -c $cutstart-$cutend)
    cutstart=$((cutstart+3))
    cutend=$((cutend+3))
    mindone=$(ls $Warm_dir/initd0* | tail -1 | cut -c $cutstart-$cutend)
    
    # copy the init files to the work directory
    cp $Warm_dir/init[sd]0${hrsdone}h${mindone}m* $Exp_dir/.
    
    # adjust the namoptions file
    cp $Exp_dir/namoptions.original $Exp_dir/namoptions
    hrsdone=$(echo $hrsdone | sed 's/^0*//')  # remove leading 0s
    mindone=$(echo $mindone | sed 's/^0*//')
    secdone=$((hrsdone*3600+mindone*60))
    sectodo=$((172800-secdone))		# <-- adjust your simulation time here (2 days here)
    startfname=$(ls $Exp_dir | head -1)
    sed -i "s/^startfile.*/startfile = '${startfname}'/" $Exp_dir/namoptions
    sed -i "s/^runtime.*/runtime = ${sectodo}/" $Exp_dir/namoptions
    
    # then continue with the usual stuff
    

    Note that the directory needs to contain a namoptions.original file (basically a copy of the one from the previous simulation) in which lwarmstart is set to true and the lines for the startfile and runtime are present but empty, e.g.:

    &RUN
    iexpnr = 002
    lwarmstart = .true.
    startfile = 
    runtime = 
    /
    

Eagle cluster - Polish national grid

Tested on 6.2.2020. On Eagle, compilation must be done in an interactive job, since the module system does not work on the login node. The module netcdf/4.4.1.1_impi-5.0.3_icc-15.0.3 provides a consistent set of netCDF libraries with Fortran bindings. A more recent cmake than the default one is also required.

# start interactive job
srun --pty -N1 --ntasks-per-node=1  -p fast  -t 60 /bin/bash

git clone https://github.com/dalesteam/dales
cd dales
# git checkout to4.3_Fredrik  # optionally check out another branch
mkdir build
cd build

export SYST=lisa-intel
export MODULEPATH=/home/plgrid-groups/plggvecma/.qcg-modules:$MODULEPATH
module load vecma/common/cmake
module load netcdf/4.4.1.1_impi-5.0.3_icc-15.0.3
cmake ..

make -j 4

TU Delft VR Lab system

Tested in May 2020.

Compile

git clone https://github.com/dalesteam/dales                                                             
module load hdf5/gcc5/1.10.1                                                                             
module load netcdf/gcc5                                                                                  
module load netcdf-fortran/gcc5/4.4.4                                                                    
module load openmpi/gcc5/3.0.0                                                                           
export SYST=gnu-fast                                                                                     
                                                                                                         
cd dales                                                                                                 
# git checkout to4.3  # optionally check out a branch                                                                                       
mkdir build                                                                                              
cd build                                                                                                 
cmake ..                                                                                                 
make                                                                                                     

Run it:

# load the same modules
# cd to the case directory
~/dales/build/src/dales4 namoptions.001