Initial commit of directory comparison tools #934

Merged
115 changes: 115 additions & 0 deletions test/README.md
@@ -0,0 +1,115 @@
# Global workflow comparison tools
A collection of tools to compare two global workflow experiments and check whether their output is bitwise identical.

## Disclaimer

These tools are still a work-in-progress. Use at your own risk. There is no guarantee every relevant file will be compared (but feel free to make a pull request adding more).

# Usage

## Quick start
### To compare two UFS run directories
```
./diff_UFS_rundir.sh dirA dirB
```
Where `dirA` and `dirB` are the two UFS run directories.


### To compare two ROTDIRs
```
./diff_ROTDIR.sh dirA dirB
```
Where `dirA` and `dirB` are the two cycle directories (`.../gfs.YYYYMMDD/HH/`).

OR

```
./diff_ROTDIR.sh rotdir cdate expA expB
```

Where:
- `rotdir` is the root of your ROTDIRs (the portion of the path the experiments share)
- `cdate` is the datetime of the cycle in YYYYMMDDHH format
- `expA` and `expB` are the experiment names (`$PSLOT`) of each experiment
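
In the four-argument form, each cycle directory is derived from these values. A minimal sketch of that derivation as a standalone bash function; `cycle_dir` and the example paths are hypothetical and not part of these tools:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a cycle directory path from rotdir, PSLOT and
# cdate (YYYYMMDDHH), using the layout ${rotdir}/${exp}/gfs.${YYYYMMDD}/${cyc}
cycle_dir() {
  local rotdir=$1 exp=$2 cdate=$3
  local yyyymmdd=${cdate:0:8}  # first 8 characters: the date
  local cyc=${cdate:8:2}       # last 2 characters: the cycle hour
  echo "${rotdir}/${exp}/gfs.${yyyymmdd}/${cyc}"
}

# Example with made-up paths and experiment names
cycle_dir /scratch/rotdirs baseline 2021120100
# prints /scratch/rotdirs/baseline/gfs.20211201/00
```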

## Description

There are currently two tools included in this package:
* `diff_UFS_rundir.sh` compares two UFS run directories (you must have retained them by setting `KEEPDATA` to `YES` in `config.base`)
* `diff_ROTDIR.sh` compares the output of a single cycle in two ROTDIRs

Both scripts work similarly. You need two experiments to compare: typically a "baseline" experiment built from the current `develop` branch and a second experiment built from the feature branch you are working on. Both experiments must be for the same cycle and use identical settings; otherwise there is no chance of them matching. Except for a few specific text files, file lists are constructed by globbing the first experiment's directory, so files that exist only in the second experiment are skipped.
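
Because file lists come only from the first experiment, it can be worth checking which files would be skipped. A sketch assuming bash (for process substitution) and standard `find`, `sort` and `comm`; the `only_in_second` function is hypothetical and not part of these tools:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files (paths relative to each directory) that
# exist under dirB but not under dirA. Since the comparison scripts glob
# dirA only, these files would never be compared.
only_in_second() {
  local dirA=$1 dirB=$2
  # comm -13 suppresses lines unique to the first input and lines common
  # to both, leaving only lines unique to the second (sorted) input
  comm -13 \
    <(cd "$dirA" && find . -type f | sort) \
    <(cd "$dirB" && find . -type f | sort)
}
```

It prints one path per line, for example `./atmos/<file>`; an empty result means the glob-based lists cover everything in the second experiment.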

There are three classes of files compared:
- Text files, using POSIX `diff`
- GRIB2 files, using the correlation reported by `wgrib2`
- NetCDF files, using NetCDF Operators (NCO)
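
The scripts pick the comparison method from the file names they glob. The sketch below approximates that mapping in one function; `file_class` is hypothetical, and its patterns are copied from the globs in `diff_ROTDIR.sh` and `diff_UFS_rundir.sh`. Note that the scripts only compare an explicit list of text files, whereas this illustration labels anything unmatched as text:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a file name onto one of the three comparison
# classes, using the same patterns the scripts glob for.
file_class() {
  case "$(basename "$1")" in
    *.nc)                                      echo netcdf ;;
    *grb2* | *.grib2 | *.flux.* | GFSFLX.Grb*) echo grib ;;
    *)                                         echo text ;;
  esac
}

for f in input.nml GFSFLX.GrbF00 atmf000.nc; do
  echo "$f: $(file_class "$f")"
done
# prints:
# input.nml: text
# GFSFLX.GrbF00: grib
# atmf000.nc: netcdf
```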

Text and GRIB2 files are processed first and complete quickly; NetCDF processing is currently much slower.

Any variables listed in `coordinates.lst` are ignored when comparing NetCDF files. This is because coordinate variables are not differenced: they are carried into the difference file unchanged, so they would show up as non-zero when iterating through its variables, even for identical inputs.
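
NCO commands take a variable list as a single comma-separated argument to `-v`, so a one-name-per-line file such as `coordinates.lst` has to be joined first. A minimal sketch; the `coord_list_csv` name is hypothetical, and this is not necessarily how `netcdf_op_functions.sh` reads the file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: join the newline-separated variable names in a
# coordinate list file into the comma-separated form NCO's -v expects.
coord_list_csv() {
  paste -s -d, "$1"
}
```

For instance, to drop the coordinate variables from a difference file by hand: `ncks -O -C -x -v "$(coord_list_csv coordinates.lst)" .diff.nc noncoord.nc`, where `-x` inverts the `-v` selection and `-C` stops NCO from automatically re-adding coordinate variables (`noncoord.nc` is a made-up output name).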

## Output

Output will appear like this:
```
=== <filename> ===
<comparison info>

```

For text files, the output is that of POSIX `diff`, which is empty when the files are identical:
```
...

=== field_table ===


=== input.nml ===
310,313c310,313
< FNGLAC = '/scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/global-workflow/develop/fix/fix_am/global_glacier.2x2.grb'
< FNMXIC = '/scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/global-workflow/develop/fix/fix_am/global_maxice.2x2.grb'
< FNTSFC = '/scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/global-workflow/develop/fix/fix_am/RTGSST.1982.2012.monthly.clim.grb'
< FNSNOC = '/scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/global-workflow/develop/fix/fix_am/global_snoclim.1.875.grb'
---
> FNGLAC = '/scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/global-workflow/add_preamble/fix/fix_am/global_glacier.2x2.grb'
> FNMXIC = '/scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/global-workflow/add_preamble/fix/fix_am/global_maxice.2x2.grb'
> FNTSFC = '/scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/global-workflow/add_preamble/fix/fix_am/RTGSST.1982.2012.monthly.clim.grb'
> FNSNOC = '/scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/global-workflow/add_preamble/fix/fix_am/global_snoclim.1.875.grb'

...
```
(Text diffs have two extra blank lines to separate the output.)

GRIB files look like this when they are identical:
```
=== GFSFLX.GrbF00 ===
All fields are identical!
=== GFSFLX.GrbF03 ===
All fields are identical!
=== GFSFLX.GrbF06 ===
All fields are identical!
=== GFSFLX.GrbF09 ===
All fields are identical!
=== GFSFLX.GrbF12 ===
All fields are identical!

...

```

And NetCDF files look like this:
```
=== atmf000.nc ===
0 differences found
=== atmf003.nc ===
0 differences found
=== atmf006.nc ===
0 differences found
=== atmf009.nc ===
0 differences found

...
```

If any variables in a GRIB or NetCDF file do not match, they are listed instead.
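
With many files, scanning this output by eye is tedious. A hypothetical post-processing filter, `list_mismatches`, which assumes the exact section headers and success messages shown above, prints just the names of files whose section contains anything else:

```shell
#!/usr/bin/env bash
# Hypothetical filter: read comparison output on stdin and print the name
# of every file whose section contains something other than blank lines,
# "All fields are identical!" or "0 differences found".
list_mismatches() {
  awk '
    /^=== .* ===$/ {
      # New section: strip the markers to get the file name
      name = $0
      sub(/^=== /, "", name)
      sub(/ ===$/, "", name)
      next
    }
    name != "" && $0 != "" && $0 != "All fields are identical!" && $0 != "0 differences found" {
      # Report each mismatching file only once
      if (!(name in seen)) { print name; seen[name] = 1 }
    }
  '
}
```

For example, `./diff_ROTDIR.sh dirA dirB | tee compare.log | list_mismatches` keeps the full report in a (made-up) `compare.log` while printing only the files that differ. Empty text-diff sections count as matches.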
8 changes: 8 additions & 0 deletions test/coordinates.lst
@@ -0,0 +1,8 @@
grid_xt
grid_yt
lat
lon
pfull
phalf
time
time_iso
162 changes: 162 additions & 0 deletions test/diff_ROTDIR.sh
@@ -0,0 +1,162 @@
#!/usr/bin/env bash

#
# Compares relevant output files in two different experiment ROTDIRs.
# Text files are compared via posix diff. GRiB files are compared via
# correlation reported by wgrib2. NetCDF files are compared by using
# NetCDF operators to calculate a diff then make sure all non-coordinate
# variable differences are zero. File lists are created by globbing key
# directories under the first experiment given.
#
# Syntax:
# diff_ROTDIR.sh [-c coord_file][-h] rotdir cdate expA expB
#
# OR
#
# diff_ROTDIR.sh [-c coord_file][-h] dirA dirB
#
# Arguments:
# rotdir: root rotdir where ROTDIRS are held
# cdate: experiment date/cycle in YYYYMMDDHH format
# expA, expB: experiment ids (PSLOT) to compare
#
# dirA, dirB: full paths to the cycle directories to be compared
# (${rotdir}/${exp}/gfs.${YYYYMMDD}/${cyc})
#
# Options:
# -c coord_file: file containing a list of coordinate variables
# -h: print usage message and exit
#

set -eu

usage() {
#
# Print usage statement
#
cat <<- 'EOF'
Compares relevant output files in two different experiment ROTDIRs.
Text files are compared via posix diff. GRiB files are compared via
correlation reported by wgrib2. NetCDF files are compared by using
NetCDF operators to calculate a diff then make sure all non-coordinate
variable differences are zero. File lists are created by globbing key
directories under the first experiment given.

Syntax:
diff_ROTDIR.sh [-c coord_file][-h] rotdir cdate expA expB

OR

diff_ROTDIR.sh [-c coord_file][-h] dirA dirB

Arguments:
rotdir: root rotdir where ROTDIRS are held
cdate: experiment date/cycle in YYYYMMDDHH format
expA, expB: experiment ids (PSLOT) to compare

dirA, dirB: full paths to the cycle directories to be compared
(${rotdir}/${exp}/gfs.${YYYYMMDD}/${cyc})

Options:
-c coord_file: file containing a list of coordinate variables
-h: print usage message and exit
EOF
}

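while getopts ":c:h" option; do
case "${option}" in
c) coord_file=${OPTARG} ;;
h) usage; exit 0 ;;
:) echo "Option -${OPTARG} requires an argument"; usage; exit 1 ;;
*) echo "Unknown option -${OPTARG}"; usage; exit 1 ;;
esac
done
shift $((OPTIND - 1))  # drop parsed options so $# counts only positional arguments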

num_args=$#
case $num_args in
2) # Direct directory paths
dirA=$1
dirB=$2
;;
4) # Derive directory paths
rotdir=$1
date=$2
expA=$3
expB=$4

YYYYMMDD=$(echo $date | cut -c1-8)
cyc=$(echo $date | cut -c9-10)
dirA="$rotdir/$expA/gfs.${YYYYMMDD}/${cyc}"
dirB="$rotdir/$expB/gfs.${YYYYMMDD}/${cyc}"
;;
*) # Invalid number of arguments
echo "${num_args} is not a valid number of arguments, use 2 or 4"
usage
exit 1
;;
esac

temp_file=".diff.nc"

# Contains a bunch of NetCDF Operator shortcuts (will load nco module)
source ./netcdf_op_functions.sh
source ./test_utils.sh

coord_file="${coord_file:-./coordinates.lst}"

## Text files
files=""
files="${files} atmos/input.nml" # This file will be different because of the fix paths
files="${files} $(basename_list 'atmos/' "$dirA/atmos/storms.*" "$dirA/atmos/trak.*")"
if [[ -d $dirA/ice ]]; then
files="${files} ice/ice_in"
fi
if [[ -d $dirA/ocean ]]; then
files="${files} ocean/MOM_input"
fi
# if [[ -d $dirA/wave ]]; then
# files="${files} $(basename_list 'wave/station/' "$dirA/wave/station/*bull_tar")"
# fi

for file in $files; do
echo "=== ${file} ==="
fileA="$dirA/$file"
fileB="$dirB/$file"
diff "$fileA" "$fileB" || :  # diff exits 1 when files differ; keep set -e from aborting
done

## GRiB files

module load wgrib2/2.0.8

files=""
files="${files} $(basename_list 'atmos/' $dirA/atmos/*grb2* $dirA/atmos/*.flux.*)"
if [[ -d $dirA/wave ]]; then
files="${files} $(basename_list 'wave/gridded/' $dirA/wave/gridded/*.grib2)"
fi
if [[ -d $dirA/ocean ]]; then
files="${files} $(basename_list 'ocean/' $dirA/ocean/*grb2)"
fi

for file in $files; do
echo "=== ${file} ==="
fileA="$dirA/$file"
fileB="$dirB/$file"
./diff_grib_files.py "$fileA" "$fileB"
done

## NetCDF Files
files=""
files="${files} $(basename_list 'atmos/' $dirA/atmos/*.nc)"
if [[ -d $dirA/ice ]]; then
files="${files} $(basename_list 'ice/' $dirA/ice/*.nc)"
fi
if [[ -d $dirA/ocean ]]; then
files="${files} $(basename_list 'ocean/' $dirA/ocean/*.nc)"
fi

for file in $files; do
echo "=== ${file} ==="
fileA="$dirA/$file"
fileB="$dirB/$file"
nccmp -q "$fileA" "$fileB" "$coord_file"
done
110 changes: 110 additions & 0 deletions test/diff_UFS_rundir.sh
@@ -0,0 +1,110 @@
#!/usr/bin/env bash

#
# Compares relevant output files in two UFS model directories. GRiB files
# are compared via correlation reported by wgrib2. NetCDF files are compared
# by using NetCDF operators to calculate a diff then make sure all non-
# coordinate variable differences are zero.
#
# Syntax:
# diff_UFS_rundir.sh [-c coord_file][-h] dirA dirB
#
# Arguments:
# dirA, dirB: full paths to the UFS run directories to be compared
#
# Options:
# -c coord_file: file containing a list of coordinate variables
# -h: print usage message and exit
#

set -eu

usage() {
#
# Print usage statement
#
cat <<- 'EOF'
Compares relevant output files in two UFS model directories. GRiB files
are compared via correlation reported by wgrib2. NetCDF files are compared
by using NetCDF operators to calculate a diff then make sure all non-
coordinate variable differences are zero.

Syntax:
diff_UFS_rundir.sh [-c coord_file][-h] dirA dirB

Arguments:
dirA, dirB: full paths to the UFS run directories to be compared

Options:
-c coord_file: file containing a list of coordinate variables
-h: print usage message and exit
EOF
}

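while getopts ":c:h" option; do
case "${option}" in
c) coord_file=${OPTARG} ;;
h) usage; exit 0 ;;
:) echo "Option -${OPTARG} requires an argument"; usage; exit 1 ;;
*) echo "Unknown option -${OPTARG}"; usage; exit 1 ;;
esac
done
shift $((OPTIND - 1))  # drop parsed options so $# counts only positional arguments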

num_args=$#
case $num_args in
2) # Direct directory paths
dirA=$1
dirB=$2
;;
*) # Invalid number of arguments
echo "${num_args} is not a valid number of arguments, use 2"
usage
exit 1
;;
esac

source ./netcdf_op_functions.sh
source ./test_utils.sh

temp_file=".diff.nc"
coord_file="${coord_file:-./coordinates.lst}"

# Input files
files="data_table diag_table fd_nems.yaml field_table ice_in input.nml med_modelio.nml \
model_configure nems.configure pio_in ww3_multi.inp ww3_shel.inp"

for file in $files; do
echo "=== ${file} ==="
fileA="$dirA/$file"
fileB="$dirB/$file"
if [[ -f "$fileA" ]]; then
diff "$fileA" "$fileB" || :  # diff exits 1 when files differ; keep set -e from aborting
else
echo ; echo
fi
done

# GRiB files
files="$(basename_list '' $dirA/GFSFLX.Grb*)"

module load wgrib2/2.0.8

for file in $files; do
echo "=== ${file} ==="
fileA="$dirA/$file"
fileB="$dirB/$file"
./diff_grib_files.py "$fileA" "$fileB"
done

# NetCDF Files
files=""
files="${files} $(basename_list '' $dirA/atmf*.nc $dirA/sfcf*.nc)"
if [[ -d "$dirA/history" ]]; then
files="${files} $(basename_list 'history/' $dirA/history/*.nc)"
fi

for file in $files; do
echo "=== ${file} ==="
fileA="$dirA/$file"
fileB="$dirB/$file"
nccmp -q "$fileA" "$fileB" "$coord_file"
done
