Restructure the bufr sounding job #2853

Merged
merged 11 commits into from
Sep 7, 2024
143 changes: 109 additions & 34 deletions scripts/exgfs_atmos_postsnd.sh
@@ -18,11 +18,16 @@
# 7) 2018-07-18 Guang Ping Lou Generalize this version to other platforms
# 8) 2019-10-18 Guang Ping Lou Transition to reading in NetCDF model data
# 9) 2019-12-18 Guang Ping Lou generalizing to reading in NetCDF or nemsio
# 10) 2024-08-08 Bo Cui Update to handle one forecast at a time
# For GFSv17 bufr, the total number of forecast hours is 141 (num_hours=141);
# this requires 7 nodes with 21 processes allocated per node (num_ppn=21)
################################################################

source "${USHgfs}/preamble.sh"

cd $DATA
runscript=${USHgfs}/gfs_bufr.sh

cd "${DATA}" || exit 2

########################################

@@ -44,47 +49,116 @@ export NINT3=${FHOUT_GFS:-3}

rm -f -r "${COM_ATMOS_BUFR}"
mkdir -p "${COM_ATMOS_BUFR}"
export COM_ATMOS_BUFR="${COM_ATMOS_BUFR}"

GETDIM="${USHgfs}/getncdimlen"
LEVS=$(${GETDIM} "${COM_ATMOS_HISTORY}/${RUN}.${cycle}.atmf000.${atmfm}" pfull)
export LEVS=$(${GETDIM} "${COM_ATMOS_HISTORY}/${RUN}.${cycle}.atmf000.${atmfm}" pfull)
declare -x LEVS

### Loop for the hour and wait for the sigma and surface flux file:
export FSTART=$STARTHOUR
### wait for the sigma and surface flux file:
sleep_interval=10
max_tries=360
#
while [ $FSTART -lt $ENDHOUR ]
do
export FINT=$NINT1
# Define the end hour for the input
export FEND=$(expr $FSTART + $INCREMENT)
if test $FEND -lt 100; then FEND=0$FEND; fi
if [ $FSTART -eq 00 ]
then
export F00FLAG=YES
else
export F00FLAG=NO
fi

if [ $FEND -eq $ENDHOUR ]
then
export MAKEBUFR=YES
fi

filename="${COM_ATMOS_HISTORY}/${RUN}.${cycle}.atm.logf${FEND}.${logfm}"
if ! wait_for_file "${filename}" "${sleep_interval}" "${max_tries}"; then
err_exit "FATAL ERROR: logf${FEND} not found after waiting $((sleep_interval * ( max_tries - 1) )) secs"
fi

## 1-hourly output before $NEND1, 3-hourly output after
if [[ $((10#$FEND)) -gt $((10#$NEND1)) ]]; then
export FINT=$NINT3
fi
${USHgfs}/gfs_bufr.sh

export FSTART="${FEND}"
# Initialize an empty list to store the hours
hour_list=()

# Generate hours from 0 to NEND1 with interval NINT1
# Convert ENDHOUR to decimal with $((10#$ENDHOUR)) so that it is not interpreted as an octal number
for (( hour=0; hour<=$((10#$NEND1)) && hour<=$((10#$ENDHOUR)); hour+=$((10#$NINT1)) )); do
hour_list+=("$(printf "%03d" "$hour")")
done

# Generate hours from NEND1 + NINT3 to ENDHOUR with interval NINT3
for (( hour=$((10#$NEND1))+$((10#$NINT3)); hour<=$((10#$ENDHOUR)); hour+=$((10#$NINT3)) )); do
hour_list+=("$(printf "%03d" "$hour")")
done

# Print the hour list
echo "Hour List:" "${hour_list[@]}"

# Count the number of elements in the hour_list
num_hours="${#hour_list[@]}"

# Print the total number of hours
echo "Total number of hours: $num_hours"

# Allocate 21 processes per node;
# allocating more processes per node may cause memory issues
num_ppn=21
export APRUN="mpiexec -np ${num_hours} -ppn ${num_ppn} --cpu-bind core cfp "

if [ -s "${DATA}/poescript_bufr" ]; then
rm ${DATA}/poescript_bufr
fi
Contributor

We have a utility script, ush/run_mpmd.sh, that now handles setting up an MPMD job. That is the preferred method, as it correctly handles both Slurm and PBS/Torque. You just need to give it the file with your list of commands as an argument. See the atmos products ex-script for an example.
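A minimal sketch of the command file the reviewer describes: one line per forecast hour, in the same shape as poescript_bufr. The helper name build_cmdfile is hypothetical, and each line is simplified to two arguments (the real gfs_bufr.sh call in this PR passes five).

```bash
# Hypothetical helper (not part of global-workflow): write an MPMD command
# file with one independent task per forecast hour, like poescript_bufr.
# Simplified: each line passes only the hour and a per-hour work directory.

# build_cmdfile CMDFILE RUNSCRIPT DATA_DIR HOUR...
build_cmdfile() {
  local cmdfile="$1" runscript="$2" data_dir="$3"
  shift 3
  : > "${cmdfile}"            # create or truncate the command file
  local fhr
  for fhr in "$@"; do
    # one command per line; each task gets its own work directory
    printf '%s "%s" "%s"\n' "${runscript}" "${fhr}" "${data_dir}/${fhr}" >> "${cmdfile}"
  done
}
```

Per the comment, the resulting file would then be handed to the utility as its only argument, e.g. `"${USHgfs}/run_mpmd.sh" "${DATA}/poescript_bufr"`; how run_mpmd.sh distributes the lines is not shown here.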

Contributor Author

In ush/run_mpmd.sh, the mpiexec command does not set the number of processes per node, which the bufr job exgfs_atmos_postsnd.sh needs. Will run_mpmd.sh be updated to support this in the future?

Contributor

WalterKolczynski-NOAA, Aug 29, 2024

I would try without the ppn setting first to confirm that it is actually an issue (ideally, the MPMD tasks should be distributed equally across all nodes anyway). If it is still required, add an entry to the env script on each machine where it is needed, updating the mpmd_opt setting to include -ppn for the sounding job.
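One way such an env-script entry could look, sketched as a hypothetical helper add_ppn: it prepends `-ppn N` to an existing mpmd_opt string unless one is already present. Prepending keeps `-ppn` ahead of any trailing cfp wrapper, so it stays a launcher option, as in this PR's APRUN line (`mpiexec -np ${num_hours} -ppn ${num_ppn} --cpu-bind core cfp`). The helper name, and the assumption that mpmd_opt holds launcher options, are not taken from the real env scripts.

```bash
# Hypothetical env-script helper: prepend "-ppn N" to an mpmd_opt string
# unless a -ppn option is already present. Prepending keeps -ppn ahead of
# a trailing cfp wrapper so the launcher, not cfp, receives it.
add_ppn() {
  local opts="$1" ppn="$2"
  case " ${opts} " in
    *" -ppn "*) printf '%s\n' "${opts}" ;;                 # leave as-is
    *)          printf '%s\n' "-ppn ${ppn}${opts:+ ${opts}}" ;;
  esac
}
```

In the sounding-job branch of a machine's env script this might be used as `mpmd_opt="$(add_ppn "${mpmd_opt:-}" 21)"`; where exactly that branch lives is an assumption.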

Contributor Author

I tested the bufr job using run_mpmd.sh without setting the ppn parameter, and the job failed. After adding the ppn setting, the bufr job completed successfully. The PBS setting in my jobcard is:
#PBS -l place=vscatter,select=7:ncpus=128:mpiprocs=128

Please let me know if I’m wrong.
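The node count in that jobcard is consistent with the script header: with one MPMD task per forecast hour and at most 21 tasks per node, 141 tasks need ceil(141/21) = 7 nodes (7 * 21 = 147 slots). A small helper to check that arithmetic; the name nodes_needed is hypothetical.

```bash
# Hypothetical helper: smallest number of nodes that fits NUM_TASKS MPMD
# tasks when at most PPN tasks may run on each node.
nodes_needed() {
  local ntasks=$1 ppn=$2
  echo $(( (ntasks + ppn - 1) / ppn ))   # integer ceiling division
}
```

For example, `nodes_needed 141 21` prints 7, while 148 tasks at 21 per node would already need 8 nodes.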

Contributor

WalterKolczynski-NOAA, Sep 4, 2024

See new review. It should maintain the ppn setting while switching to run_mpmd.sh.


for fhr in "${hour_list[@]}"; do

if [ ! -s "${DATA}/${fhr}" ]; then mkdir -p ${DATA}/${fhr}; fi
export fhr=${fhr}
export FINT=${NINT1}
## 1-hourly output before $NEND1, 3-hourly output after
if [[ $((10#${fhr})) -gt $((10#${NEND1})) ]]; then
export FINT=${NINT3}
fi
if (( 10#${fhr} == 0 )); then
export F00FLAG="YES"
else
export F00FLAG="NO"
fi

# Convert fhr to integer
fhr_int=$((10#$fhr))

# Get previous hour
if (( fhr_int == STARTHOUR )); then
fhr_p=${fhr_int}
else
fhr_p=$(( fhr_int - FINT ))
fi

# Format fhr_p with leading zeros
fhr_p="$(printf "%03d" "$fhr_p")"

filename="${COM_ATMOS_HISTORY}/${RUN}.${cycle}.atm.logf${fhr}.${logfm}"
if ! wait_for_file "${filename}" "${sleep_interval}" "${max_tries}"; then
echo "Waiting for the file ${filename} for $((sleep_interval * (max_tries - 1))) seconds..."
err_exit "FATAL ERROR: logf${fhr} not found after waiting $((sleep_interval * (max_tries - 1))) secs"
fi
echo "${runscript} \"${fhr}\" \"${fhr_p}\" \"${FINT}\" \"${F00FLAG}\" \"${DATA}/${fhr}\"" >> "${DATA}/poescript_bufr"
done

chmod +x "${DATA}/poescript_bufr"
startmsg
$APRUN "${DATA}/poescript_bufr"
export err=$?; err_chk

cd "${DATA}" || exit 2

# Initialize fortnum
fortnum=20

# Loop through each element in the array
for fhr in "${hour_list[@]}"; do
# Increment fortnum
fortnum=$((fortnum + 1))
${NLN} "${DATA}/${fhr}/fort.${fortnum}" "fort.${fortnum}"
done

export MAKEBUFR=YES
export fhr=${ENDHOUR}
export FINT=${NINT1}
## 1-hourly output before $NEND1, 3-hourly output after
if [[ $((10#${fhr})) -gt $((10#${NEND1})) ]]; then
export FINT=${NINT3}
fi
if (( 10#${fhr} == 0 )); then
export F00FLAG="YES"
else
export F00FLAG="NO"
fi
${runscript} "${fhr}" "${fhr_p}" "${FINT}" "${F00FLAG}" "${DATA}"
Contributor

WalterKolczynski-NOAA, Aug 28, 2024

Is there a reason the last hour can't be done as part of the MPMD?

Contributor Author

BoCui-NOAA, Aug 29, 2024

I modified gfs_bufr.fd/gfs_bufr.f so that it handles two tasks, controlled by the flag makebufr. When makebufr is set to false, the code reads a single forecast hour and writes a temporary file for that hour. When makebufr is set to true, the code merges all the temporary files and generates the final bufr products. So the last forecast hour can be processed as part of the MPMD.
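The two-phase design described above can be sketched in plain shell: independent per-hour tasks run in parallel and each writes its own temporary file, then a single merge step runs only after all of them finish and reads the per-hour outputs in forecast-hour order through fort.21, fort.22, ... links, mirroring the fortnum loop in exgfs_atmos_postsnd.sh. This is an illustration only: process_hour, part.txt and merged.txt are hypothetical placeholders, background jobs with `wait` stand in for the MPMD launcher, and `cat` stands in for the makebufr merge in gfs_bufr.f.

```bash
# Illustration only: process_hour, part.txt and merged.txt are placeholders.
# Background jobs + wait stand in for the MPMD launcher, and cat stands in
# for the makebufr merge performed by gfs_bufr.f.

# Phase 1 stand-in: each forecast hour writes its own temporary file.
process_hour() {
  local workdir=$1 fhr=$2
  mkdir -p "${workdir}/${fhr}"
  printf 'hour %s\n' "${fhr}" > "${workdir}/${fhr}/part.txt"
}

# run_two_phase WORKDIR HOUR...
run_two_phase() {
  local workdir=$1
  shift
  local fhr i fortnum=20
  for fhr in "$@"; do
    process_hour "${workdir}" "${fhr}" &   # hours are independent tasks
  done
  wait   # barrier: merge starts only after every hour has finished
         # (a bare wait does not report individual task failures)

  # Link per-hour outputs as fort.21, fort.22, ... in forecast-hour order.
  for fhr in "$@"; do
    fortnum=$((fortnum + 1))
    ln -sf "${workdir}/${fhr}/part.txt" "${workdir}/fort.${fortnum}"
  done

  # Phase 2: a single merge step reads the units in order.
  for (( i = 21; i <= fortnum; i++ )); do
    cat "${workdir}/fort.${i}"
  done > "${workdir}/merged.txt"
}
```

The key property is the barrier: the merge never sees a partially written hour, which is why it runs as a separate step after the parallel phase rather than as one of the parallel tasks.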


##############################################################
# Tar and gzip the individual bufr files and send them to /com
##############################################################
@@ -105,7 +179,7 @@ fi
# add appropriate WMO Headers.
########################################
rm -rf poe_col
for (( m = 1; m <= NUM_SND_COLLECTIVES ; m++ )); do
for (( m = 1; m <= $((10#$NUM_SND_COLLECTIVES)); m++ )); do
echo "sh ${USHgfs}/gfs_sndp.sh ${m} " >> poe_col
done

@@ -123,4 +197,5 @@ ${APRUN_POSTSNDCFP} cmdfile
sh "${USHgfs}/gfs_bfr2gpk.sh"



############## END OF SCRIPT #######################
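The hour-list construction in exgfs_atmos_postsnd.sh can be checked in isolation. The sketch below reproduces its two loops as a hypothetical function make_hour_list. Assuming NEND1=120, NINT1=1, NINT3=3 and ENDHOUR=180 (values chosen because they yield the 141 hours quoted in the script header; the PR itself does not list them), it prints 000 through 120 hourly and then 123 through 180 every three hours.

```bash
# make_hour_list NEND1 NINT1 NINT3 ENDHOUR  (hypothetical name)
# Prints zero-padded hours: 0..NEND1 every NINT1, then NEND1+NINT3..ENDHOUR
# every NINT3. 10# forces base 10 so inputs such as "009" or "018" are not
# parsed as (invalid) octal numbers.
make_hour_list() {
  local nend1=$((10#$1)) nint1=$((10#$2)) nint3=$((10#$3)) endhour=$((10#$4))
  local hour
  for (( hour = 0; hour <= nend1 && hour <= endhour; hour += nint1 )); do
    printf '%03d\n' "${hour}"
  done
  for (( hour = nend1 + nint3; hour <= endhour; hour += nint3 )); do
    printf '%03d\n' "${hour}"
  done
}
```

Under those assumed values the count is 121 hourly entries plus 20 three-hourly entries, i.e. 141, which matches num_hours in the header comment.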
63 changes: 40 additions & 23 deletions ush/gfs_bufr.sh
@@ -18,8 +18,17 @@
# 2018-05-30 Guang Ping Lou: Make sure all files are available.
# 2019-10-10 Guang Ping Lou: Read in NetCDF files
# 2024-03-03 Bo Cui: Add options to use different bufr table for different resolution NetCDF files
# 2024-08-08 Bo Cui: Update to handle one forecast at a time
# echo "History: February 2003 - First implementation of this utility script"
#
fhr="$1"
fhr_p="$2"
FINT="$3"
F00FLAG="$4"
workdir="$5"

cd "${workdir}" || exit 2

source "${USHgfs}/preamble.sh"

if [[ "${F00FLAG}" == "YES" ]]; then
@@ -45,30 +54,37 @@ cat << EOF > gfsparm
&NAMMET
levs=${LEVS},makebufr=${bufrflag},
dird="${COM_ATMOS_BUFR}/bufr",
nstart=${FSTART},nend=${FEND},nint=${FINT},
nstart=${fhr},nend=${fhr},nint=${FINT},
nend1=${NEND1},nint1=${NINT1},nint3=${NINT3},
nsfc=80,f00=${f00flag},fformat=${fformat},np1=0
nsfc=80,f00=${f00flag},fformat=${fformat},np1=0,
fnsig="sigf${fhr}",
fngrib="flxf${fhr}",
fngrib2="flxf${fhr_p}"
/
EOF

sleep_interval=10
max_tries=1000
for (( hr = 10#${FSTART}; hr <= 10#${FEND}; hr = hr + 10#${FINT} )); do
hh2=$(printf %02i "${hr}")
hh3=$(printf %03i "${hr}")

#---------------------------------------------------------
# Make sure all files are available:
filename="${COM_ATMOS_HISTORY}/${RUN}.${cycle}.atm.logf${hh3}.${logfm}"
if ! wait_for_file "${filename}" "${sleep_interval}" "${max_tries}"; then
echo "FATAL ERROR: COULD NOT LOCATE logf${hh3} file"
exit 2
fi

#------------------------------------------------------------------
${NLN} "${COM_ATMOS_HISTORY}/${RUN}.${cycle}.atmf${hh3}.${atmfm}" "sigf${hh2}"
${NLN} "${COM_ATMOS_HISTORY}/${RUN}.${cycle}.sfcf${hh3}.${atmfm}" "flxf${hh2}"
done

#---------------------------------------------------------
# Make sure all files are available:

filename="${COM_ATMOS_HISTORY}/${RUN}.${cycle}.atm.logf${fhr}.${logfm}"
if ! wait_for_file "${filename}" "${sleep_interval}" "${max_tries}"; then
echo "FATAL ERROR: COULD NOT LOCATE logf${fhr} file"
exit 2
fi

filename="${COM_ATMOS_HISTORY}/${RUN}.${cycle}.atm.logf${fhr_p}.${logfm}"
if ! wait_for_file "${filename}" "${sleep_interval}" "${max_tries}"; then
echo "FATAL ERROR: COULD NOT LOCATE logf${fhr_p} file"
exit 2
fi

#------------------------------------------------------------------
${NLN} "${COM_ATMOS_HISTORY}/${RUN}.${cycle}.atmf${fhr}.${atmfm}" "sigf${fhr}"
${NLN} "${COM_ATMOS_HISTORY}/${RUN}.${cycle}.sfcf${fhr}.${atmfm}" "flxf${fhr}"
${NLN} "${COM_ATMOS_HISTORY}/${RUN}.${cycle}.sfcf${fhr_p}.${atmfm}" "flxf${fhr_p}"

# define input BUFR table file.
${NLN} "${PARMgfs}/product/bufr_gfs_${CLASS}.tbl" fort.1
@@ -82,18 +98,19 @@ case "${CASE}" in
${NLN} "${PARMgfs}/product/bufr_ij9km.txt" fort.7
;;
*)
echo "WARNING: No bufr table for this resolution, using the one for C768"
${NLN} "${PARMgfs}/product/bufr_ij13km.txt" fort.7
echo "FATAL ERROR: Unrecognized bufr_ij*km.txt For CASE ${CASE}, ABORT!"
exit 1
;;
esac

${APRUN_POSTSND} "${EXECgfs}/${pgm}" < gfsparm > "out_gfs_bufr_${FEND}"
"${EXECgfs}/${pgm}" < gfsparm > "out_gfs_bufr_${fhr}"

export err=$?

if [[ "${err}" -ne 0 ]]; then
echo "GFS postsnd job error, Please check files "
echo "${COM_ATMOS_HISTORY}/${RUN}.${cycle}.atmf${hh2}.${atmfm}"
echo "${COM_ATMOS_HISTORY}/${RUN}.${cycle}.sfcf${hh2}.${atmfm}"
echo "${COM_ATMOS_HISTORY}/${RUN}.${cycle}.atmf${fhr}.${atmfm}"
echo "${COM_ATMOS_HISTORY}/${RUN}.${cycle}.sfcf${fhr}.${atmfm}"
err_chk
fi
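With the new interface, each gfs_bufr.sh call receives fhr, fhr_p, FINT, F00FLAG and a work directory as positional arguments, and links sigf${fhr}, flxf${fhr} and flxf${fhr_p}. The sketch below derives the first four the way the MPMD loop in exgfs_atmos_postsnd.sh does; bufr_task_args is a hypothetical name. Converting with 10# before comparing also sidesteps the error bash raises for zero-padded hours such as 008 or 009 in `[[ ${fhr} -eq 000 ]]`, which treats them as invalid octal numbers.

```bash
# Hypothetical helper mirroring the MPMD loop: prints "fhr fhr_p FINT F00FLAG".
# bufr_task_args FHR STARTHOUR NEND1 NINT1 NINT3
bufr_task_args() {
  local fhr=$1
  local fhr_int=$((10#$1)) start=$((10#$2)) nend1=$((10#$3))
  local fint=$((10#$4)) f00flag=NO fhr_p

  # 1-hourly (NINT1) up to NEND1, NINT3-hourly afterwards
  if (( fhr_int > nend1 )); then
    fint=$((10#$5))
  fi
  # fhr_int is already base 10, so hours such as 008 compare safely here
  if (( fhr_int == 0 )); then
    f00flag=YES
  fi
  # previous hour: the start hour pairs with itself
  if (( fhr_int == start )); then
    fhr_p=${fhr_int}
  else
    fhr_p=$(( fhr_int - fint ))
  fi
  printf '%s %03d %s %s\n' "${fhr}" "${fhr_p}" "${fint}" "${f00flag}"
}
```

For instance, with STARTHOUR=0, NEND1=120, NINT1=1 and NINT3=3 (assumed values), hour 123 pairs with previous hour 120 and FINT=3, while hour 009 pairs with 008 and FINT=1.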
