gfspost job has missing input fluxfile error when running with WRITE_DOPOST=.false. #1157

Closed
jkhender opened this issue Dec 8, 2022 · 4 comments · Fixed by #1181
Labels
bug Something isn't working

jkhender commented Dec 8, 2022

Expected behavior
When WRITE_DOPOST is set to .false. in config.base, the gfspost job should complete successfully.

Current behavior
The gfspost job fails because of a missing fluxfile when it runs the following command in inter_flux.sh:

if [ $INLINE_POST = ".false." ]; then
     $WGRIB2 $PGBOUT $option1 $option21 $option22 $option23 $option24 \
     $option25 $option26 $option27 $option28 \
     -new_grid $grid1p0  fluxfile_${fhr3}_1p00
else
     $WGRIB2 $COMOUT/${FLUXFL} $option1 $option21 $option22 $option23 $option24 \
     $option25 $option26 $option27 $option28 \
     -new_grid $grid1p0  fluxfile_${fhr3}_1p00
fi

*** FATAL ERROR: missing input file fluxfile ***

Machines affected
hera

To Reproduce
change WRITE_DOPOST to .false. in config.base
run gfsfcst task
run gfspost task
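
The first step can be scripted. The sketch below is only an illustration under an assumption: it supposes the setting appears in config.base as a line of the form `export WRITE_DOPOST=".true."`, and it works on a temporary stand-in file rather than a real config.base.

```shell
#!/bin/bash
# Sketch only: operates on a temporary stand-in for config.base.
set -eu

cfg=$(mktemp)

# Assumed form of the setting in config.base (not confirmed by this issue).
echo 'export WRITE_DOPOST=".true."' > "$cfg"

# Rewrite the line so that offline post is used; write to a new file and
# move it into place, which avoids relying on non-portable "sed -i".
sed 's/^export WRITE_DOPOST=.*/export WRITE_DOPOST=".false."/' "$cfg" > "$cfg.new"
mv "$cfg.new" "$cfg"

# Show the resulting setting.
grep WRITE_DOPOST "$cfg"
```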

Additional Information
logfile: /scratch1/BMC/gsd-fv3-dev/Judy.K.Henderson/test/gslv17p8_dev/FV3GFSrun/test_emc/logs/2022111800/gfspost*log

jkhender added the bug label Dec 8, 2022
lgannoaa self-assigned this Dec 9, 2022
lgannoaa commented Dec 9, 2022

I will take a look.

lgannoaa commented Dec 9, 2022

The developer is using an older version of the GFS, which used the JGLOBAL_ATMOS_NCEPPOST-exgfs_atmos_nceppost.sh approach.
The current develop branch uses JGLOBAL_ATMOS_POST-exgfs_atmos_post.sh.
The error happens when:

  • inter_flux.sh[47]: '[' .false. = .false. ']'
    (WRITE_DOPOST set to .false.)
  • inter_flux.sh[48]: wgrib2 fluxfile -set_grib_type same -new_grid_winds earth -new_grid_interpolation bilinear -if ':(LAND|CRAIN|CICEP|CFRZR|CSNOW|ICSEV):' -new_grid_interpolation neighbor -fi -set_bitmap 1 -set_grib_max_bits 16 -if ':(APCP|ACPCP|PRATE|CPRAT):' -set_grib_max_bits 25 -fi -if ':(APCP|ACPCP|PRATE|CPRAT|DZDT):' -new_grid_interpolation budget -fi -new_grid latlon 0:360:1.0 90:181:-1.0 fluxfile_012_1p00
    *** FATAL ERROR: missing input file fluxfile ***
    The file gfs.t${cyc}z.sfluxgrbf${fhr}.grib2 is created by the forecast job when inline post is used.
    Without inline post, inter_flux.sh fails because it tries to read a nonexistent file named fluxfile.

In the developer's exgfs_atmos_nceppost.sh, lines 406-410 contain the logic that creates fluxfile:
---
if [ $INLINE_POST = ".false." ]; then
$POSTGPSH
export err=$?; err_chk
mv fluxfile $COMOUT/${FLUXFL}
fi
---
However, lines 400-404, which run earlier, execute the inter_flux.sh script, and that script requires fluxfile as input:
---
#Add extra flux.1p00 file for coupled
if [ "$FLXGF" = 'YES' ]; then
export FH=$(expr $fhr + 0)
$GFSDOWNSHF
export err=$?; err_chk
fi
---
Therefore the job fails, as the developer experienced.

The recommended solution is to move the section of code that creates fluxfile so that it runs before the inter_flux.sh script.
Please replace /scratch1/BMC/gsd-fv3-dev/Judy.K.Henderson/test/gslv17p8_dev/scripts/exgfs_atmos_nceppost.sh
with
/scratch1/NCEPDEV/stmp2/Lin.Gan/ptmp/issue-1157/exgfs_atmos_nceppost.sh

Then rerun the failed post job.
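
The reordering recommended above can be sketched as follows. This is a minimal, self-contained illustration, not the actual patch: `make_fluxfile` and `interp_flux` are hypothetical stand-ins for `$POSTGPSH` and `$GFSDOWNSHF`, the directory and file names are placeholders, and the stand-in for the interpolation step is assumed to read its input from `$COMOUT/$FLUXFL`.

```shell
#!/bin/bash
# Sketch only: hypothetical stand-ins show why the ordering matters.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
COMOUT="$workdir/com"     # placeholder output directory
FLUXFL="flux.grib2"       # placeholder for the real ${FLUXFL} name
INLINE_POST=".false."
FLXGF="YES"
mkdir -p "$COMOUT"

# Stand-in for $POSTGPSH: produces "fluxfile" in the run directory.
make_fluxfile() { echo "flux data" > fluxfile; }

# Stand-in for $GFSDOWNSHF: needs the flux file as input
# (assumed here to be read from $COMOUT/$FLUXFL).
interp_flux() {
  if [ ! -s "$COMOUT/$FLUXFL" ]; then
    echo "interp_flux: input $COMOUT/$FLUXFL not found" >&2
    return 1
  fi
  cp "$COMOUT/$FLUXFL" fluxfile_006_1p00
}

# Step 1: when inline post is off, create the flux file first.
if [ "$INLINE_POST" = ".false." ]; then
  make_fluxfile
  mv fluxfile "$COMOUT/$FLUXFL"
fi

# Step 2: only now run the interpolation that consumes it.
if [ "$FLXGF" = "YES" ]; then
  interp_flux
fi
```

Swapping the two steps in this sketch makes `interp_flux` fail on its missing input, which mirrors the ordering problem described above.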

@jkhender
Contributor Author

I did a test replacing my exgfs_atmos_nceppost.sh with /scratch1/NCEPDEV/stmp2/Lin.Gan/ptmp/issue-1157/exgfs_atmos_nceppost.sh.

fluxfile is now created before inter_flux.sh is called. However, the job now fails because exgfs_atmos_nceppost.sh moves the file to $COMOUT/${FLUXFL} at line 403, while inter_flux.sh still looks for fluxfile in the RUNDIRS directory at line 49.

exgfs_atmos_nceppost.sh:

399       # Create fluxfile for the GFSDOWNSHF if running inline post
400       if [ $INLINE_POST = ".false." ]; then
401         $POSTGPSH
402         export err=$?; err_chk
403         mv fluxfile $COMOUT/${FLUXFL}
404       fi
405       $WGRIB2 -s $COMOUT/${FLUXFL} > $COMOUT/${FLUXFLIDX}

inter_flux.sh:

48   if [ $INLINE_POST = ".false." ]; then
49     $WGRIB2 $PGBOUT $option1 $option21 $option22 $option23 $option24 \
50                           $option25 $option26 $option27 $option28 \
51                           -new_grid $grid1p0  fluxfile_${fhr3}_1p00
52   else
53     $WGRIB2 $COMOUT/${FLUXFL} $option1 $option21 $option22 $option23 $option24 \
54                           $option25 $option26 $option27 $option28 \
55                           -new_grid $grid1p0  fluxfile_${fhr3}_1p00
56   fi

If line 49 is replaced with the line below, then the job completes successfully.

$WGRIB2 $COMOUT/${FLUXFL} $option1 $option21 $option22 $option23 $option24 \

Since both cases execute the same command, there is no need for an "if" statement.
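
A minimal sketch of that simplification is shown below. It is not the actual change to inter_flux.sh: `fake_wgrib2` is a hypothetical stub so the example runs without wgrib2 installed, `run_interp` is an illustrative helper name, the paths are placeholders, and the `$option...` arguments are omitted for brevity.

```shell
#!/bin/bash
# Sketch only: a stub stands in for wgrib2 so the example runs anywhere.
set -eu

# Hypothetical stub: prints the input file it was asked to read.
fake_wgrib2() { echo "$1"; }
WGRIB2=fake_wgrib2

COMOUT="/tmp/com"         # placeholder output directory
FLUXFL="flux.grib2"       # placeholder for the real ${FLUXFL} name
fhr3="006"
grid1p0="latlon 0:360:1.0 90:181:-1.0"

# One call covers both INLINE_POST settings because the input is the same.
# $grid1p0 is left unquoted on purpose so it splits into separate arguments.
run_interp() {
  $WGRIB2 "$COMOUT/$FLUXFL" -new_grid $grid1p0 "fluxfile_${fhr3}_1p00"
}

# The input file does not depend on INLINE_POST.
for INLINE_POST in ".true." ".false."; do
  input_used=$(run_interp)
  echo "INLINE_POST=$INLINE_POST reads $input_used"
done
```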

My log files are located here:
/scratch1/BMC/gsd-fv3-dev/Judy.K.Henderson/test/gslv17p8_dev/FV3GFSrun/test_emc/logs/2022111800/
gfspost_f006-f006.log_original_error
gfspost_f006-f006.log_new_error
gfspost_f006-f006.log

The example I provided happened to be from an August version of the workflow. However, the same issues exist in the current codebase, although the files have been updated since I checked them out.

I also tested an Oct 25 version of the workflow, with similar results.

/scratch1/BMC/gsd-fv3-dev/jhender/test/emc_gw/FV3GFSrun/posttest/logs/2022111800/
gfspost_f006-f006.log completed successfully
gfspost_f006-f006.log.0 failed with inter_flux.sh error

In this codebase, exgfs_atmos_nceppost.sh has been renamed to exgfs_atmos_post.sh.

Let me know if you need additional information.

lgannoaa commented Dec 13, 2022

Thank you @jkhender for providing this information. I ran your new test case and the jobs completed successfully.

I upgraded your test to the latest global workflow develop #3e53e06. It works as expected when using inline post.
However, when WRITE_DOPOST=".false." in config.base, the post job fails with the issue noted in this ticket. A PR will be provided to fix this issue.

If you want to run your test using the latest global workflow develop, here is the package information:
HOMEgfs: /scratch1/NCEPDEV/global/Lin.Gan/git/PR/issue-1157
expdir: /scratch1/NCEPDEV/global/Lin.Gan/expdir/posttest_d
pslot: posttest_d
com: /scratch1/NCEPDEV/stmp2/Lin.Gan/ptmp/com/posttest_d

WalterKolczynski-NOAA pushed a commit that referenced this issue Dec 15, 2022

Post would fail when running without inline post because it would attempt to create 1p00 flux files before the flux file had been generated. `inter_flux.sh` also had to be updated to point at the correct file for offline post, which is now the same as inline.

Fixes #1157