Parallel IO issue #1234

Closed
Hae-CheolKim-NOAA opened this issue May 26, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@Hae-CheolKim-NOAA

Description

Using the code I cloned and compiled on May 23, 2022, I advanced a run for 1/12-degree DATM (CDEPS)-MOM6-CICE6 on HERA but bumped into a parallel I/O issue. It looks like this has something to do with the parallel netCDF library.

...
0: Abort with message NetCDF: Variable not found in file /scratch2/NCEPDEV/nwprod/hpc-stack/src/develop/pkg/pio-2.5.3/src/clib/pio_nc.c at line 1158
40: Abort with message NetCDF: Variable not found in file /scratch2/NCEPDEV/nwprod/hpc-stack/src/develop/pkg/pio-2.5.3/src/clib/pio_nc.c at line 1158
1: Abort with message NetCDF: Variable not found in file /scratch2/NCEPDEV/nwprod/hpc-stack/src/develop/pkg/pio-2.5.3/src/clib/pio_nc.c at line 1158
2: Abort with message NetCDF: Variable not found in file /scratch2/NCEPDEV/nwprod/hpc-stack/src/develop/pkg/pio-2.5.3/src/clib/pio_nc.c at line 1158
41: Abort with message NetCDF: Variable not found in file /scratch2/NCEPDEV/nwprod/hpc-stack/src/develop/pkg/pio-2.5.3/src/clib/pio_nc.c at line 1158
...
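
For triage, it can help to confirm whether every reporting rank aborts at the same place. The following is a minimal sketch, not part of the model or of PIO: it assumes the log lines follow the `<rank>: Abort with message <message> in file <source> at line <n>` shape seen above, and the function name `summarize_aborts` is hypothetical.

```python
import re

# Assumed shape of an abort line, taken from the excerpt above:
# "<rank>: Abort with message <message> in file <source path> at line <number>"
ABORT_RE = re.compile(
    r"^\s*(?P<rank>\d+):\s+Abort with message (?P<msg>.+?)"
    r" in file (?P<src>\S+) at line (?P<line>\d+)\s*$"
)


def summarize_aborts(log_text):
    """Group abort lines by (message, source file, line) and collect the ranks."""
    groups = {}
    for raw in log_text.splitlines():
        match = ABORT_RE.match(raw)
        if match is None:
            continue  # skip elided ("...") or unrelated lines
        key = (match.group("msg"), match.group("src"), int(match.group("line")))
        groups.setdefault(key, []).append(int(match.group("rank")))
    return {key: sorted(ranks) for key, ranks in groups.items()}


# The five abort lines quoted in this issue, including the "..." elisions.
SRC = "/scratch2/NCEPDEV/nwprod/hpc-stack/src/develop/pkg/pio-2.5.3/src/clib/pio_nc.c"
sample = "\n".join(
    ["..."]
    + [
        f"{rank}: Abort with message NetCDF: Variable not found in file {SRC} at line 1158"
        for rank in (0, 40, 1, 2, 41)
    ]
    + ["..."]
)

for (msg, src, line), ranks in summarize_aborts(sample).items():
    print(f"{msg} @ {src}:{line} on ranks {ranks}")
```

If all ranks fall into a single group, as they do for the excerpt above, the failure is likely one missing variable lookup rather than several unrelated problems.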

To Reproduce:

What compilers/machines are you seeing this with? INTEL/HERA
Give explicit steps to reproduce the behavior.

  1. Compile the model code at /scratch2/NCEPDEV/marine/Hae-Cheol.Kim/ufs-weather-model
  2. run the test case /scratch1/NCEPDEV/stmp2/Hae-Cheol.Kim/FV3_RT/rt_191768/GLBb0.08_038_gefs

Additional context

Add any other context about the problem here.
Directly reference any issues or PRs in this or other repositories that this is related to, and describe how they are related. Example:

  • needs to be fixed also in noaa-emc/nems/issues/<issue_number>
  • needed for noaa-emc/fv3atm/pull/<pr_number>
Hae-CheolKim-NOAA added the bug label May 26, 2022
@DeniseWorthen
Collaborator

What is the status of this issue?

@Hae-CheolKim-NOAA
Author

Not resolved, yet.

@arunchawla-NOAA

@edwardhartnett bringing this to your attention. Do you have any suggestions for @Hae-CheolKim-NOAA?

@edwardhartnett
Contributor

This is a PIO error, not a netCDF error.

Have we tried with the latest release of PIO and netCDF?

It's complaining about a variable not found. Which lines of code are being called?

Is there a test code that demonstrates this problem? If not, could you generate one please? That is, a single-file Fortran program which does the same thing the model is doing, and generates the same error?

@Hae-CheolKim-NOAA
Author

I tested the new source cloned on Feb 15, 2023. The issue did not recur, and I was able to compile and initialize the 1/12-degree DATM (CDEPS)-MOM6-CICE6 on HERA. Sorry for the delay in updating the status, and thanks for your support.

@junwang-noaa
Collaborator

@Hae-CheolKim-NOAA It looks to me like the issue is resolved. Can we close it?

@Hae-CheolKim-NOAA
Author

Hae-CheolKim-NOAA commented Mar 27, 2023 via email

TerrenceMcGuinness-NOAA added a commit to NOAA-EMC/global-workflow that referenced this issue Oct 16, 2024
# Description

This PR has the GitHub Pipeline script in the `github/workflows`
directory for running CI tests
to be performed on an AWS virtual cluster. It is set up to be launched from
the dispatch action in the Actions tab.

For now it will only run C48_ATM 

Resolves #3006 

Once the yaml pipeline is in `.github/workflows` directory of the
default branch we can test it against [PR
2977](#2977) which may
be needed to build on Parallel Works Centos AWS.

Code managers can check to see if the self-hosted runner
[globalworkflow_parallelworks](https://github.com/NOAA-EMC/global-workflow/settings/actions/runners/22)
is up and ready by checking the
[Running](https://github.com/NOAA-EMC/global-workflow/settings/actions/runners)
Settings.

In pending work we should also be able to spin up the cluster on demand
from GitHub as well.

<!-- For more on writing good commit messages, see
https://cbea.ms/git-commit/ -->

# Type of change
- [ ] Bug fix (fixes something broken)
- [ ] New feature (adds functionality)
- [x] Maintenance (code refactor, clean-up, new CI test, etc.)

# Change characteristics
<!-- Choose YES or NO from each of the following and delete the other
-->
- Is this a breaking change (a change in existing functionality)? YES/NO
- Does this change require a documentation update? YES/NO
- Does this change require an update to any of the following submodules?
YES/NO (If YES, please add a link to any PRs that are pending.)
  - [ ] EMC verif-global <!-- NOAA-EMC/EMC_verif-global#1234 -->
  - [ ] GDAS <!-- NOAA-EMC/GDASApp#1234 -->
  - [ ] GFS-utils <!-- NOAA-EMC/gfs-utils#1234 -->
  - [ ] GSI <!-- NOAA-EMC/GSI#1234 -->
  - [ ] GSI-monitor <!-- NOAA-EMC/GSI-Monitor#1234 -->
  - [ ] GSI-utils <!-- NOAA-EMC/GSI-Utils#1234 -->
  - [ ] UFS-utils <!-- ufs-community/UFS_UTILS#1234 -->
  - [ ] UFS-weather-model <!-- ufs-community/ufs-weather-model#1234 -->
  - [ ] wxflow <!-- NOAA-EMC/wxflow#1234 -->

# How has this been tested?
<!-- Please list any test you conducted, including the machine.

CI tests run end-to-end on an AWS CentOS-based virtual cluster on
Parallel Works.

-->

# Checklist
- [ ] Any dependent changes have been merged and published
- [x] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have documented my code, including function, input, and output
descriptions
- [ ] My changes generate no new warnings
- [ ] New and existing tests pass with my changes
- [x] This change is covered by an existing CI test or a new one has
been added
- [ ] I have made corresponding changes to the system documentation if
necessary

---------

Co-authored-by: tmcguinness <terry.mcguinness@noaa.gov>
EricSinsky-NOAA added a commit to NOAA-EMC/global-workflow that referenced this issue Oct 28, 2024
# Description

This PR brings recent changes from the develop branch to the GEFS
reforecast branch. This PR updates the GEFS reforecast branch to develop
hash ac3cde5 (10/11/2024). This version
of global-workflow uses the ufs-weather-model hash
[6a4e09e](https://github.com/ufs-community/ufs-weather-model/tree/6a4e09e94773ffa39ce7ab6a54a885efada91f21)
(9/9/2024).

Furthermore, this PR ensures the following adjustments for the
reforecast:

- [x] Speed up rocoto by grouping post job
- [x] Optimize PE configuration
- [x]  Remove duplicate OCNSPPT and EPBL settings
- [x] Set restart_interval to fhmax
- [x] Turn off SHUM in config.efcs
- [x] Set FHMIN_WAV to 3 in config.base 
- [x] Turn off ATM history file output
- [x] Change  HMS=${cyc}0000 to HMS=030000 in Wavepostpnt script (#2788)
- [x] Include YYYYMMDDHH (PDY) in job name
- [x] Change CA seed based on case and cyc for control member and
perturbed members
- [x] Fix post ensemble info
- [x] Add tob to ocean products (#2995)
- [x] Move PEVPR from b group to a group for atmos products (#2995)
- [x] Add option to download initial condition from HPSS
- [x] Add ability to download and stage replay analysis from AWS, which
is needed for the repair_replay task
- [x] Add capability to run forecasts in 7-day intervals  (#2928)
- [x] Update defaults.yaml so that many of the reforecast-specific
settings can be used by default

<!-- For more on writing good commit messages, see
https://cbea.ms/git-commit/ -->

# Type of change
- [ ] Bug fix (fixes something broken)
- [ ] New feature (adds functionality)
- [x] Maintenance (code refactor, clean-up, new CI test, etc.)

# Change characteristics
<!-- Choose YES or NO from each of the following and delete the other
-->
- Is this a breaking change (a change in existing functionality)? NO
- Does this change require a documentation update? NO
- Does this change require an update to any of the following submodules?
NO
  - [ ] EMC verif-global <!-- NOAA-EMC/EMC_verif-global#1234 -->
  - [ ] GDAS <!-- NOAA-EMC/GDASApp#1234 -->
  - [ ] GFS-utils <!-- NOAA-EMC/gfs-utils#1234 -->
  - [ ] GSI <!-- NOAA-EMC/GSI#1234 -->
  - [ ] GSI-monitor <!-- NOAA-EMC/GSI-Monitor#1234 -->
  - [ ] GSI-utils <!-- NOAA-EMC/GSI-Utils#1234 -->
  - [ ] UFS-utils <!-- ufs-community/UFS_UTILS#1234 -->
  - [ ] UFS-weather-model <!-- ufs-community/ufs-weather-model#1234 -->
  - [ ] wxflow <!-- NOAA-EMC/wxflow#1234 -->

# How has this been tested?
<!-- Please list any test you conducted, including the machine.

Example:
- Clone and build on WCOSS
- Cycled test on Orion
- Forecast-only on Hera
-->

This branch is being tested on WCOSS2. When testing has succeeded, this
PR will be marked as ready for review.

# Checklist
- [ ] Any dependent changes have been merged and published
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have documented my code, including function, input, and output
descriptions
- [ ] My changes generate no new warnings
- [ ] New and existing tests pass with my changes
- [ ] This change is covered by an existing CI test or a new one has
been added
- [ ] I have made corresponding changes to the system documentation if
necessary

---------

Co-authored-by: Wei Huang <wei.huang@noaa.gov>
Co-authored-by: Kate Friedman <kate.friedman@noaa.gov>
Co-authored-by: Cory Martin <cory.r.martin@noaa.gov>
Co-authored-by: Andrew.Tangborn <Andrew.Tangborn@noaa.gov>
Co-authored-by: Walter Kolczynski - NOAA <Walter.Kolczynski@noaa.gov>
Co-authored-by: AndrewEichmann-NOAA <58948505+AndrewEichmann-NOAA@users.noreply.github.com>
Co-authored-by: DavidBurrows-NCO <82525974+DavidBurrows-NCO@users.noreply.github.com>
Co-authored-by: AnningCheng-NOAA <48297505+AnningCheng-NOAA@users.noreply.github.com>
Co-authored-by: David Huber <69919478+DavidHuber-NOAA@users.noreply.github.com>
Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>
Co-authored-by: AntonMFernando-NOAA <167725623+AntonMFernando-NOAA@users.noreply.github.com>
Co-authored-by: BoCui-NOAA <53531984+BoCui-NOAA@users.noreply.github.com>
Co-authored-by: DavidNew-NOAA <134300700+DavidNew-NOAA@users.noreply.github.com>
Co-authored-by: Jeffrey Whitaker <jeffrey.s.whitaker@noaa.gov>
Co-authored-by: mingshichen-noaa <48537176+mingshichen-noaa@users.noreply.github.com>
Co-authored-by: Jiarui Dong <Jiarui.Dong@noaa.gov>
Co-authored-by: David Huber <david.huber@noaa.gov>
Co-authored-by: Guillaume Vernieres <guillaume.vernieres@gmail.com>
Co-authored-by: RussTreadon-NOAA <26926959+RussTreadon-NOAA@users.noreply.github.com>
Co-authored-by: Innocent Souopgui <162634017+InnocentSouopgui-NOAA@users.noreply.github.com>
Co-authored-by: Neil Barton <103681022+NeilBarton-NOAA@users.noreply.github.com>
DavidHuber-NOAA pushed a commit to NOAA-EMC/global-workflow that referenced this issue Oct 31, 2024
# Description

This update to the GitHub dispatched CI pipeline to execute the
self-hosted GitHub Runner on Parallel Works now adds the feature that
starts up the virtual compute cluster automatically. We now have a
complete end-to-end automated process for running CI tests in Parallel
Works.

Next steps would be tear-down and adding more tests to see if it scales.

It also has the update for getting a PR to load up when it's originating
from a forked repo.

# Type of change
- [ ] Bug fix (fixes something broken)
- [x] New feature (adds functionality)
- [ ] Maintenance (code refactor, clean-up, new CI test, etc.)

# Change characteristics
<!-- Choose YES or NO from each of the following and delete the other
-->
- Is this a breaking change (a change in existing functionality)? NO
- Does this change require a documentation update? YES
- Does this change require an update to any of the following submodules?
NO (If YES, please add a link to any PRs that are pending.)
  - [ ] EMC verif-global <!-- NOAA-EMC/EMC_verif-global#1234 -->
  - [ ] GDAS <!-- NOAA-EMC/GDASApp#1234 -->
  - [ ] GFS-utils <!-- NOAA-EMC/gfs-utils#1234 -->
  - [ ] GSI <!-- NOAA-EMC/GSI#1234 -->
  - [ ] GSI-monitor <!-- NOAA-EMC/GSI-Monitor#1234 -->
  - [ ] GSI-utils <!-- NOAA-EMC/GSI-Utils#1234 -->
  - [ ] UFS-utils <!-- ufs-community/UFS_UTILS#1234 -->
  - [ ] UFS-weather-model <!-- ufs-community/ufs-weather-model#1234 -->
  - [ ] wxflow <!-- NOAA-EMC/wxflow#1234 -->

# How has this been tested?

The startup aspect has been tested from my forked repo, but I could not
test repos that are forked.
The test from forked repos has to be run once the workflow pipeline
is in the **develop** branch.

# Checklist
- [x] Any dependent changes have been merged and published
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have documented my code, including function, input, and output
descriptions
- [x] My changes generate no new warnings
- [x] New and existing tests pass with my changes
- [x] This change is covered by an existing CI test or a new one has
been added
- [ ] Any new scripts have been added to the .github/CODEOWNERS file
with owners
- [ ] I have made corresponding changes to the system documentation if
necessary

---------

Co-authored-by: tmcguinness <terry.mcguinness@noaa.gov>
Co-authored-by: tmcguinness <tmcguinness@129.qarestr.sub-172-16-82.myvzw.com>
DavidHuber-NOAA added a commit to NOAA-EMC/global-workflow that referenced this issue Jan 7, 2025
# Description
As referenced in #3019, the variable 5WAVH is being removed from each
of the files `parm/wmo/grib2_awpgfs[000-240].003` and
`parm/wmo/grib2_awpgfs_20km_[ak,conus,pac,prico]f000` for the purpose of
remedying "error code 30" that was generated through the execution of
`exgfs_atmos_awips_20km_1p0deg.sh` during the GFSv17 HR4 test run.
Obsolete code is also being removed from the script
`exgfs_atmos_awips_20km_1p0deg.sh`.

No other errors mentioned in #3019 are addressed in this PR.

# Type of change
- [x] Bug fix (fixes something broken)
- [ ] New feature (adds functionality)
- [ ] Maintenance (code refactor, clean-up, new CI test, etc.)

# Change characteristics
- Is this a breaking change (a change in existing functionality)? NO
- Does this change require a documentation update? NO
- Does this change require an update to any of the following submodules?
NO
  (If YES, please add a link to any PRs that are pending.)
  - [ ] EMC verif-global <!-- NOAA-EMC/EMC_verif-global#1234 -->
  - [ ] GDAS <!-- NOAA-EMC/GDASApp#1234 -->
  - [ ] GFS-utils <!-- NOAA-EMC/gfs-utils#1234 -->
  - [ ] GSI <!-- NOAA-EMC/GSI#1234 -->
  - [ ] GSI-monitor <!-- NOAA-EMC/GSI-Monitor#1234 -->
  - [ ] GSI-utils <!-- NOAA-EMC/GSI-Utils#1234 -->
  - [ ] UFS-utils <!-- ufs-community/UFS_UTILS#1234 -->
  - [ ] UFS-weather-model <!-- ufs-community/ufs-weather-model#1234 -->
  - [ ] wxflow <!-- NOAA-EMC/wxflow#1234 -->

# How has this been tested?
Removal of variable 5WAVH from the GRIB2 files should allow completion
of TOCGRIB2 processing (within `exgfs_atmos_awips_20km_1p0deg.sh`) of
the GRIB2 files. @RuiyuSun, or the GW team, may wish to include the
requested modifications for future GFSv17 tests that include
post-processing jobs.

# Checklist
- [ ] Any dependent changes have been merged and published
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have documented my code, including function, input, and output
descriptions
- [ ] My changes generate no new warnings
- [ ] New and existing tests pass with my changes
- [ ] This change is covered by an existing CI test or a new one has
been added
- [ ] Any new scripts have been added to the .github/CODEOWNERS file
with owners
- [ ] I have made corresponding changes to the system documentation if
necessary

Co-authored-by: christopher hill <christopher.m.hill@dlogin05.dogwood.wcoss2.ncep.noaa.gov>
Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>
Co-authored-by: David Huber <69919478+DavidHuber-NOAA@users.noreply.github.com>
5 participants