Enable run_jedi_exe.py to run 3dvar #278

Merged on Jan 24, 2023 (20 commits)

RussTreadon-NOAA
Contributor

@RussTreadon-NOAA RussTreadon-NOAA commented Jan 17, 2023

ush/run_jedi_exe.py no longer successfully executes fv3jedi_var.x using the 3dvar yamls in ush/examples/run_jedi_exe. This functionality is restored in branch feature/runjedi. This PR is opened to get the required changes into develop.

@RussTreadon-NOAA RussTreadon-NOAA self-assigned this Jan 17, 2023
@RussTreadon-NOAA RussTreadon-NOAA added hera-GW-RT Queue for automated testing with global-workflow on Hera orion-GW-RT Queue for automated testing with global-workflow on Orion labels Jan 17, 2023
@emcbot emcbot added hera-GW-RT-Running Automated testing with global-workflow running on Hera orion-GW-RT-Running Automated testing with global-workflow running on Orion and removed hera-GW-RT Queue for automated testing with global-workflow on Hera orion-GW-RT Queue for automated testing with global-workflow on Orion labels Jan 17, 2023
@emcbot

emcbot commented Jan 17, 2023

Automated Global-Workflow GDASApp Testing Results:
Machine: hera

Start: Tue Jan 17 21:19:22 UTC 2023 on hfe07
---------------------------------------------------
Build:                                 *SUCCESS*
Build: Completed at Tue Jan 17 22:14:38 UTC 2023
---------------------------------------------------
Tests:                                 *SUCCESS*
Tests: Completed at Tue Jan 17 22:25:35 UTC 2023
Tests: 100% tests passed, 0 tests failed out of 40

@emcbot emcbot added hera-GW-RT-Passed Automated testing with global-workflow successful on Hera and removed hera-GW-RT-Running Automated testing with global-workflow running on Hera labels Jan 17, 2023
@emcbot

emcbot commented Jan 17, 2023

Automated Global-Workflow GDASApp Testing Results:
Machine: orion

Start: Tue Jan 17 15:26:40 CST 2023 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Build:                                 *SUCCESS*
Build: Completed at Tue Jan 17 16:37:54 CST 2023
---------------------------------------------------
Tests:                                 *SUCCESS*
Tests: Completed at Tue Jan 17 16:49:00 CST 2023
Tests: 100% tests passed, 0 tests failed out of 39

@emcbot emcbot added orion-GW-RT-Passed Automated testing with global-workflow successful on Orion and removed orion-GW-RT-Running Automated testing with global-workflow running on Orion labels Jan 17, 2023
Contributor

@guillaumevernieres guillaumevernieres left a comment

👍
I haven't tested the application, but I assume it's all good!

@RussTreadon-NOAA
Contributor Author

@CoryMartin-NOAA and @guillaumevernieres , I don't think one of the changes in this PR is the best way to do things.

I cut-n-pasted functions genYAML and get_runtime_config from the ush/genYAML script into run_jedi_exe.py. I execute genYAML from run_jedi_exe.py to create the yaml file used by fv3jedi_var.x.

I modified the copy of genYAML that I added to run_jedi_exe.py. It now takes two arguments: yamlconfig and output_file. output_file is the yaml that fv3jedi_var.x reads. I need to experiment to see if this addition is really necessary.

Is there a way to reference the external genYAML from run_jedi_exe.py? That would be a cleaner solution. Replicating 99% of the functions from one file in another file isn't good software design.

@CoryMartin-NOAA
Contributor

@RussTreadon-NOAA the easiest thing to do is probably move those two functions into a file in ush/ufsda, then both the ush/genYAML and run_jedi_exe.py scripts can do something like `from ufsda.newfile import genYAML, get_runtime_config`.
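
A minimal sketch of the layout suggested here, assuming the shared module ends up as `ush/ufsda/genYAML.py` (the module name that shows up in the traceback later in this thread). The function bodies are stand-ins: the real `genYAML` and `get_runtime_config` are not shown in this conversation, so the `string.Template` substitution and the environment variable names below are assumptions for illustration only.

```python
# Hypothetical contents of ush/ufsda/genYAML.py (bodies are illustrative stand-ins).
import os
import string
import sys


def get_runtime_config(env=None, keys=("CDATE", "DATA")):
    """Return the environment settings used to fill a template.

    The key names are assumptions, not the real GDASApp list.
    """
    env = os.environ if env is None else env
    return {key: env[key] for key in keys if key in env}


def genYAML(yamlconfig, output_file=None, env=None):
    """Render the template at `yamlconfig`; write to `output_file` or stdout."""
    with open(yamlconfig) as stream:
        template = string.Template(stream.read())
    # safe_substitute leaves unknown $PLACEHOLDERS untouched instead of raising.
    rendered = template.safe_substitute(get_runtime_config(env))
    if output_file is None:
        sys.stdout.write(rendered)
    else:
        with open(output_file, "w") as stream:
            stream.write(rendered)
    return rendered
```

With a single shared definition, the optional `output_file` argument lets a command-line wrapper keep writing to stdout while `run_jedi_exe.py` writes the yaml that `fv3jedi_var.x` reads, so neither script needs its own copy.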

@RussTreadon-NOAA
Contributor Author

Thank you @CoryMartin-NOAA . Let me try this.

@RussTreadon-NOAA
Contributor Author

Automated execution of test_gdasapp_genYAML_run fails. The traceback reads:

6/8 Test #6: test_gdasapp_genYAML_run .............***Failed    1.77 sec
Traceback (most recent call last):
  File "/home/runner/work/GDASApp/GDASApp/GDASApp/ush/genYAML", line 12, in <module>
    from ufsda.genYAML import genYAML
  File "/home/runner/work/GDASApp/GDASApp/GDASApp/ush/ufsda/__init__.py", line 3, in <module>
    import ufsda.stage
  File "/home/runner/work/GDASApp/GDASApp/GDASApp/ush/ufsda/stage.py", line 32, in <module>
    import ioda_conv_engines as iconv
ModuleNotFoundError: No module named 'ioda_conv_engines'

Executing ctest from the Orion command line results in a 100% pass rate. In particular, test_gdasapp_genYAML_run passes.

(gdasapp) Orion-login-3:/work2/noaa/da/rtreadon/git/GDASApp/iss223/build$ ctest -R test_gdasapp
Test project /work2/noaa/da/rtreadon/git/GDASApp/iss223/build
      Start 1295: test_gdasapp_check_python_norms
 1/28 Test #1295: test_gdasapp_check_python_norms .........   Passed    2.26 sec
      Start 1296: test_gdasapp_check_yaml_keys
 2/28 Test #1296: test_gdasapp_check_yaml_keys ............   Passed    1.25 sec
      Start 1297: test_gdasapp_check_valid_yaml
 3/28 Test #1297: test_gdasapp_check_valid_yaml ...........   Passed    2.40 sec
      Start 1298: test_gdasapp_jedi_increment_to_fv3
 4/28 Test #1298: test_gdasapp_jedi_increment_to_fv3 ......   Passed   15.34 sec
      Start 1299: test_gdasapp_genYAML_prep
 5/28 Test #1299: test_gdasapp_genYAML_prep ...............   Passed    0.03 sec
      Start 1300: test_gdasapp_genYAML_run
 6/28 Test #1300: test_gdasapp_genYAML_run ................   Passed   38.65 sec
      Start 1301: test_gdasapp_convert_ewok_yaml
 7/28 Test #1301: test_gdasapp_convert_ewok_yaml ..........   Passed    0.21 sec
      Start 1302: test_gdasapp_convert_bufr_temp_dbuoy
 8/28 Test #1302: test_gdasapp_convert_bufr_temp_dbuoy ....   Passed    3.61 sec
      Start 1303: test_gdasapp_convert_bufr_salt_dbuoy
 9/28 Test #1303: test_gdasapp_convert_bufr_salt_dbuoy ....   Passed    0.23 sec
      Start 1304: test_gdasapp_convert_bufr_temp_mbuoyb
10/28 Test #1304: test_gdasapp_convert_bufr_temp_mbuoyb ...   Passed    0.24 sec
      Start 1305: test_gdasapp_convert_bufr_salt_mbuoyb
11/28 Test #1305: test_gdasapp_convert_bufr_salt_mbuoyb ...   Passed    0.23 sec
      Start 1306: test_gdasapp_convert_bufr_tesacprof
12/28 Test #1306: test_gdasapp_convert_bufr_tesacprof .....   Passed    0.25 sec
      Start 1307: test_gdasapp_convert_bufr_trkobprof
13/28 Test #1307: test_gdasapp_convert_bufr_trkobprof .....   Passed    0.23 sec
      Start 1308: test_gdasapp_convert_bufr_sfcships
14/28 Test #1308: test_gdasapp_convert_bufr_sfcships ......   Passed    0.25 sec
      Start 1309: test_gdasapp_convert_bufr_sfcshipsu
15/28 Test #1309: test_gdasapp_convert_bufr_sfcshipsu .....   Passed    0.24 sec
      Start 1310: test_gdasapp_soca_obsdb
16/28 Test #1310: test_gdasapp_soca_obsdb .................   Passed    1.41 sec
      Start 1311: test_gdasapp_soca_ana_prep
17/28 Test #1311: test_gdasapp_soca_ana_prep ..............   Passed    3.72 sec
      Start 1312: test_gdasapp_soca_ana_bmat
18/28 Test #1312: test_gdasapp_soca_ana_bmat ..............   Passed  124.62 sec
      Start 1313: test_gdasapp_soca_ana_run
19/28 Test #1313: test_gdasapp_soca_ana_run ...............   Passed   58.21 sec
      Start 1314: test_gdasapp_soca_ana_bmat_vrfy
20/28 Test #1314: test_gdasapp_soca_ana_bmat_vrfy .........   Passed   47.95 sec
      Start 1315: test_gdasapp_land_create_ens
21/28 Test #1315: test_gdasapp_land_create_ens ............   Passed    1.16 sec
      Start 1316: test_gdasapp_land_imsproc
22/28 Test #1316: test_gdasapp_land_imsproc ...............   Passed    9.87 sec
      Start 1317: test_gdasapp_land_apply_jediincr
23/28 Test #1317: test_gdasapp_land_apply_jediincr ........   Passed    4.29 sec
      Start 1318: test_gdasapp_land_letkfoi_snowda
24/28 Test #1318: test_gdasapp_land_letkfoi_snowda ........   Passed    9.05 sec
      Start 1319: test_gdasapp_convert_bufr_adpsfc
25/28 Test #1319: test_gdasapp_convert_bufr_adpsfc ........   Passed    9.05 sec
      Start 1320: test_gdasapp_convert_gsi_satbias
26/28 Test #1320: test_gdasapp_convert_gsi_satbias ........   Passed    4.34 sec
      Start 1321: test_gdasapp_store_gsi_satbias
27/28 Test #1321: test_gdasapp_store_gsi_satbias ..........   Passed    6.35 sec
      Start 1322: test_gdasapp_aero_gen_3dvar_yaml
28/28 Test #1322: test_gdasapp_aero_gen_3dvar_yaml ........   Passed    0.20 sec

100% tests passed, 0 tests failed out of 28

Total Test time (real) = 346.02 sec

Not sure what's going on.

@RussTreadon-NOAA
Contributor Author

@CoryMartin-NOAA and @guillaumevernieres , is there a way to manually run the "Unit Tests / Run Unit Tests with ctest (push)" check? I would like to debug and fix what's wrong without cluttering branch feature/runjedi with debug commits.

@CoryMartin-NOAA
Contributor

@RussTreadon-NOAA this failure is because there is a dependency on IODA in that routine that @guillaumevernieres put in... Not immediately sure what the best solution here is though. As for manually running it on GitHub Actions: you can re-run on the same commit manually from the Actions page.
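
The failure mode in the traceback above can be reproduced in isolation. The sketch below builds a throwaway package (`demo_pkg`, with a made-up `missing_optional_dependency` standing in for `ioda_conv_engines`) and imports a dependency-free submodule twice: once with an `__init__.py` that eagerly imports `stage`, and once with an empty `__init__.py`. All names here are hypothetical; only the import mechanics mirror what the CI runner hit.

```python
import importlib
import pathlib
import sys
import tempfile


def write_package(root, init_body):
    """Create a throwaway package `demo_pkg` under `root`."""
    pkg = pathlib.Path(root) / "demo_pkg"
    pkg.mkdir(parents=True, exist_ok=True)
    (pkg / "__init__.py").write_text(init_body)
    # stage.py needs a module that is not installed on this machine.
    (pkg / "stage.py").write_text("import missing_optional_dependency\n")
    # gen.py has no optional dependencies at all.
    (pkg / "gen.py").write_text("def gen_yaml():\n    return 'ok'\n")


def try_import(root, init_body):
    """Import demo_pkg.gen; return its result or the name of the missing module."""
    write_package(root, init_body)
    sys.path.insert(0, root)
    try:
        # Forget any earlier copy so each call starts from a clean import state.
        for name in [m for m in sys.modules if m.split(".")[0] == "demo_pkg"]:
            del sys.modules[name]
        importlib.invalidate_caches()
        module = importlib.import_module("demo_pkg.gen")
        return module.gen_yaml()
    except ModuleNotFoundError as err:
        return f"ModuleNotFoundError: {err.name}"
    finally:
        sys.path.remove(root)


with tempfile.TemporaryDirectory() as eager_root:
    eager = try_import(eager_root, "import demo_pkg.stage\n")
with tempfile.TemporaryDirectory() as lazy_root:
    lazy = try_import(lazy_root, "")

print(eager)
print(lazy)
```

The eager variant fails even though `gen.py` never touches the missing module, because Python runs the package `__init__.py` before any submodule. That matches a test passing on Orion, where the IODA Python modules are on the path, while failing on a runner where they are not.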

@emcbot

emcbot commented Jan 23, 2023

Automated Global-Workflow GDASApp Testing Results:
Machine: hera

Start: Mon Jan 23 14:34:18 UTC 2023 on hfe07
---------------------------------------------------
Build:                                 *SUCCESS*
Build: Completed at Mon Jan 23 15:30:54 UTC 2023
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Mon Jan 23 15:37:19 UTC 2023
Tests: 90% tests passed, 4 tests failed out of 40
	1323 - test_gdasapp_atm_jjob_var_prep (Failed)
	1324 - test_gdasapp_atm_jjob_var_run (Failed)
	1326 - test_gdasapp_atm_jjob_ens_prep (Failed)
	1327 - test_gdasapp_atm_jjob_ens_run (Failed)
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/278/global-workflow/sorc/gdas.cd/build/log.ctest

@emcbot emcbot added hera-GW-RT-Failed Automated testing with global-workflow failed on Hera and removed hera-GW-RT-Running Automated testing with global-workflow running on Hera labels Jan 23, 2023
@emcbot

emcbot commented Jan 23, 2023

Automated Global-Workflow GDASApp Testing Results:
Machine: orion

Start: Mon Jan 23 08:32:39 CST 2023 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Build:                                 *SUCCESS*
Build: Completed at Mon Jan 23 09:32:45 CST 2023
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Mon Jan 23 09:39:32 CST 2023
Tests: 90% tests passed, 4 tests failed out of 39
	1322 - test_gdasapp_atm_jjob_var_prep (Failed)
	1323 - test_gdasapp_atm_jjob_var_run (Failed)
	1325 - test_gdasapp_atm_jjob_ens_prep (Failed)
	1326 - test_gdasapp_atm_jjob_ens_run (Failed)
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/278/global-workflow/sorc/gdas.cd/build/log.ctest
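
The Hera and Orion reports number the failed tests differently because the two builds register 40 and 39 tests, so comparing by test name is more reliable than comparing by number. A small sketch, assuming only the `NNNN - name (Failed)` line format visible in the two reports above; the helper name `failed_tests` is made up for this example.

```python
import re

# Matches lines such as "1323 - test_gdasapp_atm_jjob_var_prep (Failed)".
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(Failed\)\s*$")


def failed_tests(report):
    """Return {test_name: test_number} for every failed test in a ctest summary."""
    failures = {}
    for line in report.splitlines():
        match = FAILED_LINE.match(line)
        if match:
            failures[match.group(2)] = int(match.group(1))
    return failures


HERA_REPORT = """\
Tests: 90% tests passed, 4 tests failed out of 40
    1323 - test_gdasapp_atm_jjob_var_prep (Failed)
    1324 - test_gdasapp_atm_jjob_var_run (Failed)
    1326 - test_gdasapp_atm_jjob_ens_prep (Failed)
    1327 - test_gdasapp_atm_jjob_ens_run (Failed)
"""

ORION_REPORT = """\
Tests: 90% tests passed, 4 tests failed out of 39
    1322 - test_gdasapp_atm_jjob_var_prep (Failed)
    1323 - test_gdasapp_atm_jjob_var_run (Failed)
    1325 - test_gdasapp_atm_jjob_ens_prep (Failed)
    1326 - test_gdasapp_atm_jjob_ens_run (Failed)
"""

hera = failed_tests(HERA_REPORT)
orion = failed_tests(ORION_REPORT)
same_failures = set(hera) == set(orion)
print(same_failures)
print(sorted(hera))
```

Both machines fail the same four `atm_jjob` prep and run tests, which points at a shared cause rather than a platform-specific one.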

@emcbot emcbot added orion-GW-RT-Failed Automated testing with global-workflow failed on Orion and removed orion-GW-RT-Running Automated testing with global-workflow running on Orion labels Jan 23, 2023
@CoryMartin-NOAA
Contributor

@RussTreadon-NOAA I didn't clearly read the comments above earlier... do I need to checkout that specific branch of global workflow to run the CI or are we going to assume that it worked as you tested it manually? I'm happy to do it I just wanted to check that this is the correct thing to do.

@RussTreadon-NOAA
Contributor Author

> @RussTreadon-NOAA I didn't clearly read the comments above earlier... do I need to checkout that specific branch of global workflow to run the CI or are we going to assume that it worked as you tested it manually? I'm happy to do it I just wanted to check that this is the correct thing to do.

We need to use the updated scripts/exgdas_global_atmos_analysis_prep.py from my forked g-w branch feature/ufsda_stage to get GDASApp test_gdasapp_atm_jjob_var_prep and test_gdasapp_atm_jjob_ens_prep to pass. Once the prep steps pass the corresponding run steps will also pass.

Once this PR, #278, is approved & merged into GDASApp develop, I'll update the GDASApp hash in g-w Externals.cfg and sorc/checkout.sh in g-w PR #1265. Once g-w PR #1265 is merged into g-w develop, the GDASApp GW-RT tests should work. This is messy.
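
The hash bump described above is a one-key edit in an INI-style file. The sketch below is only an illustration: it assumes `Externals.cfg` has a `[GDASApp]` section with a `hash` key, and the other keys and the placeholder starting hash are invented for the example. The abbreviated hash `f7c23af` is the merge commit reported for this PR; a real update may well use the full hash.

```python
import configparser
import io


def set_external_hash(cfg_text, section, new_hash):
    """Return `cfg_text` with the `hash` key of `section` set to `new_hash`."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section(section):
        raise KeyError(f"no [{section}] section found")
    parser.set(section, "hash", new_hash)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Assumed layout; the section name, keys, and old hash are placeholders.
EXAMPLE_CFG = """\
[GDASApp]
hash = 0000000
local_path = sorc/gdas.cd
repo_url = https://github.com/NOAA-EMC/GDASApp.git
protocol = git
required = False
"""

updated = set_external_hash(EXAMPLE_CFG, "GDASApp", "f7c23af")
print(updated)
```

Note that `configparser` drops comments when it rewrites a file, so for a hand-maintained config a manual edit may be preferable. The same hash also has to be changed in `sorc/checkout.sh`, which this sketch does not touch.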

@CoryMartin-NOAA
Contributor

Yeah I see it is a bit of a chicken and egg problem. The danger of having it use your forked branch is that I have to remember to change it back to develop :-). Do we think I should do that or run it manually and trust it will work?

@RussTreadon-NOAA
Contributor Author

> Yeah I see it is a bit of a chicken and egg problem. The danger of having it use your forked branch is that I have to remember to change it back to develop :-). Do we think I should do that or run it manually and trust it will work?

I really like to cross all the t's and dot all the i's, but doing so is probably overkill in this case. You, @CoryMartin-NOAA , have more important things to do than keep changing the g-w branch used by GW-RT.

@CoryMartin-NOAA
Contributor

I agree, while it's important to be thorough, we are still in active development, and the danger of this breaking something seems low. We can always revert if that is not the case.

@RussTreadon-NOAA
Contributor Author

As an aside, do we know if 3dhofx and 4dhofx work with run_jedi_exe.py? A quick test using feature/runjedi says no.

Do we want to maintain the 3dhofx and 4dhofx capability with run_jedi_exe.py moving forward? If yes, this can be the scope of a new issue.

@CoryMartin-NOAA
Contributor

We will need this capability for JEDI evaluation when we are ready to use FV3 cube sphere backgrounds and not GSI provided interpolated values. However, I don't know if we will want it as part of run_jedi_exe or part of the j-job approach. I'm inclined to say the former, but depending on how the EnKF is structured, perhaps we will need to run H(x) as stand alone in the workflow?

@RussTreadon-NOAA
Contributor Author

At present JGDAS_GLOBAL_ATMOS_ENSANAL_RUN runs the observer and solver in a single execution of fv3jedi_letkf.x. This was the easiest way to implement fv3jedi_letkf.x for prototype cycling. This approach is very slow. The next iteration should refactor g-w to mimic the operational GFS where we have eobs (observer) and eupd (solver).

Given this we should ensure run_jedi_exe.py can run fv3jedi_hofx.x, fv3jedi_letkf.x, and fv3jedi_var.x in standalone mode. Sometimes we want to work with executables and yamls without the g-w infrastructure.
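
One way to picture the standalone requirement is a small dispatch table from application label to executable. This is a sketch under stated assumptions, not how `run_jedi_exe.py` is actually structured: the labels `var`, `hofx`, and `letkf`, the `srun` launcher, and the helper `build_command` are all hypothetical. Only the three executable names come from this thread, and the default of 6 tasks echoes the 6 MPI tasks in the timing output further down.

```python
import shlex

# Assumed mapping from an application label to its JEDI executable.
EXECUTABLES = {
    "var": "fv3jedi_var.x",
    "hofx": "fv3jedi_hofx.x",
    "letkf": "fv3jedi_letkf.x",
}


def build_command(app, yaml_file, ntasks=6, launcher="srun"):
    """Return the argv list for running one JEDI executable outside the workflow."""
    if app not in EXECUTABLES:
        raise ValueError(
            f"unsupported app {app!r}; choose from {sorted(EXECUTABLES)}"
        )
    return [launcher, "-n", str(ntasks), EXECUTABLES[app], yaml_file]


print(shlex.join(build_command("var", "3dvar.yaml")))
```

Keeping the mapping in one place means adding a new application is a one-line change, and an unsupported label fails early with a clear message instead of at launch time.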

@RussTreadon-NOAA
Contributor Author

Completed the following:

  • sync GDASApp feature/runjedi and g-w feature/ufsda_stage with head of their respective develop
  • rebuild and install g-w with GDASApp workflow tests enabled
  • manually execute ctest -R test_gdasapp. All tests pass on Hera
100% tests passed, 0 tests failed out of 40

Total Test time (real) = 683.75 sec
(gdasapp) Hera(hfe09):/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/pr1265/sorc/gdas.cd/build$

and Orion

100% tests passed, 0 tests failed out of 40

Total Test time (real) = 925.02 sec
(gdasapp) Orion-login-3:/work2/noaa/da/rtreadon/git/global-workflow/pr1265/sorc/gdas.cd/build$
  • execute ush/run_jedi_exe.py for 3dvar. fv3jedi_var.x successfully runs to completion with the caveat that the number of observation types in parm/atm/obs/lists/gdas_prototype.yaml is reduced to allow the job to complete within the 30-minute debug queue wall clock limit.
    Hera
OOPS_STATS ---------------------------------- Parallel Timing Statistics (   6 MPI tasks) -----------------------------------

OOPS_STATS Run end                                  - Runtime:    401.87 sec,  Memory: total:    56.03 Gb, per task: min =     9.25 Gb, max =     9.43 Gb
Run: Finishing oops::Variational<FV3JEDI, UFO and IODA observations> with status = 0
OOPS Ending   2023-01-23 18:50:42 (UTC+0000)
(gdasapp) Hera(hfe09):/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/pr1265/sorc/gdas.cd/ush$

and Orion

OOPS_STATS ---------------------------------- Parallel Timing Statistics (   6 MPI tasks) -----------------------------------

OOPS_STATS Run end                                  - Runtime:    428.68 sec,  Memory: total:    55.39 Gb, per task: min =     9.08 Gb, max =     9.41 Gb
Run: Finishing oops::Variational<FV3JEDI, UFO and IODA observations> with status = 0
OOPS Ending   2023-01-23 19:07:38 (UTC+0000)
(gdasapp) Orion-login-3:/work2/noaa/da/rtreadon/git/global-workflow/pr1265/sorc/gdas.cd/ush$

Given the above, change this PR from draft to active.

Contributor

@CoryMartin-NOAA CoryMartin-NOAA left a comment

Everything here looks good to me, thanks @RussTreadon-NOAA !

@RussTreadon-NOAA
Contributor Author

@guillaumevernieres , would you like to take another look at this PR or are you OK with your previous approval?

I'd like to merge this PR into GDASApp develop today (1/24) so I can update the GDASApp hash in g-w PR #1265 and get that PR into g-w develop.

@guillaumevernieres
Contributor

> @guillaumevernieres , would you like to take another look at this PR or are you OK with your previous approval?
>
> I'd like to merge this PR into GDASApp develop today (1/24) so I can update the GDASApp hash in g-w PR #1265 and get that PR into g-w develop.

All good @RussTreadon-NOAA ! Sorry for holding up the merge.

Contributor

@guillaumevernieres guillaumevernieres left a comment

👍

@guillaumevernieres guillaumevernieres merged commit f7c23af into develop Jan 24, 2023
@RussTreadon-NOAA
Contributor Author

Thank you @guillaumevernieres!

@RussTreadon-NOAA RussTreadon-NOAA deleted the feature/runjedi branch January 24, 2023 14:00
WalterKolczynski-NOAA pushed a commit to NOAA-EMC/global-workflow that referenced this pull request Jan 25, 2023
…pdate (#1265)

Updates the GDAS App version to incorporate the changes of NOAA-EMC/GDASApp#278, which restores the ability of `run_jedi_exe.py` to execute UFS-DA applications, specifically `fv3jedi_var.x`. Included in NOAA-EMC/GDASApp#278 is the removal of entry `ufsda.stage` from `ush/ufsda/__init__.py`. Scripts which simply `import ufsda` must now explicitly import the functions they need from `ufsda.stage`.

g-w issue #1262 documents the addition of the `ufsda.stage` line to `scripts/exgdas_global_atmos_analysis_prep.py`.  This change is required for g-w to successfully stage files used by the var and ensemble UFS-DA applications.

Fixes #1262
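
For callers, the practical consequence of dropping a submodule from a package `__init__.py` is that `import pkg` alone no longer makes `pkg.submodule` available. The sketch below shows this with a throwaway package; `demo_da` and `obs_stage` are made-up names, not the real `ufsda` API.

```python
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    pkg = pathlib.Path(root) / "demo_da"
    pkg.mkdir()
    # Empty __init__.py: the package no longer imports its submodules eagerly.
    (pkg / "__init__.py").write_text("")
    (pkg / "stage.py").write_text("def obs_stage():\n    return 'staged'\n")
    sys.path.insert(0, root)
    try:
        import demo_da
        # The submodule is not an attribute of the package until it is imported.
        available_before = hasattr(demo_da, "stage")
        import demo_da.stage
        result = demo_da.stage.obs_stage()
    finally:
        sys.path.remove(root)

print(available_before)
print(result)
```

A script that previously relied on the bare package import therefore needs one extra explicit import line, which is the kind of change described for `scripts/exgdas_global_atmos_analysis_prep.py`.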
Labels
hera-GW-RT-Failed Automated testing with global-workflow failed on Hera hera-RT-Passed Automated testing successful on Hera orion-GW-RT-Failed Automated testing with global-workflow failed on Orion orion-RT-Passed Automated testing successful on Orion