fixed excessive evaporation when both innerloop and mraerosol=T #2221

Merged
31 commits merged on May 2, 2024

@AnningCheng-NOAA
Contributor

AnningCheng-NOAA commented Apr 1, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

NWFA-induced evaporation is turned off, but evaporation not related to aerosol is turned on, to prevent excessive evaporation when the inner loop is used together with mraerosol=T.

Commit Message:

NWFA-induced evaporation is turned off, but evaporation not related to aerosol is turned on, to prevent excessive evaporation when the inner loop is used together with mraerosol=T.

Priority:

  • Normal

Git Tracking

Sub component Pull Requests:


Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Updates/Changes Baselines.

Input data Changes:

  • None.

Library Changes/Upgrades:

  • No Updates

RegressionTests_hera.log

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@grantfirl
Collaborator

@AnningCheng-NOAA I'm running UFS RTs on Hera right now and will upload the test_changes.list file when finished. I'm not sure what happened to the PR template when you edited it, but it looks different than other PRs.

@AnningCheng-NOAA
Contributor Author

AnningCheng-NOAA commented Apr 25, 2024 via email

@grantfirl
Collaborator

No, this PR: #2221

The description format looks like it is messed up somehow. Also, I fixed .gitmodules in your ufs-weather-model and fv3atm branches (the branches were listed as innl when they should have been mr2_innl), and the fv3atm PR was pointing to old commits of upp and atmos_cubed_sphere. I think it should all be fixed now.
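
A branch-name mismatch like the one described above can be caught before pushing with a small check. The following is only a sketch under stated assumptions: the helper names, the sample submodule entries, and the placeholder URLs are hypothetical and are not taken from the actual ufs-weather-model or fv3atm .gitmodules files.

```python
import re

# Matches a section header such as: [submodule "FV3"]
SECTION = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')
# Matches an indented "key = value" line
KEYVAL = re.compile(r'^\s*(?P<key>[A-Za-z][\w-]*)\s*=\s*(?P<value>.*?)\s*$')


def submodule_branches(text):
    """Return {submodule name: branch} for every entry that sets a branch."""
    branches = {}
    current = None
    for line in text.splitlines():
        header = SECTION.match(line.strip())
        if header:
            current = header.group("name")
            continue
        entry = KEYVAL.match(line)
        if entry and current is not None and entry.group("key") == "branch":
            branches[current] = entry.group("value")
    return branches


def mismatched(text, expected):
    """Return the submodules whose branch differs from the expected name."""
    return {name: branch
            for name, branch in submodule_branches(text).items()
            if branch != expected}


# Hypothetical .gitmodules content; names and URLs are placeholders.
SAMPLE = '''[submodule "FV3"]
\tpath = FV3
\turl = https://example.invalid/fork/fv3atm
\tbranch = innl
[submodule "other"]
\tpath = other
\turl = https://example.invalid/fork/other
\tbranch = mr2_innl
'''

if __name__ == "__main__":
    print(mismatched(SAMPLE, "mr2_innl"))
```

Run against the sample, the check reports only the entry that still names innl; entries without a branch key are ignored.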

@AnningCheng-NOAA
Contributor Author

@grantfirl, I have just updated the PR description; it should look a little better now.

@zach1221
Collaborator

@grantfirl @AnningCheng-NOAA can you please sync up the branch for the PR?

@BrianCurtis-NOAA
Collaborator

@grantfirl @AnningCheng-NOAA It was a bit hard to follow which ccpp-framework PR was going into this, but I think I figured it out and linked the correct one in the PR description. Please double check that. What I have not done yet is make sure that the branch for the ccpp-framework PR matches the one in the fv3atm PR. Could you please double check that as well?

Also make sure you keep your test_changes.list when bumping the branch/fixing conflicts.

@grantfirl
Collaborator

@BrianCurtis-NOAA @AnningCheng-NOAA I can get to this around 4ET today. Please stand by.

@grantfirl
Collaborator

> @grantfirl @AnningCheng-NOAA It was a bit hard to follow which ccpp-framework PR was going into this, but I think I figured it out and linked the correct one in the PR description. Please double check that. What I have not done yet is make sure that the branch for the ccpp-framework PR matches the one in the fv3atm PR. Could you please double check that as well?
>
> Also make sure you keep your test_changes.list when bumping the branch/fixing conflicts.

Sorry about the CCPP Framework PR issue. Since this is Anning's PR, I can't edit the description, but you figured out the correct one: NCAR/ccpp-framework#555

@zach1221 added the Baseline Updates and Ready for Commit Queue labels Apr 30, 2024
@zach1221 added and then removed the hercules-RT, orion-RT, and derecho-RT labels May 1, 2024
@zach1221
Collaborator

zach1221 commented May 1, 2024

Orion is hanging for me, and I have been unable to run any jobs since last night. I've reached out to MSU to see if there is a broader issue.

@FernandoAndrade-NOAA
Collaborator

There seems to be an ecflow connection issue on Jet. Some tests make it through before connection errors begin to occur. I'm messaging Jet admins to get a support ticket opened and looked into. The RT log will have some of the tests that made it through but it will end without the rest of the tests or the test summary.

@BrianCurtis-NOAA
Collaborator

> There seems to be an ecflow connection issue on Jet. Some tests make it through before connection errors begin to occur. I'm messaging Jet admins to get a support ticket opened and looked into. The RT log will have some of the tests that made it through but it will end without the rest of the tests or the test summary.

I have not seen this on WCOSS2/Acorn, so I would lean towards a Jet-specific issue. We've made some small ecflow changes necessary for Acorn/WCOSS2 that will be brought in when those RTs complete, but if you want to see whether those changes help or hinder things (I don't think they will be impactful, though), you can git pull from https://github.com/BrianCurtis-NOAA/ufs-weather-model/tree/ecflow_fixes

@FernandoAndrade-NOAA
Collaborator

> > There seems to be an ecflow connection issue on Jet. Some tests make it through before connection errors begin to occur. I'm messaging Jet admins to get a support ticket opened and looked into. The RT log will have some of the tests that made it through but it will end without the rest of the tests or the test summary.
>
> I have not seen this on WCOSS2/Acorn, so I would lean towards a Jet-specific issue. We've made some small ecflow changes necessary for Acorn/WCOSS2 that will be brought in when those RTs complete, but if you want to see whether those changes help or hinder things (I don't think they will be impactful, though), you can git pull from https://github.com/BrianCurtis-NOAA/ufs-weather-model/tree/ecflow_fixes

I think this is likely a Jet-specific issue as well; I didn't run into these connection errors on Hera or Gaea. Typically I only see these errors at the beginning, when the script needs to start up ecflow, but these are showing up in the middle of testing:

    ECFLOW Tasks Remaining: 159/192
    [08:08:04 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
    ECFLOW Tasks Remaining: 159/192
    ECFLOW Tasks Remaining: 159/192

And on the final occurrence, the remaining tasks seem to just be cancelled:

    ECFLOW Tasks Remaining: 15/192
    [10:20:10 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
    [10:21:20 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
    [10:21:20 1.5.2024] Request( --get_state=/regtest_2304630 ), Failed to connect to ecflow1:22548. After 2 attempts. Is the server running ?
    ClientEnvironment:
    [10:21:20 1.5.2024] Ecflow version(5.11.4) boost(1.83.0) compiler(gcc 13.2.0) protocol(JSON cereal 1.3.0) openssl(enabled) Compiled on Feb 21 2024 15:55:17
       ECF_HOST/ECF_PORT : host_vec_index_ = 0 host_vec_.size() = 1
       ecflow1:22548
       ECF_NAME =
       ECF_PASS =
       ECF_RID =
       ECF_TRYNO = 1
       ECF_HOSTFILE = /apps/ecflow/5.11.4/share/ecflow/etc/hostfile
       ECF_TIMEOUT = 86400
       ECF_ZOMBIE_TIMEOUT = 43200
       ECF_CONNECT_TIMEOUT = 0
       ECF_DENIED = 0
       NO_ECF = 0
       ECF_DEBUG_CLIENT = 0


    ECFLOW Tasks Remaining: 0/192
    rt.sh: Generating Regression Testing Log...

It looks like the sudden log cutoff is from being unable to find the results for the leftover tasks:

    grep: /mnt/lfs4/HFIP/h-nems/Fernando.Andrade-maldonado/regression-testing/wm/2221/ufs-weather-model/tests/logs/log_jet/rt_rap_control_intel.log: No such file or directory
    + GETMEMFROMLOG=
    + echo 'rt.sh finished'
    rt.sh finished
    + cleanup
    + echo 'rt.sh: Cleaning up...'
    rt.sh: Cleaning up...
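
To see at a glance when the connection errors occurred and how many tasks were still outstanding each time, the excerpts above can be scanned with a short script. This is a minimal sketch, not part of rt.sh: the function name is hypothetical, and the patterns are inferred only from the log lines quoted in this comment, so other ecflow versions may format these messages differently.

```python
import re

# Patterns inferred from the log excerpts quoted above (an assumption,
# not an official ecflow log format specification).
REMAINING = re.compile(r"ECFLOW Tasks Remaining: (\d+)/(\d+)")
CONN_ERR = re.compile(
    r"\[(\d{2}:\d{2}:\d{2}) [\d.]+\] ClientInvoker: Connection error"
)


def connection_errors(lines):
    """Return (timestamp, tasks_remaining) for each connection error.

    tasks_remaining is the most recent "Tasks Remaining" count seen
    before the error, or None if no count has been seen yet.
    """
    errors = []
    last_remaining = None
    for line in lines:
        remaining = REMAINING.search(line)
        if remaining:
            last_remaining = int(remaining.group(1))
            continue
        error = CONN_ERR.search(line)
        if error:
            errors.append((error.group(1), last_remaining))
    return errors


# Abbreviated sample built from the lines quoted in this comment.
SAMPLE_LOG = """\
ECFLOW Tasks Remaining: 159/192
[08:08:04 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
ECFLOW Tasks Remaining: 159/192
ECFLOW Tasks Remaining: 15/192
[10:20:10 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
[10:21:20 1.5.2024] ClientInvoker: Connection error: (Client::check_deadline: timed out after 60 seconds for request( --get_state=/regtest_2304630 ) on ecflow1:22548)
ECFLOW Tasks Remaining: 0/192
"""

if __name__ == "__main__":
    for timestamp, remaining in connection_errors(SAMPLE_LOG.splitlines()):
        print(f"{timestamp}: connection error with {remaining} tasks remaining")
```

Because the patterns use `search`, the script also tolerates a leading editor line number on each line, but a message that has been wrapped across two lines is matched only by its first half.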

@FernandoAndrade-NOAA
Collaborator

I'll retry with rocoto while I wait for a reply from the Jet admins.

@zach1221
Collaborator

zach1221 commented May 1, 2024

Looks like Orion is available again; I was finally able to submit jobs.

@zach1221
Collaborator

zach1221 commented May 1, 2024

Testing complete. We can move on to merging the ccpp-physics and ccpp-framework sub-PRs.

@grantfirl
Collaborator

@jkbk2004 @zach1221 FV3 submodule updated and .gitmodules reverted.

@jkbk2004 jkbk2004 merged commit 26cb9e6 into ufs-community:develop May 2, 2024
3 checks passed

7 participants