Update hashes for select JEDI submodules #1298
@danholdaway, @emilyhcliu, @ADCollard, @CoryMartin-NOAA, @guillaumevernieres, @DavidNew-NOAA: When is a good time to update the submodule hashes in GDASApp? I did so in a working copy of
Each of these tests needed an updated reference file. Shall we update the submodule hashes now, or wait until we update to
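For context on how far behind the pinned hashes drift, something like the following can be run from a GDASApp clone; the `sorc/` submodule layout and the `develop` branch name are assumptions based on this repo's conventions, so treat it as a sketch rather than an official check.

```bash
#!/bin/bash
# Sketch: report how many commits each pinned submodule lags behind the
# head of its develop branch. Run from anywhere inside a GDASApp clone.
cd "$(git rev-parse --show-toplevel)"
git submodule foreach --quiet '
  # Skip submodules that do not have a develop branch upstream
  git fetch --quiet origin develop 2>/dev/null || exit 0
  behind=$(git rev-list --count HEAD..origin/develop)
  echo "$name: $behind commits behind origin/develop"
'
```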
How much do the references need to change by? It's unfortunate that we cannot know which changes in JEDI cause the references to change. Perhaps we need a nightly mechanism to track things like that closer to real time.
That seems like a very large difference for the sondes h(x). Perhaps it's OK once we understand the nature of the changes, but when we update the hashes only rarely it is hard to know which JEDI changes caused the difference. Some kind of nightly mechanism might be what we need to aim for.
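As a rough illustration of the kind of nightly mechanism being suggested, a sketch is below; the paths, `build.sh` invocation, and `gdasapp` test-name pattern are assumptions, not an existing workflow.

```bash
#!/bin/bash
# Hypothetical nightly driver: bump submodules to develop, rebuild,
# run the ctest suite, and log results against the hashes tested.
# Could be driven by cron, e.g.:  0 2 * * * /path/to/nightly_gdasapp.sh
set -euo pipefail
WORKDIR=${WORKDIR:-$HOME/nightly/GDASApp}   # placeholder location
LOG="$WORKDIR/nightly_$(date +%Y%m%d).log"
cd "$WORKDIR"

git submodule foreach 'git fetch origin develop && git checkout origin/develop' >>"$LOG" 2>&1
git submodule status >>"$LOG"               # record exactly which hashes were tested

./build.sh >>"$LOG" 2>&1 || { echo "BUILD FAILED" >>"$LOG"; exit 1; }
( cd build && ctest -R gdasapp --output-on-failure ) >>"$LOG" 2>&1 || echo "CTEST FAILURES" >>"$LOG"
```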
Comparison of the GDASApp
whereas after we update the JEDI hashes we have
Only the first three. The number of nobs from
Attempts to manually execute
The queue wait time for batch jobs is hours!
test_gdasapp_WCDA-3DVAR-C48mx500: need to consider turning off
Another issue with
Line 17 of
Got around this by making the following modification to
I don't know why they would time out more than the other tests, @RussTreadon-NOAA; most of these jobs only request one node.
@guillaumevernieres: Recently, queue wait times have been extremely long on Hera; the queue is swamped. As of 8:56 pm on 9/27/2024, 1322 jobs are in the queue: 371 running and 951 pending. Even a 1-node job will sit in the batch queue for a long time.
I understand that, @RussTreadon-NOAA, but why isolate the WCDA tests? Aren't the other batch jobs also staying in the queue forever?
@guillaumevernieres, I'm not sure why the WCDA jobs sit in the queue longer. We should compare how they are submitted with how the other test_gdasapp batch jobs are submitted. Yes, using the debug queue will certainly improve turnaround, but if we have multiple instances of the ctests running this will be problematic: debug only allows a limited number of jobs in the queue per user at any given time. Hercules, Orion, and WCOSS2 queue wait times are generally shorter, but this will vary with system load. On Hera, Hercules, and Orion I think queue wait time is also a function of how quickly we are consuming our monthly core-hour allocation.
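For reference, a couple of Slurm commands that show the backlog and route a single job to the debug QOS; the QOS/partition names and the job-card name are placeholders and will differ by machine.

```bash
# How swamped is the queue right now?
echo "pending: $(squeue --noheader -t PENDING | wc -l)"
echo "running: $(squeue --noheader -t RUNNING | wc -l)"

# Push one job through the debug QOS for faster turnaround
# (partition/QOS and job-card names below are placeholders).
sbatch --qos=debug --partition=hera --time=00:30:00 run_one_wcda_test.sh
```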
@RussTreadon-NOAA, whatever the g-w does to submit jobs is what these tests are using. What I should do is add dependencies to the ctests so we stop trying to run the entire sequence if one WCDA ctest fails.
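Until such dependencies are wired into the CMake test definitions, a lighter-weight stopgap (a different technique from the dependencies proposed above) is to let ctest itself abandon the sequence at the first failure; the `-R` pattern below is an assumption about the test naming, and `--stop-on-failure` needs a reasonably recent CMake.

```bash
# Run only the WCDA ctests and stop at the first failure instead of
# submitting the remaining jobs in the chain.
cd build
ctest -R "test_gdasapp_WCDA" --stop-on-failure --output-on-failure
```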
The current approach of using
Throughput is much better when I switch the
Work is needed in GDASApp to toggle to
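A sketch of how such a toggle could look from the submission side, relying on the fact that sbatch command-line options override `#SBATCH` directives in the job card; the `GDASAPP_QOS` variable and the job-card file name are hypothetical.

```bash
# Hypothetical toggle: default to the normal batch QOS, but let the user
# flip a single variable to send the job card to the debug QOS instead.
QOS=${GDASAPP_QOS:-batch}
sbatch --qos="$QOS" --time=00:30:00 jobcard_gdas_marinebmat.sh

# e.g. for a quick iteration:
#   GDASAPP_QOS=debug ./submit_wcda_tests.sh
```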
For the sake of completeness, here's the
For the full ctest listing, view
Work for this issue will be done in feature/update_hashes
Check
test_gdasapp_WCDA-3DVAR-C48mx500 failures: After updating JEDI hashes using
A check of
Looking at yamls in
has been replaced by
This change was made in the following files:
After this change, job gdasmarinebmat failed in a different place:
Rather than keep guessing what to change and where to change it, I'm reaching out to @guillaumevernieres. Do you know where / what to change given the above error message from gdasmarinebmat? My working copy of GDASApp uses soca at 4d7ef21 and saber at 1ca1596.
test_gdasapp_atm_jjob failures
The
After this all
Thanks for testing, @RussTreadon-NOAA. I can take care of the soca update.
Thank you, @guillaumevernieres, for your help. Feel free to push commits to feature/update_hashes.
I'm planning to spend the morning on this, @RussTreadon-NOAA.
Thank you @guillaumevernieres!
It took "a bit" longer than expected ... Oops. I have two PRs for you to review, @RussTreadon-NOAA: one in
feature/update_hashes at 3a4616e, along with changes to
from @guillaumevernieres' g-w feature/update_hashes, allow all test_gdasapp ctests to pass.
@danholdaway, @CoryMartin-NOAA, @emilyhcliu, @ADCollard, @DavidNew-NOAA, @guillaumevernieres: are we OK with the fact that updating the hashes requires updates to
If yes, I'll open a PR to get
If no, what's the next step?
I'm OK with it, and I don't see that we have much choice. Luckily we're still early in the Atm development, so our science work will catch issues. Moving forward we should find a way to monitor what changing the hashes means for our tests more frequently, so we can track the changes against changes to JCSDA code.
10/4/2024 update: Rerun
Build GDASApp with updated modules on Hercules. Rerun ctests inside g-w. All tests pass
Given this, commit updated JEDI hashes to feature/update_hashes. Done at f9ea5e4.
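For the record, once new hashes are checked out inside the submodules, recording them in the superproject is just a matter of staging the submodule paths; a minimal sketch, with example paths rather than the full set of updated submodules:

```bash
# Stage the updated gitlinks and commit them to the feature branch.
# The submodule paths listed here are examples only.
git add sorc/oops sorc/soca sorc/saber
git commit -m "Update JEDI submodule hashes to current develop heads"
git push origin feature/update_hashes
```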
Many of the JEDI hashes referenced in sorc are two or more months old. This is OK for some submodules since they do not change frequently (e.g., icepack @ 73136ee). Other submodules change more frequently; for example, oops @ e6485c0 is 18 commits behind the current head of oops develop.

Script ush/submodules/update_develop.sh updates the GDASApp hash for the following JEDI repos to the current head of their respective develop branches.

This issue is opened to document the updating of the above GDASApp submodules using update_develop.sh.
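As a rough illustration of the kind of update such a script performs (this is a generic sketch, not the actual contents of ush/submodules/update_develop.sh, and the submodule list is illustrative only):

```bash
#!/bin/bash
# Generic sketch: bump a fixed list of submodules to the current head
# of their develop branches and report the new short hashes.
set -e
cd "$(git rev-parse --show-toplevel)"
for sm in sorc/oops sorc/vader sorc/saber sorc/ioda sorc/ufo sorc/fv3-jedi sorc/soca; do
  git -C "$sm" fetch origin develop
  git -C "$sm" checkout --quiet origin/develop
  echo "$sm -> $(git -C "$sm" rev-parse --short HEAD)"
done
```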