Adjust prep job resources on Orion due to recent system updates #463
Comments
If this is due to the recent changes, throwing more nodes at it is a waste of resources. Need to increase the memory. |
Walter,
Is there a way to do that in the model or experiment configuration? I'm
encountering the same problem after a few successful cycles.
Andy
…On Thu, Oct 14, 2021 at 3:57 PM Walter Kolczynski - NOAA < ***@***.***> wrote:
If this is due to the recent changes, throwing more nodes at it is a waste
of resources. Need to increase the memory.
--
Andrew Eichmann
IMSG at NOAA/NWS/NCEP/EMC
5830 University Research Court, Office #2875
College Park, MD 20740 USA
***@***.***
cell (for the duration): 401-477-2702
office: 301-683-0506
|
After you generate the workflow, add a <memory> tag specification (see the rocoto documentation) to the task specification for any task that is having problems. That should solve the problem, but it hasn't been tested yet. |
Hi Walter,
I'm guessing you mean adding an entry along the lines of MEMORY_PREP_GDAS
to the xml file and specifying a larger-than-default memory allocation? The
current default is 4608 MB - do you know what it was before?
Thanks,
Andy
On Fri, Oct 15, 2021 at 3:39 PM Walter Kolczynski - NOAA <
***@***.***> wrote:
After you generate the workflow, add a <memory> tag specification (see the rocoto
documentation
<http://christopherwharrop.github.io/rocoto/#:~:text=10%3C/walltime%3E%0A%0A%20%20%3C/task%20%3E-,The%20%3Cmemory%3E%20tag,-The%20%3Cmemory%3E%20tag>)
to the task specification for any task that is having problems. That
*should* solve the problem, but it hasn't been tested yet.
|
Yes, although the important part is where it is actually added to the task description with the <memory> tags. The 4608 MB is the default per-core, but the setting in the XML will be for the total memory. Formerly, Orion was setting the limit as the total memory for the node (192 GB/node). You can go back to 2 nodes and ask for 384 GB, which if I understand things correctly should be the same as before. |
As you indicated I added <memory>&MEMORY_PREP_GDAS;</memory> to the "task"
section.
I tried specifying 384G and 192G, which both got rejected, and then 96G,
which completed in a couple minutes. Thank you!
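The manual edit described above could also be scripted. The following is only a minimal sketch: the workflow fragment, the task name `gdasprep`, and the helper `set_task_memory` are hypothetical, and it writes a literal value rather than an entity such as `&MEMORY_PREP_GDAS;`. Note that `xml.etree.ElementTree` does not keep the DOCTYPE or entity declarations when it serializes, so for a real generated workflow that relies on entities, editing by hand as described in the thread is probably the safer route.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed workflow fragment; a real generated
# workflow has many more tasks, attributes, and entity definitions.
WORKFLOW = """<workflow>
  <task name="gdasprep">
    <nodes>2:ppn=2</nodes>
    <walltime>00:30:00</walltime>
  </task>
</workflow>"""


def set_task_memory(root, task_name, memory):
    """Add or update the <memory> child of the named <task>.

    Returns True if the task was found, False otherwise.
    """
    for task in root.iter("task"):
        if task.get("name") == task_name:
            mem = task.find("memory")
            if mem is None:
                mem = ET.SubElement(task, "memory")
            mem.text = memory  # total memory for the job, e.g. "96G"
            return True
    return False


root = ET.fromstring(WORKFLOW)
found = set_task_memory(root, "gdasprep", "96G")  # 96G is the value that worked in the thread
print(ET.tostring(root, encoding="unicode"))
```

The value goes inside the `<task>` element itself, which is the part Walter flags as important; defining a memory entity without referencing it in the task would have no effect.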
…On Mon, Oct 18, 2021 at 1:05 PM Walter Kolczynski - NOAA < ***@***.***> wrote:
Yes, although the important part is where it is actually added to the task
description with the <memory> tags.
The 4608 MB is the default per-core, but the setting in the XML will be
for the total memory. Formerly, Orion was setting the limit as the total
memory for the node (192 GB/node). You can go back to 2 nodes and ask for
384 GB, which if I understand things correctly should be the same as before.
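To make the per-core versus total distinction concrete, here is a small arithmetic sketch. It assumes that, when no total is requested, the scheduler grants the per-core default times the number of allocated cores, and that 1 GB = 1024 MB; the core count of 4 is a hypothetical example, not a value from the thread.

```python
MB_PER_GB = 1024  # binary convention, assumed here

DEFAULT_MB_PER_CORE = 4608  # per-core default quoted in the thread
NODE_MEMORY_GB = 192        # per-node memory quoted in the thread


def default_total_gb(cores):
    """Memory granted if only the per-core default applies (assumed behavior)."""
    return cores * DEFAULT_MB_PER_CORE / MB_PER_GB


def whole_node_total_gb(nodes):
    """Memory available when the limit is the full memory of each node."""
    return nodes * NODE_MEMORY_GB


cores = 4  # hypothetical core count for a small job
print(default_total_gb(cores))   # 18.0
print(whole_node_total_gb(2))    # 384
```

Under these assumptions a small job that used to see 384 GB across 2 nodes would get far less by default, which is consistent with the oom-kill reported in the issue.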
|
This was fixed as part of the mega-PR #500 |
@AndrewEichmann-NOAA reported that his prep job on Orion was failing with an oom-kill. See message below. This is likely related to recent updates on Orion that changed how memory is limited. Andy reran the job with more nodes (8 instead of 2, a 4x increase) and it ran without error:
In config.resources, set:
export npe_$step=16
In the XML you now get 8 nodes instead of 2.
We should adjust the prep job resources on Orion, but possibly leave other platforms as they are for now so that additional nodes aren't wasted elsewhere. Future reworks of resource assignment will likely account for memory and so handle this better.
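The relationship between the task count and the node count in the workaround can be sketched as follows. The tasks-per-node value of 2 is an inference from the numbers in the issue (16 tasks landing on 8 nodes), and the names `nodes_needed` and `npe_node` are illustrative rather than taken from the workflow code.

```python
import math


def nodes_needed(npe, npe_node):
    """Number of nodes required to place npe tasks at npe_node tasks per node."""
    return math.ceil(npe / npe_node)


npe_node = 2  # inferred: 16 tasks on 8 nodes implies 2 tasks per node

print(nodes_needed(16, npe_node))  # 8, the workaround from the issue
print(nodes_needed(4, npe_node))   # 2, the original node count (task count inferred)
```

Raising the task count only to obtain more memory is the waste the first comment warns about; requesting memory directly keeps the node count at 2.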