I couldn't come up with a great title for this, but comm/compute overlap in
ugni (calling chpl_task_yield while waiting for comm) increases the lifetime of
tasks and limits the number of short-lived tasks we can create. This can easily
lead to OOM situations like in #11820. Creating short-lived tasks ends up
happening for things like reductions, which has caused us to lower our stack
size in most of our scalability tests.
The following program will OOM on 2 locales under ugni. It just creates 10,000
no-op tasks on locale 0, but we do a yield while decrementing the end count and
the task will go to the end of the queue, and new tasks will run. In the worst
case we have to cycle through all 10,000 tasks before we can retire the first
one (and with 8MB stacks, that is 80GB of task stacks).
use Time;
config const trials = 10_000;
var t: Timer; t.start();
var a: atomic int;
coforall 1..trials do on Locales[numLocales-1] do { };
writeln((a.read(), t.elapsed()));
For short-lived (or at least minimal-comm) tasks that don't have any
user-induced yielding, I think we should be able to create an arbitrary number
of tasks without OOM.
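To make the worst case concrete, here is a toy model in C. It is a sketch under stated assumptions, not the Chapel or qthreads scheduler: a single worker pulls tasks from a FIFO queue, a task gets its stack when it first runs, and with `yield_once` set every task yields exactly once before it can finish. The names `peak_live_stacks` and `stack_bytes` are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model (an assumption, not the real runtime): one worker, one FIFO queue.
 * A task's stack becomes live the first time the task runs. When yield_once is
 * true, each task yields one time (think: waiting on comm for the end-count
 * decrement) and is re-queued at the back before it can finish.
 * Returns the peak number of simultaneously live stacks; returns 0 for
 * ntasks == 0 or if allocation fails. */
size_t peak_live_stacks(size_t ntasks, bool yield_once) {
    if (ntasks == 0) return 0;

    /* Each task is enqueued at most twice: once new, once after its yield. */
    size_t cap = 2 * ntasks;
    size_t *queue = malloc(cap * sizeof *queue);
    bool *started = calloc(ntasks, sizeof *started);
    if (queue == NULL || started == NULL) {
        free(queue);
        free(started);
        return 0;
    }

    size_t head = 0, tail = 0;
    for (size_t i = 0; i < ntasks; i++) queue[tail++] = i;

    size_t live = 0, peak = 0;
    while (head < tail) {
        size_t id = queue[head++];
        if (!started[id]) {
            started[id] = true;
            live++;                      /* stack allocated on first run */
            if (live > peak) peak = live;
            if (yield_once) {
                assert(tail < cap);
                queue[tail++] = id;      /* yield: go to the back of the queue */
                continue;
            }
        }
        live--;                          /* task finishes, stack is released */
    }

    free(queue);
    free(started);
    return peak;
}

/* Stack memory in bytes for a given number of live stacks. */
unsigned long long stack_bytes(size_t live, unsigned long long stack_size) {
    return (unsigned long long)live * stack_size;
}
```

In this model `peak_live_stacks(10000, true)` is 10000 while `peak_live_stacks(10000, false)` is 1. With 8 MiB stacks, `stack_bytes(10000, 8ULL << 20)` is 83,886,080,000 bytes, which is the roughly 80GB figure above.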
Avoid comm/compute overlap for short-lived tasks
[reviewed by @gbtitus]
Avoid comm/compute overlap for short-lived (minimal comm) tasks.
Comm/compute overlap can artificially increase the lifetime of tasks
because we yield, put a task at the end of the queue, and will have to
cycle through all currently running/queued tasks before getting back to
it. This can lead to OOM situations when you have a lot of short-lived
tasks with no user-induced yielding. This can occur pretty easily today
because things like reductions will create numLocales tasks on locale 0.
These tasks are short-lived and don't have any user-induced yields, so
the OOM is pretty extreme/unacceptable behavior.
Here we eliminate comm/compute overlap for unregistered puts/gets and
fast-ons in gasnet. We avoid doing comm/compute overlap for FMA (short
gets/puts and AMOs) under ugni unless a task has issued at least 100 FMA
operations. This allows us to get comm/compute overlap in cases where it
really matters for performance (many FMA requests and large BTE comm)
without increasing the lifetime of short-lived tasks. We use
task-local-storage to track the number of FMA requests issued. This is
fast (~half the cost of an atomic operation) so the overhead is
negligible and still allows us to get good performance for
oversubscribed RA-atomics.
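The per-task threshold described above can be sketched roughly as follows. This is an illustration only, not the actual runtime code: `task_local_t`, `fma_may_yield`, and `FMA_YIELD_THRESHOLD` are hypothetical names, the struct stands in for whatever the runtime keeps in task-local storage, and whether the real code counts before or after the check is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant; the commit message only says "at least 100 FMA
 * operations". */
#define FMA_YIELD_THRESHOLD 100u

/* Hypothetical stand-in for the per-task counter kept in task-local storage. */
typedef struct {
    uint32_t fma_ops_issued;
} task_local_t;

/* Record one FMA operation for this task and report whether the task is
 * allowed to yield while waiting for it to complete. The counter saturates at
 * the threshold so it can never wrap around. */
bool fma_may_yield(task_local_t *tls) {
    assert(tls != NULL);
    if (tls->fma_ops_issued < FMA_YIELD_THRESHOLD) {
        tls->fma_ops_issued++;
    }
    return tls->fma_ops_issued >= FMA_YIELD_THRESHOLD;
}
```

Under this sketch a short-lived task that issues only a handful of FMA operations (for example, a single end-count decrement) never gets a `true` result, so it would wait without yielding and retire promptly. A task that issues many operations starts overlapping comm and compute from its 100th operation onward.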
This isn't a perfect solution and primarily avoids task yields for
endcount manipulation, but I think this is better than what we had
before. A better solution is probably to decrease the size of our task
stacks (and/or avoid registering task stacks and keep them on
non-hugepages so we could limit the amount of physical memory used for
them.)
This should allow us to remove all our execenvs that reduce the task
stack size for multi-locale and scalability testing.
Closes #12874