Agent task spawning performance #341

mturilli · 2014-09-10T13:42:23Z

Evidence for the need to optimize task startup time:

The diagram shows 128 short tasks run by one of the 2 available pilots. Note that the pilot never reaches full utilization. A possible interpretation is: "the pilot manages to start ~40 CUs, then the first ones seem to die off and the pilot is not keeping up with unit spawning". If confirmed, this is evidence for the need to optimize task spawning time within the agent.
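
To make that interpretation concrete, here is a minimal sketch of a serial spawner, not a model of the actual RP agent. The function name, the spawn interval, the task duration and the core count are all hypothetical; only the 128 tasks and the ~40 concurrent CUs come from the diagram. The values are chosen so that task duration divided by spawn interval equals 40.

```python
def simulate_peak_concurrency(n_tasks, cores, spawn_interval, task_duration):
    """Return (peak concurrent tasks, makespan) for a spawner that starts
    one task at a time and needs `spawn_interval` seconds per task."""
    running = []      # end times of tasks that are currently running
    now = 0.0
    peak = 0
    makespan = 0.0
    for _ in range(n_tasks):
        now += spawn_interval                       # cost of spawning one task
        running = [t for t in running if t > now]   # drop tasks that finished
        if len(running) >= cores:
            # All cores are busy: wait for the earliest task to finish.
            now = min(running)
            running = [t for t in running if t > now]
        end = now + task_duration
        running.append(end)
        peak = max(peak, len(running))
        makespan = max(makespan, end)
    return peak, makespan


# Hypothetical numbers: 0.5 s to spawn a task, 20 s per task, 128 cores.
peak, makespan = simulate_peak_concurrency(128, 128, 0.5, 20.0)
print(peak, makespan)
```

With these assumed values the concurrency plateaus at task duration / spawn interval, well below the core count, because the first tasks finish before the spawner has started the later ones. Shortening the spawn interval or running longer tasks raises the plateau until the core count becomes the limit.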

marksantcroos · 2014-09-10T13:47:51Z

Thanks! This looks similar to the data Andre presented, right?

Anyway, full utilisation is not an absolute goal; we need to get a better intuition of what we would consider acceptable performance for the application workloads that we want to support.

Startup time can be improved, I think that is the general consensus.

I guess we can use this ticket to discuss some possible improvements, but as said, I would also like to collect reasonable expectations from an application perspective.

andre-merzky · 2014-09-10T13:49:22Z

Mark, agree with all. The purpose of the ticket is indeed to collect data, and to discuss the target line...

Thanks!

mturilli · 2014-09-10T13:51:21Z

Agreed on all.

Shall we keep a list somewhere of the user communities and the types/samples of workloads that we already support or need to support? I believe having something we could run ourselves would be very valuable for testing and profiling.

mturilli · 2014-09-11T02:48:50Z

We had an interesting conversation about this ticket with Shantenu. Here is a brief summary of the topics and ideas discussed:

The reference use cases should be the 4 or 5 patterns and related real-life workloads implemented in the radical.ensemblemd project.
RP seems to be getting mature enough to be tested by running the workloads served by radical.ensemblemd.
The goals of this run would be:
- qualitative: the characteristics of the real-life workload are kept while reducing only its scale. The idea is: we run a smaller version of the real deal, not a simplified version of the real deal at a small scale.
- quantitative: we run the real-life workloads at the maximum scale we can with the current release. In this way we measure how far we are from the scalability end goals given to the radical.pilot project.
We use the data collected by running these tests to inform the decisions on which optimizations to prioritize in the development roadmap of RP.

andre-merzky · 2014-09-11T10:28:46Z

I certainly agree with the above. Can we (as in RP) ask/task you (as in MD-folx) to take care of this, i.e. to gather those use cases and to derive explicit qualitative and quantitative RP requirements?

andre-merzky · 2014-09-20T12:22:30Z

ping to team MD to reply.

andre-merzky · 2015-04-21T08:42:07Z

ping to team MD to reply.

Otherwise the ticket can probably go, as the pilot start time (the specific ticket topic) has mostly been taken care of, and performance / utilization is sufficiently addressed / discussed in other contexts.

andre-merzky · 2015-05-05T15:06:57Z

ping to team MD to reply.

This is the final call before the ticket is closed. We'll raise the topic during the next pilot call, too.

andre-merzky · 2015-09-15T09:37:51Z

obsolete and superseded...

mturilli added the type:performance label Sep 10, 2014

andre-merzky added this to the MS-X milestone Sep 10, 2014

andre-merzky self-assigned this Sep 10, 2014

andre-merzky modified the milestones: MS-X, MS.9 Oct 2, 2014

andre-merzky closed this as completed Sep 15, 2015

andre-merzky added the topic:performance label Mar 4, 2018
