Shuffle tasks on hosts with overutilized memory resources #1937
Conversation
9c0b981 to 6e25b84
Overall strategy of 'fix the worst one first' seems fine to me for memory vs. CPU. Left one comment about breaking out of the loop early if we think the current set of removed tasks should satisfy things. Other than that, my only other thought is whether 'most overused' is still the best metric for sorting tasks to shuffle. We've seen some cases where a task using 10% of its CPU still gets shuffled, which seems odd, though I'm not sure of a good way around it either.
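For context, a minimal sketch of the 'fix the worst one first' idea: compare CPU and memory overusage as fractions of the host's reservation and shuffle based on whichever is further over. This is an illustration only; the class and method names here are assumptions, not the PR's actual implementation.

// Illustrative only: pick the resource that is proportionally the most over-reserved.
class MostOverusedResource {
  enum Type { CPU, MEMORY }

  final Type type;
  final double overusage; // e.g. 0.15 means 15% over the reservation

  MostOverusedResource(Type type, double overusage) {
    this.type = type;
    this.overusage = overusage;
  }

  // Compare CPU vs memory overusage as fractions of what was reserved,
  // so the two resources can be ranked on a common scale.
  static MostOverusedResource pick(double cpuUsed, double cpuReserved, long memUsedBytes, long memReservedBytes) {
    double cpuOverusage = (cpuUsed - cpuReserved) / cpuReserved;
    double memOverusage = (memUsedBytes - memReservedBytes) / (double) memReservedBytes;
    return cpuOverusage >= memOverusage
        ? new MostOverusedResource(Type.CPU, cpuOverusage)
        : new MostOverusedResource(Type.MEMORY, memOverusage);
  }
}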
-  private boolean shuffleTasksForOverloadedSlaves = false; // recommended 'true' when oversubscribing cpu for larger clusters
+  private boolean shuffleTasksForOverloadedSlaves = false; // recommended 'true' when oversubscribing resources for larger clusters

+  private double shuffleTasksWhenSlaveMemoryUtilizationPercentageExceeds = 0.82;
was there any math behind this value?
Sort of an arbitrary rule of thumb. Assuming we'd like to set a default "target" memory utilization of 85%, setting this config at 82% should give us enough time to shuffle tasks before actually hitting our target.
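To illustrate the headroom reasoning (a sketch with assumed names, not the PR's actual API; only the 0.82 value comes from the diff above): shuffling starts at 82% utilization so cleanup has about 3% of the host's memory in hand before the 85% target is reached.

// Hypothetical illustration of the 82% vs 85% headroom; names other than the
// configured 0.82 value are assumptions.
class ShuffleThresholdExample {
  static final double SHUFFLE_THRESHOLD = 0.82;   // shuffleTasksWhenSlaveMemoryUtilizationPercentageExceeds
  static final double TARGET_UTILIZATION = 0.85;  // desired steady-state memory utilization

  static boolean shouldStartShuffling(long usedMemoryBytes, long totalMemoryBytes) {
    double utilization = usedMemoryBytes / (double) totalMemoryBytes;
    // Start shuffling at 82% so there is ~3% of host memory as headroom before
    // hitting the 85% target (roughly 3.8 GB on a 128 GB host).
    return utilization > SHUFFLE_THRESHOLD;
  }
}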
     for (TaskIdWithUsage taskIdWithUsage : possibleTasksToShuffle) {
       if (requestsWithShuffledTasks.contains(taskIdWithUsage.getTaskId().getRequestId())) {
         LOG.debug("Request {} already has a shuffling task, skipping", taskIdWithUsage.getTaskId().getRequestId());
         continue;
       }
-      if (cpuOverage <= 0 || shuffledTasksOnSlave > configuration.getMaxTasksToShufflePerHost() || currentShuffleCleanupsTotal >= configuration.getMaxTasksToShuffleTotal()) {
-        LOG.debug("Not shuffling any more tasks (overage: {}, shuffledOnHost: {}, totalShuffleCleanups: {})", cpuOverage, shuffledTasksOnSlave, currentShuffleCleanupsTotal);
+      if ((mostOverusedResource.overusage <= 0) || shuffledTasksOnSlave > configuration.getMaxTasksToShufflePerHost() || currentShuffleCleanupsTotal >= configuration.getMaxTasksToShuffleTotal()) {
cpuOverage <= 0 worked as a condition for 'should this put us back in green?' because we updated it at the end of the loop. I don't currently see mostOverusedResource.overusage getting updated anywhere.
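One possible shape of the bookkeeping the reviewer is pointing at (a simplified, self-contained sketch with assumed names, not the PR's actual fix): decrement the remaining overage at the bottom of each iteration, so the overusage <= 0 check can break out early once the selected tasks should put the host back in the green.

import java.util.List;

// Hypothetical sketch only; TaskUsage is a stand-in for TaskIdWithUsage, and the
// usage value is assumed to be in the same units as the tracked overage.
class ShuffleLoopSketch {
  record TaskUsage(String requestId, double usageOfOverusedResource) {}

  static int selectTasksToShuffle(List<TaskUsage> possibleTasksToShuffle,
                                  double overusage,
                                  int maxTasksToShufflePerHost) {
    int shuffled = 0;
    for (TaskUsage task : possibleTasksToShuffle) {
      // Break early once the tasks already selected should bring the host back
      // under its threshold, or once the per-host cap is reached.
      if (overusage <= 0 || shuffled >= maxTasksToShufflePerHost) {
        break;
      }
      // ... request a cleanup/bounce for `task` here ...
      shuffled++;
      // The missing piece the comment asks about: update the remaining overage at
      // the end of each iteration, as was previously done for cpuOverage.
      overusage -= task.usageOfOverusedResource();
    }
    return shuffled;
  }
}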
🚢
🚢