resource usage endpoint #1570
Conversation
@@ -562,6 +566,26 @@ public SingularityState getState(Optional<Boolean> skipCache, Optional<Boolean>
    return Optional.of(response.getAs(SingularityTaskReconciliationStatistics.class));
  }

  public SingularityClusterUtilization getClusterUtilization() {
I would follow the format of the other get calls here. You'll see there is a getSingle method where the HTTP call building and deserialization are done for you.
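For illustration, a minimal sketch of that suggestion, assuming a getSingle(uri, type, id, class) helper and a CLUSTER_UTILIZATION_FORMAT URI constant (both are assumptions about the client, not taken from this PR):

```java
// Assumed shape only: getSingle() is described above as handling the HTTP call
// building and deserialization; its real signature may differ.
public Optional<SingularityClusterUtilization> getClusterUtilization() {
  final String uri = String.format(CLUSTER_UTILIZATION_FORMAT, getApiBase()); // hypothetical URI constant
  return getSingle(uri, "clusterUtilization", "", SingularityClusterUtilization.class);
}
```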
  @VisibleForTesting
  void clearOldUsage(List<SingularityTaskUsage> taskUsages, String taskId) {
    if (taskUsages.size() + 1 > configuration.getNumUsageToKeep()) {
      long minMillisApart = configuration.getUsageIntervalMultiplier() * configuration.getCheckUsageEveryMillis();
Not sure of the purpose of this one. Are you trying to make sure we keep a certain time period of data points rather than a count?
Yeah, the goal here is to increase the interval between each point (poll frequency * interval) so we have a wider span of data. I thought that would be more telling of usage, since a task could be in a bad state (fail, be modified, etc.) for a few consecutive runs. The interval defaults to 3, so that's essentially 45 minutes of data (15 points 3 min apart) rather than 15 minutes (15 points 1 min apart).
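As a rough, self-contained illustration of that arithmetic (the variable names mirror the configuration getters in the diff; this is a sketch of the window math, not the PR's pruning code):

```java
// With a 1-minute poll, an interval multiplier of 3, and 15 kept points,
// retained points are spaced at least 3 minutes apart, so the window covers
// roughly 45 minutes instead of the 15 minutes a plain "last 15 polls" gives.
public class UsageWindowSketch {
  public static void main(String[] args) {
    long checkUsageEveryMillis = 60_000L;
    int usageIntervalMultiplier = 3;
    int numUsageToKeep = 15;

    long minMillisApart = usageIntervalMultiplier * checkUsageEveryMillis;
    long windowMillis = numUsageToKeep * minMillisApart;

    System.out.println(minMillisApart / 60_000 + " minutes between kept points"); // 3
    System.out.println(windowMillis / 60_000 + " minutes of data retained");      // 45
  }
}
```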
        includeUtilization = false;
      }

      if (unusedMemBytes / memoryBytesReserved >= minUnderUtilizedPct) {
Is there a downside to returning all the data values instead of having mins that have to be hit? I feel like we'd want this to be a full report.
I added this min percentage to give us a goal of where we want our utilization to be. It defaults to 5%, so any task using less than 95% of its allocated resources would be considered in our report.
Without a min percent, it'd skew the counts and averages: every task not using all of its requested resources, no matter how small the gap, would be counted as under-utilized. If a task is using 98 MB out of the 100 MB it requested, that seems very reasonable, so it didn't make sense to count it as an under-utilized task.
With that said, I'm okay with changing it to consider all tasks if you think that would be more useful.
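For concreteness, a hedged sketch of the threshold being described (variable names approximate the diff above; the class itself is illustrative):

```java
// A task only counts toward the under-utilization report when the unused share
// of a reserved resource meets the configured minimum percentage.
public class UnderUtilizationSketch {
  static boolean isUnderUtilized(double reserved, double avgUsed, double minUnderUtilizedPct) {
    double unused = reserved - avgUsed;
    return unused / reserved >= minUnderUtilizedPct;
  }

  public static void main(String[] args) {
    // Defaulted to 5%: using 98 of 100 reserved MB (2% unused) is not flagged,
    // while using 80 of 100 reserved MB (20% unused) is.
    System.out.println(isUnderUtilized(100, 98, 0.05)); // false
    System.out.println(isUnderUtilized(100, 80, 0.05)); // true
  }
}
```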
      double unusedCpu = cpuReserved - utilization.getAvgCpuUsed();
      long unusedMemBytes = memoryBytesReserved - utilization.getMemBytesTotal();

      if (unusedCpu / cpuReserved >= minUnderUtilizedPct) {
Similar here: why not just report the over/under for everything rather than filtering it down?
        maxUnderUtilizedMemBytes = Math.max(unusedMemBytes, maxUnderUtilizedMemBytes);
        minUnderUtilizedMemBytes = Math.min(unusedMemBytes, minUnderUtilizedMemBytes);
      } else if (!includeUtilization) {
        it.remove();
Not filtering things down also means you'd get to remove this. It's generally better practice, and more readable, not to add to or remove from the collection you're iterating over.
Yeah, I wasn't a fan of the iterator here either, but that was the only way to safely remove the item from the list inside the loop. If we decide to keep the min percentage, I could do a second cleanup loop rather than remove within the loop.
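Two common shapes for that cleanup, sketched with a placeholder type rather than the PR's actual utilization class:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Option 1: a second pass that builds a filtered copy after the main loop.
// Option 2: removeIf, which removes matching elements without an explicit Iterator.
public class FilterSketch {
  static class Utilization {
    final String requestId;
    final boolean include;

    Utilization(String requestId, boolean include) {
      this.requestId = requestId;
      this.include = include;
    }
  }

  static List<Utilization> secondPassFilter(List<Utilization> all) {
    List<Utilization> kept = new ArrayList<>();
    for (Utilization u : all) {
      if (u.include) {
        kept.add(u);
      }
    }
    return kept;
  }

  static void removeInPlace(List<Utilization> all) {
    all.removeIf(u -> !u.include);
  }

  public static void main(String[] args) {
    List<Utilization> utils = new ArrayList<>(Arrays.asList(
        new Utilization("a", true), new Utilization("b", false)));
    System.out.println(secondPassFilter(utils).size()); // 1
    removeInPlace(utils);
    System.out.println(utils.size());                   // 1
  }
}
```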
    return numTasks;
  }

  public double getAvgMemBytesUsed() {
Having these getters here means they would be included as JSON fields. Since they are easily calculated and not used anywhere else in the code, maybe exclude them? (Think two extra fields x 3000+ requests, all in one JSON blob.)
These are actually used to determine each unused resource:

long unusedMemBytes = (long) (memoryBytesReserved - utilization.getAvgMemBytesUsed());

I could drop the fields and do an inline calculation instead, though:

long unusedMemBytes = (long) (memoryBytesReserved - (utilization.getMemBytesTotal() / utilization.getNumTasks()));
Could just do @JsonIgnore and keep them on the object, but not in the serialization.
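A minimal sketch of that approach with Jackson; the class and field names are illustrative, not the PR's actual type:

```java
import com.fasterxml.jackson.annotation.JsonIgnore;

// The raw totals stay in the JSON output, while the derived average remains
// usable in code but is excluded from serialization via @JsonIgnore.
public class UtilizationExample {
  private long memBytesTotal;
  private int numTasks;

  public long getMemBytesTotal() {
    return memBytesTotal;
  }

  public int getNumTasks() {
    return numTasks;
  }

  @JsonIgnore
  public double getAvgMemBytesUsed() {
    return numTasks == 0 ? 0 : (double) memBytesTotal / numTasks;
  }
}
```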
I don't know whether this has anything to do with your changes (it's probably just because those tests were fragile to start with), but it looks like the tests in SingularityMesosSchedulerTest (not a big problem; those look like double-comparison issues) and Whitney's old tests in SingularityUsageTest have started breaking.
@PtrTeixeira I've fixed some of those tests locally already and am working on getting mine written and the others fixed as well.
@ssalinas Looking good in QA. Okay to get it into prod?
Going to merge this one. Any further updates can be done in future PRs.
Added an endpoint to get resource usage data for each tracked request and the cluster as a whole.
Increased the number of points collected from 5 to 15 and increased the interval between each point (poll frequency x interval) so we have a wider span of data.