resource usage endpoint #1570
Conversation
@@ -562,6 +566,26 @@ public SingularityState getState(Optional<Boolean> skipCache, Optional<Boolean>
    return Optional.of(response.getAs(SingularityTaskReconciliationStatistics.class));
  }

  public SingularityClusterUtilization getClusterUtilization() {
I would follow the format of the other get calls here. You'll see there is a getSingle method where the HTTP call building and deserialization are done for you.
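For illustration, a minimal sketch of that suggestion, assuming a getSingle(uri, type, id, class) helper and a CLUSTER_UTILIZATION_FORMAT URI constant (both are assumptions about the client, not taken from this PR):

```java
// Assumed shape only: getSingle() is described above as handling the HTTP call
// building and deserialization; its real signature may differ.
public Optional<SingularityClusterUtilization> getClusterUtilization() {
  final String uri = String.format(CLUSTER_UTILIZATION_FORMAT, getApiBase()); // hypothetical URI constant
  return getSingle(uri, "clusterUtilization", "", SingularityClusterUtilization.class);
}
```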
  @VisibleForTesting
  void clearOldUsage(List<SingularityTaskUsage> taskUsages, String taskId) {
    if (taskUsages.size() + 1 > configuration.getNumUsageToKeep()) {
      long minMillisApart = configuration.getUsageIntervalMultiplier() * configuration.getCheckUsageEveryMillis();
Not sure of the purpose of this one. Are you trying to make sure we keep a certain time period of data points rather than a count?
Yeah, the goal here is to increase the interval between each point (poll frequency * interval) so we have a wider span of data. I thought that would be more telling of usage, since a task could be in a bad state (fail, be modified, etc.) for a few consecutive runs. The interval defaults to 3, so that's essentially 45 minutes of data (15 points 3 min apart) rather than 15 minutes (15 points 1 min apart).
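As a rough, self-contained illustration of that arithmetic (the variable names mirror the configuration getters in the diff; this is a sketch of the window math, not the PR's pruning code):

```java
// With a 1-minute poll, an interval multiplier of 3, and 15 kept points,
// retained points are spaced at least 3 minutes apart, so the window covers
// roughly 45 minutes instead of the 15 minutes a plain "last 15 polls" gives.
public class UsageWindowSketch {
  public static void main(String[] args) {
    long checkUsageEveryMillis = 60_000L;
    int usageIntervalMultiplier = 3;
    int numUsageToKeep = 15;

    long minMillisApart = usageIntervalMultiplier * checkUsageEveryMillis;
    long windowMillis = numUsageToKeep * minMillisApart;

    System.out.println(minMillisApart / 60_000 + " minutes between kept points"); // 3
    System.out.println(windowMillis / 60_000 + " minutes of data retained");      // 45
  }
}
```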
        includeUtilization = false;
      }

      if (unusedMemBytes / memoryBytesReserved >= minUnderUtilizedPct) {
Is there a downside to returning all the data values instead of having mins that have to be hit? I feel like we'd want this to be a full report.
I added this min percentage to give us a goal of where we want our utilization to be. It defaults to 5%, so any task using less than 95% of its allocated resources would be considered in our report.
Without a min percent, it'd skew the counts and averages: every task not using all of its requested resources, no matter how small the gap, would be counted as under-utilized. If a task is using 98 MB out of the 100 MB it requested, that seems very reasonable, so it didn't make sense to count it as an under-utilized task.
With that said, I'm okay with changing it to consider all tasks if you think that would be more useful.
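For concreteness, a hedged sketch of the threshold being described (variable names approximate the diff above; the class itself is illustrative):

```java
// A task only counts toward the under-utilization report when the unused share
// of a reserved resource meets the configured minimum percentage.
public class UnderUtilizationSketch {
  static boolean isUnderUtilized(double reserved, double avgUsed, double minUnderUtilizedPct) {
    double unused = reserved - avgUsed;
    return unused / reserved >= minUnderUtilizedPct;
  }

  public static void main(String[] args) {
    // Defaulted to 5%: using 98 of 100 reserved MB (2% unused) is not flagged,
    // while using 80 of 100 reserved MB (20% unused) is.
    System.out.println(isUnderUtilized(100, 98, 0.05)); // false
    System.out.println(isUnderUtilized(100, 80, 0.05)); // true
  }
}
```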
      double unusedCpu = cpuReserved - utilization.getAvgCpuUsed();
      long unusedMemBytes = memoryBytesReserved - utilization.getMemBytesTotal();

      if (unusedCpu / cpuReserved >= minUnderUtilizedPct) {
Similar here: why not just report the over/under for everything rather than filtering it down?
        maxUnderUtilizedMemBytes = Math.max(unusedMemBytes, maxUnderUtilizedMemBytes);
        minUnderUtilizedMemBytes = Math.min(unusedMemBytes, minUnderUtilizedMemBytes);
      } else if (!includeUtilization) {
        it.remove();
Not filtering things down also means you'd get to remove this. It's generally better practice, and more readable, not to add to or remove from the collection you're iterating over.
Yeah, I wasn't a fan of the iterator here either, but that was the only way to safely remove the item from the list inside the loop. If we decide to keep the min percentage, I could do a second cleanup loop rather than remove within the loop.
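Two common shapes for that cleanup, sketched with a placeholder type rather than the PR's actual utilization class:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Option 1: a second pass that builds a filtered copy after the main loop.
// Option 2: removeIf, which removes matching elements without an explicit Iterator.
public class FilterSketch {
  static class Utilization {
    final String requestId;
    final boolean include;

    Utilization(String requestId, boolean include) {
      this.requestId = requestId;
      this.include = include;
    }
  }

  static List<Utilization> secondPassFilter(List<Utilization> all) {
    List<Utilization> kept = new ArrayList<>();
    for (Utilization u : all) {
      if (u.include) {
        kept.add(u);
      }
    }
    return kept;
  }

  static void removeInPlace(List<Utilization> all) {
    all.removeIf(u -> !u.include);
  }

  public static void main(String[] args) {
    List<Utilization> utils = new ArrayList<>(Arrays.asList(
        new Utilization("a", true), new Utilization("b", false)));
    System.out.println(secondPassFilter(utils).size()); // 1
    removeInPlace(utils);
    System.out.println(utils.size());                   // 1
  }
}
```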
    return numTasks;
  }

  public double getAvgMemBytesUsed() {
Having these getters here means they would be included as JSON fields. Since they are easily calculated and not used anywhere else in the code, maybe exclude them? (Think two extra fields x 3000+ requests, all in one JSON blob.)
These are actually used to determine each unused resource:

long unusedMemBytes = (long) (memoryBytesReserved - utilization.getAvgMemBytesUsed());

I could drop the fields and do an inline calculation instead, though:

long unusedMemBytes = (long) (memoryBytesReserved - (utilization.getMemBytesTotal() / utilization.getNumTasks()));
Could just do @JsonIgnore and keep them on the object, but not in the serialization.
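A minimal sketch of that approach with Jackson; the class and field names are illustrative, not the PR's actual type:

```java
import com.fasterxml.jackson.annotation.JsonIgnore;

// The raw totals stay in the JSON output, while the derived average remains
// usable in code but is excluded from serialization via @JsonIgnore.
public class UtilizationExample {
  private long memBytesTotal;
  private int numTasks;

  public long getMemBytesTotal() {
    return memBytesTotal;
  }

  public int getNumTasks() {
    return numTasks;
  }

  @JsonIgnore
  public double getAvgMemBytesUsed() {
    return numTasks == 0 ? 0 : (double) memBytesTotal / numTasks;
  }
}
```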
I don't know whether this has anything to do with your changes (it's probably just because those tests were fragile to start with), but it looks like the tests in SingularityMesosSchedulerTest (not a big problem; those look like double-comparison issues) and Whitney's old tests in SingularityUsageTest have started breaking.
@PtrTeixeira I've fixed some of those tests locally already and am working on getting mine written and the others fixed as well.
@ssalinas Looking good in QA. Okay to get it into prod?
Going to merge this one. Any further updates can be done in future PRs.
Added an endpoint to get resource usage data for each tracked request and the cluster as a whole.
Increased the number of points collected from 5 to 15 and increased the interval between each point (poll frequency x interval) so we have a wider span of data.