resource usage endpoint #1570

Merged
merged 35 commits into master on Jul 14, 2017

Conversation

matush-v
Contributor

@matush-v commented Jun 20, 2017

  • add some unit tests

Added an endpoint to get resource usage data for each tracked request and for the cluster as a whole.
Increased the number of points collected from 5 to 15 and increased the interval between each point (poll frequency x interval) so we have a wider span of data.


@matush-v requested a review from ssalinas on June 20, 2017 15:49
@@ -562,6 +566,26 @@ public SingularityState getState(Optional<Boolean> skipCache, Optional<Boolean>
return Optional.of(response.getAs(SingularityTaskReconciliationStatistics.class));
}

public SingularityClusterUtilization getClusterUtilization() {
Member

I would follow the format of the other get calls here. You'll see there is a getSingle method where the HTTP call building and deserialization are done for you.
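(For reference, the suggested shape would look roughly like the snippet below; this is a sketch only, and the getSingle signature, the CLUSTER_UTILIZATION_FORMAT constant, and getApiBase() are assumptions about the client, not its actual code.)

// sketch: let getSingle handle the HTTP call building and deserialization
public Optional<SingularityClusterUtilization> getClusterUtilization() {
  final String uri = String.format(CLUSTER_UTILIZATION_FORMAT, getApiBase());
  return getSingle(uri, "clusterUtilization", "", SingularityClusterUtilization.class);
}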

@VisibleForTesting
void clearOldUsage(List<SingularityTaskUsage> taskUsages, String taskId) {
if (taskUsages.size() + 1 > configuration.getNumUsageToKeep()) {
long minMillisApart = configuration.getUsageIntervalMultiplier() * configuration.getCheckUsageEveryMillis();
Member

not sure of the purpose for this one, are you trying to make sure to keep a certain time period of data points rather than a count?

Contributor Author

Yeah, the goal here is to increase the interval between each point (poll frequency * interval) so we have a wider span of data. I thought that would be more telling of usage since a task could be in a bad state (fail, be modified, etc.) for a few consecutive runs. The interval defaults to 3 so that's essentially 45 minutes of data (15 points 3 min apart) rather than 15 minutes (15 points 1 min apart)
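(For reference, the arithmetic behind those numbers as a standalone sketch; the 1-minute poll frequency is the default implied above, and these locals are illustrative rather than read from the config classes.)

// uses java.util.concurrent.TimeUnit; values are the defaults described above
long checkUsageEveryMillis = TimeUnit.MINUTES.toMillis(1); // poll frequency: 1 minute
int usageIntervalMultiplier = 3;                           // default interval multiplier
int numUsageToKeep = 15;                                   // points retained per task
long minMillisApart = usageIntervalMultiplier * checkUsageEveryMillis; // keep points at least 3 min apart
long approxSpanMillis = numUsageToKeep * minMillisApart;               // roughly 45 minutes of history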

includeUtilization = false;
}

if (unusedMemBytes / memoryBytesReserved >= minUnderUtilizedPct) {
Member

Is there a downside to returning all the data values instead of having mins that have to be hit? I feel like we'd want this to be a full report.

Contributor Author

I added this min percentage to give us a goal of where we want our utilization to be. It's defaulted to 5%, so any task using less than 95% of its allocated resources would be included in our report.

Without a min percent, it'd skew the counts and averages: every task not using all of its requested resources, no matter how small the difference, would be counted as under-utilized. If a task is using 98 MB out of the 100 MB it requested, that seems very reasonable, so it didn't make sense to count it as an under-utilized task.

With that said, I'm okay with changing it to consider all tasks if you think that would be more useful.
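(To make the threshold concrete with the 98 MB / 100 MB example and the 5% default; the names mirror the diff, but this snippet stands on its own.)

double minUnderUtilizedPct = 0.05;              // default 5% floor
long memoryBytesReserved = 100L * 1024 * 1024;  // 100 MB requested
double avgMemBytesUsed = 98.0 * 1024 * 1024;    // 98 MB actually used on average
double unusedMemBytes = memoryBytesReserved - avgMemBytesUsed;
// 2 MB unused / 100 MB reserved = 0.02, below the 5% floor, so this task
// would not be flagged as under-utilized in the report
boolean underUtilized = unusedMemBytes / memoryBytesReserved >= minUnderUtilizedPct;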

double unusedCpu = cpuReserved - utilization.getAvgCpuUsed();
long unusedMemBytes = memoryBytesReserved - utilization.getMemBytesTotal();

if (unusedCpu / cpuReserved >= minUnderUtilizedPct) {
Member

similar here, why not just report the over/under for everything rather than filtering it down?

maxUnderUtilizedMemBytes = Math.max(unusedMemBytes, maxUnderUtilizedMemBytes);
minUnderUtilizedMemBytes = Math.min(unusedMemBytes, minUnderUtilizedMemBytes);
} else if (!includeUtilization) {
it.remove();
Member

Not filtering things down also means you'd get to remove this. It's generally better practice, and more readable, not to add to or remove from the collection you're iterating over.

Contributor Author

Yeah, I wasn't a fan of the iterator here either, but that was the only way to safely remove the item from the list inside the loop. If we decide to keep the min percentage, I could do a second cleanup loop rather than doing it within the loop.
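(One possible shape for that second cleanup pass, avoiding removal while iterating; the collection and predicate names here are placeholders, not the PR's code.)

// hypothetical second pass once the totals have been gathered
requestUtilizations.removeIf(utilization -> !isUnderUtilized(utilization));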

return numTasks;
}

public double getAvgMemBytesUsed() {
Member

Having these getters here means they would be included as JSON fields. Since they are easily calculated and not used anywhere else in the code, maybe exclude these? (Think two extra fields x 3000+ requests, all in one JSON blob.)

Contributor Author

these are actually used to determine each unused resource:
long unusedMemBytes = (long) (memoryBytesReserved - utilization.getAvgMemBytesUsed());

I could drop the fields and do an inline calculation instead, though:
long unusedMemBytes = (long) (memoryBytesReserved - (utilization.getMemBytesTotal() / utilization.getNumTasks()));

Contributor

Could just do @JsonIgnore and keep them on the object, but not in the serialization.
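(Something along these lines; the getter body mirrors the inline calculation mentioned above, and the exact accessor names are assumed. @JsonIgnore comes from com.fasterxml.jackson.annotation.)

@JsonIgnore
public double getAvgMemBytesUsed() {
  // still available to callers in code, but excluded from the serialized JSON blob
  return (double) getMemBytesTotal() / getNumTasks();
}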

Contributor

@PtrTeixeira left a comment

I don't know whether this has anything to do with your changes (it's probably just because those tests were fragile to start with), but it looks like the tests in SingularityMesosSchedulerTest (not a big problem; those look like an issue with double comparisons) and Whitney's old tests in SingularityUsageTest have started breaking.
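(The double-comparison flakiness is usually handled by asserting with a tolerance, e.g. JUnit's three-argument assertEquals; the values below are illustrative only.)

// compare doubles with an epsilon instead of exact equality
assertEquals(expectedAvgCpuUsed, utilization.getAvgCpuUsed(), 0.0001);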

@matush-v
Contributor Author

@PtrTeixeira I've fixed some of those tests locally already and am working on getting mine written and the others fixed as well

@matush-v force-pushed the use-it-or-lose-it branch from bb6fb6e to 5df329a on June 23, 2017 17:20
@ssalinas modified the milestone: 0.17.0 on Jun 23, 2017
@matush-v force-pushed the use-it-or-lose-it branch from 27fb95f to 98d5d38 on June 25, 2017 00:51
@matush-v force-pushed the use-it-or-lose-it branch from 63b8dd8 to 3234d43 on June 27, 2017 21:39
@matush-v
Contributor Author

@ssalinas looking good in qa. okay to get it into prod?

@ssalinas
Member

Going to merge this one. Any further updates can be done in future PRs

@ssalinas merged commit 2b00479 into master on Jul 14, 2017
@ssalinas deleted the use-it-or-lose-it branch on July 14, 2017 19:40
@baconmania modified the milestones: 0.18.0, 0.17.0 on Sep 20, 2017