Skip two consecutive CPU utilization updates in the UtilizationBasedLimiter #5704

pracucci · 2023-08-09T12:27:59Z

What this PR does

We're still testing the CPU limit in the UtilizationBasedLimiter and we've observed some odd behaviour. Sometimes the limiter tracks an absurdly high CPU utilization (e.g. 108946 cores on a 32-core machine). We haven't found the root cause, but I have a theory, and in this PR I'm proposing a fix based on it.

Theory

UtilizationBasedLimiter.compute() is called on every tick of a time.Ticker. We know that a time.Ticker may deliver two ticks in quick succession under some edge conditions. For instance, this example prints:

2009-11-10 23:00:01 +0000 UTC m=+1.000000001
2009-11-10 23:00:02.99 +0000 UTC m=+2.990000001
2009-11-10 23:00:03 +0000 UTC m=+3.000000001
2009-11-10 23:00:04.99 +0000 UTC m=+4.990000001

Given ☝️, my theory is that when this happens the computed timeSincePrevUpdate may be very small, and so the divisor in the following operation may be very small too:

cpuUtil := (cpuTime - prevCPUTime) / timeSincePrevUpdate.Seconds()

When the divisor is very small (e.g. 1 microsecond = 0.000001 seconds), it amplifies the value of cpuTime - prevCPUTime. We (myself included) expect cpuTime - prevCPUTime to be consistent with the elapsed time, so a very small divisor should come with a very small cpuTime - prevCPUTime. In practice, though, we read the CPU time from /proc and call time.Now() at different moments (the two operations are not atomic), so there's always a small drift between the two.

When the divisor is 1 second, we don't notice this difference. But when the divisor is very small because compute() was called twice in quick succession, the "small drift" is amplified by a large factor, potentially leading to a bogus CPU utilization value.
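To make the amplification concrete, here is a minimal sketch of the arithmetic. The numbers are assumptions chosen for illustration, not measurements from the PR: a hypothetical process that steadily uses 4 cores, and a 1 ms offset between reading the CPU time and calling time.Now(). The helpers utilizationWithDrift and example are hypothetical names, not part of the limiter.

```go
package main

import (
	"fmt"
	"time"
)

// cpuUtilization mirrors the division quoted in the PR description:
// CPU seconds consumed between two samples, divided by the wall-clock
// time measured between them.
func cpuUtilization(cpuTime, prevCPUTime float64, timeSincePrevUpdate time.Duration) float64 {
	return (cpuTime - prevCPUTime) / timeSincePrevUpdate.Seconds()
}

// utilizationWithDrift (hypothetical helper) models a process that uses
// `cores` cores steadily, where the CPU time sample covers `drift` more
// wall-clock time than the measured `elapsed` interval.
func utilizationWithDrift(cores float64, elapsed, drift time.Duration) float64 {
	prevCPUTime := 0.0
	cpuTime := cores * (elapsed + drift).Seconds()
	return cpuUtilization(cpuTime, prevCPUTime, elapsed)
}

// example returns the computed utilization for a normal 1s interval and
// for two updates only 1µs apart, using the same assumed 1ms drift.
func example() (normal, backToBack float64) {
	const cores = 4.0              // assumed steady CPU usage
	const drift = time.Millisecond // assumed sampling offset
	normal = utilizationWithDrift(cores, time.Second, drift)
	backToBack = utilizationWithDrift(cores, time.Microsecond, drift)
	return normal, backToBack
}

func main() {
	normal, backToBack := example()
	fmt.Printf("1s between updates:  %.3f cores\n", normal)
	fmt.Printf("1µs between updates: %.3f cores\n", backToBack)
}
```

Under these assumptions the same 1 ms drift that adds about 0.004 cores to a 1 second interval inflates the result to roughly 4004 cores when the interval is 1 microsecond.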

This is just a theory, but I think what I'm proposing in this PR is a safe change to make anyway.
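The idea of skipping an update that arrives too soon after the previous one could be sketched roughly as follows. This is not the code from the PR: the cpuTracker type, the update and replay functions, and the minUpdateInterval threshold of 500 ms are all assumptions made for illustration. The sketch replays the four timestamps printed above against a hypothetical process that steadily uses 2 cores.

```go
package main

import (
	"fmt"
	"time"
)

// minUpdateInterval is a hypothetical threshold; the PR description does
// not state the value used by the real limiter.
const minUpdateInterval = 500 * time.Millisecond

// cpuTracker (hypothetical) keeps the previous sample.
type cpuTracker struct {
	lastUpdate  time.Time
	lastCPUTime float64
	initialized bool
}

// result records whether a sample was accepted and the utilization computed.
type result struct {
	util float64
	ok   bool
}

// update computes utilization since the previous accepted sample. It
// reports ok=false for the first sample and for samples that arrive less
// than minUpdateInterval after the previous accepted one.
func (t *cpuTracker) update(now time.Time, cpuTime float64) result {
	if !t.initialized {
		t.lastUpdate, t.lastCPUTime, t.initialized = now, cpuTime, true
		return result{}
	}
	elapsed := now.Sub(t.lastUpdate)
	if elapsed < minUpdateInterval {
		// Too soon: keep the previous sample so the next accepted
		// update divides by a reasonably long interval.
		return result{}
	}
	util := (cpuTime - t.lastCPUTime) / elapsed.Seconds()
	t.lastUpdate, t.lastCPUTime = now, cpuTime
	return result{util: util, ok: true}
}

// replay feeds the tick offsets from the ticker output (1s, 2.99s, 3s,
// 4.99s) to the tracker, assuming a steady usage of 2 cores.
func replay() []result {
	base := time.Date(2009, 11, 10, 23, 0, 0, 0, time.UTC)
	offsets := []time.Duration{
		time.Second, 2990 * time.Millisecond, 3 * time.Second, 4990 * time.Millisecond,
	}
	t := &cpuTracker{}
	results := make([]result, 0, len(offsets))
	for _, off := range offsets {
		cpuTime := 2 * off.Seconds() // assumed: 2 cores busy since base
		results = append(results, t.update(base.Add(off), cpuTime))
	}
	return results
}

func main() {
	for i, r := range replay() {
		fmt.Printf("tick %d: accepted=%v util=%.3f\n", i+1, r.ok, r.util)
	}
}
```

In this sketch the third tick, which arrives only 10 ms after the second, is skipped, and the fourth tick is measured against the second one, so its divisor is about 2 seconds rather than a few milliseconds.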

Which issue(s) this PR fixes or relates to

N/A

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

aknuds1

LGTM, just wondering about a couple of things in the last test.

pkg/util/limiter/utilization.go

pkg/util/limiter/utilization_test.go

pracucci · 2023-08-09T13:37:51Z

Thanks a lot @aknuds1 for your review. I should have addressed all the comments.

aknuds1

Let's go

Skip two consecutive CPU utilization updates in the UtilizationBasedL…

6123c98

…imiter Signed-off-by: Marco Pracucci <marco@pracucci.com>

pracucci marked this pull request as ready for review August 9, 2023 12:28

pracucci requested a review from a team as a code owner August 9, 2023 12:28

aknuds1 approved these changes Aug 9, 2023

Addressed review feedback

939922b

Signed-off-by: Marco Pracucci <marco@pracucci.com>

aknuds1 approved these changes Aug 9, 2023

pracucci mentioned this pull request Aug 9, 2023

ingester: Clamp CPU utilization measurements to number of cores #5691

Closed

3 tasks

pracucci merged commit 78ebd8c into main Aug 9, 2023
28 checks passed

pracucci deleted the improve-cpu-based-limits branch August 9, 2023 15:15

pracucci mentioned this pull request Aug 17, 2023

UtilizationBasedLimiter: Sample also cgroup CPU utilization #5763

Closed

1 task
