Skip two consecutive CPU utilization updates in the UtilizationBasedLimiter #5704
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What this PR does
We're still testing the CPU limit in the
UtilizationBasedLimiter
and we observe a weird behaviour. Sometimes, the limiter tracks a CPU utilization which is absurdly high (e.g. 108946 sec cores on a 32 cores machine). We haven't found the root cause, but I have a theory. In this PR I'm proposing a fix for that theory.Theory
The
UtilizationBasedLimiter.compute()
is called by atime.Ticker
. We know that atime.Ticker
may call the function two consecutive times under some edge conditions. For example, look at this example. It prints:Given ☝️ , a theory I have is that when ☝️ happens the computed
timeSincePrevUpdate
may be a very small number and so the divider in the following operation may be a very small number:When the divider is a very small number (e.g. 1 microsecond = 0.000001) it will amplify the value computed by
cpuTime - prevCPUTime
. We (myself included) expectcpuTime - prevCPUTime
to be consistent with the elapsed time, so if the divider is a very small number then alsocpuTime - prevCPUTime
should be a very small number, but in practice we read from/proc
and get thetime.Now()
at different times (the two operations are not atomic), so there's always a small drift between the two.When the divider is 1 second, we don't notice this difference, but when the divider is a very small number because
compute()
was called two times consecutively, then the "small drift" will be amplified by factors, potentially leading to bogus CPU utilization computation.This is just a theory, but I think what I'm proposing in this PR is a safe change to do anyway.
Which issue(s) this PR fixes or relates to
N/A
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]