Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingester: Limit read path based on CPU/memory utilization #5012

Merged
merged 49 commits into from
Jun 16, 2023

Conversation

aknuds1
Copy link
Contributor

@aknuds1 aknuds1 commented May 15, 2023

What this PR does

In the ingester, limit read path based on CPU/memory utilization. Read requests are rejected if the configured CPU/memory utilization threshold is exceeded, with HTTP code 503 (service unavailable).

CPU and memory utilization are obtained from one of the following, in order:

  1. cgroup v2
  2. cgroup v1
  3. /proc/self

The implementation is Linux only, for simplicity. The feature is marked as experimental though, so it should be OK to not have a cross-platform solution (yet?).

CPU utilization is calculated as an exponential weighted moving average over the last minute.

TODO

  • Should we measure CPU and memory utilization on the process or system level? I'm currently uncertain of whether system CPU/memory utilization within a container tells you the container utilization or that of the host
  • Are there more direct ways of finding CPU and memory utilization under Docker or k8s?
  • Figure out why gateway seems to translate 503 to 500
  • Add tests

Which issue(s) this PR fixes or relates to

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@aknuds1 aknuds1 added enhancement New feature or request component/ingester labels May 15, 2023
@aknuds1 aknuds1 force-pushed the feat/resource-based-ingester-limiting branch 5 times, most recently from 99eb5a7 to a53bef7 Compare May 15, 2023 14:27
return
}

cpuUtil := (cpuTime - lastCPUTime) / now.Sub(lastUpdate).Seconds()
Copy link
Member

@jhalterman jhalterman May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just fetch current CPU usage via the library?

Copy link
Contributor Author

@aknuds1 aknuds1 May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, hadn't seen that. In any case, we likely won't be using gopsutil by the time the PR is finalized. I'll try to learn from the Percent implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the Percent function isn't suitable, since it calculates the system's CPU utilization and not that of a single process (we want to know how much CPU the ingester process is using).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see on the other hand that there's a Process.Percent method. I'll investigate its implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having checked the Process.Percent implementation, it's essentially the same as mine AFAICT. Though, I wonder why the cpu.Percent function subtracts Guest and GuestNice from the total CPU time, while process.Process.Percent does not?? Guess it warrants some research.

Copy link
Member

@jhalterman jhalterman May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, cpu.Percent might only be good to use if Mimir is the only container running in an ingester pod, which it seems to be, at least in the Helm chart.

Copy link
Contributor Author

@aknuds1 aknuds1 May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's generally alone in its pod. I've never experienced otherwise. Also, Mimir can run outside of Docker/k8s, so have to support that usage (as opposed to using some Docker/k8s method of finding CPU load).

Copy link
Contributor Author

@aknuds1 aknuds1 Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note that since the original comment was made, the PR has changed into measuring utilization via cgroup v2 and cgroup v1 before it measures via the process' resource utilization.)

@aknuds1 aknuds1 force-pushed the feat/resource-based-ingester-limiting branch 7 times, most recently from b6a5f40 to 321314c Compare May 18, 2023 11:43
pkg/ingester/limiting.go Outdated Show resolved Hide resolved
@aknuds1 aknuds1 force-pushed the feat/resource-based-ingester-limiting branch 11 times, most recently from 42f9678 to 8bee2cf Compare May 25, 2023 07:15
@aknuds1 aknuds1 force-pushed the feat/resource-based-ingester-limiting branch from 66106c8 to fa42b92 Compare May 26, 2023 06:34
aknuds1 and others added 26 commits June 16, 2023 18:08
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
@aknuds1 aknuds1 force-pushed the feat/resource-based-ingester-limiting branch from 0be5820 to 21b157c Compare June 16, 2023 16:12
@aknuds1 aknuds1 merged commit 655e0ff into main Jun 16, 2023
@aknuds1 aknuds1 deleted the feat/resource-based-ingester-limiting branch June 16, 2023 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants