Helm: Support creation of HorizontalPodAutoscaler #4229
Conversation
How would autoscaling the ingesters work with zone awareness enabled?
An HPA will be created for each zone if zone-awareness is enabled and then it will just scale within that zone based on the
So just to confirm, if one of the statefulsets scales up by one, does it automatically scale up the other two (assuming 3 zones)? Otherwise you might have an imbalance.
Since each zone would have its own HPA, they will only scale within the respective zone, so yes, it could become imbalanced. Realistically though, if one zone had to scale up due to high memory usage, the others would likely scale as well since the resource usage is almost the same across each zone. Either way, I didn't think this would be a problem since the docs just recommend balancing zones to ensure resource utilization is also balanced.
Likely true; makes sense.
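(For illustration only: a minimal sketch of what one of the per-zone HPAs described above could look like once rendered. The resource names, zone suffix, and target values here are assumptions, not the chart's actual output.)

```yaml
# Hypothetical per-zone HPA: one such object per ingester zone, each
# scaling only its own StatefulSet on memory utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mimir-ingester-zone-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: mimir-ingester-zone-a
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
# Equivalent objects would exist for zone-b and zone-c, each scaling
# independently within its own zone.
```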
@jgutschon Thanks for all this work! I've been out of office this week, so I've made a note to follow up when I'm back next week. I haven't finished completely reviewing / testing everything yet, but so far here are my thoughts in no particular order:
Also tagging @krajorama and @dimitarvdimitrov in case they have thoughts :)
Hi @Logiraptor, thanks for taking a look. I've reverted the naming for the gateway autoscaling fields and disabled the

For the querier autoscaling, I understand that the Jsonnet autoscaling makes use of KEDA to scale based on this queue size as well as some other metrics, such as in this template. I was tentatively planning to add support for KEDA ScaledObjects in a later PR if this one was accepted, but I figured this would be a good baseline for that and it would still be useful to have autoscaling based on the memory and/or CPU for now.

I've just tested migrating a little bit now and I'm seeing that it's scaling down to 1 replica momentarily after creating the HPA. As far as I understand, I think a new revision of the Deployment or StatefulSet needs to be created at the same time as the HPA, omitting the
Hi there, and thanks for your great work @jgutschon.
@hajowieland No ETA yet. Currently we need to find a solution to the migration process. It's not viable to release this without one - many users, including paying customers of Grafana Enterprise Metrics, use this chart in production. The expectation is that all upgrades can be done with zero downtime. Even a momentary scale to 1 replica will cause a complete outage.

I think we could merge it sooner if it includes only querier and distributor autoscaling, since as I mentioned previously these two are already in production use at scale internally, they're stateless, and cheap to spin up / down.

Autoscaling everything without extensive testing is risky because while all components are designed to scale horizontally, that doesn't necessarily mean they can be scaled often. Just to take a random example: compactors tend to have spiky demand as new blocks are created. So you may think that compactors can autoscale up every two hours, then back down in between. This might work, but it needs to be tested at scale. Compactors only re-plan the job sharding every so often (~ every 1h), so they will not immediately distribute work if you add 10 compactors, for example. More likely, they will all compact the same blocks at first, then after 1h they will notice that more compactors have joined. So if you autoscale up and down constantly, I don't think they will make efficient use of resources.

That's just one example - each component needs to be tested individually at scale.
The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase |
@jgutschon @krajorama Hi! I'm checking draft PRs. What's the state of this PR? Should we move forward or close it? |
Hi @pracucci, taking another look at this now, it seems that there are some k8s docs on the migration issues mentioned above: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#migrating-deployments-and-statefulsets-to-horizontal-autoscaling

I've just tried the steps in the client-side option after enabling the HPA and it worked successfully without scaling down the pods. Worth noting here as well that I had originally solved this with ArgoCD by ignoring the
I'll see if I have some time in the next few days to update the docs with the extra migration steps and address some of @Logiraptor's points, would be happy if this could still be merged. |
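For reference, a rough sketch of the client-side migration flow from those Kubernetes docs as it might apply to this chart; the Deployment name, labels, and image below are placeholders, and the exact steps should be checked against the linked page:

```yaml
# Client-side migration sketch (names are placeholders):
#   1. kubectl apply edit-last-applied deployment/mimir-querier
#   2. In the editor, delete spec.replicas, then save and quit.
#   3. Remove spec.replicas from the checked-in manifest (as below) and
#      re-apply; the HPA then owns the replica count, so no momentary
#      scale-down to the manifest default should occur.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mimir-querier
spec:
  # spec.replicas intentionally omitted so the HPA manages it
  selector:
    matchLabels:
      app.kubernetes.io/component: querier
  template:
    metadata:
      labels:
        app.kubernetes.io/component: querier
    spec:
      containers:
        - name: querier
          image: grafana/mimir
```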
Worth noting that there's another PR attempting to introduce HPA into the Helm chart: #4687. That one uses KEDA and the same queries for autoscaling that the jsonnet library uses. I am not sure if @davidspek is still available to work on it. I think it will be easier to maintain and provide support for if all the deployment tooling used similar autoscaling methods.

As @Logiraptor mentioned, autoscaling of different components can be very different. We're adding autoscaling of more components to jsonnet and internally testing it out extensively. Perhaps starting with some already tested components like the distributor, querier, and query-frontend is a good starting point.

Regarding migrations: I think we might have some (internal?) docs for how to do the migration to KEDA autoscaling that we can share.
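(For comparison, a hedged sketch of what a KEDA-based approach along those lines might look like; the metric query, threshold, Prometheus address, and object names here are assumptions rather than what #4687 or the jsonnet library actually use.)

```yaml
# Hypothetical KEDA ScaledObject for the querier; values are illustrative.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: mimir-querier
spec:
  scaleTargetRef:
    name: mimir-querier          # Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        # Placeholder queue-length style query; not the jsonnet
        # library's exact expression.
        query: sum(cortex_query_scheduler_queue_length)
        threshold: "4"
```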
What this PR does
This adds a generic template for a HorizontalPodAutoscaler under `templates/lib/hpa.tpl`, along with templating to generate HPAs for the following components:

The existing HPAs for the gateway and deprecated nginx pods have also been migrated to use this template. Notable changes resulting from this migration include that the `behavior` field may now be used, and the metrics target types have been changed to `type: Utilization`, since these are scaling based on the percentage utilization of the metric. The requirements for using the `autoscaling/v2` API have also been changed to K8s 1.23+ instead of 1.25+, since this API version is GA as of 1.23.

When using this with zone-aware components, individual HPAs are created per zone, and the `hpa.minReplicas` value is used instead of `replicas` in the `zoneAwareReplicationMap` template to ensure an even spread of pods at the minimum level.
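As a rough illustration of the intended configuration surface (the exact keys below are assumptions and may not match the chart's final values schema, apart from the `hpa.minReplicas` and `behavior` fields mentioned above):

```yaml
# Illustrative values.yaml excerpt; key names are assumptions.
querier:
  hpa:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80
    # Passed through to the autoscaling/v2 behavior field
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 600
```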
Which issue(s) this PR fixes or relates to
Fixes #3430
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`