Add HPA support to ChatQnA #327

byako · 2024-08-20T16:46:08Z

Description

This PR introduces HPA support to ChatQnA TGI, Embedding and Reranking services based on custom metrics.

helm-charts/chatqna/values.yaml

helm-charts/chatqna/templates/customMetrics.yaml

eero-t

A dependency on Prometheus is needed for metrics monitoring.

Because HPA overwrites / changes several things, it might be better to have it as a separate PR?

helm-charts/chatqna/templates/horizontalPorAutoscaler.yaml

helm-charts/chatqna/templates/servicemonitor.yaml

eero-t · 2024-08-20T18:56:49Z

RBAC rules installed by Prometheus allow it to query metrics only from services in default, kube-system and monitoring namespaces: https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/prometheus-roleBindingSpecificNamespaces.yaml

So asking Helm to install ChatQnA to some other namespace would leave HPA without any metrics.

Would be good to mention that somewhere, maybe in the ChatQnA helm-chart README?

helm-charts/chatqna/templates/service.yaml

helm-charts/common/embedding-usvc/templates/servicemonitor.yaml

byako · 2024-08-21T08:31:45Z

Addressed the comments; also made ServiceMonitor take its name from Values, as Service already does.

eero-t

Some comment update proposals.

helm-charts/common/embedding-usvc/templates/deployment.yaml

helm-charts/common/reranking-usvc/templates/deployment.yaml

helm-charts/common/tgi/templates/deployment.yaml

helm-charts/common/embedding-usvc/templates/horizontalPodAutoscaler.yaml

helm-charts/common/reranking-usvc/templates/horizontalPodAutoscaler.yaml

byako · 2024-08-21T09:23:19Z

Added all suggested comments.

eero-t · 2024-08-21T10:10:17Z

Something like this could be added to relevant Chart READMEs, e.g. between their Install and Verify sections:

## HorizontalPodAutoscaler (HPA) support

`horizontalPodAutoscaler` option enables HPA scaling for the deployment:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Autoscaling is based on custom application metrics provided through [Prometheus](https://prometheus.io/).

### Pre-conditions

If the cluster does not yet run [Prometheus operator](https://github.com/prometheus-operator/kube-prometheus),
it SHOULD be installed before enabling HPA, e.g. by using:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack

### Gotchas

Why HPA is opt-in:
* Enabling the chart's `horizontalPodAutoscaler` option will _overwrite_ the cluster's current
  `PrometheusAdapter` configuration with its own custom metrics configuration.
  Take a copy of the existing one before install, if that matters:
  `kubectl -n monitoring get cm/adapter-config -o yaml > adapter-config.yaml`
* `PrometheusAdapter` needs to be restarted after install, for it to read the new configuration:
  `ns=monitoring; kubectl -n $ns delete $(kubectl -n $ns get pod --selector app.kubernetes.io/name=prometheus-adapter -o name)`
* By default Prometheus adds [k8s RBAC rules](https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/prometheus-roleBindingSpecificNamespaces.yaml)
  for accessing metrics from `default`, `kube-system` and `monitoring` namespaces.  If Helm is
  asked to install OPEA services to some other namespace, those rules need to be updated accordingly
* Provided HPA rules are examples for Xeon, for efficient scaling they need to be fine-tune for given setup
  (underlying HW, used models, OPEA version etc)
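For the namespace gotcha above, a minimal sketch of generating the extra RBAC objects could look like the following. It assumes the kube-prometheus defaults (Prometheus running as the `prometheus-k8s` ServiceAccount in the `monitoring` namespace) and uses a hypothetical target namespace `opea`. The rule list is modeled on the linked kube-prometheus manifest and should be checked against the version actually installed.

```shell
#!/bin/sh
# Print a Role + RoleBinding that let Prometheus discover scrape targets
# in the namespace given as $1.
# Assumptions (not from the charts): ServiceAccount "prometheus-k8s" in
# namespace "monitoring", as in a default kube-prometheus install.
prometheus_rbac_manifest() {
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: $1
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: $1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
  - kind: ServiceAccount
    name: prometheus-k8s
    namespace: monitoring
EOF
}

# "opea" is a hypothetical namespace; this only prints the manifest.
prometheus_rbac_manifest opea
```

After reviewing the output, it can be applied with `prometheus_rbac_manifest opea | kubectl apply -f -`.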

byako · 2024-08-21T11:02:20Z

Added HPA sections to the README.md files, and moved HPA from the reranking-usvc chart to teirerank, which seems to be the actual reranking service. embedding-usvc seems to be tei-reranking, so adding HPA to it seems to be correct.

eero-t · 2024-08-21T12:35:06Z

Bit more work required for the README:

* I had a typo in my blurb: s/ be fine-tune / be fine-tuned /
* `horizontalPodAutoscaler` option should be added to the table in the "Values" section of the READMEs
* READMEs' "Verify" sections could include the following subsection on verifying that the data required by HPA is present

### Verify HPA metrics

To verify that the metrics required by the `horizontalPodAutoscaler` option work, check that...

Prometheus found the metric endpoints, i.e. the last number on the line is non-zero:

prom_url=http://$(kubectl -n monitoring get -o jsonpath="{.spec.clusterIP}:{.spec.ports[0].port}" svc/prometheus-k8s)
curl --no-progress-meter $prom_url/metrics | grep scrape_pool_targets.*-svc

Prometheus adapter provides custom metrics for their data:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name'

And those custom metrics have valid values for HPA rules:

ns=default  # OPEA namespace
url=/apis/custom.metrics.k8s.io/v1beta1
# Quote the jq filter so the shell cannot glob-expand it; -r prints names without quotes
for m in $(kubectl get --raw $url | jq -r '.resources[].name' | grep namespaces | sed "s%/%/${ns}/metrics/%"); do
  kubectl get --raw $url/$m | jq .
done | grep -e metricName -e value

NOTE: HuggingFace TGI and TEI services provide metrics endpoint only after they've processed their first request!
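To make the path rewriting in the last check easier to follow, here is a small offline sketch of just the `grep` and `sed` steps. The metric names are hypothetical placeholders (not taken from the charts), shown as they would look once the quotes are stripped from the API listing.

```shell
ns=default  # OPEA namespace, as in the check above

# Hypothetical resource names, shaped like the custom metrics API listing
names='namespaces/tgi_request_latency
services/tgi_request_latency
pods/tgi_request_latency
namespaces/te_request_latency'

# Keep the namespace-scoped entries and splice "<ns>/metrics" in after the first slash
paths=$(printf '%s\n' "$names" | grep namespaces | sed "s%/%/${ns}/metrics/%")
printf '%s\n' "$paths"
```

Each resulting line is the part appended to `/apis/custom.metrics.k8s.io/v1beta1/` when querying the metric value with `kubectl get --raw`.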

eero-t · 2024-08-21T12:50:21Z

If we don't mind that the user cannot directly copy-paste the command, the last check could be just:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/<NAMESPACE>/metrics/<METRIC> | jq

eero-t

The Helm chart helpers unconditionally add a selector label for deployments, so ServiceMonitors can use that instead of a new svc label.

helm-charts/common/embedding-usvc/templates/_helpers.tpl

helm-charts/common/embedding-usvc/templates/servicemonitor.yaml

helm-charts/common/teirerank/templates/_helpers.tpl

helm-charts/common/teirerank/templates/servicemonitor.yaml

helm-charts/common/tgi/servicemonitor.yaml

helm-charts/common/tgi/templates/_helpers.tpl

eero-t

OPEA has changed the service names, so the *-svc pattern can no longer be used to match the relevant ones. Each service needs its own grep pattern for validation.
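A per-service check could be sketched roughly as below. It runs against sample text instead of a live Prometheus, and both the job names (`chatqna-tgi`, `chatqna-tei`) and the exact line layout are assumptions for illustration; only the `scrape_pool_targets` part of the metric name comes from the earlier validation command.

```shell
# Sample lines shaped like Prometheus' own /metrics output (hypothetical job names)
sample='prometheus_target_scrape_pool_targets{scrape_job="serviceMonitor/default/chatqna-tgi/0"} 1
prometheus_target_scrape_pool_targets{scrape_job="serviceMonitor/default/chatqna-tei/0"} 0
prometheus_target_scrape_pool_targets{scrape_job="serviceMonitor/monitoring/kubelet/0"} 3'

# Succeeds only if some scrape pool matching pattern $1 reports a non-zero target count
has_targets() {
  printf '%s\n' "$sample" \
    | grep "scrape_pool_targets.*$1" \
    | awk '$NF > 0 { found = 1 } END { exit !found }'
}

for svc in chatqna-tgi chatqna-tei; do
  if has_targets "$svc"; then
    echo "$svc: targets found"
  else
    echo "$svc: no targets"
  fi
done
```

Against a real cluster, the `printf` line would be replaced by the `curl --no-progress-meter $prom_url/metrics` call from the verification steps.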

helm-charts/common/embedding-usvc/README.md

helm-charts/common/teirerank/README.md

helm-charts/common/tgi/README.md

helm-charts/common/tgi/values.yaml

irisdingbj

Besides the helm-chart, would you also update the manifests (https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector/config/manifests) to accommodate this change?

eero-t · 2024-08-21T17:49:06Z

> Besides the helm-chart, would you also update the manifests (https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector/config/manifests) to accommodate this change?

@irisdingbj As HPA support is disabled by default, running helm-charts/update_manifests.sh would just add a few extra comments to the manifests, nothing else.

If you meant generating an additional set of manifest files for HPA, I think that's a bad idea. Users would then miss the Pre-conditions and Gotchas documented in the Helm chart READMEs, and would have no options for configuring HPA for the underlying cloud setup (e.g. how many replicas each deployment can be scaled to, which depends on how many nodes are available).

eero-t

Nowadays the Helm charts' Service declarations also use the service's name for port names.

As the port name is hard-coded in Services, I'm suggesting the same for ServiceMonitors, but I think both could just as well switch to using Helm include instead...

helm-charts/common/embedding-usvc/templates/servicemonitor.yaml

helm-charts/common/teirerank/templates/servicemonitor.yaml

helm-charts/common/tgi/templates/servicemonitor.yaml

eero-t

Added suggestions on documenting the HPA dependencies between chatqna and TGI/TEI components.

eero-t · 2024-08-22T10:40:18Z

helm-charts/chatqna/values.yaml

+  #   for embedding, reranking, tgi services
+  # Upstream default configMap:
+  #  - https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/deploy/manifests/config-map.yaml
+  horizontalPodAutoscaler:


This will need a "HorizontalPodAutoscaler (HPA) support" section in the chatqna README too. It can be the same as in the common components, with the first clause updated e.g. to:

`horizontalPodAutoscaler` option enables HPA scaling for the TGI and TEI inferencing deployments:

Verification section can be omitted I think, it's enough to have it in TGI & TEI READMEs.

Options table entry description could be e.g:

HPA autoscaling for the TGI and TEI service deployments based on metrics they provide. See #pre-conditions and #gotchas before enabling! (If one doesn't want one of them to be scaled, given service `maxReplicas` can be set to `1`)

helm-charts/common/tei/README.md

helm-charts/common/teirerank/README.md

helm-charts/common/tgi/README.md

helm-charts/chatqna/templates/customMetrics.yaml

eero-t

Came up with an IMHO slightly more readable command-line example than what I had earlier, but the earlier one works fine too.

helm-charts/common/tei/README.md

helm-charts/common/teirerank/README.md

helm-charts/common/tgi/README.md

byako · 2024-08-22T13:21:29Z

Added suggested fixes, rebased onto latest main.

eero-t

Approved, but please also update helm-charts/chatqna/README.md as suggested above.

byako · 2024-08-22T13:32:53Z

> Approved, but please also update helm-charts/chatqna/README.md as suggested above.

I'll also update generated manifests.

helm-charts/chatqna/README.md

eero-t

The failing E2E test seems to be a network issue:

docker push 100.80.243.74:5000/opea/gmcmanager:84507fc42fbae6a2104ce36457d8ddc5b02c4354
The push refers to repository [100.80.243.74:5000/opea/gmcmanager]
...
337cf9c1bd1f: Retrying in 1 second
received unexpected HTTP status: 500 Internal Server Error

yongfengdu · 2024-08-23T06:59:05Z

helm-charts/chatqna/README.md

@@ -34,6 +34,35 @@ helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --

 1. Make sure your `MODELDIR` exists on the node where your workload is schedueled so you can cache the downloaded model for next time use. Otherwise, set `global.modelUseHostPath` to 'null' if you don't want to cache the model.

+## HorizontalPodAutoscaler (HPA) support


This HPA support section is generic enough. I think maybe we can put it in one place instead of copy/pasting it:
https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/README.md

Done. Thanks, @eero-t for preparing patch.

poussa · 2024-08-23T10:19:01Z

@irisdingbj can you approve this and add the v0.9 label? Perhaps merge as well, since the automatic merge does not happen because the e2e test fails due to a network error in GMC.

byako · 2024-08-23T11:50:00Z

I squashed the whitespace commit from CI into the previous commit and force-pushed to trigger a new test round.

irisdingbj · 2024-08-23T15:56:13Z

@poussa @eero-t

> @irisdingbj can you approve this and add the v0.9 label? Perhaps merge as well, since the automatic merge does not happen because the e2e test fails due to a network error in GMC.

All tests pass now and the PR is merged into the main branch. The v0.9 label has already been added. Please ask @daisy-ycguo about the process for merging into the v0.9 release.

byako force-pushed the chatqna-hpa branch from e9a5ee6 to f32d810 Compare August 20, 2024 16:47

eero-t reviewed Aug 20, 2024

helm-charts/chatqna/values.yaml

eero-t reviewed Aug 20, 2024

helm-charts/chatqna/templates/customMetrics.yaml

eero-t suggested changes Aug 20, 2024

eero-t reviewed Aug 20, 2024

helm-charts/chatqna/templates/horizontalPorAutoscaler.yaml

helm-charts/chatqna/templates/servicemonitor.yaml

byako force-pushed the chatqna-hpa branch from e04e380 to d39dc51 Compare August 21, 2024 06:06

lianhao reviewed Aug 21, 2024

helm-charts/chatqna/templates/service.yaml

helm-charts/common/embedding-usvc/templates/servicemonitor.yaml

byako force-pushed the chatqna-hpa branch from ed21d72 to bb8c5d8 Compare August 21, 2024 08:30

byako force-pushed the chatqna-hpa branch 2 times, most recently from 258d387 to 24e1a63 Compare August 21, 2024 08:49

byako marked this pull request as ready for review August 21, 2024 08:49

eero-t reviewed Aug 21, 2024

byako force-pushed the chatqna-hpa branch from 0efd333 to 8b9cfde Compare August 21, 2024 09:22

byako force-pushed the chatqna-hpa branch from 4f252a4 to a6b42bc Compare August 21, 2024 11:00

byako force-pushed the chatqna-hpa branch from da0d074 to 6fdfdbe Compare August 21, 2024 13:03

byako requested a review from lianhao August 21, 2024 13:03

eero-t suggested changes Aug 21, 2024

helm-charts/common/embedding-usvc/README.md

helm-charts/common/teirerank/README.md

helm-charts/common/tgi/README.md

eero-t reviewed Aug 21, 2024

helm-charts/common/tgi/values.yaml

byako force-pushed the chatqna-hpa branch from 617a598 to 939976d Compare August 21, 2024 15:08

irisdingbj reviewed Aug 21, 2024

eero-t suggested changes Aug 21, 2024

helm-charts/common/embedding-usvc/templates/servicemonitor.yaml

helm-charts/common/teirerank/templates/servicemonitor.yaml

helm-charts/common/tgi/templates/servicemonitor.yaml

eero-t reviewed Aug 22, 2024

eero-t suggested changes Aug 22, 2024

helm-charts/chatqna/templates/customMetrics.yaml

eero-t reviewed Aug 22, 2024

helm-charts/common/tei/README.md

helm-charts/common/teirerank/README.md

helm-charts/common/tgi/README.md

byako force-pushed the chatqna-hpa branch from a5f4570 to b49bdea Compare August 22, 2024 13:21

eero-t approved these changes Aug 22, 2024

byako force-pushed the chatqna-hpa branch from b3b124c to ecedfb4 Compare August 22, 2024 13:38

eero-t reviewed Aug 22, 2024

helm-charts/chatqna/README.md

byako force-pushed the chatqna-hpa branch from 4da41bb to 53ae723 Compare August 22, 2024 16:14

byako requested review from lianhao, eero-t and irisdingbj August 22, 2024 16:15

eero-t approved these changes Aug 22, 2024

lianhao approved these changes Aug 23, 2024

byako and others added 2 commits August 23, 2024 14:40

Add HPA support to tei, teireranking, tgi services

c291ffd

Signed-off-by: Alexey Fomenko <alexey.fomenko@intel.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

3f97188

for more information, see https://pre-commit.ci

lianhao force-pushed the chatqna-hpa branch from 84507fc to 3f97188 Compare August 23, 2024 06:40

yongfengdu reviewed Aug 23, 2024

byako force-pushed the chatqna-hpa branch from 10352ca to 76c1007 Compare August 23, 2024 11:48

Consolidate HPA documentation

2b89cd3

Signed-off-by: Alexey Fomenko <alexey.fomenko@intel.com>

byako force-pushed the chatqna-hpa branch from 76c1007 to 2b89cd3 Compare August 23, 2024 12:40

Merge branch 'main' into chatqna-hpa

679a42d

irisdingbj approved these changes Aug 23, 2024

irisdingbj merged commit cab7a88 into opea-project:main Aug 23, 2024
16 checks passed

irisdingbj added the milestone0.9 label Aug 23, 2024

eero-t mentioned this pull request Sep 5, 2024: HPA improvements #386 (Merged)
