Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HPA support to ChatQnA #327

Merged
merged 4 commits into from
Aug 23, 2024
Merged

Add HPA support to ChatQnA #327

merged 4 commits into from
Aug 23, 2024

Conversation

byako
Copy link
Contributor

@byako byako commented Aug 20, 2024

Description

This PR introduces HPA support to ChatQnA TGI, Embedding and Reranking services based on custom metrics.

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dependency to Prometheus is needed for metrics monitoring.

Because HPA overwrites / changes several things, it might be better to have it as a separate PR?

@eero-t
Copy link
Contributor

eero-t commented Aug 20, 2024

RBAC rules installed by Prometheus allow it to query metrics only from services in default, kube-system and monitoring namespaces: https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/prometheus-roleBindingSpecificNamespaces.yaml

So asking Helm to install ChatQnA to some other namespace would mean there being no metrics for HPA.

Would be good to mention that somewhere, maybe in the ChatQnA helm-chart README?

@byako
Copy link
Contributor Author

byako commented Aug 21, 2024

Fixed comments, also made ServiceMonitor take name from the Values like Service has it.

@byako byako force-pushed the chatqna-hpa branch 2 times, most recently from 258d387 to 24e1a63 Compare August 21, 2024 08:49
@byako byako marked this pull request as ready for review August 21, 2024 08:49
Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comment update proposals.

@byako
Copy link
Contributor Author

byako commented Aug 21, 2024

Added all suggested comments.

@eero-t
Copy link
Contributor

eero-t commented Aug 21, 2024

Something like this could be added to relevant Chart READMEs, e.g. between their Install and Verify sections:

## HorizontalPodAutoscaler (HPA) support

`horizontalPodAutoscaler` option enables HPA scaling for the deployment:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Autoscaling is based on custom application metrics provided through [Prometheus](https://prometheus.io/).

### Pre-conditions

If cluster does not run [Prometheus operator](https://github.com/prometheus-operator/kube-prometheus)
yet, it SHOULD be be installed before enabling HPA, e.g. by using:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack

### Gotchas

Why HPA is opt-in:
* Enabling chart `horizontalPodAutoscaler` option will _overwrite_ cluster's current
  `PrometheusAdapter` configuration with its own custom metrics configuration.
  Take copy of the existing one before install, if that matters:
  `kubectl -n monitoring get cm/adapter-config -o yaml > adapter-config.yaml`
* `PrometheusAdapter` needs to be restarted after install, for it to read the new configuration:
  `ns=monitoring; kubectl -n $ns delete $(kubectl -n $ns get pod --selector app.kubernetes.io/name=prometheus-adapter -o name)`
* By default Prometheus adds [k8s RBAC rules](https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/prometheus-roleBindingSpecificNamespaces.yaml)
  for accessing metrics from `default`, `kube-system` and `monitoring` namespaces.  If Helm is
  asked to install OPEA services to some other namespace, those rules need to be updated accordingly
* Provided HPA rules are examples for Xeon, for efficient scaling they need to be fine-tune for given setup
  (underlying HW, used models, OPEA version etc) 

@byako
Copy link
Contributor Author

byako commented Aug 21, 2024

Added HPA sections to README.mds, and moved HPA from reranking-usvc chart to teirerank which seem to be the actual reranking. embedding-usvc seems to be tei-reranking, so HPA added to it seems to be the correct place.

@eero-t
Copy link
Contributor

eero-t commented Aug 21, 2024

Bit more work required for the README:

  • I had a typo in my blurb: s/ be fine-tune / be fine-tuned /
  • horizontalPodAutoscaler option should be added to table in "Values" section of the READMEs
  • READMEs "Verify" sections could include following subsection on verifying that data required by HPA is present

Verify HPA metrics

To verify that metrics required by horizontalPodAutoscaler option work, check that...

Prometheus found the metric endpoints, i.e. last number on the line is non-zero:

prom_url=http://$(kubectl -n monitoring get -o jsonpath="{.spec.clusterIP}:{.spec.ports[0].port}" svc/prometheus-k8s)
curl --no-progress-meter $prom_url/metrics | grep scrape_pool_targets.*-svc

Prometheus adapter provides custom metrics for their data:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .resources[].name

And those custom metrics have valid values for HPA rules:

ns=default;  # OPEA namespace
url=/apis/custom.metrics.k8s.io/v1beta1;
for m in $(kubectl get --raw $url | jq .resources[].name | tr -d '"' | grep namespaces | sed "s%/%/${ns}/metrics/%"); do
  kubectl get --raw $url/$m | jq;
done | grep -e metricName -e value

NOTE: HuggingFace TGI and TEI services provide metrics endpoint only after they've processed their first request!

@eero-t
Copy link
Contributor

eero-t commented Aug 21, 2024

If we don't care that user cannot directly copy-paste the command, last check could be just:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/<NAMESPACE>/metrics/<METRIC> | jq

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helm charts helpers unconditionally add a selector label for deployments, so ServiceMonitors can use that instead of new svc label.

helm-charts/common/embedding-usvc/templates/_helpers.tpl Outdated Show resolved Hide resolved
helm-charts/common/teirerank/templates/_helpers.tpl Outdated Show resolved Hide resolved
helm-charts/common/teirerank/templates/servicemonitor.yaml Outdated Show resolved Hide resolved
helm-charts/common/tgi/servicemonitor.yaml Outdated Show resolved Hide resolved
helm-charts/common/tgi/templates/_helpers.tpl Outdated Show resolved Hide resolved
Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OPEA has changed the service names so *-svc pattern cannot be used to match the relevant one any more, each service needs its own grep pattern for validation.

helm-charts/common/embedding-usvc/README.md Outdated Show resolved Hide resolved
helm-charts/common/teirerank/README.md Outdated Show resolved Hide resolved
helm-charts/common/tgi/README.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@irisdingbj irisdingbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides helm-chart, Would you also update the manifests( https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector/config/manifests) to accomodate this change?

@eero-t
Copy link
Contributor

eero-t commented Aug 21, 2024

Besides helm-chart, Would you also update the manifests( https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector/config/manifests) to accomodate this change?

@irisdingbj As HPA support is disabled by default, running helm-charts/update_manifests.sh would just add few extra comments to the manifests, nothing else.

If you meant generating additional set of manifest files for HPA, I think that's a bad idea. User will then miss Pre-conditions and Gotchas documented in the Helm charts READMEs, and does not have options to configure HPA for underlying cloud setup (e.g. to how many replicas each deployment can be scaled, which depends on how many nodes are available).

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nowadays Helm charts Service declarations use service's name also for port names.

As port name is hard-coded in Services, I'm suggesting same with ServiceMonitors, but I think both could as well switch to using Helm include instead...

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added suggestions on documenting the HPA dependencies between chatqna and TGI/TEI components.

# for embedding, reranking, tgi services
# Upstream default configMap:
# - https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/deploy/manifests/config-map.yaml
horizontalPodAutoscaler:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need "HorizontalPodAutoscaler (HPA) support" section in chatqna README too. It can be same as in common components, with first clause updated e.g. to:

`horizontalPodAutoscaler` option enables HPA scaling for the TGI and TEI inferencing deployments:

Verification section can be omitted I think, it's enough to have it in TGI & TEI READMEs.

Options table entry description could be e.g:

HPA autoscaling for the TGI and TEI service deployments based on metrics they provide. See #pre-conditions and #gotchas before enabling!  (If one doesn't want one of them to be scaled, given service `maxReplicas` can be set to `1`)

helm-charts/common/tei/README.md Outdated Show resolved Hide resolved
helm-charts/common/teirerank/README.md Outdated Show resolved Hide resolved
helm-charts/common/tgi/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came up with IHMO slightly more readable command line example than what I had earlier. But earlier works fine too.

helm-charts/common/tei/README.md Outdated Show resolved Hide resolved
helm-charts/common/teirerank/README.md Outdated Show resolved Hide resolved
helm-charts/common/tgi/README.md Outdated Show resolved Hide resolved
@byako
Copy link
Contributor Author

byako commented Aug 22, 2024

Added suggested fixes, rebased onto latest main.

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but please also update helm-charts/chatqna/README.md as suggested above.

@byako
Copy link
Contributor Author

byako commented Aug 22, 2024

Approved, but please also update helm-charts/chatqna/README.md as suggested above.

I'll also update generated manifests.

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing E2E test seems to be network issue:

docker push 100.80.243.74:5000/opea/gmcmanager:84507fc42fbae6a2104ce36457d8ddc5b02c4354
The push refers to repository [100.80.243.74:5000/opea/gmcmanager]
...
337cf9c1bd1f: Retrying in 1 second
received unexpected HTTP status: 500 Internal Server Error

byako and others added 2 commits August 23, 2024 14:40
@@ -34,6 +34,35 @@ helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --

1. Make sure your `MODELDIR` exists on the node where your workload is schedueled so you can cache the downloaded model for next time use. Otherwise, set `global.modelUseHostPath` to 'null' if you don't want to cache the model.

## HorizontalPodAutoscaler (HPA) support
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This HPA support section is generic enough. I think maybe we can put it in one place instead of copy/pasting it:
https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/README.md

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks, @eero-t for preparing patch.

@poussa
Copy link

poussa commented Aug 23, 2024

@irisdingbj can you approve this and add the v0.9 label. Perhaps merge as well since the automatic merge does not happen since the e2e test fails due to network error in GMC.

@byako
Copy link
Contributor Author

byako commented Aug 23, 2024

I squashed whitespace commit from CI into previous commit and force-pushed to trigger new test round.

Signed-off-by: Alexey Fomenko <alexey.fomenko@intel.com>
@irisdingbj irisdingbj merged commit cab7a88 into opea-project:main Aug 23, 2024
16 checks passed
@irisdingbj
Copy link
Collaborator

@poussa @eero-t

@irisdingbj can you approve this and add the v0.9 label. Perhaps merge as well since the automatic merge does not happen since the e2e test fails due to network error in GMC.

All test passed now and merged into main branch. v0.9 label already added. Please ask @daisy-ycguo for the process to merge into v0.9 release.

@eero-t eero-t mentioned this pull request Sep 5, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants