
ChatQnA: accelerate also teirerank with Gaudi #475

Open · wants to merge 3 commits into main
Conversation

@eero-t (Contributor) commented Oct 15, 2024

Description

Accelerate teirerank with Gaudi as well, not just tei.

When reranking is used, it does not make sense (performance-wise) to accelerate only tei, as reranking is the larger bottleneck.
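For reference, a Gaudi override for the teirerank subchart would look roughly like the sketch below. The image name and keys are illustrative assumptions modeled on how the existing tei Gaudi override works, not the exact contents of this PR:

```yaml
# Sketch of common/teirerank/gaudi-values.yaml (keys and values are assumptions)
image:
  repository: ghcr.io/huggingface/tei-gaudi
  tag: latest
resources:
  limits:
    habana.ai/gaudi: 1   # schedule the reranker onto a Gaudi card
```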

Issues

n/a.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Dependencies

n/a.

Tests

Manually checked the ChatQnA throughput.

@eero-t (Contributor, Author) commented Oct 15, 2024

Marking as draft because I'm not sure in which other places this should be added.

(I can add common/teirerank/gaudi-values.yaml, but IMHO the guardrails-gaudi-values.yaml files should be layered on top of gaudi-values.yaml, not kept independent.)

And because I'm not sure what the recent changes in the other GenAI repos imply for reranking use...

@eero-t (Contributor, Author) commented Oct 15, 2024

Looking at the CI failures, CI currently seems to be in a rather broken state:

  • pre-commit.ci fails because the CI check for Python doc formatting is not configured properly
  • The change to gaudi-values.yaml cannot be the cause of the Xeon failures
  • The PR does not touch guardrail files yet either

@yongfengdu (Collaborator) commented:

I've fixed the pre-commit failure.
The guardrail failure should be caused by other changes (Docker or Python); I'm debugging it, but maybe we can disable it first to make CI work. Please hold on a while until #473 is merged.
Besides the guardrail failure, another relative-path check failed. I'm sure it's not due to the code itself, but to https://github.com/yongfengdu/GenAIInfra/blob/ci-fix/.github/workflows/pr-path-detection.yml

@yongfengdu (Collaborator) commented:

The Xeon failure is caused by opea-project/GenAIExamples#891, which removed the use of microservice layers for LLM and Embedding. #474 is the follow up change for helm-charts.

For guardrails-gaudi-values.yaml, unfortunately there is no way to include gaudi-values.yaml from another values file in a Helm chart, so it's OK to duplicate the changes there, or just keep it as is (the guardrails case would still use CPU for reranking).
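While one values file cannot include another, Helm does allow layering several override files at install time, with later -f files taking precedence; that is another way to avoid duplication. A sketch (release and file names are assumptions):

```shell
# Sketch: combine the shared Gaudi override with a small guardrails overlay
# instead of duplicating the Gaudi settings in guardrails-gaudi-values.yaml.
helm install chatqna ./chatqna \
  -f common/teirerank/gaudi-values.yaml \
  -f guardrails-values.yaml
```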

You'll have to rebase onto the latest changes to continue.

@eero-t (Contributor, Author) commented Oct 16, 2024

For the guardrails-gaudi-values.yaml, unfortunately there is no way to include gaudi-values.yaml in helm chart, so it's ok to do duplicate changes there, or just keep it as is(Guardrail case still use CPU for reranking).

I added gaudi-values.yaml to common/teirerank/, but did not touch guardrails, as I haven't done any performance testing for it.

You'll have to rebase with latest change to continue.

Thanks, done!

@eero-t marked this pull request as ready for review on October 16, 2024, 12:46
@eero-t (Contributor, Author) commented Oct 16, 2024

I dropped the PR draft status, but I haven't tested this yet with the ChatQnA "nowrapper" changes that were merged after I filed this. I would expect rerank performance to be even more important now that its wrapper service no longer adds extra buffering/slowdown, though...

@eero-t (Contributor, Author) commented Oct 16, 2024

As Gaudi rerank worked fine for me, the CI failure for it could be a result of the later nowrapper changes:

2024-10-16T13:00:32.007226Z  INFO text_embeddings_backend_python::management: backends/python/src/management.rs:136: Python backend process terminated
Error: Error when doing warmup

Caused by:
    Could not start backend: max_warmup_length (1024) exceeds model's max_input_length (512), you can modify this value adding `-e MAX_WARMUP_SEQUENCE_LENGTH=<new_warmup_length>` to your Docker run command

I think it's another CI bug though: only one of the two related TEI options is specified.

@eero-t (Contributor, Author) commented Oct 18, 2024

    Could not start backend: max_warmup_length (1024) exceeds model's max_input_length (512), you can modify this value adding `-e MAX_WARMUP_SEQUENCE_LENGTH=<new_warmup_length>` to your Docker run command

I think it's another CI bug though: only one of the two related TEI options is specified.

This is not a bug in CI after all; rather, with git HEAD the max warmup length (matching the specified max input length) is set only for tei, not for teirerank: #483

Max input length applies to both, so teirerank also needs the max warmup length.
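The fix described above amounts to giving teirerank the same warmup setting tei already gets. A minimal sketch, assuming the chart passes it through as a plain environment variable (the exact values-file key is an assumption; the variable name and the 512 limit come from the TEI error message above):

```yaml
# Sketch: keep the warmup length within the model's max_input_length
# (512 for this reranker model), as the TEI error message suggests.
teirerank:
  MAX_WARMUP_SEQUENCE_LENGTH: 512
```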

Fixes: opea-project#483

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
@eero-t (Contributor, Author) commented Oct 18, 2024

Rebased the teirerank config fix to be the first commit, so that every commit works.

This PR does not change anything related to guardrails, but that test still fails, due to another CI bug:

#kubectl logs chatqna20241018105227-testpod -n chatqna-20241018105227
curl: (22) The requested URL returned error: 500
Internal Server Errorcurl failed with code 22

The failure is due to ChatQnA timing out on guardrails:

-----------Pod: chatqna20241018105227-6897bff66b-cm9sv---------
#kubectl describe pod chatqna20241018105227-6897bff66b-cm9sv -n chatqna-20241018105227
...
     Image:           100.83.111.229:5000/opea/chatqna-guardrails:latest
...
INFO:     10.244.164.179:36050 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 1091, in _wrap_create_connection
    sock = await aiohappyeyeballs.start_connection(
...
 TimeoutError: [Errno 110] Connect call failed ('10.103.241.79', 80)
...
 aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host chatqna20241018105227-guardrails-usvc:80 ssl:default [Connect call failed ('10.103.241.79', 80)]

Although that service (eventually) gets to Ready state and its log shows no errors:

NAME                                                     READY   STATUS      RESTARTS       AGE     IP               NODE                NOMINATED NODE   READINESS GATES
...
chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z   1/1     Running     0              3m22s   10.244.164.175   aise-cluster-01-3   <none>           <none>
...
-----------Pod: chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z---------
#kubectl describe pod chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z -n chatqna-20241018105227
...
    Liveness:        http-get http://:guardrails-usvc/v1/health_check delay=5s timeout=1s period=5s #success=1 #failure=24
    Readiness:       http-get http://:guardrails-usvc/v1/health_check delay=5s timeout=1s period=5s #success=1 #failure=3
    Startup:         http-get http://:guardrails-usvc/v1/health_check delay=5s timeout=1s period=5s #success=1 #failure=120
...
  Type     Reason     Age    From               Message
  ----     ------     ----   ----               -------
  Normal   Created    3m11s  kubelet            Created container chatqna20241018105227
  Normal   Started    3m11s  kubelet            Started container chatqna20241018105227
  Warning  Unhealthy  3m5s   kubelet            Startup probe failed: Get "http://10.244.164.175:9090/v1/health_check": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
...
-----------Pod: chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z---------
#kubectl describe pod chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z -n chatqna-20241018105227
...
#kubectl logs chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z -n chatqna-20241018105227
...
INFO:     100.83.111.245:34224 - "GET /v1/health_check HTTP/1.1" 200 OK
INFO:     10.244.164.137:50580 - "POST /v1/guardrails HTTP/1.1" 200 OK
INFO:     100.83.111.245:52736 - "GET /v1/health_check HTTP/1.1" 200 OK

=> CI runs the query before verifying that all necessary backend pods (at least TGI & TEI) have reached Ready state?
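One way CI could close this race is to block until every pod in the release namespace reports Ready before sending the test query. A sketch (the namespace is taken from the log above; the timeout value is just an example):

```shell
# Sketch: wait for all pods (TGI, TEI, teirerank, guardrails, ...) to pass
# their readiness probes before issuing the /v1/chatqna test query.
kubectl wait --for=condition=Ready pod --all \
  -n chatqna-20241018105227 --timeout=600s
```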

@eero-t (Contributor, Author) commented Oct 18, 2024

=> CI runs the query before verifying that all necessary backend pods (at least TGI & TEI) have reached Ready state?

@lianhao Has this not been fixed in CI yet? #454 (comment)
