ChatQnA: accelerate also teirerank with Gaudi #475

eero-t · 2024-10-15T14:10:08Z

Description

Accelerate also teirerank with Gaudi, not just tei.

When reranking is used, it does not make sense (performance-wise) to accelerate just tei, as reranking is a larger bottleneck.

Issues

n/a.

Type of change

Bug fix (non-breaking change which fixes an issue)

Dependencies

n/a.

Tests

Manually checked the ChatQnA throughput.

eero-t · 2024-10-15T14:32:04Z

Marking as draft because I'm not sure in which other places this should be added.

(I can add common/teirerank/gaudi-values.yaml, but IMHO guardrails-gaudi-values.yaml files should be something added on top of gaudi-values.yaml, not independent.)

And because I'm not sure what the recent changes in there other GenAI repos imply on reranking use...

eero-t · 2024-10-15T14:51:04Z

Looking at the CI fails, CI seems to be currently in rather broken state:

pre-commit.ci fails due to CI check for Python doc formatting not being configured properly
Change to gaudi-values.yaml cannot be cause for the Xeon failures
PR does not touch guardrail files yet either

yongfengdu · 2024-10-16T02:13:50Z

I've fixed the pre-commit failure.
Guardrail failure should be caused by other changes(Docker or Python), I'm debugging it but maybe we can disable it first to make the CI work.
Please hold a while until this one merged.#473
Besides the guardrail fail, there is another relative path check failed, I'm sure it's not due to the code itself, but https://github.com/yongfengdu/GenAIInfra/blob/ci-fix/.github/workflows/pr-path-detection.yml

yongfengdu · 2024-10-16T08:52:20Z

The Xeon failure is caused by opea-project/GenAIExamples#891, which removed the use of microservice layers for LLM and Embedding. #474 is the follow up change for helm-charts.

For the guardrails-gaudi-values.yaml, unfortunately there is no way to include gaudi-values.yaml in helm chart, so it's ok to do duplicate changes there, or just keep it as is(Guardrail case still use CPU for reranking).

You'll have to rebase with latest change to continue.

eero-t · 2024-10-16T12:46:19Z

For the guardrails-gaudi-values.yaml, unfortunately there is no way to include gaudi-values.yaml in helm chart, so it's ok to do duplicate changes there, or just keep it as is(Guardrail case still use CPU for reranking).

I added gaudi-values.yaml to common/teirerank/, but did not touch guardrails, as I haven't done any performance testing for it.

You'll have to rebase with latest change to continue.

Thanks, done!

eero-t · 2024-10-16T12:52:02Z

I dropped the PR draft status, but I haven't tested this yet with the ChatQnA "nowrapper" changes that were merged after I filed this. I would expect rerank perf to be even more important after its wrapper service is not providing extra buffering / slowdown though...

eero-t · 2024-10-16T14:44:59Z

As Gaudi rerank worked fine for me, CI failure for it could be result of the later nowrapper changes:

2024-10-16T13:00:32.007226Z  INFO text_embeddings_backend_python::management: backends/python/src/management.rs:136: Python backend process terminated
Error: Error when doing warmup

Caused by:
    Could not start backend: max_warmup_length (1024) exceeds model's max_input_length (512), you can modify this value adding `-e MAX_WARMUP_SEQUENCE_LENGTH=<new_warmup_length>` to your Docker run command

I think it's another bug in CI though, of only specifying one of the 2 related TEI options.

eero-t · 2024-10-18T08:13:00Z

    Could not start backend: max_warmup_length (1024) exceeds model's max_input_length (512), you can modify this value adding `-e MAX_WARMUP_SEQUENCE_LENGTH=<new_warmup_length>` to your Docker run command

I think it's another bug in CI though, of only specifying one of the 2 related TEI options.

This is not a bug in CI, but with git HEAD, max warmup (matching specified max input length) is given only for tei, not teirerank: #483

Max input length applies to both, so teirerank needs also max warmup length. Fixes: opea-project#483 Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

eero-t · 2024-10-18T11:44:46Z

Rebased teirerank config fix as first one, so that every commit works.

This PR does not change anything related to guardrails, but still that test fails, due to another CI bug:

#kubectl logs chatqna20241018105227-testpod -n chatqna-20241018105227
curl: (22) The requested URL returned error: 500
Internal Server Errorcurl failed with code 22

Fail is due to ChatQnA timeouting on guardrails:

-----------Pod: chatqna20241018105227-6897bff66b-cm9sv---------
#kubectl describe pod chatqna20241018105227-6897bff66b-cm9sv -n chatqna-20241018105227
...
     Image:           100.83.111.229:5000/opea/chatqna-guardrails:latest
...
INFO:     10.244.164.179:36050 - "POST /v1/chatqna HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 1091, in _wrap_create_connection
    sock = await aiohappyeyeballs.start_connection(
...
 TimeoutError: [Errno 110] Connect call failed ('10.103.241.79', 80)
...
 aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host chatqna20241018105227-guardrails-usvc:80 ssl:default [Connect call failed ('10.103.241.79', 80)]

Although that service gets (eventually) to Ready state and its log shows now errors:

NAME                                                     READY   STATUS      RESTARTS       AGE     IP               NODE                NOMINATED NODE   READINESS GATES
...
chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z   1/1     Running     0              3m22s   10.244.164.175   aise-cluster-01-3   <none>           <none>
...
-----------Pod: chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z---------
#kubectl describe pod chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z -n chatqna-20241018105227
...
    Liveness:        http-get http://:guardrails-usvc/v1/health_check delay=5s timeout=1s period=5s #success=1 #failure=24
    Readiness:       http-get http://:guardrails-usvc/v1/health_check delay=5s timeout=1s period=5s #success=1 #failure=3
    Startup:         http-get http://:guardrails-usvc/v1/health_check delay=5s timeout=1s period=5s #success=1 #failure=120
...
  Type     Reason     Age    From               Message
  ----     ------     ----   ----               -------
  Normal   Created    3m11s  kubelet            Created container chatqna20241018105227
  Normal   Started    3m11s  kubelet            Started container chatqna20241018105227
  Warning  Unhealthy  3m5s   kubelet            Startup probe failed: Get "http://10.244.164.175:9090/v1/health_check": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
...
-----------Pod: chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z---------
#kubectl describe pod chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z -n chatqna-20241018105227
...
#kubectl logs chatqna20241018105227-guardrails-usvc-7658584b7c-4d25z -n chatqna-20241018105227
...
INFO:     100.83.111.245:34224 - "GET /v1/health_check HTTP/1.1" 200 OK
INFO:     10.244.164.137:50580 - "POST /v1/guardrails HTTP/1.1" 200 OK
INFO:     100.83.111.245:52736 - "GET /v1/health_check HTTP/1.1" 200 OK

=> CI runs the query before verifying that all necessary backend pods (at least TGI & TEI) have reached Ready state?

eero-t · 2024-10-18T11:57:39Z

=> CI runs the query before verifying that all necessary backend pods (at least TGI & TEI) have reached Ready state?

@lianhao Has this not been fixed in CI yet: #454 (comment) ?

eero-t requested review from yongfengdu and lianhao as code owners October 15, 2024 14:10

eero-t marked this pull request as draft October 15, 2024 14:10

eero-t force-pushed the gaudi-accel branch from b034a8d to 513f319 Compare October 15, 2024 14:15

eero-t force-pushed the gaudi-accel branch from 513f319 to bb5cea1 Compare October 16, 2024 12:42

eero-t marked this pull request as ready for review October 16, 2024 12:46

eero-t mentioned this pull request Oct 17, 2024

Gaudi teirerank is in crash loop after Helm install #483

Open

lianhao force-pushed the gaudi-accel branch from bb5cea1 to c8f3d1f Compare October 18, 2024 01:30

eero-t added 3 commits October 18, 2024 13:47

Fix: use MAX_WARMUP_SEQUENCE_LENGTH for both tei & teirerank

c92afa7

Max input length applies to both, so teirerank needs also max warmup length. Fixes: opea-project#483 Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

gaudi-values.yaml for teirerank

1e6cf8e

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

ChatQnA: accelerate also teirerank

b57cb16

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

eero-t force-pushed the gaudi-accel branch from c8f3d1f to b57cb16 Compare October 18, 2024 10:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ChatQnA: accelerate also teirerank with Gaudi #475

ChatQnA: accelerate also teirerank with Gaudi #475

eero-t commented Oct 15, 2024

eero-t commented Oct 15, 2024 •

edited

Loading

eero-t commented Oct 15, 2024

yongfengdu commented Oct 16, 2024

yongfengdu commented Oct 16, 2024

eero-t commented Oct 16, 2024

eero-t commented Oct 16, 2024

eero-t commented Oct 16, 2024 •

edited

Loading

eero-t commented Oct 18, 2024

eero-t commented Oct 18, 2024

eero-t commented Oct 18, 2024 •

edited

Loading

ChatQnA: accelerate also teirerank with Gaudi #475

Are you sure you want to change the base?

ChatQnA: accelerate also teirerank with Gaudi #475

Conversation

eero-t commented Oct 15, 2024

Description

Issues

Type of change

Dependencies

Tests

eero-t commented Oct 15, 2024 • edited Loading

eero-t commented Oct 15, 2024

yongfengdu commented Oct 16, 2024

yongfengdu commented Oct 16, 2024

eero-t commented Oct 16, 2024

eero-t commented Oct 16, 2024

eero-t commented Oct 16, 2024 • edited Loading

eero-t commented Oct 18, 2024

eero-t commented Oct 18, 2024

eero-t commented Oct 18, 2024 • edited Loading

eero-t commented Oct 15, 2024 •

edited

Loading

eero-t commented Oct 16, 2024 •

edited

Loading

eero-t commented Oct 18, 2024 •

edited

Loading