ChatQnA: accelerate also teirerank with Gaudi #475
base: main
Conversation
Marking as draft because I'm not sure in which other places this should be added. (I can add ...) And because I'm not sure what the recent changes in the other GenAI repos imply for reranking use...
Looking at the CI failures, CI currently seems to be in a rather broken state:
I've fixed the pre-commit failure.
The Xeon failure is caused by opea-project/GenAIExamples#891, which removed the use of microservice layers for LLM and Embedding. #474 is the follow-up change for helm-charts. As for guardrails-gaudi-values.yaml, unfortunately there is no way to include gaudi-values.yaml in a Helm chart, so it's OK either to duplicate the changes there or to keep it as is (the guardrails case would then still use the CPU for reranking). You'll have to rebase onto the latest changes to continue.
I added
Thanks, done!
I dropped the PR draft status, but I haven't yet tested this with the ChatQnA "nowrapper" changes that were merged after I filed this. I would expect rerank performance to be even more important now that its wrapper service no longer adds extra buffering / slowdown, though...
As Gaudi rerank worked fine for me, the CI failure for it could be a result of the later nowrapper changes:
I think it's another bug in CI though: only one of the 2 related TEI options is specified.
This is not a bug in CI. With git HEAD, max warmup (matching the specified max input length) is given only for ...
Max input length applies to both, so teirerank needs also max warmup length.
Fixes: opea-project#483
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
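The point of the config fix (a service that sets a max input length should also get a matching max warmup length) can be illustrated with a small consistency check. This is only a sketch: the key names `max_input_length` and `max_warmup_length` and the values below are hypothetical placeholders, not the actual chart keys or TEI options.

```python
# Hypothetical key names; the real chart values / TEI options may be named differently.
MAX_INPUT_KEY = "max_input_length"
MAX_WARMUP_KEY = "max_warmup_length"


def missing_warmup(values: dict) -> list:
    """Return services whose warmup length is absent or differs from their max input length."""
    problems = []
    for service, opts in sorted(values.items()):
        max_input = opts.get(MAX_INPUT_KEY)
        if max_input is None:
            continue  # no max input length set, so nothing to match against
        if opts.get(MAX_WARMUP_KEY) != max_input:
            problems.append(service)
    return problems


# Illustrative values only: tei sets both options, teirerank sets just one.
values = {
    "tei": {MAX_INPUT_KEY: 512, MAX_WARMUP_KEY: 512},
    "teirerank": {MAX_INPUT_KEY: 512},
}
print(missing_warmup(values))  # ['teirerank']
```

A check like this could run before deployment, so that a mismatch shows up as a configuration error rather than as a runtime failure.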
Rebased the teirerank config fix to be the first commit, so that every commit works. This PR does not change anything related to guardrails, but that test still fails, due to another CI bug:
The failure is due to ChatQnA timing out on guardrails:
Although that service (eventually) reaches Ready state and its log shows no errors:
=> Does CI run the query before verifying that all necessary backend pods (at least TGI & TEI) have reached Ready state?
@lianhao Has this not been fixed in CI yet: #454 (comment)?
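The ordering asked about above (verify that every backend is Ready, then send the query) could look roughly like the sketch below. It is not the project's CI code: `wait_until_ready`, `run_test`, and the injected `is_ready` / `query` callables are hypothetical names, and how readiness is actually determined (for example by asking the cluster for pod status) is left to the caller.

```python
import time
from typing import Callable, Iterable


def wait_until_ready(
    services: Iterable[str],
    is_ready: Callable[[str], bool],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until every service reports ready, or raise TimeoutError."""
    pending = set(services)
    deadline = clock() + timeout_s
    while True:
        # Re-check only the services that were not ready on the previous pass.
        pending = {name for name in pending if not is_ready(name)}
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"not ready: {sorted(pending)}")
        sleep(poll_s)


def run_test(services, is_ready, query, **wait_kwargs):
    """Run the query only after all listed backends are ready."""
    wait_until_ready(services, is_ready, **wait_kwargs)
    return query()
```

Passing `sleep` and `clock` as parameters is a design choice that lets the gate be exercised without real waiting; a real test script would keep the defaults.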
Description
Accelerate also teirerank with Gaudi, not just tei.
When reranking is used, it does not make sense (performance-wise) to accelerate just tei, as reranking is a larger bottleneck.
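The bottleneck argument can be made concrete with a back-of-the-envelope calculation. The per-request times and the speedup factor below are made up for illustration and are not measurements from this PR. The model is also deliberately simple: stages run one after another, and other pipeline stages (such as the LLM) are ignored.

```python
def pipeline_time(stage_times: dict, speedups: dict) -> float:
    """Total time per request when stages run sequentially."""
    return sum(t / speedups.get(name, 1.0) for name, t in stage_times.items())


# Made-up per-request times in ms, NOT measurements from this PR.
stages = {"tei": 10.0, "teirerank": 40.0}
accel = 4.0  # hypothetical speedup from running a stage on an accelerator

baseline = pipeline_time(stages, {})
only_tei = pipeline_time(stages, {"tei": accel})
both = pipeline_time(stages, {"tei": accel, "teirerank": accel})
print(baseline, only_tei, both)  # 50.0 42.5 12.5
```

Under these assumed numbers, accelerating only the smaller stage saves 7.5 ms out of 50 ms, while accelerating both stages saves 37.5 ms.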
Issues
n/a.
Type of change
Dependencies
n/a.
Tests
Manually checked the ChatQnA throughput.