add openvino, trt #460
PR Summary
This PR adds OpenVINO and TensorRT optimizations to improve model inference performance across different hardware platforms.
- Added OpenVINO support in `/libs/infinity_emb/Docker.template.yaml` with `INFINITY_ENGINE="optimum"` for CPU builds
- Updated TensorRT support with CUDA 12.3.2 and TensorRT 10.3.0 in `/libs/infinity_emb/Docker.template.yaml`
- Added provider-specific optimizations in `/libs/infinity_emb/infinity_emb/transformer/utils_optimum.py` for OpenVINO and TensorRT
- Added quantized model support for OpenVINO in `/libs/infinity_emb/infinity_emb/transformer/embedder/optimum.py`
- Updated dependencies in `pyproject.toml` to use OpenVINO 2024.4.0 and TensorRT 10.6.0
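
To illustrate what "provider-specific optimizations" can look like, here is a minimal sketch of selecting ONNX Runtime execution providers and attaching per-provider options. It is not the code from `utils_optimum.py`: the helper names `provider_options` and `select_providers`, the preference order, and the chosen option values are all assumptions made for illustration. The option keys (`trt_fp16_enable`, `trt_engine_cache_enable`, `device_type`) are ones documented for the ONNX Runtime TensorRT and OpenVINO execution providers, but the accepted values can vary between releases.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical preference order (an assumption, not taken from the PR):
# try the most specialized accelerator first and fall back to plain CPU.
PREFERRED_PROVIDERS = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
]


def provider_options(provider: str) -> Dict[str, Any]:
    """Return illustrative options for one execution provider."""
    if provider == "TensorrtExecutionProvider":
        # FP16 kernels plus an engine cache to avoid rebuilding engines.
        return {"trt_fp16_enable": True, "trt_engine_cache_enable": True}
    if provider == "OpenVINOExecutionProvider":
        # Target the CPU device; the value format may differ by version.
        return {"device_type": "CPU"}
    return {}


def select_providers(available: List[str]) -> List[Tuple[str, Dict[str, Any]]]:
    """Order the available providers by preference, with their options."""
    chosen = [
        (name, provider_options(name))
        for name in PREFERRED_PROVIDERS
        if name in available
    ]
    if not chosen:
        raise ValueError(f"no supported execution provider in {available}")
    return chosen
```

In practice, the `available` list would likely come from `onnxruntime.get_available_providers()`, and the result could be passed as the `providers` argument of `onnxruntime.InferenceSession`, which accepts `(name, options)` tuples.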
11 file(s) reviewed, 7 comment(s)
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #460      +/-   ##
==========================================
- Coverage   79.23%   73.02%   -6.21%
==========================================
  Files          42       42
  Lines        3380     3392      +12
==========================================
- Hits         2678     2477     -201
- Misses        702      915     +213
OpenVINO is not working; need to use #454.
openvino:
tensorrt: