add openvino, trt #460

Merged: 4 commits, Nov 12, 2024
michaelfeil (Owner) commented Nov 12, 2024

openvino is not working yet; it needs #454.

openvino:

  • tried openvino==2024.4 and the nightly build, but both keep breaking
  • installing openvino via poetry install does not work. It has to be reinstalled with pip afterwards: uninstalling openvino and installing the same version again builds the wheel, extensions, etc. correctly
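
The pip reinstall workaround described above could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the helper names `reinstall_commands` and `reinstall` are hypothetical, and the pin `2024.4.0` is an assumption taken from the version mentioned in this thread.

```python
import subprocess
import sys


def reinstall_commands(package: str, version: str) -> list[list[str]]:
    """Build the two pip commands: uninstall, then install the same version."""
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", package],
        pip + ["install", f"{package}=={version}"],
    ]


def reinstall(package: str, version: str) -> None:
    """Run the commands in order and stop at the first failure."""
    for cmd in reinstall_commands(package, version):
        subprocess.run(cmd, check=True)


# Print the commands instead of running them, so the sketch has no side effects.
# The version pin is an assumption; use whatever version poetry resolved.
for cmd in reinstall_commands("openvino", "2024.4.0"):
    print(" ".join(cmd))
```

Calling `reinstall("openvino", "2024.4.0")` after `poetry install` would run both steps in the interpreter's own environment.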

tensorrt:

  • using cuda12
  • cuda12 has tensorrt==10.2 as the only compatible version.
  • tensorrt==10.2 crashes on linux, as it is linked to a windows dll
  • tensorrt==10.3 works, but the optimum implementation has issues
  • onnxruntime==1.20 leaves tensorrt==10.5, but has no compatible openvino distribution.
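
When chasing version conflicts like the ones listed above, it can help to print what is actually installed. The following is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names are assumptions (for example, ONNX Runtime may be installed as `onnxruntime-gpu` rather than `onnxruntime`).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names below are assumptions; adjust them to the environment.
for name, ver in installed_versions(
    ["onnxruntime", "tensorrt", "openvino", "optimum"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```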

greptile-apps (bot, Contributor) left a comment

PR Summary

This PR adds OpenVINO and TensorRT optimizations to improve model inference performance across different hardware platforms.

  • Added OpenVINO support in /libs/infinity_emb/Docker.template.yaml with INFINITY_ENGINE="optimum" for CPU builds
  • Updated TensorRT support with CUDA 12.3.2 and TensorRT 10.3.0 in /libs/infinity_emb/Docker.template.yaml
  • Added provider-specific optimizations in /libs/infinity_emb/infinity_emb/transformer/utils_optimum.py for OpenVINO and TensorRT
  • Added quantized model support for OpenVINO in /libs/infinity_emb/infinity_emb/transformer/embedder/optimum.py
  • Updated dependencies in pyproject.toml to use OpenVINO 2024.4.0 and TensorRT 10.6.0
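
To illustrate what provider-specific selection can look like, here is a minimal sketch. It is not the logic in utils_optimum.py: the function `pick_provider` and the preference order are assumptions for illustration. The provider names and `onnxruntime.get_available_providers()` are the ones ONNX Runtime exposes.

```python
def pick_provider(available, preferred):
    """Return the first preferred provider that is available, else the CPU one."""
    for name in preferred:
        if name in available:
            return name
    return "CPUExecutionProvider"


# Preference order is an assumption for this sketch, not taken from the PR.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "OpenVINOExecutionProvider",
]

try:
    import onnxruntime as ort

    available = ort.get_available_providers()
except ImportError:
    # Without onnxruntime installed, fall back to a CPU-only list.
    available = ["CPUExecutionProvider"]

print(pick_provider(available, PREFERRED))
```

Keeping the choice in a pure function makes it testable without any accelerator installed.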

11 file(s) reviewed, 7 comment(s)

codecov-commenter commented Nov 12, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 12 lines in your changes missing coverage. Please review.

Project coverage is 73.02%. Comparing base (cdbe888) to head (bb34e3e).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| ...nity_emb/infinity_emb/transformer/utils_optimum.py | 33.33% | 12 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (cdbe888) and HEAD (bb34e3e): HEAD has 1 upload less than BASE.

| Flag | BASE (cdbe888) | HEAD (bb34e3e) |
|------|----------------|----------------|
|      | 2              | 1              |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #460      +/-   ##
==========================================
- Coverage   79.23%   73.02%   -6.21%     
==========================================
  Files          42       42              
  Lines        3380     3392      +12     
==========================================
- Hits         2678     2477     -201     
- Misses        702      915     +213     
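
The percentages in the diff above follow from hits divided by total lines. A quick check of the arithmetic, using only the numbers in the table (the helper name `coverage_percent` is hypothetical):

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, rounded to two decimals."""
    return round(100 * hits / lines, 2)


base = coverage_percent(2678, 3380)  # main: hits / lines
head = coverage_percent(2477, 3392)  # PR #460: hits / lines

print(base, head, round(head - base, 2))
```

In both columns hits plus misses equals the line count (2678 + 702 = 3380 and 2477 + 915 = 3392), so the table is internally consistent.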


@michaelfeil michaelfeil merged commit 9206840 into main Nov 12, 2024
36 checks passed
@michaelfeil michaelfeil deleted the optimum-upgrade branch November 12, 2024 20:16