[LLM Runtime] refactor itrex backend based on the latest Jblas #769
Glad to see the great refactor!
(It seems someone needs to fix the cpp graph.)
Next-token, beam_number=1, Xeon8480+, 48 threads (pr branch):
- CompType=int8: 1.08x
- comp_type=bf16: ~1.3x
Fused-Attention part (
Long prompt, len=2023, Xeon8480+, 48 threads (pr branch):
- CompType=int8: 1.2x
- CompType=bf16: 1.14x
- CompType=fp32: 0.98x
intel_extension_for_transformers/llm/runtime/graph/application/main_pybind.cpp
c65666f to 31764ca
intel_extension_for_transformers/llm/runtime/graph/core/layers/mha_dense.cpp
31764ca to ecea677
intel_extension_for_transformers/llm/runtime/graph/models/model_utils/model_utils.cpp
intel_extension_for_transformers/llm/runtime/graph/core/layers/jblas_gemm.h
Some models fail without producing any output and just hang for days (not due to disk). Please check, @zhewang1-intc @luoyu-intel.
d4d52ce to 221b82b
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: Meng, Hengyu <airdldl@163.com>
bc9c455 to 192f979
The Neuralchat UT failure is unrelated; please check, @lvliang-intel.
Type of Change
feature or bug fix or documentation or others
API changed or not
Description
detail description
JIRA ticket: xxx
Expected Behavior & Potential Risk
the expected behavior triggered by this PR
How has this PR been tested?
how to reproduce the test (including hardware information)
Dependency Change?
any library dependency introduced or removed