Openvino 2024.2.0 #35
[CPU] Fix u8 kvcache for PagedAttention
Installation with OpenVINO
========================

vLLM powered by OpenVINO supports all LLM models from [vLLM supported models list](../dev/models/supported_models.rst) and can perform optimal model serving on all x86-64 CPUs with, at least, AVX2 support. OpenVINO vLLM backend supports the following advanced vLLM features:
vLLM powered by OpenVINO supports all LLM models from [vLLM supported models list](../dev/models/supported_models.rst) and can perform optimal model serving on all x86-64 CPUs with, at least, AVX2 support. OpenVINO vLLM backend supports the following advanced vLLM features:
vLLM powered by OpenVINO supports all LLM models from
:doc:`vLLM supported models list <../models/supported_models>` and can perform optimal model serving on all x86-64 CPUs with at least AVX2 support. OpenVINO vLLM backend supports the following advanced vLLM features:
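The AVX2 requirement mentioned in this paragraph can be checked before installing. The snippet below is a minimal sketch, assuming a Linux host where `/proc/cpuinfo` lists CPU feature flags; the helper names `cpu_flags` and `has_avx2` are hypothetical and are not part of vLLM or OpenVINO.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" entries carry feature names such as avx2.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if the 'avx2' flag appears in the given cpuinfo text."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux-only; other platforms need another probe
    if path.exists():
        print("AVX2 supported:", has_avx2(path.read_text()))
    else:
        print("No /proc/cpuinfo on this platform; cannot check this way.")
```

Parsing is kept separate from file access so the check can be exercised on captured `cpuinfo` text from another machine.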
Table of contents:

#. :ref:`Requirements <openvino_backend_requirements>`
#. :ref:`Quick start using Dockerfile <openvino_backend_quick_start_dockerfile>`
#. :ref:`Build from source <binstall_openvino_backend_from_source>`
#. :ref:`Performance tips <openvino_backend_performance_tips>`
#. :ref:`Limitations <openvino_backend_limitations>`
Table of contents:
#. :ref:`Requirements <openvino_backend_requirements>`
#. :ref:`Quick start using Dockerfile <openvino_backend_quick_start_dockerfile>`
#. :ref:`Build from source <binstall_openvino_backend_from_source>`
#. :ref:`Performance tips <openvino_backend_performance_tips>`
#. :ref:`Limitations <openvino_backend_limitations>`
**Table of contents**:
- :ref:`Requirements <openvino_backend_requirements>`
- :ref:`Quick start using Dockerfile <openvino_backend_quick_start_dockerfile>`
- :ref:`Build from source <install_openvino_backend_from_source>`
- :ref:`Performance tips <openvino_backend_performance_tips>`
- :ref:`Limitations <openvino_backend_limitations>`
$ sudo apt-get update -y
$ sudo apt-get install python3

- Second, install prerequisites vLLM OpenVINO backend installation:
- Second, install prerequisites vLLM OpenVINO backend installation:
- Then, install the prerequisites for vLLM OpenVINO backend installation:
$ pip install --upgrade pip
$ pip install -r requirements-build.txt --extra-index-url https://download.pytorch.org/whl/cpu

- Finally, install vLLM with OpenVINO backend:
- Finally, install vLLM with OpenVINO backend:
- Finally, install vLLM OpenVINO backend:
Performance tips
-----------------

vLLM OpenVINO backend uses the following environment variables to control behavior:
vLLM OpenVINO backend uses the following environment variables to control behavior:
To control behavior in vLLM OpenVINO backend, use the following environment variables:

- ``VLLM_OPENVINO_KVCACHE_SPACE`` to specify the KV cache size (e.g., ``VLLM_OPENVINO_KVCACHE_SPACE=40`` means 40 GB of space for the KV cache). A larger setting allows vLLM to run more requests in parallel. Set this parameter based on the hardware configuration and the users' memory management pattern.

- ``VLLM_OPENVINO_CPU_KV_CACHE_PRECISION=u8`` to control KV cache precision. By default, FP16 / BF16 is used depending on platform.
- ``VLLM_OPENVINO_CPU_KV_CACHE_PRECISION=u8`` to control KV cache precision. By default, FP16 / BF16 is used depending on platform.
- ``VLLM_OPENVINO_CPU_KV_CACHE_PRECISION=u8`` controls KV cache precision. By default, ``FP16`` / ``BF16`` is used, depending on platform.

- ``VLLM_OPENVINO_CPU_KV_CACHE_PRECISION=u8`` to control KV cache precision. By default, FP16 / BF16 is used depending on platform.

- ``VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON`` to enable U8 weights compression during model loading stage. By default, compression is turned off.
- ``VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON`` to enable U8 weights compression during model loading stage. By default, compression is turned off.
- ``VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON`` enables U8 weights compression during a model loading stage. By default, the compression is turned off.
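To make the effect of these variables more concrete, here is a rough sketch of how KV cache space translates into cacheable tokens, and how the three variables could be collected into an environment for a child process. Several things are assumptions rather than statements from the docs: that "GB" means 2**30 bytes, that per-token cost is `2 * layers * kv_heads * head_dim * bytes_per_element`, that u8 adds no per-block scale overhead, and the example model shape (32 layers, 32 KV heads, head size 128). The helper names are hypothetical.

```python
import os

GIB = 1 << 30  # assumption: "GB" in VLLM_OPENVINO_KVCACHE_SPACE is 2**30 bytes


def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem):
    """Bytes needed to cache keys and values for one token across all layers."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem  # 2 = key + value


def max_cached_tokens(space_gb, per_token_bytes):
    """Upper bound on tokens that fit into the configured KV cache space."""
    return (space_gb * GIB) // per_token_bytes


def openvino_env(space_gb, kv_precision=None, quantize_weights=False):
    """Copy the current environment and add the variables named in the docs."""
    env = dict(os.environ)
    env["VLLM_OPENVINO_KVCACHE_SPACE"] = str(space_gb)
    if kv_precision is not None:
        env["VLLM_OPENVINO_CPU_KV_CACHE_PRECISION"] = kv_precision
    if quantize_weights:
        env["VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS"] = "ON"
    return env


# Hypothetical model shape, used only to illustrate the arithmetic.
fp16_cost = kv_bytes_per_token(32, 32, 128, bytes_per_elem=2)
u8_cost = kv_bytes_per_token(32, 32, 128, bytes_per_elem=1)
print("tokens at 16-bit:", max_cached_tokens(40, fp16_cost))
print("tokens at u8:    ", max_cached_tokens(40, u8_cost))
```

Under these assumptions, halving the element size doubles the number of tokens that fit in the same space, which is the practical reason to consider `u8` KV cache precision. The dictionary returned by `openvino_env` can be passed as the `env` argument of `subprocess.run`.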

To enable better TPOT / TTFT latency, you can use vLLM's chunked prefill feature (``--enable-chunked-prefill``). Based on the experiments, the recommended batch size is ``256`` (``--max-num-batched-tokens``).
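As an illustration of the two flags named above, the sketch below assembles a server command line without running it. The flags ``--enable-chunked-prefill`` and ``--max-num-batched-tokens`` come from the text; the choice of the `vllm.entrypoints.openai.api_server` entry point and the model identifier are assumptions made for the example, and `serve_command` is a hypothetical helper. This is not the best known configuration referred to elsewhere in this document.

```python
import shlex


def serve_command(model, max_num_batched_tokens=256):
    """Build an argument list that enables chunked prefill with a token budget."""
    return [
        "python3", "-m", "vllm.entrypoints.openai.api_server",  # assumed entry point
        "--model", model,
        "--enable-chunked-prefill",
        "--max-num-batched-tokens", str(max_num_batched_tokens),
    ]


# "my-org/my-model" is a placeholder, not a real model identifier.
cmd = serve_command("my-org/my-model")
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting issues if it is later handed to `subprocess.run`.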

OpenVINO best known configuration is:
OpenVINO best known configuration is:
Best known configuration in OpenVINO is:
Install from source
-----------------
Install from source
-----------------
Install from source
-------------------
Installation with OpenVINO
========================
Installation with OpenVINO
========================
Installation with OpenVINO
==========================
.. code-block:: console

$ sudo apt-get update -y
$ sudo apt-get install python3
.. code-block:: console
$ sudo apt-get update -y
$ sudo apt-get install python3
.. code-block:: console
$ sudo apt-get update -y
$ sudo apt-get install python3
.. code-block:: console

$ pip install --upgrade pip
$ pip install -r requirements-build.txt --extra-index-url https://download.pytorch.org/whl/cpu
.. code-block:: console
$ pip install --upgrade pip
$ pip install -r requirements-build.txt --extra-index-url https://download.pytorch.org/whl/cpu
.. code-block:: console
$ pip install --upgrade pip
$ pip install -r requirements-build.txt --extra-index-url https://download.pytorch.org/whl/cpu
.. code-block:: console

$ PIP_PRE=1 PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu https://storage.openvinotoolkit.org/simple/wheels/nightly/" VLLM_TARGET_DEVICE=openvino python -m pip install -v .
.. code-block:: console
$ PIP_PRE=1 PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu https://storage.openvinotoolkit.org/simple/wheels/nightly/" VLLM_TARGET_DEVICE=openvino python -m pip install -v .
.. code-block:: console
$ PIP_PRE=1 PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu https://storage.openvinotoolkit.org/simple/wheels/nightly/" VLLM_TARGET_DEVICE=openvino python -m pip install -v .