Openvino 2024.2.0 #35

Closed
wants to merge 37 commits into from

ilya-lavrenov
Owner

No description provided.

Installation with OpenVINO
========================

vLLM powered by OpenVINO supports all LLM models from [vLLM supported models list](../dev/models/supported_models.rst) and can perform optimal model serving on all x86-64 CPUs with, at least, AVX2 support. OpenVINO vLLM backend supports the following advanced vLLM features:

Suggested change
vLLM powered by OpenVINO supports all LLM models from [vLLM supported models list](../dev/models/supported_models.rst) and can perform optimal model serving on all x86-64 CPUs with, at least, AVX2 support. OpenVINO vLLM backend supports the following advanced vLLM features:
vLLM powered by OpenVINO supports all LLM models from
:doc:`vLLM supported models list <../models/supported_models>` and can perform optimal model serving on all x86-64 CPUs with at least AVX2 support. OpenVINO vLLM backend supports the following advanced vLLM features:

Comment on lines +11 to +17
Table of contents:

#. :ref:`Requirements <openvino_backend_requirements>`
#. :ref:`Quick start using Dockerfile <openvino_backend_quick_start_dockerfile>`
#. :ref:`Build from source <binstall_openvino_backend_from_source>`
#. :ref:`Performance tips <openvino_backend_performance_tips>`
#. :ref:`Limitations <openvino_backend_limitations>`

Suggested change
Table of contents:
#. :ref:`Requirements <openvino_backend_requirements>`
#. :ref:`Quick start using Dockerfile <openvino_backend_quick_start_dockerfile>`
#. :ref:`Build from source <binstall_openvino_backend_from_source>`
#. :ref:`Performance tips <openvino_backend_performance_tips>`
#. :ref:`Limitations <openvino_backend_limitations>`
**Table of contents**:
- :ref:`Requirements <openvino_backend_requirements>`
- :ref:`Quick start using Dockerfile <openvino_backend_quick_start_dockerfile>`
- :ref:`Build from source <install_openvino_backend_from_source>`
- :ref:`Performance tips <openvino_backend_performance_tips>`
- :ref:`Limitations <openvino_backend_limitations>`

$ sudo apt-get update -y
$ sudo apt-get install python3

- Second, install prerequisites vLLM OpenVINO backend installation:

Suggested change
- Second, install prerequisites vLLM OpenVINO backend installation:
- Then, install the prerequisites for vLLM OpenVINO backend installation:

$ pip install --upgrade pip
$ pip install -r requirements-build.txt --extra-index-url https://download.pytorch.org/whl/cpu

- Finally, install vLLM with OpenVINO backend:

Suggested change
- Finally, install vLLM with OpenVINO backend:
- Finally, install vLLM OpenVINO backend:

Performance tips
-----------------

vLLM OpenVINO backend uses the following environment variables to control behavior:

Suggested change
vLLM OpenVINO backend uses the following environment variables to control behavior:
To control behavior in vLLM OpenVINO backend, use the following environment variables:


- ``VLLM_OPENVINO_KVCACHE_SPACE`` specifies the KV cache size in GB (e.g., ``VLLM_OPENVINO_KVCACHE_SPACE=40`` reserves 40 GB for the KV cache). A larger value lets vLLM run more requests in parallel. Set this parameter based on the hardware configuration and the users' memory management pattern.

- ``VLLM_OPENVINO_CPU_KV_CACHE_PRECISION=u8`` to control KV cache precision. By default, FP16 / BF16 is used depending on platform.

Suggested change
- ``VLLM_OPENVINO_CPU_KV_CACHE_PRECISION=u8`` to control KV cache precision. By default, FP16 / BF16 is used depending on platform.
- ``VLLM_OPENVINO_CPU_KV_CACHE_PRECISION=u8`` controls KV cache precision. By default, ``FP16`` / ``BF16`` is used, depending on platform.


- ``VLLM_OPENVINO_CPU_KV_CACHE_PRECISION=u8`` to control KV cache precision. By default, FP16 / BF16 is used depending on platform.

- ``VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON`` to enable U8 weights compression during model loading stage. By default, compression is turned off.

Suggested change
- ``VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON`` to enable U8 weights compression during model loading stage. By default, compression is turned off.
- ``VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON`` enables U8 weight compression during the model loading stage. By default, compression is turned off.


To improve TPOT / TTFT latency, you can use vLLM's chunked prefill feature (``--enable-chunked-prefill``). Based on experiments, the recommended batch size is ``256`` (``--max-num-batched-tokens``).
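
As a rough illustration of how the environment variables and flags described above might be combined, here is a minimal sketch. It is not the best known configuration referred to in the docs. The ``40`` GB value is the example from the text, the model name is a hypothetical placeholder, and the OpenAI-compatible server entry point is an assumption about how vLLM is launched in your setup. The script only assembles and prints the command, so nothing is started until you run it yourself.

```shell
#!/bin/sh
# Sketch only: values are illustrative, not tuned recommendations.

# KV cache size in GB (example value taken from the text above).
export VLLM_OPENVINO_KVCACHE_SPACE=40
# Store the KV cache in u8 instead of the default FP16 / BF16.
export VLLM_OPENVINO_CPU_KV_CACHE_PRECISION=u8
# Compress weights to U8 while the model is loading (off by default).
export VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON

# Hypothetical placeholder; replace with a model you actually use.
MODEL="your-org/your-model"

# Assumed entry point: vLLM's OpenAI-compatible API server.
# Chunked prefill is enabled with the batch size recommended above.
CMD="python3 -m vllm.entrypoints.openai.api_server --model $MODEL --enable-chunked-prefill --max-num-batched-tokens 256"

# Print the command for inspection instead of launching it.
echo "$CMD"
```

To launch for real, run the printed command in the same shell so that the exported variables are inherited by the vLLM process.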

OpenVINO best known configuration is:

Suggested change
OpenVINO best known configuration is:
Best known configuration in OpenVINO is:

Comment on lines +39 to +40
Install from source
-----------------

Suggested change
Install from source
-----------------
Install from source
-------------------

Comment on lines +3 to +4
Installation with OpenVINO
========================

Suggested change
Installation with OpenVINO
========================
Installation with OpenVINO
==========================

Comment on lines +44 to +47
.. code-block:: console

$ sudo apt-get update -y
$ sudo apt-get install python3

Suggested change
.. code-block:: console
$ sudo apt-get update -y
$ sudo apt-get install python3
.. code-block:: console
$ sudo apt-get update -y
$ sudo apt-get install python3

Comment on lines +51 to +54
.. code-block:: console

$ pip install --upgrade pip
$ pip install -r requirements-build.txt --extra-index-url https://download.pytorch.org/whl/cpu

Suggested change
.. code-block:: console
$ pip install --upgrade pip
$ pip install -r requirements-build.txt --extra-index-url https://download.pytorch.org/whl/cpu
.. code-block:: console
$ pip install --upgrade pip
$ pip install -r requirements-build.txt --extra-index-url https://download.pytorch.org/whl/cpu

Comment on lines +58 to +60
.. code-block:: console

$ PIP_PRE=1 PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu https://storage.openvinotoolkit.org/simple/wheels/nightly/" VLLM_TARGET_DEVICE=openvino python install -v .

Suggested change
.. code-block:: console
$ PIP_PRE=1 PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu https://storage.openvinotoolkit.org/simple/wheels/nightly/" VLLM_TARGET_DEVICE=openvino python install -v .
.. code-block:: console
$ PIP_PRE=1 PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu https://storage.openvinotoolkit.org/simple/wheels/nightly/" VLLM_TARGET_DEVICE=openvino python install -v .

@ilya-lavrenov deleted the openvino-2024.2.0 branch on July 31, 2024 15:35
4 participants