
# Deploying and Serving LLMs with BigDL-LLM

BigDL-LLM is a library for running LLMs (large language models) on Intel XPUs (from laptop to GPU to cloud) using INT4 quantization with very low latency (for any PyTorch model).

The integration with BigDL-LLM currently supports running on Intel CPUs only.

## Setup

Please follow setup.md to set up the environment first. Additionally, you will need to install the BigDL dependencies as shown below.

```bash
pip install .[bigdl-cpu] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```
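
As a quick sanity check of the installation, you can load a model through BigDL-LLM's Transformers-style API, which applies INT4 quantization at load time. This is a minimal sketch: the model ID is only an example, and any Hugging Face causal LM you have access to will do.

```python
# Minimal sketch to verify the bigdl-llm installation.
# The model ID is an example; substitute any Hugging Face causal LM.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # example model

# load_in_4bit=True converts the weights to INT4 as they are loaded.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is BigDL-LLM?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```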

## Configure Serving Parameters

Please follow the serving document for configuring the parameters. In the configuration file, you need to set `bigdl` and `load_in_4bit` to true. Example configuration files for enabling bigdl-llm are available [here](../inference/models/bigdl).

```yaml
  bigdl: true
  config:
    load_in_4bit: true
```

## Deploy and Test

Please follow the serving document for deploying and testing.
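
Once the endpoint is up, a simple way to test it is to POST a prompt to the served route. The sketch below assumes a local deployment on port 8000 and a route named after the model; both the route and the request schema are placeholders that depend on your serving configuration, so check the serving document for the actual interface.

```python
# Hypothetical smoke test for a locally served model.
# The URL, route name, and payload schema are placeholders; use the
# values defined by your serving configuration.
import requests

response = requests.post(
    "http://127.0.0.1:8000/llama-2-7b-chat-hf",  # placeholder route
    json={"text": "What is the capital of France?"},  # placeholder schema
)
print(response.status_code)
print(response.text)
```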