# Benchmarking with TGI

## Prerequisites

Enable Docker as a non-root user (mandatory for Lambda Labs instances).

```shell
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
```
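To confirm the group change took effect, you can run a throwaway container without `sudo` (a quick check, assuming Docker itself is already installed):

```shell
# This should succeed without sudo once the docker group is active.
docker run --rm hello-world
```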

## Installation

...

## Usage

### Launch TGI server

```shell
hf_token=...
model=TinyLlama/TinyLlama-1.1B-Chat-v1.0
```

#### With GPU support

```shell
docker run \
    --rm \
    --name tgi \
    --gpus all \
    --shm-size 64g \
    -e HF_TOKEN=$hf_token \
    -p 8080:80 \
    -v ./server/models:/data \
    ghcr.io/huggingface/text-generation-inference:2.1.1 \
    --model-id $model
```

#### With CPU-only support

```shell
docker run \
    --rm \
    --name tgi \
    --shm-size 64g \
    -e HF_TOKEN=$hf_token \
    -p 8080:80 \
    -v ./server/models:/data \
    ghcr.io/huggingface/text-generation-inference:2.1.1 \
    --model-id $model
```

### Launch client

...
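Before running the benchmark client, you can sanity-check the server with a one-off request to TGI's `/generate` endpoint (a sketch assuming the default port mapping of 8080 used above):

```shell
# Send a single generation request to the locally running TGI server.
curl http://localhost:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}}'
```

A JSON response containing a `generated_text` field indicates the server is ready to accept benchmark traffic.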