Commit a121ab4

ElefHead authored and jimpang committed

multi-lora documentation fix (vllm-project#3064)

1 parent 7bd5b89 commit a121ab4

File tree

1 file changed (+13, -1 lines)

docs/source/models/lora.rst

@@ -58,7 +58,7 @@ LoRA adapted models can also be served with the Open-AI compatible vLLM server.

 .. code-block:: bash

-    python -m vllm.entrypoints.api_server \
+    python -m vllm.entrypoints.openai.api_server \
         --model meta-llama/Llama-2-7b-hf \
         --enable-lora \
         --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
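The hunk above swaps the server entrypoint from ``vllm.entrypoints.api_server`` to the OpenAI-compatible ``vllm.entrypoints.openai.api_server``. A quick way to check whether a dotted entrypoint path resolves in the current environment is sketched below (the ``entrypoint_exists`` helper is our own illustration, not part of vLLM; resolving the vLLM paths naturally requires vLLM to be installed):

```python
import importlib.util


def entrypoint_exists(module_path: str) -> bool:
    """Return True if the dotted module path can be imported as-is."""
    try:
        # find_spec locates the module without executing it.
        return importlib.util.find_spec(module_path) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in the dotted path is missing.
        return False


# Example: with vLLM installed, the corrected path should resolve
# and the old one should not (in recent versions).
for path in ("vllm.entrypoints.api_server",
             "vllm.entrypoints.openai.api_server"):
    print(path, entrypoint_exists(path))
```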
@@ -89,3 +89,15 @@ with its base model:

 Requests can specify the LoRA adapter as if it were any other model via the ``model`` request parameter. The requests will be
 processed according to the server-wide LoRA configuration (i.e. in parallel with base model requests, and potentially other
 LoRA adapter requests if they were provided and ``max_loras`` is set high enough).
+
+The following is an example request:
+
+.. code-block:: bash
+
+    curl http://localhost:8000/v1/completions \
+        -H "Content-Type: application/json" \
+        -d '{
+            "model": "sql-lora",
+            "prompt": "San Francisco is a",
+            "max_tokens": 7,
+            "temperature": 0
+        }' | jq
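The curl request added above can also be issued from Python with only the standard library. The sketch below builds the same payload and POSTs it to the completions endpoint; it assumes the server started earlier in the doc is listening on ``localhost:8000`` (the ``complete`` function name is ours, for illustration):

```python
import json
from urllib import request

# Same body as the curl example; "sql-lora" is the adapter name
# registered via --lora-modules when the server was launched.
payload = {
    "model": "sql-lora",
    "prompt": "San Francisco is a",
    "max_tokens": 7,
    "temperature": 0,
}


def complete(url: str = "http://localhost:8000/v1/completions") -> dict:
    """POST the completion request and return the decoded JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, ``complete()["choices"][0]["text"]`` holds the generated continuation, served through the LoRA adapter rather than the bare base model.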

0 commit comments
