
[Feature] Support LoRA path renaming and add LoRA serving benchmarks #1433

Merged: 2 commits into main from lora_bench, Sep 15, 2024

Conversation

@Ying1123 (Member) commented Sep 15, 2024

This PR supports LoRA path renaming. See example below:

# launch server
python -m sglang.launch_server --model mistralai/Mistral-7B-Instruct-v0.3 --lora-paths /home/ying/test_lora lora1=/home/ying/test_lora_1 lora2=/home/ying/test_lora_2 --disable-radix --disable-cuda-graph --max-loras-per-batch 4

# send requests
# lora_path[i] specifies the LoRA adapter used for text[i], so the two lists must have the same length
# use None for a base-model-only prompt, e.g. "lora_path": [None, "/home/ying/test_lora"]
import json
import requests

url = "http://127.0.0.1:30000"
json_data = {
    "text": ["prompt 1", "prompt 2", "prompt 3", "prompt 4", "prompt 5", "prompt 6", "prompt 7"],
    "sampling_params": {"max_new_tokens": 32},
    "lora_path": ["/home/ying/test_lora", "lora1", "lora2", "lora1", "lora2", None, None],
}
response = requests.post(
    url + "/generate",
    json=json_data,
)
print(json.dumps(response.json()))
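For a batched request like the one above, /generate returns one result per input prompt. A minimal sketch of reading the results back, assuming each result object carries its generation under a "text" key (the field name is an assumption, not confirmed by this PR):

# Pair each prompt with its adapter and generated text.
# Assumption: the batched response is a list of dicts with a "text" field.
results = response.json()
for prompt, lora, result in zip(json_data["text"], json_data["lora_path"], results):
    adapter = lora if lora is not None else "base model"
    print(f"[{adapter}] {prompt!r} -> {result['text']!r}")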

What has been done:

  • Initial LoRA support ([Feature] Initial support for multi-LoRA serving #1307)
    That PR added initial multi-LoRA serving support. Currently, it supports LoRA on the attention (qkvo) and MLP (gate, up, down) linear layers. It supports dynamic loading and offloading, but not unified memory. The memory pool for LoRA adapters is pre-allocated, so please use a smaller --mem-frac to launch the server with a larger --max-loras-per-batch (see the sketch after this list).
  • Misc: path renaming (this PR)
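As a hypothetical illustration of the --mem-frac guidance above (the value 0.7 and the adapter paths are placeholders; tune them for your GPU and adapter sizes):

# Hypothetical launch: lower --mem-frac to leave room for a larger pre-allocated LoRA pool.
python -m sglang.launch_server --model mistralai/Mistral-7B-Instruct-v0.3 \
    --lora-paths lora1=/home/ying/test_lora_1 lora2=/home/ying/test_lora_2 \
    --max-loras-per-batch 8 --mem-frac 0.7 \
    --disable-radix --disable-cuda-graph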

You can expect the items below in follow-up PRs.

  • OpenAI-compatible API
  • Compatibility with CUDA graph
  • Compatibility with radix attention
  • Fully sharded LoRAs with tensor parallelism
  • Performance optimization
  • Memory optimization
  • Support for LoRAs with different ranks
  • Triton backend
  • Enhanced test cases

References:
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Punica: Multi-Tenant LoRA Serving

@Ying1123 changed the title Add LoRA serving benchmark → Support LoRA path renaming and add LoRA serving benchmarks on Sep 15, 2024
@Ying1123 changed the title Support LoRA path renaming and add LoRA serving benchmarks → [Feature] Support LoRA path renaming and add LoRA serving benchmarks on Sep 15, 2024
@Ying1123 merged commit 3796339 into main on Sep 15, 2024
11 checks passed
@Ying1123 deleted the lora_bench branch on September 15, 2024 at 19:46