This repository was archived by the owner on Oct 11, 2024. It is now read-only.

Add NM benchmarking scripts & utils #14

Merged 75 commits on Feb 22, 2024
Commits (75)
1802429
Create nm-benchmarks - wip
Feb 13, 2024
a62a1c1
move nm benchmark scripts to neural magic folder
Feb 14, 2024
76c0064
move scripts to scripts folder
Feb 14, 2024
77f7d03
add test config
Feb 14, 2024
e43ad27
add call_cmd from wand
Feb 14, 2024
c8cac08
import time.sh from wand
Feb 15, 2024
2cfc013
add time.sh from wand
Feb 15, 2024
171421d
Add benchmark runner scripts
Feb 15, 2024
88e80bc
ruff formatting
Feb 15, 2024
8cb4473
rename test_config -> benchmark_serving
Feb 15, 2024
a213440
add empty throughput config json
Feb 15, 2024
a745195
Add separate runners for benchmark throughput and serving
Feb 15, 2024
75d933e
add common for common bench functions
Feb 15, 2024
d0884b4
Add benchmark throughput script and runner
Feb 15, 2024
082a3bd
Add empty common.py
Feb 15, 2024
57c8e7b
log benchmark environment
Feb 15, 2024
97e5a69
add 1/2 hour timeout
Feb 16, 2024
f41d0ba
add cuda device properties to bench env
Feb 16, 2024
3f6cf0d
wip
Feb 16, 2024
b49a7ac
some cleanups
Feb 16, 2024
f84384b
yapf
Feb 16, 2024
1cc82ad
add models of interest
Feb 16, 2024
603d456
update benchmark serving json
Feb 16, 2024
88c9e9f
Add benchmark throughput
Feb 18, 2024
44d3f72
Add synthetic dataset serving bench
Feb 18, 2024
acc1d04
add generate synthetic dataset (use sharegpt dataset)
Feb 18, 2024
20dd9f0
Move sample_requests and generate_synthetic_dataset to commons
Feb 19, 2024
9e0223b
Add prefill decode benchmarking script
Feb 19, 2024
51419b3
add prefill/decode benchmark json
Feb 19, 2024
f1bf28d
add empty benchmark runner
Feb 19, 2024
033e7ce
add prefill_decode_throughput runner script
Feb 19, 2024
7327be1
fix output dump
Feb 19, 2024
cf18c08
add prefill/decode benchmark to run_benchmarks
Feb 19, 2024
556e037
fix yapf
Feb 19, 2024
68c39a9
fix vscode warning
Feb 19, 2024
b6b6938
add dummy benchmark sparse serving
Feb 19, 2024
b79f0c0
add sparsity to benchmark serving
Feb 19, 2024
974eb15
Add sparsity to throughput scripts
Feb 19, 2024
8abba95
add prefill and decode case to throughput
Feb 19, 2024
c66c900
cleanup
Feb 19, 2024
f733fe9
fix json
Feb 19, 2024
4da6575
remove separate prefill/decode benchmarks
Feb 19, 2024
c810bcd
yapf
Feb 19, 2024
4f24272
fix throughput sparse json
Feb 19, 2024
6aa8fd1
format json files
Feb 19, 2024
cdff9e9
added example of dataset registry
Feb 19, 2024
59d1f14
fixed sharegpt
Feb 19, 2024
596081f
fixed sharegpt
Feb 19, 2024
fd25bcb
fix commons
Feb 20, 2024
bf833e5
cleanup
Feb 20, 2024
bb40116
add download datasets command
Feb 20, 2024
1ff3dc0
fix configs to remove dataset downloads
Feb 20, 2024
fa71ecc
cleanup
Feb 20, 2024
919bb2c
add num-prompts / request rate pair arg
Feb 20, 2024
ca7126d
update benchmark jsons
Feb 20, 2024
3acbebb
cleanup
Feb 20, 2024
0554d69
fix benchmark throughput
Feb 20, 2024
8c583a6
fixes
Feb 20, 2024
a9b732b
download model beforehand
Feb 20, 2024
7518d93
yapf
Feb 20, 2024
b32601a
add server warmup command
Feb 20, 2024
17c258b
add vllm engine warmup
Feb 20, 2024
2491bd3
yapf
Feb 20, 2024
bb7752c
fix serving bench
Feb 21, 2024
a86a0bf
fix benchmark throughput
Feb 21, 2024
fa634dd
yapf
Feb 21, 2024
fb98d2f
update readme
Feb 22, 2024
80c36f7
update note
Feb 22, 2024
c953aab
update time.sh
Feb 22, 2024
8fde41a
remove sparse version of the configs - to add in future
Feb 22, 2024
02cb4ab
fix dataset registry
Feb 22, 2024
75e6c7c
update readme
Feb 22, 2024
2aad328
appease ruff
Feb 22, 2024
1f59d64
fix strip in backend_request_func
Feb 22, 2024
dc64948
yapf
Feb 22, 2024
Empty file added neuralmagic/__init__.py
64 changes: 64 additions & 0 deletions neuralmagic/benchmarks/README.md
@@ -0,0 +1,64 @@
# Directory Structure:

- scripts/*.py - Benchmark scripts that perform the metric computation.

- configs/*.json - Config JSON files. These JSONs define what benchmark script to run and what combination of script parameters to use.

- *.py - Benchmark drivers. Given a config JSON, a driver executes all the commands that the config defines.

# Run Benchmark scripts

All `scripts/benchmark_*.py` files can be executed on their own.

Run `python -m neuralmagic.benchmarks.scripts.<script-name> --help` for a description of each script and how to run it.

# Benchmarking drivers and Configs

All the benchmark driver *.py files take a JSON config file and an output directory path as input.

As mentioned above, the config file defines which benchmark script to run and which arguments to run it with.

The following is an example config JSON:

```
{
    "description": "Benchmark vllm engine throughput - with dataset",
    "models": [
        "facebook/opt-125m",
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    ],
    "sparsity": [],
    "script_name": "benchmark_throughput",
    "script_args": {
        "dataset": [
            "sharegpt",
            "ultrachat"
        ],
        "output-len": [
            128
        ],
        "num-prompts": [
            1000
        ]
    }
}
```
This config tells the benchmark driver to run the `benchmark_throughput` script on each listed model with every combination of the script args. That is, the config essentially translates to:

```
python -m neuralmagic.benchmarks.scripts.benchmark_throughput --model facebook/opt-125m --dataset sharegpt --output-len 128 --num-prompts 1000
python -m neuralmagic.benchmarks.scripts.benchmark_throughput --model facebook/opt-125m --dataset ultrachat --output-len 128 --num-prompts 1000
python -m neuralmagic.benchmarks.scripts.benchmark_throughput --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --dataset sharegpt --output-len 128 --num-prompts 1000
python -m neuralmagic.benchmarks.scripts.benchmark_throughput --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --dataset ultrachat --output-len 128 --num-prompts 1000
```
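
Internally this expansion is just a cross product over the `script_args` lists. The following is a minimal sketch of that idea (illustrative only, not the driver's actual code):

```
import itertools

# script_args taken from the example config above (illustrative values).
script_args = {
    "dataset": ["sharegpt", "ultrachat"],
    "output-len": [128],
    "num-prompts": [1000],
}

# Expand every combination of argument values into one command line.
for values in itertools.product(*script_args.values()):
    flags = []
    for name, value in zip(script_args.keys(), values):
        flags.extend([f"--{name}", str(value)])
    print("python -m neuralmagic.benchmarks.scripts.benchmark_throughput "
          "--model facebook/opt-125m " + " ".join(flags))
```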

# Benchmarking with driver
```
python3 -m neuralmagic.benchmarks.run_benchmarks -i <path-to-config-file> -o <output-directory-path>
```

# About sparsity
The benchmark configs have a `sparsity` field. Populate this field with a valid sparsity identifier to tell vllm about the model's sparsity; see the illustrative example below.
For the list of valid sparsity args, check `vllm/model_executor/layers/sparsity/*`.
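
A sparse config entry might look like the following minimal sketch (the model name and the `sparse_w16a16` identifier are illustrative placeholders; use the identifiers that actually exist under the directory above):

```
{
    "description": "Benchmark vllm engine throughput - sparse model",
    "models": ["neuralmagic/example-sparse-model"],
    "sparsity": ["sparse_w16a16"],
    "script_name": "benchmark_throughput",
    "script_args": {
        "dataset": ["sharegpt"],
        "output-len": [128],
        "num-prompts": [1000]
    }
}
```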
4 changes: 4 additions & 0 deletions neuralmagic/benchmarks/__init__.py
@@ -0,0 +1,4 @@
from neuralmagic.benchmarks.run_benchmark_serving import run_benchmark_serving_script
from neuralmagic.benchmarks.run_benchmark_throughput import run_benchmark_throughput_script

__all__ = ["run_benchmark_serving_script", "run_benchmark_throughput_script"]
62 changes: 62 additions & 0 deletions neuralmagic/benchmarks/common.py
@@ -0,0 +1,62 @@
import itertools
import json

from argparse import Namespace
from pathlib import Path
from typing import NamedTuple, Iterable
# from neuralmagic.tools.call_cmd import call_cmd

from vllm.model_executor.weight_utils import prepare_hf_model_weights
from vllm.transformers_utils.tokenizer import get_tokenizer


def download_model(hf_model_id: str) -> None:
    """
    Downloads a Hugging Face model to the local cache.
    """
    prepare_hf_model_weights(hf_model_id)
    get_tokenizer(hf_model_id)


def script_args_to_cla(config: NamedTuple) -> Iterable[list[str]]:
    # config is a NamedTuple constructed from some JSON in neuralmagic/benchmarks/configs
    kv = vars(config.script_args)

    keys = kv.keys()
    arg_lists = kv.values()
    assert all(map(lambda le: isinstance(le, list), arg_lists))

    # Empty lists are arguments without any values (e.g. boolean args)
    key_args = []
    for k, v in zip(keys, arg_lists):
        if len(v) == 0:
            key_args.append(k)

    key_args_cla = list(map(lambda k: f"--{k}", key_args))

    # Remove empty lists from arg_lists and remove key args from keys.
    # Materialize as lists so they can be iterated once per combination below.
    arg_lists = list(filter(lambda arg_list: len(arg_list) != 0, arg_lists))
    keys = list(filter(lambda k: k not in key_args, keys))

    for args in itertools.product(*arg_lists):
        # Copy the boolean-style args so each combination starts fresh.
        cla = list(key_args_cla)
        for name, value in zip(keys, args):
            cla.extend([f"--{name}", f"{value}"])
        yield cla


def benchmark_configs(config_file_path: Path) -> Iterable[NamedTuple]:
    """
    Given a path to a config file in `neuralmagic/benchmarks/configs/*`, return an
    Iterable of (sub)configs in the file.
    """
    assert config_file_path.exists()

    configs = None
    with open(config_file_path, "r") as f:
        configs = json.load(f, object_hook=lambda d: Namespace(**d))
    assert configs is not None

    for config in configs.configs:
        yield config
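
A minimal sketch of how a driver might compose these helpers (illustrative only; assumes a config file under `neuralmagic/benchmarks/configs/` and prints the commands instead of running them):

```
from pathlib import Path

from neuralmagic.benchmarks.common import (benchmark_configs, download_model,
                                           script_args_to_cla)

config_path = Path("neuralmagic/benchmarks/configs/benchmark_throughput.json")
for config in benchmark_configs(config_path):
    for model in config.models:
        download_model(model)  # cache the weights and tokenizer up front
        for script_args in script_args_to_cla(config):
            print(config.script_name, "--model", model, *script_args)
```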
56 changes: 56 additions & 0 deletions neuralmagic/benchmarks/configs/benchmark_serving.json
@@ -0,0 +1,56 @@
{
    "configs": [
        {
            "description": "Benchmark vllm serving",
            "models": [
                "facebook/opt-125m",
                "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                "mistralai/Mistral-7B-Instruct-v0.2",
                "NousResearch/Llama-2-7b-chat-hf"
            ],
            "sparsity": [],
            "script_name": "benchmark_serving",
            "script_args": {
                "nr-qps-pair_": [
                    "50,0.5",
                    "100,1",
                    "200,2",
                    "500,5"
                ],
                "best-of": [
                    1
                ],
                "dataset": [
                    "sharegpt"
                ]
            }
        },
        {
            "description": "Benchmark vllm serving",
            "models": [
                "facebook/opt-125m",
                "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                "mistralai/Mistral-7B-Instruct-v0.2",
                "NousResearch/Llama-2-7b-chat-hf"
            ],
            "sparsity": [],
            "script_name": "benchmark_serving",
            "script_args": {
                "num-prompts_": [
                    50,
                    100
                ],
                "request-rate_": [
                    0.5,
                    "inf"
                ],
                "best-of": [
                    1
                ],
                "dataset": [
                    "sharegpt"
                ]
            }
        }
    ]
}
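
Note on the trailing-underscore args: per the "add num-prompts / request rate pair arg" commit above, each `nr-qps-pair_` value appears to bundle a prompt count with a request rate as `"<num-prompts>,<request-rate>"`. A hypothetical helper (not part of this PR) that splits such a pair might look like:

```
def split_nr_qps_pair(pair: str) -> tuple[int, float]:
    # Assumed "<num-prompts>,<request-rate>" format, e.g. "50,0.5".
    num_prompts, request_rate = pair.split(",")
    return int(num_prompts), float(request_rate)

assert split_nr_qps_pair("50,0.5") == (50, 0.5)
```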
124 changes: 124 additions & 0 deletions neuralmagic/benchmarks/configs/benchmark_throughput.json
@@ -0,0 +1,124 @@
{
    "configs": [
        {
            "description": "Benchmark vllm engine throughput - with dataset",
            "models": [
                "facebook/opt-125m",
                "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                "mistralai/Mistral-7B-Instruct-v0.2",
                "NousResearch/Llama-2-7b-chat-hf"
            ],
            "script_name": "benchmark_throughput",
            "script_args": {
                "backend": [
                    "vllm"
                ],
                "dataset": [
                    "sharegpt"
                ],
                "output-len": [
                    128
                ],
                "tensor-parallel-size": [
                    1
                ],
                "n": [
                    1
                ],
                "num-prompts": [
                    1000
                ],
                "seed": [
                    0
                ],
                "dtype": [
                    "auto"
                ]
            }
        },
        {
            "description": "Benchmark vllm engine prefill throughput - synthetic",
            "models": [
                "facebook/opt-125m",
                "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                "mistralai/Mistral-7B-Instruct-v0.2",
                "NousResearch/Llama-2-7b-chat-hf"
            ],
            "script_name": "benchmark_throughput",
            "script_args": {
                "backend": [
                    "vllm"
                ],
                "input-len": [
                    1,
                    16,
                    32,
                    64,
                    128,
                    256,
                    512,
                    1024
                ],
                "output-len": [
                    1
                ],
                "tensor-parallel-size": [
                    1
                ],
                "n": [
                    1
                ],
                "num-prompts": [
                    1
                ],
                "seed": [
                    0
                ],
                "dtype": [
                    "auto"
                ]
            }
        },
        {
            "description": "Benchmark vllm engine decode throughput - synthetic",
            "models": [
                "facebook/opt-125m",
                "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                "mistralai/Mistral-7B-Instruct-v0.2",
                "NousResearch/Llama-2-7b-chat-hf"
            ],
            "script_name": "benchmark_throughput",
            "script_args": {
                "backend": [
                    "vllm"
                ],
                "input-len": [
                    2
                ],
                "output-len": [
                    128
                ],
                "tensor-parallel-size": [
                    1
                ],
                "n": [
                    1
                ],
                "num-prompts": [
                    1,
                    4,
                    8,
                    16,
                    32,
                    64
                ],
                "seed": [
                    0
                ],
                "dtype": [
                    "auto"
                ]
            }
        }
    ]
}