# srt-inference-perf

`srt-inference-perf` is a tool for measuring the performance of any OpenAI-compatible completions endpoint, including vLLM, Hugging Face TGI, the llama.cpp server, and more. It reads user-defined questions from a JSON or YAML file, queries multiple endpoints, and generates performance metrics for comparison. The primary objective is to help AI teams tune API configuration parameters for optimal performance.

## Features
- Reads questions from a JSON or YAML file
- Queries multiple OpenAI-compatible completions endpoints
- Measures response time, error rate, and other relevant metrics (see the sketch after this list)
- Supports parallel testing across multiple endpoints
- Generates a performance report
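At its core, measuring an OpenAI-compatible endpoint means timing each completion request and recording failures. The snippet below is a minimal sketch of that idea, not this tool's actual code: the base URL, model name, and payload values are placeholder assumptions to adapt to your server.

```python
# Minimal sketch of the core measurement: time one completion request
# against an OpenAI-compatible endpoint and note whether it failed.
# The URL, model name, and payload values are placeholders, not this
# project's real implementation.
import time

import requests


def measure_once(base_url: str, model: str, prompt: str) -> dict:
    payload = {"model": model, "prompt": prompt, "max_tokens": 64}
    start = time.perf_counter()
    try:
        resp = requests.post(
            f"{base_url}/v1/completions", json=payload, timeout=60
        )
        resp.raise_for_status()  # non-2xx responses count as errors
        ok = True
    except requests.RequestException:
        ok = False
    return {"latency_s": time.perf_counter() - start, "ok": ok}


if __name__ == "__main__":
    # Example: a vLLM or llama.cpp server listening locally.
    print(measure_once("http://localhost:8000", "my-model", "What is inference?"))
```

Running many such calls per endpoint, in parallel, and aggregating the latencies and failure counts is what produces the comparison report.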
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/SolidRusT/srt-inference-perf.git
  cd srt-inference-perf
  ```

- Create and activate a virtual environment (optional but recommended):

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Copy the example configuration file:

  ```bash
  cp config-example.yaml config.yaml
  ```

- Edit `config.yaml` to suit your needs (a hypothetical example is sketched below).
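The real configuration schema is defined by `config-example.yaml` in the repository; the sketch below is only a hypothetical illustration of the kind of settings such a file might hold (every key shown is an assumption, not the actual schema):

```yaml
# Hypothetical sketch only -- consult config-example.yaml for the real schema.
questions_file: questions.yaml   # JSON or YAML file with the prompts to send
endpoints:
  - name: vllm-local
    url: http://localhost:8000/v1
  - name: tgi-local
    url: http://localhost:8080/v1
```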
## Usage

- Run the performance tester with your configuration file:

  ```bash
  python main.py --config config.yaml
  ```

- Display the results in a human-readable format:

  ```bash
  python main.py --config config.yaml --human
  ```

- Display the results in JSON format (handy for saving runs, as shown below):

  ```bash
  python main.py --config config.yaml --json
  ```

- Show usage instructions:

  ```bash
  python main.py --usage
  ```
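Because `--json` emits machine-readable results, a run can be redirected to a file and kept for comparing configurations over time. This is ordinary shell redirection rather than a documented feature of the tool, and the file name here is arbitrary:

```bash
# Save one run's metrics for later comparison (file name is arbitrary).
python main.py --config config.yaml --json > results-vllm.json
```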
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Author

Suparious (suparious@solidrust.net)

This project is developed by SolidRusT Networks.