# nvFuser Python Benchmarks
The Python benchmarks use `pytest-benchmark` and `torch.profiler`. To benchmark any target function, use `run_benchmark` (defined in `python_benchmarks/core.py`):

```python
run_benchmark(benchmark, target_function, function_inputs, rounds=10, warmup_rounds=1)
```
Arguments:
- `benchmark`: pytest-benchmark fixture passed to every function intended to be run as a benchmark by pytest.
- `target_function`: Function to benchmark.
- `function_inputs`: List of inputs to the `target_function`.
- `rounds`: Number of rounds the `target_function` is run (default: 10).
- `warmup_rounds`: Number of warmup rounds the `target_function` is run before benchmarking (default: 1).
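The `rounds`/`warmup_rounds` semantics can be modeled with a minimal timing loop. This is an illustrative sketch only; the real `run_benchmark` integrates with the pytest-benchmark fixture and `torch.profiler` rather than timing by hand:

```python
import time

def timed_rounds(target_function, function_inputs, rounds=10, warmup_rounds=1):
    # Warmup rounds: run the target but discard the results and timings.
    for _ in range(warmup_rounds):
        target_function(*function_inputs)
    # Measured rounds: collect one wall-clock timing per round.
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        target_function(*function_inputs)
        timings.append(time.perf_counter() - start)
    return timings
```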
Example:

```python
# Parametrize over any number of arguments (e.g., input sizes, dtypes)
@pytest.mark.parametrize("param1", ...)
@pytest.mark.parametrize("param2", ...)
def test_example_benchmark(benchmark, param1, param2, ...):
    # Set up function inputs
    run_benchmark(benchmark, target_function, function_inputs)
```
The benchmark name must start with `test_` to be automatically discovered by pytest.
- Running a benchmark file: `pytest [options] <benchmark-file>`
- Running the complete benchmark suite: `pytest [options] python_benchmarks/`
- Sharding: Pytest is memory-intensive, which can result in CPU OOMs when running a large number of tests. Sharding is recommended when running the complete benchmark suite. We use `pytest-shard` in our CI. To execute a specific shard with `n` total shards, run `pytest --shard-id=i --num-shards=n [options]`, where `i = {0..n-1}`.
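Conceptually, sharding deterministically partitions the collected tests across `n` independent pytest invocations. A hash-based sketch of that idea (pytest-shard's actual assignment strategy may differ in detail):

```python
import hashlib

def shard_tests(test_names, shard_id, num_shards):
    # Assign each test to a shard via a stable hash of its name, so every
    # invocation agrees on the same partition without any coordination.
    def bucket(name):
        return int(hashlib.sha256(name.encode()).hexdigest(), 16) % num_shards
    return [t for t in test_names if bucket(t) == shard_id]
```

Each shard runs a disjoint subset, and the union over shard ids `0..n-1` covers the full suite.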
Pytest/pytest-benchmark options:
- Filtering benchmarks: `-k <filter>`
- Saving benchmarks: `--benchmark-save=NAME`, `--benchmark-autosave`, `--benchmark-json=PATH`
- Debugging: `--benchmark-verbose`
Custom command-line options:
- `--disable-validation`: Skips the output validation in the nvFuser benchmarks.
- `--disable-benchmarking`: Skips the nvFuser benchmarking; useful for testing only the correctness of fusion definitions without benchmarking the fusions.
- `--benchmark-eager`: Runs eager mode benchmarks.
- `--benchmark-torchcompile`: Runs `torch.compile` mode benchmarks.
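Custom options like these are typically registered in a `conftest.py` via pytest's `pytest_addoption` hook. A hedged sketch of that pattern (the option names come from this README, but the repo's actual registration code may differ):

```python
def pytest_addoption(parser):
    # Register the nvFuser-specific flags as boolean switches.
    parser.addoption("--disable-validation", action="store_true",
                     help="Skip output validation in the nvFuser benchmarks")
    parser.addoption("--disable-benchmarking", action="store_true",
                     help="Skip benchmarking; only check fusion definition correctness")
    parser.addoption("--benchmark-eager", action="store_true",
                     help="Run eager mode benchmarks")
    parser.addoption("--benchmark-torchcompile", action="store_true",
                     help="Run torch.compile mode benchmarks")
```

A fixture or test can then read a flag with, e.g., `request.config.getoption("--disable-validation")`.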
- Pytest: https://docs.pytest.org/en/latest/
- Pytest-benchmark: https://pytest-benchmark.readthedocs.io/en/latest/
- Pytest-shard: https://pypi.org/project/pytest-shard/