Merge pull request #1254 from pytorch/perf_changes
feat(//tools/perf): Refactor perf_run.py, add fx2trt backend support, usage via CLI arguments
peri044 authored Sep 8, 2022
2 parents 7142c82 + 77543a0 commit 1efe4b1
Showing 7 changed files with 633 additions and 131 deletions.
92 changes: 70 additions & 22 deletions tools/perf/README.md
This is a comprehensive Python benchmark suite to run perf runs using the following backends:

1. Torch
2. Torch-TensorRT
3. FX-TRT
4. TensorRT


Note: For ONNX models, users can convert the ONNX model to a TensorRT serialized engine and then use this package.
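
For example, one way to do this conversion is TensorRT's `trtexec` tool; a minimal sketch (the ONNX and engine filenames here are placeholders):

```
trtexec --onnx=model.onnx --saveEngine=model.plan
```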

Benchmark scripts depend on additional Python packages beyond those listed in `requirements.txt`.

The tools are organized as follows:

```
├── config
│   └── vgg16.yml
├── models
├── perf_run.py
├── hub.py
├── custom_models.py
├── requirements.txt
├── benchmark.sh
└── README.md
```

Please save your configuration files in the `config` directory. Similarly, place your model files in the `models` directory.


* `config` - Directory containing sample YAML configuration files for the VGG network.
* `models` - Model directory.
* `perf_run.py` - Performance benchmarking script which supports the torch, torch_tensorrt, fx2trt, and tensorrt backends.
* `hub.py` - Script to download TorchScript models for VGG16, Resnet50, EfficientNet-B0, VIT, and HF-BERT.
* `custom_models.py` - Script which includes custom models other than torchvision and timm (e.g., HF BERT).
* `utils.py` - Utility functions script.
* `benchmark.sh` - Used for internal performance testing of VGG16, Resnet50, EfficientNet-B0, VIT, and HF-BERT.

## Usage

There are two ways you can run a performance benchmark.

### Using YAML config files

To run the benchmark for a given configuration file:

```
python perf_run.py --config=config/vgg16.yml
```

## Configuration

Two sample configuration files are included:

* vgg16.yml demonstrates a configuration with all the supported backends (Torch, Torch-TensorRT, TensorRT)

### Supported fields

| Name | Supported Values | Description |
| ----------------- | ------------------------------------ | ------------------------------------------------------------ |
| backend | all, torch, torch_tensorrt, tensorrt | Supported backends for inference. |
| input | - | Input binding names. Expected to list shapes of each input binding. |
| model | - | Configure the model filename and name. |
| filename | - | Model file name to load from disk. |
| name | - | Model name. |
| runtime | - | Runtime configurations. |
| device | 0 | Target device ID to run inference. Range depends on available GPUs. |
| precision | fp32, fp16 or half, int8 | Target precision to run inference. int8 cannot be used with the 'all' backend. |
| calibration_cache | - | Calibration cache file expected for the torch_tensorrt runtime in int8 precision. |

Additional sample use case:

```
runtime:
  precision:
    - fp32
    - fp16
```
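
For reference, here is a fuller configuration sketch exercising the fields from the table above. It follows the structure of `config/vgg16.yml` from this PR; the input binding name `input0` and the list-valued `backend` field are assumptions for illustration:

```
backend:
  - all
input:
  input0:
    - 3
    - 224
    - 224
  num_inputs: 1
batch_size: 1
model:
  filename: models/vgg16_scripted.jit.pt
  name: vgg16
runtime:
  device: 0
  precision:
    - fp32
    - fp16
```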

Note:

1. Measuring INT8 performance is only supported via a calibration cache file or QAT mode for the `torch_tensorrt` backend.
2. A TensorRT engine filename should end with `.plan`; otherwise it will be treated as a TorchScript module.

### Using CompileSpec options via CLI

Here is the list of `CompileSpec` options that can be provided directly to compile the PyTorch module:

* `--backends` : Comma-separated string of backends. E.g. `torch,torch_tensorrt,tensorrt` or `fx2trt`
* `--model` : Name of the model file (can be a TorchScript module, or a TensorRT engine whose filename ends in the `.plan` extension). If the backend is `fx2trt`, the input should be a PyTorch module (instead of a TorchScript module) and the options for the model are (`vgg16` | `resnet50` | `efficientnet_b0`)
* `--inputs` : List of input shapes & dtypes. E.g. `(1, 3, 224, 224)@fp32` for Resnet or `(1, 128)@int32;(1, 128)@int32` for BERT
* `--batch_size` : Batch size
* `--precision` : Comma-separated list of precisions to build the TensorRT engine with. E.g. `fp32,fp16`
* `--device` : Device ID
* `--truncate` : Truncate long and double weights in the network in Torch-TensorRT
* `--is_trt_engine` : Boolean flag to be enabled if the model file provided is a TensorRT engine
* `--report` : Path of the output file where the performance summary is written

E.g.:

```
python perf_run.py --model ${MODELS_DIR}/vgg16_scripted.jit.pt \
--precision fp32,fp16 --inputs="(1, 3, 224, 224)@fp32" \
--batch_size 1 \
--backends torch,torch_tensorrt,tensorrt \
--report "vgg_perf_bs1.txt"
```
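
For the `fx2trt` backend, `--model` takes one of the supported model names rather than a file path. A hypothetical invocation, following the option descriptions above:

```
python perf_run.py --model vgg16 \
                   --precision fp32,fp16 --inputs="(1, 3, 224, 224)@fp32" \
                   --batch_size 1 \
                   --backends fx2trt \
                   --report "vgg16_fx2trt_bs1.txt"
```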

### Example models

This tool can benchmark any PyTorch model or TorchScript module. As examples, we provide VGG16, Resnet50, EfficientNet-B0, VIT, and HF-BERT models in `hub.py` that we test internally for performance.
The TorchScript modules for these models can be generated by running:
```
python hub.py
```
You can refer to `benchmark.sh` to see how we run and benchmark these models.
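
A quick way to sanity-check a generated module before benchmarking is to load and run it directly. A minimal sketch, assuming `hub.py` has produced `models/vgg16_scripted.jit.pt` and a CUDA GPU is available:

```python
import torch

# Load the scripted VGG16 module and run a single inference.
model = torch.jit.load("models/vgg16_scripted.jit.pt").cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")
with torch.no_grad():
    out = model(x)
print(out.shape)  # expected: torch.Size([1, 1000])
```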
64 changes: 64 additions & 0 deletions tools/perf/benchmark.sh
#!/bin/bash

MODELS_DIR="models"

# Download the Torchscript models
python hub.py

batch_sizes=(1 2 4 8 16 32 64 128 256)

# Benchmark VGG16 model
echo "Benchmarking VGG16 model"
for bs in ${batch_sizes[@]}
do
python perf_run.py --model ${MODELS_DIR}/vgg16_scripted.jit.pt \
--precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
--batch_size ${bs} \
--backends torch,torch_tensorrt,tensorrt \
--report "vgg_perf_bs${bs}.txt"
done

# Benchmark Resnet50 model
echo "Benchmarking Resnet50 model"
for bs in ${batch_sizes[@]}
do
python perf_run.py --model ${MODELS_DIR}/resnet50_scripted.jit.pt \
--precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
--batch_size ${bs} \
--backends torch,torch_tensorrt,tensorrt \
--report "rn50_perf_bs${bs}.txt"
done

# Benchmark VIT model
echo "Benchmarking VIT model"
for bs in ${batch_sizes[@]}
do
python perf_run.py --model ${MODELS_DIR}/vit_scripted.jit.pt \
--precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
--batch_size ${bs} \
--backends torch,torch_tensorrt,tensorrt \
--report "vit_perf_bs${bs}.txt"
done

# Benchmark EfficientNet-B0 model
echo "Benchmarking EfficientNet-B0 model"
for bs in ${batch_sizes[@]}
do
python perf_run.py --model ${MODELS_DIR}/efficientnet_b0_scripted.jit.pt \
--precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
--batch_size ${bs} \
--backends torch,torch_tensorrt,tensorrt \
--report "eff_b0_perf_bs${bs}.txt"
done

# Benchmark BERT model
echo "Benchmarking Huggingface BERT base model"
for bs in ${batch_sizes[@]}
do
python perf_run.py --model ${MODELS_DIR}/bert_base_uncased_traced.jit.pt \
--precision fp32 --inputs="(${bs}, 128)@int32;(${bs}, 128)@int32" \
--batch_size ${bs} \
--backends torch,torch_tensorrt \
--truncate \
--report "bert_base_perf_bs${bs}.txt"
done
3 changes: 2 additions & 1 deletion tools/perf/config/vgg16.yml
input:
    - 224
    - 224
  num_inputs: 1
batch_size: 1
model:
  filename: models/vgg16_scripted.jit.pt
  name: vgg16
runtime:
  device: 0
30 changes: 30 additions & 0 deletions tools/perf/custom_models.py
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer, BertConfig
import torch.nn.functional as F


def BertModule():
    model_name = "bert-base-uncased"
    enc = BertTokenizer.from_pretrained(model_name)
    text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
    tokenized_text = enc.tokenize(text)
    masked_index = 8
    tokenized_text[masked_index] = "[MASK]"
    indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
    segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
    tokens_tensor = torch.tensor([indexed_tokens])
    segments_tensors = torch.tensor([segments_ids])
    config = BertConfig(
        vocab_size_or_config_json_file=32000,
        hidden_size=768,
        num_hidden_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,
        torchscript=True,
    )
    model = BertModel(config)
    model.eval()
    # Swap in the pretrained weights; torchscript=True keeps the model traceable.
    model = BertModel.from_pretrained(model_name, torchscript=True)
    model.eval()  # re-enter eval mode so dropout is disabled during tracing
    traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
    return traced_model
132 changes: 132 additions & 0 deletions tools/perf/hub.py
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import timm
from transformers import BertModel, BertTokenizer, BertConfig
import os
import json
import custom_models as cm

torch.hub._validate_not_a_forked_repo = lambda a, b, c: True

torch_version = torch.__version__

# Detect case of no GPU before deserialization of models on GPU
if not torch.cuda.is_available():
    raise Exception(
        "No GPU found. Please check if installed torch version is compatible with CUDA version"
    )

# Downloads all model files again if manifest file is not present
MANIFEST_FILE = "model_manifest.json"

BENCHMARK_MODELS = {
    "vgg16": {"model": models.vgg16(weights=None), "path": "script"},
    "resnet50": {"model": models.resnet50(weights=None), "path": "script"},
    "efficientnet_b0": {
        "model": timm.create_model("efficientnet_b0", pretrained=True),
        "path": "script",
    },
    "vit": {
        "model": timm.create_model("vit_base_patch16_224", pretrained=True),
        "path": "script",
    },
    "bert_base_uncased": {"model": cm.BertModule(), "path": "trace"},
}


def get(n, m, manifest):
    print("Downloading {}".format(n))
    traced_filename = "models/" + n + "_traced.jit.pt"
    script_filename = "models/" + n + "_scripted.jit.pt"
    x = torch.ones((1, 3, 300, 300)).cuda()
    if n == "bert_base_uncased":  # key matches BENCHMARK_MODELS
        traced_model = m["model"]
        torch.jit.save(traced_model, traced_filename)
        manifest.update({n: [traced_filename]})
    else:
        m["model"] = m["model"].eval().cuda()
        if m["path"] == "both" or m["path"] == "trace":
            trace_model = torch.jit.trace(m["model"], [x])
            torch.jit.save(trace_model, traced_filename)
            manifest.update({n: [traced_filename]})
        if m["path"] == "both" or m["path"] == "script":
            script_model = torch.jit.script(m["model"])
            torch.jit.save(script_model, script_filename)
            if n in manifest.keys():
                files = list(manifest[n]) if type(manifest[n]) != list else manifest[n]
                files.append(script_filename)
                manifest.update({n: files})
            else:
                manifest.update({n: [script_filename]})
    return manifest


def download_models(version_matches, manifest):
    # Download all models if torch version is different than model version
    if not version_matches:
        for n, m in BENCHMARK_MODELS.items():
            manifest = get(n, m, manifest)
    else:
        for n, m in BENCHMARK_MODELS.items():
            scripted_filename = "models/" + n + "_scripted.jit.pt"
            traced_filename = "models/" + n + "_traced.jit.pt"
            # Check if model file exists on disk
            if (
                (
                    m["path"] == "both"
                    and os.path.exists(scripted_filename)
                    and os.path.exists(traced_filename)
                )
                or (m["path"] == "script" and os.path.exists(scripted_filename))
                or (m["path"] == "trace" and os.path.exists(traced_filename))
            ):
                print("Skipping {} ".format(n))
                continue
            manifest = get(n, m, manifest)


def main():
    manifest = None
    version_matches = False
    manifest_exists = False

    # Check if manifest file exists or is empty
    if not os.path.exists(MANIFEST_FILE) or os.stat(MANIFEST_FILE).st_size == 0:
        manifest = {"version": torch_version}

        # Creating an empty manifest file for overwriting post setup
        os.system("touch {}".format(MANIFEST_FILE))
    else:
        manifest_exists = True

        # Load the manifest if it already exists
        with open(MANIFEST_FILE, "r") as f:
            manifest = json.load(f)
            if manifest["version"] == torch_version:
                version_matches = True
            else:
                print(
                    "Torch version: {} mismatches with manifest's version: {}. "
                    "Re-downloading all models".format(torch_version, manifest["version"])
                )

                # Overwrite the manifest version as current torch version
                manifest["version"] = torch_version

    download_models(version_matches, manifest)

    # Write updated manifest file to disk
    with open(MANIFEST_FILE, "r+") as f:
        data = f.read()  # read existing contents, then rewrite from the start
        f.seek(0)
        record = json.dumps(manifest)
        f.write(record)
        f.truncate()


main()