This project provides code for running BiRefNet inference with TensorRT. The aim is to accelerate inference by leveraging TensorRT's high-performance runtime.
| Method | PyTorch | ONNX | TensorRT |
|---|---|---|---|
| Inference time | 0.71s | 5.32s | 0.17s |

| Method | PyTorch | ONNX | TensorRT |
|---|---|---|---|
| Inference time | 0.15s | 4.43s | 0.11s |
Note:
- Both the PyTorch and ONNX models are from the official BiRefNet GitHub repository.
- The TensorRT model was converted using Convert-ONNX-Model-to-TensorRT-Engine.
- All tests were conducted on Windows 10 with an NVIDIA RTX 4080 Super.
- Refer to model_compare.py for the conversion code; an illustrative timing sketch is shown below.
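For reference, a per-backend latency comparison like the one above can be measured with a simple warm-up-plus-timing loop. The sketch below is illustrative only and is not the repo's model_compare.py; `infer_fn` stands for any zero-argument callable that runs one forward pass.

```python
# Illustrative latency measurement; not the repo's model_compare.py.
# `infer_fn` is any zero-argument callable that runs one forward pass
# (PyTorch, ONNX Runtime, or the TensorRT engine).
import time

def measure_latency(infer_fn, warmup=5, runs=20):
    """Return the mean per-call latency of infer_fn in seconds."""
    for _ in range(warmup):              # exclude one-time setup / lazy initialization
        infer_fn()
    start = time.perf_counter()
    for _ in range(runs):
        infer_fn()
    return (time.perf_counter() - start) / runs

# Hypothetical usage:
# print("TensorRT:", measure_latency(lambda: trt_runner(batch)))
# print("PyTorch:", measure_latency(lambda: model(batch)))
```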
- Efficient inference with BiRefNet using TensorRT
- Foreground estimation
- Colab example
- Performance comparison between PyTorch, ONNX, and TensorRT inference
- Inference using Docker for an isolated and reproducible environment
- NVIDIA GPU with CUDA (>= 11.x) and cuDNN (>= 8.x)
- Python 3.9
```bash
pip install -r requirements.txt
```
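Before building an engine, it can help to confirm that the GPU stack is visible from Python. A quick check, assuming `tensorrt` and `torch` were installed by the requirements file:

```python
# Quick sanity check of the GPU stack; assumes tensorrt and torch are installed.
import tensorrt as trt
import torch

print("TensorRT version:", trt.__version__)
print("CUDA available:", torch.cuda.is_available())
print("cuDNN version:", torch.backends.cudnn.version())
```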
First, download the ONNX model from Google Drive.
Second, convert the ONNX model to a TensorRT engine using the provided conversion script:
```python
from utils import convert_onnx_to_engine

onnx_file_path = "birefnet.onnx"
engine_file_path = "engine.trt"

convert_onnx_to_engine(onnx_file_path, engine_file_path)
```
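For reference, the conversion that `convert_onnx_to_engine` performs can also be done directly with the TensorRT Python API. The sketch below assumes TensorRT 8.x; the FP16 flag and 1 GiB workspace limit are illustrative choices, not necessarily the settings used by the helper in utils.

```python
# Sketch of ONNX -> TensorRT conversion with the TensorRT 8.x Python API.
# The FP16 flag and workspace size are illustrative, not the exact settings
# used by utils.convert_onnx_to_engine.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("birefnet.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB scratch space
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)  # FP16 is usually faster on RTX GPUs

serialized_engine = builder.build_serialized_network(network, config)
with open("engine.trt", "wb") as f:
    f.write(serialized_engine)
```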
Now you can run inference with the TensorRT engine. For a single image:

```bash
python .\infer.py --image-path image_path --output-path result.png --output-alpha-path result_alpha.png --engine-path .\engine.trt
```

To process a directory of images, pass directories instead and add `--mode m`:

```bash
python .\infer.py --image-path image_dir --output-path output_dir --output-alpha-path alpha_dir --engine-path .\engine.trt --mode m
```
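The saved alpha matte can also be reused outside of infer.py, for example to composite the extracted subject onto a new background. A minimal Pillow sketch (file names are placeholders, and this is not part of the repo's scripts):

```python
# Composite the segmented subject onto a plain background using the
# predicted alpha matte. File names are placeholders.
from PIL import Image

image = Image.open("input.jpg").convert("RGB")              # original input image
alpha = Image.open("result_alpha.png").convert("L")         # matte written by infer.py
alpha = alpha.resize(image.size)                            # ensure matching sizes

background = Image.new("RGB", image.size, (255, 255, 255))  # plain white backdrop
composite = Image.composite(image, background, alpha)       # alpha-weighted blend
composite.save("composited.png")
```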
Contributions are welcome! Please feel free to submit a Pull Request or open an Issue if you have any suggestions or find bugs.