This C++ application runs computer vision inference tasks, such as object detection, classification, and instance segmentation, on the NVIDIA Triton Inference Server. Triton manages multiple framework backends for streamlined model deployment.
- Object Detection: YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, YOLO11, YOLO-NAS
- Instance Segmentation: YOLOv5, YOLOv8, YOLO11
- Classification: Torchvision API-based models, TensorFlow-Keras API (saved_model export)
To build the client libraries, refer to the official Triton Inference Server client libraries repository.
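If you prefer building them from source, a minimal sketch is shown below; the branch name, CMake options (TRITON_ENABLE_CC_HTTP, TRITON_ENABLE_CC_GRPC), and build target are assumptions based on the client repository and should be checked against its README for your release.

# Sketch only: clone and build the Triton C++ client libraries
git clone -b r24.09 https://github.com/triton-inference-server/client.git triton-client
cd triton-client && mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=$(pwd)/install \
      -DTRITON_ENABLE_CC_HTTP=ON -DTRITON_ENABLE_CC_GRPC=ON ..
make cc-clients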
Ensure the following dependencies are installed:
- Nvidia Triton Inference Server (from NGC):
docker pull nvcr.io/nvidia/tritonserver:24.09-py3
- Triton client libraries: Tested on Release r24.09
- Protobuf and gRPC++: Versions compatible with Triton
- RapidJSON:
apt install rapidjson-dev
- libcurl:
apt install libcurl4-openssl-dev
- OpenCV 4: Tested version: 4.7.0
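If your distribution does not ship OpenCV 4.7.0, a minimal from-source build sketch follows; the flags and module selection are illustrative:

# Sketch only: build and install OpenCV 4.7.0 from source
git clone -b 4.7.0 --depth 1 https://github.com/opencv/opencv.git
cd opencv && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTS=OFF ..
make -j$(nproc)
sudo make install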
To build and compile the application, follow these steps:
- Set the environment variable TritonClientBuild_DIR or update the CMakeLists.txt with the path to your installed Triton client libraries (see the example after this list).
- Create a build directory:
  mkdir build
- Navigate to the build directory:
  cd build
- Run CMake to configure the build:
  cmake -DCMAKE_BUILD_TYPE=Release ..
  Optional flags:
  - -DSHOW_FRAME: Enable to display processed frames after inference.
  - -DWRITE_FRAME: Enable to write processed frames to disk.
- Build the application:
  cmake --build .
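For example, a configure step following the documented environment-variable approach might look like this; the install path is illustrative, and the exact values accepted by the optional flags depend on how they are declared in CMakeLists.txt:

# Illustrative configure step; adjust the client install path to your setup
export TritonClientBuild_DIR=/path/to/triton-client/build/install
cmake -DCMAKE_BUILD_TYPE=Release -DSHOW_FRAME=ON -DWRITE_FRAME=ON ..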
- Other tasks, such as Pose Estimation, Optical Flow, and LLM support, are on the TODO list.
Ensure the model export versions match those supported by your Triton release. Check Triton releases here.
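For example, Ultralytics-family models are typically exported to ONNX with the tooling from their own repositories; the commands below are illustrative, and the exact flags (opset, dynamic axes) should be chosen to match your Triton release:

# Illustrative export commands; check each model repository for the exact flags
yolo export model=yolov8n.pt format=onnx dynamic=True      # YOLOv8 / YOLO11 (Ultralytics)
python export.py --weights yolov5s.pt --include onnx       # YOLOv5 repository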
To deploy models, set up a model repository following the Triton Model Repository schema. The config.pbtxt file is optional unless you are using the OpenVINO backend, implementing an Ensemble pipeline, or passing custom inference parameters.
<model_repository>/
  <model_name>/
    config.pbtxt
    <model_version>/
      <model_binary>
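If you do need a config.pbtxt (for example, to pin input/output shapes for an ONNX detection model), a minimal sketch is shown below; the model name, backend, tensor names, and dimensions are assumptions that must match your exported model:

# Illustrative config.pbtxt; tensor names and shapes must match your model
name: "yolo11s_onnx"
backend: "onnxruntime"
max_batch_size: 0
input [
  {
    name: "images"
    data_type: TYPE_FP32
    dims: [ 1, 3, 640, 640 ]
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ 1, 84, 8400 ]
  }
]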
To start Triton Server, run:
#!/bin/bash
docker run --gpus=1 --rm \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v /full/path/to/model_repository:/models \
nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver \
--model-repository=/models
Omit the --gpus flag if using the CPU version.
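Once the server is up, you can check readiness through Triton's standard HTTP endpoints:

curl -v http://<triton-ip>:8000/v2/health/ready
curl -v http://<triton-ip>:8000/v2/models/<model_name>/ready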
For more examples, check the Triton Inference Server tutorials.
Use the following command to perform inference on a video or image:
./computer-vision-triton-cpp-client \
--source=/path/to/source.format \
--task_type=<task_type> \
--model_type=<model_type> \
--model=<model_name_folder_on_triton> \
--labelsFile=/path/to/labels/coco.names \
--protocol=<http or grpc> \
--serverAddress=<triton-ip> \
--port=<8000 for http, 8001 for grpc>
If the model has dynamic input sizes, use:
--input_sizes="c w h"
- /path/to/source.format: Path to the input video or image file.
- <task_type>: Type of computer vision task (detection, classification, or instance_segmentation).
- <model_type>: Model type (e.g., yolov5, yolov8, yolo11, yoloseg, torchvision-classifier, tensorflow-classifier).
- <model_name_folder_on_triton>: Name of the model folder on the Triton server.
- /path/to/labels/coco.names: Path to the label file (e.g., COCO labels).
- <http or grpc>: Communication protocol (http or grpc).
- <triton-ip>: IP address of your Triton server.
- <8000 for http, 8001 for grpc>: Port number (8000 for HTTP, 8001 for gRPC).
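For example, a detection run against a YOLO11 model exported to ONNX might look like this (the model folder, file paths, and server address are illustrative):

./computer-vision-triton-cpp-client \
  --source=data/video.mp4 \
  --task_type=detection \
  --model_type=yolo11 \
  --model=yolo11s_onnx \
  --labelsFile=labels/coco.names \
  --protocol=grpc \
  --serverAddress=localhost \
  --port=8001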
To view all available parameters, run:
./computer-vision-triton-cpp-client --help
To run the client with Docker, first build the image:
docker build --rm -t computer-vision-triton-cpp-client .
Then run the container:
docker run --rm \
-v /path/to/host/data:/app/data \
--network host \
computer-vision-triton-cpp-client \
--source=<path_to_source_on_container> \
--task_type=<task_type> \
--model_type=<model_type> \
--model=<model_name_folder_on_triton> \
--labelsFile=<path_to_labels_on_container> \
--protocol=<http or grpc> \
--serverAddress=<triton-ip> \
--port=<8000 for http, 8001 for grpc>
- -v /path/to/host/data:/app/data: Maps host data to /app/data in the container for input/output.
Processed output is saved to the mapped directory on the host.
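A complete run, assuming the source video and label file live in the mounted directory and Triton is reachable on the host network, might look like this (all names and paths are illustrative):

docker run --rm --network host \
  -v $(pwd)/data:/app/data \
  computer-vision-triton-cpp-client \
  --source=/app/data/video.mp4 \
  --task_type=detection \
  --model_type=yolo11 \
  --model=yolo11s_onnx \
  --labelsFile=/app/data/coco.names \
  --protocol=grpc \
  --serverAddress=localhost \
  --port=8001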
Real-time inference test (GPU: RTX 3060):
- YOLOv7-tiny exported to ONNX: YOLOv7-tiny Inference Test
- YOLO11s exported to ONNX: YOLO11s Inference Test
- Triton Inference Server Client Example
- Triton User Guide
- Triton Tutorials
- ONNX Models
- Torchvision Models
- Tensorflow Model Garden
- Any feedback is greatly appreciated. If you have any suggestions, bug reports, or questions, don't hesitate to open an issue.