Wholesale updates for tritonserver version #369

Merged on Sep 21, 2022 (1 commit)
2 changes: 1 addition & 1 deletion README.md
@@ -104,7 +104,7 @@ Use the following command to launch a Docker container for Triton loading all of
```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models:/models \
-nvcr.io/nvidia/tritonserver:22.06-py3 \
+nvcr.io/nvidia/tritonserver:22.08-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--log-info=true \
@@ -149,7 +149,7 @@ Note: This step assumes you have both [Docker](https://docs.docker.com/engine/in
From the root of the Morpheus project we will launch a Triton Docker container with the `models` directory mounted into the container:

```shell
-docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.02-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --log-info=true
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --log-info=true
```

Once we have Triton running, we can verify that it is healthy using [curl](https://curl.se/). The `/v2/health/live` endpoint should return a 200 status code:
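For reference, a minimal liveness probe might look like the sketch below. It assumes Triton's HTTP endpoint is published on the default port 8000 of localhost, as in the `docker run` command above; a healthy server responds with HTTP 200 and an empty body.

```bash
# Query Triton's KServe v2 liveness endpoint; -v shows the HTTP status line
curl -v localhost:8000/v2/health/live
```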
4 changes: 2 additions & 2 deletions examples/abp_nvsmi_detection/README.md
@@ -65,12 +65,12 @@ This example utilizes the Triton Inference Server to perform inference.

Pull the Docker image for Triton:
```bash
-docker pull nvcr.io/nvidia/tritonserver:22.02-py3
+docker pull nvcr.io/nvidia/tritonserver:22.08-py3
```

From the Morpheus repo root directory, run the following to launch Triton and load the `abp-nvsmi-xgb` XGBoost model:
```bash
-docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.02-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model abp-nvsmi-xgb
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model abp-nvsmi-xgb
```

This will launch Triton and only load the `abp-nvsmi-xgb` model. This model has been configured with a max batch size of 32768, and to use dynamic batching for increased performance.
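As an optional sanity check (a sketch, not part of the original instructions), the configuration that Triton is serving for the loaded model, including `max_batch_size` and the `dynamic_batching` settings mentioned above, can be retrieved from the model-configuration endpoint, assuming the default HTTP port 8000:

```bash
# Fetch the served configuration of the abp-nvsmi-xgb model and
# pretty-print the JSON response (assumes python3 is available on the host)
curl -s localhost:8000/v2/models/abp-nvsmi-xgb/config | python3 -m json.tool
```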
4 changes: 2 additions & 2 deletions examples/abp_pcap_detection/README.md
@@ -23,7 +23,7 @@ To run this example, an instance of Triton Inference Server and a sample dataset

### Triton Inference Server
```bash
-docker pull nvcr.io/nvidia/tritonserver:22.02-py3
+docker pull nvcr.io/nvidia/tritonserver:22.08-py3
```

##### Deploy Triton Inference Server
@@ -35,7 +35,7 @@ Bind the provided `abp-pcap-xgb` directory to the docker container model repo at
cd <MORPHEUS_ROOT>/examples/abp_pcap_detection

# Launch the container
-docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:22.02-py3 tritonserver --model-repository=/models --exit-on-error=false --model-control-mode=poll --repository-poll-secs=30
+docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models --exit-on-error=false --model-control-mode=poll --repository-poll-secs=30
```

##### Verify Model Deployment
4 changes: 2 additions & 2 deletions examples/log_parsing/README.md
@@ -26,14 +26,14 @@ Pull Docker image from NGC (https://ngc.nvidia.com/catalog/containers/nvidia:tri
Example:

```
-docker pull nvcr.io/nvidia/tritonserver:22.02-py3
+docker pull nvcr.io/nvidia/tritonserver:22.08-py3
```

##### Start Triton Inference Server container
```
cd ${MORPHEUS_ROOT}/models

-docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD:/models nvcr.io/nvidia/tritonserver:22.02-py3 tritonserver --model-repository=/models/triton-model-repo --model-control-mode=explicit --load-model log-parsing-onnx
+docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --model-control-mode=explicit --load-model log-parsing-onnx
```

##### Verify Model Deployment
2 changes: 1 addition & 1 deletion examples/nlp_si_detection/README.md
@@ -77,7 +77,7 @@ This example utilizes the Triton Inference Server to perform inference. The neur
From the Morpheus repo root directory, run the following to launch Triton and load the `sid-minibert` model:

```bash
-docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.02-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
```

Where `22.08-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (see [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).
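As a rough, optional check (an assumption on our part, not a documented Morpheus step), the TensorRT packages bundled in a given Triton image can be listed from inside the container, assuming the image is Ubuntu-based with TensorRT installed as system packages:

```bash
# List TensorRT-related packages inside the chosen Triton image
docker run --rm nvcr.io/nvidia/tritonserver:22.08-py3 bash -c "dpkg -l | grep -Ei 'tensorrt|nvinfer'"
```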
4 changes: 2 additions & 2 deletions examples/ransomware_detection/README.md
@@ -27,15 +27,15 @@ Pull Docker image from NGC (https://ngc.nvidia.com/catalog/containers/nvidia:tri
Example:

```
-docker pull nvcr.io/nvidia/tritonserver:22.06-py3
+docker pull nvcr.io/nvidia/tritonserver:22.08-py3
```

##### Start Triton Inference Server container
```bash
cd ${MORPHEUS_ROOT}/examples/ransomware_detection

# Run Triton in explicit mode
-docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:22.06-py3 \
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:22.08-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
2 changes: 1 addition & 1 deletion examples/sid_visualization/docker-compose.yml
@@ -25,7 +25,7 @@ x-with-gpus: &with_gpus

services:
triton:
-image: nvcr.io/nvidia/tritonserver:22.06-py3
+image: nvcr.io/nvidia/tritonserver:22.08-py3
<<: *with_gpus
command: "tritonserver --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx --model-repository=/models/triton-model-repo"
environment:
7 changes: 2 additions & 5 deletions models/triton-model-repo/README.md
@@ -27,7 +27,7 @@ To launch Triton with one of the models in `triton-model-repo`, this entire repo
### Load `sid-minibert-onnx` Model with Default Triton Image

```bash
-docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD:/models --name tritonserver nvcr.io/nvidia/tritonserver:21.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
+docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD:/models --name tritonserver nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
```

### Load `abp-nvsmi-xgb` Model with FIL Backend Triton
@@ -36,9 +36,6 @@ docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD:/model
docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD:/models --name tritonserver triton_fil tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model abp-nvsmi-xgb
```

-Note: The FIL Backend Triton image was built with `docker build -t triton_fil -f ops/Dockerfile .`. Adjust the image name as necessary.


### Load `sid-minibert-trt` Model with Default Triton Image from Morpheus Repo

To load a TensorRT model, it first must be compiled with the `morpheus tools onnx-to-trt` utility (See `triton-model-repo/sid-minibert-trt/1/README.md` for more info):
@@ -51,5 +48,5 @@ morpheus tools onnx-to-trt --input_model ../../sid-minibert-onnx/1/sid-minibert.
Then launch Triton:

```bash
-docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/models:/models --name tritonserver nvcr.io/nvidia/tritonserver:21.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-trt
+docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/models:/models --name tritonserver nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-trt
```
6 changes: 3 additions & 3 deletions scripts/validation/kafka_testing.md
@@ -171,7 +171,7 @@ For this test we are going to replace the from & to file stages from the ABP val
1. In a new terminal launch Triton:
```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v ${MORPHEUS_ROOT}/models:/models \
-nvcr.io/nvidia/tritonserver:22.02-py3 \
+nvcr.io/nvidia/tritonserver:22.08-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
@@ -338,7 +338,7 @@ For this test we are going to replace the from & to file stages from the Phishin
1. In a new terminal launch Triton:
```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v ${MORPHEUS_ROOT}/models:/models \
-nvcr.io/nvidia/tritonserver:22.02-py3 \
+nvcr.io/nvidia/tritonserver:22.08-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
@@ -411,7 +411,7 @@ Note: Due to the complexity of the input data and a limitation of the cudf reade
1. In a new terminal launch Triton:
```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v ${MORPHEUS_ROOT}/models:/models \
-nvcr.io/nvidia/tritonserver:22.02-py3 \
+nvcr.io/nvidia/tritonserver:22.08-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
2 changes: 1 addition & 1 deletion scripts/validation/val-globals.sh
@@ -26,7 +26,7 @@ export e="\033[0;90m"
export y="\033[0;33m"
export x="\033[0m"

-export TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:22.06-py3"}
+export TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:22.08-py3"}

# TRITON_GRPC_PORT is only used when TRITON_URL is undefined
export TRITON_GRPC_PORT=${TRITON_GRPC_PORT:-"8001"}
2 changes: 1 addition & 1 deletion scripts/validation/val-utils.sh
@@ -68,7 +68,7 @@ function wait_for_triton {

function ensure_triton_running {

-TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:22.06-py3"}
+TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:22.08-py3"}

IS_RUNNING=$(is_triton_running)

4 changes: 2 additions & 2 deletions tests/benchmarks/README.md
@@ -24,14 +24,14 @@ Pull Docker image from NGC (https://ngc.nvidia.com/catalog/containers/nvidia:tri
Example:

```
-docker pull nvcr.io/nvidia/tritonserver:22.02-py3
+docker pull nvcr.io/nvidia/tritonserver:22.08-py3
```

##### Start Triton Inference Server container
```
cd ${MORPHEUS_ROOT}/models

-docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models/triton-model-repo --model-control-mode=explicit --load-model sid-minibert-onnx --load-model abp-nvsmi-xgb --load-model phishing-bert-onnx
+docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --model-control-mode=explicit --load-model sid-minibert-onnx --load-model abp-nvsmi-xgb --load-model phishing-bert-onnx
```

##### Verify Model Deployments