Examples showcasing TorchServe Features and Integrations

TorchServe Internals

Creating mar file for an eager mode model
Creating mar file for torchscript mode model
Serving custom model with custom service handler
Serving model using Docker Container
Creating a Workflow
Custom Metrics
Dynamic Batch Processing
Dynamic Batched Async Requests

TorchServe Integrations

Kubernetes

Serving HuggingFace faster transformers model in K8s

KServe

Serving HuggingFace BERT model using KServe

Hugging Face

Serving HuggingFace transformers model

MLFlow

Deploy models using mlflow-torchserve plugin

Captum

Model Explainability with Captum

ONNX

Example for ONNX Integration

TensorRT

Support for TensorRT optimizations

Microsoft DeepSpeed-MII

HuggingFace Stable Diffusion Model with Microsoft DeepSpeed-MII

Prometheus and mtail

Custom Metrics with mtail and Prometheus

Intel® Extension for PyTorch

Boost Performance on Intel Hardware

TorchRec DLRM

Serving Torchrec DLRM (Recommender Model)

TorchData

Serving Image Classifier model and loading image data using TorchData (datapipes)

PyTorch 2.0

PyTorch 2.0 Integration

Stable Diffusion

Stable Diffusion using HuggingFace Diffusers

HuggingFace Large Models

HuggingFace Large Models with constrained resources

Use Cases

Vision

Image Classification

Serving torchvision image classification models
Serving Image Classifier model for on-premise near real-time video

Object Detection

Serving object detection model
Serving image segmentation model

GAN

Serving image generator model

Text

Neural Machine Translation

Serving machine translation model
Serving Neural Machine Translation Workflow

Text Classification

Serving text classification model
Serving text classification model with scriptable tokenizer

Text to Speech

Serving waveglow text to speech synthesizer model

MultiModal

Serving multi modal framework model

TorchServe Examples

The following examples show how to create and serve model archives with TorchServe.

Creating mar file for eager mode model

The following steps create a torch model archive (.mar) for running an eager mode PyTorch model in TorchServe:

Prerequisites for creating a torch model archive (.mar):
- serialized-file (.pt): For an eager mode model, this file contains the model's state_dict.
- model-file (.py): This file contains the model class, which extends torch.nn.Module and defines the model architecture. This parameter is mandatory for eager mode models. The file must contain exactly one class definition that extends torch.nn.Module.
- index_to_name.json: This file maps each predicted index to a class name. By default, TorchServe handlers return the predicted index and probability; this mapping lets them return class names. Pass the file to the model archiver with the --extra-files parameter.
- version: The model's version.
- handler: The name of a TorchServe default handler or the path to a custom inference handler (.py).
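
As a minimal sketch of the index_to_name.json prerequisite, the Python snippet below writes a mapping from predicted index to class name. The three labels and the output directory are hypothetical placeholders, and the flat index-to-string layout is an assumption; check the example for your handler if it expects a richer format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical labels for a three-class model; replace with your own classes.
# JSON object keys must be strings, so the indices are written as "0", "1", ...
index_to_name = {"0": "cat", "1": "dog", "2": "bird"}


def write_index_to_name(mapping, directory):
    """Write the index-to-class mapping as index_to_name.json and return its path."""
    path = Path(directory) / "index_to_name.json"
    path.write_text(json.dumps(mapping, indent=2))
    return path


# Written to a temporary directory here so the sketch has no side effects.
out_dir = tempfile.mkdtemp()
mapping_path = write_index_to_name(index_to_name, out_dir)
print(mapping_path.read_text())
```

The resulting file is the one passed to the archiver through --extra-files.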

Syntax

torch-model-archiver --model-name <model_name> --version <model_version_number> --model-file <path_to_model_architecture_file> --serialized-file <path_to_state_dict_file> --handler <path_to_custom_handler_or_default_handler_name> --extra-files <path_to_index_to_name_json_file>
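
To make the syntax above concrete, here is a small Python sketch that assembles the same command as an argument list. It only builds and prints the command; nothing is executed. The helper name build_archiver_command and every name and path passed to it are hypothetical placeholders, while the flags are the ones shown in the syntax line.

```python
import shlex


def build_archiver_command(model_name, version, serialized_file, handler,
                           model_file=None, extra_files=None):
    """Assemble torch-model-archiver arguments as a list (nothing is executed)."""
    cmd = [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--serialized-file", serialized_file,
        "--handler", handler,
    ]
    # --model-file is required for eager mode models.
    if model_file is not None:
        cmd += ["--model-file", model_file]
    # --extra-files carries index_to_name.json when class names are wanted.
    if extra_files is not None:
        cmd += ["--extra-files", extra_files]
    return cmd


# All names and paths below are hypothetical placeholders.
cmd = build_archiver_command(
    model_name="my_classifier",
    version="1.0",
    serialized_file="my_classifier.pt",
    handler="image_classifier",
    model_file="model.py",
    extra_files="index_to_name.json",
)
print(shlex.join(cmd))
```

Once torch-model-archiver is installed and the referenced files exist, the list could be handed to subprocess.run(cmd, check=True) to produce the archive.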

Creating mar file for torchscript mode model

The following steps create a torch model archive (.mar) for running a TorchScript mode PyTorch model in TorchServe:

Prerequisites for creating a torch model archive (.mar):
- serialized-file (.pt): For a TorchScript model, this file is the executable ScriptModule. (For an eager mode model, it would contain the state_dict instead.)
- index_to_name.json: This file maps each predicted index to a class name. By default, TorchServe handlers return the predicted index and probability; this mapping lets them return class names. Pass the file to the model archiver with the --extra-files parameter.
- version: The model's version.
- handler: The name of a TorchServe default handler or the path to a custom inference handler (.py).

Syntax

torch-model-archiver --model-name <model_name> --version <model_version_number> --serialized-file <path_to_executable_script_module> --extra-files <path_to_index_to_name_json_file> --handler <path_to_custom_handler_or_default_handler_name>

Serving image classification models

The following example demonstrates how to create an image classifier model archive, serve it on TorchServe, and run image prediction using TorchServe's default image_classifier handler:

Image classification models
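
As a rough illustration of the "run image prediction" step, the sketch below builds (but does not send) an HTTP request to TorchServe's predictions endpoint. It assumes a default local setup where the inference API listens on port 8080; the model name my_classifier and the payload are hypothetical stand-ins for a registered model and the raw bytes of an image file.

```python
import urllib.request


def build_prediction_request(model_name, image_bytes, host="http://localhost:8080"):
    """Build (but do not send) a POST request to the predictions endpoint."""
    url = f"{host}/predictions/{model_name}"
    return urllib.request.Request(url, data=image_bytes, method="POST")


# "my_classifier" and the payload are hypothetical placeholders.
request = build_prediction_request("my_classifier", b"<raw image bytes>")
print(request.get_method(), request.full_url)
```

With TorchServe running and the model registered, urllib.request.urlopen(request) would send the request and return the handler's response.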

Serving custom model with custom service handler

The following example demonstrates how to create and serve a custom neural network model with a custom handler in TorchServe:

Digit recognition with MNIST

Serving text classification model

The following example demonstrates how to create and serve a custom text classification model with the default text_classifier handler provided by TorchServe:

Text classification example

Serving text classification model with scriptable tokenizer

This example shows how to combine a text classification model with a scriptable tokenizer into a single, scripted artifact to serve with TorchServe. A scriptable tokenizer is a tokenizer compatible with TorchScript.

Text classification example with scriptable tokenizer

Serving object detection model

The following example demonstrates how to create and serve a pretrained Faster R-CNN model with the default object_detector handler provided by TorchServe:

Object detection example

Serving image segmentation model

The following example demonstrates how to create and serve a pretrained FCN model with the default image_segmenter handler provided by TorchServe:

Image segmentation example

Serving Hugging Face Transformers

The following example demonstrates how to create and serve pretrained transformer models from Hugging Face, such as BERT, RoBERTa, and XLM

Hugging Face Transformers

Captum Integration

The following example demonstrates TorchServe's integration with Captum, an open-source, extensible library for model interpretability built on PyTorch

Captum

Example to serve GAN model

The following example demonstrates how to create and serve a pretrained DCGAN model from facebookresearch/pytorch_GAN_zoo

GAN Image Generator

Serving Neural Machine Translation

The following example demonstrates how to create and serve a neural translation model using fairseq

Neural machine translation

Serving Waveglow text to speech synthesizer

The following example demonstrates how to create and serve the waveglow text to speech synthesizer

Waveglow text to speech synthesizer

Serving Multi modal model

The following example demonstrates how to create and serve a multimodal model that combines audio, text, and video

Multi modal framework

Serving Image Classification Workflow

The following example demonstrates how to create and serve a complex image classification workflow for dog breed classification

Image classification workflow

Serving Neural Machine Translation Workflow

The following example demonstrates how to create and serve a complex neural machine translation workflow

Neural machine translation workflow

Serving Torchrec DLRM (Recommender Model)

This example shows how to deploy a Deep Learning Recommendation Model (DLRM) with TorchRec

Torchrec DLRM

Serving Image Classifier Model for on-premise near real-time video

The following example demonstrates how to serve an image classification model with batching for near real-time video

Near Real-Time Video Batched Image Classification

Serving Image Classifier Model with TorchData datapipes

The following example demonstrates how to integrate TorchData with TorchServe

TorchData integration with TorchServe: an image classification example
