TorchServe is a flexible and easy-to-use tool for serving PyTorch models.
TorchServe is available as the Docker image pytorch/torchserve.
docker pull pytorch/torchserve
docker run --rm -it pytorch/torchserve:latest bash
To serve a model with TorchServe, first archive it as a MAR file. Use the model archiver to package a model, and create a model store to hold your archived models.
- Get the model and related files for the transformer model: the pytorch_model.bin, vocab.txt, and config.json files.
./get_model.sh
Now we will package the model inside the TorchServe Docker container.
docker run --rm -it -p 8080:8080 -p 8081:8081 --name mar -v $(pwd)/model-store:/home/model-server/model-store -v $(pwd)/examples:/home/model-server/examples -v $(pwd)/transformer_handler.py:/home/model-server/transformer_handler.py pytorch/torchserve:latest bash
Inside the container, run torch-model-archiver to produce the .mar archive that will be used for inference. torch-model-archiver takes model checkpoints or a model definition file with a state_dict and packages them into a .mar file. This file can then be redistributed and served by anyone using TorchServe.
torch-model-archiver --model-name SentimentClassification --version 1.0 --serialized-file model-store/pytorch_model.bin --handler ./transformer_handler.py --extra-files "model-store/config.json,model-store/vocab.txt,examples/index_to_name.json"
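The transformer_handler.py passed to --handler is a custom handler whose contents are not shown here. As a rough sketch, assuming the checkpoint is a Hugging Face sequence-classification model, a handler built on TorchServe's BaseHandler might look like the following (class and variable names are illustrative, not the example's actual code):

```python
# transformer_handler.py -- minimal sketch of a custom TorchServe handler.
# Assumes a Hugging Face sequence-classification checkpoint; the handler
# actually shipped with this example may differ.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ts.torch_handler.base_handler import BaseHandler


class TransformersClassifierHandler(BaseHandler):
    def initialize(self, ctx):
        # model_dir holds the files packaged into the .mar by torch-model-archiver
        model_dir = ctx.system_properties.get("model_dir")
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.to(self.device).eval()
        self.initialized = True

    def preprocess(self, requests):
        # Each request carries raw text under "data" or "body"
        texts = []
        for req in requests:
            text = req.get("data") or req.get("body")
            if isinstance(text, (bytes, bytearray)):
                text = text.decode("utf-8")
            texts.append(text)
        return self.tokenizer(
            texts, return_tensors="pt", padding=True, truncation=True
        ).to(self.device)

    def inference(self, inputs):
        with torch.no_grad():
            return self.model(**inputs).logits

    def postprocess(self, logits):
        # One predicted class index per request in the batch
        return logits.argmax(dim=-1).tolist()
```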
After you archive and store the model, register it on TorchServe using the model archive file above by running the following commands:
mv SentimentClassification.mar model-store/
torchserve --start --model-store model-store --models my_tc=SentimentClassification.mar --ncs
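The docker run command above also maps port 8081, TorchServe's management API, which you can use to confirm that the model registered. A minimal sketch in Python, assuming the requests package is installed (curl http://127.0.0.1:8081/models/my_tc does the same):

```python
# Sketch: query the TorchServe management API (port 8081) to confirm
# that the my_tc model registered successfully.
import requests

resp = requests.get("http://127.0.0.1:8081/models/my_tc")
print(resp.json())
```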
In a separate terminal, send inference requests:
curl -X POST http://127.0.0.1:8080/predictions/my_tc -T examples/sample_text1.txt
curl -X POST http://127.0.0.1:8080/predictions/my_tc -T examples/sample_text0.txt
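The same inference call can be made from Python; a small sketch assuming the requests package and the sample files above:

```python
# Sketch: send the same inference request as the curl commands above.
import requests

with open("examples/sample_text0.txt", "rb") as f:
    resp = requests.post("http://127.0.0.1:8080/predictions/my_tc", data=f.read())
print(resp.status_code, resp.text)
```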
To stop the currently running TorchServe instance, run
torchserve --stop
All the logs you have seen on stdout related to model registration, management, and inference are recorded in the /logs folder.
High-level performance data such as throughput and latency percentiles can be generated with the benchmark suite and visualized in a report.
Additional features:
- Batch Inference: supports batching of incoming inference requests (see the sketch after this list).
- Inference API: supports inference through both gRPC and HTTP/REST.
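TorchServe enables batching through the batch_size and max_batch_delay parameters accepted when a model is registered via the management API. The sketch below re-registers the example model with batching enabled; the parameter values are illustrative, and an already-registered my_tc would first need to be unregistered with a DELETE request to /models/my_tc.

```python
# Sketch: register the model with request batching enabled via the
# management API. batch_size and max_batch_delay values are illustrative.
import requests

params = {
    "url": "SentimentClassification.mar",
    "model_name": "my_tc",
    "batch_size": 4,         # aggregate up to 4 requests into one batch
    "max_batch_delay": 100,  # wait at most 100 ms to fill a batch
    "initial_workers": 1,
}
resp = requests.post("http://127.0.0.1:8081/models", params=params)
print(resp.status_code, resp.text)
```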