OpenVINO™ Model Server includes a C++ implementation of the gRPC and RESTful API interfaces defined by TensorFlow Serving. In the backend it uses the Inference Engine libraries from the OpenVINO™ toolkit, which speed up execution on CPU and enable it on iGPU and Movidius devices.
OpenVINO™ Model Server can be hosted on a bare metal server, a virtual machine, or inside a Docker container. It is also suitable for deployment in a Kubernetes environment.
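As a quick illustration of the TensorFlow Serving compatible REST API, the sketch below sends a prediction request with `curl`. The host, port, model name, and input payload are placeholder assumptions, not values from this guide; adjust them to match your deployment.

```bash
# Minimal sketch of a REST prediction call (TensorFlow Serving compatible API).
# "localhost:8000" and "my_model" are assumptions; replace them with your server
# address, REST port, and model name. The input payload must match the model's input shape.
curl -X POST http://localhost:8000/v1/models/my_model:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[0.0, 1.0, 2.0, 3.0]]}'

# Model status can be checked with a GET request:
curl http://localhost:8000/v1/models/my_model
```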
OpenVINO Model Server execution on bare metal is tested on Ubuntu 20.04.x.
For other operating systems we recommend using OVMS Docker containers.
Check out supported configurations.
Check the VPU Plugins documentation to see if your model is supported, then use the OpenVINO Model Optimizer to convert your model to the OpenVINO format.
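As an example of the conversion step, here is a minimal sketch assuming an ONNX model and the `mo` entry point installed from the `openvino-dev` package; the model file name and output directory are placeholders.

```bash
# Hypothetical conversion of an ONNX model to OpenVINO IR with Model Optimizer.
# "face-detection.onnx" and the output path are placeholders. OVMS expects the IR
# files (.xml/.bin) to be placed in a numbered version subdirectory, e.g. .../1/.
pip install openvino-dev
mo --input_model face-detection.onnx --output_dir /models/face-detection/1
```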
- Clone the model server git repository:
  ```bash
  git clone https://github.com/openvinotoolkit/model_server
  ```
- Navigate to the model server directory:
  ```bash
  cd model_server
  ```
- To install Model Server, you can use a precompiled version or build it on your own inside a Docker container. Build a Docker container with automated steps using the command:
  ```bash
  make docker_build
  ```
-
The
make docker_build
target will also make a copy of the binary package in a dist subfolder in the model server root directory. -
- Navigate to the folder containing the binary package and unpack the included tar.gz file:
  ```bash
  cd dist/ubuntu && tar -xzvf ovms.tar.gz
  ```
- The server can be started from the folder where OVMS was installed; list the available parameters with the command below (a startup example follows this list):
  ```bash
  ./ovms/bin/ovms --help
  ```
- The server can be started in interactive mode, as a background process, or as a daemon initiated by `systemctl`/`initd`, depending on the Linux distribution and specific hosting requirements.
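The sketch below shows a typical startup serving a single model from a local model repository; the model name, path, and port numbers are placeholder assumptions, not values from this guide.

```bash
# Hypothetical startup: serve one model over gRPC (9000) and REST (8000).
# "/models/face-detection" must contain numbered version subdirectories (e.g. 1/),
# and "face-detection" is an arbitrary name that clients will use in their requests.
./ovms/bin/ovms --model_path /models/face-detection \
                --model_name face-detection \
                --port 9000 \
                --rest_port 8000

# To run it as a background process instead, append "&" or use nohup, e.g.:
# nohup ./ovms/bin/ovms --model_path /models/face-detection --model_name face-detection --port 9000 > ovms.log 2>&1 &
```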
Refer to Running Model Server using Docker Container for more details about the `ovms` parameters and configuration.
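For serving multiple models, OVMS can read its model list from a configuration file passed with `--config_path`. A minimal sketch is shown below; the model names and base paths are placeholders.

```bash
# Hypothetical multi-model configuration; model names and base paths are placeholders.
cat > /models/config.json <<'EOF'
{
  "model_config_list": [
    { "config": { "name": "face-detection", "base_path": "/models/face-detection" } },
    { "config": { "name": "resnet",         "base_path": "/models/resnet" } }
  ]
}
EOF

# Start the server with the configuration file instead of a single --model_path.
./ovms/bin/ovms --config_path /models/config.json --port 9000 --rest_port 8000
```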
Note: When AI accelerators are used for inference execution, additional steps might be needed to install their drivers and dependencies. Learn more in the OpenVINO installation guide.