OpenVINO Model Server 2021.1
This is a major release of OpenVINO Model Server: the serving component has been completely rewritten. Upgrading from the Python-based version (2020.4) to the C++ implementation (2021.1) should be mostly transparent. No changes are required on the client side and the exposed API is unchanged, but some configuration settings and deployment methods might need slight adjustments.
Key New Features and Enhancements
- Much higher scalability in a single service instance. You can now utilize the full capacity of the available hardware: expect linear scaling as you add resources, with no bottleneck on the frontend.
- Lower latency between the client and the server. This is especially noticeable with high performance accelerators or CPUs.
- Reduced footprint. By switching to C++ and reducing dependencies, the Docker image shrinks to ~400MB (for CPU, NCS and HDDL support) and ~800MB (for the image that also includes iGPU support).
- Reduced RAM usage. Thanks to the reduced number of external software dependencies, OpenVINO Model Server allocates less memory on startup.
- Easier deployment on bare-metal or inside a Docker container.
- Support for online model updates. The server monitors the configuration file for changes and reloads models as needed without restarting the service (see the example configuration after this list).
- Model ensemble (preview). Connect multiple models to deploy complex processing solutions and reduce the overhead of sending data back and forth.
- Azure Blob Storage support. From now on you can host your models in Azure Blob Storage containers.
- Updated Helm chart for easy deployment in Kubernetes.
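For illustration, a minimal configuration file that the server can monitor for online updates might look like the following sketch (model names and paths are hypothetical):
{
   "model_config_list": [
      {
         "config": {
            "name": "face_detection",
            "base_path": "/models/face_detection"
         }
      },
      {
         "config": {
            "name": "my_model",
            "base_path": "/models/my_model"
         }
      }
   ]
}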
Changes in version 2021.1
Moving from 2020.4 to 2021.1 introduces a few changes and optimizations which primarily impact the server deployment and configuration process. These changes are documented below.
- Docker Container Entrypoint
To simplify deployment with containers, a Docker image entrypoint was added. Container startup now requires only the parameters specific to the Model Server executable:
Old command:
docker run -d -v $(pwd)/model:/models/my_model/ -e LOG_LEVEL=DEBUG -p 9000:9000 openvino/model_server /ie-serving-py/start_server.sh ie_serving model --model_path /models/my_model --model_name my_model --port 9000 --shape auto
New command:
docker run -d -v $(pwd)/model:/models/my_model/ -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000 --shape auto --log_level DEBUG
- Simplified Command Line Parameters
Subcommands model and config are no longer used. Single-model or multi-model serving mode is determined by whether --model_name or --config_path is defined. The two parameters are mutually exclusive (see the example invocations below).
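For illustration, the two modes can be started as follows (model names and the configuration file path are hypothetical):
Single-model mode:
docker run -d -v $(pwd)/model:/models/my_model/ -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000
Multi-model mode:
docker run -d -v $(pwd)/models:/models/ -v $(pwd)/config.json:/opt/config.json -p 9000:9000 openvino/model_server --config_path /opt/config.json --port 9000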
- Changed default THROUGHPUT_STREAMS settings for the CPU and GPU device plugins
In the Python implementation, the default configuration was optimized for minimal latency with a single stream of inference requests. In version 2021.1, the default values for the server concurrency settings CPU_THROUGHPUT_STREAMS and GPU_THROUGHPUT_STREAMS are calculated automatically based on the available resources. This ensures both low latency and efficient parallel processing. If you need to serve models for only a single client on a high performance system, set the parameter like below:
--plugin_config '{"CPU_THROUGHPUT_STREAMS":"1"}'
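For example, the setting can be passed when starting the container (the volume and model names here are illustrative):
docker run -d -v $(pwd)/model:/models/my_model/ -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000 --plugin_config '{"CPU_THROUGHPUT_STREAMS":"1"}'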
- Log Level and Log File Path
Instead of the environment variables LOG_LEVEL and LOG_PATH, the log level and log file path are now set with command line parameters to simplify configuration.
--log_level DEBUG/INFO(default)/ERROR
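For example, to log at DEBUG level to a file (assuming --log_path as the file parameter and that the path is writable inside the container):
--log_level DEBUG --log_path /var/log/ovms.log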
- grpc_workers Parameter Meaning
In the Python implementation (2020.4 and below) this parameter defined the number of frontend threads. In the C++ implementation (2021.1 and above) it defines the number of internal gRPC server objects to increase the maximum bandwidth capacity. The default value of 1 should be sufficient for most scenarios. Consider tuning it if you expect very high load from multiple parallel clients.
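For example, for many concurrent clients (the value is illustrative):
--grpc_workers 4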
- Model Data Type Conversion
In the Python implementation (2020.4 and below), input tensors with a data type different from the one expected by the model were automatically converted to the required data type. In some cases this conversion impacted the overall performance of the inference request. In version 2021.1, the input data type must match the model input data type. The client receives an error indicating incorrect input data precision, which gives immediate feedback to correct the format.
- Proxy Settings
The no_proxy environment variable is not used with cloud storage for models. The http_proxy and https_proxy settings are common for all remote models deployed in OpenVINO Model Server. If you need to deploy both models stored behind a proxy and models accessed directly, run two instances of the model server.
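For example, the proxy settings can be passed to the container as environment variables (the proxy address and remote model path are illustrative):
docker run -d -e http_proxy=http://proxy.example.com:8080 -e https_proxy=http://proxy.example.com:8080 -p 9000:9000 openvino/model_server --model_path gs://my-bucket/my_model --model_name my_model --port 9000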
Refer to the troubleshooting guide to learn about known issues and workarounds.
- Default Docker security context
By default, the OpenVINO Model Server process starts inside the Docker container in the context of the ovms account with uid 5000. In previous versions it ran in the root context. This change enforces the best practice of minimal required permissions. If you need to change the security context, use the --user flag in the docker run command.
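For example, to run the container under a different uid (the value is illustrative):
docker run -d --user 1000 -v $(pwd)/model:/models/my_model/ -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000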
Note: Git history of the C++ development is stored on the main branch (the new default). The Python implementation history is preserved on the master branch.
You can use an OpenVINO Model Server public Docker image based on CentOS* via one of the following commands:
docker pull openvino/model_server:2021.1
or
docker pull openvino/model_server:2021.1-gpu