Openshift article on using s2i in seldon-core #140

Merged: 1 commit, Apr 11, 2018
Binary file added docs/articles/graphs.png
171 changes: 171 additions & 0 deletions docs/articles/openshift_s2i.md
# Using OpenShift Source-to-Image to facilitate Machine Learning Deployment

[Seldon-Core](https://github.com/SeldonIO/seldon-core) is an open source project that provides scalable machine learning deployment running on [Kubernetes](https://kubernetes.io/). One of Seldon-Core's goals is to allow data scientists to continue to construct their training and inference components using any of the many available machine learning toolkits, whether Python based (e.g., TensorFlow, sklearn), R or Java (e.g., Spark, H2O), amongst many popular options. Seldon-Core then allows them to easily package and run their runtime prediction modules on Kubernetes. To achieve this goal we need to make it easy for data scientists to take their source code and package it as a Docker container in the correct form so that it can be managed as part of a runtime microservice graph on Kubernetes by Seldon-Core. For this we utilize OpenShift's Source-to-Image open source library, which allows any code to be packaged in the correct format with minimal requirements on the data scientist.

# Seldon-Core Overview
Seldon-Core provides scalable machine learning deployments running on Kubernetes. To deploy their models, data scientists follow the steps shown below:

![API](../deploy.png)

1. Package the runtime model as a Docker image
1. Describe the runtime graph as a Kubernetes resource
1. Deploy to Kubernetes using standard tools such as kubectl, Helm or ksonnet

Once running, the deployment can be updated as new image releases are created for the runtime model and as the runtime graph itself is updated.
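As an illustration of step 2, a single-model runtime graph might be described by a resource along these lines, sketched here as a Python dict (the field names follow the SeldonDeployment custom resource but are a sketch and may differ between seldon-core versions):

```python
import json

# Sketch of a SeldonDeployment resource describing a one-node runtime graph.
# Field names are illustrative and may differ between seldon-core versions.
deployment = {
    "apiVersion": "machinelearning.seldon.io/v1alpha2",
    "kind": "SeldonDeployment",
    "metadata": {"name": "mymodel"},
    "spec": {
        "predictors": [
            {
                "name": "default",
                "replicas": 1,
                "graph": {
                    "name": "classifier",
                    "type": "MODEL",  # or ROUTER, COMBINER, TRANSFORMER
                    "endpoint": {"type": "REST"},
                },
            }
        ]
    },
}

# Serialised to JSON (or YAML), this is what kubectl would apply.
manifest = json.dumps(deployment, indent=2)
```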

The components of the runtime graph can be of various types. The most typical is a model, which provides predictions given some input features. Typically, the data scientist will have trained a model and saved its parameters for use by a runtime component that will provide new predictions at runtime. However, Seldon-Core allows a range of components to be created that can be joined together as building blocks to create more complex runtime graphs, as shown below:

![graphs](graphs.png)

The types of component you can create include:

* Models - e.g., TensorFlow, sklearn models
* Routers - e.g., A-B Tests, Multi-Armed Bandits
* Combiners - e.g., Model ensemblers
* Transformers - e.g., Feature normalization, Outlier detection, concept drift

As the above diagram shows, these need to be fitted into the microservice API of seldon-core as either REST or gRPC services.

# Source-to-Image integration
To integrate a component into seldon-core the data scientist needs to accomplish two things:

1. Create a Docker image from their source code
1. Wrap their component as a service that exposes REST or gRPC endpoints following the seldon-core microservice API.

![wrap](wrap.png)

To accomplish this we use Openshift's [source-to-image (s2i)](https://github.com/openshift/source-to-image) open source tool. S2i allows data scientists to wrap their code using a single command line call that can easily be embedded into a continuous integration pipeline. Seldon provides s2i builder images that contain middleware code to wrap the data scientist's component within a REST or gRPC server that respects the seldon-core microservice API. All that is needed is for the data scientist to follow a few conventions when creating their component in various languages as will be illustrated below.

## Python
There are many popular machine learning libraries in Python, including TensorFlow, Keras, sklearn, PyTorch and statsmodels amongst many others. To use the Seldon-Core s2i builder image to package a Python model, the data scientist simply needs to provide:

* A Python file with a class that runs your model
* An optional requirements.txt or setup.py
* .s2i/environment - model definitions used by the s2i builder to correctly wrap your model

The data scientist's source code should contain a Python file which defines a class of the same name as the file. For example:

```python
class MyModel(object):
    """
    Model template. You can load your model parameters in __init__ from a location accessible at runtime.
    """

    def __init__(self):
        """
        Add any initialization parameters. These will be passed at runtime from the graph definition parameters defined in your seldondeployment Kubernetes resource manifest.
        """
        print("Initializing")

    def predict(self, X, feature_names):
        """
        Return a prediction.

        Parameters
        ----------
        X : array-like
        feature_names : array of feature names (optional)
        """
        print("Predict called - will run identity function")
        return X
```

* The file is called MyModel.py and it defines a class MyModel
* The class contains a predict method that takes a numpy array X and feature_names and returns an array of predictions.
* Any required initialization can be put inside the class __init__ method.
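Conceptually, the middleware the builder image adds does something like the following with such a class (a simplified sketch, not the actual seldon-core wrapper code):

```python
class MyModel(object):
    """Same contract as the template above: predict maps features to predictions."""

    def predict(self, X, feature_names):
        # Identity model, as in the template above.
        return X

# What the REST/gRPC wrapper does in essence: instantiate the user's
# class once at startup, then call predict for each incoming request.
model = MyModel()
X = [[1.0, 2.0], [3.0, 4.0]]
predictions = model.predict(X, ["f0", "f1"])
```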

An optional requirements.txt can detail any software dependencies the code requires.

To allow the s2i builder image to correctly package the component, the data scientist needs to provide a few environment variables, either in an .s2i/environment file in the source code folder or on the command line. An example is:

```bash
MODEL_NAME=MyModel
API_TYPE=REST
SERVICE_TYPE=MODEL
```
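Purely as an illustration, the wrapper can be thought of as reading these variables at startup to decide what to load and how to serve it (this is a sketch, not the actual builder code):

```python
import os

# Illustrative only: MODEL_NAME names the file/class to load, API_TYPE
# chooses REST or GRPC, and SERVICE_TYPE the component kind
# (MODEL, ROUTER, COMBINER, TRANSFORMER).
os.environ.setdefault("MODEL_NAME", "MyModel")
os.environ.setdefault("API_TYPE", "REST")
os.environ.setdefault("SERVICE_TYPE", "MODEL")

config = {key: os.environ[key] for key in ("MODEL_NAME", "API_TYPE", "SERVICE_TYPE")}
```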

Finally, we use ```s2i build``` to create the Docker image from the source code. Examples for Python 2 code are:

```bash
s2i build <git-repo> seldonio/seldon-core-s2i-python2 <my-image-name>
s2i build <src-folder> seldonio/seldon-core-s2i-python2 <my-image-name>
```
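Once the built image is running, the wrapped model serves the seldon-core microservice API; for a REST model, a request and response look roughly like the following sketch (the exact message fields are defined by the seldon-core API docs):

```python
import json

# Request body a client would POST to the wrapped model's predict endpoint:
# a message carrying the feature array as an ndarray payload plus the
# feature names.
request = {"data": {"names": ["f0", "f1"], "ndarray": [[1.0, 2.0]]}}

# The identity template model above simply echoes the payload back.
response = {"data": {"names": ["f0", "f1"], "ndarray": [[1.0, 2.0]]}}

# Both sides travel as JSON over HTTP.
wire = json.dumps(request)
```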

## R
R is a popular statistical language which provides many machine learning related packages.

To use the seldon s2i builder image to package an R model the requirements are:

* An R file which provides an S3 class for your model via an ```initialise_seldon``` function and that has appropriate generics for the component, e.g. predict for a model.
* An optional install.R to be run to install any libraries needed
* .s2i/environment - model definitions used by the s2i builder to correctly wrap your model

The data scientist's source code should contain an R file which defines an S3 class for their model. For example,

```R
library(methods)

predict.mymodel <- function(mymodel, newdata=list()) {
  write("MyModel predict called", stdout())
  newdata
}

new_mymodel <- function() {
  structure(list(), class = "mymodel")
}

initialise_seldon <- function(params) {
  new_mymodel()
}
```

The above contains:

* An ```initialise_seldon``` function that creates an S3 class for the model via a constructor ```new_mymodel```. This will be called on startup, and you can run any configuration the model needs here.
* A generic ```predict``` function created for the model class. This will be called with a ```newdata``` field holding the ```data.frame``` to be predicted.

An ```install.R``` can specify any software dependencies required. For example:

```R
install.packages('rpart')
```

Finally, as with all cases the builder image needs a few environment variables to be set to correctly package the R model. An example is:

```bash
MODEL_NAME=MyModel
API_TYPE=REST
SERVICE_TYPE=MODEL
```

These values can also be provided in an .s2i/environment file with the source code or overridden on the command line when building the image.

Once these steps are done we can use ```s2i build``` to create the Docker image from the source code.

```bash
s2i build <git-repo> seldonio/seldon-core-s2i-r <my-image-name>
s2i build <src-folder> seldonio/seldon-core-s2i-r <my-image-name>
```

An example invocation using the test template model inside seldon-core:

```bash
s2i build https://github.com/seldonio/seldon-core.git --context-dir=wrappers/s2i/R/test/model-template-app seldonio/seldon-core-s2i-r seldon-core-template-model
```

## Java
There are several popular machine learning libraries in Java, including Spark, H2O and DL4J. Seldon-Core also provides builder images for Java. To accomplish this we provide a Java library, seldon-core-wrappers, that can be included in a Maven Spring project to allow a Java component to be easily wrapped.

To use the Seldon-Core s2i builder image to package a Java model the data scientist will need:

* A Maven project that depends on the ```io.seldon.wrapper``` library
* A Spring Boot configuration class
* A class that implements ```io.seldon.wrapper.SeldonPredictionService``` for the type of component you are creating
* An optional .s2i/environment - model definitions used by the s2i builder to correctly wrap your model

More details can be found in the seldon-core docs.

# Summary

By utilizing OpenShift's source-to-image tool, data scientists can easily build Docker images for their runtime components to be deployed at scale using seldon-core. This allows data science teams to use the best machine learning tool for the task and deploy the resulting model in a consistent manner. The seldon-core project is working on providing full OpenShift integration in the near future so that enterprise customers can easily utilize machine learning models within their organisation.

Binary file added docs/articles/wrap.png