Update KFServing docs (kubeflow#897)
* Fix up kfserving install doc link

* Update quick install for 0.3.0

* Upgrade quick install to use istio 1.6.2

* Add perf test job for sklearn example

* Add KFServing demo gif

* Reorganize examples

* Add feature descriptions

* Add feature table for model serve

* Add alibi references

* Update main README

* Add batcher/gRPC example

* Fix perf job for sklearn example

* separate custom predictor

* Update batching and alibi

* Add roadmap
yuzisun authored Jun 26, 2020
1 parent d722459 commit 19ee0bb
Showing 21 changed files with 248 additions and 120 deletions.
36 changes: 32 additions & 4 deletions README.md
@@ -14,7 +14,7 @@ Knative Serving and Istio should be available on Kubernetes Cluster, Knative dep
- [Istio](https://knative.dev/docs/install/installing-istio): v1.1.6+

If you want to get up and running with Knative quickly, or you do not need a service mesh, we recommend installing Istio without service mesh (sidecar injection).
- [Knative Serving](https://knative.dev/docs/install/knative-with-any-k8s): v0.11.1+
- [Knative Serving](https://knative.dev/docs/install/knative-with-any-k8s): v0.11.2+

Currently only `Knative Serving` is required; `cluster-local-gateway` is additionally required to serve cluster-internal traffic for the transformer and explainer use cases. Please follow the instructions here to install the [cluster local gateway](https://knative.dev/docs/install/installing-istio/#updating-your-install-to-use-cluster-local-gateway)
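
A quick way to confirm these prerequisites before installing KFServing is to list the relevant namespaces; this is just a sanity check, not part of the official install steps.
```bash
# Knative Serving, Istio and the cluster-local-gateway should all be up and running.
kubectl get pods -n knative-serving
kubectl get pods -n istio-system
kubectl get svc cluster-local-gateway -n istio-system
```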

@@ -55,7 +55,6 @@ If you are using Kubeflow dashboard or [profile controller](https://www.kubeflow

Make sure you have
[kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-on-linux),
[kustomize v3.5.4+](https://github.com/kubernetes-sigs/kustomize/blob/master/docs/INSTALL.md),
[helm 3](https://helm.sh/docs/intro/install) installed before you start. (2 minutes for setup)
1) If you do not have an existing Kubernetes cluster, you can create a quick local Kubernetes cluster with [kind](https://github.com/kubernetes-sigs/kind#installation-and-usage). (This takes about 30s.)
```bash
@@ -65,6 +64,16 @@ kind create cluster
```bash
./hack/quick_install.sh
```
#### Ingress Setup and Monitoring Stack
- [Configure Custom Ingress Gateway](https://knative.dev/docs/serving/setting-up-custom-ingress-gateway/)
  - In addition, you need to update the [KFServing configmap](config/default/configmap/inferenceservice.yaml) to use the custom ingress gateway (see the sketch after this list).
- [Configure HTTPS Connection](https://knative.dev/docs/serving/using-a-tls-cert/)
- [Configure Custom Domain](https://knative.dev/docs/serving/using-a-custom-domain/)
- [Metrics](https://knative.dev/docs/serving/accessing-metrics/)
- [Tracing](https://knative.dev/docs/serving/accessing-traces/)
- [Logging](https://knative.dev/docs/serving/accessing-logs/)
- [Dashboard for ServiceMesh](https://istio.io/latest/docs/tasks/observability/kiali/)
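
For the custom ingress gateway, a minimal sketch, assuming the default configmap name and install namespace:
```bash
# Edit the KFServing configmap (name/namespace assumed from the default install).
kubectl edit configmap inferenceservice-config -n kfserving-system
# Update the ingress/gateway settings to reference your custom gateway,
# e.g. "<your-gateway-namespace>/<your-gateway-name>" instead of the Knative default.
```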

### Test KFServing Installation

1) To check if KFServing Controller is installed correctly, please run the following command
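
The exact command is collapsed in this diff view; a typical check, stated here as an assumption, looks like:
```bash
kubectl get pods -n kfserving-system
# Expect kfserving-controller-manager-0 to be Running with 2/2 containers ready.
```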
@@ -94,6 +103,21 @@ kubectl port-forward --namespace istio-system $(kubectl get pod --namespace isti
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kfserving-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v1/models/sklearn-iris:predict -d @./docs/samples/sklearn/iris-input.json
```
5) Run Performance Test
```bash
kubectl create -f docs/samples/sklearn/perf.test
# wait for the job to finish, then check the log
kubectl logs load-test8b58n-rgfxr
Requests [total, rate, throughput] 30000, 500.02, 499.99
Duration [total, attack, wait] 1m0s, 59.998s, 3.336ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.743ms, 2.748ms, 2.494ms, 3.363ms, 4.091ms, 7.749ms, 46.354ms
Bytes In [total, mean] 690000, 23.00
Bytes Out [total, mean] 2460000, 82.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

### Use KFServing SDK
* Install the SDK
```
@@ -103,8 +127,11 @@ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v1/models/sklearn-i

* Follow the [example here](docs/samples/client/kfserving_sdk_sample.ipynb) to use the KFServing SDK to create, rollout, promote, and delete an InferenceService instance.

### KFServing Examples
[KFServing examples](./docs/samples/README.md)
### KFServing Features and Examples
[KFServing Features and Examples](./docs/samples/README.md)

### KFServing Roadmap
[KFServing Roadmap](./ROADMAP.md)

### KFServing Concepts and Data Plane
[KFServing Concepts and Data Plane](./docs/README.md)
@@ -123,3 +150,4 @@ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v1/models/sklearn-i

### Contributor Guide
[Contributor Guide](./CONTRIBUTING.md)

152 changes: 103 additions & 49 deletions docs/samples/README.md
@@ -1,61 +1,115 @@
## KFServing Examples

### Deploy KFServing InferenceService with out of the box Predictor
[SKLearn Model](./sklearn)

[PyTorch Model](./pytorch)

[Tensorflow Model](./tensorflow)

[XGBoost Model](./xgboost)

[ONNX Model with ONNX Runtime](./onnx)

[Simple String Model with NVIDIA Triton Inference Server](./triton/simple_string)

[Serve BERT Model with NVIDIA Triton Inference Server](./triton/bert)

### Deploy KFServing InferenceService with a custom Predictor

[Hello World Flask Server](./custom/hello-world)

[KFServing Custom Model](./custom/kfserving-custom-model)

[Prebuilt Image](./custom/prebuilt-image)

[BentoML](./bentoml)

### Deploy KFServing InferenceService with Transformer
[Image Transformer with PyTorch Predictor](./transformer/image_transformer)

### Deploy KFServing InferenceService with Explainer
[Alibi Image Explainer](./explanation/alibi/imagenet)

[Alibi Text Explainer](./explanation/alibi/moviesentiment)

[Alibi Tabular Explainer](./explanation/alibi/income)

### Deploy KFServing InferenceService with Cloud or PVC storage

[Models on S3](./s3)

[Models on PVC](./pvc)

[Models on Azure](./azure)

### Deploy KFServing InferenceService with Autoscaling, Canary Rollout and Other Integrations
## KFServing Features and Examples

### Deploy InferenceService with Predictor
KFServing provides a simple Kubernetes CRD for deploying trained models onto model servers such as [TFServing](https://www.tensorflow.org/tfx/guide/serving),
[ONNXRuntime](https://github.com/microsoft/onnxruntime), [Triton Inference Server](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs), and
[KFServer](https://github.com/kubeflow/kfserving/tree/master/python/kfserving). These model servers expose a standardized API for both REST and gRPC. For more complex use cases you can also build your own model server:
KFServing provides basic API primitives that make it easy to build a custom model server, and you can use tools such as [BentoML](https://docs.bentoml.org/en/latest) to build your custom model serving image.
After models are deployed onto model servers with KFServing, you get all of the following serverless features (a minimal InferenceService is sketched after this list):
- Scale to and from Zero
- Request based Autoscaling on CPU/GPU
- Revision Management
- Optimized Container
- Batching and Logger
- Traffic management
- Security with AuthN/AuthZ
- Distributed Tracing
- Out-of-the-box metrics
- Ingress/Egress control
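
As a minimal sketch of the CRD described above, using the sklearn iris sample model referenced elsewhere in these docs:
```bash
kubectl apply -f - <<EOF
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: kfserving-test
spec:
  default:
    predictor:
      sklearn:
        storageUri: gs://kfserving-samples/models/sklearn/iris
EOF
```
KFServing then provisions the model server, routing and autoscaling for you.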

| Out-of-the-box Predictor | Exported model| HTTP | gRPC | Examples |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| Deploy SKLearn Model on KFServer | pickled model(model.pkl, model.joblib) | :heavy_check_mark: | V2 |[SKLearn Iris](./sklearn) |
| Deploy XGBoost Model on KFServer | pickled model(model.bst) | :heavy_check_mark: | V2 |[XGBoost Iris](./xgboost) |
| Deploy Pytorch Model on KFServer | [torch.save model(model.pt)](https://pytorch.org/docs/master/generated/torch.save.html) | :heavy_check_mark: | V2 | [PyTorch Cifar10](./pytorch) |
| Deploy Tensorflow Model on TFServing | [Tensorflow SavedModel](https://www.tensorflow.org/guide/saved_model) | :heavy_check_mark: | :heavy_check_mark: | [Tensorflow Flowers](./tensorflow) |
| Deploy ONNX Model on ONNXRuntime | [Exported onnx model(model.onnx)](https://github.com/onnx/tutorials#converting-to-onnx-format) | :heavy_check_mark: | :heavy_check_mark: |[ONNX Style Model](./onnx) |
| Deploy Model on Triton Server | [Tensorflow,PyTorch,ONNX,TensorRT](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html)| :heavy_check_mark: | :heavy_check_mark: | [Simple String](./triton/simple_string) |

| Custom Predictor | Examples |
| ------------- | ------------- |
| Deploy model on custom KFServer | [Custom KFServer](./custom/kfserving-custom-model)|
| Deploy model on BentoML | [SKLearn Iris with BentoML](./bentoml)|
| Deploy model on custom HTTP Server | [Prebuilt model server](./custom/prebuilt-image)|
| Deploy model on custom gRPC Server | [Prebuilt gRPC server](./custom/grpc-server)|

In addition to deploying an InferenceService with an HTTP/gRPC endpoint, you can also deploy an InferenceService with [Knative Event Sources](https://knative.dev/docs/eventing/sources/index.html) such as Kafka;
you can find an example [here](./kafka) that shows how to build an async inference pipeline.

### Deploy InferenceService with Transformer
The KFServing transformer enables users to define pre/post-processing steps around the prediction and explanation workflow.
The transformer runs as a separate microservice and can work with any type of pre-packaged model server; it can also
scale independently from the predictor, for example when the transformer is CPU bound while the predictor runs on GPU.
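
A minimal sketch of wiring a transformer in front of a predictor; the container image and storage URI below are hypothetical placeholders:
```bash
kubectl apply -f - <<EOF
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: transformer-example
spec:
  default:
    transformer:
      custom:
        container:
          image: <your-registry>/image-transformer:latest  # hypothetical transformer image
    predictor:
      pytorch:
        storageUri: gs://<your-bucket>/cifar10  # placeholder model location
EOF
```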

| Features | Examples |
| ------------- | ------------- |
| Deploy Transformer with KFServer | [Image Transformer with PyTorch KFServer](./transformer/image_transformer) |
| Deploy Transformer with Triton Server | [BERT Model with tokenizer](./triton/bert) |

### Deploy InferenceService with Explainer
Model explainability answers the question "Why did my model make this prediction?" for a given instance. KFServing
integrates with [Alibi Explainer](https://github.com/SeldonIO/alibi), which implements a black-box algorithm that generates many similar-looking instances
for a given input and sends them to the model server to produce an explanation.
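
A minimal sketch of attaching an Alibi explainer to a predictor; the storage URIs are placeholders:
```bash
kubectl apply -f - <<EOF
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: income-explainer-example
spec:
  default:
    predictor:
      sklearn:
        storageUri: gs://<your-bucket>/income/model      # placeholder
    explainer:
      alibi:
        type: AnchorTabular
        storageUri: gs://<your-bucket>/income/explainer  # placeholder
EOF
```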


| Features | Examples |
| ------------- | ------------- |
| Deploy Alibi Image Explainer| [Imagenet Explainer](./explanation/alibi/imagenet) |
| Deploy Alibi Income Explainer| [Income Explainer](./explanation/alibi/income) |
| Deploy Alibi Text Explainer| [Alibi Text Explainer](./explanation/alibi/moviesentiment) |

### Deploy InferenceService with Outlier/Drift Detector
In order to trust and reliably act on model predictions, it is crucial to monitor the distribution of the incoming
requests with various types of detectors. KFServing integrates [Alibi Detect](https://github.com/SeldonIO/alibi-detect) with the following components:
- The drift detector checks whether the distribution of incoming requests is diverging from a reference distribution, such as that of the training data.
- The outlier detector flags single instances that do not follow the training distribution.

| Features | Examples |
| ------------- | ------------- |
| Deploy Alibi Outlier Detection| [Cifar outlier detector](./outlier-detection/alibi-detect/cifar10) |
| Deploy Alibi Drift Detection| [Cifar drift detector](./drift-detection/alibi-detect/cifar10) |

### Deploy InferenceService with Cloud/PVC storage
| Feature | Examples |
| ------------- | ------------- |
| Deploy Model on S3| [Mnist model on S3](./s3) |
| Deploy Model on PVC| [Models on PVC](./pvc) |
| Deploy Model on Azure| [Models on Azure](./azure) |
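
For S3, credentials are typically attached through a secret and a service account; a sketch follows, with the annotation and data keys assumed from the ./s3 sample:
```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  annotations:
    serving.kubeflow.org/s3-endpoint: s3.amazonaws.com   # assumed annotation key
    serving.kubeflow.org/s3-usehttps: "1"                # assumed annotation key
type: Opaque
data:
  awsAccessKeyID: <base64-access-key>        # assumed data key
  awsSecretAccessKey: <base64-secret-key>    # assumed data key
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-sa
secrets:
- name: s3-credentials
EOF
# Reference the service account from the InferenceService predictor via
# serviceAccountName: s3-sa together with storageUri: s3://<your-bucket>/<model-path>.
```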

### Autoscaling
KFServing's main serverless capability is letting you run an inference workload without manually scaling your service once it is deployed. KFServing leverages Knative's [autoscaler](https://knative.dev/docs/serving/configuring-autoscaling/);
the autoscaler works for GPU-backed services as well, since it is based on request volume rather than GPU/CPU metrics, which can be hard
to reason about.

[Autoscale inference workload on CPU/GPU](./autoscaling)

[InferenceService on GPU nodes](./accelerators)
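
A sketch of tuning the scaling target via the Knative concurrency annotation, assuming the annotation is propagated to the underlying Knative revision as in the autoscaling sample:
```bash
kubectl apply -f - <<EOF
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: flowers-sample
  annotations:
    autoscaling.knative.dev/target: "10"   # soft target of 10 in-flight requests per replica
spec:
  default:
    predictor:
      tensorflow:
        storageUri: gs://kfserving-samples/models/tensorflow/flowers
EOF
```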

### Canary Rollout
Canary deployment enables rolling out releases by splitting traffic between different versions to ensure a safe rollout.

[Canary Rollout](./rollouts)
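
A minimal sketch of the canary spec, splitting 10% of traffic to the canary model; the storage URIs are placeholders:
```bash
kubectl apply -f - <<EOF
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: my-model
spec:
  canaryTrafficPercent: 10
  default:
    predictor:
      tensorflow:
        storageUri: gs://<your-bucket>/model-v1   # placeholder
  canary:
    predictor:
      tensorflow:
        storageUri: gs://<your-bucket>/model-v2   # placeholder
EOF
```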

### Kubeflow Pipeline Integration
[InferenceService with Kubeflow Pipeline](./pipelines)

[InferenceService with Request/Response Logger](./logger/basic)
### Request Batching(Alpha)
Batching individual inference requests can be important because most ML/DL frameworks are optimized for batch requests.
When services receive a heavy load of requests, it is advantageous to batch them: this maximizes utilization of
CPU/GPU compute resources, but you need to run enough tests to find the optimal batch size and analyze
your traffic patterns before enabling batch inference. KFServing injects a batcher sidecar, so batching works with any model server
deployed on KFServing; you can read more in this [example](./batcher).
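
A sketch of enabling the batcher on a predictor; the field names are assumed from the batcher example and may change while the feature is alpha:
```bash
kubectl apply -f - <<EOF
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: batched-example
spec:
  default:
    predictor:
      batcher:
        maxBatchSize: 32    # assumed field: largest batch the batcher will form
        maxLatency: 5000    # assumed field: max milliseconds to wait while batching
      sklearn:
        storageUri: gs://kfserving-samples/models/sklearn/iris
EOF
```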

[InferenceService with Kafka Event Source](./kafka)
### Request/Response Logger
KFServing supports logging your inference requests and responses by injecting a sidecar alongside your model server.
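
A sketch of the logger spec, assuming a message-dumper style sink as in the logger examples below:
```bash
kubectl apply -f - <<EOF
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: logged-example
spec:
  default:
    predictor:
      logger:
        url: http://message-dumper.default/   # hypothetical sink service
        mode: all                             # log both requests and responses
      sklearn:
        storageUri: gs://kfserving-samples/models/sklearn/iris
EOF
```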

| Feature | Examples |
| ------------- | ------------- |
| Deploy Logger with a Logger Service| [Message Dumper Service](./logger/basic) |
| Deploy Async Logger| [Message Dumper Using Knative Eventing](./logger/knative-eventing) |


### Deploy InferenceService behind an Authentication Proxy with Kubeflow
[InferenceService on Kubeflow with Istio-Dex](./istio-dex)
### Deploy KFServing InferenceService behind an Authentication Proxy

[InferenceService behind GCP Identity Aware Proxy (IAP) ](./gcp-iap)
2 changes: 1 addition & 1 deletion docs/samples/autoscaling/README.md
@@ -22,7 +22,7 @@

# Autoscale InferenceService with your inference workload
## Setup
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.
3. [Metrics installation](https://knative.dev/docs/serving/installing-logging-metrics-traces) for viewing scaling graphs (optional).
4. The [hey](https://github.com/rakyll/hey) load generator installed (go get -u github.com/rakyll/hey).
2 changes: 1 addition & 1 deletion docs/samples/bentoml/README.md
@@ -21,7 +21,7 @@ workflow, with DevOps best practices baked in.

Before starting this guide, make sure you have the following:

* Your ~/.kube/config should point to a cluster with KFServing installed.
* Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
* Your cluster's Istio Ingress gateway must be network accessible.
* Docker and Docker hub must be properly configured on your local system
* Python 3.6 or above
2 changes: 1 addition & 1 deletion docs/samples/custom-domain/README.md
@@ -2,7 +2,7 @@

## Setup

1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.
3. You have a custom domain configured to route incoming traffic either to the Cloud provided Kubernetes Ingress gateway or the istio-ingressgateway's IP address / Load Balancer.

2 changes: 1 addition & 1 deletion docs/samples/custom/hello-world/README.md
@@ -2,7 +2,7 @@

## Setup

1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.

## Build and push the sample Docker Image
2 changes: 1 addition & 1 deletion docs/samples/custom/kfserving-custom-model/README.md
@@ -30,7 +30,7 @@ Follow the instructions in the notebook to deploy the InferenceService with the

### Setup

1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.

### Build and push the sample Docker Image
2 changes: 1 addition & 1 deletion docs/samples/custom/prebuilt-image/README.md
@@ -2,7 +2,7 @@

## Setup

1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.

## Create the InferenceService
2 changes: 1 addition & 1 deletion docs/samples/gcp-iap/README.md
@@ -2,7 +2,7 @@
When using Kubeflow with GCP it is common to use a [GCP Identity Aware Proxy](https://cloud.google.com/iap) (IAP) to manage client authentication to the KFServing endpoints. The proxy intercepts and authenticates users and passes identity assertion (JWT) to kubernetes service/pods. Whilst it is also possible to add access control (i.e. programmable or service mesh authorization), this is not described here.

### Prerequisites
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving) and have applied the [knative istio probe fix](https://github.com/kubeflow/manifests/commit/928cf483361730121ac18bc4d0e7a9c129f15ee2) (see below).
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving) and have applied the [knative istio probe fix](https://github.com/kubeflow/manifests/commit/928cf483361730121ac18bc4d0e7a9c129f15ee2) (see below).
2. Your gcloud config is initialised to the project containing the k8s cluster and has a service-account that can download IAP key file.
3. You are using Knative serving v0.11.2 or v0.14.0+
4. You are using a recent version of KFServing (v0.3+)
2 changes: 1 addition & 1 deletion docs/samples/kafka/README.md
@@ -1,7 +1,7 @@

# End to end inference example with Minio and Kafka
## Setup
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.
3. Install Minio with following Minio deploy step.
4. Use existing Kafka cluster or install Kafka on your cluster with [Confluent helm chart](https://www.confluent.io/blog/getting-started-apache-kafka-kubernetes/).
2 changes: 1 addition & 1 deletion docs/samples/onnx/README.md
@@ -1,7 +1,7 @@

# Predict on a InferenceService using ONNX
## Setup
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.

## Create the InferenceService
2 changes: 1 addition & 1 deletion docs/samples/pytorch/README.md
@@ -61,7 +61,7 @@ print(res.text)
# Predict on a InferenceService using PyTorch

## Setup
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.

## Create the InferenceService
2 changes: 1 addition & 1 deletion docs/samples/rollouts/README.md
@@ -2,7 +2,7 @@
To test a canary rollout, you can use the canary.yaml, which declares a canary model that is set to receive 10% of requests.

## Setup
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.

## Create the InferenceService
