A demonstration of the autoscaling capabilities of a Knative Serving Revision.
- A Kubernetes cluster with Knative Serving installed.
- A metrics installation for viewing scaling graphs (optional).
- Install Docker.
- Check out the code:
go get -d github.com/knative/docs/serving/samples/autoscale-go
Build the application container and publish it to a container registry:
- Move into the docs repository root (the paths in the following commands are relative to it):
cd $GOPATH/src/github.com/knative/docs
- Set your preferred container registry:
export REPO="gcr.io/<YOUR_PROJECT_ID>"
- This example shows how to use Google Container Registry (GCR). You will need a Google Cloud Project and to enable the Google Container Registry API.
- Use Docker to build your application container:
docker build \
  --tag "${REPO}/serving/samples/autoscale-go" \
  --file=serving/samples/autoscale-go/Dockerfile .
- Push your container to a container registry:
docker push "${REPO}/serving/samples/autoscale-go"
- Replace the image reference in service.yaml with your published image:
perl -pi -e \
  "s@github.com/knative/docs/serving/samples/autoscale-go@${REPO}/serving/samples/autoscale-go@g" \
  serving/samples/autoscale-go/service.yaml
- Deploy the Knative Serving sample:
kubectl apply --filename serving/samples/autoscale-go/service.yaml
- Find the ingress IP address and export it as an environment variable:
export IP_ADDRESS=`kubectl get svc knative-ingressgateway --namespace istio-system --output jsonpath="{.status.loadBalancer.ingress[*].ip}"`
- Make a request to the autoscale app to see it consume some resources:
curl --header "Host: autoscale-go.default.example.com" "http://${IP_ADDRESS?}?sleep=100&prime=10000&bloat=5"
Allocated 5 Mb of memory.
The largest prime less than 10000 is 9973.
Slept for 100.13 milliseconds.
- Ramp up traffic to maintain 300 in-flight requests:
go run serving/samples/autoscale-go/test/test.go -sleep 100 -prime 10000 -bloat 5 -qps 9999 -concurrency 300
REQUEST STATS:
Total: 439 Inflight: 299 Done: 439 Success Rate: 100.00% Avg Latency: 0.4655 sec
Total: 1151 Inflight: 245 Done: 712 Success Rate: 100.00% Avg Latency: 0.4178 sec
Total: 1706 Inflight: 300 Done: 555 Success Rate: 100.00% Avg Latency: 0.4794 sec
Total: 2334 Inflight: 264 Done: 628 Success Rate: 100.00% Avg Latency: 0.5207 sec
Total: 2911 Inflight: 300 Done: 577 Success Rate: 100.00% Avg Latency: 0.4401 sec
...
Note: Use CTRL+C to exit the load test.
- Watch the Knative Serving deployment pod count increase:
kubectl get deploy --watch
Note: Use CTRL+C to exit watch mode.
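The curl request in the steps above can also be issued from Go. The following is a minimal sketch, not part of the sample: `buildRequest` is a hypothetical helper, and `203.0.113.10` is a placeholder standing in for the value of `IP_ADDRESS`. It shows how the `sleep`, `prime` and `bloat` query parameters and the `Host` header fit together.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// buildRequest (hypothetical helper) assembles a GET request equivalent to
// the curl call: query parameters sleep, prime and bloat, plus a Host
// header that differs from the address being dialed.
func buildRequest(ipAddress, host string, sleepMs, prime, bloatMb int) (*http.Request, error) {
	query := url.Values{}
	query.Set("sleep", strconv.Itoa(sleepMs))
	query.Set("prime", strconv.Itoa(prime))
	query.Set("bloat", strconv.Itoa(bloatMb))

	target := url.URL{Scheme: "http", Host: ipAddress, RawQuery: query.Encode()}
	req, err := http.NewRequest(http.MethodGet, target.String(), nil)
	if err != nil {
		return nil, err
	}
	// For client requests, req.Host overrides the Host header that is sent,
	// which mirrors curl's --header "Host: ...".
	req.Host = host
	return req, nil
}

func main() {
	// Placeholder address; substitute the ingress IP exported earlier.
	req, err := buildRequest("203.0.113.10", "autoscale-go.default.example.com", 100, 10000, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println("Host:", req.Host)
	fmt.Println("URL: ", req.URL.String())
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```

The request is only built, not sent, so the sketch runs without a cluster. Sending it requires a reachable ingress address.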
Knative Serving autoscaling is based on the average number of in-flight requests per pod (concurrency). The system has a default target concurrency of 100.0.
For example, if a Revision is receiving 350 requests per second, each of which takes about 0.5 seconds, Knative Serving will determine that the Revision needs about 2 pods:
350 * 0.5 = 175
175 / 100 = 1.75
ceil(1.75) = 2 pods
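The same arithmetic can be written as a small Go function. This is a sketch of the calculation shown above, not the autoscaler's actual code. The name `desiredPods` and its parameters are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// desiredPods estimates the pod count from the request rate, the average
// request latency, and the per-pod target concurrency.
func desiredPods(requestsPerSecond, latencySeconds, targetConcurrency float64) int {
	// Average number of requests in flight across the whole Revision.
	inFlight := requestsPerSecond * latencySeconds
	// Round up: a fractional pod still needs a whole pod.
	return int(math.Ceil(inFlight / targetConcurrency))
}

func main() {
	fmt.Println(desiredPods(350, 0.5, 100)) // prints 2
}
```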
By default Knative Serving does not limit concurrency in Revision containers. A limit can be set per-Configuration using the ContainerConcurrency field. The autoscaler will target a percentage of ContainerConcurrency instead of the default 100.0.
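A rough sketch of how the target could be chosen is shown below. The actual percentage is not stated here, so `targetFraction` is a made-up parameter. Treating a ContainerConcurrency of 0 as "no limit" is likewise an assumption of this sketch.

```go
package main

import "fmt"

// defaultTarget is the default target concurrency described above.
const defaultTarget = 100.0

// targetFor returns the per-pod concurrency the autoscaler would aim for.
// Assumptions: containerConcurrency <= 0 means "no limit", and
// targetFraction is a hypothetical stand-in for the real percentage.
func targetFor(containerConcurrency int, targetFraction float64) float64 {
	if containerConcurrency <= 0 {
		return defaultTarget
	}
	return float64(containerConcurrency) * targetFraction
}

func main() {
	fmt.Println(targetFor(0, 0.5))  // no limit set: default target applies
	fmt.Println(targetFor(10, 0.5)) // limit of 10: target is a fraction of it
}
```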
View the Knative Serving Scaling and Request dashboards (if configured).
kubectl port-forward --namespace monitoring $(kubectl get pods --namespace monitoring --selector=app=grafana --output=jsonpath="{.items..metadata.name}") 3000
- Maintain 1000 concurrent requests.
go run serving/samples/autoscale-go/test/test.go -qps 9999 -concurrency 1000
- Maintain 100 qps with fast requests.
go run serving/samples/autoscale-go/test/test.go -qps 100 -concurrency 9999
- Maintain 100 qps with slow requests.
go run serving/samples/autoscale-go/test/test.go -qps 100 -concurrency 9999 -sleep 500
- Heavy CPU usage.
go run serving/samples/autoscale-go/test/test.go -qps 9999 -concurrency 10 -prime 40000000
- Heavy memory usage.
go run serving/samples/autoscale-go/test/test.go -qps 9999 -concurrency 5 -bloat 1000
kubectl delete --filename serving/samples/autoscale-go/service.yaml