feat: add session affinity to k8s TS #2519

Merged · 10 commits · Aug 24, 2023
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -25,12 +25,12 @@ repos:
- id: python-no-log-warn
- id: python-use-type-annotations
- repo: https://github.com/hadialqattan/pycln
rev: v2.1.3
rev: v2.1.5
hooks:
- id: pycln
args: [--all]
- repo: https://github.com/psf/black
rev: 23.1.0
rev: 23.7.0
hooks:
- id: black
additional_dependencies: ['click==8.0.4']
6 changes: 5 additions & 1 deletion kubernetes/Helm/templates/torchserve.yaml
@@ -20,7 +20,9 @@ spec:
- name: metrics
port: {{ .Values.torchserve.metrics_port }}
targetPort: ts-metrics
type: LoadBalancer
- name: grpc
port: {{ .Values.torchserve.grpc_inference_port }}
targetPort: ts-grpc
selector:
app: torchserve
---
@@ -55,6 +57,8 @@ spec:
containerPort: {{ .Values.torchserve.management_port }}
- name: ts-metrics
containerPort: {{ .Values.torchserve.metrics_port }}
- name: ts-grpc
containerPort: {{ .Values.torchserve.grpc_inference_port }}
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: {{ .Values.torchserve.pvd_mount }}
4 changes: 3 additions & 1 deletion kubernetes/Helm/values.yaml
@@ -8,13 +8,15 @@ torchserve:
management_port: 8081
inference_port: 8080
metrics_port: 8082
grpc_inference_port: 7070

pvd_mount: /home/model-server/shared/
n_gpu: 4
n_cpu: 16
memory_limit: 32Gi

deployment:
replicas: 1
replicas: 2

persistentVolume:
name: efs-claim
46 changes: 46 additions & 0 deletions kubernetes/README.md
@@ -53,6 +53,7 @@ torchserve:
management_port: 8081
inference_port: 8080
metrics_port: 8082
grpc_inference_port: 7070
pvd_mount: /home/model-server/shared/
n_gpu: 1
n_cpu: 1
@@ -290,6 +291,51 @@ Follow the link for log aggregation with EFK Stack.\
## Autoscaling
[Autoscaling with torchserve metrics](autoscale.md)

## Session Affinity with Multiple Torchserve pods

### Prerequisites

- Follow the instructions above and deploy Torchserve with more than one replica to the Kubernetes cluster
- Download Istio and add it to your path as shown [here](https://istio.io/latest/docs/setup/getting-started/#download)
- Install Istio with the command below (a quick sanity check follows this list)
  - `istioctl install --set meshConfig.accessLogFile=/dev/stdout`
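
To confirm the installation succeeded, you can list the pods in the `istio-system` namespace — a minimal sketch, assuming Istio's default installation profile:

```bash
# istiod and the istio-ingressgateway pods should both be Running
kubectl get pods -n istio-system
```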

### Steps

Now that multiple replicas of Torchserve are running and Istio is installed, we can apply a gateway, a virtual service, and a destination rule to enable session affinity for user requests.

- Apply the Istio gateway via `kubectl apply -f gateway.yaml`
  - This gateway exposes all the hosts behind it on port 80, as defined in the yaml file.
- Apply the virtual service with `kubectl apply -f virtual_service.yaml`
  - This looks for a header named `protocol` in the incoming request and forwards the request to the Torchserve service. If the `protocol` header has the value `REST`, the request is forwarded to port `8080` of the Torchserve service; if it has the value `gRPC`, the request is forwarded to port `7070`.
- Apply the destination rule using `kubectl apply -f destination_rule.yaml`
  - The destination rule looks for an HTTP cookie with the key `session_id`. A request carrying a given `session_id` is served by the same pod that served the previous request with that `session_id`; see the quick check after this list.
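
Once all three resources are applied, they can be listed together — a minimal sketch, assuming the default namespace and the resource kinds registered by the Istio CRDs:

```bash
# All three routing resources created above should be listed
kubectl get gateway,virtualservice,destinationrule
```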

### HTTP Inference

- Fetch the external IP of the istio-ingressgateway service using the command below

```bash
ubuntu@ubuntu$ kubectl get svc -n istio-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway LoadBalancer 10.100.84.243 a918b2zzzzzzzzzzzzzzzzzzzzzz-1466623565.us-west-2.elb.amazonaws.com 15021:32270/TCP,80:31978/TCP,443:31775/TCP,70:31778/TCP 2d6h
```

- Make a request as shown below

```bash
curl -v -H "protocol: REST" --cookie "session_id=12345" http://a918b2d70dbddzzzzzzzzzzz49ec8cf03b-1466623565.us-west-2.elb.amazonaws.com:80/predictions/<model_name> -d "data=<input-string>"
```
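
To see the affinity in action, the sketch below sends several requests with the same cookie and then tails the ingress gateway's access log (enabled earlier via `meshConfig.accessLogFile=/dev/stdout`). The `INGRESS_HOST` variable, the loop, and the placeholder model/input names are illustrative assumptions; the upstream pod address in the log entries should stay the same across the requests:

```bash
# Capture the gateway's external hostname (on AWS this is an ELB hostname;
# on other clouds the field may be .ip instead of .hostname)
INGRESS_HOST=$(kubectl get svc istio-ingressgateway -n istio-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Requests sharing a session_id cookie should all land on the same pod
for i in 1 2 3; do
  curl -s -H "protocol: REST" --cookie "session_id=12345" \
    "http://$INGRESS_HOST/predictions/<model_name>" -d "data=<input-string>"
done

# The upstream (pod) address should repeat across these access-log entries
kubectl logs -n istio-system -l istio=ingressgateway --tail=10
```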

### gRPC Inference

- Refer to [grpc_api](../docs/grpc_api.md) to generate the Python client files, then run

```bash
python ts_scripts/torchserve_grpc_client.py infer <model_name> <input-string>
```
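
As the client diff below shows, each call attaches the gRPC metadata `("protocol", "gRPC")` and `("session_id", "12345")`, so the virtual service routes it to port `7070` and the destination rule pins the session to a single pod. Repeated invocations with that hard-coded `session_id` should therefore be served by the same replica — a sketch, reusing the command above with placeholder names:

```bash
# Both calls carry the same session_id metadata ("12345"), so with session
# affinity enabled they should be served by the same Torchserve pod
python ts_scripts/torchserve_grpc_client.py infer <model_name> <input-string>
python ts_scripts/torchserve_grpc_client.py infer <model_name> <input-string>
```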


## Roadmap

* [ ] Log / Metrics Aggregation using [AWS Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html)
13 changes: 13 additions & 0 deletions kubernetes/destination_rule.yaml
@@ -0,0 +1,13 @@
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: torchserve-dr
spec:
host: torchserve.default.svc.cluster.local # <ts-service-name>.<namespace>.svc.cluster.local
trafficPolicy:
loadBalancer:
consistentHash:
# httpHeaderName: x-user
httpCookie:
name: session_id
ttl: 60s
14 changes: 14 additions & 0 deletions kubernetes/gateway.yaml
@@ -0,0 +1,14 @@
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
name: torchserve-gw
spec:
selector:
istio: ingressgateway
servers:
- hosts:
- '*'
port:
name: http
number: 80
protocol: HTTP
36 changes: 36 additions & 0 deletions kubernetes/virtual_service.yaml
@@ -0,0 +1,36 @@
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: torchserve-vs
spec:
hosts:
- "*"
gateways:
- torchserve-gw
http:
- match:
- uri:
prefix: /metrics
route:
- destination:
host: torchserve.default.svc.cluster.local
port:
number: 8082
- match:
- headers:
protocol:
exact: REST
route:
- destination:
host: torchserve.default.svc.cluster.local # <ts-service-name>.<namespace>.svc.cluster.local
port:
number: 8080
- match:
- headers:
protocol:
exact: gRPC
route:
- destination:
host: torchserve.default.svc.cluster.local # <ts-service-name>.<namespace>.svc.cluster.local
port:
number: 7070
3 changes: 3 additions & 0 deletions ts_scripts/spellcheck_conf/wordlist.txt
@@ -1065,6 +1065,9 @@ ActionSLAM
statins
ci
chatGPT
accessLogFile
istioctl
meshConfig
baseimage
cuDNN
Xformer
17 changes: 10 additions & 7 deletions ts_scripts/torchserve_grpc_client.py
@@ -19,13 +19,14 @@ def get_management_stub():
return stub


def infer(stub, model_name, model_input):
def infer(stub, model_name, model_input, metadata):
with open(model_input, "rb") as f:
data = f.read()

input_data = {"data": data}
response = stub.Predictions(
inference_pb2.PredictionsRequest(model_name=model_name, input=input_data)
inference_pb2.PredictionsRequest(model_name=model_name, input=input_data),
metadata=metadata,
)

try:
@@ -35,13 +36,14 @@ def infer(stub, model_name, model_input):
exit(1)


def infer_stream(stub, model_name, model_input):
def infer_stream(stub, model_name, model_input, metadata):
with open(model_input, "rb") as f:
data = f.read()

input_data = {"data": data}
responses = stub.StreamPredictions(
inference_pb2.PredictionsRequest(model_name=model_name, input=input_data)
inference_pb2.PredictionsRequest(model_name=model_name, input=input_data),
metadata=metadata,
)

try:
@@ -92,7 +94,6 @@ def unregister(stub, model_name):


if __name__ == "__main__":

parent_parser = argparse.ArgumentParser(add_help=False)
parent_parser.add_argument(
"model_name",
@@ -141,10 +142,12 @@ def unregister(stub, model_name):

args = parser.parse_args()

metadata = (("protocol", "gRPC"), ("session_id", "12345"))

if args.action == "infer":
infer(get_inference_stub(), args.model_name, args.model_input)
infer(get_inference_stub(), args.model_name, args.model_input, metadata)
elif args.action == "infer_stream":
infer_stream(get_inference_stub(), args.model_name, args.model_input)
infer_stream(get_inference_stub(), args.model_name, args.model_input, metadata)
elif args.action == "register":
register(get_management_stub(), args.model_name, args.mar_set)
elif args.action == "unregister":