Merge 4247233 into 448aad3
Jagadeesh J authored Aug 24, 2023
2 parents 448aad3 + 4247233 commit 2324175
Showing 9 changed files with 132 additions and 11 deletions.
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -25,12 +25,12 @@ repos:
      - id: python-no-log-warn
      - id: python-use-type-annotations
  - repo: https://github.com/hadialqattan/pycln
    rev: v2.1.3
    rev: v2.1.5
    hooks:
      - id: pycln
        args: [--all]
  - repo: https://github.com/psf/black
    rev: 23.1.0
    rev: 23.7.0
    hooks:
      - id: black
        additional_dependencies: ['click==8.0.4']
6 changes: 5 additions & 1 deletion kubernetes/Helm/templates/torchserve.yaml
@@ -20,7 +20,9 @@ spec:
    - name: metrics
      port: {{ .Values.torchserve.metrics_port }}
      targetPort: ts-metrics
  type: LoadBalancer
    - name: grpc
      port: {{ .Values.torchserve.grpc_inference_port }}
      targetPort: ts-grpc
  selector:
    app: torchserve
---
@@ -55,6 +57,8 @@ spec:
        containerPort: {{ .Values.torchserve.management_port }}
      - name: ts-metrics
        containerPort: {{ .Values.torchserve.metrics_port }}
      - name: ts-grpc
        containerPort: {{ .Values.torchserve.grpc_inference_port }}
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - mountPath: {{ .Values.torchserve.pvd_mount }}
4 changes: 3 additions & 1 deletion kubernetes/Helm/values.yaml
@@ -8,13 +8,15 @@ torchserve:
  management_port: 8081
  inference_port: 8080
  metrics_port: 8082
  grpc_inference_port: 7070

  pvd_mount: /home/model-server/shared/
  n_gpu: 4
  n_cpu: 16
  memory_limit: 32Gi

deployment:
  replicas: 1
  replicas: 2

persistentVolume:
  name: efs-claim
46 changes: 46 additions & 0 deletions kubernetes/README.md
@@ -53,6 +53,7 @@ torchserve:
  management_port: 8081
  inference_port: 8080
  metrics_port: 8082
  grpc_inference_port: 7070
  pvd_mount: /home/model-server/shared/
  n_gpu: 1
  n_cpu: 1
@@ -290,6 +291,51 @@ Follow the link for log aggregation with EFK Stack.\
## Autoscaling
[Autoscaling with torchserve metrics](autoscale.md)

## Session Affinity with Multiple TorchServe Pods

### Prerequisites

- Follow the instructions above and deploy TorchServe with more than one replica to the Kubernetes cluster
- Download Istio and add it to your path as shown [here](https://istio.io/latest/docs/setup/getting-started/#download)
- Install Istio with the command below (a quick sanity check follows this list)
  - `istioctl install --set meshConfig.accessLogFile=/dev/stdout`
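
To confirm the installation succeeded (a suggested sanity check, not part of the original instructions), make sure the Istio control plane and ingress gateway pods are up:

```bash
# Both istiod and istio-ingressgateway pods should reach the Running state
kubectl get pods -n istio-system
```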

### Steps

With multiple TorchServe replicas running and Istio installed, apply the gateway, virtual service, and destination rule below to enable session affinity for user requests.

- Apply the Istio gateway with `kubectl apply -f gateway.yaml`
  - This gateway exposes all the hosts behind it on port 80, as defined in the YAML file.
- Apply the virtual service with `kubectl apply -f virtual_service.yaml`
  - The virtual service looks for a header named `protocol` in the incoming request and forwards the request to the TorchServe service: if the `protocol` header has the value `REST`, the request goes to port `8080`; if it has the value `gRPC`, the request goes to port `7070`.
- Apply the destination rule with `kubectl apply -f destination_rule.yaml`
  - The destination rule looks for an HTTP cookie named `session_id`; a request carrying a given `session_id` is served by the same pod that served the previous request with that `session_id`. A verification command follows this list.
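
To verify that the three resources were created (a suggested check; the resource kinds come from the Istio CRDs installed above):

```bash
# Should list torchserve-gw, torchserve-vs and torchserve-dr
kubectl get gateway,virtualservice,destinationrule
```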

### HTTP Inference

- Fetch the external IP of the Istio ingress gateway with the command below

```bash
ubuntu@ubuntu$ kubectl get svc -n istio-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway LoadBalancer 10.100.84.243 a918b2zzzzzzzzzzzzzzzzzzzzzz-1466623565.us-west-2.elb.amazonaws.com 15021:32270/TCP,80:31978/TCP,443:31775/TCP,70:31778/TCP 2d6h
```

- Make a request as shown below

```bash
curl -v -H "protocol: REST" --cookie "session_id=12345" http://a918b2d70dbddzzzzzzzzzzz49ec8cf03b-1466623565.us-west-2.elb.amazonaws.com:80/predictions/<model_name> -d "data=<input-string>"
```
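
To see the affinity in action (a suggested check, not part of the original steps), send the request a few times with the same `session_id` cookie and tail the ingress gateway access logs, which were enabled at install time via `meshConfig.accessLogFile`; every repeat should be routed to the same upstream pod:

```bash
# The Envoy access log records the upstream pod address chosen per request
kubectl logs -n istio-system -l istio=ingressgateway --tail=20
```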

### gRPC Inference

- Refer to [grpc_api](../docs/grpc_api.md) to generate the Python files, then run

```bash
python ts_scripts/torchserve_grpc_client.py infer <model_name> <input-string>
```
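
Note that the client, as modified in this commit, attaches `protocol: gRPC` and `session_id` metadata to every call; the virtual service routes on the former and the destination rule pins the session with the latter.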


## Roadmap

* [ ] Log / Metrics Aggregation using [AWS Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html)
13 changes: 13 additions & 0 deletions kubernetes/destination_rule.yaml
@@ -0,0 +1,13 @@
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: torchserve-dr
spec:
  host: torchserve.default.svc.cluster.local # <ts-service-name>.<namespace>.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      consistentHash:
        # httpHeaderName: x-user
        httpCookie:
          name: session_id
          ttl: 60s
14 changes: 14 additions & 0 deletions kubernetes/gateway.yaml
@@ -0,0 +1,14 @@
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: torchserve-gw
spec:
  selector:
    istio: ingressgateway
  servers:
    - hosts:
        - '*'
      port:
        name: http
        number: 80
        protocol: HTTP
36 changes: 36 additions & 0 deletions kubernetes/virtual_service.yaml
@@ -0,0 +1,36 @@
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: torchserve-vs
spec:
  hosts:
    - "*"
  gateways:
    - torchserve-gw
  http:
    - match:
        - uri:
            prefix: /metrics
      route:
        - destination:
            host: torchserve.default.svc.cluster.local
            port:
              number: 8082
    - match:
        - headers:
            protocol:
              exact: REST
      route:
        - destination:
            host: torchserve.default.svc.cluster.local # <ts-service-name>.<namespace>.svc.cluster.local
            port:
              number: 8080
    - match:
        - headers:
            protocol:
              exact: gRPC
      route:
        - destination:
            host: torchserve.default.svc.cluster.local # <ts-service-name>.<namespace>.svc.cluster.local
            port:
              number: 7070
3 changes: 3 additions & 0 deletions ts_scripts/spellcheck_conf/wordlist.txt
@@ -1065,6 +1065,9 @@ ActionSLAM
statins
ci
chatGPT
accessLogFile
istioctl
meshConfig
baseimage
cuDNN
Xformer
17 changes: 10 additions & 7 deletions ts_scripts/torchserve_grpc_client.py
@@ -19,13 +19,14 @@ def get_management_stub():
    return stub


def infer(stub, model_name, model_input):
def infer(stub, model_name, model_input, metadata):
    with open(model_input, "rb") as f:
        data = f.read()

    input_data = {"data": data}
    response = stub.Predictions(
        inference_pb2.PredictionsRequest(model_name=model_name, input=input_data)
        inference_pb2.PredictionsRequest(model_name=model_name, input=input_data),
        metadata=metadata,
    )

    try:
@@ -35,13 +36,14 @@ def infer(stub, model_name, model_input):
        exit(1)


def infer_stream(stub, model_name, model_input):
def infer_stream(stub, model_name, model_input, metadata):
    with open(model_input, "rb") as f:
        data = f.read()

    input_data = {"data": data}
    responses = stub.StreamPredictions(
        inference_pb2.PredictionsRequest(model_name=model_name, input=input_data)
        inference_pb2.PredictionsRequest(model_name=model_name, input=input_data),
        metadata=metadata,
    )

    try:
@@ -92,7 +94,6 @@ def unregister(stub, model_name):


if __name__ == "__main__":

    parent_parser = argparse.ArgumentParser(add_help=False)
    parent_parser.add_argument(
        "model_name",
@@ -141,10 +142,12 @@ def unregister(stub, model_name):

    args = parser.parse_args()

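    # Static gRPC metadata attached to every request: the Istio virtual service
    # routes on the "protocol" key, while "session_id" is meant to pin requests
    # to a single pod (see destination_rule.yaml)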
    metadata = (("protocol", "gRPC"), ("session_id", "12345"))

    if args.action == "infer":
        infer(get_inference_stub(), args.model_name, args.model_input)
        infer(get_inference_stub(), args.model_name, args.model_input, metadata)
    elif args.action == "infer_stream":
        infer_stream(get_inference_stub(), args.model_name, args.model_input)
        infer_stream(get_inference_stub(), args.model_name, args.model_input, metadata)
    elif args.action == "register":
        register(get_management_stub(), args.model_name, args.mar_set)
    elif args.action == "unregister":
