Injecting S3 Credentials into tfserving Container #749
The solution is discussed in #748. @ryandawsonuk can expand on where we are with it.
In the latest snapshot version the code is now in place for a SeldonDeployment to work like https://github.com/kubeflow/kfserving/tree/9b14e08038405348ac21f1acbbf8b26e4c26631d/docs/samples/s3. So you'd create the Secret and ServiceAccount as in that example, and in the SeldonDeployment specify a serviceAccountName alongside the modelUri, like:
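A hedged sketch of that setup, following the shape of the kfserving s3 sample linked above: the secret name, service account name, annotation keys, and bucket path below are placeholders taken from that sample, not from this thread, and the exact annotation prefix should be checked against your version's docs.

```yaml
# Secret carrying the S3 credentials, annotated as in the kfserving s3 sample.
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
  annotations:
    serving.kubeflow.org/s3-endpoint: s3.amazonaws.com
    serving.kubeflow.org/s3-usehttps: "1"
type: Opaque
data:
  awsAccessKeyID: <base64-encoded access key id>
  awsSecretAccessKey: <base64-encoded secret access key>
---
# ServiceAccount referencing the secret.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa
secrets:
- name: mysecret
---
# SeldonDeployment specifying serviceAccountName alongside modelUri.
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-model
spec:
  predictors:
  - name: default
    replicas: 1
    graph:
      name: tf-model
      implementation: TENSORFLOW_SERVER
      modelUri: s3://my-bucket/my-model   # placeholder bucket path
      serviceAccountName: sa
```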
I've checked that secrets do get applied to the pod like this. I haven't fully checked that a model is downloaded from a private bucket, though as far as I can see it should work. @devstein if you'd like to try it on the latest snapshot version it would certainly be appreciated. |
@ryandawsonuk Thanks for fixing. After updating my deployment, I'm getting the following error from the
Could you do a
I saw the same error trace with the yaml file defined here |
Can we support injecting credential environment variables via a secretRef instead of via a ServiceAccount? This is the hack we patched our operator with to keep it compatible with how it used to work.
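For context, secretRef-based injection of the kind described here can be expressed compactly with `envFrom`, which exposes every key of a Secret as an environment variable on the init container, rather than listing each key with an individual secretKeyRef entry. A minimal illustrative fragment, assuming a secret named `seldon-aws-creds` holding `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` keys (names are illustrative):

```yaml
# Illustrative init-container fragment: envFrom pulls all keys of the
# referenced secret in as environment variables.
initContainers:
- name: model-initializer
  image: gcr.io/kfserving/model-initializer:latest
  args:
  - s3://path/to/model
  - /mnt/models
  envFrom:
  - secretRef:
      name: seldon-aws-creds
```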
apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/path: prometheus
    prometheus.io/port: "8000"
    prometheus.io/scrape: "true"
  creationTimestamp: "2019-08-09T17:34:07Z"
  generateName: tf-model-default-6b6f7df5c7-
  labels:
    app: tf-model-default
    fluentd: "true"
    pod-template-hash: "2629389173"
    seldon-app: tf-model-tf-model-default
    seldon-app-tf-model: tf-model-default-tf-model-seldonio-tfserving-proxy-rest-0-3
    seldon-deployment-id: tf-model-tf-model
    version: default
  name: tf-model-default-6b6f7df5c7-9mp7g
  namespace: seldon-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: tf-model-default-6b6f7df5c7
    uid: e3890a0e-bacb-11e9-9b1e-0a666f37ba34
  resourceVersion: "20280551"
  selfLink: /api/v1/namespaces/seldon-system/pods/tf-model-default-6b6f7df5c7-9mp7g
  uid: e3918f1d-bacb-11e9-9b1e-0a666f37ba34
spec:
  containers:
  - env:
    - name: PREDICTIVE_UNIT_PARAMETERS
      value: '[{"name":"signature_name","value":"predict","type":"STRING"},{"name":"model_name","value":"tf-model","type":"STRING"},{"name":"rest_endpoint","value":"http://0.0.0.0:2001","type":"STRING"}]'
    - name: PREDICTIVE_UNIT_SERVICE_PORT
      value: "9000"
    - name: PREDICTIVE_UNIT_ID
      value: tf-model
    - name: PREDICTOR_ID
      value: default
    - name: SELDON_DEPLOYMENT_ID
      value: tf-model
    image: seldonio/tfserving-proxy_rest:0.3
    imagePullPolicy: Always
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - /bin/sleep 10
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 60
      periodSeconds: 5
      successThreshold: 1
      tcpSocket:
        port: http
      timeoutSeconds: 1
    name: tf-model
    ports:
    - containerPort: 9000
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 20
      periodSeconds: 5
      successThreshold: 1
      tcpSocket:
        port: http
      timeoutSeconds: 1
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-dnnrf
      readOnly: true
  - args:
    - /usr/bin/tensorflow_model_server
    - --port=2000
    - --rest_api_port=2001
    - --model_name=tf-model
    - --model_base_path=/mnt/models
    image: tensorflow/serving:latest
    imagePullPolicy: IfNotPresent
    name: tfserving
    ports:
    - containerPort: 2000
      protocol: TCP
    - containerPort: 2001
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /mnt/models
      name: tfserving-provision-location
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-dnnrf
      readOnly: true
  - env:
    - name: ENGINE_PREDICTOR
      value: eyJuYW1lIjoiZGVmYXVsdCIsImdyYXBoIjp7Im5hbWUiOiJ0Zi1tb2RlbCIsInR5cGUiOiJNT0RFTCIsImltcGxlbWVudGF0aW9uIjoiVEVOU09SRkxPV19TRVJWRVIiLCJlbmRwb2ludCI6eyJzZXJ2aWNlX2hvc3QiOiJsb2NhbGhvc3QiLCJzZXJ2aWNlX3BvcnQiOjkwMDAsInR5cGUiOiJSRVNUIn0sInBhcmFtZXRlcnMiOlt7Im5hbWUiOiJzaWduYXR1cmVfbmFtZSIsInZhbHVlIjoicHJlZGljdCIsInR5cGUiOiJTVFJJTkcifSx7Im5hbWUiOiJtb2RlbF9uYW1lIiwidmFsdWUiOiJ0Zi1tb2RlbCIsInR5cGUiOiJTVFJJTkcifV0sIm1vZGVsVXJpIjoiczM6Ly9tb2RlbHMudmlhZHVjdC5haS9kZXZzdGVpbi9zYXZlLWxvY2F0aW9uLW1vZGVsLWFzLXRmL2xvY2F0aW9uLW1vZGVsLXRmLXNlcnZpbmctLW5pZ2h0bHktLTItYjRkd2QvbW9kZWwiLCJzZXJ2aWNlQWNjb3VudE5hbWUiOiJzZWxkb24tY29yZS1vcGVyYXRvciJ9LCJyZXBsaWNhcyI6MSwiZW5naW5lUmVzb3VyY2VzIjp7fSwibGFiZWxzIjp7InZlcnNpb24iOiJkZWZhdWx0In0sInN2Y09yY2hTcGVjIjp7fSwiZXhwbGFpbmVyIjp7ImNvbnRhaW5lclNwZWMiOnsibmFtZSI6IiIsInJlc291cmNlcyI6e319fX0=
    - name: DEPLOYMENT_NAME
      value: tf-model
    - name: DEPLOYMENT_NAMESPACE
      value: seldon-system
    - name: ENGINE_SERVER_PORT
      value: "8000"
    - name: ENGINE_SERVER_GRPC_PORT
      value: "5001"
    - name: JAVA_OPTS
      value: -Dcom.sun.management.jmxremote.rmi.port=9090 -Dcom.sun.management.jmxremote
        -Dcom.sun.management.jmxremote.port=9090 -Dcom.sun.management.jmxremote.ssl=false
        -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false
        -Djava.rmi.server.hostname=127.0.0.1
    - name: SELDON_LOG_MESSAGES_EXTERNALLY
      value: "false"
    image: docker.io/seldonio/engine:0.3.2-SNAPSHOT
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - curl 127.0.0.1:8000/pause; /bin/sleep 10
    livenessProbe:
      failureThreshold: 7
      httpGet:
        path: /live
        port: admin
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 2
    name: seldon-container-engine
    ports:
    - containerPort: 8000
      protocol: TCP
    - containerPort: 5001
      protocol: TCP
    - containerPort: 8082
      name: admin
      protocol: TCP
    - containerPort: 9090
      name: jmx
      protocol: TCP
    readinessProbe:
      failureThreshold: 1
      httpGet:
        path: /ready
        port: admin
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 1
      successThreshold: 1
      timeoutSeconds: 2
    resources:
      requests:
        cpu: 100m
    securityContext:
      runAsUser: 8888
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/podinfo
      name: podinfo
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-dnnrf
      readOnly: true
  dnsPolicy: ClusterFirst
  initContainers:
  - args:
    - s3://path/to/model
    - /mnt/models
    env:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          key: awsAccessKeyID
          name: seldon-aws-creds
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          key: awsSecretAccessKey
          name: seldon-aws-creds
    image: gcr.io/kfserving/model-initializer:latest
    imagePullPolicy: IfNotPresent
    name: tfserving-model-initializer
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /mnt/models
      name: tfserving-provision-location
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-dnnrf
      readOnly: true
  nodeName: ip-123-123.us-west-2.compute.internal
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 20
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: tfserving-provision-location
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.annotations
        path: annotations
    name: podinfo
  - name: default-token-dnnrf
    secret:
      defaultMode: 420
      secretName: default-token-dnnrf
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-08-09T17:34:07Z"
    message: 'containers with incomplete status: [tfserving-model-initializer]'
    reason: ContainersNotInitialized
    status: "False"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-08-09T17:34:07Z"
    message: 'containers with unready status: [tf-model tfserving seldon-container-engine]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'containers with unready status: [tf-model tfserving seldon-container-engine]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-08-09T17:34:07Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: docker.io/seldonio/engine:0.3.2-SNAPSHOT
    imageID: ""
    lastState: {}
    name: seldon-container-engine
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: PodInitializing
  - image: seldonio/tfserving-proxy_rest:0.3
    imageID: ""
    lastState: {}
    name: tf-model
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: PodInitializing
  - image: tensorflow/serving:latest
    imageID: ""
    lastState: {}
    name: tfserving
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: PodInitializing
  hostIP: 172.20.38.82
  initContainerStatuses:
  - containerID: docker://b613cb57827e9941c4bc25da0674714f4039a5d2c9b2112ad7e76d83fb5aef37
    image: gcr.io/kfserving/model-initializer:latest
    imageID: docker-pullable://gcr.io/kfserving/model-initializer@sha256:9aebf5116f2186eae53d9b9e73697e6ca340b9d4a65ea2ca66fbbcdc76d030bb
    lastState:
      terminated:
        containerID: docker://b613cb57827e9941c4bc25da0674714f4039a5d2c9b2112ad7e76d83fb5aef37
        exitCode: 1
        finishedAt: "2019-08-09T17:34:57Z"
        reason: Error
        startedAt: "2019-08-09T17:34:57Z"
    name: tfserving-model-initializer
    ready: false
    restartCount: 3
    state:
      waiting:
        message: Back-off 40s restarting failed container=tfserving-model-initializer
          pod=tf-model-default-6b6f7df5c7-9mp7g_seldon-system(e3918f1d-bacb-11e9-9b1e-0a666f37ba34)
        reason: CrashLoopBackOff
  phase: Pending
  podIP: 100.96.2.26
  qosClass: Burstable
  startTime: "2019-08-09T17:34:07Z"
I believe the PRs for this are merged now, so I will close. Please reopen if mistaken.
Hi, I'm getting the error with SKLEARN_SERVER: Traceback (most recent call last):
Secret created:
Seldon installed:
YAML similar to the iris example, with the following parameters
Secondly, what I wanted to understand is: if an IAM role is attached to an EC2 instance, is there a provision to fetch the model from S3 without any of these configurations?
@arunbenoyv Have you run through the docs? https://docs.seldon.io/projects/seldon-core/en/latest/servers/overview.html#handling-credentials Which version of Seldon are you running? If not the latest, could it be the issue fixed by #885?
This is a cool idea, but I think the answer is no right now. The download code expects to be hooked up with credentials in order to perform the download. I suspect that even if the EC2 instance were given the role, that role wouldn't be used automatically by the code running on the Kubernetes node. The code being used borrows from minio, and there's a similar discussion on the minio GitHub: minio/minio#6124. There's also a related idea being floated in #1865.
@cliveseldon: As I provided the repo in the helm install, my understanding was that it would pull the latest: helm install seldon-core seldon-core-operator --repo https://storage.googleapis.com/seldon-charts --set usageMetrics.enabled=true --set predictiveUnit.defaultEnvSecretRefName=seldon-core-init-container-secret. From describing the pod, the image tag is docker.io/seldonio/seldon-core-operator:1.1.0. Is the error expected with this version?
Please consider increasing the priority of developing IAM role capability. This is a critically important security feature for enterprises, as distributing keys is a security concern. We only use IAM roles and temporary keys that expire after a specified number of hours, after lessons learned from other companies' high-profile data breaches.
@DFuller134 We have production use cases of Seldon integrated with IAM role capability; the way it's set up at the moment is using boto3 to download the artifacts. Because of this I can confirm it can be done today. The two ways you can approach this are either by extending/creating your own initContainer that uses boto3, or by adding this functionality to load the model with boto3 in your Python model server wrapper's init function. @adriangonz can provide more details, as he has seen this use case running in production clusters, and it should be enough to address your needs. This is the main reason we made initContainers disjoint: they are very easy to extend (and, alternatively, the functionality can be added to the wrapper itself).
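A rough sketch of the first approach. All names, the image, and the tag below are hypothetical; it is shown with the AWS CLI (built on botocore) for brevity, but the same shape applies to a custom image running a boto3 script. Because no credential environment variables are injected, the SDK's default credential chain falls back to the attached IAM role:

```yaml
# Hypothetical custom init container relying on the node/pod IAM role:
# no AWS_* credential env vars are set, so the default credential chain
# resolves the instance role instead of static keys.
initContainers:
- name: custom-model-initializer
  image: amazon/aws-cli:2.0.30     # hypothetical pinned tag
  args:
  - s3
  - cp
  - --recursive
  - s3://my-bucket/my-model        # placeholder bucket path
  - /mnt/models
  volumeMounts:
  - mountPath: /mnt/models
    name: model-provision-location # placeholder volume name
```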
@arunbenoyv Looking at https://github.com/SeldonIO/seldon-core/blob/master/python/seldon_core/storage.py, your AWS_ENDPOINT_URL should be 'http://s3.amazonaws.com'.
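For reference, a secret of the shape the init container reads could look roughly like this, using the endpoint value suggested above. The secret name and key names follow the pattern in the Seldon docs but should be checked against your version; the credential values are placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: seldon-init-container-secret
type: Opaque
stringData:                  # stringData avoids manual base64 encoding
  AWS_ACCESS_KEY_ID: "<your access key id>"
  AWS_SECRET_ACCESS_KEY: "<your secret access key>"
  AWS_ENDPOINT_URL: "http://s3.amazonaws.com"
  USE_SSL: "false"
```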
Hello. Secret:
Tried multiple variations of AWS_ENDPOINT_URL.
MLflow Server serving YAML:
Error logs from the init-model container:
Hello, I had a similar issue as @dping1 with AWS S3.
AWS_ENDPOINT_URL = https://s3.eu-central-1.amazonaws.com (based on my bucket location, EU Central)
Given the SeldonDeployment below, I want to inject AWS credentials into the tfserving container. Is there a way to do this?