Fixes #2: End to end model training/serving example using S3, Argo, and Kubeflow #42
Changes from 104 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
FROM ubuntu:16.04 | ||
|
||
ENV KUBECTL_VERSION v1.9.2 | ||
ENV KSONNET_VERSION 0.8.0 | ||
|
||
RUN apt-get update | ||
RUN apt-get -y install curl | ||
#RUN apk add --update ca-certificates openssl && update-ca-certificates | ||
|
||
RUN curl -O -L https://github.com/ksonnet/ksonnet/releases/download/v${KSONNET_VERSION}/ks_${KSONNET_VERSION}_linux_amd64.tar.gz | ||
RUN tar -zxvf ks_${KSONNET_VERSION}_linux_amd64.tar.gz -C /usr/bin/ --strip-components=1 ks_${KSONNET_VERSION}_linux_amd64/ks | ||
RUN chmod +x /usr/bin/ks | ||
|
||
RUN curl -L https://storage.googleapis.com/kubernetes-release/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl -o /usr/bin/kubectl | ||
RUN chmod +x /usr/bin/kubectl | ||
|
||
#ksonnet doesn't work without a kubeconfig, the following is just to add a utility to generate a kubeconfig from a service account. | ||
ADD https://raw.githubusercontent.com/zlabjp/kubernetes-scripts/cb265de1d4c4dc4ad0f15f4aaaf5b936dcf639a5/create-kubeconfig /usr/bin/ | ||
ADD https://raw.githubusercontent.com/zlabjp/kubernetes-scripts/cb265de1d4c4dc4ad0f15f4aaaf5b936dcf639a5/LICENSE.txt /usr/bin/create-kubeconfig.LICENSE | ||
RUN chmod +x /usr/bin/create-kubeconfig | ||
|
||
RUN kubectl config set-context default --cluster=default | ||
RUN kubectl config use-context default | ||
|
||
ENV USER root | ||
|
||
ADD ksonnet-entrypoint.sh / | ||
RUN chmod +x /ksonnet-entrypoint.sh | ||
|
||
ENTRYPOINT ["/ksonnet-entrypoint.sh"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
FROM elsonrodriguez/mytfserver:1.0 | ||
Review comment: What is this base image? Reply: It's a base image containing the shim for TF_CONFIG and the stock gRPC TF server. I ended up not using it in the example; however, it helps reduce build time and helps protect the model container from upstream changes (the official TF tags sometimes get overwritten). I can make it an un-namespaced local image and also add more comments. |
||
|
||
ADD model.py /opt/model.py | ||
ADD export.py /opt/export.py | ||
|
||
RUN chmod +x /opt/model.py | ||
RUN chmod +x /opt/export.py | ||
|
||
CMD ["python", "/opt/model.py"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
FROM gcr.io/kubeflow/jupyterhub-k8s:1.0.1 | ||
Review comment: What do you need this for? TensorBoard should be installed in the standard TensorFlow Docker image. So if all you need is a Docker image with TensorBoard, can we just use the stock TensorFlow Docker image? Reply: I couldn't get it to work with the stock TF image. It might be a TensorBoard 1.5 issue or the way the image is built, but I kept getting "No dashboards are active for the current data set." The combination of TF 1.5 and TensorBoard 1.6 works fine. EDIT: Never mind, I don't know what I was seeing, but 1.5.1 looks fine. Will remove this image from the guide. |
||
|
||
RUN pip install tensorboard==1.6.0 tensorflow==1.5.0 | ||
|
||
|
||
ENTRYPOINT ["/usr/local/bin/tensorboard", "--logdir"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
#FROM tensorflow/tf_grpc_test_server:ccbc039fbe5a | ||
Review comment: What's this for? Reply: The "1.5" image in TF was broken when I was writing the guide, and they didn't have any tag with 1.5 other than "latest", so I just made a note of the sha1. There's a new 1.5.1 image that might work; I'll try that. |
||
FROM tensorflow/tf_grpc_test_server:latest | ||
|
||
Review comment: Do we need our own custom gRPC server for parameter servers? If the code is using the tf.Estimator API and you call train_and_evaluate, I think that will automatically start PS as needed based on TF_CONFIG, which calls run_ps. Reply: Yeah, this is confusing, and I ended up not using the standard gRPC server. I think I'm going to strip out references to it other than as an initial base image for the model. |
||
ADD tf_job_shim.py /tf_job_shim.py | ||
RUN chmod +x /tf_job_shim.py | ||
|
||
ENTRYPOINT ["/tf_job_shim.py"] | ||
CMD ["python", "/var/tf-k8s/server/grpc_tensorflow_server.py"] |
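
The review thread on this Dockerfile turns on how `TF_CONFIG` tells each process which role it plays. As a rough illustration only (this is not the PR's `tf_job_shim.py`, whose contents are not shown here), the sketch below parses a `TF_CONFIG`-style JSON string and reports the task type, index, and address. The function name `parse_tf_config` and the host names in the example are hypothetical.

```python
import json


def parse_tf_config(raw):
    """Return (task_type, task_index, address) from a TF_CONFIG JSON string.

    Missing or empty input yields (None, 0, None).
    """
    config = json.loads(raw or "{}")
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    task_type = task.get("type")
    task_index = int(task.get("index", 0))
    # Look up this task's own address in the cluster spec, if present.
    addresses = cluster.get(task_type, [])
    address = addresses[task_index] if task_index < len(addresses) else None
    return task_type, task_index, address


# Hypothetical cluster layout, for illustration only.
example = json.dumps({
    "cluster": {
        "ps": ["ps-0:2222"],
        "worker": ["worker-0:2222", "worker-1:2222"],
    },
    "task": {"type": "worker", "index": 1},
})

role, index, address = parse_tf_config(example)
print(role, index, address)
```

In a real container the string would typically come from `os.environ.get("TF_CONFIG")`; a process that sees the `ps` role would then run as a parameter server, which is the behaviour the reviewer expects `train_and_evaluate` to provide on its own.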
Review comment: What is this Dockerfile for? Is this for bootstrapping? Could you add a comment explaining that?
Reply: It's to run ksonnet in a container in the workflow. I'll add a comment.
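
The ksonnet Dockerfile notes that ksonnet needs a kubeconfig and adds a utility that generates one from a service account. To make that step concrete, here is a minimal sketch of the idea in Python. It is not the `create-kubeconfig` script the image downloads; the function name `build_kubeconfig`, its parameters, and the example server URL and token are assumptions for illustration. The mount path is the standard in-cluster service account location.

```python
import json

# Standard in-cluster service account mount point.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def build_kubeconfig(server, token, ca_path=SA_DIR + "/ca.crt",
                     namespace="default", name="default"):
    """Build a kubeconfig structure that authenticates with a bearer token."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": name,
            "cluster": {"server": server, "certificate-authority": ca_path},
        }],
        "users": [{"name": name, "user": {"token": token}}],
        "contexts": [{
            "name": name,
            "context": {"cluster": name, "user": name, "namespace": namespace},
        }],
        "current-context": name,
    }


# Hypothetical values; inside a pod the token would be read from SA_DIR + "/token".
config = build_kubeconfig("https://kubernetes.default.svc", "example-token")
print(json.dumps(config, indent=2))
```

Because JSON is a subset of YAML, the printed output should be usable as a kubeconfig file once written to disk. The `default` context name matches the context the Dockerfile sets with `kubectl config set-context default --cluster=default`.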