E2E Test for TFServing with GPUs #291

jlewi · 2018-02-24T22:49:10Z

We need an E2E test for TF Serving with GPUs.

As part of this we should build it continuously with Prow.

jlewi · 2018-03-05T22:45:13Z

I have some cycles to work on this.
I'm going to start by adding a ksonnet component to our E2E test to deploy with GPUs.

…GPUs.

* This is the first step to creating an E2E for the GPU serving kubeflow#291.
* This deployment is suitable for testing that we can deploy the GPU container and not have it crash because of linking errors.
* This caught a bug in the Dockerfile.
* Fix the Docker file for the GPU image; we need to remove the symbolic links from /usr/local/nvidia to /usr/local/cuda.
* On GKE the device plugin will make drivers available at /usr/local/nvidia and we don't want this to override /usr/local/cuda.

Related to kubeflow#291

jlewi · 2018-03-07T00:17:32Z

Bump to P1 since we want to have GPU serving in our 0.1 release.

@lluunn How can we serve a model on GPUs and verify that GPUs were actually used?

…GPUs. (#362)

* This is the first step to creating an E2E for the GPU serving #291.
* This deployment is suitable for testing that we can deploy the GPU container and not have it crash because of linking errors.
* This caught a bug in the Dockerfile.
* Fix the Docker file for the GPU image; we need to remove the symbolic links from /usr/local/nvidia to /usr/local/cuda.
* On GKE the device plugin will make drivers available at /usr/local/nvidia and we don't want this to override /usr/local/cuda.

Related to #291
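For anyone wanting to guard against the symlink problem described in the commit above, here is a minimal sketch of a check that could run inside the image. It is not the fix that landed in #362. The helper names `resolves_into` and `find_links_into` are hypothetical, and the assumption is that the problematic links live under /usr/local/nvidia and resolve into /usr/local/cuda.

```python
import os


def resolves_into(path, forbidden_root):
    """Return True if `path`, after following symlinks, lies under `forbidden_root`."""
    real = os.path.realpath(path)
    root = os.path.realpath(forbidden_root)
    return os.path.commonpath([real, root]) == root


def find_links_into(search_root, forbidden_root):
    """List symlinks at or below `search_root` that resolve into `forbidden_root`.

    Hypothetical helper: os.walk does not descend into symlinked
    directories by default, so each link is reported once.
    """
    hits = []
    if os.path.islink(search_root) and resolves_into(search_root, forbidden_root):
        hits.append(search_root)
    for dirpath, dirnames, filenames in os.walk(search_root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) and resolves_into(full, forbidden_root):
                hits.append(full)
    return sorted(hits)


if __name__ == "__main__":
    # Paths taken from the commit message; adjust if the image layout differs.
    for link in find_links_into("/usr/local/nvidia", "/usr/local/cuda"):
        print("symlink into the CUDA directory:", link)
```

An empty result would suggest that a driver mount at /usr/local/nvidia cannot shadow anything under /usr/local/cuda through these links.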

jlewi · 2018-03-07T16:44:23Z

https://stackoverflow.com/questions/42630762/how-to-verify-tensorflow-serving-is-using-gpus-on-a-gpu-instance
It suggests looking at the nvidia-smi output.
Some of these metrics should now be available via Stackdriver.
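A rough sketch of how the nvidia-smi check could be scripted, assuming nvidia-smi is on the PATH inside the serving container and the GPU reports both queried fields as integers. The function names and the `min_memory_mib` threshold are hypothetical, and "memory in use" is only a proxy for "the model server loaded onto the GPU".

```python
import subprocess

# Fields requested from nvidia-smi; both are standard --query-gpu properties.
QUERY_FIELDS = ["utilization.gpu", "memory.used"]


def parse_gpu_stats(csv_text):
    """Parse output of nvidia-smi with --format=csv,noheader,nounits.

    Each line looks like "37, 10240" (utilization in %, memory in MiB).
    Assumption: values are integers; a GPU reporting "[N/A]" would need
    extra handling that this sketch does not include.
    """
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append({"utilization_pct": int(util), "memory_used_mib": int(mem)})
    return stats


def gpu_in_use(stats, min_memory_mib=100):
    """Return True if any GPU has at least `min_memory_mib` MiB allocated.

    The default threshold is an arbitrary illustrative value.
    """
    return any(s["memory_used_mib"] >= min_memory_mib for s in stats)


def query_gpu_stats():
    """Run nvidia-smi and return the parsed per-GPU stats."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_gpu_stats(out)
```

In a test, one could call `gpu_in_use(query_gpu_stats())` after sending a prediction request and fail the step if it returns False.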

jlewi · 2018-03-07T23:56:54Z

@lluunn This is blocked on #292 and the changes I requested in #383 to pass in a list of parameters to set on the ksonnet component.

Once those are fixed do you want to pick this up? I think the next step would be adding appropriate steps to our E2E workflow to run the test using GPUs just like we do with CPUs.

lluunn · 2018-03-09T18:32:25Z

I am changing the cluster to kubeflow-ci, which has a GPU pool.
kubeflow/testing#18

jlewi · 2018-03-19T12:39:08Z

@lluunn Any update on the E2E test?

lluunn · 2018-03-19T17:31:29Z

WIP #442

jlewi added the area/inference label Feb 24, 2018

jlewi assigned lluunn Feb 24, 2018

This was referenced Feb 24, 2018

Add GPU Support for k8s-model-server on Kubeflow #194

Closed

feat(model-server): add Dockerfile of model-server with gpu support #210

Merged

jlewi assigned jlewi and unassigned lluunn Mar 5, 2018

jlewi added the priority/p2 label Mar 6, 2018

jlewi mentioned this issue Mar 6, 2018

Create a GPU model deployment to use for E2E testing of serving with GPUs #362

Merged

jlewi added priority/p1 and removed priority/p2 labels Mar 7, 2018

lluunn assigned lluunn and unassigned jlewi Mar 9, 2018

lluunn mentioned this issue Mar 16, 2018

E2e test for TF serving with GPU #442

Merged

k8s-ci-robot closed this as completed in #442 Mar 22, 2018

elenzio9 pushed a commit to arrikto/kubeflow that referenced this issue Oct 31, 2022

Add in alexlatchford to Kubeflow contributors (kubeflow#291)

d212eaf
