Add support for requesting GPUs #509

tkanng · 2019-06-05T09:19:20Z

Hi, this PR adds support for requesting NVIDIA GPUs, as requested in #426.

This may be useful for ML workloads. I'm not sure it's complete yet, so please review it carefully.

Thanks!

googlebot · 2019-06-05T09:19:23Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

tkanng · 2019-06-05T09:22:49Z

I signed it!

googlebot · 2019-06-05T09:22:52Z

CLAs look good, thanks!

liyinan926 · 2019-06-06T15:04:38Z

pkg/apis/sparkoperator.k8s.io/v1beta1/types.go

@@ -346,6 +346,9 @@ type SparkPodSpec struct {
 	// MemoryOverhead is the amount of off-heap memory to allocate in cluster mode, in MiB unless otherwise specified.
 	// Optional.
 	MemoryOverhead *string `json:"memoryOverhead,omitempty"`
+	// GPU is the number of nvidia.com/gpu to request for the pod
+	// Optional.
+	GPU *int64 `json:"gpu,omitempty"`


We can make a struct type that looks like the following to support all GPU types from all vendors:

type GPUSpec struct {
	Name     string `json:"name"`
	Quantity int64  `json:"quantity"`
}

Then this field becomes:

GPU *GPUSpec `json:"gpu,omitempty"`
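A minimal, stdlib-only sketch of the proposed type, assuming a hypothetical trimmed-down `podSpec` stand-in for `SparkPodSpec` (only the fields needed here; the real struct has many more). Because `GPU` is a pointer, a spec with no `gpu` key decodes to `nil`, which the webhook can read as "no GPU requested", and `omitempty` drops the field again when the spec is marshaled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GPUSpec follows the shape suggested above: a vendor-specific extended
// resource name plus the number of devices to request.
type GPUSpec struct {
	// Name is the GPU resource name, e.g. nvidia.com/gpu or amd.com/gpu.
	Name string `json:"name"`
	// Quantity is the number of GPUs to request.
	Quantity int64 `json:"quantity"`
}

// podSpec is a hypothetical, trimmed-down stand-in for SparkPodSpec that
// keeps only the fields this example needs.
type podSpec struct {
	Memory *string  `json:"memory,omitempty"`
	GPU    *GPUSpec `json:"gpu,omitempty"`
}

// decodePodSpec parses a JSON pod spec. Because GPU is a pointer, a spec
// without a "gpu" key leaves it nil, meaning "no GPU requested".
func decodePodSpec(data string) (podSpec, error) {
	var spec podSpec
	err := json.Unmarshal([]byte(data), &spec)
	return spec, err
}

func main() {
	withGPU, err := decodePodSpec(`{"memory": "512m", "gpu": {"name": "nvidia.com/gpu", "quantity": 1}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(withGPU.GPU.Name, withGPU.GPU.Quantity) // nvidia.com/gpu 1

	withoutGPU, err := decodePodSpec(`{"memory": "512m"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(withoutGPU.GPU == nil) // true
}
```

Using a name/quantity pair instead of a bare `int64` lets the same field carry `amd.com/gpu` or any other vendor's device-plugin resource without another API change.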

liyinan926 · 2019-06-06T15:06:03Z

pkg/webhook/patch_test.go

+		},
+	}
+	tests := []testcase{
+		{nil,


nil should be on a new line.

tkanng · 2019-06-06T17:19:39Z

Thank you for your patience! The latest commit adds the GPUSpec type and refines the corresponding unit tests. Users can specify the gpu field like this:

  executor:
    # cores: 1
    instances: 1
    # memory: "512m"
    gpu:
      name: example.com/gpu
      quantity: 1
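To show what a `gpu` block like the one above turns into, here is a stdlib-only sketch of how a mutating webhook could express it as a JSON Patch operation on a container's `resources.limits`. The helper names (`gpuPatch`, `escapeJSONPointer`) and the `hasLimits` input are hypothetical, not the operator's actual `patch.go`. One concrete detail worth noting: the `/` in a resource name such as `nvidia.com/gpu` must be escaped as `~1` inside a JSON Pointer path (RFC 6901).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// GPUSpec mirrors the type added in this PR.
type GPUSpec struct {
	Name     string `json:"name"`
	Quantity int64  `json:"quantity"`
}

// patchOperation is a single JSON Patch (RFC 6902) operation.
type patchOperation struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// escapeJSONPointer escapes a map key for a JSON Pointer path:
// "~" becomes "~0" and "/" becomes "~1" (nvidia.com/gpu -> nvidia.com~1gpu).
func escapeJSONPointer(key string) string {
	return strings.NewReplacer("~", "~0", "/", "~1").Replace(key)
}

// gpuPatch (hypothetical) returns the patch requesting gpu on container i,
// or nil if the spec is missing or invalid. hasLimits reports whether the
// container already has a resources.limits map.
func gpuPatch(gpu *GPUSpec, i int, hasLimits bool) *patchOperation {
	if gpu == nil || gpu.Name == "" || gpu.Quantity <= 0 {
		return nil
	}
	qty := strconv.FormatInt(gpu.Quantity, 10) // quantities serialized as strings
	base := fmt.Sprintf("/spec/containers/%d/resources/limits", i)
	if !hasLimits {
		// No limits map yet: create it with the GPU entry in one operation.
		return &patchOperation{Op: "add", Path: base, Value: map[string]string{gpu.Name: qty}}
	}
	// Limits already exist: add only the GPU key so CPU/memory limits survive.
	return &patchOperation{Op: "add", Path: base + "/" + escapeJSONPointer(gpu.Name), Value: qty}
}

func main() {
	gpu := &GPUSpec{Name: "nvidia.com/gpu", Quantity: 1}
	for _, hasLimits := range []bool{false, true} {
		out, err := json.Marshal(gpuPatch(gpu, 0, hasLimits))
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
	// {"op":"add","path":"/spec/containers/0/resources/limits","value":{"nvidia.com/gpu":"1"}}
	// {"op":"add","path":"/spec/containers/0/resources/limits/nvidia.com~1gpu","value":"1"}
}
```

Adding only the single key when a limits map already exists avoids overwriting other limits, and validating `Name` and `Quantity` up front means an incomplete `gpu` block yields no patch rather than an invalid pod spec.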

liyinan926

Looks good to me overall. Left a couple of minor comments. Please add a section to the user guide on how to specify and use GPUs.

liyinan926 · 2019-06-06T19:03:07Z

pkg/webhook/patch.go

@@ -18,6 +18,8 @@ package webhook

 import (
 	"fmt"
+	"k8s.io/apimachinery/pkg/api/resource"


Nit: k8s.io imports should go immediately under built-in ones.

liyinan926 · 2019-06-06T19:04:19Z

pkg/webhook/patch_test.go

@@ -676,6 +678,199 @@ func TestPatchSparkPod_Sidecars(t *testing.T) {
 	assert.Equal(t, "sidecar2", modifiedExecutorPod.Spec.Containers[2].Name)
 }

+func TestPatchSparkPod_GPU(t *testing.T) {
+


Nit: empty line can be removed.

liyinan926 · 2019-06-06T19:05:32Z

pkg/apis/sparkoperator.k8s.io/v1beta1/types.go

+type GPUSpec struct {
+	// Name is GPU resource name, such as: nvidia.com/gpu or amd.com/gpu
+	Name string `json:"name"`
+	// Quantity is the number of GPU to request for driver or executor.


number of GPUs.

tkanng · 2019-06-07T02:17:27Z

Hi, the latest commit adds a user guide section on how to use GPUs and refines the code format. :)

liyinan926

LGTM. Thanks!

gyj0825 · 2022-06-28T08:51:54Z

How can I add multiple GPU resource parameters to support Bitfusion? For example:

limits:
  bitfusion.io/gpu-amount: 2
  bitfusion.io/gpu-percent: 50

Thanks.

tkanng added 2 commits June 5, 2019 17:06

Add support for requesting GPUs

a87c021

refine unit test

8735402

liyinan926 reviewed Jun 6, 2019

Add GPU spec and refine unit test

c819490

liyinan926 reviewed Jun 6, 2019

tkanng added 2 commits June 7, 2019 10:06

Add user guide about GPU and refine code format

6c3675b

fix typo

0752b30

liyinan926 approved these changes Jun 7, 2019

liyinan926 merged commit fbdd41b into kubeflow:master Jun 7, 2019

tkanng deleted the gpu-support branch June 7, 2019 06:23