Add support for requesting GPUs #509

Merged
merged 5 commits into from
Jun 7, 2019

Conversation

tkanng (Contributor) commented Jun 5, 2019

Hi, this PR adds support for requesting NVIDIA GPUs, per #426.

This should be useful for ML workloads. I'm not sure the implementation is good enough yet, so please review it.

Thanks!

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


tkanng (Contributor Author) commented Jun 5, 2019

I signed it!

@googlebot

CLAs look good, thanks!

@@ -346,6 +346,9 @@ type SparkPodSpec struct {
// MemoryOverhead is the amount of off-heap memory to allocate in cluster mode, in MiB unless otherwise specified.
// Optional.
MemoryOverhead *string `json:"memoryOverhead,omitempty"`
// GPU is the number of nvidia.com/gpu to request for the pod
// Optional.
GPU *int64 `json:"gpu,omitempty"`
Collaborator

We can make a struct type that looks like the following to support all GPU types from all vendors:

type GPUSpec struct {
    Name string `json:"name"`
    Quantity int64 `json:"quantity"`
}

Then this field becomes:

GPU *GPUSpec `json:"gpu,omitempty"`
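
To illustrate how the suggested shape behaves on the wire, here is a minimal, self-contained Go sketch. `GPUSpec` is taken from the comment above; `podSpec` and `encode` are hypothetical stand-ins introduced only for this example and are not part of the operator's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GPUSpec mirrors the struct proposed in the review comment above.
type GPUSpec struct {
	Name     string `json:"name"`
	Quantity int64  `json:"quantity"`
}

// podSpec is a hypothetical, trimmed-down stand-in for SparkPodSpec
// that carries only the optional GPU field.
type podSpec struct {
	GPU *GPUSpec `json:"gpu,omitempty"`
}

// encode marshals a podSpec to its JSON form.
func encode(p podSpec) string {
	b, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A nil pointer combined with omitempty drops the field entirely.
	fmt.Println(encode(podSpec{})) // prints {}

	// A populated spec names the vendor resource and the count.
	fmt.Println(encode(podSpec{GPU: &GPUSpec{Name: "nvidia.com/gpu", Quantity: 1}}))
	// prints {"gpu":{"name":"nvidia.com/gpu","quantity":1}}
}
```

Because the field is a pointer with `omitempty`, specs that do not request a GPU serialize exactly as they did before the field existed.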

},
}
tests := []testcase{
{nil,
Collaborator

nil should be on a new line.

tkanng (Contributor Author) commented Jun 6, 2019

Thank you for your patience! The latest commit adds a GPUSpec type and refines the corresponding unit tests. Users can specify the gpu field like this:

  executor:
    # cores: 1
    instances: 1
    # memory: "512m"
    gpu:
      name: example.com/gpu
      quantity: 1
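
As a rough illustration of how a mutating webhook could turn such a `gpu` entry into a container resource limit, the sketch below builds a single JSON Patch (RFC 6902) operation. It is an assumption-laden example rather than the code in this PR: `patchOperation` and `gpuLimitPatch` are hypothetical names, and a plain `map[string]string` stands in for the Kubernetes resource list type so the example needs only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// GPUSpec mirrors the type discussed in this PR.
type GPUSpec struct {
	Name     string `json:"name"`
	Quantity int64  `json:"quantity"`
}

// patchOperation is a hypothetical stand-in for one JSON Patch operation.
type patchOperation struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// gpuLimitPatch returns an "add" operation that sets the resource limits of
// the container at containerIndex, or nil when no usable GPU spec is given.
func gpuLimitPatch(containerIndex int, gpu *GPUSpec) *patchOperation {
	if gpu == nil || gpu.Name == "" || gpu.Quantity <= 0 {
		return nil
	}
	return &patchOperation{
		Op:   "add",
		Path: fmt.Sprintf("/spec/containers/%d/resources/limits", containerIndex),
		// Passing the whole limits object as the value avoids having to
		// escape the "/" in the resource name as "~1" inside the path.
		Value: map[string]string{gpu.Name: strconv.FormatInt(gpu.Quantity, 10)},
	}
}

func main() {
	op := gpuLimitPatch(0, &GPUSpec{Name: "example.com/gpu", Quantity: 1})
	b, err := json.Marshal(op)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// prints {"op":"add","path":"/spec/containers/0/resources/limits","value":{"example.com/gpu":"1"}}
}
```

Note that an "add" on an existing `limits` object replaces it wholesale, so a real implementation would need to merge the GPU entry with any limits already set on the container.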

liyinan926 (Collaborator) left a comment

Looks good to me overall. I left a couple of minor comments. Please add a section to the user guide on how to specify and use GPUs.

@@ -18,6 +18,8 @@ package webhook

import (
"fmt"
"k8s.io/apimachinery/pkg/api/resource"
Collaborator

Nit: k8s.io imports should go immediately under built-in ones.

@@ -676,6 +678,199 @@ func TestPatchSparkPod_Sidecars(t *testing.T) {
assert.Equal(t, "sidecar2", modifiedExecutorPod.Spec.Containers[2].Name)
}

func TestPatchSparkPod_GPU(t *testing.T) {

Collaborator

Nit: empty line can be removed.

type GPUSpec struct {
// Name is GPU resource name, such as: nvidia.com/gpu or amd.com/gpu
Name string `json:"name"`
// Quantity is the number of GPU to request for driver or executor.
Collaborator

number of GPUs.

tkanng (Contributor Author) commented Jun 7, 2019

Hi, the latest commit adds a user guide section on how to use GPUs and cleans up the code formatting. :)

liyinan926 (Collaborator) left a comment

LGTM. Thanks!

liyinan926 merged commit fbdd41b into kubeflow:master on Jun 7, 2019
tkanng deleted the gpu-support branch on June 7, 2019 06:23
gyj0825 commented Jun 28, 2022

How can I add multiple GPU parameters to support Bitfusion? For example:
limits:
  bitfusion.io/gpu-amount: 2
  bitfusion.io/gpu-percent: 50
Thanks.
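
The GPUSpec type in this PR carries a single name and quantity, so the limits above do not map onto it directly. Purely as a hypothetical illustration of what a multi-resource variant could look like (this is not an existing operator API; `ResourceSpec` and `limitsFrom` are invented for the example), a list of name/quantity pairs could be folded into one limits map:

```go
package main

import (
	"fmt"
	"strconv"
)

// ResourceSpec is a hypothetical generalisation of GPUSpec: one named
// extended resource and the quantity to request.
type ResourceSpec struct {
	Name     string
	Quantity int64
}

// limitsFrom folds several resource specs into a single limits map,
// rejecting empty names, non-positive quantities, and duplicates.
func limitsFrom(specs []ResourceSpec) (map[string]string, error) {
	limits := make(map[string]string, len(specs))
	for _, s := range specs {
		if s.Name == "" || s.Quantity <= 0 {
			return nil, fmt.Errorf("invalid resource spec %+v", s)
		}
		if _, dup := limits[s.Name]; dup {
			return nil, fmt.Errorf("duplicate resource %q", s.Name)
		}
		limits[s.Name] = strconv.FormatInt(s.Quantity, 10)
	}
	return limits, nil
}

func main() {
	limits, err := limitsFrom([]ResourceSpec{
		{Name: "bitfusion.io/gpu-amount", Quantity: 2},
		{Name: "bitfusion.io/gpu-percent", Quantity: 50},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(limits)
}
```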

4 participants