Add support for requesting GPUs #509
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here.
I signed it!
CLAs look good, thanks!
@@ -346,6 +346,9 @@ type SparkPodSpec struct {
	// MemoryOverhead is the amount of off-heap memory to allocate in cluster mode, in MiB unless otherwise specified.
	// Optional.
	MemoryOverhead *string `json:"memoryOverhead,omitempty"`
	// GPU is the number of nvidia.com/gpu to request for the pod
	// Optional.
	GPU *int64 `json:"gpu,omitempty"`
We can make a struct type that looks like the following to support all GPU types from all vendors:
type GPUSpec struct {
	Name     string `json:"name"`
	Quantity int64  `json:"quantity"`
}
Then this field becomes:
GPU *GPUSpec `json:"gpu,omitempty"`
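As a rough illustration of how the proposed struct would serialize, the sketch below marshals a pod spec with and without the gpu field. PodSpec is a hypothetical, trimmed-down stand-in for SparkPodSpec, and encode is a helper invented for this example; only GPUSpec and the gpu field tag come from the suggestion above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GPUSpec mirrors the struct proposed in the review comment.
type GPUSpec struct {
	Name     string `json:"name"`
	Quantity int64  `json:"quantity"`
}

// PodSpec is a hypothetical stand-in for SparkPodSpec, reduced to two fields.
type PodSpec struct {
	MemoryOverhead *string  `json:"memoryOverhead,omitempty"`
	GPU            *GPUSpec `json:"gpu,omitempty"`
}

// encode marshals a PodSpec to JSON (helper assumed for this example).
func encode(p PodSpec) string {
	b, err := json.Marshal(p)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// With a nil pointer, omitempty drops the gpu key entirely.
	fmt.Println(encode(PodSpec{}))
	// With a value set, the vendor resource name travels with the quantity.
	fmt.Println(encode(PodSpec{GPU: &GPUSpec{Name: "nvidia.com/gpu", Quantity: 2}}))
}
```

Because the field is a pointer with omitempty, existing specs that never mention gpu keep serializing exactly as before.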
pkg/webhook/patch_test.go
Outdated
	},
}
tests := []testcase{
	{nil,
nil should be on a new line.
Thank you for your patience! The latest commit adds:
executor:
  # cores: 1
  instances: 1
  # memory: "512m"
  gpu:
    name: example.com/gpu
    quantity: 1
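One way to picture what happens to a gpu entry like this is a small sketch that turns a name and quantity into a resource-limits map. This is not the PR's webhook code: gpuLimits is a hypothetical helper, the plain map[string]int64 stands in for the Kubernetes resource list types, and the validation rules (non-empty name, positive quantity) are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// GPUSpec mirrors the name/quantity pair from the executor example.
type GPUSpec struct {
	Name     string
	Quantity int64
}

// gpuLimits is a hypothetical helper that converts a GPUSpec into a
// limits map keyed by resource name. A nil spec means no GPU was requested.
func gpuLimits(spec *GPUSpec) (map[string]int64, error) {
	if spec == nil {
		return nil, nil
	}
	// Assumed rule: the resource name (e.g. example.com/gpu) is required.
	if spec.Name == "" {
		return nil, errors.New("gpu name must not be empty")
	}
	// Assumed rule: requesting zero or negative GPUs is rejected.
	if spec.Quantity <= 0 {
		return nil, fmt.Errorf("gpu quantity must be positive, got %d", spec.Quantity)
	}
	return map[string]int64{spec.Name: spec.Quantity}, nil
}

func main() {
	limits, err := gpuLimits(&GPUSpec{Name: "example.com/gpu", Quantity: 1})
	if err != nil {
		fmt.Println("invalid gpu spec:", err)
		return
	}
	fmt.Println(limits)
}
```

Keying the map by the resource name is what lets a single field cover any vendor, which is the point of carrying the name alongside the quantity.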
Looks good to me overall. Left a couple of minor comments. Please add a section to the user guide on how to specify and use GPUs.
pkg/webhook/patch.go
Outdated
@@ -18,6 +18,8 @@ package webhook

import (
	"fmt"
	"k8s.io/apimachinery/pkg/api/resource"
Nit: k8s.io imports should go immediately under built-in ones.
pkg/webhook/patch_test.go
Outdated
@@ -676,6 +678,199 @@ func TestPatchSparkPod_Sidecars(t *testing.T) {
	assert.Equal(t, "sidecar2", modifiedExecutorPod.Spec.Containers[2].Name)
}

func TestPatchSparkPod_GPU(t *testing.T) {

Nit: empty line can be removed.
type GPUSpec struct {
	// Name is GPU resource name, such as: nvidia.com/gpu or amd.com/gpu
	Name string `json:"name"`
	// Quantity is the number of GPU to request for driver or executor.
number of GPUs.
Hi, the latest commit adds a user guide section on how to use GPUs and cleans up the code formatting. :)
LGTM. Thanks!
How can I add multiple GPU parameters to support Bitfusion, like
Hi, this PR adds support for requesting NVIDIA GPUs, as requested in #426.
It might be useful for ML programs. I'm not sure it's good enough, so please help me review it.
Thanks!