Skip to content
This repository has been archived by the owner on May 25, 2023. It is now read-only.

Support nvidia.com/gpu as a resource managed by kube-… #181

Merged
merged 2 commits into from
May 10, 2018

Conversation

mitake
Copy link
Contributor

@mitake mitake commented Mar 16, 2018

…batchd

This commit lets kube-batchd manage GPU resources
(nvidia.com/gpu) and allocate them for pods like other
resources (CPU and memory).

/cc @YujiOshima

@k8s-ci-robot
Copy link
Contributor

@mitake: GitHub didn't allow me to request PR reviews from the following users: YujiOshima.

Note that only kubernetes-incubator members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

…batchd

This commit lets kube-batchd manage GPU resources
(alpha.kubernetes.io/nvidia-gpu) and allocate them for pods like other
resources (CPU and memory).

This PR is a half-baked thing. It is not tested with a working cluster and kubeflow components (I'll do it next week). If I can have feedback from the maintainers or other developers who are already working on similar feature, it's really nice.

/cc @YujiOshima

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 16, 2018
}

const (
GPUResourceName = "alpha.kubernetes.io/nvidia-gpu"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GPU resource name has been changed from v1.8 onwards. ( https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#v18-onwards )
You should use nvidia.com/gpu for v1.8 and v1.9.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a useful const in the k8s api package: https://github.com/kubernetes/api/blob/master/core/v1/types.go#L4084
Probably using it would be the most straightforward way for the purpose?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After v1.8, k8s delegated the management of GPU to Nvidia device plugin.
So the resource name is defined in Nvidia device plugin. https://github.com/NVIDIA/k8s-device-plugin/blob/66a35b71ac4b5cbfb04714678b548bd77e5ba719/server.go#L20

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the const seems to be private. We need to have our own local copy in kube-batchd...

}
}
return r
}

func (r *Resource) IsEmpty() bool {
return r.MilliCPU < minMilliCPU && r.Memory < minMemory
return r.MilliCPU < minMilliCPU && r.Memory < minMemory && r.GPU == 0
Copy link
Contributor

@sedflix sedflix Mar 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if GPU is not present on any node in the cluster?

Copy link
Contributor Author

@mitake mitake Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pods won't be bond forever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the desired behavior? Will all pod require GPU?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the desired behavior?

Probably yes.

Will all pod require GPU?

It depends on workloads.

@sedflix
Copy link
Contributor

sedflix commented Mar 21, 2018

If this PR is alright, I could do the the same for quota-alloc.

@k82cn
Copy link
Contributor

k82cn commented Mar 21, 2018

@geekSiddharth , please hold the enhancement to kube-quotalloc; we'd like to make kube-batchd ready firstly :)

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 28, 2018
@mitake
Copy link
Contributor Author

mitake commented Mar 28, 2018

@k82cn I tested this PR with the test command of tf-operator (added in this PR: kubeflow/training-operator#509), and it seems to be working well for now. Could you take a look?

@mitake mitake changed the title WIP, DO NOT MERGE: Support alpha.kubernetes.io/nvidia-gpu as a resource managed by kube-… Support nvidia.com/gpu as a resource managed by kube-… Mar 28, 2018
}

func (r *Resource) LessEqual(rr *Resource) bool {
return (r.MilliCPU < rr.MilliCPU || math.Abs(rr.MilliCPU-r.MilliCPU) < 0.01) &&
(r.Memory < rr.Memory || math.Abs(rr.Memory-r.Memory) < 1)
(r.Memory < rr.Memory || math.Abs(rr.Memory-r.Memory) < 1) &&
(r.GPU < rr.GPU || rr.GPU == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mitake I think it should be (r.GPU < rr.GPU || r.GPU == rr.GPU) :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean r.GPU <= rr.GPU?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we use math.Abs as == does not work for float :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'll fix it later. Thanks for pointing out.

@k82cn
Copy link
Contributor

k82cn commented Apr 19, 2018

@mitake, thanks for your contribution; it'll be better to have a UT for GPU in drf_test.go.

@mitake
Copy link
Contributor Author

mitake commented Apr 20, 2018

@k82cn ok, I'll add the UT.

This commit lets kube-batchd manage GPU resources (nvidia.com/gpu) and
allocate them for pods like other resources (CPU and memory).
@k82cn k82cn merged commit 0593b2d into kubernetes-retired:master May 10, 2018
@k82cn
Copy link
Contributor

k82cn commented May 10, 2018

merge this firstly, please add UT later :)

@mitake
Copy link
Contributor Author

mitake commented May 11, 2018

@k82cn thanks, I'll add the UT later

@mitake mitake deleted the gpu branch May 11, 2018 06:46
mitake added a commit to mitake/kube-arbitrator that referenced this pull request May 29, 2018
kevin-wangzefeng pushed a commit to kevin-wangzefeng/scheduler that referenced this pull request Jun 28, 2019
Support nvidia.com/gpu as a resource managed by kube-…
kevin-wangzefeng pushed a commit to kevin-wangzefeng/scheduler that referenced this pull request Jun 28, 2019
kevin-wangzefeng pushed a commit to kevin-wangzefeng/scheduler that referenced this pull request Jun 28, 2019
Support nvidia.com/gpu as a resource managed by kube-…
kevin-wangzefeng pushed a commit to kevin-wangzefeng/scheduler that referenced this pull request Jun 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants