Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix nvidia-gpu with kvm-driver #13972

Merged
merged 2 commits into from
Apr 28, 2022
Merged

Conversation

betaboon
Copy link
Contributor

hello,

took me a while to figure this out, but this fixes kvm-nvidia-gpu support for me.

this might also resolve #6912

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Apr 18, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Apr 18, 2022
@k8s-ci-robot
Copy link
Contributor

Welcome @betaboon!

It looks like this is your first PR to kubernetes/minikube 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/minikube has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 18, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @betaboon. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 18, 2022
@minikube-bot
Copy link
Collaborator

Can one of the admins verify this patch?

Copy link
Member

@medyagh medyagh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you ver much for fixing this !

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: betaboon, medyagh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 18, 2022
@medyagh
Copy link
Member

medyagh commented Apr 18, 2022

betaboon thank you for fixing this, do u mind signing the CLA

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 18, 2022
@betaboon
Copy link
Contributor Author

just did.

FYI:
this only seems to work for Latest Legacy GPU version (390.xx series): [390.147]

it doesnt work for either of:

Latest Production Branch Version: 510.60.02
Latest New Feature Branch Version: [495.46]

this it seems like i can only get cuda9 here.

I'm tinkering around with getting 510 working.

if i figure that out, I'll send it in another PR tho ;)

@medyagh
Copy link
Member

medyagh commented Apr 18, 2022

just did.

FYI: this only seems to work for Latest Legacy GPU version (390.xx series): [390.147]

it doesnt work for either of:

Latest Production Branch Version: 510.60.02
Latest New Feature Branch Version: [495.46]

this it seems like i can only get cuda9 here.

I'm tinkering around with getting 510 working.

if i figure that out, I'll send it in another PR tho ;)

Thank you for figuring it out, do u mind adding it our docs ?
https://minikube.sigs.k8s.io/docs/tutorials/nvidia_gpu/

and mention it currently been known to work with that specific series

I would happily review your PR if you decide to make it work with newer series,

btw I am curious what is the use case of using GPU for you if you like to share

@betaboon
Copy link
Contributor Author

i figured out what is preventing newer drivers to work.
but I'm not yet sure how to fix this, as i am lacking the knowledge about buildroot.

here's the gist:

  • current nvidia-drivers require Module.symvers to be present in the kernel-source directory.
  • for Module.symvers to be build CONFIG_MODVERSIONS=y is required

here's what i tried (which lead to partial success):

  • enable CONFIG_MODVERSIONS in linux_defconfig
  • build and run iso (Module.symvers is not available in buildroot/output/build/linux-4.19.202, but NOT in the resulting iso)
  • copy Module.symvers into vm using minikube cp
  • change nvidia-driver-installer-daemonset to make Module.symver available in /usr/src/linux-4.19.202
  • apply the nvidia-driver-installer-daemonset
  • profit (aka driver install works with 510.60.02)

two possible solutions:

  • figure out how to include Module.symvers in the iso
  • change the entrypoint-script of the nvidia-driver-installer to run make modules which should recreate it

@spowelljr
Copy link
Member

Not sure if this is the route you want to go, but if you want an example of how to copy a file into the ISO you can look at #12081.

If you look at buildkit-bin.mk you can see how files from the repo are copied into the ISO.

@betaboon betaboon marked this pull request as draft April 19, 2022 20:12
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 19, 2022
@betaboon
Copy link
Contributor Author

ok i converted this PR to a draft for now.

@spowelljr thanks for the pointers.

i just pushed a commit that fixes it for the latest nvidia-drivers.
there is a hack in there tho.

this is what it does:

  • change the kernel-config to enable CONFIG_MODVERSIONS so that Module.symversgets generated
  • extend the buildroot linux-target to include Module.symvers in /lib/modules/...
  • change the command of the nvidia-driver-installer-daemonset to monkeypatch /entrypoint.sh (this is the hack)

the hack could and should be removed, but for that the entrypoint.sh in the upstream-package should has to be adapted.

@betaboon
Copy link
Contributor Author

@medyagh just fyi there is something to review here now :) (dont want to hurry you tho )

@betaboon betaboon marked this pull request as ready for review April 26, 2022 08:23
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 26, 2022
@k8s-ci-robot k8s-ci-robot requested a review from klaases April 26, 2022 08:23
@k8s-ci-robot k8s-ci-robot requested a review from spowelljr April 26, 2022 08:23
@medyagh medyagh merged commit 4187e9e into kubernetes:master Apr 28, 2022
@medyagh
Copy link
Member

medyagh commented Apr 28, 2022

thank you betaboon I look forward to see more contributions from you !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nvidia-driver-installer addon fails to start (driver fails to install in the container)
5 participants