enable gpu fails on 20.04 (focal) and MicroK8s 1.21 - failed to get sandbox runtime: no runtime for "nvidia" is configured #3226
Comments
I think the previously opened issue #2575 might have the same root cause, but at that point the solution was to use 1.22, which doesn't support Kubeflow in this case.
Facing the same issue. Is there any workaround for 1.21? I can't upgrade to 1.22 because Kubeflow is not supported there.
@ktsakalozos, is there any workaround available for 1.21?
The NVIDIA operator will not deploy cleanly on 1.21. One of the reasons I am aware of is that the containerd daemon is configured to be of type simple [1]. Considering how close 1.21 is to end of support and the risk involved in changing the daemon's behavior, it is unlikely that we will address this issue.
Ah, okay. You are talking about this change.
Yes, this is the top-level patch.
Okay, at least the theory has been confirmed; the patchset below can make the GPU enablement step work.
@nobuto-m, can you please help me with how to use this patch? I am not able to modify files inside the snap directory.
You can grab a custom-built snap here (only for testing):
The build is failing with the error below:
ubuntu@dasec-node2:~/microk8s-master$ sudo snapcraft
@debimishra89 Note that the GPU addon is broken in 1.21. MicroK8s 1.21 is also out of support, so we have no immediate plans to change it. However, it is possible to enable GPU support using the host drivers and container runtime. On a fresh installation of MicroK8s 1.21, you can follow the instructions below (also in this gist):
PS. We are in the process of updating the documentation page accordingly.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
A documentation note has been added at https://microk8s.io/docs/addon-gpu#microk8s-121-12, closing the issue.
Summary
Now that the operator version has been bumped by canonical/microk8s-core-addons#44, I tested the equivalent change against MicroK8s 1.21, but it's still failing. 1.21 is still required for the Kubeflow use case, and NVIDIA's documentation states that K8s 1.21 is supported with GPU Operator release 1.10.
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/platform-support.html#kubernetes-platforms
What Should Happen Instead?
All operator related pods are up and running.
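One way to check this expectation mechanically is to scan the pod list for any status other than Running or Completed. The following is only a sketch: the helper name `all_pods_healthy` is hypothetical, and the namespace in the usage comment is an assumption that may differ in your deployment.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every pod line read from stdin reports a healthy status.
# Expects `kubectl get pods --no-headers` output: NAME READY STATUS RESTARTS AGE.
# Only the STATUS column is inspected; the READY column is not.
all_pods_healthy() {
  awk '
    NF == 0 { next }                                  # skip blank lines
    { seen = 1 }                                      # at least one pod listed
    $3 != "Running" && $3 != "Completed" { bad = 1 }  # any other status fails
    END { exit (seen && !bad) ? 0 : 1 }               # empty input also fails
  '
}

# Usage (the namespace is an assumption; adjust to where the operator pods run):
#   microk8s kubectl get pods -n gpu-operator-resources --no-headers | all_pods_healthy \
#     && echo "operator pods healthy" || echo "operator pods not ready"
```

An empty pod list is treated as a failure on purpose, so a deployment that never created any pods is not reported as healthy.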
Reproduction Steps
snap run --shell microk8s
bash -eux ./enable.gpu.sh
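The error in the title ("no runtime for "nvidia" is configured") suggests that containerd has no CRI runtime entry with that exact name. A minimal sketch for checking this after running the steps above follows; the helper name `has_containerd_runtime` is hypothetical, and the config path in the usage comment is an assumption about the MicroK8s snap layout.

```shell
#!/usr/bin/env bash
# Sketch: report whether a containerd TOML config declares a CRI runtime
# with the given name. Arguments: <runtime-name> <config-file>.
has_containerd_runtime() {
  local name="$1" config="$2"
  # Drop comment lines, then look for a table header ending in
  # `.containerd.runtimes.<name>]` (fixed-string match, so a longer name
  # such as `nvidia-container-runtime` does not count as `nvidia`).
  grep -v '^[[:space:]]*#' "$config" | grep -Fq -- ".containerd.runtimes.${name}]"
}

# Usage (the path is an assumption; adjust to your installation):
#   has_containerd_runtime nvidia /var/snap/microk8s/current/args/containerd-template.toml \
#     && echo "runtime 'nvidia' is configured" || echo "no runtime named 'nvidia'"
```

The match is deliberately exact on the runtime name, because a similarly named entry would not satisfy a request for the "nvidia" handler.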
Introspection Report
inspection-report-20220608_072258.tar.gz
Can you suggest a fix?
Are you interested in contributing with a fix?