packer: Enable support for adding nvidia gpu to podvm image
NVIDIA GPU support is enabled by default when building a Packer-based image for AWS and Azure.

A default build of the podvm image on AWS took about 10 minutes:

```
==> Wait completed after 10 minutes 16 seconds
==> Builds finished. The artifacts of successful builds are:
--> peer-pod-ubuntu.amazon-ebs.ubuntu: AMIs were created:
us-east-2: ami-0463ae5aa8d5b3606
rm -fr toupload

real 10m34.352s
user 0m18.919s
sys 0m10.044s
```

To disable GPU support, build with ENABLE_NVIDIA_GPU=no. For example:

    cd azure/image
    PODVM_DISTRO=ubuntu ENABLE_NVIDIA_GPU=no make image

With the addon enabled, the qcow2/setup_addons.sh script executes addons/nvidia_gpu/setup.sh to set up the NVIDIA drivers, libraries, and prestart hook in the podvm image.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Showing 14 changed files with 342 additions and 1 deletion.
@@ -0,0 +1,7 @@
## Introduction

The `addons` directory is used to enable different addons for the podvm image.
Each addon and its associated files (binaries, configuration, etc.) should be placed
in its own sub-directory under `addons`.

Each addon sub-directory must contain a `setup.sh` script that sets up the addon.
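
The convention above means a driver script can discover addons by walking the sub-directories of `addons` and running each `setup.sh`. The following is a minimal sketch of that idea, not the actual `qcow2/setup_addons.sh`. The function name `run_addons` and the choice to skip directories without a `setup.sh` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run setup.sh for every addon sub-directory.
# run_addons and its behavior are illustrative, not taken from the repository.

run_addons() {
    local addons_dir="$1"
    local addon_dir

    for addon_dir in "${addons_dir}"/*/; do
        # Skip when the glob matched nothing or the entry is not a directory.
        [ -d "${addon_dir}" ] || continue

        if [ -f "${addon_dir}setup.sh" ]; then
            echo "Setting up addon: $(basename "${addon_dir}")"
            # Run in a subshell so a cd inside setup.sh does not leak out.
            (cd "${addon_dir}" && bash ./setup.sh) || return 1
        else
            echo "Skipping $(basename "${addon_dir}"): no setup.sh" >&2
        fi
    done
}
```

A call such as `run_addons ./addons` would then set up every addon that follows the layout described above and stop at the first failing `setup.sh`.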
@@ -0,0 +1,76 @@
## Introduction

This addon enables NVIDIA GPU support in the podvm image.

You need to specify the GPU instance types in the cloud-api-adaptor ConfigMap (`peer-pods-cm`).

Here is an example. Adjust the values for your provider and region:

```
# For AWS
PODVM_INSTANCE_TYPES: "t3.small,c5.xlarge,p3.2xlarge"
# For Azure
AZURE_INSTANCE_SIZES: "Standard_D8as_v5,Standard_D4as_v5,Standard_NC6s_v3,Standard_NC4as_T4_v3"
```

Example pod definition:

```
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  labels:
    app: test
  annotations:
    io.katacontainers.config.hypervisor.machine_type: Standard_NC4as_T4_v3
    io.containerd.cri.runtime-handler: kata-remote
spec:
  runtimeClassName: kata-remote
  containers:
    - name: ubuntu
      image: ubuntu
      command: ["sleep"]
      args: ["infinity"]
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "all"
```
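
The example pod requests `Standard_NC4as_T4_v3`, which also appears in the example `AZURE_INSTANCE_SIZES` list. Assuming the requested machine type has to be one of the configured types, a quick sanity check before creating the pod might look like the sketch below. The helper `instance_type_listed` is hypothetical and not part of cloud-api-adaptor.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a machine type appears in a
# comma-separated list such as PODVM_INSTANCE_TYPES or AZURE_INSTANCE_SIZES.

instance_type_listed() {
    local wanted="$1"
    local list="$2"
    # Wrap both sides in commas so only whole entries match.
    case ",${list}," in
        *",${wanted},"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Values copied from the example configuration above.
AZURE_INSTANCE_SIZES="Standard_D8as_v5,Standard_D4as_v5,Standard_NC6s_v3,Standard_NC4as_T4_v3"

if instance_type_listed "Standard_NC4as_T4_v3" "${AZURE_INSTANCE_SIZES}"; then
    echo "machine_type is listed"
else
    echo "machine_type is NOT listed" >&2
fi
```

Matching on `,name,` rather than a plain substring avoids false positives, for example treating `p3` as present because `p3.2xlarge` is listed.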

You can verify the GPU devices by opening a shell in the pod, as shown below:

```
$ kubectl exec -it gpu-test -- bash
root@gpu-test:/# nvidia-smi
Thu Nov 23 17:30:58 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000001:00:00.0 Off |                  Off |
| N/A   36C    P8               9W /  70W |      2MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
root@gpu-test:/# nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-2b9a9945-a56c-fcf3-7156-8e380cf1d0cc)
root@gpu-test:/# ls -l /dev/nvidia*
crw-rw-rw- 1 root root 235,   0 Nov 23 17:27 /dev/nvidia-uvm
crw-rw-rw- 1 root root 235,   1 Nov 23 17:27 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Nov 23 17:27 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Nov 23 17:27 /dev/nvidiactl
```
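
For scripted checks, the `nvidia-smi -L` output shown above can be reduced to a GPU count. The sketch below assumes each GPU is reported on a line of the form `GPU <index>: ...`, as in the sample output. The helper name `count_gpus` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count GPUs from `nvidia-smi -L` style output on stdin.

count_gpus() {
    # Each GPU is reported on a line starting with "GPU <index>:".
    # grep -c prints 0 and exits non-zero when nothing matches, so the
    # trailing "|| true" keeps the function's exit status at 0.
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Example (requires a running gpu-test pod):
#   kubectl exec gpu-test -- nvidia-smi -L | count_gpus
```

Piping the command through `count_gpus` prints the number of matching lines, which a CI job could compare against the number of GPUs expected for the chosen instance type.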