Mixtral 8x7B on 4xA10G GPUs (WIP)

This demo shows how to deploy the Mixtral 8x7B model on 4xA10G GPUs.

Usage

  • Deploy the Mixtral 8x7B model on 4xA10G GPUs:
kubectl apply -k llm-servers/overlays/mixtral-8x7B
  • Remember to add your HUGGING_FACE_HUB_TOKEN to the environment variables so that the model can be downloaded from the Hugging Face Hub.

  • Check that the LLM is running properly:

kubectl get pod -n multi-gpu-poc
NAME                   READY   STATUS    RESTARTS   AGE
llm1-f687846b9-68bvq   1/1     Running   0          2m1s
  • Check the logs of the LLM Pod:
kubectl logs -n multi-gpu-poc -l app=llm1
  • Check the NVIDIA GPU consumption:
POD_NAME=$(kubectl get pod -n nvidia-gpu-operator -l app=nvidia-device-plugin-daemonset -o jsonpath="{.items[0].metadata.name}")
kubectl exec -n nvidia-gpu-operator $POD_NAME -- nvidia-smi
  • The output should be similar to the following (this sample was captured on a node with eight A10G GPUs):
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10G                    On  |   00000000:00:16.0 Off |                    0 |
|  0%   29C    P0             70W /  300W |   10364MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A10G                    On  |   00000000:00:17.0 Off |                    0 |
|  0%   28C    P0             68W /  300W |   10364MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A10G                    On  |   00000000:00:18.0 Off |                    0 |
|  0%   27C    P0             68W /  300W |   10364MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A10G                    On  |   00000000:00:19.0 Off |                    0 |
|  0%   28C    P0             68W /  300W |   10364MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A10G                    On  |   00000000:00:1A.0 Off |                    0 |
|  0%   28C    P0             68W /  300W |   10364MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A10G                    On  |   00000000:00:1B.0 Off |                    0 |
|  0%   27C    P0             69W /  300W |   10364MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A10G                    On  |   00000000:00:1C.0 Off |                    0 |
|  0%   27C    P0             69W /  300W |   10364MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A10G                    On  |   00000000:00:1D.0 Off |                    0 |
|  0%   28C    P0             69W /  300W |   10364MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
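
The full nvidia-smi table is verbose when all you want is the memory held on each GPU. The following is a minimal sketch, not part of this repo: `summarize_gpu_memory` is a hypothetical helper that condenses the CSV form of the nvidia-smi output into one line per GPU. It assumes the `--query-gpu` and `--format=csv,noheader,nounits` options of nvidia-smi and reuses the POD_NAME variable set above.

```shell
# Hypothetical helper (not part of this repo): summarize per-GPU memory use.
# Expects nvidia-smi CSV rows of the form "index, used_MiB, total_MiB".
summarize_gpu_memory() {
  awk -F', *' 'NF >= 3 && $3 > 0 {
    printf "GPU %s: %d/%d MiB (%.0f%%)\n", $1, $2, $3, 100 * $2 / $3
  }'
}

# Usage against the cluster (POD_NAME as set in the step above):
# kubectl exec -n nvidia-gpu-operator "$POD_NAME" -- \
#   nvidia-smi --query-gpu=index,memory.used,memory.total \
#              --format=csv,noheader,nounits \
#   | summarize_gpu_memory
```

With the sample values shown above, each GPU would be reported as holding 10364 of 23028 MiB.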

Fix for AWS g5.48xlarge

kubectl create configmap kernel-module-params -n nvidia-gpu-operator --from-file=nvidia.conf=./bootstrap/nvidia.conf
oc patch clusterpolicy/gpu-cluster-policy -n nvidia-gpu-operator --type='json' -p='[{"op": "add", "path": "/spec/driver/kernelModuleConfig/name", "value":"kernel-module-params"}]'

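This patch is needed to work around an issue with AWS g5.48xlarge instances: the NVIDIA driver needs specific kernel module parameters to work properly on this instance type. The first command stores those parameters (from ./bootstrap/nvidia.conf) in a ConfigMap, and the second points the driver configuration of the cluster policy at that ConfigMap. Be patient: it takes at least 10 minutes for the nodes to become ready for the LLM to use.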
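
While waiting, you can check whether the nodes are advertising their GPUs again. This is a rough sketch, not part of this repo: `nodes_missing_gpus` is a hypothetical helper, and it assumes that the device plugin exposes the `nvidia.com/gpu` resource and that GPU nodes carry the `nvidia.com/gpu.present=true` label. The expected count of 8 matches the eight A10G GPUs of a g5.48xlarge; adjust it for other instance types.

```shell
# Hypothetical helper (not part of this repo): print the nodes whose
# allocatable GPU count is below the expected value.
# Input rows: "<node-name><TAB><allocatable nvidia.com/gpu>"
# (the second field is empty while a node advertises no GPUs).
nodes_missing_gpus() {
  expected="$1"
  awk -F'\t' -v want="$expected" '($2 + 0) < want { print $1 }'
}

# Usage (assumes the nvidia.com/gpu.present=true node label exists):
# kubectl get nodes -l nvidia.com/gpu.present=true \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' \
#   | nodes_missing_gpus 8
```

An empty result means every listed node reports the expected number of GPUs; any node name printed is still waiting for the driver to come up.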