Skip to content

Latest commit

 

History

History
58 lines (40 loc) · 1.52 KB

mixtral.md

File metadata and controls

58 lines (40 loc) · 1.52 KB

Deploying Mixtral with Multi-Node Multi-GPU

This is tested with 8x g5.x4large AWS instances with 24GiB each.

Demo Guide

Deploy RHOAI and Prereqs

If it's not deployed yet, follow the section of 5.1 Deploy RHOAI and Prereqs in the README.

Deploy vLLM Multi-Node prerequisites

DEMO_NAMESPACE="demo-multi-node-multi-gpu"
MODEL_NAME="mixtral-8-7b"
MODEL_TYPE="mixtral"
kubectl apply -k 3-demo-prep/overlays/$MODEL_TYPE
  • Deploy Custom CRD and vLLM Multi Node Serving Runtime Template
kubectl apply -k 4-demo-deploy-is-sr/overlays
oc process vllm-multinode-runtime-template -n $DEMO_NAMESPACE | kubectl apply -n $DEMO_NAMESPACE -f -  

Check and Validate the Model deployed in Multi-Node with Multi-GPUs

  • Check the GPU resource status
podName=$(oc get pod -n $DEMO_NAMESPACE -l app=isvc.$MODEL_NAME-predictor --no-headers|cut -d' ' -f1)
workerPodName=$(kubectl get pod -n $DEMO_NAMESPACE -l app=isvc.$MODEL_NAME-predictor-worker --no-headers|cut -d' ' -f1)

oc -n $DEMO_NAMESPACE wait --for=condition=ready pod/${podName} --timeout=300s

Mixtral1

Mixtral2

  • Send a RESTful request to the LLM deployed in Multi-Node Multi-GPU:
curl https://$isvc_url/v1/completions \
   -H "Content-Type: application/json" \
   -d "{
        \"model\": \"$MODEL_NAME\",
        \"prompt\": \"What is the biggest clothes retail company in the world?\",
        \"max_tokens\": 100,
        \"temperature\": 0
    }"

Mixtral3