support different types of computing hardware #5138

hzy46 · 2020-12-02T04:09:37Z

Motivation

Currently, OpenPAI supports the most widely used computing devices: NVIDIA GPU, AMD GPU, and CPU. It also has the potential to support other types of devices, e.g. AI computing chips (NPUs).

Goal

Decouple OpenPAI services from specific hardware types, so that one OpenPAI service container can support a list of hardware types.

Requirements

For every type of computing device, the vendor should guarantee that:

- each machine has only one type of computing device
- the driver and the k8s device plugin are successfully deployed on each machine
- the devices work correctly with Docker and k8s
- compatible frameworks and Docker images are available

MVP with default scheduler

Assuming that there is only one type of computing device in a cluster, we could build a minimal viable solution with the default scheduler by:

- configuring ComputeDevice (default: nvidia.com/gpu) at deployment time and recording it in a configmap
- adding an option to turn off the HiveD scheduler in quick start
- bypassing (or running other) pre-checks according to ComputeDevice in quick start
- changing nvidia.com/gpu to ComputeDevice in the rest server
- changing the VC resource information when using the default scheduler

pai/src/rest-server/src/models/v2/job/k8s.js, lines 483 to 487 in 2fb370a:

```javascript
memory: `${config.taskRoles[taskRole].resourcePerInstance.memoryMB}Mi`,
'github.com/fuse': 1,
'nvidia.com/gpu':
  config.taskRoles[taskRole].resourcePerInstance.gpu,
...(infinibandDevice && { 'rdma/hca': 1 }),
```
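The snippet above hardcodes 'nvidia.com/gpu' as the resource key. A minimal sketch of the "change nvidia.com/gpu to ComputeDevice in the rest server" item, assuming the configured device type is passed in as a string and the device count keeps coming from the existing gpu field of the job protocol. The function and parameter names are hypothetical, not the actual rest-server code:

```javascript
// Hypothetical helper (not the actual rest-server code): builds the resource limits
// for one task role with a configurable device type instead of 'nvidia.com/gpu'.
function buildResourceLimits(resourcePerInstance, infinibandDevice, computingDeviceType = 'nvidia.com/gpu') {
  return {
    memory: `${resourcePerInstance.memoryMB}Mi`,
    'github.com/fuse': 1,
    // Computed key: the device type comes from configuration (e.g. the configmap).
    // The count is still read from the job protocol's gpu field (assumption).
    [computingDeviceType]: resourcePerInstance.gpu,
    // Spreading `false` into an object literal is a no-op, so rdma/hca is optional.
    ...(infinibandDevice && { 'rdma/hca': 1 }),
  };
}
```

Keeping the count in the gpu field is a design choice of this sketch: it leaves the job protocol unchanged and only swaps the resource name.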

Beyond this necessary work, we (the pai-dev team and the device vendor) could provide better support by:

- refactoring and organizing device-related code into devices subfolders. The basic idea is to quickly locate device-related code and to isolate the code for different devices (e.g. different device vendors should avoid editing the same file).
  If a component must support diverse types of computing devices, it will contain a devices folder. PAI services should take these files into account at build time, so that one container supports a list of different machine models. Other components, such as the deploy script, should check these files at runtime.
- providing monitoring tools like nvidia-smi and a Prometheus exporter
- updating terms in the webportal
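One possible shape for the devices subfolders, as a sketch only: each device type gets its own module, looked up by a folder-safe name derived from the Kubernetes resource name (a later comment in this issue suggests names like nvidia.com_gpu). The registry, its fields, and the helper names are hypothetical; in a real layout each entry would live in its own file so that vendors do not touch the same file:

```javascript
// Map a Kubernetes extended resource name to a folder-safe name:
// 'nvidia.com/gpu' -> 'nvidia.com_gpu'.
function deviceFolderName(deviceType) {
  return deviceType.replace(/\//g, '_');
}

// Hypothetical per-device definitions. In practice these would be separate files,
// e.g. devices/nvidia.com_gpu.js, each owned by one vendor.
const deviceRegistry = {
  'nvidia.com_gpu': { visibleDevicesEnv: 'NVIDIA_VISIBLE_DEVICES' },
  'amd.com_gpu': { visibleDevicesEnv: 'PAI_AMD_VISIBLE_DEVICES' },
};

// Look up the module for a device type; fail loudly for unsupported hardware.
function getDeviceModule(deviceType) {
  const deviceModule = deviceRegistry[deviceFolderName(deviceType)];
  if (!deviceModule) {
    throw new Error(`No device module registered for ${deviceType}`);
  }
  return deviceModule;
}
```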

Perfect support with HiveD

By enabling HiveD, we could get better support:

- allow multiple device types in a cluster
- support virtual clusters
- topology-aware scheduling to guarantee sharing safety in DL scenarios

Some extra work is needed to achieve this:

- offer a container runtime for every device type. A container runtime is a modified version of runc that adds a custom pre-start hook to all containers. Two examples are nvidia-container-runtime and the runtime for AMD Radeon Open Compute.
- describe machines and devices in layout.yaml (see #5151: replace master.csv / worker.csv by layout.yaml)
- make sure HiveD config generation is independent of computing devices
- add appropriate environment variables in rest-server when generating the pod spec, in addition to NVIDIA_VISIBLE_DEVICES and PAI_AMD_VISIBLE_DEVICES

pai/src/rest-server/src/models/v2/job/k8s.js, lines 656 to 676 in 2fb370a:

```javascript
  if (config.taskRoles[taskRole].resourcePerInstance.gpu > 0) {
    frameworkTaskRole.task.pod.spec.containers[0].env.push(
      {
        name: 'NVIDIA_VISIBLE_DEVICES',
        valueFrom: {
          fieldRef: {
            fieldPath: `metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']`,
          },
        },
      },
      {
        name: 'PAI_AMD_VISIBLE_DEVICES',
        valueFrom: {
          fieldRef: {
            fieldPath: `metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']`,
          },
        },
      },
    );
  }
}
```
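The block above pushes one env entry per hardcoded variable name. A sketch of a generalized version, assuming the variable names come from a configurable comma-separated list such as the rest-server.hived-computing-device-envs setting used in the test cases of this issue. The helper name is hypothetical; every generated variable reads the same HiveD annotation through the Kubernetes downward API, exactly as the hardcoded entries do:

```javascript
// Annotation written by HiveD with the isolated device indices (from the snippet above).
const ISOLATION_ANNOTATION =
  "metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']";

// Hypothetical helper: turn 'NVIDIA_VISIBLE_DEVICES,PAI_AMD_VISIBLE_DEVICES' into
// pod env entries. Whitespace is trimmed and empty names are skipped.
function buildDeviceEnvs(envList) {
  return envList
    .split(',')
    .map((name) => name.trim())
    .filter((name) => name.length > 0)
    .map((name) => ({
      name,
      valueFrom: { fieldRef: { fieldPath: ISOLATION_ANNOTATION } },
    }));
}

// Illustrative use in place of the hardcoded push:
// if (config.taskRoles[taskRole].resourcePerInstance.gpu > 0) {
//   frameworkTaskRole.task.pod.spec.containers[0].env.push(...buildDeviceEnvs(envList));
// }
```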

Some optional work items include:

- clarify and unify the machine SKU descriptions in layout.yaml and the HiveD SKUs
- make the sku-(cpu, gpu, mem) conversion simple, predictable, and decoupled from devices (see #5148: CPU/GPU/Memory information to SKU definition API)
- add a health report for computing devices. This is not mandatory, since k8s already provides node-level health checks.


hzy46 · 2020-12-09T09:32:39Z

Detailed Work Items for this issue:

Installation
- P0 add a flag to use the default scheduler instead of enabling the HiveD scheduler (see #5198: Add a flag to disable hived scheduler during installation)
- P0 refactor the code structure to run precheck items for different hardware
  - add a computing_devices folder, using names like nvidia.com_gpu
Device Plugin ETA 12.11
- P0 refactor the code structure to install different device plugins (PR #5168: [Multiple Hardwares] [Device Plugin] Read computing devices from layout.yaml)
Rest-server ETA 12.11
- P0 read defaultComputingDeviceType from layout.yaml, using nvidia.com/gpu if it is not found (PR #5165: [Multiple Hardwares][Rest-server] Support multiple hardwares in default scheduler)
  - this should be supported in https://github.com/microsoft/pai/blob/master/src/rest-server/src/models/v2/virtual-cluster/k8s.js, https://github.com/microsoft/pai/blob/master/src/rest-server/src/models/v2/utils/frameworkConverter.js, and https://github.com/microsoft/pai/blob/d0aef5dc009794b4804027b4f21c78556024d2ec/src/rest-server/src/models/v2/job/k8s.js
- P1 support configuring hivedComputingDeviceList in https://github.com/microsoft/pai/blob/d0aef5dc009794b4804027b4f21c78556024d2ec/src/rest-server/src/models/v2/job/k8s.js
P2 webportal
P2 refactor job_exporter, node_exporter, and watchdog code to support different hardware

Once all P0 items are done, we can support different hardware with the default scheduler.
Once all P1 items are done, we can support different hardware with the HiveD scheduler.
P2 items are nice-to-have.
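A sketch of the P0 rest-server item "read defaultComputingDeviceType from layout.yaml (use nvidia.com/gpu if not found)", assuming layout.yaml has already been parsed into a plain object with the machine-sku / machine-list structure shown in the test cases of this issue. The function name is hypothetical, and rejecting clusters with more than one device type is an assumption that follows the one-device-type-per-cluster MVP:

```javascript
const DEFAULT_COMPUTING_DEVICE_TYPE = 'nvidia.com/gpu';

// Hypothetical helper: derive the cluster's computing device type from a parsed layout.yaml.
// Only worker machines are considered; SKUs without a computing-device section are skipped.
function getDefaultComputingDeviceType(layout) {
  const skus = layout['machine-sku'] || {};
  const types = new Set();
  for (const machine of layout['machine-list'] || []) {
    if (machine['pai-worker'] !== 'true') continue;
    const sku = skus[machine['machine-type']] || {};
    const device = sku['computing-device'];
    if (device && device.type) types.add(device.type);
  }
  // Fall back to nvidia.com/gpu when no worker declares a computing device.
  if (types.size === 0) return DEFAULT_COMPUTING_DEVICE_TYPE;
  if (types.size > 1) {
    // The default-scheduler MVP assumes a single device type per cluster.
    throw new Error(`Multiple computing device types found: ${[...types].join(', ')}`);
  }
  return [...types][0];
}
```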

hzy46 · 2020-12-24T08:37:19Z

Test cases for rest-server:

1. Default Scheduler: test that the resource requirement is correctly specified in the pod definition.

- ./paictl.py service stop -n hivedscheduler cluster-configuration rest-server
- Modify services-configuration.yaml: disable hivedscheduler.
- Modify layout.yaml: set the computing device type of the cluster workers to a.b.com/c, e.g.:

```yaml
machine-sku:
  master-machine: # define a machine sku
    # the resource requirements for all the machines of this sku
    # We use the same memory format as Kubernetes, e.g. Gi, Mi
    # Reference: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory
    mem: 60Gi
    cpu:
      # the number of CPU vcores
      vcore: 24
  gpu-machine:
    computing-device:
      type: a.b.com/c
      model: faked
      count: 4
    mem: 220Gi
    cpu:
      vcore: 24

machine-list:
  - hostname: pai-master # name of the machine, **do not** use upper case alphabet letters for hostname
    hostip: 10.0.0.1
    machine-type: master-machine # only one master-machine supported
    pai-master: "true"
  - hostname: pai-worker1
    hostip: 10.0.0.2
    machine-type: gpu-machine
    pai-worker: "true"
  - hostname: pai-worker2
    hostip: 10.0.0.3
    machine-type: gpu-machine
    pai-worker: "true"
………………
```

- Push the modified config to k8s.
- ./paictl.py service start -n hivedscheduler cluster-configuration rest-server
- Submit a job from the webportal.
- Expect an a.b.com/c resource request in the pod spec.
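For the last step, the pod spec can be fetched with kubectl get pod <pod-name> -o json and checked programmatically. A minimal sketch of such a check, assuming the JSON has already been parsed into an object; the function name is hypothetical. Kubernetes extended resources may appear in limits, requests, or both, so both are checked:

```javascript
// Hypothetical check: does any container in the pod spec request the given device resource?
function podRequestsDevice(pod, deviceType) {
  const containers = (pod.spec && pod.spec.containers) || [];
  return containers.some((container) => {
    const resources = container.resources || {};
    const limits = resources.limits || {};
    const requests = resources.requests || {};
    return deviceType in limits || deviceType in requests;
  });
}
```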

2. HiveD Scheduler: test that the environment variables are set in the pod spec.

- ./paictl.py service stop -n hivedscheduler cluster-configuration rest-server
- Modify services-configuration.yaml: enable hivedscheduler, and set rest-server.hived-computing-device-envs to TEST,NVIDIA_VISIBLE_DEVICES,HIVED_VISIBLE_DEVICES.
- Modify layout.yaml: set the computing device type of the cluster workers back to nvidia.com/gpu.
- Push the modified config to k8s.
- ./paictl.py service start -n hivedscheduler cluster-configuration rest-server
- Submit a job from the webportal.
- In the pod, expect the environment variable TEST to be set to something like 0,1,...
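The last check can also be scripted inside the job container. A minimal sketch, assuming Node.js is available in the job image (not guaranteed by this test setup) and that the HiveD isolation annotation holds a comma-separated list of device indices, as the expected 0,1,... value suggests. The function name is hypothetical:

```javascript
// Hypothetical check: is the value a non-empty, comma-separated list of
// non-negative integers, e.g. "0,1,2,3"?
function isDeviceIndexList(value) {
  return typeof value === 'string' && /^\d+(,\d+)*$/.test(value);
}

// Inside the pod: isDeviceIndexList(process.env.TEST) should be true.
```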

hzy46 added the raised by customer label Dec 2, 2020

beingj mentioned this issue Dec 2, 2020: add support to enflame DTO #5140 (Open, 4 tasks)

scarlett2018 added the 1.5 candidate label Dec 2, 2020

scarlett2018 mentioned this issue Dec 2, 2020: 2021 Jan release plan #5141 (Closed, 52 tasks)

hzy46 mentioned this issue Dec 3, 2020: CPU/GPU/Memory information to SKU definition API #5148 (Open)

scarlett2018 added the pai-dev label Dec 7, 2020

debuggy mentioned this issue Jan 4, 2021: 2021 Jan. Release Test Plan #5218 (Closed, 14 tasks)

hzy46 mentioned this issue Jan 13, 2021: add 'enflame.com/dtu' to rest-server #5139 (Closed)
