In MII-Legacy we had `deploy_rank`, which allowed us to specify which GPUs to deploy a model to. This did not compose well with multiple replicas, so I've refactored that code and brought it into the latest MII. Here we add a `device_map` to the config that allows users to specify which devices they want to deploy a model to for the persistent deployment (`mii.serve`). This works with multi-replica and multi-node cases. We can provide the following types to `device_map`:

- `int`: `device_map = 1` - deploy a single-GPU model to GPU1
- `List[int]`: `device_map = [2,3]` - deploy a 2-GPU model to GPU2 and GPU3
- `List[List[int]]`: `device_map = [[0,2],[1,3]]` - deploy 2 dual-GPU replicas, one to GPU0 & GPU2, the other to GPU1 & GPU3
- `Dict[str,List[List[int]]]`: `device_map = {"host0":[[0,1],[2,3]], "host1":[[0,1],[2,3]]}` - deploy 4 dual-GPU replicas across 2 nodes

The default value is `"auto"`, which automatically places models / replicas across devices and nodes. Users must still specify the proper `replica_num` and `tensor_parallel` values, and these values must match the device map provided. The device map is not required and is only needed when the default model/replica placement is not desired.

resolves #283
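To illustrate how the four `device_map` forms relate to `replica_num` and `tensor_parallel`, here is a minimal standalone sketch. It is not the code in this PR: `normalize_device_map`, `check_device_map`, and the `"localhost"` default host name are hypothetical names chosen for illustration, and the sketch assumes that "match" means one inner list per replica and one GPU index per tensor-parallel rank.

```python
from typing import Dict, List, Optional, Union

# The accepted forms described above, plus the "auto" default.
DeviceMap = Union[str, int, List[int], List[List[int]], Dict[str, List[List[int]]]]
Normalized = Dict[str, List[List[int]]]


def normalize_device_map(device_map: DeviceMap,
                         default_host: str = "localhost") -> Optional[Normalized]:
    """Expand any accepted form into {host: [[gpu, ...], ...]}.

    Hypothetical helper; returns None for "auto" (placement left to the runtime).
    """
    if isinstance(device_map, str):
        if device_map == "auto":
            return None
        raise ValueError(f"unsupported device_map string: {device_map!r}")
    if isinstance(device_map, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("device_map must not be a bool")
    if isinstance(device_map, int):
        # Single GPU, single replica.
        return {default_host: [[device_map]]}
    if isinstance(device_map, list):
        if all(isinstance(d, int) for d in device_map):
            # One replica spanning several GPUs.
            return {default_host: [list(device_map)]}
        # Several replicas on one node.
        return {default_host: [list(replica) for replica in device_map]}
    if isinstance(device_map, dict):
        # Several replicas on several nodes.
        return {host: [list(r) for r in replicas]
                for host, replicas in device_map.items()}
    raise TypeError(f"unsupported device_map type: {type(device_map).__name__}")


def check_device_map(device_map: DeviceMap, replica_num: int,
                     tensor_parallel: int) -> Optional[Normalized]:
    """Validate the map against replica_num and tensor_parallel (assumed rule)."""
    normalized = normalize_device_map(device_map)
    if normalized is None:
        return None
    replicas = [r for per_host in normalized.values() for r in per_host]
    if len(replicas) != replica_num:
        raise ValueError(
            f"device_map describes {len(replicas)} replicas, replica_num={replica_num}")
    for replica in replicas:
        if len(replica) != tensor_parallel:
            raise ValueError(
                f"replica {replica} uses {len(replica)} GPUs, "
                f"tensor_parallel={tensor_parallel}")
    return normalized
```

Under these assumptions, `device_map = [[0,2],[1,3]]` is consistent with `replica_num=2` and `tensor_parallel=2`, while the two-node dictionary example needs `replica_num=4` and `tensor_parallel=2`.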