do allgather only in shared optimizer states groups (#4167)
* skip all-gather

* add notes

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
3 people authored Aug 21, 2023
1 parent 7711bdb commit 7f3e82f
Showing 1 changed file with 4 additions and 0 deletions.
deepspeed/runtime/utils.py: 4 changes (4 additions, 0 deletions)
@@ -944,6 +944,10 @@ def all_gather_dp_groups(partitioned_param_groups, dp_process_group, start_align
         partition_id = dist.get_rank(group=dp_process_group[group_id])
         dp_world_size = dist.get_world_size(group=dp_process_group[group_id])

+        if dp_world_size == 1:
+            # no groups share optimizer states
+            # pipeline parallel with bf16 will default call this even if dp size = 1.
+            continue
         num_shards = max(1, partitioned_params[partition_id].numel() * dp_world_size // allgather_bucket_size)

         shard_size = partitioned_params[partition_id].numel() // num_shards
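The per-group decision in this hunk can be illustrated with a minimal pure-Python sketch. It assumes plain integers stand in for `dist.get_world_size(group=...)` and `partitioned_params[partition_id].numel()`, and `plan_all_gather` is a hypothetical helper name, not a DeepSpeed API. A group whose data-parallel world size is 1 is skipped, because the local partition already holds all of that group's parameters; otherwise the shard count and shard size follow the same integer arithmetic as the context lines. The hunk does not show how a remainder (`numel % num_shards`) is handled, so the sketch stops at `shard_size`.

```python
from typing import List, Optional, Tuple


def plan_all_gather(
    dp_world_sizes: List[int],
    partition_numels: List[int],
    allgather_bucket_size: int,
) -> List[Optional[Tuple[int, int]]]:
    """Hypothetical helper mirroring the patched loop's per-group decision.

    Returns None for a skipped group (dp world size of 1), otherwise the
    (num_shards, shard_size) pair computed as in the diff.
    """
    plan: List[Optional[Tuple[int, int]]] = []
    for dp_world_size, numel in zip(dp_world_sizes, partition_numels):
        if dp_world_size == 1:
            # No other rank shares this group's optimizer state,
            # so there is nothing to all-gather.
            plan.append(None)
            continue
        # Elements gathered across the whole group, divided into buckets of
        # allgather_bucket_size elements; always at least one shard.
        num_shards = max(1, numel * dp_world_size // allgather_bucket_size)
        # Elements of the local partition sent per shard (floor division).
        shard_size = numel // num_shards
        plan.append((num_shards, shard_size))
    return plan


print(plan_all_gather([1, 4, 4], [1000, 1000, 10], 500))
```

With a bucket size of 500, the first group (world size 1) is skipped, the second gathers 4 × 1000 = 4000 elements in 8 shards of 125, and the third is small enough (40 elements) to fit in a single shard, so the call prints `[None, (8, 125), (1, 10)]`.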