Explicitly set device when reusing dist env (#6696)
The rank of a process can change when the distributed environment is reused.
This PR explicitly sets the accelerator device in that case.
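
The device choice described in this message can be sketched in isolation. This is a minimal illustration, not the test harness itself: `FakeAccelerator` and `select_device` are hypothetical names standing in for the object returned by `get_accelerator()` and for the branch inside `_dist_run`, and the fresh-environment path binding to `local_rank` is an assumption.

```python
class FakeAccelerator:
    """Hypothetical stand-in for the object returned by get_accelerator()."""

    def __init__(self, available=True):
        self._available = available
        self.current = None  # index of the device currently bound, if any

    def is_available(self):
        return self._available

    def set_device(self, index):
        self.current = index


def select_device(accelerator, initialized, dist_rank, local_rank):
    """Bind a device for this run and return its index (None if no accelerator).

    If the distributed environment is already initialized (reused), trust the
    rank reported by the comm backend rather than the local_rank passed in for
    this run. Otherwise fall back to local_rank (assumed behavior).
    """
    if not accelerator.is_available():
        return None
    accelerator.set_device(dist_rank if initialized else local_rank)
    return accelerator.current


acc = FakeAccelerator()
# Reused process: the backend still reports rank 1, but this run was handed local_rank 0.
print(select_device(acc, initialized=True, dist_rank=1, local_rank=0))
```

In the reused case the device follows the backend's rank, so a stale `local_rank` from the launcher no longer decides which device the process binds to.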
tohtana authored Nov 1, 2024
1 parent 95ea95f commit b24dfa9
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion tests/unit/common.py
@@ -277,7 +277,11 @@ def _launch_procs(self, num_procs, init_method):
             self._launch_daemonic_procs(num_procs, init_method)

     def _dist_run(self, local_rank, num_procs, master_port, init_method, skip_msg=""):
-        if not dist.is_initialized():
+        if dist.is_initialized():
+            if get_accelerator().is_available():
+                # local_rank might not match the rank in the previous run if you are reusing the environment
+                get_accelerator().set_device(dist.get_rank())
+        else:
             """ Initialize deepspeed.comm and execute the user function. """
             if self.set_dist_env:
                 os.environ['MASTER_ADDR'] = '127.0.0.1'