🐛 Bug

After constructing a PyTorch DTensor device mesh with torch.distributed._tensor.device_mesh.init_device_mesh, the resulting mesh object does not support querying the rank via the get_rank interface.
To Reproduce
# test_device_mesh_get_rank.py
import os
import subprocess
import unittest

from torch.distributed._tensor.device_mesh import init_device_mesh


class TestDeviceMeshGetRank(unittest.TestCase):
    def realtest(self):
        # Runs inside each torchrun worker process.
        _world_size = int(os.environ["WORLD_SIZE"])
        device_type = os.environ.get("TEST_DEVICE_TYPE", 'xla')
        if device_type == 'xla':
            from torch_xla import runtime as xr
            xr.use_spmd()
        device_mesh = init_device_mesh(device_type=device_type, mesh_shape=(_world_size,))
        _rank = device_mesh.get_rank()
        assert _rank == int(os.environ["RANK"])

    def test_driver(self):
        # If already running under torchrun, execute the real test; otherwise
        # re-launch this file under torchrun with two workers.
        if 'TEST_INTERNAL_IS_TORCHRUN' in os.environ:
            return self.realtest()
        device_count = 2
        env = os.environ.copy()
        env['TEST_INTERNAL_IS_TORCHRUN'] = '1'
        cmd = ['torchrun', '--nnodes=1', f'--nproc_per_node={device_count}', __file__]
        subprocess.check_call(cmd, env=env)


if __name__ == '__main__':
    unittest.main()
Steps to reproduce the behavior:

1. Save the above script as test_device_mesh_get_rank.py.
2. Running env PJRT_DEVICE=CPU python test_device_mesh_get_rank.py under torch-xla 2.5.1 fails with ValueError: Default process group has not been initialized, please make sure to call init_process_group.
3. For comparison, running env TEST_DEVICE_TYPE='cuda' python test_device_mesh_get_rank.py on CUDA PyTorch passes the test (see the sketch below for why the CUDA path differs).
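My understanding is that the CUDA run passes because, for non-XLA device types, DeviceMesh creates the default process group itself when none exists, so device_mesh.get_rank() can delegate to torch.distributed.get_rank(). Below is a minimal sketch of that behavior, not part of the test above: the file name is hypothetical, and it assumes a torchrun launch on a CUDA build of PyTorch with two GPUs.

# check_cuda_mesh_rank.py (hypothetical) -- launch with:
#   torchrun --nnodes=1 --nproc_per_node=2 check_cuda_mesh_rank.py
import os

import torch.distributed as dist
from torch.distributed._tensor.device_mesh import init_device_mesh

# For device_type="cuda", constructing the mesh initializes the default
# process group if it has not been initialized yet.
mesh = init_device_mesh("cuda", mesh_shape=(int(os.environ["WORLD_SIZE"]),))

assert dist.is_initialized()                    # default process group exists
assert mesh.get_rank() == dist.get_rank()       # get_rank() delegates to it
assert mesh.get_rank() == int(os.environ["RANK"])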
Expected behavior

device_mesh.get_rank() should return the rank of the current process (matching the RANK environment variable set by torchrun), as it does with the CUDA backend.

Environment

Reproducible on XLA backend [CPU/TPU/CUDA]: CPU and the AWS Neuron PJRT plugin
torch_xla version: 2.5.1

Additional context

Was trying to adapt https://github.com/pytorch/examples/blob/1bef748/distributed/tensor_parallelism/tensor_parallel_example.py for the XLA device/mesh type.
The XLA backend for distributed tensors works slightly differently from native PyTorch: it doesn't require creating a separate process for each device, because the XLA compiler handles sharding the tensors according to the specified sharding spec. That's why you don't see any process groups with the XLA backend here.
Please feel free to take a look at the DTensor integration RFC for the XLA backend (pytorch/pytorch#92909) and let us know if you have any further questions.
The distribute_tensor and distribute_module APIs should work as expected.
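For reference, here is a minimal single-process SPMD sketch of that path; this is my own illustration rather than code from the RFC, and it assumes torch-xla 2.5.x with PJRT_DEVICE set (e.g. PJRT_DEVICE=CPU), sizing the mesh with xr.global_runtime_device_count().

# spmd_distribute_tensor.py (hypothetical) -- run with:
#   PJRT_DEVICE=CPU python spmd_distribute_tensor.py
import torch
from torch.distributed._tensor import Shard, distribute_tensor
from torch.distributed._tensor.device_mesh import init_device_mesh
from torch_xla import runtime as xr

xr.use_spmd()  # enable SPMD; no torchrun and no process group needed

# A single host process drives all addressable XLA devices.
num_devices = xr.global_runtime_device_count()
mesh = init_device_mesh("xla", mesh_shape=(num_devices,))

t = torch.randn(8, 4)
# Shard dimension 0 of the tensor across the one-dimensional mesh.
dt = distribute_tensor(t, mesh, [Shard(0)])
print(type(dt))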