
Introduce virtual device #4091

Merged: 1 commit into master on Nov 19, 2022
Conversation

@steventk-g (Collaborator) commented on Oct 12, 2022

Changes in this PR

  • Expose the virtual device via a flag, XLA_USE_SPMD.
  • Use the virtual device to conditionally delay a tensor's data transfer. This works by setting the device on the backend based on the flag, so that the PJRT computation client can check for the virtual device before transferring data (see the sketch after this list). Note: we need to check the device on the backend rather than checking the flag directly, so that we still have a way to transfer sharded data later on.
  • When the flag is enabled, transfer sharded data without re-downloading it from an XLA device. This is done in _xla_mark_sharding. The re-downloading path is preserved so that XLA_USE_SPMD=0 still works.
  • When the flag is enabled, ensure that the user gets xla:0 from xm.xla_device(). At this point, users should expect all tensors to be treated as if they are on the virtual device whenever SPMD is enabled.
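
A minimal sketch of the backend-side flag check described above. The helper names (UseVirtualDevice, GetDefaultDevice) and the "SPMD:0" device string are illustrative assumptions, not necessarily the PR's actual symbols:

    // Sketch only: derive the default device from the XLA_USE_SPMD flag.
    // Helper names here are hypothetical.
    #include <cstdlib>
    #include <string>

    bool UseVirtualDevice() {
      // Read the flag once at backend initialization so all later device
      // queries agree with it.
      const char* flag = std::getenv("XLA_USE_SPMD");
      return flag != nullptr && std::string(flag) == "1";
    }

    std::string GetDefaultDevice(const std::string& physical_device) {
      // With SPMD enabled, everything is addressed through a single virtual
      // device (surfaced to users as xla:0 by xm.xla_device()); otherwise
      // the physical device is used.
      return UseVirtualDevice() ? "SPMD:0" : physical_device;
    }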

@yeounoh (Contributor) commented on Oct 12, 2022

Let's make sure that we cover the explicitly sharded cases, where we want to avoid the initial unpartitioned data transfer. We will have to double-check, but "Modify XLATensor::Compile to begin data transfer on implicitly sharded tensors" may not be needed.

@steventk-g (Collaborator, Author)

Notes after chat with Yeounoh:

  • We need to locate the place where data transfer to the backend device is initiated; this is probably in upstream code. That is where we can check the device type and potentially skip the transfer.
  • We need to determine how to check the device type of an at::Tensor or XLATensor. The XlaDeviceType of tensors to shard will be "SPMD", while the torch device type stays XLA, just as for physical XLA devices (TPU, CPU, GPU). See the sketch after this list.
  • Explicitly sharded tensors on an SPMD device will be transferred to the backend device by a call to CreateTensorData in _xla_mark_sharding.
  • We need to decide when to transfer data for implicitly sharded tensors, if not in XLATensor::Compile.
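
A sketch of the device-type check described in these notes. The enum below is a simplified stand-in for the real XlaDeviceType in the codebase, so treat the exact values as assumptions:

    // Sketch: tensors whose XlaDeviceType is SPMD stay on the virtual
    // device until explicitly sharded; physical backends transfer eagerly.
    enum class XlaDeviceType { CPU, GPU, TPU, SPMD };

    bool ShouldDeferTransfer(XlaDeviceType hw_type) {
      // Virtual-device tensors skip the eager (unpartitioned) transfer;
      // _xla_mark_sharding later moves them via CreateTensorData.
      return hw_type == XlaDeviceType::SPMD;
    }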

@JackCaoG (Collaborator)

Re "We need to locate the place where data transfer is initiated to the backend device": add a log to

std::vector<ComputationClient::DataPtr> PjRtComputationClient::TransferToServer(

This is the only entry point for transferring data to a device (see the sketch below).
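
A sketch of that instrumentation as a standalone helper; in the PR it would live inside PjRtComputationClient::TransferToServer, and std::cerr stands in for whatever logging facility the codebase actually uses:

    #include <cstddef>
    #include <iostream>

    // Log every host-to-device transfer at its single entry point, so we
    // can see exactly when (and how often) data transfer is initiated.
    void LogTransferToServer(std::size_t num_tensors) {
      std::cerr << "TransferToServer called with " << num_tensors
                << " tensor(s)" << std::endl;
    }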

@steventk-g force-pushed the virtual-device branch 14 times, most recently from 6a326ec to 32ff407, on October 18, 2022
@steventk-g (Collaborator, Author)

Remaining implementation details before I can start testing:

  • Determine how to filter tensors and devices in the CreateTensorsData methods. We want all devices passed into the sharded method to be SPMD, and we want to stop data transfer to backend devices when a tensor with an SPMD device is passed into the non-sharded method. Can we simply remove the non-SPMD tensors in the first case and the SPMD tensors in the second? (See the split sketch after this list.)
  • Figure out what to return from TensorToXlaData when we don't transfer data to a real backend device.
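
A sketch of the split asked about in the first bullet. The types and names (TensorAndDevice, IsSpmdDevice, SplitBySpmd) are simplified stand-ins, not the real CreateTensorsData signatures:

    #include <string>
    #include <utility>
    #include <vector>

    struct TensorAndDevice {
      // Placeholder for an at::Tensor paired with its target device string.
      std::string device;
    };

    bool IsSpmdDevice(const std::string& device) {
      // True when the device string starts with "SPMD".
      return device.rfind("SPMD", 0) == 0;
    }

    // Route SPMD-device tensors to the sharded path and everything else to
    // the regular path, so the non-sharded method never initiates a backend
    // transfer for an SPMD tensor.
    std::pair<std::vector<TensorAndDevice>, std::vector<TensorAndDevice>>
    SplitBySpmd(const std::vector<TensorAndDevice>& inputs) {
      std::vector<TensorAndDevice> spmd, regular;
      for (const auto& t : inputs) {
        (IsSpmdDevice(t.device) ? spmd : regular).push_back(t);
      }
      return {std::move(spmd), std::move(regular)};
    }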

@steventk-g force-pushed the virtual-device branch 8 times, most recently from f64c1a5 to 01fa14d, on October 26, 2022
@steventk-g force-pushed the virtual-device branch 14 times, most recently from df56ba8 to ede9395, on November 16, 2022
@steventk-g force-pushed the virtual-device branch 3 times, most recently from 17cf0a0 to 5b7ac6f, on November 18, 2022
@steventk-g requested a review from yeounoh on November 18, 2022
@yeounoh (Contributor) left a comment:


LGTM, thank you @steventk-g 👍

@steventk-g merged commit b2bd721 into master on Nov 19, 2022
@yeounoh added the arm label on Dec 21, 2022