
[SPMD] Introduce high level manual sharding APIs #6931

Merged · 5 commits into master · Apr 17, 2024

Conversation

alanwaketan (Collaborator) commented:

Summary:
This pull request introduces:

  1. enable_manual_sharding: starts a manual sharding region.
  2. disable_manual_sharding: ends the manual sharding region (see the usage sketch below).

Test Plan:
PJRT_DEVICE=TPU python test/spmd/test_xla_sharding.py -v -k test_manual_sharding_api_e2e
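
For orientation, here is a minimal usage sketch of the two APIs (not taken from this PR): it assumes SPMD mode is on, that both functions are exposed under torch_xla.distributed.spmd, and that disable_manual_sharding takes a trailing full-shape argument, which is inferred from the discussion rather than quoted from the diff.

import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()  # SPMD mode must be active before any sharding call

# 1-D mesh laying out all addressable devices along a single axis.
num_devices = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(num_devices), (num_devices,))

t = torch.randn(8, 8).to(xm.xla_device())

# Enter the manual region: t is sharded along dim 0, and SPMD auto
# propagation/partitioning stops for ops that consume it.
t = xs.enable_manual_sharding(t, (0, None), mesh=mesh)

# ... user-managed, per-shard computation (and explicit collectives) ...

# Leave the manual region; the (8, 8) full-shape argument is an assumption
# about disable_manual_sharding's signature, not quoted from this PR.
t = xs.disable_manual_sharding(t, (0, None), (8, 8), mesh=mesh)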

@alanwaketan alanwaketan requested review from yeounoh and jonb377 April 17, 2024 00:46
@alanwaketan alanwaketan self-assigned this Apr 17, 2024
def enable_manual_sharding(t, partition_spec, *,
                           mesh: Mesh = None) -> XLAShardedTensor:
"""
This API enables manual sharding for the given tensor. Manual sharding disables auto sharding propagation and auto
Contributor:

"auto" --> "SPMD", think it's important to not confuse.

@yeounoh (Contributor) left a comment:

LGTM, left a comment for comment :)

@jonb377 (Collaborator) left a comment:

LGTM!

"""
mesh = get_global_mesh() if mesh is None else mesh
t = mark_sharding(unwrap_sharded_tensor(t), mesh, partition_spec)  # annotate t with the requested sharding
t = torch_xla._XLAC._spmd_full_to_shard_shape(unwrap_sharded_tensor(t))  # switch t to its per-shard (local) shape
Collaborator:

Can t here be DeviceData?

alanwaketan (Author):

You mean the input? Yes!
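
To make the shape transition concrete: per the exchange above, _spmd_full_to_shard_shape swaps the tensor's global shape for the local shard shape inside the manual region. A small sketch of the expected behavior, reusing the mesh and imports from the earlier sketch; the (2, 8) result assumes a 4-device 1-D mesh and is not taken from this PR's tests.

t = torch.randn(8, 8).to(xm.xla_device())        # global shape: (8, 8)
t = xs.enable_manual_sharding(t, (0, None), mesh=mesh)
# Inside the manual region the wrapped tensor carries the per-shard shape:
# dim 0 is split 4 ways, so torch.Size([2, 8]) is expected here.
print(t.global_tensor.shape)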

"""
This API enables manual sharding for the given tensor. Manual sharding disables auto sharding propagation and auto
partitioning for the given tensor and all subsequent tensors produced by ops that use the given tensor as
input, and therefore allows the user to manually call collectives for the tensor and those subsequent tensors. It
Collaborator:

Also just curious - how will we enable collectives in a manual region?

alanwaketan (Author):

XLA cc ops should work by default; just use them as normal. However, we need to teach our cc-op wrappers to be aware of SPMD mode, so that will be phase 2 of the manual sharding work.
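
A hypothetical sketch of that phase-2 plan (not implemented in this PR): inside a manual region, per-shard compute would interleave with explicit collectives, shown here with the existing xm.all_reduce wrapper standing in for whatever SPMD-aware wrapper lands later. The shapes and the replicated output spec are assumptions, reusing the mesh and imports from the earlier sketch.

t = torch.randn(8, 8).to(xm.xla_device())
t = xs.enable_manual_sharding(t, (0, None), mesh=mesh)    # local shard: (2, 8)

partial = t.global_tensor.sum(dim=0, keepdim=True)        # per-shard partial sum: (1, 8)
total = xm.all_reduce(xm.REDUCE_SUM, partial)             # explicit cross-shard sum

# Replicated result; the (None, None) spec and (1, 8) full shape are assumptions.
out = xs.disable_manual_sharding(total, (None, None), (1, 8), mesh=mesh)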

@alanwaketan alanwaketan merged commit 9b2ac4b into master Apr 17, 2024
4 checks passed
@alanwaketan alanwaketan deleted the alanwaketan/manual_sharding_api branch April 17, 2024 18:28
lausannel pushed a commit to AlibabaPAI/xla that referenced this pull request Aug 6, 2024
baoleai pushed a commit to AlibabaPAI/xla that referenced this pull request Aug 6, 2024