[SPMD] Add apply_backward_optimization_barrier #6097

Merged
merged 3 commits into master from alanwaketan/opt-barrier on Dec 12, 2023

Conversation

@alanwaketan (Collaborator) commented on Dec 11, 2023

Summary:
This pull request adds a new API to xla_sharding.py called apply_backward_optimization_barrier, which registers a full backward hook that applies an optimization barrier to the given module. This prevents the XLA compiler from fusing the module's backward pass with others, and is useful to keep gigantic buffers from being allocated to synchronize the gradients.

It's also used in pytorch-tpu/transformers#50.
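For context, below is a minimal usage sketch of the new API, mirroring the call used in this PR's test. The imports, device setup, and toy model are illustrative assumptions, not code taken from this change:

# Minimal usage sketch; imports and the toy model are assumptions for illustration.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
import torch_xla.distributed.spmd as xs

device = xm.xla_device()
model = nn.Sequential(nn.Linear(128, 128), nn.Linear(128, 10)).to(device)

# Register a full backward hook on the second layer so that its gradients are
# wrapped in an optimization barrier and its backward pass is not fused with
# the rest of the graph.
xs.xla_sharding.apply_backward_optimization_barrier(model[1])

loss = model(torch.randn(8, 128, device=device)).sum()
loss.backward()
xm.mark_step()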

Test Plan:
python test/spmd/test_xla_sharding.py -v -k test_backward_optimization_barrier
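For readers curious about the mechanism, here is a sketch of how such a hook could be wired up. It follows the description above (a full backward hook that applies an optimization barrier, here via the existing xm.optimization_barrier_ helper), but it is illustrative and not necessarily the exact code added to xla_sharding.py:

# Illustrative sketch of the hook described above; not necessarily the PR's exact code.
import torch
import torch_xla.core.xla_model as xm

def apply_backward_optimization_barrier_sketch(module: torch.nn.Module):
    def _barrier_hook(mod, grad_input, grad_output):
        # Collect whatever gradients are available at hook time and mark them
        # with an optimization barrier so XLA cannot fuse this backward region
        # with the rest of the graph.
        grads = [p.grad for p in mod.parameters() if p.grad is not None]
        grads += [g for g in grad_input if g is not None]
        xm.optimization_barrier_(grads)

    module.register_full_backward_hook(_barrier_hook)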

@jonb377 (Collaborator) left a comment

Nice one! LGTM

# The first layer won't have gradients in the hook. Not sure why.
xs.xla_sharding.apply_backward_optimization_barrier(model.fc2)

# optimizer.zero_grad()
@alanwaketan (Collaborator, Author) replied:

I should remove this. oops...

@alanwaketan (Collaborator, Author) commented:

Thanks Jon for approving the pull request.

@yeounoh (Contributor) left a comment

LGTM thanks!

@alanwaketan force-pushed the alanwaketan/opt-barrier branch from 23d1272 to cb3bad6 on December 12, 2023 04:09
@alanwaketan merged commit 07540f2 into master on Dec 12, 2023
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
golechwierowicz pushed a commit that referenced this pull request Jan 12, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024