Add default v5 flags #6168
Conversation
Do we need these for the 2.2 release?
It would be nice to include them so v5 is performant out-of-the-box on 2.2.
Thanks Jon for making this change. I have a few questions regarding some of the flags.
torch_xla/__init__.py
Outdated
'xla_enable_async_all_gather': 'true',
'xla_enable_async_collective_permute': 'true',
# Limit compiler-injected rematerialization
'xla_jf_rematerialization_percent_shared_memory_limit': '10000',
Do we need this? I don't think MaxText adds this one either.
I tried a run without all three flags mentioned, and the absolute MFU reduction was 1.5%. Including this one increases MFU by 1%. I have another run to identify which of the other two flags accounts for the other 0.5% gain.
torch_xla/__init__.py
Outdated
'xla_tpu_enable_async_collective_fusion_fuse_all_gather': 'true',
'xla_tpu_enable_async_collective_fusion_multiple_steps': 'true',
# Disable net router
'xla_tpu_enable_net_router_in_all_gather': 'false',
Should we drop this one?
torch_xla/__init__.py
Outdated
# Disable net router
'xla_tpu_enable_net_router_in_all_gather': 'false',
# Disable experimental Reduce+Broadcast->ReduceWindow-Conv fusion
'xla_tpu_rwb_fusion': 'false',
Does this really give us any performance boost?
BTW, do we need flags for MultiSlice?
Thanks @alanwaketan - let me try matching the MaxText flags exactly. A lot of these were recommended by the XLA team prior to your optimization barrier change, which may make them unnecessary.
I saw ~1.5% performance degradation when using the MaxText flags with the three specified flags omitted. I think the main driver is the rematerialization flag. I'm trying another run with the remat flag set.
Compared to the MaxText flags, the only MultiSlice-related flag set to a non-default value relates only to scanned compilation. I don't think we need to include any in the defaults here, though there can still be gains from tuning flags for a given workload's specific MultiSlice environment.
@jonb377 Have you figured out where the ~1.5% comes from?
I found the following:
I'll do one more test removing only |
This is not really a bug fix, so if we want to get this in for 2.2, I would like to get it merged ASAP so we have enough time for testing.
cc @alanwaketan, let's be conservative on the flags then and land today - I'll update this PR to only enable async collectives. There's also still an issue with async AG fusion in the current libtpu pin for v5p. Users can always specify the remaining flags themselves if needed.
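For reference, a minimal sketch of how a user could supply libtpu flags themselves before torch_xla initializes; this assumes the usual LIBTPU_INIT_ARGS environment-variable mechanism, and the flag values are taken from the diff above rather than from any final recommendation:

```python
# Sketch: manually supplying libtpu flags before torch_xla initializes.
# Assumes flags are passed to libtpu via the LIBTPU_INIT_ARGS environment variable.
import os

user_flags = [
    '--xla_enable_async_all_gather=true',
    '--xla_enable_async_collective_permute=true',
]
os.environ['LIBTPU_INIT_ARGS'] = ' '.join(
    [os.environ.get('LIBTPU_INIT_ARGS', '')] + user_flags).strip()

import torch_xla  # import only after the environment is configured
```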
Adds default values for TPU-specific XLA flags on v5e and v5p.
Verifying performance with a v5e run of Llama2 70B.
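A rough sketch of how such defaults might be applied in torch_xla/__init__.py without clobbering flags the user has already set; the helper name and the exact flag set here are illustrative assumptions, not necessarily the final code from this PR:

```python
import os

# Illustrative defaults (a subset of the flags discussed above).
_DEFAULT_TPU_FLAGS = {
    'xla_enable_async_all_gather': 'true',
    'xla_enable_async_collective_permute': 'true',
}


def _set_missing_tpu_flags(flags=_DEFAULT_TPU_FLAGS):
  """Append default libtpu flags, keeping any values the user already set."""
  current = os.environ.get('LIBTPU_INIT_ARGS', '')
  for name, value in flags.items():
    # Only inject a default if the user hasn't specified the flag themselves.
    if name not in current:
      current += f' --{name}={value}'
  os.environ['LIBTPU_INIT_ARGS'] = current.strip()
```

Merging into LIBTPU_INIT_ARGS rather than overwriting it keeps the escape hatch mentioned above: anything a user exports before import wins over the library defaults.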