[TIR] Handle axis_separators during FlattenBuffer #12652

Lunderberg · 2022-08-30T16:29:19Z

For buffers with more than one physical axis, the axis_separators are required in order to know which groups of logical axes to fuse into each physical axis. The implementation in tir.FlattenBuffer assumed that all buffers were being flattened to a single physical axis. Because tir.LowerOpaqueBlock replaces the BlockNode::alloc_buffers with Allocate nodes, tir.FlattenBuffer no longer has access to the axis separators and performs inconsistent flattening for Allocate as opposed to BufferLoad/BufferStore. This was introduced in #12172, which decoupled the lowering/flattening steps.

The commit reorders the tir.FlattenBuffer to occur before tir.LowerOpaqueBlock, to make use of the axis separators. Any Allocate nodes that exist at that point (e.g. from hand-written schedules) are still flattened to 1-d physical buffers, but the BlockNode::alloc_buffers are flattened according to the axis separators. See #12652 (comment).

This PR adds a DeclBuffer node to the output of tir.LowerOpaqueBlock, which is then used during tir.FlattenBuffer to identify the axis separators.
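As an illustration of what the axis separators control, here is a minimal pure-Python sketch of how logical axes might be grouped into physical axes. It is not TVM's implementation: the helper names `flatten_shape` and `flatten_indices` are hypothetical, and the sketch assumes each group of logical axes is fused in row-major order.

```python
from math import prod
from typing import List, Sequence


def flatten_shape(shape: Sequence[int], axis_separators: Sequence[int]) -> List[int]:
    """Fuse each group of logical axes into one physical axis.

    Hypothetical helper: a separator at position k ends one group just before
    logical axis k and starts the next group at logical axis k.
    """
    bounds = [0, *axis_separators, len(shape)]
    return [prod(shape[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


def flatten_indices(
    indices: Sequence[int], shape: Sequence[int], axis_separators: Sequence[int]
) -> List[int]:
    """Map logical indices to physical indices, row-major within each group."""
    bounds = [0, *axis_separators, len(shape)]
    physical = []
    for lo, hi in zip(bounds, bounds[1:]):
        offset = 0
        for index, extent in zip(indices[lo:hi], shape[lo:hi]):
            offset = offset * extent + index
        physical.append(offset)
    return physical


logical_shape = [4, 8, 16]
print(flatten_shape(logical_shape, []))                 # [512]: one physical axis
print(flatten_shape(logical_shape, [1]))                # [4, 128]: two physical axes
print(flatten_indices([1, 2, 3], logical_shape, []))    # [163]
print(flatten_indices([1, 2, 3], logical_shape, [1]))   # [1, 35]
```

If an allocation is flattened as in the first call while the loads and stores are flattened as in the last call, the allocation and its accesses no longer agree on the physical shape, which is the inconsistency described above.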

cc @Hzfengsy @junrushao1994

wrongtest-intellif · 2022-08-31T00:53:20Z

LGTM. Here are some remaining questions:

1. Since we now have the decl_buffer node, if we also convert T.alloc_buffer to T.decl_buffer + T.allocate in LowerOpaqueBlock, could the axis_separators get preserved? (I am not so familiar with it, but I notice that it is a field of the Buffer object.) Maybe then the order of the two passes could be totally free. Our team relies on an IR form that is block-free but still has multi-dimensional buffer accesses to perform certain analysis and rewriting, and thus prefers to lower blocks before flattening in a customized configuration.
2. Do we have some protection case for the axis_separators feature? It seems that #12172 ([TIR Pass] Decouple flatten buffer to lower opaque block and flatten buffer) passed without catching the introduced issue.

@Lunderberg

wrongtest-intellif · 2022-08-31T01:09:29Z

tests/python/unittest/test_tir_transform_flatten_buffer.py

+
+
+class TestGPU(BaseCompare):
+    """Buffers allocated inside GPU-specific constructs are ignored.


Does the meaning of "ignored" refer to B?

Lunderberg · 2022-08-31

Whoops, I was mistaken in that comment, and was getting it mixed up with some of the allocation handling in StorageRewrite. Updated the comment.

Lunderberg · 2022-08-31T14:52:11Z

> Since we now have the decl_buffer node, if we also convert T.alloc_buffer to T.decl_buffer + T.allocate in LowerOpaqueBlock, could the axis_separators get preserved? (I am not so familiar with it, but I notice that it is a field of the Buffer object.) Maybe then the order of the two passes could be totally free. Our team relies on an IR form that is block-free but still has multi-dimensional buffer accesses to perform certain analysis and rewriting, and thus prefers to lower blocks before flattening in a customized configuration.

I like that idea, and having independent order of passes would be better overall. There was some similar logic in StorageFlatten that would look for a BufferLoad/BufferStore in order to know the appropriate axes to flatten, but the DeclBuffer usage would be even cleaner.

> Do we have some protection case for the axis_separators feature?

In principle, the tests/python/contrib/test_hexagon/test_2d_physical_buffers.py::TestElementWise::test_cache_shape should have caught it, as it validates the shape of a buffer after lowering. Looking at it again, it uses a TE-based schedule, so it goes through StorageFlatten instead of LowerOpaqueBlock/FlattenBuffer.

I'm adding a test that validates the buffer shape after running through all of tvm.lower, and will expand the Hexagon-focused tests in a follow-up PR.

(Side-note: If StorageFlatten were to replace BufferRealize with Allocate as it currently does, but add a DeclBuffer instead of performing the flattening itself, then the same FlattenBuffer pass could apply to both types of schedules. That would be another argument in favor of having FlattenBuffer read from DeclBuffer, to minimize duplication.)

The DeclBuffer node can be inserted during LowerOpaqueBlock, then provide the missing Buffer information required to flatten the allocation.
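The idea can be sketched with a toy model in plain Python. Everything here is a simplifying assumption for illustration: the `Buffer`, `Allocate`, and `DeclBuffer` dataclasses and the two functions are hypothetical stand-ins rather than TVM's classes or passes, and statements are kept in a flat list instead of a nested tree.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple


@dataclass(frozen=True)
class Buffer:
    """Toy buffer: logical shape plus the axis separators."""
    name: str
    shape: Tuple[int, ...]
    axis_separators: Tuple[int, ...] = ()


@dataclass
class Allocate:
    """Toy allocation: only a name and extents, no Buffer object."""
    name: str
    extents: List[int]


@dataclass
class DeclBuffer:
    """Toy annotation that carries the full Buffer past the lowering step."""
    buffer: Buffer


def physical_shape(buffer: Buffer) -> List[int]:
    # Fuse each group of logical axes delimited by the axis separators.
    bounds = [0, *buffer.axis_separators, len(buffer.shape)]
    return [prod(buffer.shape[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


def lower_opaque_block(alloc_buffers: List[Buffer]) -> list:
    # Emit an Allocate for each block-level buffer, followed by a DeclBuffer
    # so that the axis separators are not lost.
    stmts = []
    for buf in alloc_buffers:
        stmts.append(Allocate(buf.name, list(buf.shape)))
        stmts.append(DeclBuffer(buf))
    return stmts


def flatten_buffer(stmts: list) -> list:
    # Look up the declared Buffer for each Allocate. Allocations without a
    # declaration fall back to a single physical axis.
    declared = {s.buffer.name: s.buffer for s in stmts if isinstance(s, DeclBuffer)}
    flattened = []
    for stmt in stmts:
        if isinstance(stmt, DeclBuffer):
            continue  # the annotation is consumed here and not passed on
        if isinstance(stmt, Allocate):
            buf = declared.get(stmt.name)
            extents = physical_shape(buf) if buf is not None else [prod(stmt.extents)]
            stmt = Allocate(stmt.name, extents)
        flattened.append(stmt)
    return flattened


lowered = lower_opaque_block([Buffer("A", (4, 8, 16), axis_separators=(1,))])
lowered.append(Allocate("B", [4, 8]))  # e.g. a hand-written allocation
print(flatten_buffer(lowered))
```

In this toy model, `A` keeps two physical axes because its declaration survives the lowering step, while `B`, which has no declaration, is flattened to a single axis.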

Lunderberg · 2022-08-31T16:11:04Z

Inserting the DeclBuffer node during LowerOpaqueBlock, such that FlattenBuffer can be performed after LowerOpaqueBlock. (And renaming the title of this PR accordingly.)

I'm also reverting most of the unit test changes, since they should go back to operating on Allocate nodes. I expect some of these tests to be gated on #12412, which will be necessary to correctly express the axis separators in TVMScript.

With the insertion of `DeclBuffer` nodes, `LowerOpaqueBlock` no longer needs to be before `FlattenBuffer`, and has been moved back to its original position. Reverting the tests to use `T.allocate` instead of `T.alloc_buffer` more closely represents the functions as they are being lowered.

Previously, the test cases only tested TE-based schedules. This commit runs the same tests for equivalent TIR-based schedules as well. This is intended to catch Hexagon-specific regressions, such as the one resolved in apache#12652.

The DeclBuffer annotations aren't yet supported in all passes. This restricts them to being introduced in LowerOpaqueBlock, then immediately removed in FlattenBuffer.

Lunderberg · 2022-09-01T16:41:59Z

cc @cconvey

Lunderberg · 2022-09-07T13:06:29Z

@tvm-bot rerun

Lunderberg · 2022-09-07T16:45:34Z

@wrongtest-intellif Requesting a re-review, if you have the time, as the implementation changed significantly from the reviewed/approved version.

wrongtest-intellif

LGTM. Thanks for the great efforts~

Lunderberg added the status: need review label Aug 30, 2022

vinx13 requested a review from wrongtest-intellif August 30, 2022 23:03

wrongtest-intellif reviewed Aug 31, 2022

tests/python/unittest/test_tir_transform_flatten_buffer.py

wrongtest-intellif approved these changes Aug 31, 2022

junrushao assigned wrongtest-intellif Aug 31, 2022

github-actions bot requested a review from Hzfengsy August 31, 2022 04:37

Add unit test to validate non-flat memory after tvm.lower

b123adb

Lunderberg added 3 commits August 31, 2022 10:05

Explicitly write T.reads for test on BufferRegion updates

6bc203c

Update incorrect docstring for test

a4b7573

Use DeclBuffer information in FlattenBuffer

97fcb29

Lunderberg changed the title ~~[TIR] Moved tir.FlattenBuffer to occur before tir.LowerOpaqueBlock~~ [TIR] Handle axis_separators during FlattenBuffer Aug 31, 2022

Lunderberg mentioned this pull request Aug 31, 2022

[Hexagon] Validate 2-d physical shapes for TIR-derived schedules #12662

Merged

Lunderberg added 2 commits August 31, 2022 11:53

Merge branch 'main' into swap_loweropaqueblock_flattenbuffer

82ac543

Fix usage of T.decl_buffer in updated tests

aaa47b8

Lunderberg added 2 commits September 1, 2022 10:21

Update LowerOpaqueBuffer to expect the DeclBuffer nodes

ea0a87a

Strip DeclBuffer annotation in FlattenBuffer

12bba28

Lunderberg added 2 commits September 6, 2022 10:25

Strip out all DeclBuffer nodes in FlattenBuffer

2a35839

Update unit tests to remove expectation of DeclBuffer nodes

018f4f8

Lunderberg force-pushed the swap_loweropaqueblock_flattenbuffer branch from 23cf49b to 018f4f8 Compare September 6, 2022 21:29

Lunderberg requested a review from wrongtest-intellif September 7, 2022 16:43

Merge branch 'main' into swap_loweropaqueblock_flattenbuffer

337ed4d

wrongtest-intellif approved these changes Sep 8, 2022

wrongtest-intellif merged commit b2bd434 into apache:main Sep 8, 2022

Lunderberg deleted the swap_loweropaqueblock_flattenbuffer branch September 8, 2022 16:31

AndrewZhaoLuo mentioned this pull request Oct 4, 2022

TVM v0.10.0.rc0 Release Candidate Notes #12979

Closed
