Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TIR] Handle axis_separators during FlattenBuffer #12652

Merged

Conversation

Lunderberg
Copy link
Contributor

@Lunderberg Lunderberg commented Aug 30, 2022

For buffers with more than one physical axis, the axis_separators are required in order to know which groups of logical axes to fuse into each physical axis. The implementation in tir.FlattenBuffer assumed that all buffers were being flattened to a single physical axis. Because tir.LowerOpaqueBlock replaces the BlockNode::alloc_buffers with Allocate nodes, tir.FlattenBuffer no longer has access to the axis separators and performs inconsistent flattening for Allocate as opposed to BufferLoad/BufferStore. This was introduced in #12172, which decoupled the lowering/flattening steps.

The commit reorders the tir.FlattenBuffer to occur before tir.LowerOpaqueBlock, to make use of the axis separators. Any Allocate nodes that exist at that point (e.g. from hand-written schedules) are still flattened to 1-d physical buffers, but the BlockNode::alloc_buffers are flattened according to the axis separators. See #12652 (comment).

This PR adds a DeclBuffer node to the output of tir.LowerOpaqueBlock, which is then used during tir.FlattenBuffer to identify the axis separators.

cc @Hzfengsy @junrushao1994

For buffers with more than one physical axis, the `axis_separators`
are required in order to know which groups of logical axes to fuse
into each physical axis.  The implementation in `tir.FlattenBuffer`
assumed that all buffers were being flattened to a single physical
axis.  Because `tir.LowerOpaqueBlock` replaces the
`BlockNode::alloc_buffers` with `Allocate` nodes, `tir.FlattenBuffer`
no longer has access to the axis separators and performs inconsistent
flattening for `Allocate` as opposed to `BufferLoad`/`BufferStore`.
This was introduced in apache#12172, which
decoupled the lowering/flattening steps.

The commit reorders the `tir.FlattenBuffer` to occur before
`tir.LowerOpaqueBlock`, to make use of the axis separators.  Any
`Allocate` nodes that exist at that point (e.g. from hand-written
schedules) are still flattened to 1-d physical buffers, but the
`BlockNode::alloc_buffers` are flattened according to the axis
separators.
@wrongtest-intellif
Copy link
Contributor

wrongtest-intellif commented Aug 31, 2022

LGTM. Here are my remaining some questions:

  1. Since now we have decl_buffer node, if we also convert T.alloc_buffer to T.decl_buffer + T.allocate in LowerOpaqueBlock, could the axis_separators get preserved? (I am not so familiar with it, but I notice that it is in the Buffer object's field). Maybe then the order of the two passes could get totally free. Our team rely on the IR form which is block-free but multi-dimensional buffer accessing to perform certain analysis and rewriting, thus prefer to lower block before flatten in a customized configuration.

  2. Do we have some protection case for the axis_separators feature? It seems [TIR Pass] Decouple flatten buffer to lower opaque block and flatten buffer. #12172 passes without recognizing the introduced issue.

@Lunderberg



class TestGPU(BaseCompare):
"""Buffers allocated inside GPU-specific constructs are ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the meaning of "ignored" refer to B?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I was mistaken in that comment, and was getting it mixed up with some of the allocation handling in StorageRewrite. Updated the comment.

@Lunderberg
Copy link
Contributor Author

Since now we have decl_buffer node, if we also convert T.alloc_buffer to T.decl_buffer + T.allocate in LowerOpaqueBlock, could the axis_separators get preserved? (I am not so familiar with it, but I notice that it is in the Buffer object's field). Maybe then the order of the two passes could get totally free. Our team rely on the IR form which is block-free but multi-dimensional buffer accessing to perform certain analysis and rewriting, thus prefer to lower block before flatten in a customized configuration.

I like that idea, and having independent order of passes would be better overall. There was some similar logic in StorageFlatten that would look for a BufferLoad/BufferStore in order to know the appropriate axes to flatten, but the DeclBuffer usage would be even cleaner.

Do we have some protection case for the axis_separators feature?

In principle, the tests/python/contrib/test_hexagon/test_2d_physical_buffers.py::TestElementWise::test_cache_shape should have caught it, as it validates the shape of a buffer after lowering. Looking at it again, it uses a TE-based schedule, so it goes through StorageFlatten instead of LowerOpaqueBlock/FlattenBuffer.

Adding a test that validates the buffer shape after running through all of tvm.lower, and will be expanding the hexagon-focused PRs in a follow-up.

(Side-note: If StorageFlatten were to replace BufferRealize with Allocate as it currently does, but add a DeclBuffer instead of performing the flattening itself, then the same FlattenBuffer pass could apply to both types of schedules. That would be another argument in favor of having FlattenBuffer read from DeclBuffer, to minimize duplication.)

The DeclBuffer node can be inserted during LowerOpaqueBlock, then
provide the missing Buffer information required to flatten the
allocation.
@Lunderberg
Copy link
Contributor Author

Inserting the DeclBuffer node during LowerOpaqueBlock, such that FlattenBuffer can be performed after LowerOpaqueBlock. (And renaming the title of this PR accordingly.)

I'm also reverting most of the unit test changes, since they should go back to operating on Allocate nodes. I expect some of these tests to be gated on #12412, which will be necessary to correctly express the axis separators in TVMScript.

@Lunderberg Lunderberg changed the title [TIR] Moved tir.FlattenBuffer to occur before tir.LowerOpaqueBlock [TIR] Handle axis_separators during FlattenBuffer Aug 31, 2022
With the insertion of `DeclBuffer` nodes, `LowerOpaqueBlock` no longer
needs to be before `FlattenBuffer`, and has been moved back to its
original position.  Revering the tests to use `T.allocate` instead of
`T.alloc_buffer` more closely represents the functions as they are
being lowered.
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Aug 31, 2022
Previously, the test cases only tested TE-based schedules.  This
commit runs the same tests for equivalent TIR-based schedules as
well.  This is intended to catch Hexagon-specific regressions, such as
the one resolved in apache#12652.
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Aug 31, 2022
Previously, the test cases only tested TE-based schedules.  This
commit runs the same tests for equivalent TIR-based schedules as
well.  This is intended to catch Hexagon-specific regressions, such as
the one resolved in apache#12652.
The DeclBuffer annotations aren't yet supported in all passes.  This
restricts them to being introduced in LowerOpaqueBuffer, then
immediately removed in FlattenBuffer.
@Lunderberg
Copy link
Contributor Author

cc @cconvey

@Lunderberg Lunderberg force-pushed the swap_loweropaqueblock_flattenbuffer branch from 23cf49b to 018f4f8 Compare September 6, 2022 21:29
@Lunderberg
Copy link
Contributor Author

@tvm-bot rerun

@Lunderberg
Copy link
Contributor Author

@wrongtest-intellif Requesting a re-review, if you have the time, as the implementation changed significantly from the reviewed/approved version.

Copy link
Contributor

@wrongtest-intellif wrongtest-intellif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the great efforts~

@wrongtest-intellif wrongtest-intellif merged commit b2bd434 into apache:main Sep 8, 2022
@Lunderberg Lunderberg deleted the swap_loweropaqueblock_flattenbuffer branch September 8, 2022 16:31
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Sep 8, 2022
Previously, the test cases only tested TE-based schedules.  This
commit runs the same tests for equivalent TIR-based schedules as
well.  This is intended to catch Hexagon-specific regressions, such as
the one resolved in apache#12652.
mehrdadh pushed a commit that referenced this pull request Sep 12, 2022
)

Previously, the test cases only tested TE-based schedules.  This
commit runs the same tests for equivalent TIR-based schedules as
well.  This is intended to catch Hexagon-specific regressions, such as
the one resolved in #12652.
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
* [TIR] Moved tir.FlattenBuffer to occur before tir.LowerOpaqueBlock

For buffers with more than one physical axis, the `axis_separators`
are required in order to know which groups of logical axes to fuse
into each physical axis.  The implementation in `tir.FlattenBuffer`
assumed that all buffers were being flattened to a single physical
axis.  Because `tir.LowerOpaqueBlock` replaces the
`BlockNode::alloc_buffers` with `Allocate` nodes, `tir.FlattenBuffer`
no longer has access to the axis separators and performs inconsistent
flattening for `Allocate` as opposed to `BufferLoad`/`BufferStore`.
This was introduced in apache#12172, which
decoupled the lowering/flattening steps.

The commit reorders the `tir.FlattenBuffer` to occur before
`tir.LowerOpaqueBlock`, to make use of the axis separators.  Any
`Allocate` nodes that exist at that point (e.g. from hand-written
schedules) are still flattened to 1-d physical buffers, but the
`BlockNode::alloc_buffers` are flattened according to the axis
separators.

* Add unit test to validate non-flat memory after tvm.lower

* Explicitly write T.reads for test on BufferRegion updates

* Update incorrect docstring for test

* Use DeclBuffer information in FlattenBuffer

The DeclBuffer node can be inserted during LowerOpaqueBlock, then
provide the missing Buffer information required to flatten the
allocation.

* Use T.allocate in unit tests

With the insertion of `DeclBuffer` nodes, `LowerOpaqueBlock` no longer
needs to be before `FlattenBuffer`, and has been moved back to its
original position.  Revering the tests to use `T.allocate` instead of
`T.alloc_buffer` more closely represents the functions as they are
being lowered.

* Fix usage of T.decl_buffer in updated tests

* Update LowerOpaqueBuffer to expect the DeclBuffer nodes

* Strip DeclBuffer annotation in FlattenBuffer

The DeclBuffer annotations aren't yet supported in all passes.  This
restricts them to being introduced in LowerOpaqueBuffer, then
immediately removed in FlattenBuffer.

* Strip out all DeclBuffer nodes in FlattenBuffer

* Update unit tests to remove expectation of DeclBuffer nodes
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…che#12662)

Previously, the test cases only tested TE-based schedules.  This
commit runs the same tests for equivalent TIR-based schedules as
well.  This is intended to catch Hexagon-specific regressions, such as
the one resolved in apache#12652.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants