
[Hexagon][LLVM] Enable/test tensorized Hexagon DMA on 2d transformed layout #10905

Merged: 11 commits into apache:main on Apr 12, 2022

Conversation

Lunderberg (Contributor)


- In the `CodeGenLLVM::CreateIntrinsic` handler for
  `builtin::address_of()`, pass N-d indices to
  `CodeGenLLVM::CreateBufferPtr`.  The base class implementation still
  asserts that there is a flat memory space, while the
  `CodeGenHexagon::CreateBufferPtr` override allows 2-d memory.

- Enable tensorization in `test_cache_read_write.py`, using
  `tir.address_of` to pass the lowered value.

Co-authored-by: Adam Straw <astraw@octoml.ai>
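
For context, here is a condensed sketch of the tensorization pattern this enables, modeled on the pattern in `test_cache_read_write.py`; the helper name `intrin_mem_copy`, the buffer names, and the exact parameters are illustrative rather than the PR's verbatim test code:

```python
import numpy as np
import tvm
from tvm import te


def intrin_mem_copy(shape, dtype, dst_scope, src_scope):
    """Declare a tensor intrinsic that replaces a cache_read/cache_write
    stage with a single mem-copy call, passing pointers obtained through
    tir.address_of rather than flattened 1-d offsets."""
    src = te.placeholder(shape, dtype=dtype, name="src")
    dst = te.compute(shape, lambda *i: src(*i), name="dst")
    size = int(np.prod(shape)) * np.dtype(dtype).itemsize

    src_buffer = tvm.tir.decl_buffer(
        shape, dtype, scope=src_scope, offset_factor=1, name="src_buf"
    )
    dst_buffer = tvm.tir.decl_buffer(
        shape, dtype, scope=dst_scope, offset_factor=1, name="dst_buf"
    )

    zero_indices = [0 for _ in shape]

    def intrin_func(ins, outs):
        ib = tvm.tir.ir_builder.create()
        src_handle = ib.buffer_ptr(src_buffer)
        dst_handle = ib.buffer_ptr(dst_buffer)
        # tir.address_of is given the full N-d index list; with this PR,
        # codegen passes those indices through to CreateBufferPtr instead
        # of asserting that the buffer is flat.
        ib.emit(
            tvm.tir.call_intrin(
                "handle",
                "tir.mem_copy",
                tvm.tir.call_intrin("handle", "tir.address_of", dst_handle[zero_indices]),
                tvm.tir.call_intrin("handle", "tir.address_of", src_handle[zero_indices]),
                size,
            )
        )
        return ib.get()

    return te.decl_tensor_intrin(
        dst.op, intrin_func, binds={src: src_buffer, dst: dst_buffer}
    )
```

The relevant detail is that `tir.address_of` is handed a buffer access with the full N-d index list; with this change, `CodeGenLLVM` forwards those indices to `CreateBufferPtr`, where the `CodeGenHexagon` override accepts the 2-d form.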
@Lunderberg (Contributor, Author)

This change shouldn't depend on the changes introduced in #10878 and #10903, but local testing was done with those changes included as well.

Lunderberg and others added 3 commits on April 5, 2022

[TIR] Allow buffer_bind_scope of N-d buffers
Previously, any `buffer_bind_scope` attribute that provides a view
into a non-flat buffer would result in an error.  After this commit,
`buffer_bind_scope` may be used for non-flat buffers, but use of
`arg_buffer->elem_offset` within the body of the bind statement is
still an error.

The `BufferNode::elem_offset` field represents the offset between the
pointer of the backing allocation and the first element of the buffer.
This offset is only well-defined for flat memory spaces.
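
As a hypothetical illustration of that last point (plain Python arithmetic, not TVM's actual address computation): in a flat memory space one scalar locates the view's origin, while a 2-d physical memory needs an origin per axis, so no single `elem_offset` can describe the view.

```python
def flat_address(elem_offset, i):
    # Flat memory space: a single scalar elem_offset locates the view's
    # first element relative to the backing allocation.
    return elem_offset + i


def two_d_address(axis_offsets, i, j):
    # Non-flat (2-d) memory: each physical axis has its own origin, so
    # the view origin is a pair of offsets; one scalar cannot express it.
    return (axis_offsets[0] + i, axis_offsets[1] + j)
```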
@Lunderberg changed the title from "[Hexagon][LLVM] Enable/test tensorized Hexagon DMA" to "[Hexagon][LLVM] Enable/test tensorized Hexagon DMA on 2d transformed layout" on Apr 6, 2022
def layout_transform_2d(n):
    return [n // 16, te.AXIS_SEPARATOR, n % 16]


@requires_hexagon_toolchain
def test_cache_read_write(hexagon_session):
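
In the excerpt above, `te.AXIS_SEPARATOR` marks where the transformed indices split into separate physical dimensions, so the flat index `n` becomes a 2-d index with an inner width of 16. The index arithmetic alone, as illustrative plain Python with the separator omitted:

```python
def index_mapping(n):
    # Same arithmetic as layout_transform_2d: flat index n maps to
    # (outer, inner) with a fixed inner width of 16.
    return (n // 16, n % 16)


assert index_mapping(35) == (2, 3)
assert index_mapping(16) == (1, 0)
```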
Contributor:
Can we have a test demonstrating the approach for both discontiguous and contiguous memory? I notice this change overwrites the old test coverage, and it's likely useful to maintain coverage for both cases.

Contributor:
Done

Comment on lines 485 to 487
// Sentinel value for ArgBinder::BindBuffer to state that any usage
// of element offset is invalid.
slice.CopyOnWrite()->elem_offset = PrimExpr();
Contributor:
Expanding this comment to explain why the use of element offset is invalid in the N-d case, or better yet adding a short TODO to update it once the IR/buffer is changed to support N-d offsets, would help. It took me a little while to understand why this was necessary.

Contributor:
@Lunderberg Let me know if you are OK with the comment rewrite here including the TODO I wrote related to PR #10816.

Contributor (Author):
The change and the TODO look good to me.

@csullivan merged commit 11d22bd into apache:main on Apr 12, 2022
@csullivan (Contributor):

Thanks @adstraw @Lunderberg! This is merged.

@Lunderberg deleted the hexagon_2d_dma_tensorization branch on April 12, 2022
AndrewZhaoLuo added a commit to AndrewZhaoLuo/tvm that referenced this pull request Apr 15, 2022
* main: (527 commits)
  [hexagon] 'add_hvx' test to explore HVX usage. (apache#10604)
  [COMMUNITY] @yzh119 -> Reviewer (apache#10993)
  [Metaschedule] Make custom schedule_rule registration optional (apache#10975)
  [ONNX] Add imports for BERT contrib operators (apache#10949)
  sort axes (apache#10985)
  [Hexagon] Remove HexagonBuffer external constructor and support (apache#10978)
  [CI] Update GPU image (apache#10992)
  [Runtime][Vulkan] Add RGP support to TVM for vulkan device (apache#10953)
  [FIX] resolve int64/32 for AttrStmtNode (apache#10983)
  [TVMC] Allow output module name to be passed as a command line argument (apache#10962)
  [ONNX] Add MatMulInteger importer (apache#10450)
  [COMMUNITY] @guberti -> Reviewer (apache#10976)
  Support `qnn.conv2d` in FoldExplicitPading (apache#10982)
  change Hexagon docker version (apache#10981)
  remove exception handling of autotvm xgboost extract functions (apache#10948)
  [CUDNN] Add partitioning support for conv2d and log_softmax (apache#10961)
  [Hexagon][LLVM] Enable/test tensorized Hexagon DMA on 2d transformed layout (apache#10905)
  [Hexagon] Move aot/graph_executor interactions into launcher (apache#10907)
  [HEXAGON] Split huge 1D DMA Transfers into smaller transfers with legal sizes. (apache#10971)
  [CI][DOCKER] Add pytest-lazy-fixture to images (apache#10970)
  ...
Lucien0 pushed a commit to Lucien0/tvm that referenced this pull request Apr 19, 2022
…layout (apache#10905)

* [Hexagon][LLVM] Enable/test tensorized Hexagon DMA

* [TIR] Allow buffer_bind_scope of N-d buffers

* update test to tensorize cache_read `y` (works) and cache_write `z` (fails)

* add `split` to allow for tensorization of cache_write of `z`

* fix typo and cleanup comment

* add back original 1d test_cache_read_write

* update comments

* format error

Co-authored-by: Adam Straw <astraw@octoml.ai>
altanh pushed a commit to altanh/tvm that referenced this pull request Apr 28, 2022
…layout (apache#10905)
