[Hexagon][LLVM] Enable/test tensorized Hexagon DMA on 2d transformed layout #10905
Conversation
- In the `CodeGenLLVM::CreateIntrinsic` handler for `builtin::address_of()`, pass N-d indices to `CodeGenLLVM::CreateBufferPtr`. The base class implementation still asserts that there is a flat memory space, while the `CodeGenHexagon::CreateBufferPtr` override allows 2-d memory.
- Enable tensorization in `test_cache_read_write.py`, using `tir.address_of` to pass the lowered value.

Co-authored-by: Adam Straw <astraw@octoml.ai>
Previously, any `buffer_bind_scope` attribute that provides a view into a non-flat buffer would result in an error. After this commit, `buffer_bind_scope` may be used for non-flat buffers, but use of `arg_buffer->elem_offset` within the body of the bind statement is still an error. The `BufferNode::elem_offset` field represents the offset between the pointer of the backing allocation and the first element of the buffer. This offset is only well-defined for flat memory spaces.
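For intuition about the last point, here is a plain-Python illustration (not TVM API) of why a single scalar `elem_offset` fully describes a view into flat memory but cannot describe a view into a 2-d memory space, where each axis needs its own offset:

```python
# Illustration only (plain Python, not TVM internals): a flat buffer view
# is located by one scalar offset from the allocation's base pointer,
# while a view into 2-d memory needs one offset per axis.

def flat_elem_offset(view_start, base_start=0):
    # Flat memory: a single scalar fully describes where the view begins.
    return view_start - base_start

def nd_view_offsets(view_start_2d, base_start_2d=(0, 0)):
    # 2-d memory space: no single scalar exists; each axis has its own
    # offset, which is why BufferNode::elem_offset is undefined here.
    return tuple(v - b for v, b in zip(view_start_2d, base_start_2d))

print(flat_elem_offset(128))      # 128
print(nd_view_offsets((4, 8)))    # (4, 8)
```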
```python
def layout_transform_2d(n):
    return [n // 16, te.AXIS_SEPARATOR, n % 16]


@requires_hexagon_toolchain
def test_cache_read_write(hexagon_session):
```
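The transform above splits a flat index into two axes, with `te.AXIS_SEPARATOR` marking where the physical memory space splits into two dimensions. As a plain-Python sanity check (illustration only, not the TVM API), the index mapping and its inverse are:

```python
# Plain-Python model of the 2-d layout transform: flat index n maps to
# the pair (n // 16, n % 16), and the mapping is losslessly invertible.

def layout_transform_2d(n):
    return (n // 16, n % 16)

def inverse_transform(i, j):
    return i * 16 + j

# Round-trip check over a small range of flat indices.
for n in range(64):
    i, j = layout_transform_2d(n)
    assert inverse_transform(i, j) == n

print(layout_transform_2d(37))  # (2, 5)
```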
Can we have tests demonstrating the approach for both discontiguous and contiguous memory? I notice this change overwrites the old test coverage, and it's likely useful to maintain coverage for both cases.
Done
src/tir/ir/buffer.cc
Outdated
```cpp
// Sentinel value for ArgBinder::BindBuffer to state that any usage
// of element offset is invalid.
slice.CopyOnWrite()->elem_offset = PrimExpr();
```
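The sentinel pattern here (a default-constructed, undefined `PrimExpr`) makes any accidental use of the field fail loudly rather than silently produce a wrong offset. A minimal Python model of the same idea (illustration only, not TVM internals):

```python
# Illustration (plain Python): an "undefined" sentinel is falsy when
# tested for definedness, but any arithmetic use raises immediately,
# mirroring how an undefined PrimExpr flags invalid elem_offset usage.

class Undefined:
    def __bool__(self):
        return False  # "is not defined", like a null PrimExpr handle

    def __add__(self, other):
        raise ValueError("elem_offset is undefined for non-flat buffers")

elem_offset = Undefined()
assert not elem_offset  # checking definedness is allowed
try:
    elem_offset + 1      # any use in arithmetic is an error
except ValueError as err:
    print(err)
```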
It would help to expand this comment to explain why the use of element offset is invalid in the N-d case, or better yet to add a short TODO to update it once the IR/buffer is changed to support an N-d offset. It took me a little while to understand why this was necessary.
@Lunderberg Let me know if you are OK with the comment rewrite here, including the TODO I wrote related to PR #10816.
The change and the TODO look good to me.
Thanks @adstraw @Lunderberg! This is merged.
* main: (527 commits)
  [hexagon] 'add_hvx' test to explore HVX usage. (apache#10604)
  [COMMUNITY] @yzh119 -> Reviewer (apache#10993)
  [Metaschedule] Make custom schedule_rule registration optional (apache#10975)
  [ONNX] Add imports for BERT contrib operators (apache#10949)
  sort axes (apache#10985)
  [Hexagon] Remove HexagonBuffer external constructor and support (apache#10978)
  [CI] Update GPU image (apache#10992)
  [Runtime][Vulkan] Add RGP support to TVM for vulkan device (apache#10953)
  [FIX] resolve int64/32 for AttrStmtNode (apache#10983)
  [TVMC] Allow output module name to be passed as a command line argument (apache#10962)
  [ONNX] Add MatMulInteger importer (apache#10450)
  [COMMUNITY] @guberti -> Reviewer (apache#10976)
  Support `qnn.conv2d` in FoldExplicitPading (apache#10982)
  change Hexagon docker version (apache#10981)
  remove exception handling of autotvm xgboost extract functions (apache#10948)
  [CUDNN] Add partitioning support for conv2d and log_softmax (apache#10961)
  [Hexagon][LLVM] Enable/test tensorized Hexagon DMA on 2d transformed layout (apache#10905)
  [Hexagon] Move aot/graph_executor interactions into launcher (apache#10907)
  [HEXAGON] Split huge 1D DMA Transfers into smaller transfers with legal sizes. (apache#10971)
  [CI][DOCKER] Add pytest-lazy-fixture to images (apache#10970)
  ...
…layout (apache#10905)

* [Hexagon][LLVM] Enable/test tensorized Hexagon DMA

  - In the `CodeGenLLVM::CreateIntrinsic` handler for `builtin::address_of()`, pass N-d indices to `CodeGenLLVM::CreateBufferPtr`. The base class implementation still asserts that there is a flat memory space, while the `CodeGenHexagon::CreateBufferPtr` override allows 2-d memory.
  - Enable tensorization in `test_cache_read_write.py`, using `tir.address_of` to pass the lowered value.

  Co-authored-by: Adam Straw <astraw@octoml.ai>

* [TIR] Allow buffer_bind_scope of N-d buffers

  Previously, any `buffer_bind_scope` attribute that provides a view into a non-flat buffer would result in an error. After this commit, `buffer_bind_scope` may be used for non-flat buffers, but use of `arg_buffer->elem_offset` within the body of the bind statement is still an error. The `BufferNode::elem_offset` field represents the offset between the pointer of the backing allocation and the first element of the buffer. This offset is only well-defined for flat memory spaces.

* update test to tensorize cache_read `y` (works) and cache_write `z` (fails)
* add `split` to allow for tensorization of cache_write of `z`
* fix typo and cleanup comment
* add back original 1d test_cache_read_write
* update comments
* format error

Co-authored-by: Adam Straw <astraw@octoml.ai>