[TIR][Schedule] Scoped CacheRead/Write producing compact region #15236

MasterJH5574 · 2023-07-05T07:07:43Z

This PR enhances CacheRead/Write so that when a cache operation is performed under an inner block, the generated cache buffer has as compact a shape as possible, as determined by region consumption analysis.
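To illustrate what "as compact as possible" could mean, here is a minimal pure-Python sketch, not the TVM implementation. It assumes each consumer's access is already expressed as a per-dimension `(min, extent)` interval relative to the inner block, and it takes the bounding box of all consumed regions as the cache shape. The names `bounding_region` and `compact_shape`, and the 32x128 tile, are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (min offset, extent) along one dimension
Region = List[Interval]     # one interval per buffer dimension


def bounding_region(regions: List[Region]) -> Region:
    """Return the smallest box that covers every consumed region."""
    if not regions:
        raise ValueError("need at least one consumed region")
    ndim = len(regions[0])
    box: Region = []
    for d in range(ndim):
        lo = min(r[d][0] for r in regions)            # smallest start offset
        hi = max(r[d][0] + r[d][1] for r in regions)  # largest end offset
        box.append((lo, hi - lo))
    return box


def compact_shape(regions: List[Region]) -> List[int]:
    """Shape of a cache buffer that holds exactly the bounding box."""
    return [extent for _, extent in bounding_region(regions)]


# Two hypothetical consumers inside one static 32x128 tile of an (n, 128) buffer.
reads = [
    [(0, 32), (0, 64)],   # rows 0..31, columns 0..63
    [(0, 32), (32, 96)],  # rows 0..31, columns 32..127
]
print(compact_shape(reads))  # [32, 128]
```

In this sketch the cache shape depends only on the static tile extents, so the dynamic dimension `n` of the original buffer never appears in it.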

The motivation for this change comes from dynamic-shape TIR scheduling, where we may isolate a "static shape" internal block using blockize and do further scheduling inside it. In such cases, the current CacheRead/Write inside the static-shape block still produces dynamic-shape cache buffers, which is not ideal for analysis and subsequent scheduling.

One thing worth noting: to ensure IR correctness after inserting the cache block, we only compact the cache buffer when all the consumer blocks of the read buffer (for CacheRead) or the write buffer (for CacheWrite) are child blocks of the cache block insertion location. Otherwise, we fall back to allocating the full-size cache buffer.
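The compaction rule above can be sketched as a small decision function. This is an illustrative model only, assuming the block tree is given as a child-to-parent map and that "children blocks" means any descendant of the insertion location; `is_under`, `choose_cache_shape`, and the block names are hypothetical and are not TVM APIs.

```python
from typing import Dict, List, Optional, Sequence, Union

Dim = Union[int, str]  # a str stands for a symbolic (dynamic) extent such as "n"


def is_under(block: str, scope: str, parent: Dict[str, Optional[str]]) -> bool:
    """True if `block` is a strict descendant of `scope` in the block tree."""
    cur = parent.get(block)
    while cur is not None:
        if cur == scope:
            return True
        cur = parent.get(cur)
    return False


def choose_cache_shape(
    consumers: Sequence[str],
    scope: str,
    parent: Dict[str, Optional[str]],
    full_shape: Sequence[Dim],
    compact_shape: Sequence[Dim],
) -> List[Dim]:
    """Use the compact shape only if every consumer lives under `scope`."""
    if consumers and all(is_under(c, scope, parent) for c in consumers):
        return list(compact_shape)
    # Some consumer sits outside the insertion scope: keep the full buffer.
    return list(full_shape)


# Hypothetical block tree: "compute" is inside the blockized "tile" block,
# while "epilogue" is a sibling of "tile" under the root.
parent = {"root": None, "tile": "root", "compute": "tile", "epilogue": "root"}

print(choose_cache_shape(["compute"], "tile", parent, ["n", 128], [32, 128]))
# [32, 128]
print(choose_cache_shape(["compute", "epilogue"], "tile", parent, ["n", 128], [32, 128]))
# ['n', 128]
```

The second call falls back to the full shape because `epilogue` would access the buffer from outside the tile, where a tile-sized cache could not serve it.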

Co-authored-by: Bohan Hou <spectrometerh@gmail.com>

tvm-bot · 2023-07-05T07:07:46Z

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

cc @Hzfengsy, @junrushao, @quic-sanirudh, @shingjan (See #10317 for details)

Generated by tvm-bot

MasterJH5574 · 2023-07-05T07:08:23Z

src/tir/schedule/primitive.h

@@ -105,7 +105,7 @@ TVM_DLL std::vector<int64_t> SamplePerfectTile(
 *  The sampled tile size will be partitioned into two parts. The second part has a guarantee
 *  that their extent's product have a factor of `innerpart_factor`. The first part is loops at
 *  [0, partition_pos); the second part is loops at [partition_pos, n) and we will have
- *  `innerpart_factor` | \prod_{l=partition_pos}^{n-1} l.extent
+ *  `innerpart_factor` | prod_{l=partition_pos}^{n-1} l.extent


Updating this because Clang otherwise warns:

1 warning generated.
In file included from /home/ruihangl/tvm/src/tir/schedule/primitive/for_kind.cc:19:
In file included from /home/ruihangl/tvm/src/tir/schedule/primitive/../utils.h:49:
/home/ruihangl/tvm/src/tir/schedule/primitive/.././primitive.h:108:26: warning: unknown command tag name [-Wdocumentation-unknown-command]
 *  `innerpart_factor` | \prod_{l=partition_pos}^{n-1} l.extent
                         ^~~~~~

src/tir/schedule/primitive/cache_read_write.cc

junrushao

Overall LGTM! It likely won't break anything, as scopes don't occur often in the existing TIR/MS pipeline.

junrushao reviewed Jul 5, 2023

junrushao approved these changes Jul 5, 2023

MasterJH5574 force-pushed the tvm-dev/2023-07-05-cache-read-write branch 2 times, most recently from 3ce17c3 to 6c35f3f Compare July 5, 2023 21:11

MasterJH5574 force-pushed the tvm-dev/2023-07-05-cache-read-write branch from 6c35f3f to 1d432c5 Compare July 6, 2023 00:31

tqchen merged commit 81463d7 into apache:main Jul 6, 2023
5 checks passed

ysh329 mentioned this pull request Oct 18, 2023

[Release] v0.14.0 Release Candidate Notes #15948

Closed
