[MetaSchedule] Add MultiLevelTilingTensorCore rule for auto-tensorization on CUDA #12059

Merged

Member

@vinx13 vinx13 commented Jul 11, 2022

@vinx13 vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch 5 times, most recently from 2c85011 to 14bd9ee Compare July 11, 2022 21:39
…tion on CUDA

Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
@vinx13 vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from 14bd9ee to 8dd2de5 Compare July 11, 2022 22:19
Member

@masahi masahi left a comment

I haven't started looking at multi_level_tiling_tensor_core.cc yet.

How about providing an integration test to demonstrate that auto-tensorization on CUDA works now?

include/tvm/meta_schedule/schedule_rule.h (outdated, resolved)
python/tvm/meta_schedule/testing/schedule_rule.py (outdated, resolved)
LOG(WARNING) << "Tensorize failed with error " << e.what();
}
});
} else if (block_name.find("init") && vectorize_init_loop) {
Member

Do we ever hit this condition after your change in rewrite_reduction_block.cc?

To vectorize init loop, should we switch to using tir::attr::meta_schedule_auto_tensorize_init?

Member Author

@vinx13 vinx13 Jul 12, 2022

In rewrite_reduction_block, tir::attr::meta_schedule_auto_tensorize is removed from the init block by default, unless the original reduction block is annotated with tir::attr::meta_schedule_auto_tensorize_init. In that case, tir::attr::meta_schedule_auto_tensorize_init is renamed to tir::attr::meta_schedule_auto_tensorize, so that rewrite_tensorize only has to check a single annotation. However, I hit another issue: block_name.find("init") is not safe. I changed the logic here a bit; let me know if that makes sense to you.
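
To make the two points in this reply concrete, here is a minimal Python sketch; it is not the TVM implementation. It models block annotations as a plain dict, assumes the two attributes have the string values shown, and uses hypothetical helper names (`init_block_annotations`, `looks_like_init_unsafe`, `looks_like_init`). Python's `str.find` stands in for C++ `std::string::find`: both return a non-zero "not found" value (`-1` and `npos` respectively) and `0` for a match at the start of the string, so using the result directly as a condition misfires in either language.

```python
# Assumed string values of tir::attr::meta_schedule_auto_tensorize and
# tir::attr::meta_schedule_auto_tensorize_init (not confirmed in this thread).
AUTO_TENSORIZE = "meta_schedule.auto_tensorize"
AUTO_TENSORIZE_INIT = "meta_schedule.auto_tensorize_init"


def init_block_annotations(reduction_annotations: dict) -> dict:
    """Hypothetical model of the annotations handed to the decomposed init block."""
    init_annotations = dict(reduction_annotations)
    # By default the init block does not inherit the tensorize annotation.
    init_annotations.pop(AUTO_TENSORIZE, None)
    # If an init intrinsic was requested, rename its key so that a later pass
    # only has to look for a single annotation.
    if AUTO_TENSORIZE_INIT in init_annotations:
        init_annotations[AUTO_TENSORIZE] = init_annotations.pop(AUTO_TENSORIZE_INIT)
    return init_annotations


def looks_like_init_unsafe(block_name: str) -> bool:
    """Mirrors using `block_name.find("init")` directly as a condition."""
    return bool(block_name.find("init"))


def looks_like_init(block_name: str) -> bool:
    """Explicit containment check, shown only for contrast."""
    return "init" in block_name
```

In this model, `looks_like_init_unsafe("C_update")` is true and `looks_like_init_unsafe("init_C")` is false, which is the opposite of what the condition intends. The containment check is only one possible alternative; the thread does not say how the PR changed the logic.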

src/meta_schedule/schedule_rule/multi_level_tiling.h (outdated, resolved)
src/meta_schedule/schedule_rule/multi_level_tiling.h (outdated, resolved)
@vinx13 vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from 8bd7a05 to 806c890 Compare July 12, 2022 19:19
@vinx13 vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from 806c890 to 826a3fe Compare July 12, 2022 19:20
@vinx13 vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch 2 times, most recently from a3472de to b2faca6 Compare July 12, 2022 22:52
@vinx13 vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from b2faca6 to d27dd7b Compare July 12, 2022 23:53
python/tvm/meta_schedule/testing/schedule_rule.py (outdated, resolved)
@@ -110,6 +112,32 @@ def multi_level_tiling(target: Target) -> ScheduleRule:
    raise NotImplementedError(f"{target.kind.name} is not supported")


def multi_level_tiling_tensor_core(
    target: Target, scope="shared", in_dtype="float16", out_dtype="float32", trans_b=False
Member

Needs doc on what scope is. Or just rename it to reuse_scope or something.

Do read and write always use the same scope?

Member Author

It's the write scope here, but I think we also need a read scope parameter to support different read scopes.

python/tvm/tir/tensor_intrin/cuda.py (outdated, resolved)
};

f_tensorize_load(0, "wmma.matrix_a", intrin_group.load_a_intrin);
f_tensorize_load(1, "wmma.matrix_b", intrin_group.load_b_intrin);
Member

Can we infer the scope from the provided intrinsic? Otherwise I think we need to associate scope information to intrinsics somehow.

Member

This can be left for future work as long as there is a clear solution. At worst, I imagine we can traverse and examine the intrinsic prim func to extract the scope information.
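
A rough sketch of what "extracting scope information from the intrinsic" could look like, using plain Python dataclasses rather than TVM objects. `BufferInfo`, `IntrinDesc`, and `infer_scopes` are hypothetical names introduced for illustration only. The example scopes are taken from this thread: `"shared"` is the default `scope` argument in the Python helper, and `"wmma.matrix_a"` is the scope passed alongside `load_a_intrin`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BufferInfo:
    """Hypothetical stand-in for a buffer declared by an intrinsic prim func."""
    name: str
    scope: str


@dataclass(frozen=True)
class IntrinDesc:
    """Hypothetical stand-in for a registered tensor intrinsic."""
    name: str
    source: BufferInfo       # buffer the intrinsic reads from
    destination: BufferInfo  # buffer the intrinsic writes to


def infer_scopes(intrin: IntrinDesc) -> Dict[str, str]:
    """Read the scopes off the intrinsic instead of passing them separately."""
    return {"read": intrin.source.scope, "write": intrin.destination.scope}


# A load intrinsic that moves a tile from shared memory into a WMMA fragment.
load_a = IntrinDesc(
    name="load_a",
    source=BufferInfo("A_shared", "shared"),
    destination=BufferInfo("A_frag", "wmma.matrix_a"),
)
scopes = infer_scopes(load_a)
```

If scopes were derived this way, the rule would not need a separate scope string next to each intrinsic. Whether a real intrinsic prim func exposes enough information for this is exactly the open question raised above.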

@vinx13 vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from cde325a to 65bbba6 Compare July 13, 2022 18:04
src/tir/schedule/analysis.h (outdated, resolved)
@vinx13 vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch 2 times, most recently from f4b585e to 5ad0386 Compare July 13, 2022 21:09
@vinx13 vinx13 force-pushed the feat/auto-tensorization-tensor-core-upstream branch from 5ad0386 to c4269e7 Compare July 13, 2022 22:28
@vinx13 vinx13 merged commit e084791 into apache:main Jul 14, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…tion on CUDA (apache#12059)

* [MetaSchedule] Add MultiLevelTilingTensorCore rule for auto-tensorization on CUDA

Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>

* address comments

* update intrin registrations

* fix tests

* address comments

* add warning when storage align doesn't work

* remove print

Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023
…tion on CUDA (apache#12059)
