[MetaSchedule] Add MultiLevelTilingTensorCore rule for auto-tensorization on CUDA #12059
Conversation
Force-pushed 2c85011 to 14bd9ee
…tion on CUDA
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Force-pushed 14bd9ee to 8dd2de5
I haven't started looking at multi_level_tiling_tensor_core.cc yet. How about providing an integration test to demonstrate that auto-tensorization on CUDA works now?
      LOG(WARNING) << "Tensorize failed with error " << e.what();
    }
  });
} else if (block_name.find("init") && vectorize_init_loop) {
Do we ever hit this condition after your change in rewrite_reduction_block.cc? To vectorize the init loop, should we switch to using tir::attr::meta_schedule_auto_tensorize_init?
In rewrite_reduction_block, tir::attr::meta_schedule_auto_tensorize will be removed from the init block by default, unless the original reduction block is annotated with tir::attr::meta_schedule_auto_tensorize_init. tir::attr::meta_schedule_auto_tensorize_init will then be renamed to tir::attr::meta_schedule_auto_tensorize, so that in rewrite_tensorize we only need to check a single annotation. However, I hit another issue: block_name.find("init") is not safe. I changed the logic here a bit; let me know if that makes sense to you.
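For context, here is a small standalone C++ example (not code from this PR) showing why using block_name.find("init") directly as a boolean is unsafe: std::string::find returns the match position when the substring is present and std::string::npos when it is absent, and both are non-zero for typical block names, so the condition is true even for blocks whose names contain no "init" at all.

#include <iostream>
#include <string>

int main() {
  std::string init_block = "C_init";      // init block produced by decomposing a reduction
  std::string update_block = "C_update";  // ordinary update block

  // "init" is found at index 2, so the raw return value is non-zero and truthy.
  std::cout << std::boolalpha << static_cast<bool>(init_block.find("init")) << "\n";  // true
  // "init" is absent, so find returns npos, which is also truthy.
  std::cout << static_cast<bool>(update_block.find("init")) << "\n";                  // true

  // Comparing against npos is the correct substring test.
  std::cout << (init_block.find("init") != std::string::npos) << "\n";    // true
  std::cout << (update_block.find("init") != std::string::npos) << "\n";  // false
  return 0;
}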
Force-pushed 8bd7a05 to 806c890
Force-pushed 806c890 to 826a3fe
Force-pushed a3472de to b2faca6
Force-pushed b2faca6 to d27dd7b
Resolved review comment on src/meta_schedule/schedule_rule/multi_level_tiling_tensor_core.cc (outdated)
@@ -110,6 +112,32 @@ def multi_level_tiling(target: Target) -> ScheduleRule:
    raise NotImplementedError(f"{target.kind.name} is not supported")


def multi_level_tiling_tensor_core(
    target: Target, scope="shared", in_dtype="float16", out_dtype="float32", trans_b=False
Needs doc on what scope is. Or just rename it to reuse_scope or something. Do read and write always use the same scope?
It's the write scope here, but I think we also need a read scope param to support different read scopes.
};

f_tensorize_load(0, "wmma.matrix_a", intrin_group.load_a_intrin);
f_tensorize_load(1, "wmma.matrix_b", intrin_group.load_b_intrin);
Can we infer the scope from the provided intrinsic? Otherwise I think we need to associate scope information with intrinsics somehow.
This can be left for future work as long as there is a clear solution. At worst, I imagine we can traverse and examine the intrinsic prim func to extract scope information.
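As a rough illustration of that idea, here is a hypothetical C++ sketch (not part of this PR; the helper name is made up) that collects the storage scopes of the buffers declared in a tensor intrinsic's description PrimFunc, assuming the desc PrimFunc has already been looked up from the intrinsic registry.

#include <tvm/ir/type.h>
#include <tvm/tir/buffer.h>
#include <tvm/tir/function.h>

#include <string>
#include <vector>

// Hypothetical helper: walk the buffer_map of an intrinsic description PrimFunc and
// return the storage scope annotated on each buffer's data pointer, e.g. "shared",
// "wmma.matrix_a", or "wmma.accumulator".
std::vector<std::string> CollectIntrinBufferScopes(const tvm::tir::PrimFunc& desc) {
  std::vector<std::string> scopes;
  for (const auto& kv : desc->buffer_map) {
    const tvm::tir::Buffer& buf = kv.second;
    if (const auto* ptr_type = buf->data->type_annotation.as<tvm::PointerTypeNode>()) {
      scopes.push_back(ptr_type->storage_scope);
    }
  }
  return scopes;
}

Deciding which buffer corresponds to matrix_a, matrix_b, or the accumulator would still need a convention, for example matching on whether the description block reads or writes the buffer.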
Resolved review comments on src/meta_schedule/schedule_rule/multi_level_tiling_tensor_core.cc (outdated)
Force-pushed cde325a to 65bbba6
Force-pushed f4b585e to 5ad0386
Force-pushed 5ad0386 to c4269e7
…tion on CUDA (apache#12059)

* [MetaSchedule] Add MultiLevelTilingTensorCore rule for auto-tensorization on CUDA
* address comments
* update intrin registrations
* fix tests
* address comments
* add warning when storage align doesn't work
* remove print
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
cc @junrushao1994 @masahi @spectrometerHBH @jinhongyii