[MetaSchedule] Upstream the leftover changes #10689
Conversation
LGTM. Just a small nit.
@@ -93,6 +91,7 @@ def main():
     cache_line_bytes=64,
     max_shared_memory_per_block=int(ARGS.target.attrs["max_shared_memory_per_block"]),
     max_threads_per_block=int(ARGS.target.attrs["max_threads_per_block"]),
+    max_local_memory_per_block=12345678,
nit: Add a comment saying this value is unused and just a placeholder.
Thanks @junrushao1994
while (true) {
  std::vector<int64_t> result = SamplePerfectTile(rand_state, extent, n_splits);
  if (result.back() <= max_innermost_factor) {
    return result;
  }
}
I found an interesting issue with this change. After this PR, my VNNI tuning test hangs at https://gist.github.com/masahi/3579be9b2b9506106270eb2217746e74#file-vnni_dense_tir-py-L164, i.e. the second call in

def schedule_rule_batch_matmul_vnni(sch: tir.Schedule, bmm_block):
    sch_copy = sch.copy()
    schedule_batch_matmul(bmm_block, None, True, sch, layout_trans_compute_root=False)
    schedule_batch_matmul(bmm_block, None, True, sch_copy, layout_trans_compute_root=True)  # <- Hangs at sample_perfect_tile
    return [sch, sch_copy]

If I remove the second schedule_batch_matmul, everything is OK. Apparently, during the second call to sample_perfect_tile, the SamplePerfectTile above keeps returning 1024 when max_innermost_factor is 128 and extent is 1024. Does this ring any bell?
Oh oops! So this PR basically loops until the innermost factor satisfies a certain constraint, under the assumption that a satisfying case always exists, e.g. the factor could be 1, and 1 <= max_innermost_factor always holds.

To make sure I understand this correctly: in your particular case, extent = 1024 and max_innermost_factor = 128, and what is n_splits? And does it cause this method to hang forever? If so, probably @zxybazh or I could reproduce this bug and get it fixed :-)
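To make the failure mode concrete, here is a minimal, self-contained sketch (not TVM source; FakeSamplePerfectTile, SampleWithRetry, and the constants below are illustrative stand-ins for SamplePerfectTile and its RNG state) of how an uncapped retry loop spins forever once the random state stops advancing and keeps proposing the same split:

#include <cstdint>
#include <iostream>
#include <vector>

// Stand-in sampler: with a "stuck" state it always proposes the same split,
// e.g. {1, 1024} for extent = 1024. A healthy sampler advances *rand_state
// and draws different perfect-tile splits across calls.
std::vector<int64_t> FakeSamplePerfectTile(uint64_t* rand_state, int64_t extent, int n_splits) {
  (void)n_splits;  // the toy version always returns a 2-way split
  if (*rand_state == 0) {
    return {1, extent};  // degenerate: innermost factor is always `extent`
  }
  *rand_state = *rand_state * 6364136223846793005ULL + 1442695040888963407ULL;
  int64_t inner = int64_t{1} << (*rand_state % 8);  // a small power-of-two factor
  return {extent / inner, inner};
}

std::vector<int64_t> SampleWithRetry(uint64_t* rand_state, int64_t extent, int n_splits,
                                     int64_t max_innermost_factor) {
  while (true) {  // same shape as the loop quoted above: no retry cap
    std::vector<int64_t> result = FakeSamplePerfectTile(rand_state, extent, n_splits);
    if (result.back() <= max_innermost_factor) {
      return result;
    }
  }
}

int main() {
  uint64_t healthy_state = 42;
  std::vector<int64_t> ok = SampleWithRetry(&healthy_state, 1024, 2, 128);  // terminates
  std::cout << ok[0] << " x " << ok[1] << std::endl;

  uint64_t stuck_state = 0;  // models a copied schedule whose RNG state was never forked
  (void)stuck_state;
  // SampleWithRetry(&stuck_state, 1024, 2, 128);  // would spin forever: back() == 1024 > 128 every time
  return 0;
}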
Yes, extent = 1024, max_innermost_factor = 128, and n_splits = 2.

It hangs on the second call to schedule_batch_matmul, where SamplePerfectTile keeps returning 1024. The following alternative definition works fine:

def schedule_rule_batch_matmul_vnni(sch: tir.Schedule, bmm_block):
    schedule_batch_matmul(bmm_block, None, True, sch, layout_trans_compute_root=False)
    return [sch]

I think the issue has to be related to the use of rand_state, since this is the only stateful input to SamplePerfectTile(rand_state, extent, n_splits). Note that I'm applying the same schedule to two copies of sch. Maybe when I make a copy of the schedule with sch_copy = sch.copy(), the RNG engine is not copied?
😱 Seems not!!!!!

We didn't set the new random state with:

n->rand_state_ = self->ForkSeed();  // (also we need to change the Copy method to a non-const one)

and thus the state is likely always some default value... I'm very surprised that we didn't find this out during previous performance alignment.

Would you mind sending a quick PR fixing this dangerous behavior? Thanks a lot!
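For reference, a hedged sketch of the intended fix (simplified, not the actual TVM schedule code; ToySchedule and its methods are illustrative stand-ins): when a schedule is copied, the copy should receive a freshly forked random state rather than a default-initialized one, so that sampling on the copy neither replays the parent's draws nor degenerates.

#include <cstdint>
#include <iostream>

// Illustrative stand-in for a schedule object that owns a sampler state.
struct ToySchedule {
  uint64_t rand_state_;

  explicit ToySchedule(uint64_t seed) : rand_state_(seed) {}

  // Advance this schedule's state and derive a fresh seed for a child;
  // roughly what a ForkSeed-style helper does (the real derivation differs).
  uint64_t ForkSeed() {
    rand_state_ = rand_state_ * 6364136223846793005ULL + 1442695040888963407ULL;
    return rand_state_ ^ 0x9E3779B97F4A7C15ULL;
  }

  // Buggy copy: the new schedule gets a default state, so every copy samples
  // the same sequence and can get stuck rejecting the same split forever.
  ToySchedule CopyWithoutFork() const { return ToySchedule(/*seed=*/0); }

  // Fixed copy: fork the seed so the copy samples independently (this is why
  // the copy method can no longer be const: forking mutates the parent state).
  ToySchedule CopyWithFork() { return ToySchedule(ForkSeed()); }

  // Toy sampler standing in for how SamplePerfectTile consumes rand_state_.
  uint64_t NextSample() {
    rand_state_ = rand_state_ * 6364136223846793005ULL + 1442695040888963407ULL;
    return rand_state_ >> 33;
  }
};

int main() {
  ToySchedule sch(/*seed=*/42);
  ToySchedule bad = sch.CopyWithoutFork();  // always seeded 0, regardless of `sch`
  ToySchedule good = sch.CopyWithFork();    // independent but reproducible stream
  std::cout << "bad copy sample:  " << bad.NextSample() << "\n"
            << "good copy sample: " << good.NextSample() << std::endl;
  return 0;
}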
I did that in commit f9c655a, but it didn't fix the hang problem. I'll look into it more this week.
Fixed in #10806
Hi Masa, thanks for the quick fix. Just want to confirm with you: is the hang problem fixed after #10806?
Yes, it was fixed in #10806.
* [MetaSchedule] Upstream the leftover changes
* Update driver_api.cc
This PR upstreams the leftover changes from the MetaSchedule upstreaming process, for performance and stability.