[CUTLASS] Refactor cutlass kernel generation and selection #9800
Currently, when we enumerate cutlass kernels for profiling, we generate every epilogue variant of the kernel for each parameter config. See, for example:
tvm/python/tvm/contrib/cutlass/gen_gemm.py
Lines 67 to 106 in 1afcf36
After profiling, we select which variant of epilogue to use based on the pattern name:
tvm/python/tvm/contrib/cutlass/build.py
Lines 219 to 230 in 1afcf36
This approach does not scale once we introduce support for residual connection fusion, because there are too many different kinds of epilogues to enumerate for every parameter config.
The idea of this change is to split kernel generation into two steps:
(1) First, we generate all kernels without any epilogue. These kernels are used only for profiling.
(2) After profiling picks the best parameter configuration, we use that configuration to generate a single kernel with the required epilogue.
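The two steps above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `TileConfig`, `enumerate_configs`, `emit_kernel`, `select_best`, `generate_final`, and `fake_measure` are all hypothetical names, and the "kernel" is just a string standing in for generated CUTLASS source. The point is only the shape of the flow: with N parameter configs and E epilogues, profiling touches N epilogue-free kernels instead of N × E, and exactly one kernel with an epilogue is generated at the end.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Optional


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical stand-in for one CUTLASS parameter configuration."""
    tile_m: int
    tile_n: int
    tile_k: int


def enumerate_configs() -> List[TileConfig]:
    # Toy search space; the real set of configurations is much larger.
    return [TileConfig(m, n, k) for m, n, k in product((64, 128), (64, 128), (32,))]


def emit_kernel(config: TileConfig, epilogue: Optional[str] = None) -> str:
    # Placeholder for code generation: returns a kernel name, not real source.
    suffix = epilogue if epilogue is not None else "no_epilogue"
    return f"gemm_{config.tile_m}x{config.tile_n}x{config.tile_k}_{suffix}"


def select_best(configs: List[TileConfig], measure: Callable[[str], float]) -> TileConfig:
    # Step 1: profile only the epilogue-free kernels, one per configuration.
    timings = {cfg: measure(emit_kernel(cfg)) for cfg in configs}
    return min(timings, key=timings.get)


def generate_final(configs: List[TileConfig],
                   measure: Callable[[str], float],
                   epilogue: str) -> str:
    # Step 2: generate a single kernel with the required epilogue,
    # reusing the configuration chosen by profiling.
    best = select_best(configs, measure)
    return emit_kernel(best, epilogue)


def fake_measure(kernel_name: str) -> float:
    # Assumption for illustration: pretend the 128x64 tile is the fastest.
    return 1.0 if "128x64" in kernel_name else 2.0


if __name__ == "__main__":
    print(generate_final(enumerate_configs(), fake_measure, "bias_relu"))
```

In this sketch the epilogue name is only needed in the second step, so adding new epilogue kinds (for example for residual blocks) does not increase the number of kernels that have to be profiled.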
Overall, I believe this refactoring of kernel generation and selection has made things much cleaner and leaves us well prepared for residual block fusion.
cc @comaniac @Laurawly