[CUTLASS] Refactor cutlass kernel generation and selection #9800

masahi · 2021-12-23T06:00:13Z

Currently, when we enumerate cutlass kernels for profiling, we generate, for each parameter config, all variants of the kernel with different epilogues. See, for example:

tvm/python/tvm/contrib/cutlass/gen_gemm.py

Lines 67 to 106 in 1afcf36

    
    op = GemmOperation(
        tile_description.minimum_compute_capability,
        tile_description,
        A,
        B,
        C,
        element_epilogue,
        EpilogueFunctor.LinearCombination,
        swizzling_functor,
    )
    op_bias = GemmOperation(
        tile_description.minimum_compute_capability,
        tile_description,
        A,
        B,
        C,
        element_epilogue,
        EpilogueFunctor.LinearCombinationBias,
        swizzling_functor,
    )
    op_bias_relu = GemmOperation(
        tile_description.minimum_compute_capability,
        tile_description,
        A,
        B,
        C,
        element_epilogue,
        EpilogueFunctor.LinearCombinationRelu,
        swizzling_functor,
    )
    op_bias_gelu = GemmOperation(
        tile_description.minimum_compute_capability,
        tile_description,
        A,
        B,
        C,
        element_epilogue,
        EpilogueFunctor.LinearCombinationGelu,
        swizzling_functor,
    )

After profiling, we select which variant of epilogue to use based on the pattern name:

tvm/python/tvm/contrib/cutlass/build.py

Lines 219 to 230 in 1afcf36

    
    if op_type == "cutlass.conv2d":
        cutlass_op_def = out["opdef"]
    elif op_type == "cutlass.conv2d_bias":
        cutlass_op_def = out["opdef_bias"]
    elif op_type == "cutlass.conv2d_bias_relu":
        cutlass_op_def = out["opdef_bias_relu"]
    elif op_type == "cutlass.conv2d_bias_sigmoid":
        cutlass_op_def = out["opdef_bias_sigmoid"]
    elif op_type == "cutlass.conv2d_bias_silu":
        cutlass_op_def = out["opdef_bias_silu"]
    elif op_type == "cutlass.conv2d_bias_hardswish":
        cutlass_op_def = out["opdef_bias_hardswish"]
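
To make the coupling explicit: in the branches quoted above, each pattern name maps to a result key by swapping the `cutlass.conv2d` prefix for `opdef`. The snippet below is only a sketch of that observation; `opdef_key` is a hypothetical helper, not a function in TVM, and it only covers the conv2d names shown in the quoted lines.

```python
def opdef_key(op_type: str, prefix: str = "cutlass.conv2d") -> str:
    """Map a pattern name to the key of the matching pre-generated kernel.

    Hypothetical helper for illustration; it mirrors the if/elif chain
    quoted above and is not part of TVM.
    """
    if not op_type.startswith(prefix):
        raise ValueError(f"unexpected pattern name: {op_type}")
    # Keep the epilogue suffix (e.g. "_bias_relu") and swap the prefix.
    return "opdef" + op_type[len(prefix):]
```

Every new epilogue needs both a new pattern name and a new pre-generated entry in `out`, for every profiled parameter config.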

This approach does not scale once we introduce support for residual connection fusion, because there are too many different kinds of epilogues to generate for every parameter config.

The idea of this change is to split kernel generation into two steps:
(1) First, generate all kernels without any epilogue. These are used for profiling.
(2) After profiling determines the best parameter configuration, use that information to generate a single kernel with the required epilogue.
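
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `TileConfig`, `emit_kernel`, `generate_profiling_kernels`, `select_and_generate`, and the `"no_epilogue"` label are all hypothetical names, and the profiler is passed in as a plain callable that returns a time for a kernel definition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

# Placeholder label for "no fused epilogue"; not a real CUTLASS identifier.
PROFILING_EPILOGUE = "no_epilogue"


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical parameter configuration to be profiled."""

    name: str
    threadblock_shape: Tuple[int, int, int]


def emit_kernel(config: TileConfig, epilogue: str) -> str:
    """Return a placeholder kernel definition for one config and one epilogue."""
    return f"kernel<{config.name}, {epilogue}>"


def generate_profiling_kernels(configs: Sequence[TileConfig]) -> Dict[TileConfig, str]:
    """Step 1: emit exactly one epilogue-free kernel per parameter config."""
    return {cfg: emit_kernel(cfg, PROFILING_EPILOGUE) for cfg in configs}


def select_and_generate(
    configs: Sequence[TileConfig],
    measure: Callable[[str], float],
    epilogue: str,
) -> str:
    """Step 2: pick the fastest config, then emit one kernel with the epilogue."""
    candidates = generate_profiling_kernels(configs)
    # The profiler only ever sees the epilogue-free kernels.
    best = min(candidates, key=lambda cfg: measure(candidates[cfg]))
    return emit_kernel(best, epilogue)
```

With this split, the number of kernels built for profiling depends only on the number of parameter configs, and the epilogue is chosen once, after the best config is known.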

Overall, I believe this refactoring of kernel generation and selection has made things much cleaner and leaves us well prepared for residual block fusion.

cc @comaniac @Laurawly

masahi · 2021-12-30T22:40:35Z

@comaniac Can you take a look (no functional change, should be easy)? The cutlass side change to enable residual block fusion was merged yesterday in NVIDIA/cutlass#391, so I'm ready to send residual fusion support (with good speed up!)

Laurawly

LGTM

comaniac · 2021-12-30T23:20:22Z

Sorry I was on vacation. Thanks @masahi @Laurawly

Refactor cutlass kernel generation and selection

fd67595

masahi requested review from anijain2305, comaniac, icemelon, jroesch, junrushao, jwfromm, MarisaKirisame, mbrookhart, slyubomirsky, tqchen, vinx13, wweic, yzhliu, zhiics and ZihengJiang as code owners December 23, 2021 06:00

fill in TODO doc

87b36db

masahi force-pushed the cutlass-refactor branch from 096d8d8 to 87b36db Compare December 23, 2021 06:17

masahi added 3 commits December 23, 2021 15:47

fix no_beta_scaling values

d3b681d

Remove SimplifyExpr pass from the pipeline (makes DETR result nan)

ce9d52f

fix bad merge

6bb1c3b

Laurawly approved these changes Dec 30, 2021

comaniac merged commit 6d35f0b into apache:main Dec 30, 2021

ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022

[CUTLASS] Refactor cutlass kernel generation and selection (apache#9800)

c157ae0

ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022

[CUTLASS] Refactor cutlass kernel generation and selection (apache#9800)

ef6475b

driazati mentioned this pull request Jul 14, 2022

TVM v0.9.0.rc0 Release Candidate Notes #12102

Closed
