[SLM] Fuse Add and RMSNorm #1627

jinhongyii · 2024-01-18T19:59:04Z

This PR adds a fusion pass that fuses a binary "add" with the "RMSNorm" that follows it.
This is a temporary workaround that lets us fuse "add" into RMSNorm
when it is not fused into the GEMM epilogue.
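
As a rough illustration of what such a pass does, here is a minimal sketch over a toy op list. It is not the pass in this PR and does not use any TVM API: the `Op` record, the `fuse_add_rmsnorm` helper and the fused op name `fuse_add_norm` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Toy op record (hypothetical, not a TVM data structure)."""
    name: str
    inputs: tuple
    outputs: tuple


def fuse_add_rmsnorm(ops):
    """Replace each `add` directly followed by an `rms_norm` that consumes it
    with a single fused op. The fused op keeps BOTH outputs, because later
    ops may still read the result of the add."""
    fused = []
    i = 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if (
            op.name == "add"
            and nxt is not None
            and nxt.name == "rms_norm"
            and nxt.inputs[0] == op.outputs[0]
        ):
            # Inputs: both add operands plus the remaining rms_norm inputs.
            fused.append(
                Op("fuse_add_norm", op.inputs + nxt.inputs[1:], op.outputs + nxt.outputs)
            )
            i += 2
        else:
            fused.append(op)
            i += 1
    return fused


ops = [
    Op("matmul", ("x", "w"), ("h",)),
    Op("add", ("h", "residual"), ("s",)),
    Op("rms_norm", ("s", "weight"), ("y",)),
]
for op in fuse_add_rmsnorm(ops):
    print(op.name, op.inputs, op.outputs)
```

The matmul is left untouched here, which mirrors the case described above where the add was not absorbed into the GEMM epilogue.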

jinhongyii · 2024-01-18T19:59:11Z

cc: @MasterJH5574

junrushao · 2024-01-18T20:40:37Z

python/mlc_chat/compiler_pass/fuse_add_norm.py

+
+
+def get_add_rmsnorm_tir(hidden_size: int, is_decode=True):
+    @T.prim_func(private=True)


I suppose the fused add-rmsnorm operator could be expressed by TE and scheduled by Dlight easily

No, it's not that easy. The tricky part is that the add_rmsnorm function has 2 outputs, the result of add and the result of rms_norm, which makes compute_inline, compute_at and reverse_compute_at all fail in this case. I had to write scheduled TIR to work around it.
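
To make the two-output shape concrete, here is a plain-Python reference of what a fused add + RMSNorm computes for a single hidden vector. It is only a sketch of the math, not the scheduled TIR from this PR: the function name, the `eps` default and the order of the returned values are assumptions made for illustration.

```python
import math


def add_rmsnorm_reference(x, residual, weight, eps=1e-5):
    """Reference semantics for fused add + RMSNorm on one hidden vector.

    Returns (added, normed). The name, the eps default and the output
    order are illustrative assumptions, not taken from the PR.
    """
    # Output 1: the elementwise sum, which must stay visible to later ops.
    added = [a + b for a, b in zip(x, residual)]
    # Output 2: RMSNorm of that sum, scaled elementwise by the weight.
    mean_sq = sum(v * v for v in added) / len(added)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    normed = [v * inv_rms * w for v, w in zip(added, weight)]
    return added, normed


added, normed = add_rmsnorm_reference([1.0, 2.0], [2.0, 2.0], [1.0, 1.0], eps=0.0)
print(added)
print(normed)
```

Because the sum is itself an output rather than a temporary, it cannot simply be inlined into the normalization, which is consistent with the scheduling difficulty described above.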

Got it. Thanks for the elaboration!

What's the implication for performance? If the manual schedule can't generalize to all cases, we can try supporting this pattern in cuBLAS fusion.

I heard from @MasterJH5574 that cuBLAS fusion currently cannot fuse matmul and divide_add, so I created this small pass to unblock our effort on mlc serve perf profiling. In the long term this can surely be replaced by better cuBLAS fusion, but it doesn't hurt to keep it as a fallback or as a baseline to compare against.

junrushao · 2024-01-21T14:34:39Z

Whenever we use TIR, I know it is auto-generated, but let's make sure it is human-readable so that it sets a positive example demonstrating “TIR is actually great”.

jinhongyii · 2024-01-22T03:42:19Z

Thanks @junrushao for pointing this out. I accidentally formatted this TIR with black; I will regenerate it.

junrushao · 2024-01-25T07:38:24Z

I think the PR is ready to merge in terms of code quality and correctness. I believe 1) @vinx13 has some further comments on its performance implications and generalizability, and 2) I have some concerns about the prefill kernel, which assumes batch_size = 1. Anyway, it's a good start.

junrushao reviewed Jan 18, 2024

tqchen assigned junrushao Jan 19, 2024

junrushao closed this Jan 21, 2024

junrushao reopened this Jan 21, 2024

vinx13 reviewed Jan 25, 2024

junrushao force-pushed the add_norm_fuse branch 6 times, most recently from f365ae4 to c09b4a2 Compare January 25, 2024 07:35

junrushao changed the title ~~[SLIM] Fuse Add and RMSNorm~~ [SLM] Fuse Add and RMSNorm Jan 25, 2024

squash

e0e078c

jinhongyii force-pushed the add_norm_fuse branch from 24e1527 to e0e078c Compare January 25, 2024 20:05

vinx13 approved these changes Jan 25, 2024

jit misc

7274713

junrushao force-pushed the add_norm_fuse branch from 56692f7 to 7274713 Compare January 25, 2024 20:52

MasterJH5574 approved these changes Jan 25, 2024

junrushao merged commit b01b06c into mlc-ai:main Jan 25, 2024
1 check passed

CharlieFRuan mentioned this pull request Feb 3, 2024

Support paged kv cache for single batch chat module #1651

Merged
