-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Refactor Cutlass c3x #10049
[Kernel] Refactor Cutlass c3x #10049
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
@tlrmchlsmth @ProExpertProg @LucasWilkinson PTAL. Thanks! |
LGTM (just FYI may conflict with #9855) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
7dbe3b3
to
4f44aac
Compare
This pull request has merge conflicts that must be resolved before it can be |
4f44aac
to
16879db
Compare
@tlrmchlsmth re-requesting review as the PR is now rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! (Likely wait for #10995)
@@ -1,384 +1,22 @@ | |||
// clang-format will break include orders | |||
// switch off clang format as the include statement indentation is inconsistent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually do you need to turn off clang-format here? The reason for turning it off is that CUTLASS headers need to be included in a specific order but it looks like that's not the case following the refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using clang-format in this block, turns,
#include <cudaTypedefs.h>
#if defined CUDA_VERSION && CUDA_VERSION >= 12000
#include "scaled_mm_c3x_sm90_fp8_dispatch.cuh"
#include "scaled_mm_c3x_sm90_int8_dispatch.cuh"
#include "cutlass_extensions/epilogue/scaled_mm_epilogues_c3x.hpp"
using namespace vllm;
into
#include <cudaTypedefs.h>
#if defined CUDA_VERSION && CUDA_VERSION >= 12000
#include "scaled_mm_c3x_sm90_fp8_dispatch.cuh"
#include "scaled_mm_c3x_sm90_int8_dispatch.cuh"
#include "cutlass_extensions/epilogue/scaled_mm_epilogues_c3x.hpp"
using namespace vllm;
the #if
seems trigger inconsistent indenting. I switched off clang-format in this block to avoid that.
I moved the original clang-format toggle to scaled_mm_c3x.cuh
.
// clang-format will break include orders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had second thoughts about this and removed the clang-format
block. It is probably better to stick to the convention.
55d9927
to
e5f324b
Compare
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
e5f324b
to
e033b41
Compare
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Refactor Cutlass c3x kernels for better maintainability and easier experimentation.