[Bugfix] Allow vllm to still work if triton is not installed. #6786
Merged
Changes from 13 commits
Commits (18)
ea4de02 Allow vllm to still work if triton is not installed. (tdoublep)
cb939d4 Merge branch 'main' into fix-triton-import (tdoublep)
5e5f6ec Reduce diff (tdoublep)
e80fc34 Fix in custom_cache_manager. (tdoublep)
e136740 Rework without the mock (tdoublep)
f778898 Separate FP8 FusedMoE into separate module (tdoublep)
8777cf2 Fix conflict (tdoublep)
0039ba3 Fix type checking (tdoublep)
b35adce Remove additional redundant Triton deps. (tdoublep)
3f94f68 Move get_num_triton_sampler_splits into triton_utils (tdoublep)
d1a4c51 fmt (tdoublep)
e1cff0a Fix error in sampler test (tdoublep)
9816ac7 put fused_moe back in fp8.py (tdoublep)
b9bb0b4 Improved handling of ruff in __init__.py (tdoublep)
936290d Fix small bug introduced. (tdoublep)
1e41679 Resolve conflict (tdoublep)
c753e00 Fix new (minor) conflict. (tdoublep)
0948708 Merge branch 'main' into fix-triton-import (tdoublep)
@@ -1,14 +1,22 @@
-from vllm.model_executor.layers.fused_moe.fused_moe import (
-    fused_experts, fused_moe, fused_topk, get_config_file_name, grouped_topk)
 from vllm.model_executor.layers.fused_moe.layer import (FusedMoE,
                                                         FusedMoEMethodBase)
+from vllm.triton_utils import HAS_TRITON
+
+if HAS_TRITON:
+    from vllm.model_executor.layers.fused_moe.fused_moe import (
+        fused_experts, fused_moe, fused_topk, get_config_file_name,
+        grouped_topk)
 
 __all__ = [
+    "FusedMoE",
+    "FusedMoEMethodBase",
     "fused_moe",
     "fused_topk",
     "fused_experts",
     "get_config_file_name",
     "grouped_topk",
-    "FusedMoE",
-    "FusedMoEMethodBase",
 ]
+
+if not HAS_TRITON:
+    # need to do it like this other ruff complains
+    __all__ = __all__[:2]
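The hunk above moves "FusedMoE" and "FusedMoEMethodBase" to the front of `__all__`, which is what lets the `__all__[:2]` slice keep exactly the two names that do not depend on Triton. A minimal sketch of that behaviour follows; the function `exported_names` and its `has_triton` parameter are hypothetical stand-ins for the module-level code and the `HAS_TRITON` flag, and are not part of the PR.

```python
def exported_names(has_triton: bool) -> list[str]:
    """Hypothetical stand-in for the module-level __all__ logic in the hunk."""
    names = [
        # Triton-independent names come first so a slice can keep them.
        "FusedMoE",
        "FusedMoEMethodBase",
        # The remaining names are only importable when Triton is present.
        "fused_moe",
        "fused_topk",
        "fused_experts",
        "get_config_file_name",
        "grouped_topk",
    ]
    if not has_triton:
        # Same trimming as `__all__ = __all__[:2]` in the diff.
        names = names[:2]
    return names


print(exported_names(has_triton=False))
print(exported_names(has_triton=True))
```

The slice depends on ordering: if a Triton-independent name were appended after the Triton-dependent ones, it would be dropped silently.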
@@ -1,6 +1,11 @@
-from vllm.triton_utils.custom_cache_manager import (
-    maybe_set_triton_cache_manager)
+from vllm.triton_utils.importing import HAS_TRITON
 
-__all__ = [
-    "maybe_set_triton_cache_manager",
-]
+if HAS_TRITON:
+    from vllm.triton_utils.custom_cache_manager import (
+        maybe_set_triton_cache_manager)
+
+__all__ = ["HAS_TRITON", "maybe_set_triton_cache_manager"]
+
+if not HAS_TRITON:
+    # need to do this afterwards due to ruff complaining
+    __all__.pop()
@@ -0,0 +1,11 @@
+from importlib.util import find_spec
+
+from vllm.logger import init_logger
+
+logger = init_logger(__name__)
+
+HAS_TRITON = find_spec("triton") is not None
+
+if not HAS_TRITON:
+    logger.info("Triton not installed; certain GPU-related functions"
+                " will be not be available.")
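The detection in this new file relies on `importlib.util.find_spec`, which locates a top-level module without importing it. A small standalone sketch of the same check is shown below; the helper `has_module` and the two module names are illustrative assumptions chosen so the example runs in any environment, and they do not appear in the PR.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located (it is not imported)."""
    return find_spec(name) is not None


# "json" is part of the standard library, so it is always found.
HAS_JSON = has_module("json")

# A deliberately made-up name that should not be installed anywhere.
HAS_MISSING = has_module("surely_not_an_installed_module_xyz")

if not HAS_MISSING:
    print("optional dependency not found; related features are disabled")
```

Because the module is only located and never executed, a package that is present but fails at import time would still be reported as available by this kind of check.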
@@ -0,0 +1,13 @@
+import math
+
+# This is a hardcoded limit in Triton (max block size).
+MAX_TRITON_N_COLS = 131072
+
+
+def get_num_triton_sampler_splits(n_cols: int) -> int:
+    """Get the number of splits to use for Triton sampling.
+
+    Triton has a limit on the number of columns it can handle, so we need to
+    split the tensor and call the kernel multiple times if it's too large.
+    """
+    return math.ceil(n_cols / MAX_TRITON_N_COLS)
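To make the arithmetic concrete, the sketch below copies the constant and function from the hunk above and adds one possible way to turn the split count into column ranges. The helper `column_ranges` is hypothetical and only illustrates the idea of calling a kernel once per chunk; it is not taken from the PR and may differ from how vLLM actually partitions the tensor.

```python
import math

# Copied from the diff: hardcoded limit in Triton (max block size).
MAX_TRITON_N_COLS = 131072


def get_num_triton_sampler_splits(n_cols: int) -> int:
    """Number of kernel calls needed to cover n_cols columns."""
    return math.ceil(n_cols / MAX_TRITON_N_COLS)


def column_ranges(n_cols: int) -> list[tuple[int, int]]:
    """Hypothetical helper: half-open (start, end) column ranges, one per split."""
    n_splits = get_num_triton_sampler_splits(n_cols)
    return [
        (i * MAX_TRITON_N_COLS, min((i + 1) * MAX_TRITON_N_COLS, n_cols))
        for i in range(n_splits)
    ]


# A vocabulary at the limit fits in one call; one extra column forces a second.
print(get_num_triton_sampler_splits(131072))
print(get_num_triton_sampler_splits(131073))
print(column_ranges(131073))
```

Since the function has no Triton import of its own, moving it into `triton_utils` lets callers compute the split count even when Triton is not installed.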
Off topic: @Yard1 I feel this should be a general function for all Triton kernels instead of just the sampler. Do you think it makes sense to rename it to get_num_triton_input_chunks or something similar, and use it here as well?
I think that would make sense
Just to clarify: is this something you'd like to have addressed in this PR?
No, it's not necessary. We can merge this PR first.