[Setup] Enable TORCH_CUDA_ARCH_LIST for selecting target GPUs (#1074)
Conversation
@zhuohan123 This PR is also ready for review.
When I build this branch on an A10, it just stops after the initial request (log output shortened). If I use the other branch, it works; all I do to test this is the same in both cases.

Is there a way to increase logging and find out what is happening? In the case where it doesn't work, GPU memory stays at around 18 GB; in the other case it goes up to 23 GB, so I don't think much is happening. There is no error coming back from the server to the client sending the request; it just stays open.

UPDATE: Launching the default api server via […]
@v1nc3nt27 What is the other branch?
@WoosukKwon sorry, it was this branch: #1032
```python
# based on the NVCC CUDA version.
compute_capabilities = set(SUPPORTED_ARCHS)
if nvcc_cuda_version < Version("11.1"):
    compute_capabilities.remove("8.6")
```
`discard` does not raise an error if the element is not present in the set. Similar for the `remove` below.
compute_capabilities.remove("8.6") | |
compute_capabilities.discard("8.6") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we might need to remove the `*+PTX` entries.
L80 and below is executed when `compute_capabilities` is empty. In this case, we add all `SUPPORTED_ARCHS` and remove some of them based on the user's CUDA version. So I think using `remove` is more appropriate than `discard`, and we don't need to remove `*+PTX` because `SUPPORTED_ARCHS` does not have it.
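(A rough sketch of the flow described above, paraphrased from the discussion rather than copied from the PR; the `SUPPORTED_ARCHS` contents and the two input variables are illustrative placeholders:)

```python
from packaging.version import Version

# Illustrative values; the actual list lives in vLLM's setup.py.
SUPPORTED_ARCHS = {"7.0", "7.5", "8.0", "8.6", "8.9", "9.0"}

compute_capabilities = set()          # empty: user gave no explicit targets
nvcc_cuda_version = Version("11.0")   # hypothetical installed NVCC version

if not compute_capabilities:
    # No user-specified targets: default to everything we support, then
    # prune what this NVCC cannot compile for.
    compute_capabilities = set(SUPPORTED_ARCHS)
    if nvcc_cuda_version < Version("11.1"):
        # remove() is safe here: the set was just filled from
        # SUPPORTED_ARCHS, so "8.6" is guaranteed to be present.
        compute_capabilities.remove("8.6")
```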
```python
# CUDA 11.8 is required to generate the code targeting compute capability 8.9.
# However, GPUs with compute capability 8.9 can also run the code generated by
# the previous versions of CUDA 11 and targeting compute capability 8.0.
# Therefore, if CUDA 11.8 is not available, we target compute capability 8.0
# instead of 8.9.
```
Actually, should we print a warning for this comment?
Sounds good! Added.
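(A minimal sketch of what such a warning could look like; this is an assumption about the shape of the added code, not a copy of it, and the two input variables are hypothetical:)

```python
import warnings
from packaging.version import Version

# Hypothetical inputs for the sketch.
nvcc_cuda_version = Version("11.4")
compute_capabilities = {"8.0", "8.6", "8.9"}

if nvcc_cuda_version < Version("11.8") and "8.9" in compute_capabilities:
    # 8.9 GPUs can still run code built for 8.0, so downgrade the target
    # rather than failing the build, and tell the user about it.
    warnings.warn(
        "CUDA 11.8 or higher is required to target compute capability 8.9. "
        "Targeting compute capability 8.0 instead, which 8.9 GPUs can run.")
    compute_capabilities.remove("8.9")
    compute_capabilities.add("8.0")
```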
@zhuohan123 Addressed your comments. PTAL.
LGTM! Thanks for the contribution!
Fixes #1070
The `TORCH_CUDA_ARCH_LIST` env variable is a standard place to specify the target GPUs one wants to build a PyTorch project for. This PR enables using the variable in our `setup.py`. This will be especially useful for those who build vLLM images for specific GPUs.
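(Illustrative usage, following the standard PyTorch convention; the parsing sketch below is an assumption about how a `setup.py` might honor the variable, not the PR's exact code:)

```python
import os

# Build invocation (shell): TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" pip install .
# Inside setup.py, the variable might be honored roughly like this:
env_arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST")
if env_arch_list:
    # PyTorch accepts space- or semicolon-separated entries such as
    # "8.0" or "8.6+PTX"; normalize the separators and split them.
    compute_capabilities = set(env_arch_list.replace(";", " ").split())
```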