[Bugfix][CI/Build] Fix CUDA 11.8 Build #9386

LucasWilkinson · 2024-10-15T18:44:32Z

Don't build 9.0a for scaled_mm_c2x, since it's outside of the CUDA 12.0 guard and wouldn't help perf much anyway.

The issue: on CUDA 11.8, when building for 9.0 we don't build scaled_mm_c3x, so we instead tried to build scaled_mm_c2x for all versions, i.e. "7.5;8.0;8.6;8.9;9.0;9.0a". This is incorrect, since 9.0a isn't supported by CUDA 11.8. We can simply drop 9.0a for scaled_mm_c2x, because scaled_mm_c2x doesn't take advantage of the 9.0a features anyway.
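To illustrate the idea, here is a minimal Python sketch of dropping 9.0a from the arch list used for scaled_mm_c2x. It is not the actual CMake change in this PR; the function name `c2x_archs` and the rule "drop any arch ending in `a`" are assumptions made for illustration only.

```python
# Illustrative sketch only: `c2x_archs` is a hypothetical helper, not part of
# vLLM's build system, which does this filtering in CMake.

def c2x_archs(requested):
    """Return the requested archs that scaled_mm_c2x should be built for.

    Arch-specific variants such as "9.0a" are dropped: per the PR description,
    scaled_mm_c2x does not use 9.0a features, and CUDA 11.8 cannot target 9.0a.
    """
    return [arch for arch in requested if not arch.endswith("a")]


requested = ["7.5", "8.0", "8.6", "8.9", "9.0", "9.0a"]
print(";".join(c2x_archs(requested)))
```

With a filter like this, the c2x arch list is the same regardless of CUDA version, so a CUDA 11.8 toolchain is never asked to compile for 9.0a.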

(This also fixes the c3x error message, which falsely reported that there were no compatible arches when building on CUDA 11.8.)
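The error-message fix amounts to telling two cases apart: the CUDA toolkit is too old for scaled_mm_c3x, or the toolkit is fine but no compatible arch was requested. A rough Python sketch of that distinction follows. The function name `c3x_skip_reason`, the message wording, and the assumption that 9.0a is the only arch c3x needs are all hypothetical and not taken from the PR's diff.

```python
# Illustrative sketch only: names and messages are hypothetical, and the real
# logic lives in vLLM's CMake files.

def c3x_skip_reason(cuda_version, archs):
    """Return why scaled_mm_c3x would be skipped, or None if it can be built.

    cuda_version is a (major, minor) tuple; archs is a list such as ["9.0a"].
    """
    # Check the toolkit version first, so an old CUDA is not misreported as
    # "no compatible arches".
    if cuda_version < (12, 0):
        return "CUDA %d.%d is older than 12.0, which scaled_mm_c3x requires" % cuda_version
    if "9.0a" not in archs:
        return "no compatible architectures (9.0a) in the requested arch list"
    return None


print(c3x_skip_reason((11, 8), ["9.0"]))
print(c3x_skip_reason((12, 1), ["8.0", "8.6"]))
```

Checking the version before the arch list is the design choice that matters here: on CUDA 11.8 the arch check would otherwise fail for a misleading reason.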

github-actions · 2024-10-15T18:44:48Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add ready label to the PR
- Enable auto-merge.

🚀

simon-mo · 2024-10-15T20:31:21Z

Thanks @LucasWilkinson, is this ready for review?

LucasWilkinson · 2024-10-15T20:33:00Z

> Thanks @LucasWilkinson, is this ready for review?

Basically just waiting for my docker build tests to finish to confirm the fix, they are slow haha

LucasWilkinson · 2024-10-15T20:58:13Z

Confirmed, this builds (i.e. this PR):

FROM pytorch/pytorch:2.4.0-cuda11.8-cudnn9-devel AS build

ARG torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0'
ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}

RUN apt update && apt install gcc g++ git -y && apt clean && rm -rf /var/lib/apt/lists/*

ENV PATH=/workspace-lib:/workspace-lib/bin:$PATH
ENV PYTHONUSERBASE=/workspace-lib

RUN pip install git+https://github.com/neuralmagic/vllm.git@5a7b00e7a6377ca7971de3ca762583a9153f4a55 --no-cache-dir --user -v

and this fails (i.e. main):

FROM pytorch/pytorch:2.4.0-cuda11.8-cudnn9-devel AS build

ARG torch_cuda_arch_list='7.0 7.5 8.0 8.6 8.9 9.0'
ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}

RUN apt update && apt install gcc g++ git -y && apt clean && rm -rf /var/lib/apt/lists/*

ENV PATH=/workspace-lib:/workspace-lib/bin:$PATH
ENV PYTHONUSERBASE=/workspace-lib

RUN pip install git+https://github.com/vllm-project/vllm.git@e9d517f27673ec8736c026f2311d3c250d5f9061 --no-cache-dir --user -v

tlrmchlsmth

LGTM, thanks for the fix!

mgoin

Thanks for the quick fix!

2b8e268: Dont build 9.0a for c2x since its outside of cuda 12.0 guard and won't help perf that much that anyways

5a7b00e: fix error message

LucasWilkinson force-pushed the lwilkinson/fix-cuda-118-build branch from f98ce3d to 5a7b00e Compare October 15, 2024 19:10

LucasWilkinson marked this pull request as ready for review October 15, 2024 20:57

LucasWilkinson requested review from tlrmchlsmth and WoosukKwon as code owners October 15, 2024 20:57

tlrmchlsmth approved these changes Oct 15, 2024

tlrmchlsmth added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 15, 2024

tlrmchlsmth enabled auto-merge (squash) October 15, 2024 21:05

mgoin approved these changes Oct 15, 2024

tlrmchlsmth merged commit 717a5f8 into vllm-project:main Oct 16, 2024
89 checks passed
