
Build failure due to CUDA version mismatch #129

Closed
WoosukKwon opened this issue May 26, 2023 · 25 comments
Labels
installation Installation problems


@WoosukKwon
Collaborator

I failed to build the system with the latest NVIDIA PyTorch Docker image. The reason is that the PyTorch installed by pip is built with CUDA 11.7, while the container uses CUDA 12.1.

RuntimeError:
The detected CUDA version (12.1) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.
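The two numbers in this error come from different places: the CUDA toolkit on `PATH` and the build metadata of the installed torch wheel. A minimal sketch of the comparison (the two variables are placeholders to fill in from the commands shown in the comments):

```shell
# Where each side of the comparison comes from:
#   detected: the toolkit's compiler  -> nvcc --version
#   compiled: the wheel's build CUDA  -> python -c "import torch; print(torch.version.cuda)"
detected="12.1"
compiled="11.7"

if [ "$detected" != "$compiled" ]; then
  echo "mismatch: toolkit CUDA $detected vs PyTorch CUDA $compiled"
fi
```

The build fails whenever these two disagree, which is exactly what `torch.utils.cpp_extension` checks before compiling.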
@WoosukKwon WoosukKwon self-assigned this May 26, 2023
@WoosukKwon WoosukKwon added the installation Installation problems label May 26, 2023
@Joejoequ

Joejoequ commented Jun 29, 2023

Same issue here. It looks like it did not use CUDA 11.8 from the conda environment.
CUDA 11.8, Python 3.8.16, NVIDIA A100 80G, Ubuntu

File "/tmp/pip-build-env-_5k66uxz/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 387, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (12.0) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.

(vllm) x@x:~/xx$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

@WoosukKwon
Collaborator Author

@Joejoequ Thanks for reporting it! I think in your case, the problem can be easily solved by installing the CUDA 11.8 build of PyTorch:

pip3 install torch --index-url https://download.pytorch.org/whl/cu118
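The wheel indexes for other CUDA versions follow the same pattern (drop the dot, prefix `cu`). A small helper, hypothetical but consistent with the URL above, maps a version string to its index:

```shell
# Map a CUDA version like "11.8" to the matching PyTorch wheel index URL
cuda_index_url() {
  ver=$(printf '%s' "$1" | tr -d '.')
  echo "https://download.pytorch.org/whl/cu${ver}"
}

# e.g.  pip3 install torch --index-url "$(cuda_index_url 11.8)"
```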

@Joejoequ

@WoosukKwon Thanks for the quick response! But the issue still exists after using the command above. It is weird that the detected version is 12.0, since the cudatoolkit installed in the env is 11.8. It looks like the build is picking up a CUDA installation outside the env.

@Joejoequ

Joejoequ commented Jul 6, 2023

> @WoosukKwon Thanks for the quick response! But the issue still exists after using the command above. It is weird that the detected version is 12.0, since the cudatoolkit installed in the env is 11.8. It looks like the build is picking up a CUDA installation outside the env.

Finally, I installed it successfully by switching the module environment with `module load` on the Linux server.

@mosheduminer

I have the same problem. PyTorch built with CUDA 11.8 is installed, yet I get this error when installing:

The detected CUDA version (12.1) mismatches the version that was used to compile
      PyTorch (11.8). Please make sure to use the same CUDA versions.

@DavidPeleg6

I'm also having the same issue:
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

and I have installed the nightly version of PyTorch with CUDA 12.1 support.

@antonpolishko

antonpolishko commented Aug 27, 2023

nvcr.io/nvidia/pytorch:22.12-py3 is the last image with CUDA 11.8 according to the compatibility matrix; later images switched to CUDA 12+.

@Ikkyu321

Ikkyu321 commented Nov 2, 2023

> @WoosukKwon Thanks for the quick response! But the issue still exists after using the command above. It is weird that the detected version is 12.0, since the cudatoolkit installed in the env is 11.8. It looks like the build is picking up a CUDA installation outside the env.
>
> Finally, I installed it successfully by switching the module environment with `module load` on the Linux server.

@Joejoequ I got the same problem. Can you show me how you solved it?

@valentin-fngr

> I have the same problem. PyTorch built with CUDA 11.8 is installed, yet I get this error when installing:
>
> The detected CUDA version (12.1) mismatches the version that was used to compile
> PyTorch (11.8). Please make sure to use the same CUDA versions.

Option 1: You have CUDA 12.1, so simply uninstall your current PyTorch binaries and then reinstall with:

pip3 install torch torchvision torchaudio

Option 2: If you do not want to use the CUDA 12.1 you have installed, you can use another CUDA version (11.7, 11.8, ...).
First uninstall your current CUDA (https://stackoverflow.com/a/56827564), then pick the version you want from the CUDA download site.

@xunfeng1980

Same problem:
The detected CUDA version (12.3) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.

@Kkkassini

Is it possible to use CUDA 11.7? There are other services that require 11.7.

@Lumingous

I'm having the same problem. I've reinstalled PyTorch with CUDA 11.8 support, but I don't know why it still shows this error.

@jaesuny

jaesuny commented Nov 10, 2023

Removing pyproject.toml may be a solution.
In my case, the build system was using the PyTorch version pinned in pyproject.toml rather than the PyTorch already installed.
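The reason this helps is pip's build isolation: with a pyproject.toml present, pip builds in a temporary environment and installs its own PyTorch there, which may be compiled against a different CUDA than the one in your environment. An alternative that keeps the file (a sketch, assuming a reasonably recent pip) is to disable isolation so the build reuses your existing PyTorch:

```shell
# Build against the already-installed PyTorch instead of letting pip
# fetch a fresh (possibly CUDA-mismatched) one into an isolated build env:
#
#   pip install -e . --no-build-isolation
#
# Sanity check that this pip supports the flag:
pip install --help | grep -- --no-build-isolation
```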

@xunfeng1980

> Removing pyproject.toml may be a solution. In my case, the build system was using the PyTorch version pinned in pyproject.toml rather than the PyTorch already installed.

Thanks, it worked for me.

@StevenZ-G

RuntimeError:
The detected CUDA version (12.3) mismatches the version that was used to compile
PyTorch (11.6). Please make sure to use the same CUDA versions.

@quanhephia

> RuntimeError: The detected CUDA version (12.3) mismatches the version that was used to compile PyTorch (11.6). Please make sure to use the same CUDA versions.

I resolved this error by downgrading vllm from 0.2.2 to 0.2.1.

@Mruduldhawley

> @WoosukKwon Thanks for the quick response! But the issue still exists after using the command above. It is weird that the detected version is 12.0, since the cudatoolkit installed in the env is 11.8. It looks like the build is picking up a CUDA installation outside the env.
>
> Finally, I installed it successfully by switching the module environment with `module load` on the Linux server.

Please elaborate.

@0-hero
Contributor

0-hero commented Dec 10, 2023

1. Changed the env to run with CUDA 11.7
2. Installed vllm with `pip install vllm` (without this step I face another error)
3. Then installed from source with `pip install -e .`

This solved it for me

@DuAooo

DuAooo commented Dec 25, 2023

For my problem, I found that the build used /usr/bin/nvcc, which reports a different version.
The correct nvcc is in /usr/local/cuda/bin,
so I deleted /usr/bin/nvcc, and now my build works fine.
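Deleting /usr/bin/nvcc works, but a package upgrade can bring it back. A less destructive alternative (a sketch, assuming the toolkit lives under /usr/local/cuda) is to put that toolkit first on PATH and set CUDA_HOME, which torch.utils.cpp_extension also consults when locating CUDA:

```shell
# Make the toolkit under /usr/local/cuda win over /usr/bin/nvcc
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
```

Putting these in the shell profile keeps the fix across sessions.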

@4daJKong

> Is it possible to use CUDA 11.7? There are other services that require 11.7.

Is it possible to use vllm with CUDA 11.7?
I tried `pip install vllm` in a conda environment, but it still fails to install.

@DrAlexLiu

I solved this by running:
conda install nvidia/label/cuda-11.8.0::cuda-nvcc

It looks like your conda env has no nvcc installed, so it calls your system nvcc, which is not 11.8 (or whichever CUDA version you installed).
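To confirm which toolkit a build will actually pick up, it helps to parse the release number out of whatever `nvcc` resolves first on PATH. A small helper (hypothetical; the sed pattern assumes the standard `nvcc --version` banner shown earlier in this thread):

```shell
# Extract e.g. "11.8" from the standard nvcc banner line:
#   "Cuda compilation tools, release 11.8, V11.8.89"
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage:  nvcc --version | nvcc_release
```

If the printed version differs from `python -c "import torch; print(torch.version.cuda)"`, the build will hit this RuntimeError.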

@Daishijun

> Removing pyproject.toml may be a solution. In my case, the build system was using the PyTorch version pinned in pyproject.toml rather than the PyTorch already installed.

Thanks! You are a true hero.

@hmellor
Collaborator

hmellor commented Apr 4, 2024

@WoosukKwon is this resolved now?

@hmellor
Collaborator

hmellor commented Apr 20, 2024

Closing because the build system has changed dramatically since this was opened.

@hmellor hmellor closed this as completed Apr 20, 2024
yukavio pushed a commit to yukavio/vllm that referenced this issue Jul 3, 2024
@oJro

oJro commented Jul 25, 2024

> Removing pyproject.toml may be a solution. In my case, the build system was using the PyTorch version pinned in pyproject.toml rather than the PyTorch already installed.

How do I remove pyproject.toml?

mht-sharma added a commit to mht-sharma/vllm that referenced this issue Aug 15, 2024
Xaenalt pushed a commit to Xaenalt/vllm that referenced this issue Aug 15, 2024
mht-sharma added a commit to mht-sharma/vllm that referenced this issue Aug 21, 2024