How to fix install with CUDA 12.1, Python 3.9, flash-attn 2.3.2 #598

Closed
batman-do opened this issue Oct 11, 2023 · 17 comments
@batman-do

batman-do commented Oct 11, 2023

[screenshot of the install error]

Can you suggest a solution to fix this error? Thank you.

@batman-do batman-do reopened this Oct 11, 2023
@tridao
Contributor

tridao commented Oct 11, 2023

Can you check if you can download that wheel manually (e.g. with wget)?
I haven't seen the error "invalid cross-device link". Do you have write permission to /tmp?
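
For example, both checks can be done with a short Python snippet; the wheel URL below is only a placeholder, so substitute the exact URL that pip/setup.py prints:

```python
# Rough diagnostic sketch (standard library only):
# 1) confirm the temp directory is writable, 2) fetch the prebuilt wheel by hand.
import os
import tempfile
import urllib.request

tmpdir = tempfile.gettempdir()  # honours $TMPDIR if it is set
print("temp dir:", tmpdir, "writable:", os.access(tmpdir, os.W_OK))

# Placeholder URL: copy the exact wheel URL from the pip output instead.
wheel_url = "https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.2/<wheel-file>.whl"
urllib.request.urlretrieve(wheel_url, os.path.join(tmpdir, "flash_attn.whl"))
print("manual download succeeded")
```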

@batman-do
Author

> Can you check if you can download that wheel manually (e.g. with wget)? I haven't seen the error "invalid cross-device link". Do you have write permission to /tmp?

I exported ~/tmp but ran into this error:

[screenshot of the error]

How do I fix that?

@tridao
Contributor

tridao commented Oct 12, 2023

Do you have write permission to /home/dodx/tmp?
I haven't seen this error but that's what I'm guessing. The setup script downloads the wheel and copies to $TMP, and it's running into a problem at the copy step.
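
For context, "invalid cross-device link" (EXDEV) is what rename(2) returns when the source and destination are on different filesystems; a quick check like this (paths are only examples) shows whether that applies here:

```python
# Hedged diagnostic: os.rename() cannot move a file across filesystems and fails
# with OSError [Errno 18] "Invalid cross-device link". Comparing st_dev of the two
# directories shows whether they live on the same filesystem.
import os
import tempfile

tmp = tempfile.gettempdir()          # where the wheel gets copied ($TMPDIR)
target = os.path.expanduser("~")     # example directory that may be on another filesystem

print(tmp, "device:", os.stat(tmp).st_dev)
print(target, "device:", os.stat(target).st_dev)
# Different device IDs mean os.rename() between them will fail,
# while shutil.move() (copy + delete) would still work.
```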

@batman-do
Author

> Do you have write permission to /home/dodx/tmp? I haven't seen this error but that's what I'm guessing. The setup script downloads the wheel and copies to $TMP, and it's running into a problem at the copy step.

Yes, I just fixed it and I can run it now, but I have a question:

Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm

Since I use an RTX 3090, I don't need to use layer_norm, right?

[screenshot]

@tridao
Contributor

tridao commented Oct 12, 2023

You can try the layer_norm, I think it should work but I haven't tested extensively on 3080.

@batman-do
Author

> You can try the layer_norm, I think it should work but I haven't tested extensively on 3080.

Thank you so much for replying :))

@YuehChuan

@batman-do see this
#595

python3.10
https://www.python.org/downloads/release/python-3100/
win11

python -m venv venv

cd venv/Scripts
activate
-----------------------

git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention

pip install packaging 
pip install wheel

set MAX_JOBS=4
python setup.py install

@batman-do
Author

batman-do commented Oct 14, 2023

> @batman-do see this #595
>
> python3.10
> https://www.python.org/downloads/release/python-3100/
> win11
>
> python -m venv venv
>
> cd venv/Scripts
> activate
> -----------------------
>
> git clone https://github.com/Dao-AILab/flash-attention
> cd flash-attention
>
> pip install packaging
> pip install wheel
>
> set MAX_JOBS=4
> python setup.py install

I got this error:
`running bdist_egg
running egg_info
writing flash_attn.egg-info/PKG-INFO
writing dependency_links to flash_attn.egg-info/dependency_links.txt
writing requirements to flash_attn.egg-info/requires.txt
writing top-level names to flash_attn.egg-info/top_level.txt
reading manifest file 'flash_attn.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.cu' under directory 'flash_attn'
warning: no files found matching '*.h' under directory 'flash_attn'
warning: no files found matching '*.cuh' under directory 'flash_attn'
warning: no files found matching '*.cpp' under directory 'flash_attn'
warning: no files found matching '*.hpp' under directory 'flash_attn'
adding license file 'LICENSE'
adding license file 'AUTHORS'
writing manifest file 'flash_attn.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no g++ version bounds defined for CUDA version 12.1
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building 'flash_attn_2_cuda' extension
Emitting ninja build file /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Using envvar MAX_JOBS (4) as the number of workers...
[1/49] /usr/local/cuda/bin/nvcc -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/cutlass/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/TH -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/dodx/anaconda3/envs/flash_attention/include/python3.10 -c -c /data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu -o /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o
/usr/local/cuda/bin/nvcc -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/cutlass/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/TH -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/dodx/anaconda3/envs/flash_attention/include/python3.10 -c -c /data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu -o /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS
-D__CUDA_NO_HALF_CONVERSIONS
_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/ATen/cuda/Exceptions.h(56): error: identifier "cusparseStatus_t" is undefined
const char *cusparseGetErrorString(cusparseStatus_t status);
^

/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h(76): error: identifier "cusparseHandle_t" is undefined
__attribute__((visibility("default"))) cusparseHandle_t getCurrentCUDASparseHandle();
^

2 errors detected in the compilation of "/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu".
[2/49] /usr/local/cuda/bin/nvcc -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/cutlass/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/TH -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/dodx/anaconda3/envs/flash_attention/include/python3.10 -c -c /data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.cu -o /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.o
/usr/local/cuda/bin/nvcc -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/cutlass/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/TH -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/dodx/anaconda3/envs/flash_attention/include/python3.10 -c -c /data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.cu -o /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS
-D__CUDA_NO_HALF_CONVERSIONS
_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/ATen/cuda/Exceptions.h(56): error: identifier "cusparseStatus_t" is undefined
const char *cusparseGetErrorString(cusparseStatus_t status);
^

/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h(76): error: identifier "cusparseHandle_t" is undefined
__attribute__((visibility("default"))) cusparseHandle_t getCurrentCUDASparseHandle();
^

2 errors detected in the compilation of "/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.cu".
[3/49] /usr/local/cuda/bin/nvcc -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/cutlass/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/TH -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/dodx/anaconda3/envs/flash_attention/include/python3.10 -c -c /data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.cu -o /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.o
/usr/local/cuda/bin/nvcc -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/cutlass/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/TH -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/dodx/anaconda3/envs/flash_attention/include/python3.10 -c -c /data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.cu -o /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS
-D__CUDA_NO_HALF_CONVERSIONS
_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/ATen/cuda/Exceptions.h(56): error: identifier "cusparseStatus_t" is undefined
const char *cusparseGetErrorString(cusparseStatus_t status);
^

/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h(76): error: identifier "cusparseHandle_t" is undefined
__attribute__((visibility("default"))) cusparseHandle_t getCurrentCUDASparseHandle();
^

2 errors detected in the compilation of "/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.cu".
[4/49] c++ -MMD -MF /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/flash_api.o.d -pthread -B /data/dodx/anaconda3/envs/flash_attention/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /data/dodx/anaconda3/envs/flash_attention/include -fPIC -O2 -isystem /data/dodx/anaconda3/envs/flash_attention/include -fPIC -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/cutlass/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/TH -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/dodx/anaconda3/envs/flash_attention/include/python3.10 -c -c /data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/flash_api.cpp -o /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/flash_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/flash_api.o
c++ -MMD -MF /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/flash_api.o.d -pthread -B /data/dodx/anaconda3/envs/flash_attention/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /data/dodx/anaconda3/envs/flash_attention/include -fPIC -O2 -isystem /data/dodx/anaconda3/envs/flash_attention/include -fPIC -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src -I/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/cutlass/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/TH -I/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/dodx/anaconda3/envs/flash_attention/include/python3.10 -c -c /data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/flash_api.cpp -o /data/dodx/GenerateAI/test_LLM_local/flash-attention/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/flash_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:22,
from /data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/flash_api.cpp:8:
/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/ATen/cuda/Exceptions.h:56:36: error: ‘cusparseStatus_t’ was not declared in this scope; did you mean ‘cublasStatus_t’?
56 | const char *cusparseGetErrorString(cusparseStatus_t status);
| ^~~~~~~~~~~~~~~~
| cublasStatus_t
In file included from /data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/flash_api.cpp:8:
/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:76:20: error: ‘cusparseHandle_t’ does not name a type; did you mean ‘cublasHandle_t’?
76 | TORCH_CUDA_CPP_API cusparseHandle_t getCurrentCUDASparseHandle();
| ^~~~~~~~~~~~~~~~
| cublasHandle_t
/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/flash_api.cpp: In function ‘void set_params_fprop(Flash_fwd_params&, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, at::Tensor, at::Tensor, at::Tensor, at::Tensor, void*, void*, void*, void*, float, float, int, int)’:
/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/flash_api.cpp:47:38: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘struct Flash_fwd_params’; use assignment or value-initialization instead [-Wclass-memaccess]
47 | memset(&params, 0, sizeof(params));
| ^
In file included from /data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/flash_api.cpp:13:
/data/dodx/GenerateAI/test_LLM_local/flash-attention/csrc/flash_attn/src/flash.h:51:8: note: ‘struct Flash_fwd_params’ declared here
51 | struct Flash_fwd_params : public Qkv_params {
| ^~~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
subprocess.run(
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '4']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/dodx/GenerateAI/test_LLM_local/flash-attention/setup.py", line 288, in
setup(
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/init.py", line 107, in setup
return distutils.core.setup(**attrs)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/command/install.py", line 80, in run
self.do_egg_install()
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/command/install.py", line 129, in do_egg_install
self.run_command('bdist_egg')
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/command/bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/command/install_lib.py", line 111, in build
self.run_command('build_ext')
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 873, in build_extensions
build_ext.build_extensions(self)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
objects = self.compiler.compile(
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 686, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension`

@YuehChuan

@batman-do
Author

I used `MAX_JOBS=4 pip install flash-attn --no-build-isolation` as an alternative to building from source.

@batman-do
Author

Hi @tridao, why doesn't the layer_norm install respond? It just stops like this:

[screenshot of the stalled install]

@tridao
Contributor

tridao commented Oct 14, 2023

It probably takes a very long time if you don't have ninja or lots of CPU cores to compile. You don't have to use that extension.
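
One quick, hedged way to check whether the build will actually use ninja (is_ninja_available is an existing helper in torch.utils.cpp_extension):

```python
# PyTorch's extension builder only compiles in parallel when ninja is available;
# without it, the build falls back to a much slower path.
from torch.utils.cpp_extension import is_ninja_available

print("ninja available:", is_ninja_available())
# If this prints False, `pip install ninja` first and keep MAX_JOBS modest
# (as used earlier in this thread) to bound memory use.
```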

@batman-do
Author

> It probably takes a very long time if you don't have ninja or lots of CPU cores to compile. You don't have to use that extension.

Thanks @tridao, I will maybe try it later.

@Batwho

Batwho commented Oct 20, 2023

@batman-do Hi, I got the exact same bug when trying pip install flash-attn==2.0.4 --no-build-isolation. How did you solve your problem eventually?

@YuehChuan

@batman-do
According to
/data/dodx/anaconda3/envs/flash_attention/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h(76): error: identifier "cusparseHandle_t" is undefined

it seems that the CUDA library symbol cusparseHandle_t is not being located properly.

I am using a venv virtual environment, not Anaconda.

And do make sure PyTorch 2.2.0 with CUDA 12 is installed in your environment:
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121

Also, layer_norm is deprecated in FlashAttention 2, so there is no need to install it.
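
As a generic sanity check (not specific to flash-attention), you can also confirm which CUDA version your PyTorch build expects and which toolkit the extension build will pick up:

```python
# Hedged environment check: undefined cuSPARSE identifiers during the build often
# point at a mismatch between the system CUDA toolkit and the CUDA version the
# installed PyTorch wheel was built against.
import torch
from torch.utils.cpp_extension import CUDA_HOME

print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)  # expect 12.x for a cu121 wheel
print("CUDA_HOME used for building extensions:", CUDA_HOME)
```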

@CliuGeek9229

Use shutil.move(wheel_filename, wheel_path) instead of os.rename(src, dst) in setup.py.
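
For reference, a minimal sketch of that change; the paths below are placeholders standing in for setup.py's wheel_filename and wheel_path:

```python
# Sketch of the suggested fix: shutil.move() falls back to copy + delete when the
# destination is on a different filesystem, whereas os.rename() raises
# OSError "Invalid cross-device link" in that case.
import shutil

wheel_filename = "flash_attn-2.3.2-example.whl"   # placeholder for the downloaded wheel
wheel_path = "/tmp/flash_attn-2.3.2-example.whl"  # placeholder destination
open(wheel_filename, "w").close()                 # stand-in for the real download

# os.rename(wheel_filename, wheel_path)           # original call: can fail with EXDEV
shutil.move(wheel_filename, wheel_path)           # suggested replacement
```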

@SingL3

SingL3 commented Nov 9, 2023

> Use shutil.move(wheel_filename, wheel_path) instead of os.rename(src, dst) in setup.py.

Thanks! It works for me.

@drzraf

drzraf commented Aug 30, 2024

It keeps affecting users; this shouldn't have been closed. I guess it happens when /tmp/ or other pip cache directories are on different filesystems or tmpfs. shutil.move() does the trick, so this should be changed.
