
Failed to build from source on ROCm (with pytorch and xformers working correctly) #3067

Closed
nayn99 opened this issue Feb 28, 2024 · 8 comments
Labels
installation Installation problems rocm

Comments

@nayn99

nayn99 commented Feb 28, 2024

OS: Linux 6.6.17-1-lts
HW: AMD 4650G (Renoir), gfx90c
SW: torch==2.3.0.dev20240224+rocm5.7, xformers==0.0.23 (both confirmed working).

Description of the issue: Following the installation guide for ROCm to build from source, the build fails at the final link step:

Total number of replaced kernel launches: 21
running install
/home/toto/tmp/testenv/lib/python3.11/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/home/toto/tmp/testenv/lib/python3.11/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing vllm.egg-info/PKG-INFO
writing dependency_links to vllm.egg-info/dependency_links.txt
writing requirements to vllm.egg-info/requires.txt
writing top-level names to vllm.egg-info/top_level.txt
reading manifest file 'vllm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'vllm.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'vllm._C' extension
Emitting ninja build file /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
g++ -shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/activation_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/hip_utils_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/layernorm_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/moe_align_block_size_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/pos_encoding_kernels.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/pybind.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/quantization/gptq/q_gemm.o /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/quantization/squeezellm/quant_hip_kernel.o -L/home/toto/tmp/testenv/lib/python3.11/site-packages/torch/lib -L/opt/rocm/lib -L/opt/rocm/hip/lib -L/usr/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lamdhip64 -lc10_hip -ltorch_hip -o build/lib.linux-x86_64-cpython-311/vllm/_C.cpython-311-x86_64-linux-gnu.so
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__float2bfloat16(float)':
cache_kernels.hip:(.text+0x0): multiple definition of `__float2bfloat16(float)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x0): first defined here
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__bfloat1622float2(__hip_bfloat162)':
cache_kernels.hip:(.text+0x40): multiple definition of `__bfloat1622float2(__hip_bfloat162)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x40): first defined here
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__double2bfloat16(double)':
cache_kernels.hip:(.text+0x60): multiple definition of `__double2bfloat16(double)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x60): first defined here
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__float22bfloat162_rn(HIP_vector_type<float, 2u>)':
cache_kernels.hip:(.text+0xa0): multiple definition of `__float22bfloat162_rn(HIP_vector_type<float, 2u>)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0xa0): first defined here
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__high2float(__hip_bfloat162)':
cache_kernels.hip:(.text+0x110): multiple definition of `__high2float(__hip_bfloat162)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x110): first defined here
/usr/bin/ld: /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/cache_kernels.o: in function `__low2float(__hip_bfloat162)':
cache_kernels.hip:(.text+0x120): multiple definition of `__low2float(__hip_bfloat162)'; /home/toto/tmp/vllm/build/temp.linux-x86_64-cpython-311/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x120): first defined here
collect2: error: ld returned 1 exit status
error: command '/usr/bin/g++' failed with exit code 1
@george-kuanli-peng

george-kuanli-peng commented Feb 29, 2024

I have the same problem building vllm from source on two platforms:

First:

  • vllm tag v0.3.2
  • rocm 6.0.2
  • PyTorch 2.1.2+git98a6632
  • xformers 0.0.23

Second:

  • vllm tag v0.3.2
  • rocm 5.7.0
  • PyTorch 2.0.1+git4c8bc42
  • xformers 0.0.23
g++ -pthread -B /opt/conda/envs/py_3.10/compiler_compat -shared -Wl,-rpath,/opt/conda/envs/py_3.10/lib -Wl,-rpath-link,/opt/conda/envs/py_3.10/lib -L/opt/conda/envs/py_3.10/lib -Wl,-rpath,/opt/conda/envs/py_3.10/lib -Wl,-rpath-link,/opt/conda/envs/py_3.10/lib -L/opt/conda/envs/py_3.10/lib /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/activation_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/hip_utils_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/layernorm_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/moe_align_block_size_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/pos_encoding_kernels.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/pybind.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/quantization/gptq/q_gemm.o /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/quantization/squeezellm/quant_hip_kernel.o -L/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib -L/opt/rocm/lib -L/opt/rocm/hip/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -lamdhip64 -lc10_hip -ltorch_hip -o build/lib.linux-x86_64-cpython-310/vllm/_C.cpython-310-x86_64-linux-gnu.so
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__float2bfloat16(float)':
cache_kernels.hip:(.text+0x0): multiple definition of `__float2bfloat16(float)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x0): first defined here
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__bfloat1622float2(__hip_bfloat162)':
cache_kernels.hip:(.text+0x40): multiple definition of `__bfloat1622float2(__hip_bfloat162)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x40): first defined here
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__double2bfloat16(double)':
cache_kernels.hip:(.text+0x60): multiple definition of `__double2bfloat16(double)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x60): first defined here
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__float22bfloat162_rn(HIP_vector_type<float, 2u>)':
cache_kernels.hip:(.text+0xa0): multiple definition of `__float22bfloat162_rn(HIP_vector_type<float, 2u>)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0xa0): first defined here
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__high2float(__hip_bfloat162)':
cache_kernels.hip:(.text+0x110): multiple definition of `__high2float(__hip_bfloat162)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x110): first defined here
/opt/conda/envs/py_3.10/compiler_compat/ld: /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/cache_kernels.o: in function `__low2float(__hip_bfloat162)':
cache_kernels.hip:(.text+0x120): multiple definition of `__low2float(__hip_bfloat162)'; /workspace/vllm/build/temp.linux-x86_64-cpython-310/csrc/attention/attention_kernels.o:attention_kernels.hip:(.text+0x120): first defined here
collect2: error: ld returned 1 exit status
error: command '/usr/bin/g++' failed with exit code 1

@george-kuanli-peng

Well, I can now build vllm from source on the first platform (ROCm 6.0.2) by appending static to the end of two lines in /opt/rocm/include/hip/amd_detail/amd_hip_bf16.h, as in ROCm/clr@77c581a

Ref: #2646 (comment)

However, I later hit another issue, as described in #3061
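For anyone applying the same workaround, here is a minimal sketch of the edit. It is demonstrated on a local copy of the two relevant #define lines rather than the installed header, since the exact header layout may differ across ROCm versions; back up the real file before touching it.

```shell
# Demonstrate the workaround on a local copy of the two #define lines.
# The real file lives at /opt/rocm/include/hip/amd_detail/amd_hip_bf16.h;
# verify the lines match your ROCm version before editing it.
cat > amd_hip_bf16_snippet.h <<'EOF'
#define __HOST_DEVICE__ __device__
#define __HOST_DEVICE__ __host__ __device__
EOF
# Append "static" / "static inline", mirroring ROCm/clr@77c581a
# (& in the replacement re-inserts the matched text).
sed -i \
  -e 's/__HOST_DEVICE__ __device__$/& static/' \
  -e 's/__HOST_DEVICE__ __host__ __device__$/& static inline/' \
  amd_hip_bf16_snippet.h
cat amd_hip_bf16_snippet.h
```

To patch the installed header itself, point the same sed invocation at the real path after making a backup copy.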

@cocoderss

I will give it a try. What is your system setup? Is it also an AMD iGPU?

@george-kuanli-peng

I am not using AMD integrated GPUs. They are MI210 and MI300X.

@hliuca
Contributor

hliuca commented Mar 7, 2024

This is caused by a header bug: the bf16 conversion helpers in amd_hip_bf16.h are declared without static or inline, so every translation unit that includes the header emits its own external definition and the link fails. The fix is to add static to the header:

--- amd_hip_bf16.h	2024-02-06 18:28:58.268699142 +0000
+++ amd_hip_bf16.h.new	2024-02-06 18:28:31.988647133 +0000
@@ -90,10 +90,10 @@
 #include "math_fwd.h" // ocml device functions
 
 #if defined(__HIPCC_RTC__)
-#define __HOST_DEVICE__ __device__
+#define __HOST_DEVICE__ __device__ static
 #else
 #include <climits>
-#define __HOST_DEVICE__ __host__ __device__
+#define __HOST_DEVICE__ __host__ __device__ static inline
 #endif

@fxmarty

fxmarty commented Apr 18, 2024

Same issue even with #2648

@fxmarty

fxmarty commented Apr 18, 2024

FYI, #2790 is needed, and with it this is fixed for me. This may be closed, imo.

@hongxiayang hongxiayang added rocm installation Installation problems labels Jul 13, 2024
@hongxiayang
Collaborator

@nayn99 Please update this issue or close this issue if your problem is resolved. Thanks.

6 participants