Fix build error - error: function "torchao::marlin_24_gemm" has already been defined (previous definition at line 83) #863

kshitij12345 · 2024-09-10T12:07:22Z

Building for CUDA_ARCH < 800 (the command below includes `-gencode=arch=compute_70,code=sm_70`) fails with:

[1/1] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/kkalambarkar/ao/build/temp.linux-x86_64-cpython-310/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.o.d -I/home/kkalambarkar/git/pytorch/torch/include -I/home/kkalambarkar/git/pytorch/torch/include/torch/csrc/api/include -I/home/kkalambarkar/git/pytorch/torch/include/TH -I/home/kkalambarkar/git/pytorch/torch/include/THC -I/usr/local/cuda/include -I/home/kkalambarkar/miniconda3/envs/pytorch-dev/include/python3.10 -c -c /home/kkalambarkar/ao/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.cu -o /home/kkalambarkar/ao/build/temp.linux-x86_64-cpython-310/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -t=0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
    FAILED: /home/kkalambarkar/ao/build/temp.linux-x86_64-cpython-310/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.o
    /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/kkalambarkar/ao/build/temp.linux-x86_64-cpython-310/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.o.d -I/home/kkalambarkar/git/pytorch/torch/include -I/home/kkalambarkar/git/pytorch/torch/include/torch/csrc/api/include -I/home/kkalambarkar/git/pytorch/torch/include/TH -I/home/kkalambarkar/git/pytorch/torch/include/THC -I/usr/local/cuda/include -I/home/kkalambarkar/miniconda3/envs/pytorch-dev/include/python3.10 -c -c /home/kkalambarkar/ao/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.cu -o /home/kkalambarkar/ao/build/temp.linux-x86_64-cpython-310/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -t=0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
    /home/kkalambarkar/ao/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.cu(1012): error: function "torchao::marlin_24_gemm" has already been defined
      torch::Tensor marlin_24_gemm(torch::Tensor& a, torch::Tensor& b_q_weight,
                    ^
    
    1 error detected in the compilation of "/home/kkalambarkar/ao/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.cu".

For CUDA_ARCH < 800

ao/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.cu

Lines 55 to 57 in 3ac2ab8

    
    #if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 800

    template <const int num_bits,         // weight bits

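we get multiple definitions of `marlin_24_gemm` because the `#endif` is misplaced: both the stub at line 83 and the real implementation at line 1012 are visible in the same compile pass.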

ao/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.cu

Lines 83 to 92 in 3ac2ab8

    
    torch::Tensor marlin_24_gemm(torch::Tensor& a, torch::Tensor& b_q_weight,
                                 torch::Tensor& b_meta,
                                 torch::Tensor& b_scales,
                                 torch::Tensor& workspace, int64_t num_bits,
                                 int64_t size_m, int64_t size_n,
                                 int64_t size_k) {
      TORCH_CHECK_NOT_IMPLEMENTED(
          false, "marlin_24_gemm(..) requires CUDA_ARCH >= 8.0");
      return torch::empty({1, 1});
    }

ao/torchao/csrc/cuda/sparse_marlin/marlin_kernel_nm.cu

Lines 1012 to 1019 in 3ac2ab8

    
    torch::Tensor marlin_24_gemm(torch::Tensor& a, torch::Tensor& b_q_weight,
                                 torch::Tensor& b_meta,
                                 torch::Tensor& b_scales,
                                 torch::Tensor& workspace, int64_t num_bits,
                                 int64_t size_m, int64_t size_n,
                                 int64_t size_k) {
      // Verify num_bits
      TORCH_CHECK(num_bits == 4 || num_bits == 8,

Fix: move the `#endif` so that it closes after the second definition of `marlin_24_gemm`.
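For illustration, here is a minimal plain C++ sketch of the intended guard layout. It is not the actual diff. `SKETCH_ARCH` and `marlin_24_gemm_available` are hypothetical names standing in for `__CUDA_ARCH__` and the two `marlin_24_gemm` definitions, and the sketch assumes the real definition belongs in the `#else` branch of the guard.

```cpp
// Plain C++ sketch of the guard layout; SKETCH_ARCH is a hypothetical
// stand-in for __CUDA_ARCH__ so this compiles without nvcc.
#ifndef SKETCH_ARCH
#define SKETCH_ARCH 700  // pretend we are in an sm_70 compile pass
#endif

namespace sketch {

#if defined(SKETCH_ARCH) && SKETCH_ARCH < 800

// Stub branch: stands in for the "requires CUDA_ARCH >= 8.0" version.
constexpr bool marlin_24_gemm_available() { return false; }

#else

// Real branch: stands in for the kernel helpers and the real entry point.
constexpr bool marlin_24_gemm_available() { return true; }

#endif  // closes only after the second definition

// If the #endif sat above the second definition, both definitions would be
// visible whenever SKETCH_ARCH < 800 and the compiler would report a
// redefinition, which is the failure described in this PR.

}  // namespace sketch
```

With the default `SKETCH_ARCH` of 700 only the stub is compiled. Building with `-DSKETCH_ARCH=800` selects the other branch instead, so each pass sees exactly one definition.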

pytorch-bot · 2024-09-10T12:07:27Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/863

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 44cb0dc with merge base 3ac2ab8:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

msaroufim · 2024-09-10T12:10:08Z

mind just pasting the error you get before this change as well? otherwise lgtm

kshitij12345 · 2024-09-10T12:15:54Z

Sure, I have updated the PR description. Thanks for the quick review :)

fix compile error

44cb0dc

facebook-github-bot added the CLA Signed label Sep 10, 2024

msaroufim self-requested a review September 10, 2024 12:09

msaroufim approved these changes Sep 10, 2024

msaroufim merged commit d2226a4 into pytorch:main Sep 10, 2024
17 checks passed

kshitij12345 deleted the fix-compile-error branch September 10, 2024 12:46

jainapurva pushed a commit that referenced this pull request Sep 10, 2024

Fix build error - error: function "torchao::marlin_24_gemm" has alrea…

0802f16

…dy been defined (previous definition at line 83) (#863) fix compile error
