
No module named 'llama_inference_offload' on Arch Linux #879

Closed
1 task done
Yersi88 opened this issue Apr 7, 2023 · 12 comments
Labels
bug, stale

Comments

@Yersi88

Yersi88 commented Apr 7, 2023

Describe the bug

Trying to run server.py with python server.py --wbits 4 --groupsize 128 fails with the error No module named 'llama_inference_offload'. I tried the fix from #400 (comment), but it did not help.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Run the following command: python server.py --wbits 4 --groupsize 128

Screenshot

No response

Logs

$ python server.py --wbits 4 --groupsize 128 

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/x/miniconda3/envs/textgen/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /home/x/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Loading vicuna-13b-GPTQ-4bit-128g...
Traceback (most recent call last):
  File "/home/x/text-generation-webui/server.py", line 308, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/x/text-generation-webui/modules/models.py", line 100, in load_model
    from modules.GPTQ_loader import load_quantized
  File "/home/x/text-generation-webui/modules/GPTQ_loader.py", line 14, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'

System Info

x@archlinux
----------------
OS: Arch Linux x86_64
Host: X670 GAMING X AX -CF
Kernel: 6.2.9-zen1-1-zen
Uptime: 1 hour, 51 mins
Packages: 1623 (pacman), 20 (flatpak), 7 (snap)
Shell: zsh 5.9
Resolution: 2560x1440
DE: Plasma 5.27.4
WM: kwin
WM Theme: Endless
Theme: [Plasma], Breeze [GTK3]
Icons: [Plasma], Relax-Dark-Icons [GTK2/3]
Terminal: terminator
CPU: AMD Ryzen 9 7900X (24) @ 4.700GHz
GPU: AMD ATI 16:00.0 Raphael
GPU: NVIDIA GeForce RTX 2080 Ti Rev. A
Memory: 12939MiB / 31231MiB
Yersi88 added the bug label Apr 7, 2023
@oobabooga
Owner

oobabooga commented Apr 7, 2023

Follow the steps here:

https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#installation

@da3dsoul
Contributor

da3dsoul commented Apr 7, 2023

Specifically, the llama_inference_offload module is only available in the triton branch of GPTQ-for-LLaMa.
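
As a rough sketch of what that means in practice: the webui imports the module from a GPTQ-for-LLaMa checkout under text-generation-webui/repositories/ (see the later comments in this thread). The repository URL and branch below are assumptions rather than officially documented values; use whichever fork and branch actually contains llama_inference_offload.py:

cd text-generation-webui
mkdir -p repositories && cd repositories
git clone -b triton https://github.com/qwopqwop200/GPTQ-for-LLaMa GPTQ-for-LLaMa   # URL and branch are assumptions
ls GPTQ-for-LLaMa/llama_inference_offload.py   # the file the failing import is looking for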

@da3dsoul
Contributor

da3dsoul commented Apr 7, 2023

Also, I had better luck with vicuna-13b-4bit-128g on cuda. You will probably need to specify --model_type llama as well. There's a lot of trial and error here in the comments
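
A hedged example of what that full invocation might look like, reusing the model folder name from the log above (the --model flag is the usual way the webui selects a folder under models/; adjust the name to whatever you actually downloaded):

python server.py --model vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type llama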

@Yersi88
Author

Yersi88 commented Apr 7, 2023

Follow the steps here

https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#installation

This failed with the following error :(
RuntimeError: The current installed version of g++ (12.2.1) is greater than the maximum required version by CUDA 11.7. Please make sure to use an adequate version of g++ (>=6.0.0, <12.0).

@da3dsoul
Contributor

da3dsoul commented Apr 7, 2023

Ah, I'm guessing Arch ships a newer g++ than Ubuntu does, for example. I would install a g++ manually, staying on 11.x, and then try again.
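
A rough sketch of that on Arch; the gcc11 package and binary names are assumptions about what the distro provides, and whether the build honors CC/CXX depends on the toolchain, so treat this as a starting point rather than a guaranteed fix:

sudo pacman -S gcc11                          # assumption: an 11.x gcc/g++ package is available
export CC=gcc-11 CXX=g++-11                   # ask the extension build to use the older compilers
export NVCC_PREPEND_FLAGS='-ccbin g++-11'     # and have nvcc use the same host compiler
python setup_cuda.py install                  # retry the step that failed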

@da3dsoul
Contributor

da3dsoul commented Apr 7, 2023

#850 looks relevant to that

@kodicw

kodicw commented Apr 9, 2023

Follow the steps here
https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#installation

This failed with the following error :(
RuntimeError: The current installed version of g++ (12.2.1) is greater than the maximum required version by CUDA 11.7. Please make sure to use an adequate version of g++ (>=6.0.0, <12.0).

I had the same issue with Fedora 37. To fix it, I did the following.

conda install -c conda-forge gxx

If that doesn't work try

conda install gcc_linux-64==11.2.0
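
If you go the conda route, a quick sanity check that the environment really resolves to a sub-12 compiler (this assumes the conda compiler packages export CC/CXX from their activation scripts, which they typically do):

conda activate textgen      # env name taken from the logs above; yours may differ
echo $CXX                   # should point inside the conda env
$CXX --version              # should report a g++ below 12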

@Sotonya

Sotonya commented Apr 10, 2023

Same error on Windows 11.

@Nazushvel

Got this working on Arch. Here are the steps:

  1. git clone https://github.com/oobabooga/text-generation-webui.git
  2. sudo pacman -S rocm-hip-sdk python-tqdm
  3. cd text-generation-webui
  4. export PATH=/opt/rocm/bin:$PATH
  5. export HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030
  6. python -m venv --system-site-packages venv
  7. source venv/bin/activate
  8. pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
  9. mkdir repositories && cd repositories
  10. git clone https://github.com/agrocylo/bitsandbytes-rocm
  11. cd bitsandbytes-rocm
  12. make hip
  13. python setup.py install
  14. cd ..
  15. git clone https://github.com/WapaMario63/GPTQ-for-LLaMa-ROCm GPTQ-for-LLaMa
  16. cd GPTQ-for-LLaMa
  17. python setup_rocm.py install
  18. cd ../..
  19. python download-model.py anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g
  20. rm models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g/gpt-x-alpaca-13b-native-4bit-128g.pt (needed, or it will just spam random numbers)
  21. pip install -r requirements.txt
  22. python server.py --wbits 4 --groupsize 128

That should do it. I just did this with a fresh install, so it should not be missing any steps.
If you are using an Nvidia card, the steps should be the same, except that you install the Nvidia equivalent of rocm-hip-sdk and use the normal GPTQ-for-LLaMa and bitsandbytes repos; the instructions for those are in this repo's wiki.
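
Before loading a model after those steps, a small sanity check that the ROCm build of PyTorch actually sees the GPU can help; PyTorch exposes HIP devices through the torch.cuda API, so this should print a HIP version and True:

source venv/bin/activate
python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"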

@FloriMaster

FloriMaster commented Apr 13, 2023

python3 setup_cuda.py install
failed with the error: command '/usr/bin/nvcc' failed with exit code 1

EDIT: It seems this might be a problem with mismatched cuda and nvcc versions.
Fixed by reinstalling Linux and installing the CUDA toolkit with nvcc using this script: https://gist.github.com/X-TRON404/e9cab789041ef03bcba13da1d5176e28

(You probably don't need to reinstall Linux; I just did it out of frustration and found the script afterward. Running that script alone should work, as it deletes all previously installed drivers for you.)
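
Before going as far as a reinstall, one way to confirm that kind of mismatch is to compare the nvcc on your PATH with the CUDA version PyTorch was built against (the warning in the full output below shows exactly that: 11.5 vs 11.7):

nvcc --version | grep release
python -c "import torch; print(torch.version.cuda)"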

Full output:

running install
/home/ass/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/home/ass/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info/PKG-INFO
writing dependency_links to quant_cuda.egg-info/dependency_links.txt
writing top-level names to quant_cuda.egg-info/top_level.txt
/home/ass/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info/SOURCES.txt'
writing manifest file 'quant_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
/home/ass/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:388: UserWarning: The detected CUDA version (11.5) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
  warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'quant_cuda' extension
gcc -pthread -B /home/ass/miniconda3/envs/textgen/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/ass/miniconda3/envs/textgen/include -fPIC -O2 -isystem /home/ass/miniconda3/envs/textgen/include -fPIC -I/home/ass/.local/lib/python3.10/site-packages/torch/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/TH -I/home/ass/.local/lib/python3.10/site-packages/torch/include/THC -I/home/ass/miniconda3/envs/textgen/include/python3.10 -c quant_cuda.cpp -o build/temp.linux-x86_64-cpython-310/quant_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
/usr/bin/nvcc -I/home/ass/.local/lib/python3.10/site-packages/torch/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/TH -I/home/ass/.local/lib/python3.10/site-packages/torch/include/THC -I/home/ass/miniconda3/envs/textgen/include/python3.10 -c quant_cuda_kernel.cu -o build/temp.linux-x86_64-cpython-310/quant_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17
/home/ass/.local/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
(61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
/home/ass/.local/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here

/home/ass/.local/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
(61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
/home/ass/.local/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here

/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
error: command '/usr/bin/nvcc' failed with exit code 1

@fuzzah

fuzzah commented Apr 26, 2023

When facing the original problem, I somehow missed that the GPTQ-for-LLaMa directory needs to be inside the repositories dir, and had GPTQ-for-LLaMa placed in the root of text-generation-webui, which caused the problem.

Make sure the hierarchy of directories goes like this: text-generation-webui/repositories/GPTQ-for-LLaMa, and not like this: text-generation-webui/GPTQ-for-LLaMa.
The relevant source line is here.

Hope this helps!
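
A quick way to check this from the webui root, with the paths exactly as described above:

cd text-generation-webui
ls repositories              # should list GPTQ-for-LLaMa here, not in the webui root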

github-actions bot added the stale label Oct 16, 2023
@github-actions

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
