
No module named 'llama_inference_offload' on Arch Linux #879

Closed
1 task done
Yersi88 opened this issue Apr 7, 2023 · 12 comments
Labels
bug, stale

Comments

@Yersi88

Yersi88 commented Apr 7, 2023

Describe the bug

Trying to run server.py with python server.py --wbits 4 --groupsize 128 fails with the error No module named 'llama_inference_offload'. I tried the fix from #400 (comment), but it did not help.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Run the following command: python server.py --wbits 4 --groupsize 128

Screenshot

No response

Logs

$ python server.py --wbits 4 --groupsize 128 

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/x/miniconda3/envs/textgen/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /home/x/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Loading vicuna-13b-GPTQ-4bit-128g...
Traceback (most recent call last):
  File "/home/x/text-generation-webui/server.py", line 308, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/x/text-generation-webui/modules/models.py", line 100, in load_model
    from modules.GPTQ_loader import load_quantized
  File "/home/x/text-generation-webui/modules/GPTQ_loader.py", line 14, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'

System Info

x@archlinux
----------------
OS: Arch Linux x86_64
Host: X670 GAMING X AX -CF
Kernel: 6.2.9-zen1-1-zen
Uptime: 1 hour, 51 mins
Packages: 1623 (pacman), 20 (flatpak), 7 (snap)
Shell: zsh 5.9
Resolution: 2560x1440
DE: Plasma 5.27.4
WM: kwin
WM Theme: Endless
Theme: [Plasma], Breeze [GTK3]
Icons: [Plasma], Relax-Dark-Icons [GTK2/3]
Terminal: terminator
CPU: AMD Ryzen 9 7900X (24) @ 4.700GHz
GPU: AMD ATI 16:00.0 Raphael
GPU: NVIDIA GeForce RTX 2080 Ti Rev. A
Memory: 12939MiB / 31231MiB
Yersi88 added the bug label Apr 7, 2023
@oobabooga
Owner

oobabooga commented Apr 7, 2023

Follow the steps here:

https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#installation

@da3dsoul
Contributor

da3dsoul commented Apr 7, 2023

Specifically, the llama_inference_offload module is only available in the triton branch of GPTQ-for-LLaMa.
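
As a rough sketch of what that means in practice: the webui imports the module from a GPTQ-for-LLaMa checkout under text-generation-webui/repositories/ (see the later comments in this thread). The repository URL and branch below are assumptions rather than officially documented values; use whichever fork and branch actually contains llama_inference_offload.py:

cd text-generation-webui
mkdir -p repositories && cd repositories
git clone -b triton https://github.com/qwopqwop200/GPTQ-for-LLaMa GPTQ-for-LLaMa   # URL and branch are assumptions
ls GPTQ-for-LLaMa/llama_inference_offload.py   # the file the failing import is looking for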

@da3dsoul
Contributor

da3dsoul commented Apr 7, 2023

Also, I had better luck with vicuna-13b-4bit-128g on cuda. You will probably need to specify --model_type llama as well. There's a lot of trial and error here in the comments
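
A hedged example of what that full invocation might look like, reusing the model folder name from the log above (the --model flag is the usual way the webui selects a folder under models/; adjust the name to whatever you actually downloaded):

python server.py --model vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type llama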

@Yersi88
Author

Yersi88 commented Apr 7, 2023

Follow the steps here

https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#installation

This failed with the following error :(
RuntimeError: The current installed version of g++ (12.2.1) is greater than the maximum required version by CUDA 11.7. Please make sure to use an adequate version of g++ (>=6.0.0, <12.0).

@da3dsoul
Contributor

da3dsoul commented Apr 7, 2023

Ah, I'm guessing Arch ships a newer g++ than Ubuntu does, for example. I would install a g++ manually, staying on 11.x, and then try again.
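
A rough sketch of that on Arch; the gcc11 package and binary names are assumptions about what the distro provides, and whether the build honors CC/CXX depends on the toolchain, so treat this as a starting point rather than a guaranteed fix:

sudo pacman -S gcc11                          # assumption: an 11.x gcc/g++ package is available
export CC=gcc-11 CXX=g++-11                   # ask the extension build to use the older compilers
export NVCC_PREPEND_FLAGS='-ccbin g++-11'     # and have nvcc use the same host compiler
python setup_cuda.py install                  # retry the step that failed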

@da3dsoul
Contributor

da3dsoul commented Apr 7, 2023

#850 looks relevant to that

@kodicw

kodicw commented Apr 9, 2023

Follow the steps here
https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#installation

This failed with the following error :(
RuntimeError: The current installed version of g++ (12.2.1) is greater than the maximum required version by CUDA 11.7. Please make sure to use an adequate version of g++ (>=6.0.0, <12.0).

I had the same issue with Fedora 37. To fix it, I did the following.

conda install -c conda-forge gxx

If that doesn't work try

conda install gcc_linux-64==11.2.0
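
If you go the conda route, a quick sanity check that the environment really resolves to a sub-12 compiler (this assumes the conda compiler packages export CC/CXX from their activation scripts, which they typically do):

conda activate textgen      # env name taken from the logs above; yours may differ
echo $CXX                   # should point inside the conda env
$CXX --version              # should report a g++ below 12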

@Sotonya

Sotonya commented Apr 10, 2023

Same error on Windows 11.

@Nazushvel

Got this working on Arch. Here are the steps:

  1. git clone https://github.com/oobabooga/text-generation-webui.git
  2. sudo pacman -S rocm-hip-sdk python-tqdm
  3. cd text-generation-webui
  4. export PATH=/opt/rocm/bin:$PATH
  5. export HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030
  6. python -m venv --system-site-packages venv
  7. source venv/bin/activate
  8. pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
  9. mkdir repositories && cd repositories
  10. git clone https://github.com/agrocylo/bitsandbytes-rocm
  11. cd bitsandbytes-rocm
  12. make hip
  13. python setup.py install
  14. cd ..
  15. git clone https://github.com/WapaMario63/GPTQ-for-LLaMa-ROCm GPTQ-for-LLaMa
  16. cd GPTQ-for-LLaMa
  17. python setup_rocm.py install
  18. cd ../..
  19. python download-model.py anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g
  20. rm models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g/gpt-x-alpaca-13b-native-4bit-128g.pt (needed, or it will just spam random numbers)
  21. pip install -r requirements.txt
  22. python server.py --wbits 4 --groupsize 128

That should do it. I just did this with a fresh install, so it should not be missing any steps.
If you are using an Nvidia card, the steps should be the same, except that you install the Nvidia equivalent of rocm-hip-sdk and use the normal GPTQ-for-LLaMa and bitsandbytes repos; the instructions for those are in this repo's wiki.
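
Before loading a model after those steps, a small sanity check that the ROCm build of PyTorch actually sees the GPU can help; PyTorch exposes HIP devices through the torch.cuda API, so this should print a HIP version and True:

source venv/bin/activate
python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"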

@FloriMaster

FloriMaster commented Apr 13, 2023

python3 setup_cuda.py install
failed with the error: command '/usr/bin/nvcc' failed with exit code 1

EDIT: It seems this might be a problem with mismatched cuda and nvcc versions.
Fixed by reinstalling Linux and installing the CUDA toolkit with nvcc using this script: https://gist.github.com/X-TRON404/e9cab789041ef03bcba13da1d5176e28

(You probably don't need to reinstall Linux; I just did it out of frustration and found the script afterward. Running that script alone should work, as it deletes all previously installed drivers for you.)
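
Before going as far as a reinstall, one way to confirm that kind of mismatch is to compare the nvcc on your PATH with the CUDA version PyTorch was built against (the warning in the full output below shows exactly that: 11.5 vs 11.7):

nvcc --version | grep release
python -c "import torch; print(torch.version.cuda)"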

Full output:

running install
/home/ass/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/home/ass/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info/PKG-INFO
writing dependency_links to quant_cuda.egg-info/dependency_links.txt
writing top-level names to quant_cuda.egg-info/top_level.txt
/home/ass/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info/SOURCES.txt'
writing manifest file 'quant_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
/home/ass/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:388: UserWarning: The detected CUDA version (11.5) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
  warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'quant_cuda' extension
gcc -pthread -B /home/ass/miniconda3/envs/textgen/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/ass/miniconda3/envs/textgen/include -fPIC -O2 -isystem /home/ass/miniconda3/envs/textgen/include -fPIC -I/home/ass/.local/lib/python3.10/site-packages/torch/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/TH -I/home/ass/.local/lib/python3.10/site-packages/torch/include/THC -I/home/ass/miniconda3/envs/textgen/include/python3.10 -c quant_cuda.cpp -o build/temp.linux-x86_64-cpython-310/quant_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
/usr/bin/nvcc -I/home/ass/.local/lib/python3.10/site-packages/torch/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/TH -I/home/ass/.local/lib/python3.10/site-packages/torch/include/THC -I/home/ass/miniconda3/envs/textgen/include/python3.10 -c quant_cuda_kernel.cu -o build/temp.linux-x86_64-cpython-310/quant_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17
/home/ass/.local/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
(61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]" 
/home/ass/.local/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here

/home/ass/.local/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
          detected during:
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
(61): here
            instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]" 
/home/ass/.local/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here

/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
error: command '/usr/bin/nvcc' failed with exit code 1

@fuzzah

fuzzah commented Apr 26, 2023

When facing the original problem, I somehow missed that the GPTQ-for-LLaMa directory needs to be inside the repositories dir, and had GPTQ-for-LLaMa placed in the root of text-generation-webui, which caused the problem.

Make sure the hierarchy of directories goes like this: text-generation-webui/repositories/GPTQ-for-LLaMa, and not like this: text-generation-webui/GPTQ-for-LLaMa.
The relevant source line is here.

Hope this helps!
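
A quick way to check this from the webui root, with the paths exactly as described above:

cd text-generation-webui
ls repositories              # should list GPTQ-for-LLaMa here, not in the webui root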

github-actions bot added the stale label Oct 16, 2023
@github-actions

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
