won't use gpu #21
Can you please try reinstalling:

```sh
pip uninstall ctransformers --yes
CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
```
I'm running into the same problem as well. I followed the instructions above and got it to run, but it uses the GPU for only a second and then CPU usage climbs to 40%. I'm running a 3090 Ti. I tried setting `set CT_CUBLAS=1` but it still didn't seem to work. Here is the yml:

```yaml
llm: ctransformers
ctransformers:
embeddings:
```
Did you notice any performance drop if you don't set it? Recently llama.cpp added full GPU acceleration (ggerganov/llama.cpp#1827) which is added in:

```sh
CT_CUBLAS=1 pip install 'ctransformers>=0.2.9' --no-binary ctransformers
```
Also try setting:

```yaml
ctransformers:
  config:
    threads: 1
```
I reinstalled ctransformers 0.2.9, but it only worked when I removed the quotes. I tried fixing it in conda and in venv and ran into the same issues. I'm running cuda 11.8 with CUDA version 12.2. I set threads to 1 and removed gpu_layers, and now it's basically doing nothing: CPU is at 3% and GPU is at 1%. Here is how I installed it on venv and conda:

```sh
set CT_CUBLAS=1
```

When I check `pip list` it's there, under ctransformers 0.2.9. This is the yml:

```yaml
ctransformers:
```
You should set:

```yaml
ctransformers:
  config:
    gpu_layers: 100
    threads: 1
```

Set both `gpu_layers` and `threads`.
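For reference, a full config along these lines might look like the sketch below. The `model` and `model_file` values are placeholders I've added for illustration, not values taken from this thread; substitute your own GGML model.

```yaml
llm: ctransformers
ctransformers:
  model: TheBloke/Llama-2-7B-Chat-GGML          # placeholder model repo
  model_file: llama-2-7b-chat.ggmlv3.q4_0.bin   # placeholder file name
  config:
    gpu_layers: 100   # offload all layers to the GPU
    threads: 1
```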
I did this, and same as @LazyCat420: it barely uses the CPU now, and it doesn't use the GPU either.
But shouldn't the utilization also go up? In that picture it is still at 1%.
I got the GPU to work with GPTQ; I would suggest trying that if you haven't yet. It was using 12 GB of VRAM and 95% of the card the whole time. I just followed the instructions to install chatdocs with GPTQ and it worked. The only issue I ran into was that I had to reinstall protobuf 3.2 in order to download the model. This was the yml:

```yaml
llm: gptq
gptq:
```
Hey @TheFinality, in this first picture I haven't started the prompt, so the numbers in the bottom-right corner are low (red is GPU usage; yellow is CPU1 usage; green is CPU1 temperature). After entering the prompt, it has started processing. I don't know why, but Task Manager doesn't show what percentage of the GPU is used when I use chatdocs; when I checked with MSI Afterburner, I saw that the red number (GPU usage) stayed high while processing, at 73-60-53% etc. If you're curious like me, you can also try that software (it's free from MSI), or any other app you want, and check again for yourself. And/or you can try what @LazyCat420 did. I'm just explaining what I've learned and experienced as I tested. Hope it helps!
Thanks @nilvaes for the explanation. I suggest simply looking at the response generation speed instead of the GPU usage numbers. Try out both the CPU and GPU configs and see which gives better performance for your system.

CPU config:

```yaml
ctransformers:
  config:
    gpu_layers: 0
    threads: 4  # set it to the number of physical cores your CPU has
```

GPU config:

```yaml
ctransformers:
  config:
    gpu_layers: 100
    threads: 1
```

You can also try other models like GPTQ as @LazyCat420 mentioned and pick the one that works best for your system.
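Comparing generation speed, as suggested above, can be done with a small timer. Below is a minimal sketch; the `generate` callable and `tokens_per_second` helper are my own illustrative names (not part of chatdocs or ctransformers), and token count is roughly approximated by whitespace splitting.

```python
import time


def tokens_per_second(generate, prompt, n_runs=3):
    """Time a text-generation callable and return its average tokens/sec.

    `generate` is any function mapping a prompt string to generated text.
    Tokens are approximated by whitespace-splitting the output.
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        text = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(text.split()) / elapsed)
    return sum(rates) / len(rates)
```

Run the same prompt once under the CPU config and once under the GPU config, and keep whichever config produces the higher rate.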
I finally figured out how to run GGML using the GPU. I had the same issue as all of you where the GPU would sit at 0-1% use. I am on Windows 10. What I did was the following:

Note: you can remove both the max_new_tokens and temperature settings from the config. It now works with GGML, and GPU usage and memory max out! Hope this helps.
Ugh. I followed these steps, but no matter what I do I get this error, even though the file is there. I did get the GPU working well with oobabooga, but not with this install. It also couldn't find pydantic, which it could once I copied it over to the \chatdocs folder. Something very weird is going on and I'm not sure what to do. If I run without CUDA, it works fine, just slow.

```
FileNotFoundError: Could not find module
```
Please run the following command and post the output:

```sh
pip show ctransformers nvidia-cuda-runtime-cu12 nvidia-cublas-cu12
```

Make sure you have installed the CUDA libraries using:

```sh
pip install ctransformers[cuda]
```
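A `Could not find module` error on Windows usually means the CUDA runtime DLLs cannot be located by the loader. As a quick sanity check, the sketch below (my own helper, not part of chatdocs; the library base names `cudart`/`cublas` are assumptions) tries to locate and load each library with the standard-library `ctypes` module:

```python
import ctypes
import ctypes.util


def check_cuda_libs(names=("cudart", "cublas")):
    """Report whether each CUDA runtime library can be located and loaded."""
    status = {}
    for name in names:
        path = ctypes.util.find_library(name)
        loadable = False
        if path is not None:
            try:
                ctypes.CDLL(path)
                loadable = True
            except OSError:
                pass  # found on the search path but failed to load
        status[name] = loadable
    return status


print(check_cuda_libs())
```

If a library reports `False`, the pip-installed `nvidia-*` wheels may not be on the loader's search path, which would be consistent with the `FileNotFoundError` above.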
I'm trying to have ctransformers use the GPU, but it won't work.
My chatdocs.yml: