CUDA 12.5 support or GPU acceleration not working after graphics driver update #2394
Comments
I am trying to deploy LocalAI to an Ubuntu 24.04 server (Proxmox VM) with an A2000 passed through, and I think I am running into the same issue. I initially had the 550 drivers installed on the server, which matched what I saw in nvidia-smi when LocalAI started, but I have also purged the NVIDIA drivers and put the server back on the 535 drivers. Regardless, I get this message in the logs when attempting to use a GPU model: I have tried the images tagged master-aio-gpu-nvidia-cuda-12, master-aio-gpu-nvidia-cuda-11, and master-cublas-cuda12-ffmpeg, and have also tried with the env variable REBUILD=true. I am currently able to run
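For context, the REBUILD variable mentioned above is set on the container itself. A minimal sketch, with an illustrative tag and port mapping (not taken from the comment):

# REBUILD=true makes LocalAI recompile its backends at container start.
docker run -ti --gpus all -p 8080:8080 -e REBUILD=true localai/localai:master-cublas-cuda12-ffmpeg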
Same here. I tried the Docker version without success. Logs here.
Confirmed, CPU only.
Same issue. I spent over two days trying to figure out what happened until I found this issue.
Error: INF GPU device found but no CUDA backend present. I think I have found the reason! If you did not build the dist target, the llama-cpp-cuda backend is missing. The new version has changed the backends a lot, and the documentation was not updated. The new version's Makefile is here.
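If this diagnosis is right, a source rebuild along these lines should produce the CUDA backend. This is a hedged sketch: BUILD_TYPE=cublas is LocalAI's documented CUDA build flag, while the dist target is an assumption taken from the comment and the linked Makefile:

# Build LocalAI from source with the CUDA (cuBLAS) llama.cpp backend.
git clone https://github.com/mudler/LocalAI
cd LocalAI
make BUILD_TYPE=cublas dist   # 'dist' target assumed from the comment above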
I can confirm. Running in WSL 2. I tried rebuilding from source, and during the build it states that CUDA was found, but it falls back to AVX2 when loading the model. Downgrading the drivers to 551.86 "fixes" the issue.
Yes, I also moved to an older version of the driver for now.
Upgrading to the latest CUDA toolkit can fix it. Driver Version: 555.42.02, CUDA Version: 12.5

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
Will you elaborate a bit on your setup (OS, which repos the NVIDIA drivers were installed from)? I have exactly the same CUDA version as yours, but still no joy.
Ubuntu 22.04
I used the runfile to install the toolkit instead of apt. LocalAI v2.16.0
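For anyone wanting to reproduce the runfile route, a sketch of a toolkit-only install; the URL and filename match the CUDA 12.5.0 release shown in the nvcc output above, but double-check them against NVIDIA's download page:

# Install only the toolkit; --toolkit skips the bundled 555.42.02 driver,
# leaving the already-installed display driver in place.
wget https://developer.download.nvidia.com/compute/cuda/12.5.0/local_installers/cuda_12.5.0_555.42.02_linux.run
sudo sh cuda_12.5.0_555.42.02_linux.run --silent --toolkit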
I have the 555.42 driver and CUDA 12.5 running and working everywhere except for LocalAI.
Now, with all of that in place and a reboot of the system for good measure, I am getting this when running localai/localai:master-aio-gpu-nvidia-cuda-12:
I'm on Ubuntu 22.04 and everything works just fine. Versions I've got:
Also have the "GPU device found but no CUDA backend present" issue:
docker exec -it localai bash
root@localai:/build# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
root@localai:/build# nvidia-smi
Tue Jun 18 10:39:30 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67 Driver Version: 550.67 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX 6000 Ada Gene... Off | 00000000:55:00.0 Off | Off |
| 30% 36C P8 28W / 300W | 7024MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
For some reason the auto-detection does not invoke the following from the Makefile; I don't know where or how the auto-detection happens: https://github.com/mudler/LocalAI/blob/master/Makefile
To make it work temporarily I used:
Note: nvidia-smi on both the host and inside the container reports CUDA 12.5. I hope this helps someone find a way to auto-detect this and run the above as part of the normal rebuild process, or bake it into the prebuilt container.
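The commands themselves aren't shown above, but from the description the workaround amounts to rebuilding the CUDA llama.cpp backend inside the running container, along these lines. A sketch, where the target name backend-assets/grpc/llama-cpp-cuda is an assumption based on the linked Makefile:

docker exec -it local-ai bash
# then, inside the container:
cd /build
make BUILD_TYPE=cublas backend-assets/grpc/llama-cpp-cuda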
Thanks, @MCP-LTS, this made CUDA work for me inside the LocalAI container! Now we just need this fixed in the official images, since rebuilding took hours on my (actually pretty beefy) AI server.
Thank you @MCP-LTS, this also works with v2.17.1.
I found another GitHub issue, for Ollama, that seems to be related: ollama/ollama#4563 (comment). It seems the new NVIDIA driver doesn't load the necessary kernel module in Linux. I have not tested this with LocalAI on my Linux deployments yet. I also run LocalAI on Windows in WSL2 with Docker Desktop and was having the same issue. The same thread mentions updates to Docker Desktop. I updated Docker Desktop to 4.31.1 (https://docs.docker.com/desktop/release-notes/) and it finally works with the latest drivers (555.99). So... for anyone out there running this in WSL2, try updating Docker Desktop.
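On WSL2 it may also be worth updating the WSL kernel itself with `wsl --update` from Windows, since the GPU paravirtualization support ships with it; that is a general suggestion, not something verified in this thread.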
@MCP-LTS I followed your workaround and the build seemed to succeed. However, when I try to chat with a model I get the following error:
Running LocalAI on a K8S node. On the node:
In the container:
Any suggestions?
I can't successfully build llama-cuda inside the container. I would prefer it to be delivered precompiled within the images. When will this be fixed?
To avoid repeated rebuilds, I use this Dockerfile to build a new image, and it works.
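That Dockerfile isn't reproduced here; a hedged sketch of the approach, with an assumed base tag and the same assumed make target as above:

# Bake the CUDA backend build into a derived image, so the slow rebuild
# happens once at image build time instead of at every container start.
cat > Dockerfile <<'EOF'
FROM localai/localai:v2.18.1-cublas-cuda12-ffmpeg
RUN make -C /build BUILD_TYPE=cublas backend-assets/grpc/llama-cpp-cuda
EOF
docker build -t localai-cuda-prebuilt .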
I can confirm that this is still an issue with v2.18.1. Using @ER-EPR's solution for the time being.
I use a Docker container; when I add the GPU configuration in docker-compose.yaml, it works fine for me:

deploy:
  resources:
    reservations:
      devices:
        - capabilities: [gpu]
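A quick way to verify the reservation actually exposes the GPU to the service (the service name api is illustrative):

docker compose up -d
docker compose exec api nvidia-smi   # should print the usual GPU table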
EDIT: Sorry, this doesn't work! It doesn't complain, but it doesn't utilize the GPU at all. :/

Former (not working) workaround: I created a Dockerfile based on @MCP-LTS's tips, slightly different from @ER-EPR's, to use the latest versions. I had a clean installation of Ubuntu 24.04. First I purged all NVIDIA drivers:

Then installed from scratch:
Some Docker stuff:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Rootless mode:

nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
systemctl --user restart docker
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place

Also:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
sudo nvidia-ctk runtime configure --runtime=containerd
sudo systemctl restart containerd

Check the version:

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu20.04 nvidia-smi

I added this:

vi /etc/nvidia-container-runtime/config.toml
# change no-cgroups=true to no-cgroups=false

Then the CUDA toolkit installation.
NVIDIA drivers: first, try the open kernel module drivers:

Disable IOMMU:
LocalAI: get the GPU's ID:
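The listing command is elided above, but UUIDs of the GPU-… form used in the run command below are what nvidia-smi -L prints:

# List GPUs with their UUIDs, usable as NVIDIA_VISIBLE_DEVICES values.
nvidia-smi -L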
Build a custom image, as long as the CUDA detection/compilation does not work in the original image. Create a Dockerfile at ~/Documents/LocalAI/CUDA-not-found-workaround/Dockerfile:

In the same directory, build the image:
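The build command is also elided; given the image name in the run command below, it was presumably along the lines of:

docker build -t local-ai-cuda-hack-v2.18.1-cublas-cuda12-ffmpeg .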
Start with:

docker volume create localai-models
docker rm local-ai; docker run -p 8080:8080 --name local-ai -ti -v localai-models:/build/models -e NVIDIA_VISIBLE_DEVICES=GPU-233d81ca-903f-0195-63b2-798f5fb087eb --runtime=nvidia --memory=16g local-ai-cuda-hack-v2.18.1-cublas-cuda12-ffmpeg --context-size 1000 --threads 8

Check in the container:

Looks good as well:

I am not sure whether everything done here is necessary, but this finally made it work for me, at least for gpt-4. This is the startup log; there is no nvidia-smi output (before, I had nvidia-smi output there):
Having the same issue on the latest LocalAI version with Ubuntu 22.04.4 LTS, CUDA 12.4 and NVIDIA drivers v550. I also tried upgrading to CUDA 12.5 and drivers v555, but it still doesn't work.
Can someone help me test the image referenced in #2994 (comment)? I could only test with CUDA 12.2 and 12.4, and it seems to work perfectly fine; I lack a testbed for 12.5. Container image:
I have been having this problem since some older versions. I tested the #2994 PR image (changed the docker-compose image) and keep getting the same error. Log:
Getting the same problem. Host info:

Inside Docker:
You can ignore that message - it's a red herring: the container images do not ship the CUDA LocalAI alternative binary, but they are built with nvcc, so they already work with NVIDIA GPUs. I agree the message is misleading and should be suppressed in that case; that's where we have to improve logging. If you could paste the logs with
Looks like it is working; a speed of 41 t/s looks like GPU usage.
What do you mean, it's working? @mudler, it's not working here. Can you say why it is working, so I may be able to fix my setup?
Hey there,
I'm running:

docker run --rm -ti --gpus all -p 8080:8080 -e DEBUG=true -v $PWD/models:/models --name local-ai localai/localai:latest-aio-gpu-nvidia-cuda-12 --models-path /models --context-size 1000 --threads 14

LocalAI version: v2.15.0 (f69de3b)
Environment, CPU architecture, OS, and Version:
13th Gen Intel(R) Core(TM) i9-13900H 2.60 GHz, on Windows 11 with Docker for Windows.
Describe the bug
I get this debug message right before the model is loaded:

stderr ggml_cuda_init: failed to initialize CUDA: named symbol not found

which indicates to me that the models will not use GPU support. However, this worked just fine before.
After updating the graphics driver, the CUDA version changed too, from 12.4 to 12.5. It seems like the CUDA environment is no longer used by any LLM, even though the CUDA version is detected correctly when starting the LocalAI Docker container.
Instead of utilizing the GPU, the application falls back to running only on the CPU.
To Reproduce
Expected behavior
Utilizing the GPU.
Logs
Here are the full logs for the mistral-7b-instruct-v0.1.Q5_K_M.gguf model, but I tried several models that worked before; none utilize the GPU after installing the new graphics driver: localai.log
Additional context
Checking in Task Manager also shows that no GPU usage is taking place.