
CUDA Forward Compatibility on non supported HW #234

Closed · 4 tasks done
Pipboyguy opened this issue May 18, 2023 · 14 comments

Labels: bug (Something isn't working) · enhancement (New feature or request) · hardware (Hardware specific issue)

Comments

@Pipboyguy (Contributor)

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

There's no tagged CUDA image on ghcr, so after building the Dockerfile.cuda image, the following command should start the server and load the model:

docker run --gpus=all --rm -it -p 8000:8000 -v /home/***/models:/models -e MODEL=/models/GPT4-X-Alpasta-30b_q4_0.bin llama_cpp_server_cuda

Current Behavior

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

llama.cpp: loading model from /models/GPT4-X-Alpasta-30b_q4_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32016
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 135.75 KB
llama_model_load_internal: mem required  = 21695.61 MB (+ 3124.00 MB per state)
WARNING: failed to allocate 0.13 MB of pinned memory: forward compatibility was attempted on non supported HW
CUDA error 804 at ggml-cuda.cu:405: forward compatibility was attempted on non supported HW

@abetlen (Owner) commented May 18, 2023

@Pipboyguy a couple of things to check: what version of CUDA do you have installed on your base OS? Also, this may sound stupid, but most CUDA bugs are: have you tried rebooting since the most recent CUDA installation?
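
For reference, the host-side versions can be checked quickly (a minimal sketch; assumes nvidia-smi and nvcc are on the PATH):

$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
$ nvcc --version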

@Pipboyguy (Contributor, Author) commented May 18, 2023

llama_cpp.server works perfectly outside of Docker with a virtualenv, so I believe this is isolated to the nvidia-docker setup.

Here's some info about my host system:

Driver Version: 525.105.17
CUDA Version: 12.0
OS: Ubuntu 22.04

@Pipboyguy (Contributor, Author)

Please see PR #235; it eliminates the issue.

@Pipboyguy (Contributor, Author)

This only eliminates it for me, so it would be nice to get more testers.

@gjmulder (Contributor)

Exactly which NVIDIA GPU are you having issues with? 12.1 seems to support my ancient GTX 1080Ti (Pascal architecture). I have a 980Ti (Maxwell) somewhere, but I'd have to plug it in and hope it still works.

CUDA Compatibility Matrix
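
To see exactly what the driver reports for the card, something like this works (a sketch; the compute_cap query field assumes a reasonably recent nvidia-smi):

$ nvidia-smi --query-gpu=name,compute_cap --format=csv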

@Pipboyguy (Contributor, Author)

> Exactly which NVIDIA GPU are you having issues with? 12.1 seems to support my ancient GTX 1080Ti (Pascal architecture). I have a 980Ti (Maxwell) somewhere, but I'd have to plug it in and hope it still works.
>
> CUDA Compatibility Matrix

Running an RTX 4080. Outdated drivers, perhaps?

@gjmulder (Contributor)

FROM nvidia/cuda:12.1.1-devel-ubuntu20.04

is working with my 3090Ti.

My Ubuntu Docker host:

$ uname -a
Linux asushimu 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/issue
Ubuntu 22.04.2 LTS \n \l

$ dpkg -l | grep "^ii.*nvidia-driver"
ii  nvidia-driver-530                      530.30.02-0ubuntu1                      amd64        NVIDIA driver metapackage
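
A quick sanity check that the Docker runtime can reach the GPU at all (a sketch using the same NVIDIA base image; adjust the tag to whatever you build from):

$ docker run --rm --gpus=all nvidia/cuda:12.1.1-base-ubuntu20.04 nvidia-smi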

@Pipboyguy (Contributor, Author)

My Pop!_OS host:

$ uname -a
Linux workstation 6.2.6-76060206-generic #202303130630~1683753207~22.04~77c1465 SMP PREEMPT_DYNAMIC Wed M x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/issue
Pop!_OS 22.04 LTS

$ dpkg -l | grep "^ii.*nvidia-driver"
ii  nvidia-driver-525                       525.105.17-1pop0~1681323337~22.04~22e0810                                       amd64        NVIDIA driver metapackage

gjmulder added the bug, enhancement, and hardware labels on May 18, 2023
@gjmulder (Contributor)

Should be addressed in #258.

@Pipboyguy (Contributor, Author)

Shall we close this issue?

@d0rc commented Jun 7, 2023

Just got it on a 4090 with the latest master:

Commit: 2d7bf110edd8c49209401a16132052cba706ffd0
Built with: make LLAMA_CUBLAS=1

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
uname -a
Linux ddf7cfbdde40 5.15.0-73-generic #80~20.04.1-Ubuntu SMP Wed May 17 14:58:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
llama.cpp/main -m ./guanaco-65B.ggmlv3.q4_0.bin -p "hello!"
main: build = 631 (2d7bf11)
main: seed  = 1686104573
CUDA error 804 at ggml-cuda.cu:1039: forward compatibility was attempted on non supported HW

@gjmulder (Contributor) commented Jun 7, 2023

Are you running in a virtualized environment and trying to access your NVIDIA GPU?

If so, try updating your NVIDIA driver in your VM or Docker instance.
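
For context, error 804 typically means the forward-compatibility libraries bundled in the CUDA container got picked up on a GPU/driver combination that doesn't support them (forward compatibility is limited to data-center GPUs). One way to check whether the container ships them (a sketch; the exact package name depends on the image's CUDA version):

$ ls /usr/local/cuda/compat
$ dpkg -l | grep cuda-compat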

@Pipboyguy (Contributor, Author)

What worked for me was upgrading the NVIDIA driver on the host; after that, CUDA 12.1 should work. Also try CUDA 11.7 if upgrading the NVIDIA driver is a pain. Very likely the issue in your case as well, @d0rc.
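
On an Ubuntu-based host that usually amounts to installing a newer driver metapackage and rebooting (a sketch; nvidia-driver-530 matches the version reported earlier in this thread, but install whatever your distro offers):

$ sudo apt update
$ sudo apt install nvidia-driver-530
$ sudo reboot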

@gjmulder (Contributor) commented Jun 7, 2023

> What worked for me was upgrading the NVIDIA driver on the host; after that, CUDA 12.1 should work. Also try CUDA 11.7 if upgrading the NVIDIA driver is a pain. Very likely the issue in your case as well, @d0rc.

That makes more sense. I can see that an older driver in the VM works fine with a newer driver on the host, but not vice versa.
