When I compile with -DGGML_CUBLAS=ON, the gpt-neox example still runs only on the CPU.
Answered by
ggerganov
May 25, 2023
Answer selected by
ArturK-85
It's possible. You have to offload the tensors used for matrix multiplication to the GPU.
Something like this:
https://github.com/ggerganov/llama.cpp/blob/905d87b70aa189623d500a28602d7a3a755a4769/llama.cpp#L1030-L1056
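The linked llama.cpp code loops over the model's layers and marks the large weight matrices for GPU offload. A minimal sketch of that pattern, assuming the ggml CUDA API of that era — the `ggml_cuda_transform_tensor` helper, its single-argument signature, and the `offload_layer_weights` wrapper shown here are taken from or modeled on llama.cpp and may differ in the ggml gpt-neox example:

```c
// Sketch only: assumes ggml built with -DGGML_CUBLAS=ON and the
// llama.cpp-era CUDA helpers; names and signatures may differ in the
// ggml gpt-neox example.
#include "ggml.h"
#include "ggml-cuda.h"

// Hypothetical helper: move the matrix-multiplication weights of one
// layer to VRAM so ggml dispatches their mat-muls to the GPU.
static void offload_layer_weights(struct ggml_tensor ** mats, int n) {
    for (int i = 0; i < n; ++i) {
        // Copies the tensor's data to GPU memory and tags it so that
        // subsequent ggml_mul_mat calls involving it run on CUDA.
        ggml_cuda_transform_tensor(mats[i]);
    }
}
```

In llama.cpp this is driven by an `n_gpu_layers` parameter: only the first N layers' weight matrices are offloaded, which lets a model larger than VRAM still run partially on the GPU. The gpt-neox example would need an analogous loop over its own layer tensors.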