

Commit "CUDA: Quantized matrix matrix multiplication" causes assert "ggml-cuda.cu:4749: i01_high == rows_per_iter || g_device_count > 1" on Windows when vocab_size != 32000 #2484

Closed
dranger003 opened this issue Aug 1, 2023 · 4 comments



dranger003 commented Aug 1, 2023

Using CUDA on Windows with a model whose vocab_size != 32000, inference crashes immediately with:

ggml-cuda.cu:4749: i01_high == rows_per_iter || g_device_count > 1

See #2160 (comment) for more details.
Reverting to the commit before 11f3ca0 resolves the issue.
The workaround proposed in #2160 (comment) also appears to work (at least for me).


mirek190 commented Aug 2, 2023

I'm hitting the same problem.

My arguments (the model is a LLaMA 2 13B variant):

main --model models\new2\newhope.ggmlv3.q4_K_M.bin --mlock --color --threads 30 --keep -1 --batch_size 512 --n_predict -1 --top_k 10000 --top_p 0.9 --temp 0.96 --repeat_penalty 1.1 --ctx_size 4096 --interactive --instruct --reverse-prompt "### Human:" --reverse-prompt "### User:" --reverse-prompt "### Assistant:" -ngl 43

ggml-cuda.cu:4749: i01_high == rows_per_iter || g_device_count > 1
PS E:\LLAMA\llama.cpp>

Without the -ngl parameter it works.

dranger003 commented

It appears PR #2480 solves this issue.


mirek190 commented Aug 2, 2023

Still not merged...

dranger003 commented

Confirmed latest commit 4f6b60c resolves the issue on my end.


3 participants