
[BUG] CLBlast generates garbage text on Q8_0 models #1525

Closed

CAHbKA-IV opened this issue May 19, 2023 · 5 comments
@CAHbKA-IV

The CLBlast build (device: AMD RX 6800 XT) generates garbage output for Q8_0 models:

main.exe --ctx_size 2048 --temp 0.74 --top_k 40 --top_p 0.5 --repeat_last_n 192 --repeat_penalty 1.4 --batch_size 256 --threads 24 --n_predict 2048 --color --interactive -ins --interactive-first -m VicUnlocked-30B-LoRA.ggml.q8_0.bin -s 1
main: build = 561 (5ea4339)
main: seed  = 1
llama.cpp: loading model from VicUnlocked-30B-LoRA.ggml.q8_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 135.75 KB
llama_model_load_internal: mem required  = 37206.11 MB (+ 3124.00 MB per state)

Initializing CLBlast (First Run)...
Attempting to use: Platform=0, Device=0 (If invalid, program will crash)
Using Platform: AMD Accelerated Parallel Processing Device: gfx1030
llama_init_from_file: kv self size  = 3120.00 MB

system_info: n_threads = 24 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: repeat_last_n = 192, repeat_penalty = 1.400000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.500000, typical_p = 1.000000, temp = 0.740000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 256, n_predict = 2048, n_keep = 2


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.


> My name is Alex. Your name is Lion. You are my personal AI assistant.
 Хронологија Awosiicherсти agesppe Mas Schmidtlichelackadalablo(@" Dynam Terminalairecompatchiaadre arrestilor CTommPRdaggerzilass Howard Sang PDF shadow SM >> Chal Byte Naval FAlaus changing hayoux ba bunchrokенrundUSEetch thrustREodortMR Spirit civ dig glob Tow agents
>

llama_print_timings:        load time =  9065.74 ms
llama_print_timings:      sample time =    22.36 ms /    60 runs   (    0.37 ms per token)
llama_print_timings: prompt eval time = 18715.84 ms /    39 tokens (  479.89 ms per token)
llama_print_timings:        eval time = 58273.65 ms /    60 runs   (  971.23 ms per token)
llama_print_timings:       total time = 113280.74 ms
Terminate batch job (Y/N)? y
@SlyEcho
Collaborator

SlyEcho commented May 19, 2023

A fix is in progress at #1435.

@SlyEcho
Collaborator

SlyEcho commented May 20, 2023

Can you try again?
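
To pick up the fix, pull the latest master and rebuild with CLBlast enabled. A minimal sketch, assuming a CMake build (the LLAMA_CLBLAST option matches the build flag documented in the repository around this time):

git pull
cmake -B build -DLLAMA_CLBLAST=ON
cmake --build build --config Release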

@CAHbKA-IV
Author

CAHbKA-IV commented May 21, 2023

Now it crashes while loading Q8_0 models, mostly silently, or shows tensor errors on some models:

main.exe --ctx_size 2048 --temp 0.74 --top_k 40 --top_p 0.5 --repeat_last_n 192 --repeat_penalty 1.4 --batch_size 256 --threads 24 --n_predict 2048 --color --interactive -ins --interactive-first -m VicUnlocked-30B-LoRA.ggml.q8_0.bin -s 1
main: build = 582 (7780e4f)
main: seed  = 1
llama.cpp: loading model from VicUnlocked-30B-LoRA.ggml.q8_0.bin
error loading model: llama.cpp: tensor '[garbled binary tensor name]' should not be 2563577093-dimensional
llama_init_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'VicUnlocked-30B-LoRA.ggml.q8_0.bin'
main: error: unable to load model

Silent crash example:

main.exe --ctx_size 2048 --temp 0.74 --top_k 40 --top_p 0.5 --repeat_last_n 192 --repeat_penalty 1.4 --batch_size 256 --threads 24 --n_predict 2048 --color --interactive -ins --interactive-first -m Manticore-13B.ggmlv2.q8_0.bin -s 1
main: build = 582 (7780e4f)
main: seed  = 1
llama.cpp: loading model from Manticore-13B.ggmlv2.q8_0.bin

It also reproduces with the AVX2 executable, so it looks like Q8_0 support is broken now.

@SlyEcho
Collaborator

SlyEcho commented May 21, 2023

The model file could be out of date; the quantization formats were changed just recently. You can try Q5_0 or Q5_1 if you have those.

Otherwise, #1459 will also change the OpenCL implementation a lot, so it may fix your issue.
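
If the file is out of date, re-quantizing from the original f16 GGML file regenerates it in the current format. A minimal sketch using the repository's quantize tool, assuming the f16 file is still available (file names here are illustrative):

quantize.exe VicUnlocked-30B-LoRA.ggml.f16.bin VicUnlocked-30B-LoRA.ggml.q5_1.bin q5_1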

@akx
Contributor

akx commented May 22, 2023

@CAHbKA-IV I got the same error; see #1508 & #1559...
