
[BUG] CLBlast generates garbage text on Q8_0 models #1525

Closed

CAHbKA-IV opened this issue May 19, 2023 · 5 comments
@CAHbKA-IV

The CLBlast build (device: AMD RX 6800 XT) generates garbage output for Q8_0 models:

main.exe --ctx_size 2048 --temp 0.74 --top_k 40 --top_p 0.5 --repeat_last_n 192 --repeat_penalty 1.4 --batch_size 256 --threads 24 --n_predict 2048 --color --interactive -ins --interactive-first -m VicUnlocked-30B-LoRA.ggml.q8_0.bin -s 1
main: build = 561 (5ea4339)
main: seed  = 1
llama.cpp: loading model from VicUnlocked-30B-LoRA.ggml.q8_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 135.75 KB
llama_model_load_internal: mem required  = 37206.11 MB (+ 3124.00 MB per state)

Initializing CLBlast (First Run)...
Attempting to use: Platform=0, Device=0 (If invalid, program will crash)
Using Platform: AMD Accelerated Parallel Processing Device: gfx1030
llama_init_from_file: kv self size  = 3120.00 MB

system_info: n_threads = 24 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: repeat_last_n = 192, repeat_penalty = 1.400000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.500000, typical_p = 1.000000, temp = 0.740000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 256, n_predict = 2048, n_keep = 2


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.


> My name is Alex. Your name is Lion. You are my personal AI assistant.
 Хронологија Awosiicherсти agesppe Mas Schmidtlichelackadalablo(@" Dynam Terminalairecompatchiaadre arrestilor CTommPRdaggerzilass Howard Sang PDF shadow SM >> Chal Byte Naval FAlaus changing hayoux ba bunchrokенrundUSEetch thrustREodortMR Spirit civ dig glob Tow agents
>

llama_print_timings:        load time =  9065.74 ms
llama_print_timings:      sample time =    22.36 ms /    60 runs   (    0.37 ms per token)
llama_print_timings: prompt eval time = 18715.84 ms /    39 tokens (  479.89 ms per token)
llama_print_timings:        eval time = 58273.65 ms /    60 runs   (  971.23 ms per token)
llama_print_timings:       total time = 113280.74 ms
Terminate batch job (Y/N)? y
@SlyEcho
Collaborator

SlyEcho commented May 19, 2023

A fix is in progress at #1435.

@SlyEcho
Collaborator

SlyEcho commented May 20, 2023

Can you try again?
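
To pick up the fix, pull the latest master and rebuild with CLBlast enabled. A minimal sketch, assuming a CMake build (the LLAMA_CLBLAST option matches the build flag documented in the repository around this time):

git pull
cmake -B build -DLLAMA_CLBLAST=ON
cmake --build build --config Release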

@CAHbKA-IV
Author

CAHbKA-IV commented May 21, 2023

Now it crashes while loading Q8_0 models, mostly silently, or shows tensor errors on some models:

main.exe --ctx_size 2048 --temp 0.74 --top_k 40 --top_p 0.5 --repeat_last_n 192 --repeat_penalty 1.4 --batch_size 256 --threads 24 --n_predict 2048 --color --interactive -ins --interactive-first -m VicUnlocked-30B-LoRA.ggml.q8_0.bin -s 1
main: build = 582 (7780e4f)
main: seed  = 1
llama.cpp: loading model from VicUnlocked-30B-LoRA.ggml.q8_0.bin
error loading model: llama.cpp: tensor '[garbled binary tensor name]' should not be 2563577093-dimensional
llama_init_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'VicUnlocked-30B-LoRA.ggml.q8_0.bin'
main: error: unable to load model

Silent crash example:

main.exe --ctx_size 2048 --temp 0.74 --top_k 40 --top_p 0.5 --repeat_last_n 192 --repeat_penalty 1.4 --batch_size 256 --threads 24 --n_predict 2048 --color --interactive -ins --interactive-first -m Manticore-13B.ggmlv2.q8_0.bin -s 1
main: build = 582 (7780e4f)
main: seed  = 1
llama.cpp: loading model from Manticore-13B.ggmlv2.q8_0.bin

It also reproduces with the AVX2 executable, so it looks like Q8_0 support is broken now.

@SlyEcho
Collaborator

SlyEcho commented May 21, 2023

The model file could be out of date; the quantization formats were changed just recently. You can try Q5_0 or Q5_1 if you have those.

Otherwise, #1459 will also change the OpenCL implementation a lot, so it may fix your issue.
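
If the file is out of date, re-quantizing from the original f16 GGML file regenerates it in the current format. A minimal sketch using the repository's quantize tool, assuming the f16 file is still available (file names here are illustrative):

quantize.exe VicUnlocked-30B-LoRA.ggml.f16.bin VicUnlocked-30B-LoRA.ggml.q5_1.bin q5_1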

@akx
Contributor

akx commented May 22, 2023

@CAHbKA-IV I got the same error; see #1508 & #1559...
