CLBlast support for Mixtral #4451

Closed
paralin opened this issue Dec 13, 2023 · 13 comments
Labels
enhancement New feature or request stale

Comments

@paralin

paralin commented Dec 13, 2023

The Mixtral PR has been merged: #4428

Testing the latest commit 948ff13 with CLBlast and Mixtral, I see repeated assertion errors:

GGML_ASSERT: ggml.c:7765: ggml_are_same_shape(src0, src1)
GGML_ASSERT: ggml.c:7765: ggml_are_same_shape(src0, src1)
GGML_ASSERT: ggml.c:7765: ggml_are_same_shape(src0, src1)
GGML_ASSERT: ggml.c:7765: ggml_are_same_shape(src0, src1)
GGML_ASSERT: ggml.c:7765: ggml_are_same_shape(src0, src1)
GGML_ASSERT: ggml.c:7765: ggml_are_same_shape(src0, src1)
GGML_ASSERT: ggml.c:7765: ggml_are_same_shape(src0, src1)
GGML_ASSERT: ggml.c:7765: ggml_are_same_shape(src0, src1)

I assume that the CLBlast implementation has not yet been updated to support Mixtral, so filing this issue to request this improvement. Thanks!
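For readers unfamiliar with the assertion above: a same-shape check of this kind passes only when both operands have identical dimensions. The following is a minimal Python sketch of that idea, not the ggml implementation. It assumes tensors carry a four-element dimension tuple (named `ne` here), and the example shapes are hypothetical, not taken from Mixtral.

```python
# Illustrative sketch only: mimics what a "same shape" assertion checks.
# Assumption: each tensor is described by a 4-element dimension tuple `ne`.
GGML_MAX_DIMS = 4  # assumed number of dimensions per tensor


def are_same_shape(ne0, ne1):
    """Return True only if both dimension tuples match element by element."""
    if len(ne0) != GGML_MAX_DIMS or len(ne1) != GGML_MAX_DIMS:
        raise ValueError("expected 4-element dimension tuples")
    return all(a == b for a, b in zip(ne0, ne1))


# Hypothetical shapes: src1 has fewer rows than src0, so an operation that
# insists on identical shapes (instead of broadcasting) would assert here.
src0_ne = (4096, 8, 1, 1)
src1_ne = (4096, 1, 1, 1)

print(are_same_shape(src0_ne, src0_ne))
print(are_same_shape(src0_ne, src1_ne))
```

If a backend path only accepts identically shaped operands while the model graph relies on broadcasting, a check like this is where it would fail; whether that is the exact cause here is not confirmed in this thread.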

@paralin paralin added the enhancement New feature or request label Dec 13, 2023
@clemens98

I managed to launch it all right, but could not enter any input.

@paryska99

Yeah, the BLAS prompt processing doesn't seem to work as it should; hopefully we'll get a fix for that pretty soon.

@paralin
Author

paralin commented Dec 13, 2023

I discovered that all models are broken with CLBlast and used git bisect to find the bad commit; see #4453.

Fix: git revert 4d98d9a

@shibe2
Collaborator

shibe2 commented Dec 14, 2023

Is this still a problem after 55e87c3?

@kurnevsky
Contributor

Somehow it works, but performance is not that great. It makes inference just a bit faster:

CPU

llama_print_timings:        load time =    2068.49 ms
llama_print_timings:      sample time =       2.45 ms /    98 runs   (    0.03 ms per token, 39983.68 tokens per second)
llama_print_timings: prompt eval time =  215288.33 ms /    74 tokens ( 2909.30 ms per token,     0.34 tokens per second)
llama_print_timings:        eval time =   21885.91 ms /    97 runs   (  225.63 ms per token,     4.43 tokens per second)
llama_print_timings:       total time =  240440.55 ms

OpenCL

llama_print_timings:        load time =    6595.12 ms
llama_print_timings:      sample time =       8.04 ms /   342 runs   (    0.02 ms per token, 42547.90 tokens per second)
llama_print_timings: prompt eval time =  195997.10 ms /    74 tokens ( 2648.61 ms per token,     0.38 tokens per second)
llama_print_timings:        eval time =   63946.82 ms /   342 runs   (  186.98 ms per token,     5.35 tokens per second)
llama_print_timings:       total time =  356240.05 ms

(these numbers are for offloading only 14 layers)
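To put "just a bit faster" into numbers, here is a small Python sketch that computes the speedup from the per-token latencies in the two timing logs above. The values are copied from those logs; the helper name `speedup` is only for illustration.

```python
# Per-token latencies in milliseconds, copied from the timing logs above.
cpu_ms = {"prompt_eval": 2909.30, "eval": 225.63}
opencl_ms = {"prompt_eval": 2648.61, "eval": 186.98}


def speedup(baseline_ms, candidate_ms):
    """Ratio of baseline latency to candidate latency (>1 means faster)."""
    return baseline_ms / candidate_ms


for phase in cpu_ms:
    ratio = speedup(cpu_ms[phase], opencl_ms[phase])
    print(f"{phase}: {ratio:.2f}x")
```

With 14 layers offloaded, this works out to roughly 1.1x for prompt evaluation and 1.2x for token generation. The two runs generated different numbers of tokens (98 vs. 342), so per-token figures are the fair comparison, not total time.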

@ggerganov
Owner

Mixtral support with OpenCL requires some extra work

@shibe2
Collaborator

shibe2 commented Dec 14, 2023

It would be nice to have a full to-do list specifically for this issue. I see OpenCL-related remarks in ggml.c in ggml_compute_forward_mul_f32, ggml_compute_forward_out_prod_f32, and ggml_compute_forward_out_prod_q_f32. Which ones are needed for Mixtral?

@clemens98

clemens98 commented Dec 14, 2023

Input still does not work. The input bar blinks and stops blinking when I type something, but no text actually shows up, and pressing Enter does nothing.

Thanks to all the people working on this never-ending project. It feels like several models are released each day that break things or need support.

Weird: I was about to shut down my PC when I realized I hadn't closed the program, and it had actually worked. It was extremely context-focused until it started to ramble about pubs with itself (guess it got thirsty).
It answered most questions very well but couldn't handle context changes at all:
Asked for the biggest city, it gave a list of the biggest cities.
Asked for the biggest mountains, it gave a list of the biggest mountains.
But when I changed the subject with something like "who are you", it just kept rambling about cities (the old Mistral 7B handled that question better than expected, even finding the context of its own name).

It seems to completely fail to read and understand the Bob transcript (which worked for Mistral 7B).
Looks like getting the mountain right was just pure luck.

@paryska99

> Input still does not work. The input bar blinks and stops blinking when I type something, but no text actually shows up, and pressing Enter does nothing.
>
> Thanks to all the people working on this never-ending project. It feels like several models are released each day that break things or need support.
>
> Weird: I was about to shut down my PC when I realized I hadn't closed the program, and it had actually worked. It was extremely context-focused until it started to ramble about pubs with itself (guess it got thirsty). It answered most questions very well but couldn't handle context changes at all: asked for the biggest city, it gave a list of the biggest cities; asked for the biggest mountains, it gave a list of the biggest mountains. But when I changed the subject with something like "who are you", it just kept rambling about cities (the old Mistral 7B handled that question better than expected, even finding the context of its own name).
>
> It seems to completely fail to read and understand the Bob transcript (which worked for Mistral 7B). Looks like getting the mountain right was just pure luck.

Yeah, for me cuBLAS or any other BLAS backend gives basically the same evaluation speed as the CPU (maybe 1 tk/s faster), making Mixtral completely unusable.

@clemens98

Shouldn't this be marked as a bug, not an "enhancement"?

@clemens98

clemens98 commented Dec 21, 2023

Sadly, there is no news anywhere regarding Mixtral and CLBlast, and it isn't mentioned on the roadmap either.

Contributor

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Mar 18, 2024
Contributor

github-actions bot commented Apr 4, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 4, 2024
6 participants