Refactor ggml.c for future tensor types #1001

sw · 2023-04-15T15:34:02Z

This makes two somewhat tedious changes to ggml.c that should help us try out other tensor types, as discussed recently in various issues/PRs:

define a separate QK for each type, in the expectation that there may be different block sizes (e.g. Q4 with FP16 + 16 quants as in #995, "Investigate the performance (speed and perplexity) of Q4_0 with 2x F16 factors", or Q2/Q3)
use a default case for all the compute functions
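
To illustrate the first change, here is a minimal sketch of what per-type block sizes could look like. It uses the QK4_0-style names that the review further down settles on. The struct layouts, the QK4_2 type and its value of 16, and the row_size_* helpers are assumptions made for illustration, not code taken from ggml.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// One block-size constant per quantization type instead of a single global QK.
#define QK4_0 32
#define QK4_2 16  // hypothetical future type with a smaller block (assumption)

// Assumed layout: one float scale followed by QK4_0 4-bit quants, two per byte.
typedef struct {
    float   d;
    uint8_t qs[QK4_0 / 2];
} block_q4_0;

// Assumed layout for the hypothetical type: scale stored as raw FP16 bits.
typedef struct {
    uint16_t d;
    uint8_t  qs[QK4_2 / 2];
} block_q4_2;

// Bytes needed to store a row of n elements; n must be a multiple of the block size.
static size_t row_size_q4_0(int n) {
    assert(n % QK4_0 == 0);
    return (size_t)(n / QK4_0) * sizeof(block_q4_0);
}

static size_t row_size_q4_2(int n) {
    assert(n % QK4_2 == 0);
    return (size_t)(n / QK4_2) * sizeof(block_q4_2);
}
```

Because each type carries its own constant, adding a type with a different block size does not touch the code paths of the existing types.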

The argument against the latter, made by @ggerganov in #951 (comment), was:

"It's easier to search for all the places that a type is used.
It does get annoying, so maybe we should reconsider. Overall, there is a lot of room for simplification and reduction of code duplication in ggml.c."

My arguments for the change are:

shortens the file by >200 lines
makes diffs less noisy when you add a new type
the assert now also catches invalid values caused by memory corruption or other bugs, instead of the function silently exiting
"all the places that a type is used" -> it's not a "usage" in any practical sense as it just causes an abort().
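
The difference between the two styles can be sketched as follows. This assumes the earlier code listed every unsupported type explicitly and had no default case. The enum and function names are hypothetical and only stand in for the real ggml ones.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical stand-in for the tensor type enum.
enum demo_type {
    DEMO_TYPE_Q4_0,
    DEMO_TYPE_Q4_1,
    DEMO_TYPE_F16,
    DEMO_TYPE_F32,
    DEMO_TYPE_COUNT
};

// Before: every unsupported type is spelled out, and there is no default case.
static int compute_before(enum demo_type t) {
    switch (t) {
        case DEMO_TYPE_F32:
            return 1; // the only supported type in this sketch
        case DEMO_TYPE_Q4_0:
        case DEMO_TYPE_Q4_1:
        case DEMO_TYPE_F16:
        case DEMO_TYPE_COUNT:
            assert(false);
            break;
    }
    // An out-of-range value (e.g. from memory corruption) matches no case
    // and reaches this point without tripping the assert.
    return 0;
}

// After: one default case covers unsupported and out-of-range values alike.
static int compute_after(enum demo_type t) {
    switch (t) {
        case DEMO_TYPE_F32:
            return 1;
        default:
            assert(false);
            break;
    }
    return 0;
}
```

With the default case, adding a new enum value changes only the functions that actually support it, which is what keeps the diffs small.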

I have not updated tests/test-quantize.c with QK, as that may be removed in pending #953.

slaren · 2023-04-15T15:46:47Z

I think this is a good change, but I am concerned that we won't be able to change QK without breaking backwards compatibility, and if we ever want to support a different value it will require much deeper changes anyway. I think this could be very easily solved with templates, but with C I am afraid that we may have to choose between converting everything into macro hell and accepting the runtime cost of a dynamic QK value. Or just never change QK at all.
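
The trade-off described here can be sketched in plain C. The example below is illustrative only: the macro DEFINE_BLOCK_AMAX and the block_amax_* functions are hypothetical names, and finding the largest absolute value in a block merely stands in for whatever per-block work a real kernel would do.

```c
#include <assert.h>
#include <stddef.h>

// Option 1: generate one function per block size with a macro.
// QK is a compile-time constant inside each generated function.
#define DEFINE_BLOCK_AMAX(NAME, QK)                              \
    static float NAME(const float * x, size_t block) {           \
        float amax = 0.0f;                                       \
        for (int i = 0; i < (QK); i++) {                         \
            const float v  = x[block * (QK) + i];                \
            const float av = v < 0.0f ? -v : v;                  \
            if (av > amax) amax = av;                            \
        }                                                        \
        return amax;                                             \
    }

DEFINE_BLOCK_AMAX(block_amax_32, 32)
DEFINE_BLOCK_AMAX(block_amax_128, 128)

// Option 2: a single function that takes the block size at runtime.
// Simpler to maintain, but the loop bound is no longer a constant.
static float block_amax_dyn(const float * x, size_t block, int qk) {
    assert(qk > 0);
    float amax = 0.0f;
    for (int i = 0; i < qk; i++) {
        const float v  = x[block * (size_t)qk + (size_t)i];
        const float av = v < 0.0f ? -v : v;
        if (av > amax) amax = av;
    }
    return amax;
}
```

Both variants compute the same result. The macro version lets the compiler unroll and vectorize for a known block size, at the price of harder-to-read code, which is the "macro hell" concern.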

sw · 2023-04-15T15:49:49Z

I don't think the idea was to change QK for an existing type, rather to add e.g. Q4_2 which will have its own QK42 != 32

slaren · 2023-04-15T15:56:46Z

Right, I was thinking of @qwopqwop200's implementation that showed some benefits from using a group size of 128 (if I understood that correctly). But as you say that is a completely different use case, my bad.

ggerganov

Wondering if QK4_0, QK4_1 and QK8_0 wouldn't be better - this way I can search for 4_1 for example and get all related functions, usages, etc.

sw · 2023-04-15T16:16:58Z

"Wondering if QK4_0, QK4_1 and QK8_0 wouldn't be better"

Sure, let's do that. That way we can also support >10 variants without confusion ;-) (at least in the numbering, the size of ggml.c would be another matter)

Refactor ggml.c for future tensor types

472145c

sw requested a review from ggerganov April 15, 2023 15:43

ggerganov approved these changes Apr 15, 2023

QK40 -> QK4_0 etc.

524b201

sw merged commit 0ad9646 into ggerganov:master Apr 15, 2023

sw deleted the ggml-refactor branch April 15, 2023 16:25

sw mentioned this pull request Apr 15, 2023

Clean up QK and file and tensor types #678

Closed
