[WIP]: Fix #286 #317

goerch · 2023-06-28T12:51:36Z

Based on the fix for #292, here is an attempt to fix #286 (caveat: tested only on CPU). The patch also adds the llama.cpp tests for quantization.

One question I'm pondering is whether we should get rid of the *_reference functions or add some more.

ggerganov · 2023-07-02T16:13:59Z

Thanks! Let's rebase on the latest master so that the diff takes into account the new macros from #292.

Btw, there has recently been an update to ggerganov/llama.cpp#1237.
We should synchronize with @sw.

> One question I'm pondering is whether we should get rid of the *_reference functions or add some more.

We need them to have a deterministic way of generating models.
Add more in what sense?
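To illustrate why a scalar *_reference path gives deterministic output, here is a minimal sketch modeled on the q4_0 scheme: one scale per 32 values chosen so the largest-magnitude value maps to -8, and 4-bit quants stored with an offset of 8, two per byte. The names are hypothetical, and the float scale is a simplification (ggml stores it as FP16). Because there is a single plain-C code path with no SIMD branches, the same input should produce the same bytes on any machine, assuming strict IEEE float evaluation (no -ffast-math or FMA contraction).

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define QK4_0_SKETCH 32  /* values per block */

/* Hypothetical q4_0-style block; float scale instead of ggml's FP16 scale. */
typedef struct {
    float   d;                     /* per-block scale */
    uint8_t qs[QK4_0_SKETCH / 2];  /* 4-bit quants, two per byte */
} block_q4_0_sketch;

/* Scalar reference path: no SIMD branches, so the output depends only on the input. */
static void quantize_row_q4_0_reference_sketch(const float *x, block_q4_0_sketch *y, int k) {
    assert(k % QK4_0_SKETCH == 0);
    for (int i = 0; i < k / QK4_0_SKETCH; i++) {
        const float *b = x + i*QK4_0_SKETCH;
        float amax = 0.0f, max = 0.0f;  /* largest magnitude and its signed value */
        for (int j = 0; j < QK4_0_SKETCH; j++) {
            if (amax < fabsf(b[j])) { amax = fabsf(b[j]); max = b[j]; }
        }
        const float d  = max / -8.0f;  /* the extreme value lands on quant 0 (i.e. -8) */
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        y[i].d = d;
        for (int j = 0; j < QK4_0_SKETCH / 2; j++) {
            const float x0 = b[j] * id;
            const float x1 = b[j + QK4_0_SKETCH/2] * id;
            /* shift by 8 (+0.5 for rounding), truncate, clamp to 15 */
            int q0 = (int)(x0 + 8.5f);
            int q1 = (int)(x1 + 8.5f);
            if (q0 > 15) q0 = 15;
            if (q1 > 15) q1 = 15;
            /* first half of the block in low nibbles, second half in high nibbles */
            y[i].qs[j] = (uint8_t)(q0 | (q1 << 4));
        }
    }
}

static void dequantize_row_q4_0_sketch(const block_q4_0_sketch *x, float *y, int k) {
    for (int i = 0; i < k / QK4_0_SKETCH; i++) {
        for (int j = 0; j < QK4_0_SKETCH / 2; j++) {
            y[i*QK4_0_SKETCH + j]                  = (float)((x[i].qs[j] & 0x0F) - 8) * x[i].d;
            y[i*QK4_0_SKETCH + QK4_0_SKETCH/2 + j] = (float)((x[i].qs[j] >> 4)   - 8) * x[i].d;
        }
    }
}
```

For example, the row -16, -15, ..., 15 gives a scale of 2 and a first packed byte of 0x80 (low nibble 0 for -16, high nibble 8 for 0), and repeated runs produce identical bytes.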

goerch · 2023-07-02T17:12:35Z

> Thanks! Let's rebase on the latest master so that the diff takes into account the new macros from #292.

This should be done now.

> Btw, there has recently been an update to ggerganov/llama.cpp#1237. We should synchronize with @sw.

Will look into this.

> One question I'm pondering is whether we should get rid of the *_reference functions or add some more.

> We need them to have a deterministic way of generating models. Add more in what sense?

We have something like

        .quantize_row_q           = quantize_row_q4_0,
        .quantize_row_q_reference = quantize_row_q4_0_reference,

but the original patch included

        .quantize_row_q           = ggml_fp32_to_fp16_row,
        .quantize_row_q_reference = ggml_fp32_to_fp16_row,

We could define ggml_fp32_to_fp16_row_reference too. Alternatively, perhaps dropping the '.quantize_row_q_reference' member from the struct would work?
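The two options can be compared in a small sketch. It assumes a trimmed-down table holding only the two members quoted above, a hypothetical common row signature (const float *x, void *y, int k), and hypothetical names throughout (quantize_fns_sketch, fp32_to_fp16_row_sketch, reference_or_default); it is not ggml's actual code. Instead of deleting the member outright, the F16 entry leaves it NULL, and callers that want the deterministic path fall back to quantize_row_q. A software round-to-nearest-even FP32-to-FP16 converter stands in for ggml's own conversion so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common signature: quantize one row of k floats into y. */
typedef void (*quantize_row_fn)(const float *x, void *y, int k);

/* Hypothetical trimmed-down per-type table with the two members under discussion. */
typedef struct {
    quantize_row_fn quantize_row_q;
    quantize_row_fn quantize_row_q_reference;  /* NULL: no separate reference path */
} quantize_fns_sketch;

/* FP32 -> IEEE binary16 bits with round-to-nearest-even (software stand-in). */
static uint16_t fp32_to_fp16_bits(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    const uint32_t sign = (x >> 16) & 0x8000u;
    const uint32_t e32  = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x007FFFFFu;
    const int32_t e = (int32_t) e32 - 127 + 15;  /* rebias exponent for FP16 */

    if (e32 == 0xFFu)                            /* Inf or NaN (NaN stays quiet) */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));
    if (e >= 31)                                 /* too large: Inf */
        return (uint16_t)(sign | 0x7C00u);
    if (e <= 0) {                                /* FP16 subnormal or zero */
        if (e < -10) return (uint16_t) sign;
        mant |= 0x00800000u;                     /* restore implicit leading 1 */
        const uint32_t shift = (uint32_t)(14 - e);
        uint32_t h = mant >> shift;
        const uint32_t rem  = mant & ((1u << shift) - 1u);
        const uint32_t half = 1u << (shift - 1u);
        if (rem > half || (rem == half && (h & 1u))) h++;
        return (uint16_t)(sign | h);
    }
    uint32_t h = sign | ((uint32_t) e << 10) | (mant >> 13);
    const uint32_t rem = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) h++;  /* carry may bump the exponent */
    return (uint16_t) h;
}

/* Adapter giving the FP16 conversion the common row signature. */
static void fp32_to_fp16_row_sketch(const float *x, void *y, int k) {
    uint16_t *out = (uint16_t *) y;
    for (int i = 0; i < k; i++) out[i] = fp32_to_fp16_bits(x[i]);
}

enum { TYPE_F16_SKETCH, TYPE_COUNT_SKETCH };

static const quantize_fns_sketch fns_sketch[TYPE_COUNT_SKETCH] = {
    [TYPE_F16_SKETCH] = {
        .quantize_row_q           = fp32_to_fp16_row_sketch,
        .quantize_row_q_reference = NULL,  /* instead of repeating the same function */
    },
};

/* Code that needs deterministic output (e.g. writing model files) asks for the
   reference path and falls back to quantize_row_q when none is registered. */
static quantize_row_fn reference_or_default(int type) {
    assert(type >= 0 && type < TYPE_COUNT_SKETCH);
    const quantize_fns_sketch *f = &fns_sketch[type];
    return f->quantize_row_q_reference ? f->quantize_row_q_reference : f->quantize_row_q;
}
```

Setting both members to the same function, as the original patch did, also works with this helper; the NULL variant only avoids writing a separate *_reference function where there is nothing to distinguish.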

goerch added 3 commits June 26, 2023 20:26

5fec682 [WIP] Fix ggerganov#292

061365f Further code reduction

58ed3ad Fix ggerganov#286

goerch added 2 commits July 2, 2023 18:57

f863199 Merge branch 'master' into fix-ggerganov#286

c6cf633 Fix merge bug

goerch changed the title from "Fix #286" to "[WIP]: Fix #286" Jul 2, 2023

goerch mentioned this pull request Jul 2, 2023

Generalize quantize_fns for simpler FP16 handling ggerganov/llama.cpp#1237 (Merged)

goerch closed this Jul 5, 2023

ggerganov mentioned this pull request Jul 5, 2023

ggml : generalize quantize_fns for simpler FP16 handling #286 (Closed)

goerch deleted the fix-#286 branch September 20, 2023 07:23
