ggml : add Q8_0 quantization for intermediate results (#951)

* ggml : add Q8_0 quantization for intermediate results * quantize-stats : fix test + add it to Makefile default * Q8: use int8_t, AVX/AVX2 optimizations * ggml : fix quantize_row_q8_0() ARM_NEON rounding * minor : updates after rebase to latest master * quantize-stats : delete obsolete strings * ggml : fix q4_1 dot func --------- Co-authored-by: Stephan Walter <stephan@walter.name>
ggerganov · Apr 15, 2023 · e95b655 · e95b655
1 parent aa485ce
commit e95b655
Show file tree

Hide file tree

Showing 3 changed files with 442 additions and 18 deletions.
diff --git a/Makefile b/Makefile
@@ -133,7 +133,7 @@ $(info I CC:       $(CCV))
 $(info I CXX:      $(CXXV))
 $(info )
 
-default: main quantize perplexity embedding
+default: main quantize quantize-stats perplexity embedding
 
 #
 # Build library