ggml : use 8-bit precision for Q4_1 intermediate results (#1047)
* ggml : use 8-bit precision for Q4_1 intermediate results (ARM)

* ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmlaq_n_f32

56 ms/token with Q4_1 !

* ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051)

* gitignore : ignore ppl-*.txt files

---------

Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
ggerganov and slaren authored Apr 19, 2023
1 parent 7cd5c4a commit 884e7d7
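
For readers unfamiliar with what "8-bit precision for Q4_1 intermediate results" means, here is a minimal scalar sketch of a Q4_1 x Q8_0 dot product. It is not the code from this commit: the struct names, the `sketch_` prefix, the block size `QK = 32`, and the nibble order (low nibble holds the even element) are all assumptions made for illustration. The idea shown is that the integer products are accumulated per block, and the float scales and the Q4_1 offset are applied once per block rather than once per element.

```c
#include <stdint.h>

#define QK 32  /* assumed number of values per quantization block */

/* Hypothetical layouts, named differently from ggml's own structs on purpose. */
typedef struct {
    float   d;           /* scale */
    float   m;           /* offset (minimum) */
    uint8_t qs[QK / 2];  /* 4-bit quants, two per byte */
} sketch_block_q4_1;

typedef struct {
    float  d;            /* scale */
    int8_t qs[QK];       /* 8-bit quants */
} sketch_block_q8_0;

/* Dot product of n values: x decodes as d*q + m, y decodes as d*q.
 * Per block: sum((dx*qx + mx) * (dy*qy)) = dy * (dx*sum(qx*qy) + mx*sum(qy)),
 * so the inner loop needs only integer arithmetic. */
static float sketch_vec_dot_q4_1_q8_0(int n,
                                      const sketch_block_q4_1 *x,
                                      const sketch_block_q8_0 *y) {
    const int nb = n / QK;  /* n is assumed to be a multiple of QK */
    float sumf = 0.0f;

    for (int i = 0; i < nb; i++) {
        int32_t sum_xy = 0;  /* sum of qx*qy over the block */
        int32_t sum_y  = 0;  /* sum of qy over the block, needed for the offset term */

        for (int j = 0; j < QK / 2; j++) {
            const int x0 = x[i].qs[j] & 0x0F;  /* assumed: low nibble = even element */
            const int x1 = x[i].qs[j] >> 4;    /* assumed: high nibble = odd element */
            const int y0 = y[i].qs[2 * j + 0];
            const int y1 = y[i].qs[2 * j + 1];

            sum_xy += x0 * y0 + x1 * y1;
            sum_y  += y0 + y1;
        }

        sumf += y[i].d * (x[i].d * (float) sum_xy + x[i].m * (float) sum_y);
    }

    return sumf;
}
```

A SIMD version would vectorize the inner loop and fold the per-block scale into a multiply-accumulate, which is presumably where an intrinsic such as `vmlaq_n_f32` comes in on ARM.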
Showing 2 changed files with 192 additions and 194 deletions.
15 changes: 8 additions & 7 deletions .gitignore
@@ -1,11 +1,15 @@
*.o
*.a
.DS_Store
.build/
.cache/
.direnv/
.envrc
.swiftpm
.venv
.vs/
.vscode/
.DS_Store

.build/
build/
build-em/
build-debug/
@@ -30,12 +34,9 @@ models/*
arm_neon.h
compile_commands.json

.envrc
.direnv/

.venv
__pycache__
.swiftpm

zig-out/
zig-cache/

ppl-*.txt