llama.cpp: enable Kompute support for 10 more model architectures #2005

cebtenzzre · 2024-02-22T19:18:38Z

These are Baichuan, Bert and Nomic Bert, CodeShell, GPT-2, InternLM, MiniCPM, Orion, Qwen, and StarCoder. The output of each of these has been checked subjectively for accuracy in Q4_0 format.

The only architectures we still don't support are blocked by missing ops: GGML_OP_ALIBI for BLOOM, MPT, and Refact, and GGML_OP_CONCAT for Persimmon. Upstream support for PLaMo seems to be broken (ggerganov/llama.cpp#5669).
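As a rough illustration of the remaining gaps, the architecture-to-missing-op relationship described above can be written as a small lookup table. This is only a sketch: `kBlockedByMissingOp` and `missing_op_for` are hypothetical names, the lowercase architecture keys are illustrative, and none of this is taken from the actual llama.cpp or Kompute backend code.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical table (not llama.cpp's real data structure): architectures that
// the description lists as still unsupported on Kompute, mapped to the ggml op
// it says is missing. The lowercase keys are illustrative.
static const std::map<std::string, std::string> kBlockedByMissingOp = {
    {"bloom",     "GGML_OP_ALIBI"},
    {"mpt",       "GGML_OP_ALIBI"},
    {"refact",    "GGML_OP_ALIBI"},
    {"persimmon", "GGML_OP_CONCAT"},
};

// Returns the name of the missing op that blocks `arch`, or std::nullopt if
// the architecture is not listed in the table above.
std::optional<std::string> missing_op_for(const std::string & arch) {
    const auto it = kBlockedByMissingOp.find(arch);
    if (it == kBlockedByMissingOp.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

Under these assumptions, `missing_op_for("mpt")` yields `GGML_OP_ALIBI`, while an architecture enabled by this PR, such as `"qwen"`, yields no blocking op. PLaMo is deliberately left out of the table because its problem is reported as an upstream breakage rather than a missing op.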

llama.cpp: enable Kompute support for 10 more model architectures

213c565

These are Baichuan, Bert and Nomic Bert, CodeShell, GPT-2, InternLM, MiniCPM, Orion, Qwen, and StarCoder.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre requested a review from manyoso February 22, 2024 19:18

manyoso approved these changes Feb 22, 2024

cebtenzzre merged commit 88e330e into main Feb 22, 2024
6 of 17 checks passed

cebtenzzre mentioned this pull request Feb 22, 2024

Support for QWEN and Baichuan2 models #1731

Closed
