
Support batched generation for llama.cpp #102

Closed
the-crypt-keeper opened this issue Oct 11, 2023 · 2 comments
Labels: enhancement (New feature or request)

Comments

@the-crypt-keeper (Owner) commented Oct 11, 2023

Batch generation has landed: ggerganov/llama.cpp#3228

This should make our test suite ~10x faster on GGUF models.
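For context, the gain would come from decoding all of the interview prompts in one batch instead of looping over them one at a time, so each forward pass is amortized across every sequence. A minimal sketch of the idea, using llama-cpp-python for the sequential path we have today and a hypothetical `generate_batched` helper (made up for illustration, it does not exist) for the batched path:

```python
# Sketch only: `generate_batched` is a hypothetical stand-in for whatever
# eventually wraps llama.cpp's batch decoding; it is not an existing function.
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=2048)

prompts = [f"Write a Python function for test case {i}." for i in range(8)]

# Today: one completion at a time, so 8 prompts cost roughly 8 full generations.
sequential = [
    llm(p, max_tokens=256)["choices"][0]["text"]
    for p in prompts
]

# Goal: submit all prompts together so each decode step advances all 8
# sequences in a single forward pass (what ggerganov/llama.cpp#3228 enables).
# batched = generate_batched(llm, prompts, max_tokens=256)
```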

the-crypt-keeper added the enhancement label on Oct 11, 2023
@the-crypt-keeper (Owner, Author) commented Oct 13, 2023

It looks like the example parallel.cpp has three problems for our use case (a sketch of the interface we'd need instead follows below):

  1. The input template is hard-coded.

  2. The reverse prompt is hard-coded (maybe less of an issue).

  3. A system prompt is evaluated separately from the other prompts, and this cannot be disabled.
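A rough sketch, in Python with made-up names, of what a batched backend would need to accept for the test suite; nothing here is existing code, it just inverts the three points above:

```python
# Hypothetical interface sketch: prompts arrive already templated and must be
# passed verbatim, stop strings are per-request, and there is no shared system
# prompt. All names below are invented for illustration.
from dataclasses import dataclass, field


@dataclass
class BatchRequest:
    prompt: str                                     # already templated upstream
    stop: list[str] = field(default_factory=list)   # per-request reverse prompt(s)
    max_tokens: int = 512


def run_batch(requests: list[BatchRequest]) -> list[str]:
    """Decode all requests in one batch.

    Crucially, each sequence's context is exactly its own `prompt` and nothing
    more: no shared system prompt is evaluated up front.
    """
    raise NotImplementedError("placeholder: would call the llama.cpp batch API")
```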

the-crypt-keeper changed the title from "Support batched completions for llama.cpp" to "Support batched generation for llama.cpp" on Nov 3, 2023
@the-crypt-keeper (Owner, Author) commented

It turns out that the performance of llama.cpp batched generation on CUDA devices is pretty awful: the top speed-up is ~2x on an A6000. Not going to bother for so little gain.
