Description
This is a re-open of #14113
Name and Version
Affected: version at commit b7a1746
Not affected: version at commit c6a2c9e
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
I originally opened this bug against oobabooga/text-generation-webui: oobabooga/text-generation-webui#7060
The issue is that prompt prefixes were no longer being reused on subsequent requests.
I confirmed that --cache-reuse 1 was being passed on, so that wasn't the issue.
After reverting to the previous version of the WebUI (which ships an older version of llama.cpp), prompts started being cached again.
So this seems to point to a bug in llama.cpp.
First Bad Commit
It looks like a commit between c6a2c9e and b7a1746 broke --cache-reuse, or changed its behavior.
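In case it helps narrow this down, below is a rough sketch of how that range could be bisected locally (just a sketch; it assumes a llama.cpp checkout and that each build is checked with the two-request test shown further below):
git bisect start
git bisect bad b7a1746    # cache reuse broken here
git bisect good c6a2c9e   # cache reuse still worked here
# at each step: build llama-server, re-run the two-request check, then mark the
# commit with "git bisect good" or "git bisect bad" until git reports the first bad commit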
I use the oobabooga/text-generation-webui project, which bundles a snapshot of llama.cpp (the commits mentioned above), and, from what I saw, this is the command it runs:
llama-server --model user_data/models/gemma-3-12b-it-qat-UD-Q6_K_XL.gguf --ctx-size 32768 --gpu-layers 49 --batch-size 256 --port 60033 --no-webui --threads 10 --threads-batch 10 --rope-freq-scale 0.125 --rope-freq-base 1000000.0 --cache-reuse 1
The model can be downloaded from: https://huggingface.co/unsloth/gemma-3-12b-it-qat-GGUF/tree/main
I confirmed that --cache-reuse 1 is present in the parameters when the cache isn't working.
I don't know exactly how the Chat Completion call to llama-server is made, but essentially, if I make two chat-instruct calls whose first ~1000 characters are identical, the prompt is processed again from the very beginning on the affected commits mentioned above.
I'm not providing the cache_prompt parameter, so it should default to true.
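For reference, this is roughly how the behavior can be checked by hand against the running server (port 60033 from the command above). It is only a sketch: I'm using the native /completion endpoint because it returns a timings object, and field names such as timings.prompt_n are my understanding of that endpoint; the WebUI presumably goes through the Chat Completions API, but I'd expect the caching behavior to be the same.
PREFIX=$(printf 'A%.0s' {1..1200})   # ~1200 identical leading characters
curl -s http://127.0.0.1:60033/completion -H "Content-Type: application/json" -d "{\"prompt\": \"$PREFIX first question\", \"n_predict\": 16}" | jq '.timings.prompt_n'
curl -s http://127.0.0.1:60033/completion -H "Content-Type: application/json" -d "{\"prompt\": \"$PREFIX second question\", \"n_predict\": 16}" | jq '.timings.prompt_n'
# On a working build the second prompt_n should be much smaller than the first
# (only the non-matching suffix gets processed); on the affected commits both
# calls report roughly the full prompt length.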
I hope this helps!
This is still happening when using the version at https://github.com/ggml-org/llama.cpp/tree/90083283ec254fa8d33897746dea229aee401b37
It appears that the fix from #14163 did not resolve this caching issue.
Thank you.