Phi-3 mini 128k produces gibberish if context >4k tokens #2185
Comments
With vLLM I got the same issue initially, but I was then able to figure out that it was due to the FP8 KV cache (see here). Does TGI do this by default? I didn't enable it knowingly.
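(For reference, vLLM's FP8 KV cache is opt-in via its `kv_cache_dtype` engine argument, so a minimal sketch to rule it out could look like the following; the model ID matches the one used in this issue, everything else is an assumption about the caller's setup.)

```python
# Minimal vLLM sketch: kv_cache_dtype="auto" keeps the KV cache in the model
# dtype; "fp8" would enable the quantized cache suspected above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
    kv_cache_dtype="auto",  # explicitly avoid the FP8 KV cache
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```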
Hi, did you manage to solve this with TGI? I am running into the same issue. I am currently running the latest release, 2.2.0: the Phi-3-128k support is back, but this issue persists.
I can't get the Phi-3-mini 128k model to deploy at all through Inference Endpoints. Is there a particular tagged version compatible with it? edit: Adding the environment variable
Hello, is there any fix or configuration for this issue?
System Info
GPU: RTX4090
Running 2.1.0 with Docker like this:
docker run -it --rm --gpus all --ipc=host -p 8080:80 \
  -v /home/jp/.cache/data:/data \
  ghcr.io/huggingface/text-generation-inference:2.1.0 \
  --model-id microsoft/Phi-3-mini-128k-instruct \
  --max-batch-prefill-tokens=8192 \
  --max-total-tokens=8192 \
  --max-input-tokens=8191 \
  --trust-remote-code \
  --revision bb5bf1e4001277a606e11debca0ef80323e5f824 \
  --sharded false
Reproduction
Running Phi-3 128k (the old revision, since the new one fails; see #2172), I get good results as long as the total context (input tokens + output tokens) stays below 4096.
As soon as input + output tokens exceed 4096, Phi-3 outputs pure gibberish, e.g.
,,..,,,,,,,,,,,,,,,,ß,,.s,ß,gen,gen,,,,s,,,,,,,,,,,,,,,,,,,,,,,,,,,o,,,,,,,,,,,,,,,,,,,,,,-hn,.,,,,,,,,,,und,,,,,,,,,,,,,,,,,,,,,,,s,,gen...,
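A repro sketch against the container started above, using TGI's `/generate` route (the filler prompt and token counts are rough illustrative assumptions):

```python
# Push total context (input + output) past 4096 tokens against the local TGI
# container and inspect the generated text for the gibberish described above.
import requests

prompt = "word " * 4200  # crude filler; roughly 4200 input tokens
resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": prompt, "parameters": {"max_new_tokens": 200}},
    timeout=600,
)
print(resp.json()["generated_text"])
```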
I think there has to be a bug in the rotary embedding implementation; see also #2060 and #2055.
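The 4096 boundary is suggestive: Phi-3-mini-128k uses a LongRoPE-style scaled rotary embedding whose per-dimension rescale factors switch from `short_factor` to `long_factor` once the sequence exceeds `original_max_position_embeddings` (4096 in its config.json), so a bug on either side of that switch would produce exactly this pattern. A simplified sketch of the selection logic (key names follow the model's config.json; the surrounding code is my assumption, not TGI's actual implementation):

```python
import torch

# Hypothetical sketch of the "su"/LongRoPE inverse-frequency selection.
def su_scaled_inv_freq(cfg: dict, seq_len: int) -> torch.Tensor:
    rope = cfg["rope_scaling"]
    # The rescale factors flip from short_factor to long_factor once the
    # sequence exceeds original_max_position_embeddings (4096 for this model),
    # which lines up with gibberish appearing exactly past 4k tokens.
    if seq_len > cfg["original_max_position_embeddings"]:
        factors = rope["long_factor"]
    else:
        factors = rope["short_factor"]
    ext = torch.tensor(factors, dtype=torch.float32)
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (ext * cfg["rope_theta"] ** exponents)
```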
Expected behavior
Inference works for longer contexts.