Phi-3 mini 128k produces gibberish if context >4k tokens #2185
Comments
With vLLM I got the same issue initially, but I was then able to figure out that it was due to the FP8 KV cache (see here). Does TGI do this by default? I didn't enable it knowingly.
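(For reference, vLLM's FP8 KV cache is opt-in via its `kv_cache_dtype` engine argument, so a minimal sketch to rule it out could look like the following; the model ID matches the one used in this issue, everything else is an assumption about the caller's setup.)

```python
# Minimal vLLM sketch: kv_cache_dtype="auto" keeps the KV cache in the model
# dtype; "fp8" would enable the quantized cache suspected above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
    kv_cache_dtype="auto",  # explicitly avoid the FP8 KV cache
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```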
Hi, did you manage to solve this with TGI? I am running into the same issue. I am currently running the latest release, 2.2.0: the Phi-3-128k support is back, but this issue persists.
I can't get the Phi-3-mini 128k model to deploy at all through Inference Endpoints. Is there a particular tagged version compatible with it? edit: Adding the environment variable
Hello, is there any fix or configuration for this issue?
System Info
GPU: RTX4090
Running 2.1.0 with Docker like this:
docker run -it --rm --gpus all --ipc=host -p 8080:80 \
  -v /home/jp/.cache/data:/data \
  ghcr.io/huggingface/text-generation-inference:2.1.0 \
  --model-id microsoft/Phi-3-mini-128k-instruct \
  --max-batch-prefill-tokens=8192 \
  --max-total-tokens=8192 \
  --max-input-tokens=8191 \
  --trust-remote-code \
  --revision bb5bf1e4001277a606e11debca0ef80323e5f824 \
  --sharded false
Reproduction
Running Phi-3 128k (the old revision, since the new one fails; see #2172), I get good results as long as the total context (input tokens + output tokens) stays below 4096.
As soon as input + output tokens exceed 4096, Phi-3 outputs pure gibberish, e.g.
,,..,,,,,,,,,,,,,,,,ß,,.s,ß,gen,gen,,,,s,,,,,,,,,,,,,,,,,,,,,,,,,,,o,,,,,,,,,,,,,,,,,,,,,,-hn,.,,,,,,,,,,und,,,,,,,,,,,,,,,,,,,,,,,s,,gen...,
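A repro sketch against the container started above, using TGI's `/generate` route (the filler prompt and token counts are rough illustrative assumptions):

```python
# Push total context (input + output) past 4096 tokens against the local TGI
# container and inspect the generated text for the gibberish described above.
import requests

prompt = "word " * 4200  # crude filler; roughly 4200 input tokens
resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": prompt, "parameters": {"max_new_tokens": 200}},
    timeout=600,
)
print(resp.json()["generated_text"])
```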
I think there has to be a bug in the rotary embedding implementation; see also #2060 and #2055.
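The 4096 boundary is suggestive: Phi-3-mini-128k uses a LongRoPE-style scaled rotary embedding whose per-dimension rescale factors switch from `short_factor` to `long_factor` once the sequence exceeds `original_max_position_embeddings` (4096 in its config.json), so a bug on either side of that switch would produce exactly this pattern. A simplified sketch of the selection logic (key names follow the model's config.json; the surrounding code is my assumption, not TGI's actual implementation):

```python
import torch

# Hypothetical sketch of the "su"/LongRoPE inverse-frequency selection.
def su_scaled_inv_freq(cfg: dict, seq_len: int) -> torch.Tensor:
    rope = cfg["rope_scaling"]
    # The rescale factors flip from short_factor to long_factor once the
    # sequence exceeds original_max_position_embeddings (4096 for this model),
    # which lines up with gibberish appearing exactly past 4k tokens.
    if seq_len > cfg["original_max_position_embeddings"]:
        factors = rope["long_factor"]
    else:
        factors = rope["short_factor"]
    ext = torch.tensor(factors, dtype=torch.float32)
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (ext * cfg["rope_theta"] ** exponents)
```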
Expected behavior
Inference works for longer contexts.