Looking forward to your reply! I set temperature=0.1, top_k=10, top_p=0.75. I expected that running inference on the same prompt with the same parameters would give the same output, but when I tested HF and vLLM inference, HF produced a stable output while vLLM sometimes produced different outputs. Is this normal? The parameters of the two inference runs are identical, yet they do not give the same results.
I have encountered the same problem. Using a LLaMA 13B model with max_tokens=256, frequency_penalty=0.1, temperature=0.1, top_k=50, top_p=0.75, I tested a set of 40 questions and found that the outputs for 15 of them differed from the outputs obtained with Hugging Face inference.
The LLM inference process includes sampling, which is a random process. Because the implementations of HF and vLLM differ, it is normal to get different samples. However, if you perform argmax (greedy) sampling, e.g., temperature=0, then you should see the same results.
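If you want to verify this yourself, here is a minimal sketch (not from this thread) of greedy decoding in both frameworks; the model name and prompt are placeholders. With temperature=0 in vLLM and do_sample=False in HF, both reduce to argmax decoding, so the outputs should match up to numerical differences.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "meta-llama/Llama-2-13b-hf"  # placeholder; use whichever checkpoint you are comparing
prompt = "Explain what top-p sampling does."  # placeholder prompt

# Hugging Face: do_sample=False performs greedy (argmax) decoding.
tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(hf_model.device)
hf_out = hf_model.generate(**inputs, do_sample=False, max_new_tokens=256)
print(tokenizer.decode(hf_out[0], skip_special_tokens=True))

# vLLM: temperature=0 also selects the argmax token at every step.
llm = LLM(model=model_name)
params = SamplingParams(temperature=0, max_tokens=256)
vllm_out = llm.generate([prompt], params)
print(vllm_out[0].outputs[0].text)
```

With a nonzero temperature (e.g., 0.1) and top_k/top_p filtering, both frameworks still draw from a distribution, so run-to-run differences between them are expected even with identical parameters.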