Looking forward to your reply! I set temperature=0.1, top_k=10, top_p=0.75. I expected that running inference on the same prompt with the same parameters would give the same output, but when I tested HF and vLLM inference, HF produced a stable output while vLLM sometimes produced different outputs. Is this normal? The parameters of the two inference runs are identical, yet they do not give the same results.
I have encountered the same problem. Using a LLaMA 13B model with max_tokens=256, frequency_penalty=0.1, temperature=0.1, top_k=50, top_p=0.75, I tested a set of 40 questions and found that the outputs for 15 of them differed from the outputs obtained with Hugging Face inference.
The LLM inference process includes sampling, which is a random process. Because the implementations of HF and vLLM differ, it is normal to get different samples. However, if you perform argmax (greedy) sampling, e.g., temperature=0, then you should see the same results.
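If you want to verify this yourself, here is a minimal sketch (not from this thread) of greedy decoding in both frameworks; the model name and prompt are placeholders. With temperature=0 in vLLM and do_sample=False in HF, both reduce to argmax decoding, so the outputs should match up to numerical differences.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "meta-llama/Llama-2-13b-hf"  # placeholder; use whichever checkpoint you are comparing
prompt = "Explain what top-p sampling does."  # placeholder prompt

# Hugging Face: do_sample=False performs greedy (argmax) decoding.
tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(hf_model.device)
hf_out = hf_model.generate(**inputs, do_sample=False, max_new_tokens=256)
print(tokenizer.decode(hf_out[0], skip_special_tokens=True))

# vLLM: temperature=0 also selects the argmax token at every step.
llm = LLM(model=model_name)
params = SamplingParams(temperature=0, max_tokens=256)
vllm_out = llm.generate([prompt], params)
print(vllm_out[0].outputs[0].text)
```

With a nonzero temperature (e.g., 0.1) and top_k/top_p filtering, both frameworks still draw from a distribution, so run-to-run differences between them are expected even with identical parameters.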