
Different from the output of the HF inference #272

Closed
xcxhy opened this issue Jun 27, 2023 · 2 comments

Comments


xcxhy commented Jun 27, 2023

Looking forward to your reply! I set temperature=0.1, top_k=10, top_p=0.75. I assumed that running inference on the same prompt would give the same output, but after testing both, HF inference produces a stable output while vLLM sometimes produces a different one. Is this normal? The parameters for the two inference runs are identical, yet they do not yield the same results.
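
For reference, a minimal sketch of the comparison being described, assuming a placeholder LLaMA-13B checkpoint and prompt (neither is from this thread) and the sampling parameters quoted above:

```python
# Hypothetical reproduction of the HF-vs-vLLM comparison; model name and prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "huggyllama/llama-13b"  # placeholder checkpoint
prompt = "Explain the difference between top-k and top-p sampling."

# Hugging Face inference with sampling enabled
tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(hf_model.device)
hf_out = hf_model.generate(
    **inputs, do_sample=True, temperature=0.1, top_k=10, top_p=0.75, max_new_tokens=256
)
print(tokenizer.decode(hf_out[0], skip_special_tokens=True))

# vLLM inference with the same sampling parameters
llm = LLM(model=model_name)
params = SamplingParams(temperature=0.1, top_k=10, top_p=0.75, max_tokens=256)
print(llm.generate([prompt], params)[0].outputs[0].text)
```

Even with identical parameters, both runs draw random samples, so their outputs are not guaranteed to match.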


marscc commented Jun 27, 2023

I have encountered the same problem. Using the LLaMA 13B model with max_tokens=256, frequency_penalty=0.1, temperature=0.1, top_k=50, top_p=0.75, I tested a set of 40 questions and found that the outputs for 15 of them differed from the outputs obtained with Hugging Face inference.

@zhuohan123
Member

The LLM inference process includes sampling, which is a random process. Because the implementations of HF and vLLM are different, it is normal to get different samples. However, if you perform argmax sampling (e.g., temperature=0), then you should be able to see the same results.
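
A minimal sketch of the suggested argmax sampling, assuming a placeholder model and prompt; in vLLM, temperature=0 selects greedy decoding:

```python
# Hypothetical sketch of greedy (argmax) decoding; model name and prompt are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="huggyllama/llama-13b")
greedy = SamplingParams(temperature=0, max_tokens=256)  # temperature=0 -> argmax sampling
print(llm.generate(["Explain top-p sampling."], greedy)[0].outputs[0].text)

# The Hugging Face counterpart is greedy search, i.e. model.generate(..., do_sample=False),
# which should give the same results, as described in the comment above.
```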

@vllm-project vllm-project locked and limited conversation to collaborators Jun 27, 2023
@zhuohan123 zhuohan123 converted this issue into discussion #280 Jun 27, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
