vllm deploy with stop_token_ids #72
Comments
Same question here.
This is a vLLM issue: vllm-project/vllm#3016
Specify max_tokens; it defaults to 16, which is why the output gets truncated:
output = llm.generate(text, sampling_params=SamplingParams(max_tokens=512))
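A minimal sketch of that fix, assuming a chat model served locally with vLLM; the model path and chat-template usage are placeholders I added for illustration, only the max_tokens setting comes from the comment above:

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

MODEL_PATH = "path/to/your-chat-model"  # placeholder, substitute your model

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
llm = LLM(model=MODEL_PATH, trust_remote_code=True)

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "Who are you?"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# SamplingParams.max_tokens defaults to 16, which truncates the reply;
# raise it explicitly. stop_token_ids is optional here because generation
# also ends at the model's EOS token.
sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=512,
    # stop_token_ids=[tokenizer.eos_token_id],  # only for extra stop tokens
)

outputs = llm.generate([text], sampling_params)
print(outputs[0].outputs[0].text)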
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hello, a quick question: I've noticed that when deploying with vLLM, passing stop_token_ids in SamplingParams leads to incomplete answers.
For example:
Q: Who are you?
A: I am ***, I can answer
and then the generation just stops there.
Without stop_token_ids the output is normal.
Is this way of handling it appropriate, and why is it unnecessary to specify stop_token_ids?
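For context, a hedged guess at the kind of call this report describes (the model path and token id below are placeholders, not from this issue): if stop_token_ids is set but max_tokens is left at its default of 16, the reply is cut off after 16 tokens, which matches the truncated answer above.

from vllm import LLM, SamplingParams

llm = LLM(model="path/to/your-chat-model")  # placeholder model path

# Hypothetical repro: max_tokens is not set, so it stays at the default of 16
# and the answer is cut off mid-sentence regardless of stop_token_ids.
params = SamplingParams(stop_token_ids=[2])  # 2 is a placeholder EOS token id
outputs = llm.generate(["Who are you?"], params)
print(outputs[0].outputs[0].text)  # truncated after 16 tokens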