System Info

Attempting to reuse an existing OpenAI client to stream responses from an HF endpoint doesn't work due to a couple of differences. In my case the differences break the .NET client in the Azure AI SDK, though I suspect they affect other clients too.
Differences found:
- When streaming response tokens, OpenAI terminates the stream with a final `[DONE]` string, while HF simply stops sending tokens. Clients expecting `[DONE]` get stuck waiting either for another token or for the termination string.
- OpenAI supports `0.0 <= top_p <= 1.0`, while HF supports only `0.0 < top_p < 1.0`.
- When sending `top_p = 0` to the HF endpoint, the service replies `200 OK` with an error body `{"error":"Input validation error: top_p must be > 0.0 and < 1.0","error_type":"validation"}` and no final `[DONE]`. Given the success status code and the missing termination string, the error is parsed as data and the client hangs waiting for the next token.
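The hang can be reproduced without any SDK. A minimal, self-contained sketch of the termination difference (the streams below are simulated stand-ins, not captured endpoint output):

```python
# Sketch of the termination difference: an OpenAI-style stream ends with a
# "data: [DONE]" sentinel, while an HF-style stream just stops sending lines.

def read_stream(lines):
    """Collect SSE data payloads until the [DONE] sentinel or end of stream."""
    tokens = []
    saw_done = False
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            saw_done = True
            break  # explicit termination (OpenAI behaviour)
        tokens.append(payload)
    # An HF-style stream falls out of the loop only when the connection
    # closes; a client blocked waiting for the next line never gets here.
    return tokens, saw_done

openai_style = ['data: {"c":"1"}', 'data: {"c":"+1=2"}', 'data: [DONE]']
hf_style     = ['data: {"c":"1"}', 'data: {"c":"+1=2"}']  # no sentinel

print(read_stream(openai_style))  # two tokens, saw_done=True
print(read_stream(hf_style))      # two tokens, saw_done=False
```

Over a real connection the HF case is worse than `saw_done=False`: the iteration blocks on the next read instead of returning.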
Information

- Docker
- The CLI directly

Tasks

- An officially supported command
- My own modifications
Reproduction
Example 1: error with top_p = 0
Request:
```shell
curl -v -X POST https://api-inference.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer ${HF_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"content":"how much is 1+1","role":"system"}],
      "max_tokens":50,
      "temperature":0,
      "top_p":0.0,
      "presence_penalty":0,
      "frequency_penalty":0,
      "stream":true,
      "model":"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"}'
```
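Until the ranges are aligned, a caller that controls its own request building can clamp `top_p` into HF's open interval before sending. A hedged sketch (the epsilon margin is an arbitrary choice of mine, not anything either API documents):

```python
def clamp_top_p(top_p: float, eps: float = 1e-4) -> float:
    """Map an OpenAI-range top_p (0.0 <= p <= 1.0) into HF's open
    interval (0.0 < p < 1.0). eps is an arbitrary safety margin."""
    return min(max(top_p, eps), 1.0 - eps)

# The request above sends top_p = 0.0, which HF rejects; clamped:
print(clamp_top_p(0.0))  # 0.0001
print(clamp_top_p(1.0))  # 0.9999
print(clamp_top_p(0.5))  # unchanged
```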
Response (a `200 OK` with the validation error and no final `[DONE]`):

```
{"error":"Input validation error: top_p must be > 0.0 and < 1.0","error_type":"validation"}
```

OpenAI returns a response instead (see next example).
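Because the error arrives with a success status code inside the stream, a defensive client has to inspect each streamed payload for an error object before treating it as a token delta. A sketch using the error shape quoted above:

```python
import json

def classify_chunk(payload: str):
    """Return ('done' | 'error' | 'data', value) for one SSE data payload.
    Guards against HF's 200 OK + {"error": ...} in-stream failure mode."""
    if payload == "[DONE]":
        return ("done", None)
    obj = json.loads(payload)
    if "error" in obj:
        return ("error", obj["error"])
    return ("data", obj)

# The exact error body HF streams back for top_p = 0:
hf_error = ('{"error":"Input validation error: top_p must be > 0.0 and '
            '< 1.0","error_type":"validation"}')
print(classify_chunk(hf_error)[0])   # error
print(classify_chunk("[DONE]")[0])   # done
```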
Example 2: OpenAI response includes `[DONE]`
Request:
Response:
Example 3: HF response is missing `[DONE]`
Request:
Response:
Expected behavior
It would be great if it were possible to reuse OpenAI clients (and apps built on them) simply by pointing them at https://api-inference.huggingface.co.

While it's possible to work around the different `top_p` range by changing the code (if the app allows for it), the missing termination string makes these clients unusable as-is.
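Until HF emits the sentinel itself, one client-side workaround (a sketch of mine, not something any SDK provides) is to wrap the raw SSE line stream and append `data: [DONE]` when the upstream connection closes, so unmodified OpenAI clients terminate cleanly:

```python
def with_done_sentinel(upstream):
    """Yield SSE lines from `upstream` (any iterable of decoded SSE lines,
    hypothetical plumbing) and append the OpenAI-style terminator if the
    stream ends without one."""
    last = None
    for line in upstream:
        last = line
        yield line
    if last != "data: [DONE]":
        yield "data: [DONE]"  # injected terminator for HF-style streams

hf_lines = ['data: {"c":"2"}']  # HF stream: ends with no sentinel
print(list(with_done_sentinel(hf_lines))[-1])  # data: [DONE]
```

An OpenAI-style stream that already ends with the sentinel passes through unchanged.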