The Feature
I would like to disable streaming on a per-model basis in my proxy YAML config, and have LiteLLM make a non-streaming request upstream while returning a fake-streaming response to the client.
Basically what was mentioned here: #61 (comment)
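Roughly what I'm imagining in the proxy config (the `fake_stream` flag name below is just a placeholder for whatever the setting ends up being called; everything else follows the existing `model_list` format):

```yaml
model_list:
  - model_name: gpt-4o              # model group exposed to clients
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      fake_stream: true             # placeholder flag: send a non-streaming request upstream,
                                    # then chunk the full response back to the client as if it were streamed
  - model_name: gpt-4o              # same model group, different provider
    litellm_params:
      model: azure/gpt-4o
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      # no flag here: the Azure deployment keeps streaming normally
```

With something like this, a client that sends stream=True would still get an event stream back from the proxy, but the upstream OpenAI call would be a single non-streaming request.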
Motivation, pitch
When streaming from OpenAI, their moderation filter makes responses extremely slow; I sometimes hit timeouts after more than 10 minutes.
I don't want to change my client-side code, since I still want other providers in the same model group (like Azure OpenAI) to stream, as the performance impact there isn't as bad.
Separately, MockGPT doesn't support streaming, but I want to use it in place of GPT-4o when load testing, and I can't control whether my client application sends stream=True to LiteLLM.
Twitter / LinkedIn details
https://www.linkedin.com/in/davidmanouchehri/