[FEATURE] Support Predibase LLM serving a base model with an optional fine-tuned adapter. #369
Conversation
Looks good to me and excited to merge this. @VisargD will run some tests on our end and come back if there are any changes.
Hey @alexsherstinsky - Thanks for the PR! In the Predibase docs, I can see that streaming is supported. Is streaming planned for this PR? If it's not, then I can merge this and raise a new one with streaming support for Predibase.
@VisargD Streaming is already supported! If you look at my example above, there is a streaming request included.
Gateway expects separate responseTransforms for stream and non-stream mode. In this PR, only the non-stream transform is defined. The reason why streaming is currently working fine is that the gateway cannot find a stream chunk transform function, so it passes all the chunks through as-is. Even though Predibase is sending OpenAI-compatible chunks, it is preferable to at least add this function and map the chunk data explicitly, so that nothing breaks in the future if Predibase changes its format. Please let me know if you need any help with this. I can provide more details if required.
Here is what I am suggesting:
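The original snippet did not survive extraction; below is a minimal sketch of what such a stream-chunk transform might look like, modeled on other providers such as perplexity-ai. The interface shape and the function name are illustrative assumptions, not the gateway's exact signatures.

```typescript
// Hypothetical shape of a Predibase SSE chunk. Predibase emits
// OpenAI-compatible fields, but this interface is an assumption.
interface PredibaseStreamChunk {
  id: string;
  object: string;
  created: number;
  model: string;
  choices: {
    index: number;
    delta: { role?: string; content?: string };
    finish_reason: string | null;
  }[];
}

// Sketch of a stream-chunk transform: strip the SSE framing, parse the
// JSON, and re-emit an explicitly mapped chunk instead of passing the raw
// bytes through. The explicit mapping keeps the gateway stable if the
// upstream format drifts.
export const PredibaseChatCompleteStreamChunkTransform = (
  responseChunk: string
): string => {
  let chunk = responseChunk.trim();
  chunk = chunk.replace(/^data: /, '').trim();
  if (chunk === '[DONE]') {
    return `data: ${chunk}\n\n`;
  }
  const parsedChunk: PredibaseStreamChunk = JSON.parse(chunk);
  return (
    `data: ${JSON.stringify({
      id: parsedChunk.id,
      object: parsedChunk.object,
      created: parsedChunk.created,
      model: parsedChunk.model,
      provider: 'predibase',
      choices: parsedChunk.choices,
    })}` + '\n\n'
  );
};
```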
@VisargD Thank you very much for this -- it was extremely helpful! I incorporated your suggestions and looked up how perplexity-ai does it as well. Thanks to your suggestion, I already found one error (one of my tests is failing, which is a good thing, because it is happening now, while we are still developing it!). I will ping you again once I have figured it out and made the fix. Thanks again!
@VisargD Please re-review; I incorporated your suggestion and also added error handling. Handling errors this way enables the client to see the actual error; otherwise the error response is lost, because the HTTP status is 200 OK. Thank you.
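For concreteness, here is a hedged sketch of the kind of error handling described, assuming the upstream error arrives in the response body under an `error` key even when the status is 200; the function name and signature are illustrative, not the PR's exact code.

```typescript
// Sketch of surfacing an upstream Predibase error to the client even when
// the HTTP status is 200 OK. Inspect the body rather than trusting the
// status code alone.
export const PredibaseChatCompleteResponseTransform = (
  response: Record<string, any>,
  responseStatus: number
): Record<string, any> => {
  if ('error' in response) {
    return {
      error: {
        message:
          typeof response.error === 'string'
            ? response.error
            : JSON.stringify(response.error),
        type: null,
        code: responseStatus,
      },
      provider: 'predibase',
    };
  }
  return { ...response, provider: 'predibase' };
};
```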
Thanks for the quick changes. Looks good to me! I will merge this PR and make it a part of the next gateway release.
Closes #126
Title: Support Predibase LLM serving a base model with an optional fine-tuned adapter.
Adapters are specified as "<adapter_repository_reference/version_number>" (version_number is required).
Description: (optional)
Motivation: (optional)
Related Issues: (optional)