[FEATURE] Support predibase LLM serving a base model with optional fine-tuned adapter. #369

alexsherstinsky · 2024-05-17T18:21:48Z

Title: Support predibase LLM serving a base model with an optional fine-tuned adapter.

Brief Description of Changes

We add "predibase" as the provider using the OpenAI-compliant API that Predibase has.
Format for supplying a fine-tuned adapter is "<base_model>[:adapter_id]", where adapter_id format is
"<adapter_repository_reference/version_number" (version_number is required).

client = OpenAI(
    api_key=os.environ["PREDIBASE_API_TOKEN"],
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        provider="predibase",
    )
)

chat_complete = client.chat.completions.create(
    user=os.environ["PREDIBASE_TENANT_ID"],
    # model=os.environ["PREDIBASE_DEPLOYMENT"],
    model=f'{os.environ["PREDIBASE_DEPLOYMENT"]}:test-phi-3/4',
    stream=False,  # True is also supported.
    max_tokens=128,
    temperature=0.2,
    messages=[
        {
            "role": "user",
            "content": "How fast can a horse run?",
        },
    ],
)

if isinstance(chat_complete, Stream):
    completion_stream: Stream = chat_complete
    text: list[str] = []
    for message in completion_stream:
        print(message)
        delta_content: str | None = message.choices[0].delta.content
        if delta_content:
            text.append(delta_content)
    print("".join(text))
else:
    print(chat_complete.choices[0])
    print(chat_complete.choices[0].message)
    print(chat_complete.choices[0].message.content)


Supported body parameters:

  model,
  messages,
  max_tokens,
  temperature,
  top_p,
  stream,
  n,
  stop,

Description: (optional)

Detailed change 1
Detailed change 2

Motivation: (optional)

Predibase becomes an option as an LLM provider for the users of Gateway. Predibase is the most robust platform for LLM fine-tuning and serving for a wide variety of pre-trained models, including the open-source ones.

Related Issues: (optional)

#issue-number

… work). [WIP]

roh26it · 2024-05-20T07:42:11Z

Looks good to me and excited to merge this.

@VisargD will run some tests at our end on this and come back if there's any changes.

VisargD · 2024-05-20T08:39:02Z

Hey @alexsherstinsky - Thanks for the PR! In the Predibase docs, I can see that streaming (/generate_stream) is also supported. Are you planning to add it to this PR as well? Here is the doc that I am referring to: https://docs.predibase.com/user-guide/inference/rest_api#notes

If its not planned for this PR then I can merge this and raise a new one with streaming support for Predibase.

You can also use the /generate_stream endpoint to have the tokens be streamed from the deployment. The parameters also follow the same format as the [LoRAX generate endpoints](https://github.com/predibase/lorax/tree/main/clients/python).

alexsherstinsky · 2024-05-20T16:00:15Z

Hey @alexsherstinsky - Thanks for the PR! In the Predibase docs, I can see that streaming (/generate_stream) is also supported. Are you planning to add it to this PR as well? Here is the doc that I am referring to: https://docs.predibase.com/user-guide/inference/rest_api#notes

If its not planned for this PR then I can merge this and raise a new one with streaming support for Predibase.
You can also use the /generate_stream endpoint to have the tokens be streamed from the deployment. The parameters also follow the same format as the [LoRAX generate endpoints](https://github.com/predibase/lorax/tree/main/clients/python).

@VisargD Streaming is already supported! If you see in my example above, there is a stream flag -- I test it, and it works! Thank you!

…m_serving_with_fine_tuned_adapters-2024_04_18-0

VisargD · 2024-05-21T06:52:26Z

Gateway expects separate responseTransforms for stream and non-stream mode. In the index.ts of a provider, we define chatComplete and stream-chatComplete responseTransforms. Just like you defined PredibaseChatCompleteResponseTransform function, there should be one more function like PredibaseChatCompleteStreamChunkTransform which maps provider's stream chunk to Gateway compatible stream chunk. You can check other providers like perplexity-ai, mistral-ai to get an idea of this.

The reason why stream is working fine currently is because it is not able to find a stream chunk transform function and so it is doing a passthrough of all the chunks as it is. Even if predibase is sending OpenAI compatible chunks, its preferred to atleast add this function and map the chunk data as it is so that in future it does not break whenever predibase makes any change.

Please let me know if you need any help with this. I can provide more details if required.

VisargD · 2024-05-21T08:06:58Z

Here is what I am suggesting:

Add a new stream chunk transform function in chatComplete.ts:
Predibase does not send id in chunks. So you can use fallbackId which is passed to all transform function calls.

export const PredibaseChatCompleteStreamChunkTransform: (
  response: string,
  fallbackId: string,
) => string | string[] = (responseChunk, fallbackId) => {
  let chunk = responseChunk.trim();
  chunk = chunk.replace(/^data:/, '');
  chunk = chunk.trim();

  const parsedChunk: PredibaseChatCompletionStreamChunk = JSON.parse(chunk);

  return `data: ${JSON.stringify({
      id: fallbackId,
      object: parsedChunk.object,
      created: Math.floor(Date.now() / 1000),
      model: parsedChunk.model,
      provider: PREDIBASE,
      choices: [
        {
          delta: {
            role: parsedChunk.choices[0]?.delta?.role,
            content: parsedChunk.choices[0]?.delta?.content,
          },
          index: 0,
          finish_reason: parsedChunk.choices[0]?.finish_reason,
        },
      ],
    })}` + '\n\n';
};

Add stream-chatComplete responseTransform in predibase index.ts:

const PredibaseConfig: ProviderConfigs = {
  chatComplete: PredibaseChatCompleteConfig,
  api: PredibaseAPIConfig,
  responseTransforms: {
    chatComplete: PredibaseChatCompleteResponseTransform,
    'stream-chatComplete': PredibaseChatCompleteStreamChunkTransform,
  },
};

alexsherstinsky · 2024-05-21T15:45:39Z

Here is what I am suggesting:

Add a new stream chunk transform function in chatComplete.ts:
Predibase does not send id in chunks. So you can use fallbackId which is passed to all transform function calls.

export const PredibaseChatCompleteStreamChunkTransform: (
  response: string,
  fallbackId: string,
) => string | string[] = (responseChunk, fallbackId) => {
  let chunk = responseChunk.trim();
  chunk = chunk.replace(/^data:/, '');
  chunk = chunk.trim();

  const parsedChunk: PredibaseChatCompletionStreamChunk = JSON.parse(chunk);

  return `data: ${JSON.stringify({
      id: fallbackId,
      object: parsedChunk.object,
      created: Math.floor(Date.now() / 1000),
      model: parsedChunk.model,
      provider: PREDIBASE,
      choices: [
        {
          delta: {
            role: parsedChunk.choices[0]?.delta?.role,
            content: parsedChunk.choices[0]?.delta?.content,
          },
          index: 0,
          finish_reason: parsedChunk.choices[0]?.finish_reason,
        },
      ],
    })}` + '\n\n';
};

Add stream-chatComplete responseTransform in predibase index.ts:

const PredibaseConfig: ProviderConfigs = {
  chatComplete: PredibaseChatCompleteConfig,
  api: PredibaseAPIConfig,
  responseTransforms: {
    chatComplete: PredibaseChatCompleteResponseTransform,
    'stream-chatComplete': PredibaseChatCompleteStreamChunkTransform,
  },
};

@VisargD Thank you very much for this -- it was extremely helpful! I incorporated your suggestions and looked up how perplexity-ai does it as well. Thanks to your suggestion, I already found one error (one of my tests is failing, which is a good thing, because it is happening now, while we are still developing it!). I will ping you again once I have figured it out and made the fix. Thanks again!

alexsherstinsky · 2024-05-21T21:55:20Z

@VisargD Please re-review; I incorporated your suggestion and also added error handling. The error handling this way enables the client to see the actual error; otherwise, the error response does not work, because the HTTP response is 200 OK. Thank you.

VisargD · 2024-05-22T07:52:02Z

Thanks for the quick changes. Looks good to me! I will merge this PR and make it a part of the next gateway release.

VisargD · 2024-05-22T07:53:04Z

Closes #126

alexsherstinsky added 5 commits May 16, 2024 21:48

Predibase-PortKey GateWay integration -- initial commit (does not yet…

5f8dc83

… work). [WIP]

Fixing arguments.

bc32c3e

Fixing arguments.

1b06739

Clean up.

a38dad1

Clean up.

7fc525a

alexsherstinsky marked this pull request as ready for review May 17, 2024 18:25

alexsherstinsky changed the title ~~[FEATURE] Support predibase LLM serving a base model (without fine-tuned adapter for now).~~ [FEATURE] Support predibase LLM serving a base model (without a fine-tuned adapter for now). May 17, 2024

vrushankportkey requested review from sk-portkey and VisargD May 17, 2024 19:43

alexsherstinsky marked this pull request as draft May 17, 2024 20:23

alexsherstinsky added 3 commits May 17, 2024 15:23

Running "npx prettier src --write".

aaec69f

Extend accepted parameters and support fine-tuned adapters.

250a7c1

Linting.

73312c6

alexsherstinsky changed the title ~~[FEATURE] Support predibase LLM serving a base model (without a fine-tuned adapter for now).~~ [FEATURE] Support predibase LLM serving a base model with optional fine-tuned adapter. May 18, 2024

alexsherstinsky added 3 commits May 17, 2024 18:53

Extend accepted parameters and support fine-tuned adapters.

1e72f02

Linting.

fe60083

Improved comments.

c6aa64b

alexsherstinsky marked this pull request as ready for review May 18, 2024 15:21

Merge branch 'main' into feature/alexsherstinsky/support_predibase_ll…

3f335aa

…m_serving_with_fine_tuned_adapters-2024_04_18-0

Adding streaming transformation for Predibase provider.

7a722f1

VisargD approved these changes May 22, 2024

View reviewed changes

VisargD linked an issue May 22, 2024 that may be closed by this pull request

[Provider] Predibase #126

Closed

VisargD merged commit 079b463 into Portkey-AI:main May 22, 2024
1 check passed

alexsherstinsky deleted the feature/alexsherstinsky/support_predibase_llm_serving_with_fine_tuned_adapters-2024_04_18-0 branch May 22, 2024 14:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] Support predibase LLM serving a base model with optional fine-tuned adapter. #369

[FEATURE] Support predibase LLM serving a base model with optional fine-tuned adapter. #369

alexsherstinsky commented May 17, 2024 •

edited

Loading

roh26it commented May 20, 2024

VisargD commented May 20, 2024

alexsherstinsky commented May 20, 2024

VisargD commented May 21, 2024

VisargD commented May 21, 2024

alexsherstinsky commented May 21, 2024

alexsherstinsky commented May 21, 2024

VisargD commented May 22, 2024

VisargD commented May 22, 2024

[FEATURE] Support predibase LLM serving a base model with optional fine-tuned adapter. #369

[FEATURE] Support predibase LLM serving a base model with optional fine-tuned adapter. #369

Conversation

alexsherstinsky commented May 17, 2024 • edited Loading

roh26it commented May 20, 2024

VisargD commented May 20, 2024

alexsherstinsky commented May 20, 2024

VisargD commented May 21, 2024

VisargD commented May 21, 2024

alexsherstinsky commented May 21, 2024

alexsherstinsky commented May 21, 2024

VisargD commented May 22, 2024

VisargD commented May 22, 2024

alexsherstinsky commented May 17, 2024 •

edited

Loading