Text / Audio / Image embeddings behind the POST `/embeddings` endpoint via a discriminated union and optional extra fields #394

michaelfeil · 2024-10-03T16:13:03Z

Feature request

#385
Last question: is there any way to support this through the OpenAI embeddings spec? `extra_body` is typically used to pass additional, non-spec fields so that the server can handle extra features. It would be nice to be able to reuse the OpenAI embeddings client for things like:

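from openai import OpenAI

client = OpenAI(base_url="...", api_key="...")  # server URL and key elided
client.embeddings.create(
    model="default",
    input=["https://...", "data:..."],  # image/audio URLs or data URIs
    extra_body={
        # non-spec field: one of "text" (default), "image", "audio"
        "infinity_extra_embeddings": "image",
    },
)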

I know it's not exactly what OpenAI supports, but it would be great to be able to use existing clients to solve this problem if possible. They have built-in retries, etc.

vLLM does the same for vLLM-specific features such as guided decoding.
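For the "optional extra fields" half of the request, a minimal server-side sketch with Pydantic v2 might look like the following. It assumes that the OpenAI Python client merges `extra_body` keys into the top level of the JSON body, so the server sees the extra field next to `model` and `input`. The class `EmbeddingRequest` and the field name `infinity_extra_embeddings` (the placeholder name from the request) are hypothetical, not infinity's actual API.

```python
from typing import List, Literal

from pydantic import BaseModel


class EmbeddingRequest(BaseModel):
    """Hypothetical server-side model for the POST /embeddings body."""

    # Fields from the OpenAI embeddings spec.
    model: str = "default"
    input: List[str]
    # Hypothetical non-spec field sent via `extra_body`; defaulting to "text"
    # keeps plain, spec-only OpenAI requests valid.
    infinity_extra_embeddings: Literal["text", "image", "audio"] = "text"


# A spec-only request and one carrying the extra field at the top level.
plain = EmbeddingRequest.model_validate({"model": "default", "input": ["hello"]})
extended = EmbeddingRequest.model_validate(
    {"model": "default", "input": ["data:..."], "infinity_extra_embeddings": "image"}
)
print(plain.infinity_extra_embeddings)     # text
print(extended.infinity_extra_embeddings)  # image
```

Because the field is a `Literal`, an unsupported value such as `"video"` is rejected with a validation error rather than being passed through to the model.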

Solution:

(Michael) The idea would be to implement a discriminated union in Pydantic.

If the request carries text, it validates against the text schema (`str`); if it carries images or audio, it validates against the image or audio schema, respectively.

See: https://docs.pydantic.dev/latest/concepts/unions/#discriminated-unions-with-str-discriminators

from typing import Annotated, Any, Literal, Union

from pydantic import BaseModel, Discriminator, RootModel, Tag


def get_discriminator_value(value: Any) -> str:
    # Raw input arrives as a dict, already-built input as a model instance;
    # fall back to "white" when no color is given.
    if isinstance(value, dict):
        return value.get("color", "white")
    return getattr(value, "color", "white")


class BlackCat(BaseModel):
    pet_type: Literal["cat"]
    color: Literal["black"] = "black"
    black_name: str


class WhiteCat(BaseModel):
    pet_type: Literal["cat"]
    color: Literal["white"] = "white"
    white_name: str


class Cat(RootModel):
    # The callable discriminator returns a tag that selects the union member.
    root: Annotated[
        Union[
            Annotated[BlackCat, Tag("black")],
            Annotated[WhiteCat, Tag("white")],
        ],
        Discriminator(get_discriminator_value),
    ]


Cat(pet_type="cat", color="black", black_name="g")  # root is a BlackCat
Cat(pet_type="cat", color="white", white_name="g")  # root is a WhiteCat
Cat(pet_type="cat", white_name="g")  # no color given, so root is a WhiteCat
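Applied to the embeddings request itself, the same pattern could route each request to a text, image, or audio schema based on the extra field. This is a sketch under assumptions: the class names, the `infinity_extra_embeddings` key, and the URL/data-URI check are hypothetical illustrations, not necessarily what was eventually merged. A missing key falls back to the text schema, so standard OpenAI requests are unaffected.

```python
from typing import Annotated, Any, List, Literal, Union

from pydantic import BaseModel, Discriminator, RootModel, Tag, field_validator


def get_modality(value: Any) -> str:
    # Raw JSON arrives as a dict; already-built models arrive as instances.
    # A missing key means "text", so plain OpenAI requests keep working.
    if isinstance(value, dict):
        return value.get("infinity_extra_embeddings", "text")
    return getattr(value, "infinity_extra_embeddings", "text")


class TextInput(BaseModel):
    infinity_extra_embeddings: Literal["text"] = "text"
    input: List[str]  # plain strings, as in the OpenAI spec


class MediaInput(BaseModel):
    input: List[str]

    @field_validator("input")
    @classmethod
    def must_be_url_or_data_uri(cls, items: List[str]) -> List[str]:
        # Hypothetical check: media must be referenced by URL or data URI.
        for item in items:
            if not item.startswith(("http://", "https://", "data:")):
                raise ValueError(f"expected a URL or data URI, got {item!r}")
        return items


class ImageInput(MediaInput):
    infinity_extra_embeddings: Literal["image"]


class AudioInput(MediaInput):
    infinity_extra_embeddings: Literal["audio"]


class EmbeddingInput(RootModel):
    # get_modality's return value picks the schema via the matching Tag.
    root: Annotated[
        Union[
            Annotated[TextInput, Tag("text")],
            Annotated[ImageInput, Tag("image")],
            Annotated[AudioInput, Tag("audio")],
        ],
        Discriminator(get_modality),
    ]


text_req = EmbeddingInput.model_validate({"input": ["hello world"]})
image_req = EmbeddingInput.model_validate(
    {"input": ["data:image/png;base64,iVBORw0KGgo="], "infinity_extra_embeddings": "image"}
)
print(type(text_req.root).__name__)   # TextInput
print(type(image_req.root).__name__)  # ImageInput
```

An unknown value such as `"video"` fails with a union tag error instead of silently falling back to text, and plain strings sent with `"image"` are rejected by the media schema.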

Motivation

Ease of use with the OpenAI client.

Your contribution

Looking for contributors.

michaelfeil changed the title from ~~Audio / Image embeddings via discriminated union.~~ to Text / Audio / Image embeddings behind the POST /embeddings via discriminated union and optional extra fields on Oct 3, 2024

michaelfeil added the `help wanted` label (Extra attention is needed) on Oct 3, 2024

michaelfeil mentioned this issue on Oct 4, 2024 in #395, "Embed openai broad multimodal compat" (merged)

michaelfeil closed this as completed in #395 on Oct 5, 2024
