Text / Audio / Image embeddings behind the POST /embeddings via discriminated union and optional extra fields #394

Closed
michaelfeil opened this issue Oct 3, 2024 · 0 comments · Fixed by #395
Labels
help wanted Extra attention is needed

michaelfeil commented Oct 3, 2024

Feature request

Raised by @stikkireddy

#385
Last question: is there any way to support this from the OpenAI embeddings spec? extra_body is typically used to pass additional non-spec features for the server to handle. It would be nice to be able to reuse the OpenAI embeddings client for things like:

client = OpenAI(....)
client.embeddings.create(
    model="default",
    input=["https://...", "data:..."],
    extra_body={
        # proposed extra field: one of "audio", "image", or "text"; defaults to "text"
        "infinity_extra_embeddings": "image",
    },
)

I know it's not exactly what OpenAI supports, but it would be great to be able to use existing clients to solve this problem if possible. They have built-in retries, etc.

vLLM does this for vLLM-specific features such as guided decoding.
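To make the request above concrete, here is a minimal sketch of the JSON body the server would receive, assuming the client merges the keys of extra_body into the top level of the request body next to the spec fields. The helper build_request_body and the key name "modality" are hypothetical stand-ins for illustration, not part of any existing API, and the URL is a placeholder.

```python
import json
from typing import Any, Dict, List, Optional


def build_request_body(
    model: str,
    inputs: List[str],
    extra_body: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: mimic how spec fields and extra fields end up in one JSON body."""
    body: Dict[str, Any] = {"model": model, "input": inputs}
    # Assumption: extra keys are merged at the top level of the body.
    body.update(extra_body or {})
    return body


body = build_request_body(
    model="default",
    inputs=["https://example.com/cat.png"],  # placeholder URL
    extra_body={"modality": "image"},  # hypothetical key name
)
print(json.dumps(body, indent=2))
```

Under that assumption, the server only has to accept one optional extra field on the existing POST /embeddings route, and requests that omit it stay valid text requests.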

Solution:

(Michael) The idea would be to implement a discriminated union in pydantic.

If the request is for text, it validates against the text schema (str); if it is for image or audio, it validates against the corresponding image or audio schema. Requests without the extra field default to text.

See: https://docs.pydantic.dev/latest/concepts/unions/#discriminated-unions-with-str-discriminators

from typing import Literal, Union

from typing_extensions import Annotated

from pydantic import BaseModel, RootModel, Discriminator, Tag

def get_discriminator_value(model: dict) -> str:
    # Pick the union branch from the "color" key; default to "white" when it is missing.
    return model.get("color", "white")

class BlackCat(BaseModel):
    pet_type: Literal['cat']
    color: Literal['black'] = "black"
    black_name: str


class WhiteCat(BaseModel):
    pet_type: Literal['cat']
    color: Literal['white'] = "white"
    white_name: str

class Cat(RootModel):
    root:  Annotated[
        Union[
            Annotated[BlackCat, Tag('black')],
            Annotated[WhiteCat, Tag('white')],
        ],
        Discriminator(get_discriminator_value)
    ]

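# RootModel accepts the fields as keyword arguments and validates them against the union.
Cat(pet_type="cat", color="black", black_name="g")  # validated as BlackCat
Cat(pet_type="cat", color="white", white_name="g")  # validated as WhiteCat
Cat(pet_type="cat", white_name="g")  # no "color" key, so the discriminator falls back to WhiteCat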
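The same pattern could be mapped onto the embeddings request roughly as follows. This is only a sketch under stated assumptions: the class names, the `modality` field, and the helper `get_modality` are hypothetical and are not taken from the actual implementation. The callable discriminator falls back to "text" when the field is absent, so plain OpenAI-style requests keep working.

```python
from typing import Annotated, Any, List, Literal, Union

from pydantic import BaseModel, Discriminator, RootModel, Tag, ValidationError


class TextEmbeddingRequest(BaseModel):
    model: str = "default"
    input: List[str]  # plain text to embed
    modality: Literal["text"] = "text"  # hypothetical extra field


class ImageEmbeddingRequest(BaseModel):
    model: str = "default"
    input: List[str]  # URLs or data: URIs pointing at images
    modality: Literal["image"]


class AudioEmbeddingRequest(BaseModel):
    model: str = "default"
    input: List[str]  # URLs or data: URIs pointing at audio
    modality: Literal["audio"]


def get_modality(value: Any) -> str:
    # Requests without a "modality" key fall back to the text schema.
    if isinstance(value, dict):
        return value.get("modality", "text")
    return getattr(value, "modality", "text")


class EmbeddingRequest(RootModel):
    root: Annotated[
        Union[
            Annotated[TextEmbeddingRequest, Tag("text")],
            Annotated[ImageEmbeddingRequest, Tag("image")],
            Annotated[AudioEmbeddingRequest, Tag("audio")],
        ],
        Discriminator(get_modality),
    ]


# A spec-only request (no extra field) validates as text.
plain = EmbeddingRequest.model_validate({"model": "default", "input": ["hello"]})
# A request carrying the extra field is routed to the matching schema.
image = EmbeddingRequest.model_validate(
    {"input": ["https://example.com/cat.png"], "modality": "image"}
)
print(type(plain.root).__name__, type(image.root).__name__)
```

An unknown modality value matches no tag, so validation fails with a ValidationError instead of silently treating the input as text.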

Motivation

Ease of use with OpenAI client.

Your contribution

Looking for contributors.

michaelfeil changed the title from "Audio / Image embeddings via discriminated union." to "Text / Audio / Image embeddings behind the POST /embeddings via discriminated union and optional extra fields" on Oct 3, 2024
michaelfeil added the "help wanted" (Extra attention is needed) label on Oct 3, 2024