Vector search and embeddings API #58
Comments
Can the
The main thing we can add is a model-agnostic interface for creating embeddings.
Hey Samuel, is there any possibility that, instead of agent.tools, pydantic-ai could have proper API integration for vector DBs? That would make it much easier for users to apply the retriever pattern. cc: @stephen37
Is there a possibility of having a standard API for vector search, so that people can build adapters for specific vector DB implementations? Perhaps a reference implementation for one or two of the more popular FOSS vector DBs could be included in PydanticAI to begin with, with community contributions of others welcome.
Here is a draft suggestion for the Vector Store APIs. It could live in a separate GitHub and PyPI project [pydantic-ai-vectorstores], with the ABCs in pydantic-ai itself, so that implementers can extend the interface and build vector-store-specific integrations that meet the minimum threshold specified in the project docs (500k downloads or something similar).

```python
from abc import ABC, abstractmethod
from typing import Any


class Embeddings(ABC):
    """Used to generate embeddings for document chunks and query strings."""

    @abstractmethod
    async def vectorize_documents(self, document_chunks: list[str]) -> list[list[float]]:
        """Generates document embeddings for a list of chunks."""

    @abstractmethod
    async def vectorize_query(self, text: str) -> list[float]:
        """Generates an embedding for the query string."""

    @abstractmethod
    def vectorize_documents_sync(self, document_chunks: list[str]) -> list[list[float]]:
        """Synchronous version of vectorize_documents()."""

    @abstractmethod
    def vectorize_query_sync(self, text: str) -> list[float]:
        """Synchronous version of vectorize_query()."""


class Document:
    """Represents a document or record added to a vector store."""

    id: str
    content: str
    meta_data_fields: dict[str, Any]


class VectorStore(ABC):
    """Base class for vector store implementations."""

    embeddings: Embeddings

    @abstractmethod
    async def add_documents(self, documents: list[Document], **kwargs: Any) -> list[str]:
        """Adds a list of documents to the vector store and returns their unique identifiers."""

    @abstractmethod
    async def add_document_chunks(self, documents: list[str], **kwargs: Any) -> list[str]:
        """Can use VectorStore.add_documents() to prepare records for vector store insertion."""

    @abstractmethod
    async def delete_documents(self, document_ids: list[str]) -> None:
        """Deletes the specified documents by their record identifiers."""

    @abstractmethod
    async def search(self, query: str, search_type: str, **kwargs: Any) -> list[Document]:
        """Implementers can define a list of valid search types in subclasses. Can use VectorStore.search_with_embeddings() for search."""

    @abstractmethod
    async def search_with_embeddings(self, query: list[float], search_type: str, **kwargs: Any) -> list[Document]:
        """Implementers can define a list of valid search types in subclasses."""
```
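To make the proposed shape concrete, here is a toy, in-memory sketch of how such a store could behave (not part of the proposal: the hash-based `_embed` function and the `InMemoryVectorStore` name are illustrative stand-ins for a real embedding model and store, and it returns plain strings rather than Document objects for brevity):

```python
import asyncio
import hashlib
import math
from typing import Any

DIM = 16  # toy embedding dimension


def _embed(text: str) -> list[float]:
    """Illustrative stand-in for a real embedding model: hash words into buckets."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryVectorStore:
    """Minimal in-memory analogue of the proposed VectorStore ABC."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[str, list[float]]] = {}

    async def add_document_chunks(self, documents: list[str], **kwargs: Any) -> list[str]:
        ids: list[str] = []
        for chunk in documents:
            doc_id = hashlib.md5(chunk.encode()).hexdigest()[:8]
            self._docs[doc_id] = (chunk, _embed(chunk))
            ids.append(doc_id)
        return ids

    async def search(self, query: str, search_type: str = "cosine", **kwargs: Any) -> list[str]:
        q = _embed(query)
        # Rank stored chunks by cosine similarity (vectors are pre-normalized).
        ranked = sorted(
            self._docs.values(),
            key=lambda item: -sum(a * b for a, b in zip(q, item[1])),
        )
        return [content for content, _ in ranked[: kwargs.get("k", 2)]]


async def main() -> None:
    store = InMemoryVectorStore()
    await store.add_document_chunks(
        [
            "vector databases store embeddings",
            "the weather is nice today",
        ]
    )
    print(await store.search("store embeddings vector databases", k=1))


if __name__ == "__main__":
    asyncio.run(main())
```

A real adapter would replace `_embed` with calls to an `Embeddings` implementation and the dict with a database client, but the control flow would be the same.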
@izzyacademy Great suggestion! I would like to expand on it with an additional idea. Some packages also support customized responses via a retrieval query, where a query fragment can be passed as a variable to the search function, enabling more tailored responses. For example, in the case of Neo4j, the underlying search query might look like this:

```python
read_query = (
    "CALL db.index.vector.queryNodes($index, $k, $embedding) "
    "YIELD node, score "
) + retrieval_query
```

Here, `retrieval_query` is:

```python
retrieval_query = """
RETURN "Name:" + node.name AS text, score, {foo:"bar"} AS metadata
"""
```

Building on this idea, I propose the following structure for the search functions:

```python
@abstractmethod
async def search(self, query: str, search_type: str, retrieval_query: str | None, **kwargs: Any) -> list[Document]:
    """Implementers can define a list of valid search types in subclasses. Can use VectorStore.search_with_embeddings() for search."""

@abstractmethod
async def search_with_embeddings(self, query: list[float], search_type: str, retrieval_query: str | None, **kwargs: Any) -> list[Document]:
    """Implementers can define a list of valid search types in subclasses."""
```

This approach allows for greater flexibility and customization in tailoring search results.
Currently we don't have anything, and the RAG example just uses OpenAI's plain API to generate embeddings.
It seems simple enough to add a dedicated API to models to generate embeddings; it wouldn't provide much on top of what the OpenAI SDK already offers, but it would help a lot with Gemini, where there's currently no interface for this in what we have.
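For reference, the plain OpenAI SDK embeddings call mentioned above looks roughly like this (a sketch assuming the v1 `openai` async client; the model name is illustrative, and `OPENAI_API_KEY` must be set when the function is actually called):

```python
async def embed_text(text: str) -> list[float]:
    # Lazy import so this sketch can be loaded without the SDK installed;
    # requires the `openai` package and OPENAI_API_KEY at call time.
    from openai import AsyncOpenAI

    client = AsyncOpenAI()
    response = await client.embeddings.create(
        input=text,
        model="text-embedding-3-small",  # illustrative model choice
    )
    return response.data[0].embedding
```

A model-agnostic API would wrap this behind an interface like the `Embeddings` ABC proposed above, with per-provider implementations.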
I suspect that the vector search part is harder to provide an API for: anything beyond toy examples will require full control of the database being searched, and we're not (yet) building an ORM.
Am I wrong or missing something?