Skip to content

Commit

Permalink
VoyageEmbeddings (#12608)
Browse files Browse the repository at this point in the history
- **Description:** Integrate VoyageEmbeddings into LangChain, with tests
and docs
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Tag maintainer:** N/A
  - **Twitter handle:** @Voyage_AI_

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
  • Loading branch information
thomas0809 and baskaryan authored Oct 31, 2023
1 parent 92bf40a commit 1dbb77d
Show file tree
Hide file tree
Showing 4 changed files with 421 additions and 0 deletions.
228 changes: 228 additions & 0 deletions docs/docs/integrations/text_embedding/voyageai.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,228 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "278b6c63",
"metadata": {},
"source": [
"# Voyage AI\n",
"\n",
"Let's load the Voyage Embedding class."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0be1af71",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import VoyageEmbeddings"
]
},
{
"cell_type": "markdown",
"id": "137cfde9-b88c-409a-9394-a9e31a6bf30d",
"metadata": {},
"source": [
"Voyage AI utilizes API keys to monitor usage and manage permissions. To obtain your key, create an account on our [homepage](https://www.voyageai.com). Then, create a VoyageEmbeddings model with your API key."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2c66e5da",
"metadata": {},
"outputs": [],
"source": [
"embeddings = VoyageEmbeddings(voyage_api_key=\"[ Your Voyage API key ]\")"
]
},
{
"cell_type": "markdown",
"id": "459dffb3-9bff-41f2-8507-642de7431b2d",
"metadata": {},
"source": [
"Prepare the documents and use `embed_documents` to get their embeddings."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c85e948f-85fd-4d56-8d21-6e2f7e65cab8",
"metadata": {},
"outputs": [],
"source": [
"documents = [\n",
" \"Caching embeddings enables the storage or temporary caching of embeddings, eliminating the necessity to recompute them each time.\",\n",
" \"An LLMChain is a chain that composes basic LLM functionality. It consists of a PromptTemplate and a language model (either an LLM or chat model). It formats the prompt template using the input key values provided (and also memory key values, if available), passes the formatted string to LLM and returns the LLM output.\",\n",
" \"A Runnable represents a generic unit of work that can be invoked, batched, streamed, and/or transformed.\",\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5a77a12d-6ac6-4ab8-b103-80ff24487019",
"metadata": {},
"outputs": [],
"source": [
"documents_embds = embeddings.embed_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "2c89167c-816c-487e-8704-90908a4190bb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.0562174916267395,\n",
" 0.018221192061901093,\n",
" 0.0025736060924828053,\n",
" -0.009720131754875183,\n",
" 0.04108370840549469]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"documents_embds[0][:5]"
]
},
{
"cell_type": "markdown",
"id": "f8d796d1-4ced-44d3-81bf-282721edb6bb",
"metadata": {},
"source": [
"Similarly, use `embed_query` to embed the query."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "bfb6142c",
"metadata": {},
"outputs": [],
"source": [
"query = \"What's an LLMChain?\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "91bc875d-829b-4c3d-8e6f-fc2dda30a3bd",
"metadata": {},
"outputs": [],
"source": [
"query_embd = embeddings.embed_query(query)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a4b0d49e-0c73-44b6-aed5-5b426564e085",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[-0.0052348352037370205,\n",
" -0.040072452276945114,\n",
" 0.0033957737032324076,\n",
" 0.01763271726667881,\n",
" -0.019235141575336456]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_embd[:5]"
]
},
{
"cell_type": "markdown",
"id": "b16ddbb2-61f0-49ec-92c3-a6f236d9517f",
"metadata": {},
"source": [
"## A minimalist retrieval system"
]
},
{
"cell_type": "markdown",
"id": "5464cb0a-6967-4f1e-ac7c-0aab80b2795a",
"metadata": {},
"source": [
"The main feature of the embeddings is that the cosine similarity between two embeddings captures the semantic relatedness of the corresponding original passages. This allows us to use the embeddings to do semantic retrieval / search."
]
},
{
"cell_type": "markdown",
"id": "a0bd3ad2-ca68-4e75-9172-76aea28ba46e",
"metadata": {},
"source": [
" We can find a few closest embeddings in the documents embeddings based on the cosine similarity, and retrieve the corresponding document using the `KNNRetriever` class from LangChain."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0a3fc579-85a9-4bd0-a944-4e32ac62e2d4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"An LLMChain is a chain that composes basic LLM functionality. It consists of a PromptTemplate and a language model (either an LLM or chat model). It formats the prompt template using the input key values provided (and also memory key values, if available), passes the formatted string to LLM and returns the LLM output.\n"
]
}
],
"source": [
"from langchain.retrievers import KNNRetriever\n",
"\n",
"retriever = KNNRetriever.from_texts(documents, embeddings)\n",
"\n",
"# retrieve the most relevant documents\n",
"result = retriever.get_relevant_documents(query)\n",
"top1_retrieved_doc = result[0].page_content # return the top1 retrieved result\n",
"\n",
"print(top1_retrieved_doc)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
},
"vscode": {
"interpreter": {
"hash": "e971737741ff4ec9aff7dc6155a1060a59a8a6d52c757dbbe66bf8ee389494b1"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
2 changes: 2 additions & 0 deletions libs/langchain/langchain/embeddings/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@
from langchain.embeddings.spacy_embeddings import SpacyEmbeddings
from langchain.embeddings.tensorflow_hub import TensorflowHubEmbeddings
from langchain.embeddings.vertexai import VertexAIEmbeddings
from langchain.embeddings.voyageai import VoyageEmbeddings
from langchain.embeddings.xinference import XinferenceEmbeddings

logger = logging.getLogger(__name__)
Expand Down Expand Up @@ -115,6 +116,7 @@
"OllamaEmbeddings",
"QianfanEmbeddingsEndpoint",
"JohnSnowLabsEmbeddings",
"VoyageEmbeddings",
]


Expand Down
Loading

0 comments on commit 1dbb77d

Please sign in to comment.