embeddings: nomic embed vision #22482

Merged · 18 commits · Jun 5, 2024
497 changes: 497 additions & 0 deletions cookbook/nomic_multimodal_rag.ipynb

Large diffs are not rendered by default.

12 changes: 8 additions & 4 deletions docs/scripts/arxiv_references.py
@@ -515,7 +515,8 @@ def log_results(arxiv_id2type2key2urls):
 def generate_arxiv_references_page(file_name: Path, papers: list[ArxivPaper]) -> None:
     with open(file_name, "w") as f:
         # Write the table headers
-        f.write("""# arXiv
+        f.write(
+            """# arXiv
 
 LangChain implements the latest research in the field of Natural Language Processing.
 This page contains `arXiv` papers referenced in the LangChain Documentation, API Reference,
@@ -525,7 +526,8 @@ def generate_arxiv_references_page(file_name: Path, papers: list[ArxivPaper]) ->
 
 | arXiv id / Title | Authors | Published date 🔻 | LangChain Documentation|
 |------------------|---------|-------------------|------------------------|
-""")
+"""
+        )
     for paper in papers:
         refs = []
         if paper.referencing_doc2url:
@@ -595,7 +597,8 @@ def generate_arxiv_references_page(file_name: Path, papers: list[ArxivPaper]) ->
                     if el
                 ]
             )
-        f.write(f"""
+        f.write(
+            f"""
 ## {paper.title}
 
 - **arXiv id:** {paper.arxiv_id}
@@ -608,7 +611,8 @@ def generate_arxiv_references_page(file_name: Path, papers: list[ArxivPaper]) ->
 {refs}
 
 **Abstract:** {paper.abstract}
-""")
+"""
+        )
 
     logger.warning(f"Created the {file_name} file with {len(papers)} arXiv references.")
4 changes: 1 addition & 3 deletions libs/partners/nomic/langchain_nomic/__init__.py
@@ -1,5 +1,3 @@
 from langchain_nomic.embeddings import NomicEmbeddings
 
-__all__ = [
-    "NomicEmbeddings",
-]
+__all__ = ["NomicEmbeddings"]
11 changes: 11 additions & 0 deletions libs/partners/nomic/langchain_nomic/embeddings.py
@@ -22,6 +22,7 @@ def __init__(
         self,
         *,
         model: str,
+        nomic_api_key: Optional[str] = ...,
         dimensionality: Optional[int] = ...,
         inference_mode: Literal["remote"] = ...,
     ):
@@ -32,6 +33,7 @@ def __init__(
         self,
         *,
         model: str,
+        nomic_api_key: Optional[str] = ...,
         dimensionality: Optional[int] = ...,
         inference_mode: Literal["local", "dynamic"],
         device: Optional[str] = ...,
@@ -43,6 +45,7 @@ def __init__(
         self,
         *,
         model: str,
+        nomic_api_key: Optional[str] = ...,
         dimensionality: Optional[int] = ...,
         inference_mode: str,
         device: Optional[str] = ...,
@@ -57,6 +60,7 @@ def __init__(
         dimensionality: Optional[int] = None,
         inference_mode: str = "remote",
         device: Optional[str] = None,
+        vision_model: Optional[str] = None,
     ):
         """Initialize NomicEmbeddings model.
 
@@ -80,6 +84,7 @@ def __init__(
         self.dimensionality = dimensionality
         self.inference_mode = inference_mode
        self.device = device
+        self.vision_model = vision_model
 
     def embed(self, texts: List[str], *, task_type: str) -> List[List[float]]:
         """Embed texts.
@@ -121,3 +126,9 @@ def embed_query(self, text: str) -> List[float]:
             texts=[text],
             task_type="search_query",
         )[0]
+
+    def embed_image(self, uris: List[str]) -> List[List[float]]:
+        return embed.image(
+            images=uris,
+            model=self.vision_model,
+        )["embeddings"]
14 changes: 7 additions & 7 deletions libs/partners/nomic/poetry.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions libs/partners/nomic/pyproject.toml
@@ -14,6 +14,7 @@ license = "MIT"
 python = ">=3.8.1,<4.0"
 langchain-core = ">=0.1.46,<0.3"
 nomic = "^3.0.29"
+pillow = "^10.3.0"
 
 [tool.poetry.group.test]
 optional = true
21 changes: 11 additions & 10 deletions templates/rag-multi-modal-local/README.md
@@ -7,11 +7,11 @@ With the release of open source, multi-modal LLMs it's possible to build this ki
 
 This template demonstrates how to perform private visual search and question-answering over a collection of your photos.
 
-It uses OpenCLIP embeddings to embed all of the photos and stores them in Chroma.
+It uses [`nomic-embed-vision-v1`](https://huggingface.co/nomic-ai/nomic-embed-vision-v1) multi-modal embeddings to embed the images and `Ollama` for question-answering.
 
 Given a question, relevant photos are retrieved and passed to an open source multi-modal LLM of your choice for answer synthesis.
 
-![Diagram illustrating the visual search process with OpenCLIP embeddings and multi-modal LLM for question-answering, featuring example food pictures and a matcha soft serve answer trace.](https://github.com/langchain-ai/langchain/assets/122662504/da543b21-052c-4c43-939e-d4f882a45d75 "Visual Search Process Diagram")
+![Diagram illustrating the visual search process with nomic-embed-vision-v1 embeddings and multi-modal LLM for question-answering, featuring example food pictures and a matcha soft serve answer trace.](https://github.com/langchain-ai/langchain/assets/122662504/da543b21-052c-4c43-939e-d4f882a45d75 "Visual Search Process Diagram")
 
 ## Input
 
@@ -34,22 +34,23 @@ python ingest.py
 
 ## Storage
 
-This template will use [OpenCLIP](https://github.com/mlfoundations/open_clip) multi-modal embeddings to embed the images.
-
-You can select different embedding model options (see results [here](https://github.com/mlfoundations/open_clip/blob/main/docs/openclip_results.csv)).
+This template will use [nomic-embed-vision-v1](https://huggingface.co/nomic-ai/nomic-embed-vision-v1) multi-modal embeddings to embed the images.
 
 The first time you run the app, it will automatically download the multimodal embedding model.
 
-By default, LangChain will use an embedding model with moderate performance but lower memory requirements, `ViT-H-14`.
-
-You can choose alternative `OpenCLIPEmbeddings` models in `rag_chroma_multi_modal/ingest.py`:
+You can choose alternative models in `rag_chroma_multi_modal/ingest.py`, such as `OpenCLIPEmbeddings`:
 ```
+from langchain_experimental.open_clip import OpenCLIPEmbeddings
+
+embedding_function = OpenCLIPEmbeddings(
+    model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k"
+)
 
 vectorstore_mmembd = Chroma(
     collection_name="multi-modal-rag",
     persist_directory=str(re_vectorstore_path),
-    embedding_function=OpenCLIPEmbeddings(
-        model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k"
-    ),
+    embedding_function=embedding_function,
 )
 ```
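
For reference, the default configuration this template now ships with (a sketch consistent with `ingest.py` in this PR) swaps in the Nomic multi-modal embeddings:

```
from langchain_nomic import NomicMultimodalEmbeddings

# Default multi-modal setup: nomic-embed-vision-v1 for images and
# nomic-embed-text-v1 for queries (names taken from ingest.py in this PR).
embedding_function = NomicMultimodalEmbeddings(
    vision_model="nomic-embed-vision-v1", text_model="nomic-embed-text-v1"
)
```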

6 changes: 4 additions & 2 deletions templates/rag-multi-modal-local/ingest.py
@@ -2,7 +2,7 @@
 from pathlib import Path
 
 from langchain_community.vectorstores import Chroma
-from langchain_experimental.open_clip import OpenCLIPEmbeddings
+from langchain_nomic import NomicMultimodalEmbeddings
 
 # Load images
 img_dump_path = Path(__file__).parent / "docs/"
@@ -21,7 +21,9 @@
 
 # Load embedding function
 print("Loading embedding function")
-embedding = OpenCLIPEmbeddings(model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k")
+embedding = NomicMultimodalEmbeddings(
+    vision_model="nomic-embed-vision-v1", text_model="nomic-embed-text-v1"
+)
 
 # Create chroma
 vectorstore_mmembd = Chroma(
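
To make the ingestion flow concrete, here is a sketch of how the rest of `ingest.py` would plug the new embedding object into Chroma (the `add_images` call, paths, and file glob mirror the existing template and are not part of this hunk):

```
from pathlib import Path

from langchain_community.vectorstores import Chroma
from langchain_nomic import NomicMultimodalEmbeddings

embedding = NomicMultimodalEmbeddings(
    vision_model="nomic-embed-vision-v1", text_model="nomic-embed-text-v1"
)

# Chroma hands image URIs to the embedding object's image-embedding path.
vectorstore_mmembd = Chroma(
    collection_name="multi-modal-rag",
    persist_directory=str(Path(__file__).parent / "chroma_db_multi_modal"),
    embedding_function=embedding,
)

# Index every photo dumped into docs/ by the earlier steps (extension assumed).
image_uris = sorted(str(p) for p in (Path(__file__).parent / "docs/").glob("*.jpg"))
vectorstore_mmembd.add_images(uris=image_uris)
```
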
(file name not rendered in this view)

@@ -9,7 +9,7 @@
 from langchain_core.output_parsers import StrOutputParser
 from langchain_core.pydantic_v1 import BaseModel
 from langchain_core.runnables import RunnableLambda, RunnablePassthrough
-from langchain_experimental.open_clip import OpenCLIPEmbeddings
+from langchain_nomic import NomicMultimodalEmbeddings
 from PIL import Image
 
 
@@ -102,8 +102,8 @@ def multi_modal_rag_chain(retriever):
 vectorstore_mmembd = Chroma(
     collection_name="multi-modal-rag",
     persist_directory=str(Path(__file__).parent.parent / "chroma_db_multi_modal"),
-    embedding_function=OpenCLIPEmbeddings(
-        model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k"
+    embedding_function=NomicMultimodalEmbeddings(
+        vision_model="nomic-embed-vision-v1", text_model="nomic-embed-text-v1"
     ),
 )
 
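
Downstream of this change, the retriever and chain wiring looks roughly like the sketch below; everything outside the hunks above, including the question text, is assumed from the existing template rather than shown in this diff:

```
# Make the Nomic-embedded image store searchable by text queries.
retriever = vectorstore_mmembd.as_retriever()

# Assemble the multi-modal RAG chain defined earlier in this module.
chain = multi_modal_rag_chain(retriever)

# The question is embedded with the text model, matched against the
# vision-model image vectors, and the retrieved photos are passed to the
# local multi-modal LLM for answer synthesis.
answer = chain.invoke("What kind of soft serve did I eat?")
```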