update README
baptiste-pasquier committed Mar 13, 2024
1 parent e8b70df commit 5bd2cdb
Showing 1 changed file with 17 additions and 3 deletions.
…including text, images, and tables. It utilizes a retriever to store and manage…

![Multimodal RAG options diagram](https://blog.langchain.dev/content/images/size/w1600/2023/10/image-22.png)

- **Option 1**: This option involves retrieving the raw image directly from the dataset and combining it with the raw table and text data. The combined raw data is then processed by a Multimodal LLM to generate an answer. This approach uses the complete, unprocessed image data in conjunction with textual information.
  - Ingestion: Multimodal embeddings
  - RAG chain: Multimodal LLM

- **Option 2**: In this option, instead of using the raw image, an image summary is retrieved. This summary, along with the raw table and text data, is fed into a Text LLM to generate an answer.
  - Ingestion: Multimodal LLM (for summarization) + Text embeddings
  - RAG chain: Text LLM

- **Option 3**: This option also retrieves an image summary, but unlike Option 2, it passes the raw image to a Multimodal LLM for synthesis along with the raw table and text data.
  - Ingestion: Multimodal LLM (for summarization) + Text embeddings
  - RAG chain: Multimodal LLM

For all options, we can choose to treat tables as text or images.
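To make the contrast concrete, here is a toy sketch (hypothetical helper names, not the repository's API) of what each option embeds for retrieval, what it hands to the LLM, and which kind of LLM answers:

```python
# Toy sketch of the three options. The f-string "summary" stands in for a
# real multimodal-LLM call; these helpers are illustrative only.

def ingest(image, option):
    """Return what is embedded for retrieval and what is passed to the LLM."""
    if option == 1:
        # Option 1: multimodal embeddings over the raw image itself.
        return {"embed": image, "store": image}
    # Options 2 and 3: a multimodal LLM summarizes the image first,
    # and ordinary text embeddings index that summary.
    summary = f"summary of {image}"
    # Option 2 answers from the summary; option 3 hands back the raw image.
    return {"embed": summary, "store": summary if option == 2 else image}

def answer_llm(option):
    """Option 2 can use a plain text LLM; options 1 and 3 need a multimodal one."""
    return "text LLM" if option == 2 else "multimodal LLM"

for opt in (1, 2, 3):
    print(opt, ingest("chart.png", opt), answer_llm(opt))
```

The key trade-off this surfaces: option 2 is the cheapest at answer time (text LLM only), while options 1 and 3 keep the raw pixels available to the model that writes the answer.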

### RAG Option 1

Folder: [backend/rag_1](backend/rag_1)
### RAG Option 2

Folder: [backend/rag_2](backend/rag_2)
Method:

- Use a multimodal LLM (such as GPT-4V, LLaVA, or FUYU-8b) to produce text summaries from images.
- Embed and retrieve image summaries and text chunks.
- Pass image summaries and text chunks to a text LLM for answer synthesis.

Backend:

- Use the [multi-vector retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector)
with [Chroma](https://www.trychroma.com/) to store raw text (or tables) and images along with their summaries for retrieval.
- Use GPT-4V for image summarization.
- Use GPT-4 for final answer synthesis from joint review of image summaries and texts (or tables).
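The storage pattern behind the multi-vector retriever can be mocked in a few lines (an in-memory sketch standing in for Chroma and the document store, with a toy keyword match in place of vector similarity): search runs over the summaries, but retrieval returns the raw document.

```python
import uuid

# In-memory mock of the multi-vector idea: one store is searched (summaries),
# the other is returned (raw text, tables, or image bytes).

vectorstore = {}  # summary text -> doc_id (stands in for Chroma)
docstore = {}     # doc_id -> raw content (stands in for the doc store)

def add_document(raw, summary):
    doc_id = str(uuid.uuid4())
    vectorstore[summary] = doc_id
    docstore[doc_id] = raw
    return doc_id

def retrieve(query):
    # Toy similarity search: first summary mentioning any query word.
    for summary, doc_id in vectorstore.items():
        if any(word in summary for word in query.lower().split()):
            return docstore[doc_id]
    return None

add_document(raw="<raw image bytes>", summary="bar chart of 2023 revenue by region")
print(retrieve("revenue by region"))  # the raw image, not its summary
```

Swapping what `docstore` holds for a given summary is exactly what distinguishes Option 2 (store the summary itself) from Option 3 (store the raw image).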

Parameters:
