update README
baptiste-pasquier committed Mar 13, 2024
1 parent e8b70df commit 5bd2cdb
Showing 1 changed file with 17 additions and 3 deletions.
…including text, images, and tables. It utilizes a retriever to store and manage…

![Multimodal RAG options diagram](https://blog.langchain.dev/content/images/size/w1600/2023/10/image-22.png)

- **Option 1**: This option involves retrieving the raw image directly from the dataset and combining it with the raw table and text data. The combined raw data is then processed by a Multimodal LLM to generate an answer. This approach uses the complete, unprocessed image data in conjunction with textual information.
  - Ingestion: Multimodal embeddings
  - RAG chain: Multimodal LLM

- **Option 2**: In this option, instead of using the raw image, an image summary is retrieved. This summary, along with the raw table and text data, is fed into a Text LLM to generate an answer.
  - Ingestion: Multimodal LLM (for summarization) + Text embeddings
  - RAG chain: Text LLM

- **Option 3**: This option also retrieves an image summary, but unlike Option 2, it passes the raw image to a Multimodal LLM for synthesis along with the raw table and text data.
  - Ingestion: Multimodal LLM (for summarization) + Text embeddings
  - RAG chain: Multimodal LLM

For all options, we can choose to treat tables as text or images.
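To make the contrast concrete, here is a toy sketch (hypothetical helper names, not the repository's API) of what each option embeds for retrieval, what it hands to the LLM, and which kind of LLM answers:

```python
# Toy sketch of the three options. The f-string "summary" stands in for a
# real multimodal-LLM call; these helpers are illustrative only.

def ingest(image, option):
    """Return what is embedded for retrieval and what is passed to the LLM."""
    if option == 1:
        # Option 1: multimodal embeddings over the raw image itself.
        return {"embed": image, "store": image}
    # Options 2 and 3: a multimodal LLM summarizes the image first,
    # and ordinary text embeddings index that summary.
    summary = f"summary of {image}"
    # Option 2 answers from the summary; option 3 hands back the raw image.
    return {"embed": summary, "store": summary if option == 2 else image}

def answer_llm(option):
    """Option 2 can use a plain text LLM; options 1 and 3 need a multimodal one."""
    return "text LLM" if option == 2 else "multimodal LLM"

for opt in (1, 2, 3):
    print(opt, ingest("chart.png", opt), answer_llm(opt))
```

The key trade-off this surfaces: option 2 is the cheapest at answer time (text LLM only), while options 1 and 3 keep the raw pixels available to the model that writes the answer.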

### RAG Option 1

Folder: [backend/rag_1](backend/rag_1)
### RAG Option 2

Folder: [backend/rag_2](backend/rag_2)
Method:

- Use a multimodal LLM (such as GPT-4V, LLaVA, or FUYU-8b) to produce text summaries from images.
- Embed and retrieve image summaries and text chunks.
- Pass image summaries and text chunks to a text LLM for answer synthesis.

Backend:

- Use the [multi-vector retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector)
with [Chroma](https://www.trychroma.com/) to store raw text (or tables) and images along with their summaries for retrieval.
- Use GPT-4V for image summarization.
- Use GPT-4 for final answer synthesis from joint review of image summaries and texts (or tables).
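The storage pattern behind the multi-vector retriever can be mocked in a few lines (an in-memory sketch standing in for Chroma and the document store, with a toy keyword match in place of vector similarity): search runs over the summaries, but retrieval returns the raw document.

```python
import uuid

# In-memory mock of the multi-vector idea: one store is searched (summaries),
# the other is returned (raw text, tables, or image bytes).

vectorstore = {}  # summary text -> doc_id (stands in for Chroma)
docstore = {}     # doc_id -> raw content (stands in for the doc store)

def add_document(raw, summary):
    doc_id = str(uuid.uuid4())
    vectorstore[summary] = doc_id
    docstore[doc_id] = raw
    return doc_id

def retrieve(query):
    # Toy similarity search: first summary mentioning any query word.
    for summary, doc_id in vectorstore.items():
        if any(word in summary for word in query.lower().split()):
            return docstore[doc_id]
    return None

add_document(raw="<raw image bytes>", summary="bar chart of 2023 revenue by region")
print(retrieve("revenue by region"))  # the raw image, not its summary
```

Swapping what `docstore` holds for a given summary is exactly what distinguishes Option 2 (store the summary itself) from Option 3 (store the raw image).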

Parameters:
