Add Multimodal RAG (llamaindex+NIMs) example to community projects #178
Conversation
@@ -45,7 +45,7 @@ Community examples are sample code and deployments for RAG pipelines that are no

* [NVIDIA Multimodal RAG Assistant](./multimodal_assistant)

- This example is able to ingest PDFs, PowerPoint slides, Word and other documents with complex data formats including text, images, slides and tables. It allows users to ask questions through a text interface and optionally with an image query, and it can respond with text and reference images, slides and tables in its response, along with source links and downloads.
+ This example is able to ingest PDFs, PowerPoint slides, Word and other documents with complex data formats including text, images, slides and tables, orchestrated with Langchain. It allows users to ask questions through a text interface and optionally with an image query, and it can respond with text and reference images, slides and tables in its response, along with source links and downloads. Refer to this [example](./multimodal-rag) for the LlamaIndex version of the Multimodal RAG Assistant, which uses the [integration](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_nim/) with NVIDIA Inference Microservices (NIMs).
Review comment: NIMs -> NIM microservices

Review comment: LangChain?
This Streamlit application implements a Multimodal Retrieval-Augmented Generation (RAG) system. It processes various types of documents including text files, PDFs, PowerPoint presentations, and images. The app leverages Large Language Models and Vision Language Models to extract and index information from these documents, allowing users to query the processed data through an interactive chat interface.

The system utilizes LlamaIndex for efficient indexing and retrieval of information, NVIDIA Inference Microservices (NIMs) for high-performance inference capabilities, and Milvus as a vector database for efficient storage and retrieval of embedding vectors. This combination of technologies enables the application to handle complex multimodal data, perform advanced queries, and deliver rapid, context-aware responses to user inquiries.
Review comment: NIMs -> NIM microservices
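
For readers less familiar with this stack, here is a minimal sketch of how the pieces described above typically compose. The package paths, model ids, and embedding dimension are assumptions based on the public llama-index NVIDIA and Milvus integrations, not code from this PR.

```python
# Minimal sketch of a LlamaIndex + NIM + Milvus RAG pipeline.
# Assumes NVIDIA_API_KEY is set in the environment and that the
# llama-index-embeddings-nvidia, llama-index-llms-nvidia, and
# llama-index-vector-stores-milvus packages are installed.
from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.llms.nvidia import NVIDIA
from llama_index.vector_stores.milvus import MilvusVectorStore

# NIM-hosted models configured globally for LlamaIndex.
Settings.embed_model = NVIDIAEmbedding(model="nvidia/nv-embedqa-e5-v5", truncate="END")
Settings.llm = NVIDIA(model="meta/llama3-70b-instruct")  # illustrative LLM choice

# Milvus stores the embedding vectors; dim must match the embedding model (1024 here).
vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=1024, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Index local documents, then answer queries over the indexed content.
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("Summarize the key points across these documents."))
```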
- **Multi-format Document Processing**: Handles text files, PDFs, PowerPoint presentations, and images.
- **Advanced Text Extraction**: Extracts text from PDFs and PowerPoint slides, including tables and embedded images.
- **Image Analysis**: Uses a VLM (NeVA) to describe images and Google's DePlot for processing graphs/charts on NVIDIA Inference Microservices (NIMs).
Review comment: NIMs -> NIM microservices
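
To make the Image Analysis bullet concrete, a hypothetical request to a hosted NeVA endpoint might look like the following. The invoke URL, payload shape, and inline base64 `<img>` convention are assumptions drawn from NVIDIA's API catalog examples, not this repository's code.

```python
# Hypothetical sketch: caption an image with a NeVA-style VLM endpoint.
import base64
import os

import requests

invoke_url = "https://ai.api.nvidia.com/v1/vlm/nvidia/neva-22b"  # assumed endpoint

with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

headers = {
    "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
    "Accept": "application/json",
}
payload = {
    # The image is passed inline as a base64 <img> tag inside the prompt text.
    "messages": [{
        "role": "user",
        "content": f'Describe this image in detail. <img src="data:image/png;base64,{image_b64}" />',
    }],
    "max_tokens": 512,
    "temperature": 0.2,
}

response = requests.post(invoke_url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```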
```python
# Initialize settings
def initialize_settings():
    Settings.embed_model = NVIDIAEmbedding(model="NV-Embed-QA", truncate="END")
```
Review comment: NV-Embed-QA -> nvidia/nv-embedqa-e5-v5
cc: @dglogo
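
Applied to the snippet above, the suggested rename would look roughly like this. Everything beyond the model id is unchanged from the PR's snippet, and the import paths are assumed from the llama-index NVIDIA integration.

```python
from llama_index.core import Settings
from llama_index.embeddings.nvidia import NVIDIAEmbedding

# Initialize settings
def initialize_settings():
    # Reviewer-suggested model id on the NVIDIA API catalog,
    # replacing the "NV-Embed-QA" name used in the PR.
    Settings.embed_model = NVIDIAEmbedding(
        model="nvidia/nv-embedqa-e5-v5", truncate="END"
    )
```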