Add Multimodal RAG (llamaindex+NIMs) example to community projects #178

Merged: 3 commits, Aug 29, 2024 (diff below shows changes from 2 commits)
10 changes: 5 additions & 5 deletions RAG/notebooks/llamaindex/llamaindex_basic_RAG.ipynb
@@ -12,11 +12,11 @@
"cell_type": "markdown",
"id": "2969cdab-82fc-4ce5-bde1-b4f629691f27",
"metadata": {},
"source": [
"This notebook introduces how to use LlamaIndex to interact with NVIDIA hosted NIM microservices like chat, embedding, and reranking models to build a simple retrieval-augmented generation (RAG) application.\n",
"\n",
"Alternatively, for a more interactive experience with a graphical user interface, you can refer to our Gradio-based RAG Q&A reference application that also uses NVIDIA hosted NIM microservices [here](https://github.com/jayrodge/llm-assistant-cloud-app/)."
]
"source": [
"This notebook introduces how to use LlamaIndex to interact with NVIDIA hosted NIM microservices like chat, embedding, and reranking models to build a simple retrieval-augmented generation (RAG) application.\n",
"\n",
"Alternatively, for a more interactive experience with a graphical user interface, you can refer to our [code](https://github.com/jayrodge/llm-assistant-cloud-app/) and [YouTube video](https://www.youtube.com/watch?v=09uDCmLzYHA) for Gradio-based RAG Q&A reference application that also uses NVIDIA hosted NIM microservices."
]
},
{
"cell_type": "markdown",
2 changes: 1 addition & 1 deletion community/README.md
@@ -45,7 +45,7 @@ Community examples are sample code and deployments for RAG pipelines that are no

* [NVIDIA Multimodal RAG Assistant](./multimodal_assistant)

- This example is able to ingest PDFs, PowerPoint slides, Word and other documents with complex data formats including text, images, slides and tables. It allows users to ask questions through a text interface and optionally with an image query, and it can respond with text and reference images, slides and tables in its response, along with source links and downloads.
+ This example is able to ingest PDFs, PowerPoint slides, Word and other documents with complex data formats including text, images, slides and tables, orchestrated with Langchain. It allows users to ask questions through a text interface and optionally with an image query, and it can respond with text and reference images, slides and tables in its response, along with source links and downloads. Refer to this [example](./multimodal-rag) for the LlamaIndex version that uses [integration](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_nim/) with NVIDIA Inference Microservices (NIMs) of the Multimodal RAG Assistant.
Collaborator comment: NIMs -> NIM microservices

Collaborator comment: LangChain?

* [NVIDIA Developer RAG Chatbot](./rag-developer-chatbot)

109 changes: 109 additions & 0 deletions community/multimodal-rag/README.md
@@ -0,0 +1,109 @@
# Creating a Multimodal AI Agent for Enhanced Content Understanding

## Overview

This Streamlit application implements a Multimodal Retrieval-Augmented Generation (RAG) system. It processes various types of documents including text files, PDFs, PowerPoint presentations, and images. The app leverages Large Language Models and Vision Language Models to extract and index information from these documents, allowing users to query the processed data through an interactive chat interface.

The system utilizes LlamaIndex for efficient indexing and retrieval of information, NVIDIA Inference Microservices (NIMs) for high-performance inference capabilities, and Milvus as a vector database for efficient storage and retrieval of embedding vectors. This combination of technologies enables the application to handle complex multimodal data, perform advanced queries, and deliver rapid, context-aware responses to user inquiries.
Collaborator comment: NIMs -> NIM microservices


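The same LlamaIndex connectors used in `app.py` below can point either at the NVIDIA-hosted API catalog or at self-hosted NIM endpoints. The following is a minimal sketch only; the `base_url` values are placeholders and the parameter name is assumed from the LlamaIndex NVIDIA integration docs, so check it against your installed package versions.

```python
# Minimal sketch: wiring LlamaIndex to NVIDIA-hosted models (the default used by
# this app) or to self-hosted NIM endpoints. The base_url values are placeholders.
from llama_index.core import Settings
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.llms.nvidia import NVIDIA

# Hosted API catalog (requires NVIDIA_API_KEY in the environment)
Settings.llm = NVIDIA(model="meta/llama-3.1-70b-instruct")
Settings.embed_model = NVIDIAEmbedding(model="NV-Embed-QA", truncate="END")

# Self-hosted NIM endpoints (base_url parameter assumed; adjust to your deployment)
# Settings.llm = NVIDIA(model="meta/llama-3.1-70b-instruct", base_url="http://localhost:8000/v1")
# Settings.embed_model = NVIDIAEmbedding(model="NV-Embed-QA", base_url="http://localhost:8001/v1")
```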
## Features

- **Multi-format Document Processing**: Handles text files, PDFs, PowerPoint presentations, and images.
- **Advanced Text Extraction**: Extracts text from PDFs and PowerPoint slides, including tables and embedded images.
- **Image Analysis**: Uses a VLM (NeVA) to describe images and Google's DePlot to process graphs and charts, both served through NVIDIA Inference Microservices (NIMs); a call sketch follows this list.
Collaborator comment: NIMs -> NIM microservices

- **Vector Store Indexing**: Creates a searchable index of processed documents using Milvus vector store.
- **Interactive Chat Interface**: Allows users to query the processed information through a chat-like interface.
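
For the image-analysis feature above, a request to a hosted VLM NIM looks roughly like the sketch below. The invoke URL and payload shape are assumptions based on the NVIDIA API catalog examples, and the project's actual helpers live in `utils.py`, so treat this as illustrative rather than the app's exact code.

```python
# Illustrative sketch: describing an image with the NeVA VLM through a hosted NIM.
# The invoke URL and payload format are assumptions taken from the NVIDIA API catalog.
import base64
import os

import requests

NEVA_URL = "https://ai.api.nvidia.com/v1/vlm/nvidia/neva-22b"  # assumed invoke URL


def describe_image(image_path: str) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    headers = {"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"}
    payload = {
        "messages": [{
            "role": "user",
            "content": f'Describe this image. <img src="data:image/png;base64,{image_b64}" />',
        }],
        "max_tokens": 512,
    }
    response = requests.post(NEVA_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```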

## Setup

1. Clone the repository:
```
git clone https://github.com/NVIDIA/GenerativeAIExamples.git
cd GenerativeAIExamples/community/multimodal-rag
```

2. (Optional) Create a conda environment or a virtual environment:

- Using conda:
```
conda create --name multimodal-rag python=3.10
conda activate multimodal-rag
```

- Using venv:
```
python -m venv venv
source venv/bin/activate
```

3. Install the required packages:
```
pip install -r requirements.txt
```

4. Set up your NVIDIA API key as an environment variable:
```
export NVIDIA_API_KEY="your-api-key-here"
```

5. Refer to this [tutorial](https://milvus.io/docs/install_standalone-docker-compose-gpu.md) to install and start the GPU-accelerated Milvus container (an optional check for steps 4 and 5 is sketched after this list):

```
sudo docker compose up -d
```
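
Before launching the app, you can optionally sanity-check steps 4 and 5 with a short script. This is a sketch that assumes `pymilvus` is available (it is pulled in by the Milvus vector store integration) and that the defaults above were used:

```python
# Optional sanity check for the NVIDIA API key and the local Milvus instance.
import os

from llama_index.embeddings.nvidia import NVIDIAEmbedding
from pymilvus import connections, utility

assert os.environ.get("NVIDIA_API_KEY"), "NVIDIA_API_KEY is not set"

# Embed a short string through the hosted endpoint to validate the API key.
emb = NVIDIAEmbedding(model="NV-Embed-QA", truncate="END").get_text_embedding("hello")
print(f"Embedding dimension: {len(emb)}")  # the app's Milvus index expects 1024

# Confirm the Milvus container started in step 5 is reachable.
connections.connect(host="127.0.0.1", port=19530)
print(f"Milvus server version: {utility.get_server_version()}")
```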


## Usage

1. Ensure the Milvus container is running:

```bash
docker ps
```

2. Run the Streamlit app:
```
streamlit run app.py
```

3. Open the provided URL in your web browser.

4. Choose between uploading files or specifying a directory path containing your documents.

5. Process the files by clicking the "Process Files" or "Process Directory" button.

6. Once processing is complete, use the chat interface to query your documents.

## File Structure

- `app.py`: Main Streamlit application
- `utils.py`: Utility functions for image processing and API interactions
- `document_processors.py`: Functions for processing various document types
- `requirements.txt`: List of Python dependencies
- `vectorstore/`: Directory used to store content extracted from PDFs and PowerPoint files


## GPU Acceleration for Vector Search
To utilize GPU acceleration in the vector database, ensure that:
1. Your system has a compatible NVIDIA GPU.
2. You're using the GPU-enabled version of Milvus (as shown in the setup instructions).
3. There are enough concurrent requests to justify GPU usage. GPU acceleration typically shows significant benefits under high load conditions.

Note that GPU acceleration pays off mainly when the volume of incoming requests is very high. For more detailed information on GPU indexing and search in Milvus, refer to the [official Milvus GPU Index documentation](https://milvus.io/docs/gpu_index.md).
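
If you want to confirm that a collection is using a GPU index, one option is to (re)build the index explicitly with `pymilvus`. The collection and field names below are placeholders and assumptions; match them to whatever the app actually created:

```python
# Sketch: building a GPU index on the embedding field with pymilvus.
# Collection and field names are assumptions; adjust them to your deployment.
from pymilvus import Collection, connections

connections.connect(host="127.0.0.1", port=19530)
collection = Collection("your_collection_name")  # placeholder collection name

collection.release()     # a loaded collection must be released before re-indexing
collection.drop_index()
collection.create_index(
    field_name="embedding",  # assumed name of the vector field
    index_params={
        "index_type": "GPU_IVF_FLAT",  # one of Milvus's GPU index types
        "metric_type": "L2",
        "params": {"nlist": 1024},
    },
)
collection.load()
```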

To connect the GPU-accelerated Milvus with LlamaIndex, update the MilvusVectorStore configuration in app.py:
```
vector_store = MilvusVectorStore(
host="127.0.0.1",
port=19530,
dim=1024,
collection_name="your_collection_name",
gpu_id=0 # Specify the GPU ID to use
)
```

## Contributing
Contributions to this project are welcome! Please follow these steps:
1. Fork the NVIDIA/GenerativeAIExamples repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes in the community/multimodal-rag/ directory.
4. Submit a pull request to the main repository.
103 changes: 103 additions & 0 deletions community/multimodal-rag/app.py
@@ -0,0 +1,103 @@
import os
import streamlit as st
from llama_index.core import Settings
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core.node_parser import SentenceSplitter
from llama_index.vector_stores.milvus import MilvusVectorStore
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.llms.nvidia import NVIDIA

from document_processors import load_multimodal_data, load_data_from_directory
from utils import set_environment_variables

# Set up the page configuration
st.set_page_config(layout="wide")

# Initialize settings
def initialize_settings():
    Settings.embed_model = NVIDIAEmbedding(model="NV-Embed-QA", truncate="END")  # Collaborator comment: NV-Embed-QA -> nvidia / nv-embedqa-e5-v5
    Settings.llm = NVIDIA(model="meta/llama-3.1-70b-instruct")
    Settings.text_splitter = SentenceSplitter(chunk_size=600)

# Create index from documents
def create_index(documents):
    vector_store = MilvusVectorStore(
        host="127.0.0.1",
        port=19530,
        dim=1024
    )
    # vector_store = MilvusVectorStore(uri="./milvus_demo.db", dim=1024, overwrite=True)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    return VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Main function to run the Streamlit app
def main():
    set_environment_variables()
    initialize_settings()

    col1, col2 = st.columns([1, 2])

    with col1:
        st.title("Multimodal RAG")

        input_method = st.radio("Choose input method:", ("Upload Files", "Enter Directory Path"))

        if input_method == "Upload Files":
            uploaded_files = st.file_uploader("Drag and drop files here", accept_multiple_files=True)
            if uploaded_files and st.button("Process Files"):
                with st.spinner("Processing files..."):
                    documents = load_multimodal_data(uploaded_files)
                    st.session_state['index'] = create_index(documents)
                    st.session_state['history'] = []
                    st.success("Files processed and index created!")
        else:
            directory_path = st.text_input("Enter directory path:")
            if directory_path and st.button("Process Directory"):
                if os.path.isdir(directory_path):
                    with st.spinner("Processing directory..."):
                        documents = load_data_from_directory(directory_path)
                        st.session_state['index'] = create_index(documents)
                        st.session_state['history'] = []
                        st.success("Directory processed and index created!")
                else:
                    st.error("Invalid directory path. Please enter a valid path.")

    with col2:
        if 'index' in st.session_state:
            st.title("Chat")
            if 'history' not in st.session_state:
                st.session_state['history'] = []

            query_engine = st.session_state['index'].as_query_engine(similarity_top_k=20, streaming=True)

            user_input = st.chat_input("Enter your query:")

            # Display chat messages
            chat_container = st.container()
            with chat_container:
                for message in st.session_state['history']:
                    with st.chat_message(message["role"]):
                        st.markdown(message["content"])

            if user_input:
                with st.chat_message("user"):
                    st.markdown(user_input)
                st.session_state['history'].append({"role": "user", "content": user_input})

                with st.chat_message("assistant"):
                    message_placeholder = st.empty()
                    full_response = ""
                    response = query_engine.query(user_input)
                    for token in response.response_gen:
                        full_response += token
                        message_placeholder.markdown(full_response + "▌")
                    message_placeholder.markdown(full_response)
                    st.session_state['history'].append({"role": "assistant", "content": full_response})

            # Add a clear button
            if st.button("Clear Chat"):
                st.session_state['history'] = []
                st.rerun()

if __name__ == "__main__":
    main()