Adding the multimodal RAG tutorial with Amazon Nova and LangChain #305

Status: Open (8 commits to be merged into base branch main)

Conversation

debnsuma

This notebook demonstrates how to implement a multimodal Retrieval-Augmented Generation (RAG) system using Amazon Bedrock with Amazon Nova and LangChain. Many documents contain a mixture of content types, including text and images. Traditional text-only RAG applications often lose the valuable information captured in images. With the emergence of Multimodal Large Language Models (MLLMs), we can now leverage both text and image data in our RAG systems.

Collaborator

@3coins left a comment

@debnsuma
This is a great addition; thanks for submitting it. I've added a few suggestions to simplify some of the code and structure.

Comment on lines 150 to 160
"source": [
"<h2 style=\"background: linear-gradient(to right, #ff6b6b, #4ecdc4, #1e90ff); \n",
" color: white; \n",
" padding: 15px; \n",
" border-radius: 10px; \n",
" text-align: center; \n",
" font-family: 'Comic Sans MS', cursive, sans-serif; \n",
" text-shadow: 2px 2px 4px rgba(0,0,0,0.5);\">\n",
" Data Loading\n",
"</h2>"
]
Collaborator

I'd suggest using markdown headers instead of HTML elements, to keep the notebook simple and consistent.
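
For a notebook with many such cells, the replacement can be scripted. The following is a minimal sketch, assuming each styled header is a markdown cell containing nothing but a single `<h2 ...>Title</h2>` element, as in the quoted cells; the function name `html_h2_to_markdown` and the regular expression are illustrative and not part of the PR.

```python
import json
import re

# Matches one styled <h2 ...>Title</h2> element, even when the style
# attribute and the title are spread over several lines.
H2_PATTERN = re.compile(r"<h2\b[^>]*>\s*(.*?)\s*</h2>", re.DOTALL | re.IGNORECASE)


def html_h2_to_markdown(notebook: dict) -> int:
    """Rewrite markdown cells that hold only a styled <h2> as '## Title'.

    Modifies the notebook dict in place and returns the number of cells changed.
    """
    changed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", [])
        # Notebook sources are stored either as a list of lines or a single string.
        text = "".join(source) if isinstance(source, list) else source
        match = H2_PATTERN.fullmatch(text.strip())
        if match:
            cell["source"] = [f"## {match.group(1)}"]
            changed += 1
    return changed
```

To apply it, load the `.ipynb` file with `json.load`, call the function on the resulting dict, and write the dict back with `json.dump`. Cells that mix an `<h2>` with other content are left untouched, so they would still need a manual edit.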

Comment on lines 198 to 216
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"<h2 style=\"background: linear-gradient(to right, #ff6b6b, #4ecdc4, #1e90ff); \n",
" color: white; \n",
" padding: 15px; \n",
" border-radius: 10px; \n",
" text-align: center; \n",
" font-family: 'Comic Sans MS', cursive, sans-serif; \n",
" text-shadow: 2px 2px 4px rgba(0,0,0,0.5);\">\n",
" Data Extraction\n",
"</h2>"
]
},
Collaborator

Replace with a markdown heading.

"overlap= 200\n",
"\n",
"# Process chunks with LangChain's RecursiveCharacterTextSplitter\n",
"text_splitter = RecursiveCharacterTextSplitter(\n",
Collaborator

nit: Formatting seems a bit off here.

Comment on lines 231 to 234
"image_save_dir = \"data/processed_images\"\n",
"text_save_dir = \"data/processed_text\"\n",
"table_save_dir = \"data/processed_tables\"\n",
"page_images_save_dir = \"data/processed_page_images\"\n",
Collaborator

nit: Can probably simplify and shorten the names here.

Suggested change
- "image_save_dir = \"data/processed_images\"\n",
- "text_save_dir = \"data/processed_text\"\n",
- "table_save_dir = \"data/processed_tables\"\n",
- "page_images_save_dir = \"data/processed_page_images\"\n",
+ "images_dir = \"data/images\"\n",
+ "texts_dir = \"data/texts\"\n",
+ "tables_dir = \"data/tables\"\n",
+ "page_images_dir = \"data/page_images\"\n",

" page = doc[page_num]\n",
" text = page.get_text()\n",
"\n",
" # Step 1: Get/extract all TABLES in the curremt page and store \n",
Collaborator

Slight typo here.

Suggested change
- " # Step 1: Get/extract all TABLES in the curremt page and store \n",
+ " # Step 1: Get/extract all TABLES in the current page and store \n",

Comment on lines +433 to +438
" response = client.invoke_model(\n",
" modelId=model_id,\n",
" body=json.dumps(body),\n",
" accept=\"application/json\",\n",
" contentType=\"application/json\"\n",
" )\n",
Collaborator

Can BedrockEmbeddings be used here instead of invoking boto3 directly? Also, the embeddings appear to be generated outside the vector store; in a typical RAG app, the vector store generates them itself when you pass it the documents:

vector_store.add_documents(all_splits)
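
As a rough illustration of this suggestion, the sketch below lets the vector store compute the embeddings itself. It assumes the `langchain-aws` and `langchain-core` packages are installed and AWS credentials are configured. `InMemoryVectorStore`, the text-only Titan model ID, the region, and the function name `index_texts` are illustrative choices, not taken from the notebook, and the sketch covers text only.

```python
def index_texts(texts, model_id="amazon.titan-embed-text-v2:0", region_name="us-east-1"):
    """Embed and index plain-text chunks by handing them to the vector store."""
    # Third-party imports are kept inside the function so this sketch can be
    # loaded without LangChain installed or AWS credentials configured.
    from langchain_aws import BedrockEmbeddings
    from langchain_core.documents import Document
    from langchain_core.vectorstores import InMemoryVectorStore

    # The embeddings object replaces the hand-written client.invoke_model call.
    embeddings = BedrockEmbeddings(model_id=model_id, region_name=region_name)
    vector_store = InMemoryVectorStore(embeddings)

    # The store calls the embedding model for each document as it is added.
    documents = [Document(page_content=text) for text in texts]
    vector_store.add_documents(documents)
    return vector_store
```

A query would then go through the same object, for example `index_texts(chunks).similarity_search("question", k=3)`, where `chunks` is a hypothetical list of strings.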

Comment on lines 488 to 505
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"<h2 style=\"background: linear-gradient(to right, #ff6b6b, #4ecdc4, #1e90ff); \n",
" color: white; \n",
" padding: 15px; \n",
" border-radius: 10px; \n",
" text-align: center; \n",
" font-family: 'Comic Sans MS', cursive, sans-serif; \n",
" text-shadow: 2px 2px 4px rgba(0,0,0,0.5);\">\n",
" Creating Vector Database/Index\n",
"</h2>"
]
},
Collaborator

Update to markdown header.

"# Generating RAG response with Amazon Nova\n",
"def invoke_nova_multimodal(prompt, matched_items):\n",
" \"\"\"\n",
" Invoke the Amazon Nova model using langchain-aws.\n",
Collaborator

Suggested change
- " Invoke the Amazon Nova model using langchain-aws.\n",
+ " Invoke the Amazon Nova model.\n",

Comment on lines 605 to 622
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"<h2 style=\"background: linear-gradient(to right, #ff6b6b, #4ecdc4, #1e90ff); \n",
" color: white; \n",
" padding: 15px; \n",
" border-radius: 10px; \n",
" text-align: center; \n",
" font-family: 'Comic Sans MS', cursive, sans-serif; \n",
" text-shadow: 2px 2px 4px rgba(0,0,0,0.5);\">\n",
" Test the RAG Pipeline\n",
"</h2>"
]
},
Collaborator

Replace with markdown header.

Comment on lines 730 to 745
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 style=\"background: linear-gradient(to right, #ff6b6b, #4ecdc4, #1e90ff); \n",
" color: white; \n",
" padding: 15px; \n",
" border-radius: 10px; \n",
" text-align: center; \n",
" font-family: 'Comic Sans MS', cursive, sans-serif; \n",
" text-shadow: 2px 2px 4px rgba(0,0,0,0.5);\">\n",
" Thank you!\n",
"</h2>"
]
}
],
Collaborator

Replace with a markdown header.

@debnsuma
Author

Thanks so much, @3coins, for all your input. I have addressed all of the comments, refactored the notebook, and pushed the changes.
One note: I am still exploring how to generate the embeddings with BedrockEmbeddings. I tried earlier but could not generate embeddings for both images and text, because Titan's request schema differs between the two.
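
One possible way around the schema difference is a thin adapter that builds the right request body for each input type and exposes the `embed_documents` / `embed_query` method names that LangChain's `Embeddings` interface defines. The sketch below is an assumption-heavy illustration, not the notebook's code: the field names `inputText`, `inputImage`, and `embeddingConfig.outputEmbeddingLength` are assumed to follow the Titan Multimodal Embeddings request format and should be checked against the Bedrock documentation, and `build_titan_body`, `TitanMultimodalEmbedder`, and the injected `invoke` callable are hypothetical names.

```python
import base64
import json
from typing import Callable, Optional


def build_titan_body(text: Optional[str] = None,
                     image_bytes: Optional[bytes] = None,
                     dimensions: int = 1024) -> str:
    """Build one JSON request body that covers text, an image, or both."""
    if text is None and image_bytes is None:
        raise ValueError("Provide text, image_bytes, or both")
    # Assumed field names; verify against the Bedrock model documentation.
    body = {"embeddingConfig": {"outputEmbeddingLength": dimensions}}
    if text is not None:
        body["inputText"] = text
    if image_bytes is not None:
        # Images are sent as base64-encoded strings.
        body["inputImage"] = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps(body)


class TitanMultimodalEmbedder:
    """Hypothetical adapter with LangChain-style embedding method names."""

    def __init__(self, invoke: Callable[[str], dict]):
        # `invoke` takes a JSON body string and returns the parsed response,
        # e.g. a wrapper around client.invoke_model plus json.loads.
        self._invoke = invoke

    def embed_query(self, text: str) -> list:
        return self._invoke(build_titan_body(text=text))["embedding"]

    def embed_documents(self, texts: list) -> list:
        return [self.embed_query(text) for text in texts]

    def embed_image(self, image_bytes: bytes) -> list:
        return self._invoke(build_titan_body(image_bytes=image_bytes))["embedding"]
```

Injecting `invoke` keeps the schema handling separate from the boto3 call, so the adapter can be exercised with a stub before it is pointed at Bedrock. Whether a given vector store accepts such an object in place of `BedrockEmbeddings` would need to be confirmed against that store's requirements.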
