New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Adding the multimodal RAG tutorial with Amazon Nova and LangChain #305

Open

debnsuma wants to merge 8 commits into langchain-ai:main from debnsuma:multimodal-rag-with-amazon-nova

debnsuma commented Dec 12, 2024

This notebook demonstrates how to implement a multi-modal Retrieval-Augmented Generation (RAG) system using Amazon Bedrock with Amazon Nova and LangChain. Many documents contain a mixture of content types, including text and images. Traditional RAG applications often lose valuable information captured in images. With the emergence of Multimodal Large Language Models (MLLMs), we can now leverage both text and image data in our RAG systems.

debnsuma added 4 commits

December 12, 2024 15:32


          adding the multimodal RAG tutorial with Amazon Nova and LangChain

13d4d87


          updated the invole function, replaced boto3 with langchian_aws

e01949d


          updated the invole function, replaced boto3 with langchian_aws

55e0761


          updated the invole function, replaced boto3 with langchian_aws

1a7c8ff

3coins reviewed

View reviewed changes

Collaborator

3coins left a comment

@debnsuma
This is a a great addition, thanks for submitting this. Added a few suggestions to simplify some of the code and structure.

samples/multi-modal/multimodal_rag_with_nova.ipynb Outdated

Comment on lines 150 to 160

+                 "source": [
+                  "<h2 style=\"background: linear-gradient(to right, #ff6b6b, #4ecdc4, #1e90ff); \n",
+                  "            color: white; \n",
+                  "            padding: 15px; \n",
+                  "            border-radius: 10px; \n",
+                  "            text-align: center; \n",
+                  "            font-family: 'Comic Sans MS', cursive, sans-serif; \n",
+                  "            text-shadow: 2px 2px 4px rgba(0,0,0,0.5);\">\n",
+                  "   Data Loading\n",
+                  "</h2>"
+                 ]

Collaborator

3coins Dec 16, 2024

Would suggest to use markdown headers, instead of any HTML elements to keep the notebook simple and consistent.

samples/multi-modal/multimodal_rag_with_nova.ipynb Outdated

Comment on lines 198 to 216

+                {
+                 "cell_type": "markdown",
+                 "metadata": {
+                  "slideshow": {
+                   "slide_type": "slide"
+                  }
+                 },
+                 "source": [
+                  "<h2 style=\"background: linear-gradient(to right, #ff6b6b, #4ecdc4, #1e90ff); \n",
+                  "            color: white; \n",
+                  "            padding: 15px; \n",
+                  "            border-radius: 10px; \n",
+                  "            text-align: center; \n",
+                  "            font-family: 'Comic Sans MS', cursive, sans-serif; \n",
+                  "            text-shadow: 2px 2px 4px rgba(0,0,0,0.5);\">\n",
+                  "   Data Extraction\n",
+                  "</h2>"
+                 ]
+                },

Collaborator

3coins Dec 16, 2024

Replace with a markdown heading.

samples/multi-modal/multimodal_rag_with_nova.ipynb Outdated

+                  "overlap= 200\n",
+                  "\n",
+                  "# Process chunks with LangChain's RecursiveCharacterTextSplitter\n",
+                  "text_splitter = RecursiveCharacterTextSplitter(\n",

Collaborator

3coins Dec 16, 2024

nit: Formatting seems a bit off here.

samples/multi-modal/multimodal_rag_with_nova.ipynb Outdated

Comment on lines 231 to 234

+                  "image_save_dir = \"data/processed_images\"\n",
+                  "text_save_dir = \"data/processed_text\"\n",
+                  "table_save_dir = \"data/processed_tables\"\n",
+                  "page_images_save_dir = \"data/processed_page_images\"\n",

Collaborator

3coins Dec 16, 2024

nit: Can probably simplify and shorten the names here.

Suggested change

      
                "image_save_dir = \"data/processed_images\"\n",
          
                "text_save_dir = \"data/processed_text\"\n",
          
                "table_save_dir = \"data/processed_tables\"\n",
          
                "page_images_save_dir = \"data/processed_page_images\"\n",
          
                "images_dir = \"data/images\"\n",
          
                "texts_dir = \"data/texts\"\n",
          
                "tables_dir = \"data/tables\"\n",
          
                "page_images_dir = \"data/page_images\"\n",

samples/multi-modal/multimodal_rag_with_nova.ipynb Outdated

+                  "    page = doc[page_num]\n",
+                  "    text = page.get_text()\n",
+                  "\n",
+                  "    # Step 1: Get/extract all TABLES in the curremt page and store \n",

Collaborator

3coins Dec 16, 2024

Slight typo here.

Suggested change

      
                "    # Step 1: Get/extract all TABLES in the curremt page and store \n",
          
                "    # Step 1: Get/extract all TABLES in the current page and store \n",

samples/multi-modal/multimodal_rag_with_nova.ipynb

Comment on lines +433 to +438

+                  "        response = client.invoke_model(\n",
+                  "            modelId=model_id,\n",
+                  "            body=json.dumps(body),\n",
+                  "            accept=\"application/json\",\n",
+                  "            contentType=\"application/json\"\n",
+                  "        )\n",

Collaborator

3coins Dec 16, 2024

Can BedrockEmbeddings be used here, instead of invoking boto3 directly? Also, it seems like embeddings are being generated externally, but usually in a RAG app, this is possible by just passing the documents to the vector store.

vector_store.add_documents(all_splits)

samples/multi-modal/multimodal_rag_with_nova.ipynb Outdated

Comment on lines 488 to 505

+                 "cell_type": "markdown",
+                 "metadata": {
+                  "slideshow": {
+                   "slide_type": "slide"
+                  }
+                 },
+                 "source": [
+                  "<h2 style=\"background: linear-gradient(to right, #ff6b6b, #4ecdc4, #1e90ff); \n",
+                  "            color: white; \n",
+                  "            padding: 15px; \n",
+                  "            border-radius: 10px; \n",
+                  "            text-align: center; \n",
+                  "            font-family: 'Comic Sans MS', cursive, sans-serif; \n",
+                  "            text-shadow: 2px 2px 4px rgba(0,0,0,0.5);\">\n",
+                  "  Creating Vector Database/Index\n",
+                  "</h2>"
+                 ]
+                },

Collaborator

3coins Dec 16, 2024

Update to markdown header.

samples/multi-modal/multimodal_rag_with_nova.ipynb Outdated

+                  "# Generating RAG response with Amazon Nova\n",
+                  "def invoke_nova_multimodal(prompt, matched_items):\n",
+                  "    \"\"\"\n",
+                  "    Invoke the Amazon Nova model using langchain-aws.\n",

Collaborator

3coins Dec 16, 2024

Suggested change

      
                "    Invoke the Amazon Nova model using langchain-aws.\n",
          
                "    Invoke the Amazon Nova model.\n",

samples/multi-modal/multimodal_rag_with_nova.ipynb Outdated

Comment on lines 605 to 622

+                 "cell_type": "markdown",
+                 "metadata": {
+                  "slideshow": {
+                   "slide_type": "slide"
+                  }
+                 },
+                 "source": [
+                  "<h2 style=\"background: linear-gradient(to right, #ff6b6b, #4ecdc4, #1e90ff); \n",
+                  "            color: white; \n",
+                  "            padding: 15px; \n",
+                  "            border-radius: 10px; \n",
+                  "            text-align: center; \n",
+                  "            font-family: 'Comic Sans MS', cursive, sans-serif; \n",
+                  "            text-shadow: 2px 2px 4px rgba(0,0,0,0.5);\">\n",
+                  "  Test the RAG Pipeline\n",
+                  "</h2>"
+                 ]
+                },

Collaborator

3coins Dec 16, 2024

Replace with markdown header.

samples/multi-modal/multimodal_rag_with_nova.ipynb Outdated

Comment on lines 730 to 745

+                {
+                 "cell_type": "markdown",
+                 "metadata": {},
+                 "source": [
+                  "<h2 style=\"background: linear-gradient(to right, #ff6b6b, #4ecdc4, #1e90ff); \n",
+                  "            color: white; \n",
+                  "            padding: 15px; \n",
+                  "            border-radius: 10px; \n",
+                  "            text-align: center; \n",
+                  "            font-family: 'Comic Sans MS', cursive, sans-serif; \n",
+                  "            text-shadow: 2px 2px 4px rgba(0,0,0,0.5);\">\n",
+                  "  Thank you!\n",
+                  "</h2>"
+                 ]
+                }
+               ],

Collaborator

3coins Dec 16, 2024

Replace with a markdown header.

debnsuma added 2 commits

December 20, 2024 15:57


          Merge branch 'langchain-ai:main' into multimodal-rag-with-amazon-nova

1bbef2f


          refactoring notebook

71b1f75

fixing the notebook based on code review for pr#305

Author

debnsuma commented Dec 20, 2024

Thanks so much @3coins for all your inputs. I fixed all of them and refactored the notebook and pushed the changes.
Just one feedback, I am still exploring on how to perform the embeddings using BedrockEmbeddings, I tried before, but I was not able to do that to generate embeddings for both image and text as Titan's query request schema is different for image and text.

debnsuma added 2 commits

January 16, 2025 18:53


          Merge branch 'main' into multimodal-rag-with-amazon-nova

18489f0


          Merge branch 'main' into multimodal-rag-with-amazon-nova

766cb8f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet