Agentic RAG System with PDF and Website Integration

This project implements an Agentic Retrieval-Augmented Generation (RAG) system that allows users to retrieve answers from uploaded PDFs, specified website URLs, or a combination of both. The system uses an intelligent agent to decide whether a query can be answered based on the provided sources or needs to fall back on online searches.

Key Features

PDF Retrieval: Upload PDF files and extract information for question answering.
Website Retrieval: Provide URLs to extract and use content for answering queries.
Combined Query Handling: Simultaneously process PDFs and URLs to retrieve answers.
Agent Logic:
- First checks if the answer exists in the uploaded PDF.
- If not found, checks the website content.
- If unavailable in both, treats the question as outside the RAG database and either runs the optional online search or refrains from answering.
Fallback Search: When enabled, an online search retrieves relevant context for questions that the provided data cannot answer.

Tech Stack

Streamlit: User interface.
PyPDF2: Extract text from PDF files.
BeautifulSoup: Parse and clean website content.
OpenAI API: Generate embeddings and answer questions.
Qdrant: Vector database for semantic search.
DuckDuckGo Search: Online search fallback for out-of-database queries.

Installation

Clone the Repository:

```
git clone https://github.com/rajveersinghcse/Agentic_RAG
cd Agentic_RAG
```

Install Dependencies:
```
pip install -r requirements.txt
```
Set Up Qdrant:
- Download and install Qdrant.
- Start Qdrant on http://localhost:6333.

Usage

Run the Application:
```
streamlit run app.py
```
Configure API Key:
- Enter your OpenAI API Key in the designated input field in the app.
Upload Data:
- Upload PDF files or provide website URLs (comma-separated).
- Optionally, enable crawling to extract content from all linked pages.
Process Data:
- Click "Process and Index Documents" to generate embeddings and store them in the Qdrant database.
Ask Questions:
- Enter your question in the input field.
- The agent determines the source of the answer:
  - Retrieves from PDF if present.
  - Falls back to website if not in PDF.
  - If neither, performs an online search (optional) or states that the question is outside the RAG database.
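
The "Process Data" step above presumably splits the extracted text into chunks before embedding them. A minimal sketch of overlapping chunking in plain Python is shown below. The function name `chunk_text` and the default size and overlap values are assumptions for illustration only; the app itself lists `langchain_text_splitters` as a dependency and may split text differently.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    Hypothetical helper: the sizes are illustrative, not the app's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing a little duplicated text.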

Agent Workflow

PDF Search: If the answer is found in the uploaded PDFs, it is retrieved and displayed.
Website Search: If the answer is not in the PDFs, the agent searches the provided website content.
Fallback Search: If neither source contains the answer, the question is identified as outside the RAG database. The agent then runs an online search if that option is enabled, and otherwise declines to answer.
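
The workflow above can be sketched as a small routing function. This is an illustration under stated assumptions, not the code in `app.py`: the names `Hit`, `route_query`, and `min_score` are hypothetical, and the README does not say how the agent decides that an answer "is found", so a similarity threshold stands in for that decision here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Hit:
    text: str
    score: float  # similarity score; higher means more relevant (assumed)


Retriever = Callable[[str], List[Hit]]


def route_query(
    question: str,
    search_pdf: Retriever,
    search_web: Retriever,
    online_search: Optional[Callable[[str], List[str]]] = None,
    min_score: float = 0.75,  # assumed relevance cutoff, not from the app
) -> Tuple[str, List[str]]:
    """Return (source, context) by trying PDF, then website, then fallback."""
    # Try the indexed sources in priority order: PDFs first, then websites.
    for source, retriever in (("pdf", search_pdf), ("website", search_web)):
        hits = [h for h in retriever(question) if h.score >= min_score]
        if hits:
            return source, [h.text for h in hits]

    # Optional online fallback, used only when a search callable is supplied.
    if online_search is not None:
        results = online_search(question)
        if results:
            return "online", results

    # Nothing relevant anywhere: the question is outside the RAG database.
    return "outside_rag", []
```

Passing the retrievers in as callables keeps the routing order testable without a running Qdrant instance or network access.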

Configuration

OpenAI API Key: Required for embeddings and question-answering models.
Qdrant: Must be running locally or configured to a remote host in the code.
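
One possible way to keep these two settings out of the source code is to read them from environment variables. The sketch below is an assumption, not how the app works today (the app takes the API key from an input field and sets the Qdrant host in code). `OPENAI_API_KEY` is the variable name conventionally used for OpenAI keys; `QDRANT_URL`, `Settings`, and `load_settings` are hypothetical names chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    qdrant_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env

    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise ValueError("OPENAI_API_KEY is not set")

    # QDRANT_URL is a hypothetical variable; fall back to the local default.
    url = env.get("QDRANT_URL", "http://localhost:6333").rstrip("/")
    return Settings(openai_api_key=key, qdrant_url=url)
```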

Requirements

Python 3.8+ (I used 3.12.7)
Valid OpenAI API Key
Running instance of Qdrant

Dependencies

streamlit
PyPDF2
beautifulsoup4
qdrant-client
litellm
duckduckgo_search
langchain_text_splitters

Install all dependencies with:
```
pip install -r requirements.txt
```

FAQ

What happens if I upload both PDFs and URLs?

The agent processes both and prioritizes the PDFs. If the answer is not in PDFs, it checks the websites.

Can it answer questions outside the uploaded data?

Not from the indexed data. If the answer isn't in the PDFs or URLs, the agent either performs an online search (if enabled) or states that the question is outside the RAG database.

What if the Qdrant server isn't running?

The app needs a reachable Qdrant instance to index and search documents. Ensure Qdrant is installed and started on http://localhost:6333 before indexing documents.
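
A quick way to confirm that Qdrant is up before indexing is to send an HTTP request to its base URL. The sketch below uses only the Python standard library; the function name `qdrant_is_reachable` is hypothetical, and it assumes the server answers a plain GET on its root URL with status 200, which may vary by Qdrant version.

```python
import urllib.error
import urllib.request


def qdrant_is_reachable(base_url: str = "http://localhost:6333",
                        timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on base_url succeeds with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed URL.
        return False


if __name__ == "__main__":
    if qdrant_is_reachable():
        print("Qdrant is reachable.")
    else:
        print("Qdrant is not reachable; start it before indexing documents.")
```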
