This is a Web-based Q&A Tool that lets users ask questions about content extracted from URLs. The tool leverages:
- BeautifulSoup for web scraping
- FAISS for semantic search and retrieval
- Sentence Transformers for text embeddings
- OpenAI GPT-3.5 Turbo for generating answers
- Streamlit for the user-friendly web interface
✅ Extracts textual content from multiple URLs (see the sketch below)
✅ Splits text into chunks for efficient retrieval
✅ Creates a FAISS index for fast similarity search
✅ Retrieves relevant content based on user queries
✅ Uses OpenAI's GPT-3.5 Turbo to generate answers
✅ Displays relevant content chunks for transparency
✅ User-friendly UI with Streamlit
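As a rough illustration of the extraction and chunking steps, here is a minimal sketch. The function names, chunk size, and overlap are illustrative assumptions, not the repository's exact implementation:

```python
# Minimal sketch of extraction and chunking; names and sizes are illustrative.
import requests
from bs4 import BeautifulSoup

def extract_text(url: str) -> str:
    """Fetch a page and return its visible text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop script/style tags so only readable content remains.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word-based chunks for retrieval."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
    return chunks
```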
Clone the repository and install dependencies:
git clone https://github.com/your-username/web-qa-tool.git
cd web-qa-tool
pip install -r requirements.txt
Create a .env file and add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key
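For example, config.py could load this key with python-dotenv. This is a sketch under that assumption; the actual module may differ:

```python
# Sketch of loading the API key from .env with python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the project root
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

if not OPENAI_API_KEY:
    raise RuntimeError("OPENAI_API_KEY is not set in .env")
```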
Run the Streamlit app:
streamlit run app.py
- Enter URLs (one per line) in the input field.
- Click Process URLs to extract and index content.
- Ask questions in the chat interface.
- View generated answers along with relevant content chunks.
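A simplified outline of this flow in main.py might look like the sketch below; the widget labels, session-state keys, and helper comments are illustrative rather than the app's exact code:

```python
# Illustrative outline of the Streamlit flow (hypothetical structure).
import streamlit as st

st.title("Web-based Q&A Tool")

urls_text = st.text_area("Enter URLs (one per line)")
if st.button("Process URLs"):
    urls = [u.strip() for u in urls_text.splitlines() if u.strip()]
    # extract, chunk, embed and index the content (see module sketches further below)
    st.session_state["ready"] = True
    st.success(f"Indexed {len(urls)} URL(s)")

question = st.chat_input("Ask a question about the indexed pages")
if question and st.session_state.get("ready"):
    st.chat_message("user").write(question)
    # retrieve relevant chunks and generate an answer, then display both
    st.chat_message("assistant").write("...generated answer...")
```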
web_qa_tool/
│── main.py              # Streamlit UI (entry point)
│── config.py            # Configuration (e.g., API keys)
│── requirements.txt     # Dependencies
│
├── modules/             # Folder for modularized functionalities
│   │── text_extraction.py   # Functions to extract text from URLs
│   │── text_processing.py   # Text splitting and preprocessing functions
│   │── faiss_indexing.py    # FAISS index creation and retrieval (sketched below)
│   │── openai_helper.py     # OpenAI API interaction
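As a sketch of what faiss_indexing.py covers, the snippet below embeds chunks with all-MiniLM-L6-v2 and builds a flat L2 index. The function names and the choice of IndexFlatL2 are assumptions, not necessarily the module's exact code:

```python
# Sketch of FAISS indexing and retrieval helpers; names are illustrative.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(chunks: list[str]) -> faiss.IndexFlatL2:
    """Embed all chunks and add them to a flat L2 index."""
    embeddings = model.encode(chunks, convert_to_numpy=True).astype("float32")
    index = faiss.IndexFlatL2(embeddings.shape[1])  # 384 dims for MiniLM
    index.add(embeddings)
    return index

def retrieve(index: faiss.IndexFlatL2, chunks: list[str], query: str, k: int = 4) -> list[str]:
    """Return the k chunks closest to the query embedding."""
    query_vec = model.encode([query], convert_to_numpy=True).astype("float32")
    _, ids = index.search(query_vec, k)
    return [chunks[i] for i in ids[0]]
```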
- BeautifulSoup for extracting web content
- Sentence Transformers (all-MiniLM-L6-v2) for embeddings
- FAISS for fast similarity search
- OpenAI GPT-3.5 Turbo for answer generation (see the sketch below)
- Streamlit for UI
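For reference, a minimal answer-generation sketch assuming the openai>=1.0 Python client; openai_helper.py may use a different prompt template or client version:

```python
# Sketch of grounded answer generation with GPT-3.5 Turbo (openai>=1.0 client assumed).
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

def generate_answer(question: str, context_chunks: list[str]) -> str:
    """Ask GPT-3.5 Turbo to answer using only the retrieved chunks."""
    context = "\n\n".join(context_chunks)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```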
🔹 Currently processes only a single page per URL; this could be extended to crawl an entire website
🔹 Improved chunking strategy for better context retrieval
🔹 Option to upload documents (PDF, DOCX) for Q&A