A document-based chatbot built with LangChain and OpenAI that lets you have intelligent conversations with your documents. Upload PDFs and interact with their content through natural language queries.
- Document Intelligence: Upload PDFs and have natural conversations about their content
- Smart Text Processing: Advanced chunking algorithms for optimal context preservation
- Vector Database Integration: Efficient document embedding storage with Supabase
- Flexible Architecture: Support for multiple LLMs (OpenAI, HuggingFace) and Vector DBs (Supabase, Pinecone)
- Containerized Deployment: Ready for production with Docker support
- Interactive UI: Built with Streamlit for a seamless user experience
- Clone the repository
- Open a terminal in the impersonator folder
- Create a `.env` file in the `src/` directory with the following format (the app reads these keys via python-dotenv; see the loading sketch after the local setup steps below):

  ```env
  OPENAI_API_KEY=your_openai_key
  SUPABASE_API_KEY=your_supabase_key
  SUPABASE_PROJ_URL=your_project_url
  PINECONE_API_KEY=your_pinecone_key
  ```
- Build the Docker image:

  ```bash
  docker build -t impersonator .
  ```

- Run with Docker, binding the container's port to the host machine's port:

  ```bash
  docker run -p 8501:8501 impersonator
  ```
- Visit `http://localhost:8501` in your browser
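If you'd rather keep secrets out of the image, Docker's `--env-file` flag can supply the same variables to the container at runtime instead:

```bash
docker run --env-file src/.env -p 8501:8501 impersonator
```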
- Navigate to the source directory:

  ```bash
  cd src/
  ```

- Install all dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the app!

  ```bash
  streamlit run script.py
  ```
- Open the localhost link printed in the CLI!
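Whichever way you run it, the keys from `.env` are read through python-dotenv (see the tech stack below). A minimal sketch of that loading step, using the variable names from the format above:

```python
# Minimal sketch: loading configuration with python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # picks up the .env file in the current (src/) directory

openai_key = os.environ["OPENAI_API_KEY"]
supabase_url = os.environ["SUPABASE_PROJ_URL"]
supabase_key = os.environ["SUPABASE_API_KEY"]
pinecone_key = os.environ["PINECONE_API_KEY"]
```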
- Validate the uploaded files
- Pre-train using the text
- Add a chat feature with history
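For the first item, a hypothetical validation sketch using Streamlit's file uploader; the size cap and messages are assumptions, not the project's current behavior:

```python
# Hypothetical upload-validation sketch (size limit is an assumption).
import streamlit as st

MAX_MB = 25  # assumed maximum upload size

uploaded = st.file_uploader("Upload a PDF", type=["pdf"])
if uploaded is not None:
    if not uploaded.name.lower().endswith(".pdf"):
        st.error("Only PDF files are supported.")
    elif uploaded.size > MAX_MB * 1024 * 1024:
        st.error(f"File is larger than {MAX_MB} MB.")
    else:
        st.success(f"{uploaded.name} uploaded successfully!")
```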
- Frontend: Streamlit with streamlit-chat for UI components
- Backend: Python with LangChain for LLM operations
- Document Processing: PyPDF for PDF parsing
- Vector Embeddings: OpenAI's embedding model (with HuggingFace support)
- Vector Storage: Supabase (with Pinecone integration ready)
- Containerization: Docker for consistent deployment
- Environment Management: python-dotenv for configuration
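Because the embedding layer is pluggable, switching from OpenAI to HuggingFace is a small change in LangChain's classic API. A sketch; the model name is an assumption, not the project's default:

```python
# Sketch: swapping the embedding backend (model name is an assumption).
from langchain.embeddings import HuggingFaceEmbeddings, OpenAIEmbeddings

# Default backend: OpenAI's embedding model
embeddings = OpenAIEmbeddings()

# Drop-in alternative backed by sentence-transformers
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```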
- Document Upload: Users upload PDF documents through the Streamlit interface
- Text Processing: Documents are parsed and split into manageable chunks
- Vector Embedding: Text chunks are converted to vector embeddings
- Storage: Embeddings are stored in Supabase's vector database
- Query Processing: User questions are processed using conversational AI
- Response Generation: LLM generates contextual responses based on retrieved information
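A minimal end-to-end sketch of this flow using LangChain's classic APIs. The Supabase table and query names, file name, and chunking parameters are assumptions, and `script.py`'s actual implementation may differ:

```python
# Minimal end-to-end sketch of the pipeline above (assumptions noted inline).
import os

from dotenv import load_dotenv
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import SupabaseVectorStore
from supabase import create_client

load_dotenv()

# 1-2. Parse the uploaded PDF and split it into overlapping chunks
pages = PyPDFLoader("example.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200  # assumed chunking parameters
).split_documents(pages)

# 3-4. Embed the chunks and store them in Supabase's vector database
supabase = create_client(
    os.environ["SUPABASE_PROJ_URL"], os.environ["SUPABASE_API_KEY"]
)
store = SupabaseVectorStore.from_documents(
    chunks,
    OpenAIEmbeddings(),
    client=supabase,
    table_name="documents",        # assumed table name
    query_name="match_documents",  # assumed Postgres function name
)

# 5-6. Process the question conversationally and generate a response
chain = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0), retriever=store.as_retriever()
)
result = chain({"question": "What is this document about?", "chat_history": []})
print(result["answer"])
```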
- File validation and error handling
- Pre-training capabilities
- Enhanced chat history with conversation memory
- Support for additional document formats
- Alternative LLM integrations
- Advanced vector database options
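For the conversation-memory item, LangChain's `ConversationBufferMemory` is one plausible building block; a sketch that continues from the pipeline example above (`store` is the vector store built there):

```python
# Sketch: adding conversation memory to the retrieval chain.
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history", return_messages=True
)
chain = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0),
    retriever=store.as_retriever(),  # `store` from the pipeline sketch above
    memory=memory,  # past turns are injected automatically
)
print(chain({"question": "Summarize the document."})["answer"])
```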
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit issues and pull requests.
Built with ❤️ by jaysqvl