A document-based chatbot built with LangChain and OpenAI that lets you have intelligent conversations with your documents. Upload PDFs and interact with their content through natural language queries.
- Document Intelligence: Upload PDFs and have natural conversations about their content
- Smart Text Processing: Advanced chunking algorithms for optimal context preservation
- Vector Database Integration: Efficient document embedding storage with Supabase
- Flexible Architecture: Support for multiple LLMs (OpenAI, HuggingFace) and Vector DBs (Supabase, Pinecone)
- Containerized Deployment: Ready for production with Docker support
- Interactive UI: Built with Streamlit for a seamless user experience
- Clone the repository
- Open a terminal in the impersonator folder
- Create a `.env` file in the `src/` directory with the following format (the app reads these keys via python-dotenv; see the loading sketch after the local setup steps below):

  ```env
  OPENAI_API_KEY=your_openai_key
  SUPABASE_API_KEY=your_supabase_key
  SUPABASE_PROJ_URL=your_project_url
  PINECONE_API_KEY=your_pinecone_key
  ```
- Build the Docker image:

  ```bash
  docker build -t impersonator .
  ```

- Run with Docker, binding the container's port to the host machine's port:

  ```bash
  docker run -p 8501:8501 impersonator
  ```
- Visit `http://localhost:8501` in your browser
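If you'd rather keep secrets out of the image, Docker's `--env-file` flag can supply the same variables to the container at runtime instead:

```bash
docker run --env-file src/.env -p 8501:8501 impersonator
```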
- Navigate to the source directory:

  ```bash
  cd src/
  ```

- Install all dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the app!

  ```bash
  streamlit run script.py
  ```
- Open the localhost link printed in the CLI!
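Whichever way you run it, the keys from `.env` are read through python-dotenv (see the tech stack below). A minimal sketch of that loading step, using the variable names from the format above:

```python
# Minimal sketch: loading configuration with python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # picks up the .env file in the current (src/) directory

openai_key = os.environ["OPENAI_API_KEY"]
supabase_url = os.environ["SUPABASE_PROJ_URL"]
supabase_key = os.environ["SUPABASE_API_KEY"]
pinecone_key = os.environ["PINECONE_API_KEY"]
```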
- Validate the uploaded files
- Pre-train using the text
- Add a chat feature with history
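For the first item, a hypothetical validation sketch using Streamlit's file uploader; the size cap and messages are assumptions, not the project's current behavior:

```python
# Hypothetical upload-validation sketch (size limit is an assumption).
import streamlit as st

MAX_MB = 25  # assumed maximum upload size

uploaded = st.file_uploader("Upload a PDF", type=["pdf"])
if uploaded is not None:
    if not uploaded.name.lower().endswith(".pdf"):
        st.error("Only PDF files are supported.")
    elif uploaded.size > MAX_MB * 1024 * 1024:
        st.error(f"File is larger than {MAX_MB} MB.")
    else:
        st.success(f"{uploaded.name} uploaded successfully!")
```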
- Frontend: Streamlit with streamlit-chat for UI components
- Backend: Python with LangChain for LLM operations
- Document Processing: PyPDF for PDF parsing
- Vector Embeddings: OpenAI's embedding model (with HuggingFace support)
- Vector Storage: Supabase (with Pinecone integration ready)
- Containerization: Docker for consistent deployment
- Environment Management: python-dotenv for configuration
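Because the embedding layer is pluggable, switching from OpenAI to HuggingFace is a small change in LangChain's classic API. A sketch; the model name is an assumption, not the project's default:

```python
# Sketch: swapping the embedding backend (model name is an assumption).
from langchain.embeddings import HuggingFaceEmbeddings, OpenAIEmbeddings

# Default backend: OpenAI's embedding model
embeddings = OpenAIEmbeddings()

# Drop-in alternative backed by sentence-transformers
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```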
- Document Upload: Users upload PDF documents through the Streamlit interface
- Text Processing: Documents are parsed and split into manageable chunks
- Vector Embedding: Text chunks are converted to vector embeddings
- Storage: Embeddings are stored in Supabase's vector database
- Query Processing: User questions are processed using conversational AI
- Response Generation: LLM generates contextual responses based on retrieved information
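A minimal end-to-end sketch of this flow using LangChain's classic APIs. The Supabase table and query names, file name, and chunking parameters are assumptions, and `script.py`'s actual implementation may differ:

```python
# Minimal end-to-end sketch of the pipeline above (assumptions noted inline).
import os

from dotenv import load_dotenv
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import SupabaseVectorStore
from supabase import create_client

load_dotenv()

# 1-2. Parse the uploaded PDF and split it into overlapping chunks
pages = PyPDFLoader("example.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200  # assumed chunking parameters
).split_documents(pages)

# 3-4. Embed the chunks and store them in Supabase's vector database
supabase = create_client(
    os.environ["SUPABASE_PROJ_URL"], os.environ["SUPABASE_API_KEY"]
)
store = SupabaseVectorStore.from_documents(
    chunks,
    OpenAIEmbeddings(),
    client=supabase,
    table_name="documents",        # assumed table name
    query_name="match_documents",  # assumed Postgres function name
)

# 5-6. Process the question conversationally and generate a response
chain = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0), retriever=store.as_retriever()
)
result = chain({"question": "What is this document about?", "chat_history": []})
print(result["answer"])
```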
- File validation and error handling
- Pre-training capabilities
- Enhanced chat history with conversation memory
- Support for additional document formats
- Alternative LLM integrations
- Advanced vector database options
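For the conversation-memory item, LangChain's `ConversationBufferMemory` is one plausible building block; a sketch that continues from the pipeline example above (`store` is the vector store built there):

```python
# Sketch: adding conversation memory to the retrieval chain.
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history", return_messages=True
)
chain = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0),
    retriever=store.as_retriever(),  # `store` from the pipeline sketch above
    memory=memory,  # past turns are injected automatically
)
print(chain({"question": "Summarize the document."})["answer"])
```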
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit issues and pull requests.
Built with ❤️ by jaysqvl