Research Flow is a multi-agent research tool built with LangGraph and Airflow. It aims to provide an end-to-end research workflow: an Airflow pipeline parses documents and stores their vectors for search, and a multi-agent interface lets users conduct research over them.
- Document parsing and vector storage pipeline
- Multi-agent system for document-based research
- User interaction interface for conducting research
- Export functionality for research findings
Airflow Pipeline
- Use Docling for parsing documents
- Configure Docling to process the provided dataset
- Extract text and export structured information
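The parsing step above can be sketched as follows. This is a minimal illustration, assuming Docling's `DocumentConverter` is used with default settings; the `parsed/` output folder and the `output_path_for` helper are hypothetical and not taken from this repository.

```python
from pathlib import Path


def parse_pdf_to_markdown(pdf_path: str) -> str:
    """Convert one PDF to Markdown text using Docling."""
    # Imported lazily so the path helper below works without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(pdf_path)
    return result.document.export_to_markdown()


def output_path_for(pdf_path: str, out_dir: str = "parsed") -> Path:
    """Hypothetical naming rule: <out_dir>/<pdf file stem>.md."""
    return Path(out_dir) / (Path(pdf_path).stem + ".md")
```

A caller would write the returned Markdown to `output_path_for(pdf_path)` before handing the text to the embedding step.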
Vector Storage with Pinecone
- Store parsed document vectors in Pinecone for fast and scalable similarity search
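As a rough sketch of the storage step, the helpers below pair text chunks with their embeddings and upsert them in batches. The record layout (`id`, `values`, `metadata`) follows the dictionary form that the Pinecone client's `upsert` accepts; the ID scheme, the metadata fields, and the batch size are assumptions made for illustration.

```python
from typing import Iterator, List, Sequence


def build_records(doc_id: str, chunks: Sequence[str],
                  embeddings: Sequence[Sequence[float]]) -> List[dict]:
    """Pair each chunk with its embedding in a Pinecone-style record."""
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    return [
        {
            "id": f"{doc_id}-{i}",  # hypothetical ID scheme
            "values": list(vector),
            "metadata": {"doc_id": doc_id, "text": text},
        }
        for i, (text, vector) in enumerate(zip(chunks, embeddings))
    ]


def batched(items: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_document(index, doc_id: str, chunks, embeddings,
                    batch_size: int = 100) -> int:
    """Upsert all records for one document; returns the record count.

    `index` is assumed to be a Pinecone index handle, for example
    Pinecone(api_key=...).Index("<index name>").
    """
    records = build_records(doc_id, chunks, embeddings)
    for batch in batched(records, batch_size):
        index.upsert(vectors=batch)
    return len(records)
```

Storing the chunk text in the metadata keeps retrieval simple, since a query result can be shown to the agent without a second lookup.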
Pipeline Automation
- Build an Airflow pipeline integrating Docling and Pinecone
- Automate the document parsing and vector storage process
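One way to wire these steps together is a three-task DAG. The sketch below assumes Airflow 2.4 or later in the 2.x line (for the `schedule` argument); the DAG ID, the task names, and the placeholder task bodies are hypothetical and do not reflect the contents of `dags/pipeline.py`.

```python
from datetime import datetime


def download_pdfs() -> str:
    # Placeholder: fetch source PDFs into a local folder.
    return "downloaded"


def parse_with_docling() -> str:
    # Placeholder: run Docling over the downloaded PDFs.
    return "parsed"


def index_in_pinecone() -> str:
    # Placeholder: embed the parsed text and upsert it into Pinecone.
    return "indexed"


# Tasks in execution order: (task_id, callable).
TASKS = [
    ("download_pdfs", download_pdfs),
    ("parse_with_docling", parse_with_docling),
    ("index_in_pinecone", index_in_pinecone),
]


def build_dag():
    """Create the DAG; Airflow is imported lazily so this file imports without it."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="research_flow_ingest",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule=None,  # run only when triggered manually
        catchup=False,
    ) as dag:
        operators = [
            PythonOperator(task_id=task_id, python_callable=fn)
            for task_id, fn in TASKS
        ]
        # Chain each task to the next one in the list.
        for upstream, downstream in zip(operators, operators[1:]):
            upstream >> downstream
    return dag
```

In an actual DAG file the result would need to be assigned at module level (for example `dag = build_dag()`), because Airflow discovers DAG objects in the module's global namespace.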
Agent Setup
- Use Pinecone for vector storage and retrieval
- Use LangGraph to create a multi-agent system for document-based research
- Implement the following agents:
  - Document Selection Agent
  - arXiv Agent
  - Web Search Agent
  - RAG (Retrieval-Augmented Generation) Agent
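The agent layout above might be wired up roughly as follows. This is a sketch under stated assumptions: the keyword-based router and the stub agents are placeholders invented for illustration (a real router would more likely ask an LLM to choose), document selection is assumed to happen before the graph runs, and none of this mirrors the code in `backend/research_agent`.

```python
from typing import TypedDict


class ResearchState(TypedDict, total=False):
    question: str
    answer: str


def route_question(state: ResearchState) -> str:
    """Toy router: pick an agent name from keywords in the question."""
    text = state["question"].lower()
    if "arxiv" in text or "paper" in text:
        return "arxiv"
    if "latest" in text or "news" in text:
        return "web_search"
    return "rag"


def make_agent(name: str):
    """Return a stub node that tags the answer with the agent's name."""
    def agent(state: ResearchState) -> dict:
        return {"answer": f"[{name}] would answer: {state['question']}"}
    return agent


def build_graph():
    """Build and compile the graph; LangGraph is imported lazily."""
    from langgraph.graph import END, START, StateGraph

    graph = StateGraph(ResearchState)
    names = ("arxiv", "web_search", "rag")
    for name in names:
        graph.add_node(name, make_agent(name))
        graph.add_edge(name, END)
    # The router's return value selects which agent node runs.
    graph.add_conditional_edges(START, route_question, {n: n for n in names})
    return graph.compile()
```

With LangGraph installed, `build_graph().invoke({"question": "..."})` returns the final state, including the `answer` written by whichever agent was selected.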
User Interaction Interface
- Use Streamlit to create the user interface
- Allow users to ask 5-6 questions per document
- Save results of each research session
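The question limit and session saving could be handled by a small container like the one below. It is only a sketch: the class name, the cap of 6 (the upper end of the 5-6 range), and the JSON layout are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field

MAX_QUESTIONS = 6  # assumed cap, taken from the upper end of the 5-6 range


@dataclass
class ResearchSession:
    document_id: str
    qa_pairs: list = field(default_factory=list)

    def can_ask(self) -> bool:
        """True while the per-document question limit has not been reached."""
        return len(self.qa_pairs) < MAX_QUESTIONS

    def record(self, question: str, answer: str) -> None:
        """Store one question/answer pair, enforcing the limit."""
        if not self.can_ask():
            raise RuntimeError("question limit reached for this document")
        self.qa_pairs.append({"question": question, "answer": answer})

    def to_json(self) -> str:
        """Serialize the session so it can be saved or exported later."""
        return json.dumps(asdict(self), indent=2)
```

In a Streamlit app an instance like this would typically be kept in `st.session_state` so that it survives script reruns.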
Export Results
- Generate a professional PDF report of research findings
- Structure findings in a Codelabs format for instructional clarity and future reference
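For the Codelabs export, one possible approach is to render each question and answer as a step in a Markdown file. The sketch below assumes the Markdown layout accepted by the `claat` tool (metadata lines, a title heading, then one `##` heading and a `Duration` line per step); the function name, the fixed duration, and the default author are hypothetical.

```python
from typing import Iterable, Tuple


def to_codelab_markdown(title: str, codelab_id: str,
                        qa_pairs: Iterable[Tuple[str, str]],
                        authors: str = "Research Flow") -> str:
    """Render question/answer pairs as Codelabs-style Markdown."""
    lines = [
        f"summary: {title}",
        f"id: {codelab_id}",
        "status: Draft",
        f"authors: {authors}",
        "",
        f"# {title}",
    ]
    for number, (question, answer) in enumerate(qa_pairs, start=1):
        lines += [
            "",
            f"## Question {number}",
            "Duration: 0:02:00",  # placeholder estimate per step
            "",
            f"**{question}**",
            "",
            answer,
        ]
    return "\n".join(lines) + "\n"
```

The same list of pairs could feed the PDF report, so both exports stay consistent with what the user saw in the session.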
Clone the repository:
git clone https://github.com/BigDataIA-Fall2024-TeamA2/Assignment4
Install dependencies:
poetry install
Set up environment variables:
cp .env.example .env
Edit the .env file with your configuration details.
Start the Airflow services:
docker-compose -f docker-compose-airflow.yaml up -d
Access the Airflow UI at http://localhost:8080 to monitor and manage the pipeline.
Run the frontend application:
streamlit run app.py
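Before starting the services, it can help to confirm that the values from the .env file are actually present in the environment. The check below is a small sketch; the variable names are hypothetical examples, since the real keys are defined in `.env.example`.

```python
import os

# Hypothetical names; replace with the keys listed in .env.example.
REQUIRED_VARS = ("PINECONE_API_KEY", "OPENAI_API_KEY")


def missing_settings(env=None) -> list:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```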
- Access the user interface through the provided URL.
- Select a document for research.
- Use the multi-agent system to conduct research and ask questions.
- Export your research findings as a PDF report or in Codelabs format.
For detailed instructions on setup, usage, and accessing the deployed applications, please refer to our comprehensive User Guide.
WE ATTEST THAT WE HAVEN’T USED ANY OTHER STUDENTS’ WORK IN OUR ASSIGNMENT AND ABIDE BY THE POLICIES LISTED IN THE STUDENT HANDBOOK
Contribution:
a. Gopi Krishna Gorle: 33%
b. Pranali Chipkar: 33%
c. Mubin Modi: 33%
.
├── README.md
├── app.py
├── architecture
│   ├── assignment_4_architecture.png
│   └── generate_diagrams.py
├── backend
│   ├── __init__.py
│   ├── config.py
│   ├── database
│   │   ├── __init__.py
│   │   ├── articles.py
│   │   └── users.py
│   ├── logging.conf
│   ├── main.py
│   ├── research_agent
│   │   ├── __init__.py
│   │   ├── edges.py
│   │   ├── generate_chain.py
│   │   ├── grader.py
│   │   ├── graph.py
│   │   ├── nodes.py
│   │   └── vector_store.py
│   ├── schemas
│   │   ├── __init__.py
│   │   ├── articles.py
│   │   ├── auth.py
│   │   ├── chat.py
│   │   └── users.py
│   ├── server.py
│   ├── services
│   │   ├── __init__.py
│   │   ├── articles.py
│   │   ├── auth.py
│   │   ├── auth_bearer.py
│   │   ├── chat.py
│   │   └── users.py
│   ├── utils.py
│   └── views
│       ├── __init__.py
│       ├── articles.py
│       ├── auth.py
│       ├── chat.py
│       └── users.py
├── backend.Dockerfile
├── config
├── dags
│   ├── articles.py
│   ├── downloaded_pdfs
│   ├── pdf_download.py
│   ├── pdf_processor_indexer.py
│   └── pipeline.py
├── docker-compose-airflow.yaml
├── docker-compose-app.yml
├── frontend
│   ├── __init__.py
│   ├── config.py
│   ├── pages
│   │   ├── __init__.py
│   │   ├── chat.py
│   │   ├── list_docs.py
│   │   ├── user_creation.py
│   │   └── user_login.py
│   └── utils
│       ├── __init__.py
│       ├── api_utils.py
│       ├── auth.py
│       └── chat.py
├── frontend.Dockerfile
├── graph.png
├── poetry.lock
├── pyproject.toml
└── video
    └── video.mov