This project provides a news summarization application that allows users to summarize news articles either by entering keywords or providing a URL. It uses NLP models and pipelines to generate concise summaries of the latest news. The project includes two main applications: a Streamlit app for user interaction and a FastAPI app for serving the summarization model.
- Project Overview
- Project Components
- Clone the Repository
- Run the FastAPI and Streamlit Applications
- Dockerization
- MLflow Integration
This project allows users to:
- Search by Keywords: Input keywords to find relevant news articles and summarize them.
- Enter a Link: Input a news article URL to generate a summary.
The app also uses AI to categorize the articles and provides summaries across categories such as Technology, Science, Health, and Sports.
This project consists of the following components:
- Dockerfile: Used to build a Docker image for containerizing the FastAPI and Streamlit apps.
- APP-Streamlit.py: A Streamlit-based frontend application for interacting with the news summarization system.
- APP-FastAPI.py: A FastAPI backend application that processes news summarization requests.
- RAG_News_NB.ipynb: A comprehensive news processing pipeline that integrates web scraping, text summarization, categorization, and data storage using ChromaDB for efficient querying and retrieval.
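To make the storage step of the pipeline more concrete, here is a minimal sketch of how summarized articles could be written to and queried from ChromaDB. It is not taken from the notebook: the article field names (`url`, `summary`, `category`), the collection name `news_summaries`, and both helper functions are assumptions made for illustration.

```python
import hashlib


def to_chroma_batch(articles):
    """Convert summarized articles into the ids/documents/metadatas lists
    that a ChromaDB collection's add() method accepts.

    Each article is assumed (hypothetically) to be a dict with the keys
    "url", "summary", and "category".
    """
    ids, documents, metadatas = [], [], []
    for article in articles:
        # Derive a stable id from the URL so the same article always maps to the same id.
        ids.append(hashlib.sha1(article["url"].encode("utf-8")).hexdigest())
        documents.append(article["summary"])
        metadatas.append({"url": article["url"], "category": article["category"]})
    return {"ids": ids, "documents": documents, "metadatas": metadatas}


def store_and_query(articles, question, n_results=3):
    """Store the summaries in an in-memory collection and run one query."""
    import chromadb  # imported lazily; requires `pip install chromadb`

    client = chromadb.Client()  # in-memory client, nothing is persisted
    collection = client.get_or_create_collection("news_summaries")  # hypothetical name
    collection.add(**to_chroma_batch(articles))
    # Never ask for more results than the collection holds.
    return collection.query(
        query_texts=[question],
        n_results=min(n_results, collection.count()),
    )
```

Keeping the record-building step separate from the database call makes it possible to inspect the batch without ChromaDB installed.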
To start using this project, first, clone the repository:
git clone https://github.com/Abdelrahman-Elshahed/News_Summerization_Using_RAG--Graduation_Project_DEPI.git
If you prefer not to use Docker, follow the steps below:
- Create a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install the dependencies:
  pip install -r requirements.txt
- Run the Streamlit app:
  streamlit run APP-Streamlit.py
- Run the FastAPI app:
  uvicorn APP-FastAPI:app --reload
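After starting both apps, a quick way to confirm they are up is to check that their ports accept connections. The sketch below is an optional helper, not part of the repository; it assumes both apps run locally on their default ports (8000 for uvicorn, 8501 for Streamlit), and the function name `is_listening` is hypothetical.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Default ports are assumed; adjust if you passed --port or --server.port.
    for name, port in (("FastAPI (uvicorn)", 8000), ("Streamlit", 8501)):
        state = "reachable" if is_listening("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```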
The Streamlit app provides a user-friendly interface for summarizing news articles:
- Open the Streamlit app in your browser.
- Choose one of the following options:
- Search by Keywords: Enter a keyword (e.g., "AI in healthcare") to find relevant articles and summarize them.
- Enter a Web Link: Paste a URL of a news article to summarize it.
The FastAPI app exposes a backend API to handle summarization requests. It can be accessed programmatically via HTTP requests, making it suitable for integration with other applications. It includes two main routes:
- GET /: Displays the homepage with the user form.
- POST /summarize: Accepts form data to summarize the entered news based on either keywords or a URL.
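For programmatic access, a request to the `POST /summarize` route could be built as sketched below using only the standard library. The form field names `keywords` and `url` are assumptions, since the actual names are defined in `APP-FastAPI.py`; the helper `build_summarize_request` is likewise hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_summarize_request(base_url, keywords=None, link=None):
    """Build (but do not send) a form-encoded POST request for /summarize.

    Exactly one of `keywords` or `link` must be given, mirroring the two
    input modes of the app.
    """
    if bool(keywords) == bool(link):
        raise ValueError("provide exactly one of keywords or link")
    # Field names are assumed; check APP-FastAPI.py for the real ones.
    fields = {"keywords": keywords} if keywords else {"url": link}
    body = urlencode(fields).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/summarize",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

With the FastAPI app running, the request can be sent with `urllib.request.urlopen(build_summarize_request("http://127.0.0.1:8000", keywords="AI in healthcare"))`; the shape of the response depends on how the route renders its result.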
- The Dockerfile containerizes the application: it copies the necessary files, installs the dependencies, and sets up the API server.
- Steps:
- Build the Docker image with:
docker build -t news_summarizer_app .
- Run the container with:
docker run -p 8000:8000 news_summarizer_app
- MLflow is integrated into the news processing pipeline for experiment tracking, model versioning, and performance evaluation. It logs key metrics, hyperparameters, and outputs during model training, while visualizing model improvements over time. Additionally, it tracks model versions and interacts with ChromaDB to store performance and categorization results.
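As an illustration of what experiment tracking might look like, the sketch below logs parameters, metrics, and the summary text for one summarization run. It is not the project's actual tracking code: the metric names, the run name, and both helper functions are hypothetical, and the word-count metrics are simple stand-ins for whatever the pipeline really measures.

```python
def summary_metrics(article_text, summary_text):
    """Compute simple, illustrative length metrics for one summary."""
    source_words = len(article_text.split())
    summary_words = len(summary_text.split())
    return {
        "source_words": float(source_words),
        "summary_words": float(summary_words),
        # Fraction of the original length kept by the summary.
        "compression_ratio": summary_words / source_words if source_words else 0.0,
    }


def log_summary_run(article_text, summary_text, params):
    """Record one summarization run in MLflow (local ./mlruns by default)."""
    import mlflow  # imported lazily; requires `pip install mlflow`

    with mlflow.start_run(run_name="summarization"):  # hypothetical run name
        mlflow.log_params(params)  # e.g. model name, max summary length
        mlflow.log_metrics(summary_metrics(article_text, summary_text))
        mlflow.log_text(summary_text, "summary.txt")  # stored as a run artifact
```

Runs logged this way can be browsed by starting `mlflow ui` in the same directory.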