NewsGuardAI is a machine learning system that classifies news articles as Fake or Real. With misinformation on the rise, the project applies Natural Language Processing (NLP) and machine learning techniques to detect deceptive news. The goal is to improve classification accuracy compared to existing research.
The project follows a structured approach:
- Data Preparation and Cleaning
  - Handling missing values, removing duplicates, and preprocessing text.
- Exploratory Data Analysis (EDA) & Feature Engineering
  - Analyzing data distribution and extracting key features.
- Feature Extraction
  - Converting text into numerical representations using TF-IDF.
- Modeling (Machine Learning-based Classification)
  - Training and evaluating models like Logistic Regression, Naive Bayes, and SVM.
- Deployment using Streamlit
  - A simple web application for real-time news classification.
The dataset consists of labeled news articles, each with a title, content, and a truthfulness label (Fake or Real). The text is preprocessed before feature extraction to improve model performance.
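The cleaning steps described above (missing values, duplicates, text preprocessing) might look roughly like the sketch below. The column names `title`, `text`, and `label`, the helper names `clean_text` and `prepare`, and the specific cleaning rules are assumptions for illustration, not necessarily what the project's notebooks do.

```python
import re

import pandas as pd


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and non-letter characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step; column names are assumed, not from the repo."""
    df = df.dropna(subset=["text", "label"]).copy()    # rows without content or label are unusable
    df["title"] = df["title"].fillna("")               # a missing title is tolerable
    df = df.drop_duplicates(subset=["title", "text"])  # remove repeated articles
    df["clean"] = (df["title"] + " " + df["text"]).map(clean_text)
    return df[df["clean"] != ""].reset_index(drop=True)


# Tiny made-up sample: one duplicate row, one missing title, one missing text.
sample = pd.DataFrame({
    "title": ["Breaking!", "Breaking!", None, "Budget"],
    "text": [
        "Aliens land at http://x.co NOW",
        "Aliens land at http://x.co NOW",
        "Council votes on budget.",
        None,
    ],
    "label": ["Fake", "Fake", "Real", "Real"],
})
cleaned = prepare(sample)
print(cleaned[["clean", "label"]])
```

Concatenating title and content into a single `clean` column is one possible design choice; keeping them as separate features is equally reasonable.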
- Programming Language: Python
- Libraries: Pandas, NumPy, Scikit-Learn, NLTK, SciPy
- Modeling Techniques: Logistic Regression, Naive Bayes, SVM
- Deployment Framework: Streamlit
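As a rough illustration of how these pieces fit together, the sketch below chains TF-IDF feature extraction with Logistic Regression in a single Scikit-Learn pipeline. The toy sentences, the `ngram_range` and `stop_words` settings, and the variable names are assumptions chosen for the example; the project's actual hyperparameters may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training examples, only to make the sketch runnable.
texts = [
    "shocking miracle cure doctors hate revealed",
    "celebrity secretly replaced by clone insiders say",
    "you will not believe this one weird trick",
    "city council approves annual budget after public hearing",
    "central bank holds interest rates steady this quarter",
    "university publishes peer reviewed study on crop yields",
]
labels = ["Fake", "Fake", "Fake", "Real", "Real", "Real"]

# TF-IDF turns each article into a sparse weighted term vector;
# the classifier is then trained on those vectors.
model = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

pred = model.predict(["council publishes budget study"])
print(pred[0])
```

Swapping `LogisticRegression` for `MultinomialNB` (from `sklearn.naive_bayes`) or `LinearSVC` (from `sklearn.svm`) leaves the rest of the pipeline unchanged, which makes it easy to compare the three model families on the same features.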
git clone https://github.com/yourusername/NewsGuardAI.git
cd NewsGuardAI
pip install -r requirements.txt
streamlit run Streamlit_Deployment.py
The model is evaluated using:
- F1-score (Primary metric)
- Accuracy
- Confusion Matrix
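The three metrics above can be computed with `sklearn.metrics`, as in this minimal sketch. The label vectors are invented for the example, and treating `Fake` as the positive class for the F1-score is an assumption rather than something the project states.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Invented ground truth and predictions, for illustration only.
y_true = ["Fake", "Fake", "Fake", "Real", "Real", "Real"]
y_pred = ["Fake", "Fake", "Real", "Real", "Real", "Fake"]

acc = accuracy_score(y_true, y_pred)
# Assumption: "Fake" is the positive class for the F1-score.
f1 = f1_score(y_true, y_pred, pos_label="Fake")
# Rows are true labels, columns are predicted labels, in the order given.
cm = confusion_matrix(y_true, y_pred, labels=["Fake", "Real"])

print(f"accuracy={acc:.3f}  f1={f1:.3f}")
print(cm)
```

Here two of three Fake articles are caught and one Real article is misflagged, so precision and recall for Fake are both 2/3, and the confusion matrix shows where the two errors fall.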
This is a team project for the semester.
- Integrate deep learning models for improved accuracy.
- Expand dataset sources for better generalization.
- Optimize feature engineering techniques.
Feel free to contribute and improve NewsGuardAI! 🚀