A machine learning-based email spam classification system using Logistic Regression. It preprocesses email content with TF-IDF and achieves high accuracy in detecting spam. The model is deployed as a Streamlit app for easy testing and interaction.
The system uses a Logistic Regression model trained on email data. Raw email text is converted into TF-IDF features, which the classifier uses to separate spam from legitimate mail. The model reaches 94.32% accuracy and is tuned to balance precision and recall, which reduces both false positives and false negatives.
- Input: Raw email text (message content).
- Output: Prediction label ("Spam" or "Not Spam").
- Preprocessing: The raw email text is transformed into numerical features using TF-IDF Vectorizer, making it ready for model input.
- Model: Logistic Regression optimized for high precision and recall on imbalanced datasets.
- Performance: 94.32% accuracy, with strong precision and recall (few false positives and few false negatives).
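
The input, preprocessing, and model steps listed above can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not the project's actual training code: the toy emails, the `classify` helper, and the `class_weight="balanced"` setting are assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, made up for illustration only (not the project's dataset).
emails = [
    "Win a free prize now, click here",
    "Limited offer: claim your free cash reward",
    "Meeting moved to 3pm, see agenda attached",
    "Can you review my pull request today?",
    "Congratulations, you won a free vacation",
    "Lunch tomorrow with the project team?",
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = not spam

# TF-IDF turns raw text into a sparse numeric matrix.
# class_weight="balanced" is one common way to handle imbalanced classes
# (an assumption here, the project may use a different approach).
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(emails, labels)


def classify(text: str) -> str:
    """Return "Spam" or "Not Spam" for a single raw email string."""
    return "Spam" if model.predict([text])[0] == 1 else "Not Spam"


print(classify("Claim your free prize"))
```

Wrapping the vectorizer and the classifier in one pipeline means the same TF-IDF vocabulary is applied at training and prediction time, so raw text can be passed straight to `classify`.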
Logistic Regression is a binary classifier that estimates the probability that an email is spam. It applies the sigmoid function to a weighted sum of the input features, mapping that sum to a value between 0 and 1. The email is labeled spam when the probability exceeds a decision threshold (commonly 0.5).
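
The sigmoid mapping described above can be shown in a few lines of plain Python. This is a sketch for intuition only; the function names and the 0.5 default threshold are assumptions, and in practice scikit-learn performs this step internally.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to the range 0..1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def label_from_score(score: float, threshold: float = 0.5) -> str:
    """Turn a raw linear score into a label by thresholding the probability."""
    return "Spam" if sigmoid(score) >= threshold else "Not Spam"


print(sigmoid(0.0))            # 0.5
print(label_from_score(2.0))   # Spam
print(label_from_score(-2.0))  # Not Spam
```

A score of 0 sits exactly on the decision boundary; positive scores lean toward spam and negative scores toward not spam.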
- Efficiency: It is computationally light and quick to train, making it ideal for real-time classification.
- Interpretability: The model's coefficients are easy to interpret, giving insights into which features are most influential in the classification process.
- Scalability: Works well with large datasets when combined with efficient feature extraction like TF-IDF.
- Python: Programming language.
- Scikit-learn: For building and training the Logistic Regression model.
- Streamlit: For creating the web-based interface for deploying the model.
- Pandas: For data handling and manipulation.
- NumPy: For numerical operations.
- TF-IDF: For transforming text data into numerical format suitable for the model.
- Clone the repository:
git clone https://github.com/your-username/email-spam-classification.git
- Navigate into the project folder:
cd email-spam-classification
- Install the required libraries:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run Streamlit_app.py
Feel free to fork this project and make contributions. Pull requests are welcome!