Email Spam Classification

A machine-learning email spam classifier built on Logistic Regression. Email text is converted into TF-IDF features, and the trained model is deployed as a Streamlit app for interactive testing.

Project Overview:

The system trains a Logistic Regression model on labeled email data. Raw email text is converted into TF-IDF features, which weight each term by how often it appears in a message relative to how common it is across all messages. The model reaches 94.32% accuracy and is tuned for both precision and recall, to reduce false positives (legitimate mail flagged as spam) and false negatives (spam that gets through).
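To make the metrics above concrete, here is a minimal sketch of how accuracy, precision, and recall can be computed with scikit-learn. The labels are made up for illustration (1 = spam, 0 = not spam) and are not the project's evaluation data, so the numbers do not correspond to the reported 94.32%.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical labels for illustration only: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)    # (tp + tn) / total
precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)

print(f"false positives={fp}, false negatives={fn}")
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
```

Precision falls when legitimate mail is flagged as spam, and recall falls when spam is missed, which is why both are worth tracking alongside accuracy.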

Key Features:

Input: Raw email text (message content).
Output: Prediction label ("Spam" or "Not Spam").
Preprocessing: The raw email text is transformed into numerical features with a TF-IDF vectorizer before it is passed to the model.
Model: Logistic Regression, tuned for high precision and recall on imbalanced datasets.
Performance: 94.32% accuracy, with strong precision and recall.
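The input, preprocessing, model, and output steps listed above can be sketched as a single scikit-learn pipeline. This is an illustrative sketch, not the project's training code: the six-message corpus is made up, `classify` is a hypothetical helper name, and `class_weight="balanced"` is one assumed way of handling class imbalance.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus for illustration; 1 = spam, 0 = not spam.
texts = [
    "win free prize money now",
    "free cash prize claim now",
    "claim your free money offer",
    "meeting schedule for monday project",
    "project report attached for review",
    "lunch meeting monday with team",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns raw text into numeric features; Logistic Regression classifies them.
# class_weight="balanced" is an assumption, not a confirmed project setting.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(texts, labels)


def classify(text: str) -> str:
    """Return the prediction label for one raw email text (hypothetical helper)."""
    return "Spam" if model.predict([text])[0] == 1 else "Not Spam"


print(classify("free prize money"))
print(classify("project meeting monday"))
```

Keeping the vectorizer and the classifier in one pipeline means the same preprocessing is applied at training time and at prediction time.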

Logistic Regression Model:

Logistic Regression is a binary classifier that estimates the probability that an email is spam. It applies the sigmoid function to a weighted sum of the input features, which maps the result to a value between 0 and 1. The email is labeled spam when that probability reaches a decision threshold (commonly 0.5).
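As a rough illustration of the sigmoid step, the sketch below maps a raw score `z` (the weighted sum of features plus a bias) to a probability and then to a label. The function names and the 0.5 default threshold are assumptions for illustration, not values taken from this project.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in [0, 1]."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_label(z: float, threshold: float = 0.5) -> str:
    """Label a score as spam when its probability reaches the threshold."""
    return "Spam" if sigmoid(z) >= threshold else "Not Spam"


print(sigmoid(0.0))         # a score of zero sits exactly on the boundary
print(predict_label(2.0))   # positive score, probability above 0.5
print(predict_label(-2.0))  # negative score, probability below 0.5
```

Raising the threshold trades recall for precision: fewer legitimate emails are flagged, but more spam gets through.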

Why Logistic Regression?

Efficiency: It is computationally light and quick to train, making it ideal for real-time classification.
Interpretability: The model's coefficients are easy to interpret, giving insights into which features are most influential in the classification process.
Scalability: Works well with large datasets when combined with efficient feature extraction like TF-IDF.

Technologies Used:

Python: Programming language.
scikit-learn: For building and training the Logistic Regression model.
Streamlit: For building the web interface through which the model is deployed.
Pandas: For data handling and manipulation.
NumPy: For numerical operations.
TF-IDF: For transforming text data into a numerical format suitable for the model.

Installation and Setup:

Clone the repository:

```
git clone https://github.com/your-username/email-spam-classification.git
```

Navigate into the project folder:
```
cd email-spam-classification
```
Install the required libraries:
```
pip install -r requirements.txt
```
Run the Streamlit app:
```
streamlit run Streamlit_app.py
```

Deployed App

Contributing:

Feel free to fork this project and make contributions. Pull requests are welcome!

Talenteddolly/Email-Spam-Classification
