This project classifies SMS messages as spam or not spam using several machine learning models. The dataset contains SMS messages, each labeled as spam or not spam.
- Data preprocessing: The SMS messages are preprocessed to remove noise and irrelevant information.
- Exploratory Data Analysis (EDA): Various visualizations are used to analyze the distribution of spam and not spam messages.
- Feature Engineering: Additional features such as the number of characters, words, and sentences are extracted from the SMS messages.
- Model Building: Several machine learning models, including Naive Bayes, Logistic Regression, Support Vector Machines (SVM), and Random Forest, are trained and evaluated.
- Model Improvement: Techniques such as hyperparameter tuning and ensemble methods like Voting Classifier are employed to improve model performance.
- Saving the Model: The trained model and vectorizer used for feature extraction are saved for future use.
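
The steps above can be sketched end to end in a few lines. This is a minimal illustration, not the notebook's actual code: the toy messages, the helper names (`count_features`, `preprocess`, `classify`), the regex-based sentence split, the choice of TF-IDF with a soft-voting ensemble of two models, and the file names `vectorizer.pkl` and `model.pkl` are all assumptions made for the example.

```python
import pickle
import re
import tempfile
from pathlib import Path

from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Toy data for illustration only (not the project's dataset); 1 = spam, 0 = not spam.
messages = [
    "WINNER! Claim your free prize now, call 09061701461",
    "Free entry in a weekly competition, text WIN to 80086",
    "URGENT! Your mobile number has won a cash prize",
    "Are we still meeting for lunch today?",
    "I'll call you later, in a meeting right now.",
    "Can you pick up some milk on the way home?",
]
labels = [1, 1, 1, 0, 0, 0]


def count_features(text):
    """Return (characters, words, sentences) for one message."""
    words = text.split()
    # Crude sentence split on '.', '!' or '?'; a tokenizer such as nltk's
    # could be used instead.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(text), len(words), len(sentences)


def preprocess(text):
    """Lowercase the message and keep only alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


# Turn the cleaned text into TF-IDF features.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([preprocess(m) for m in messages])

# Combine two models with soft voting (averaged class probabilities).
ensemble = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X, labels)

# Save the fitted vectorizer and model, then load them back.
# A temporary directory is used here; the file names are hypothetical.
out_dir = Path(tempfile.mkdtemp())
with open(out_dir / "vectorizer.pkl", "wb") as f:
    pickle.dump(vectorizer, f)
with open(out_dir / "model.pkl", "wb") as f:
    pickle.dump(ensemble, f)

with open(out_dir / "vectorizer.pkl", "rb") as f:
    loaded_vectorizer = pickle.load(f)
with open(out_dir / "model.pkl", "rb") as f:
    loaded_model = pickle.load(f)


def classify(text, vec=loaded_vectorizer, model=loaded_model):
    """Return 1 (spam) or 0 (not spam) for a new message."""
    return int(model.predict(vec.transform([preprocess(text)]))[0])
```

The vectorizer is saved alongside the model because new messages must be transformed with the same vocabulary the model was trained on; calling `classify("some new message")` applies both in order.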
- Python 3.x
- Libraries: numpy, pandas, matplotlib, seaborn, nltk, scikit-learn, xgboost, wordcloud
- Clone the repository:
git clone https://github.com/SyedFahad7/Spam-or-Ham.git
- Install the required libraries:
pip install -r requirements.txt
- Run the Jupyter Notebook or Python script:
jupyter notebook sms-classifier.ipynb
- Follow the instructions in the notebook/script to preprocess the data, train the models, and evaluate their performance.
- sms-classifier.ipynb: Jupyter Notebook containing the project code.
- spam_or_not_spam.csv: Dataset containing labeled SMS messages.
- README.md: Documentation providing an overview of the project, usage instructions, and file structure.
- requirements.txt: Text file listing all the required libraries and their versions.
This project is licensed under the MIT License - see the LICENSE file for details.