This project implements a machine learning model to classify SMS and email messages as either spam or non-spam (ham). It showcases proficiency in natural language processing (NLP), data preprocessing techniques, and model building using various algorithms. The classifier is deployed with a Streamlit frontend for easy interaction.
- Frontend: Streamlit
- Libraries: NLTK, scikit-learn, XGBoost (provides XGBClassifier)
- Classifiers: GaussianNB, MultinomialNB, BernoulliNB, SVC, KNeighborsClassifier, DecisionTreeClassifier, LogisticRegression, RandomForestClassifier, AdaBoostClassifier, BaggingClassifier, ExtraTreesClassifier, GradientBoostingClassifier, XGBClassifier
- Data Cleaning and Preprocessing: Lowercasing, tokenization, removal of special characters, stop words, and punctuation, followed by stemming.
- Exploratory Data Analysis (EDA): Understand data distribution and characteristics through statistical summaries and visualizations.
- Model Building: Implementation of multiple classifiers including Naive Bayes, SVM, Decision Tree, Random Forest, Logistic Regression, AdaBoost, Bagging, Extra Trees, Gradient Boosting, and XGBoost.
- Evaluation Metrics: Accuracy, precision, and confusion matrices for model evaluation.
- Deployment: Creation of a Streamlit frontend for user interaction, showcasing model predictions on new text inputs.
- Data Cleaning:
  - Lowercasing
  - Tokenization
  - Special characters removal
  - Stop words and punctuation removal
  - Stemming
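The cleaning steps listed above can be sketched as a single function. This is a minimal illustration, not the project's actual code: the name `transform_text`, the tiny `STOP_WORDS` set, and the regex tokenizer are assumptions standing in for NLTK's stop-word list and tokenizer, and stemming is left as a pluggable callable so the block runs with the standard library alone.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption); the project would use
# NLTK's much larger English list instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "you", "your",
              "have", "now", "and", "of", "in", "for"}


def transform_text(text, stem=lambda word: word):
    """Lowercase, tokenize, filter, and stem one message; return a cleaned string."""
    # 1. Lowercasing
    text = text.lower()
    # 2. Tokenization: word tokens plus single punctuation marks
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # 3. Special-character removal: keep alphanumeric tokens only
    tokens = [t for t in tokens if t.isalnum()]
    # 4. Stop-word and punctuation removal (the punctuation check is
    #    redundant after step 3 but mirrors the listed steps)
    tokens = [t for t in tokens
              if t not in STOP_WORDS and t not in string.punctuation]
    # 5. Stemming via the injected callable (identity by default)
    return " ".join(stem(t) for t in tokens)


print(transform_text("Congratulations!!! You have WON a FREE prize, call now."))
# → congratulations won free prize call
```

To get real stemming, pass NLTK's Porter stemmer as the hook, for example `transform_text(msg, stem=PorterStemmer().stem)` after `from nltk.stem.porter import PorterStemmer`.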
- EDA:
  - Statistical summaries
  - Visualizations (word clouds, histograms)
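As a rough idea of what a statistical summary for this kind of data might look like, the sketch below compares message counts and average lengths per class. The sample messages and the `summarize` helper are hypothetical; the project's own notebook may compute different statistics on its real dataset.

```python
from statistics import mean

# Hypothetical labelled messages; the real project loads its own dataset.
messages = [
    ("ham", "Are we still meeting for lunch today?"),
    ("spam", "WINNER! Claim your free prize now"),
    ("ham", "Ok, see you soon"),
    ("spam", "Free entry in a weekly competition, reply WIN"),
]


def summarize(rows):
    """Return per-label message count, mean character count, and mean word count."""
    summary = {}
    for label in sorted({lab for lab, _ in rows}):
        texts = [text for lab, text in rows if lab == label]
        summary[label] = {
            "messages": len(texts),
            "mean_chars": mean(len(t) for t in texts),
            "mean_words": mean(len(t.split()) for t in texts),
        }
    return summary


for label, stats in summarize(messages).items():
    print(label, stats)
```

The same per-class numbers are what a histogram of message lengths would show graphically.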
- Data Preprocessing:
  - Tokenization
  - Removal of special characters
  - Removal of stop words and punctuation
  - Stemming
- Model Building:
  - Implemented classifiers:
    - Naive Bayes (Gaussian, Multinomial, Bernoulli)
    - SVM (Sigmoid kernel)
    - K-Nearest Neighbors
    - Decision Tree
    - Logistic Regression
    - Random Forest
    - AdaBoost
    - Bagging
    - Extra Trees
    - Gradient Boosting
    - XGBoost
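To make one entry in the classifier list concrete, here is a from-scratch sketch of Multinomial Naive Bayes with add-one (Laplace) smoothing. It is for illustration only: the project uses scikit-learn's `MultinomialNB`, and the class name `TinyMultinomialNB` and the four training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for word in text.split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, already-preprocessed training messages.
train_texts = ["free prize call", "win free cash", "see you lunch", "call me later"]
train_labels = ["spam", "spam", "ham", "ham"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
print(model.predict("free cash prize"))  # → spam
print(model.predict("lunch later"))      # → ham
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.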
- Evaluation:
  - Accuracy scores
  - Confusion matrices
  - Precision scores
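The three metrics above can be computed directly from true and predicted labels. The sketch below uses plain Python to show the arithmetic; the helper names and the sample labels are hypothetical, and in practice scikit-learn's `accuracy_score`, `precision_score`, and `confusion_matrix` do the same job.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (tn, fp, fn, tp) counts, treating `positive` as the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    # Share of all messages that were labelled correctly
    return (tp + tn) / (tn + fp + fn + tp)


def precision(tn, fp, fn, tp):
    # Share of predicted spam that really is spam; 0.0 if nothing was flagged
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels for six messages.
y_true = ["ham", "ham", "spam", "spam", "ham", "spam"]
y_pred = ["ham", "spam", "spam", "ham", "ham", "spam"]

counts = confusion_counts(y_true, y_pred)
print(counts)  # → (2, 1, 1, 2)
print(round(accuracy(*counts), 3), round(precision(*counts), 3))  # → 0.667 0.667
```

Precision is worth tracking alongside accuracy for spam filtering because a false positive means a legitimate message gets flagged as spam.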
- Deployment:
  - Streamlit frontend for interaction
  - Input text for predictions
  - Display of predicted class (spam or ham)
- Clone or download this repository to your local machine.
- cd into the cloned folder.
- Install the virtualenv package (optional: the next step uses Python's built-in venv module, which needs no extra install):
  pip install virtualenv
- Create a virtual environment:
  python3 -m venv [Enter Folder name]
- Activate the virtual environment:
  source [virtual environment name]/bin/activate
- Install the libraries listed in requirements.txt:
  pip install -r requirements.txt
- Install ipykernel:
  pip install ipykernel
- Register a Jupyter kernel for the environment:
  ipython kernel install --user --name=[Enter kernel_name]
- Run app.py:
  streamlit run app.py
- The Streamlit app opens in your default browser. If it does not, copy the local URL printed in your terminal into any browser. The terminal output looks like this (the port number may differ):
  You can now view your Streamlit app in your browser.
  Local URL: http://localhost:8501
  Network URL: http://192.168.0.103:8501
Hurray! That's it. 🥳