We all receive a lot of email every day, and some of it is irrelevant or unwanted. Such messages are called "spam", while legitimate messages are called "ham". Would you like to know which emails are spam and which are ham?
The dataset consists of two classes, "ham" and "spam", with 4825 ham samples and 747 spam samples. It is heavily imbalanced: there are more than six ham samples for every spam sample.
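To make the imbalance concrete, here is a minimal sketch that uses only the two class counts given above. It computes each class's share and the accuracy of a trivial baseline that always answers with the majority class. The variable names are illustrative and not taken from the project.

```python
# Class counts as reported for this dataset.
counts = {"ham": 4825, "spam": 747}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A trivial "classifier" that always answers with the most frequent class.
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"always-'{majority_label}' baseline accuracy: {baseline_accuracy:.1%}")
```

Because always answering "ham" is already right for about 86.6% of the samples, an accuracy figure is only informative when it is clearly above that baseline. Per-class metrics such as precision and recall for the spam class are worth checking as well.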
The following two figures show word cloud representations of the spam and ham classes.
We trained the following machine learning algorithms on the dataset:
- Naive Bayes
- Support Vector Machine
- K-Nearest Neighbors (KNN)
- Decision Tree
- Random Forest
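The five algorithms listed above could be set up with scikit-learn roughly as follows. This is a sketch under assumptions, not the project's actual training code. The classifier settings mirror the model names printed in the prediction output, but the `CountVectorizer` step, the `random_state=0` arguments, and the tiny made-up corpus are assumptions added only so the example runs on its own.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up corpus (assumption): a stand-in for the real dataset.
texts = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "urgent offer win cash now",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
    "can you send me the report",
    "thanks for lunch see you soon",
]
labels = ["spam"] * 4 + ["ham"] * 4

# random_state is an assumption, added only for reproducible runs.
classifiers = [
    MultinomialNB(),
    SVC(C=1000, gamma=0.001),
    KNeighborsClassifier(n_neighbors=3),
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(random_state=0),
]

# Each model gets its own bag-of-words vectorizer (assumed preprocessing).
models = {}
for clf in classifiers:
    pipeline = make_pipeline(CountVectorizer(), clf)
    pipeline.fit(texts, labels)
    models[type(clf).__name__] = pipeline

# Ask every trained model about the same message.
message = "free prize offer"
predictions = {name: model.predict([message])[0] for name, model in models.items()}
for name, label in predictions.items():
    print(name, label)
```

Wrapping the vectorizer and the classifier in one pipeline means a new message goes through exactly the same preprocessing as the training data. With a corpus this small the predictions are only a smoke test and say nothing about real accuracy.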
The accuracy of each algorithm is shown below.
You can also make predictions with every algorithm at once, or choose a single algorithm for prediction.
MultinomialNB() This is a Real email
SVC(C=1000, gamma=0.001) This is a Real email
KNeighborsClassifier(n_neighbors=3) This is a Real email
DecisionTreeClassifier() This is a Real email
RandomForestClassifier() This is a Real email