Spam classifier

This project is the example used in the lecture Machine Learning 101.

The Spam classifier example creates a model that classifies emails based on their text.

Setup

Create virtualenv

virtualenv -p python3 myenv
source myenv/bin/activate

Install dependencies

make install

Run classification steps

Split dataset

split_dataset --dataset spam.csv --test-size 0.2

This step should create two files: spam_test.csv and spam_train.csv.
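
For reference, the idea behind the split can be sketched in plain Python. This is a minimal illustration and not the code behind split_dataset: the helper name split_rows, the fixed seed, and the in-memory rows are all assumptions, and reading or writing the CSV files is left out.

```python
import random


def split_rows(rows, test_size=0.2, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper for illustration; test_size is the fraction
    of rows that goes to the test set, like --test-size above.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up rows standing in for the contents of spam.csv.
rows = [("ham", f"message {i}") for i in range(10)]
train, test = split_rows(rows, test_size=0.2)
print(len(train), len(test))
```

With 10 rows and a test size of 0.2, 8 rows go to the training set and 2 to the test set.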

Create Tf-Idf model

tfidf --train spam_train.csv
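
As a rough illustration of what a Tf-Idf model stores, here is a sketch of one common variant: term frequency is the share of a message's tokens, and inverse document frequency is log(N / df). The tfidf command may use different tokenization, smoothing, or normalization; the function names fit_idf and tfidf_vector and the sample messages are assumptions made for this example.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute idf = log(N / df) for every term in the training messages."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per message, however often it appears.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(doc, idf):
    """Weight each known term of a message by tf * idf."""
    tokens = doc.lower().split()
    counts = Counter(tokens)
    # Terms not seen during fit_idf are skipped.
    return {t: (c / len(tokens)) * idf[t] for t, c in counts.items() if t in idf}


# Made-up messages standing in for the text column of spam_train.csv.
train_texts = ["win cash now", "see you at lunch", "win a free prize now"]
idf = fit_idf(train_texts)
vec = tfidf_vector("win cash", idf)
print(vec)
```

Terms that appear in fewer training messages get a higher weight, so "cash" (one message) outweighs "win" (two messages) in the resulting vector.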

Create and evaluate the model

naive_bayes --train spam_train.csv --test spam_test.csv
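
To show what training and evaluating a Naive Bayes classifier involves, here is a small self-contained sketch of a multinomial Naive Bayes with add-one (Laplace) smoothing, scored by accuracy. It is not the code behind naive_bayes: it works on raw word counts instead of Tf-Idf features to stay short, and the names train_nb and predict_nb and the sample messages are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(texts, labels):
    """Count messages per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-probability score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # class prior
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                # Add-one smoothing avoids log(0) for unseen class/word pairs.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up data standing in for spam_train.csv and spam_test.csv.
train_texts = ["win cash now", "free prize win", "see you at lunch", "meeting moved to noon"]
train_labels = ["spam", "spam", "ham", "ham"]
test_texts = ["win a free prize", "lunch at noon"]
test_labels = ["spam", "ham"]

model = train_nb(train_texts, train_labels)
predictions = [predict_nb(model, t) for t in test_texts]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once messages are longer than these toy examples.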