This project demonstrates the use of Naive Bayes, a classification algorithm.
It identifies the authors of emails by analyzing the predefined email data sets (email_authors, word_data).
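To make the idea concrete, here is a minimal sketch of a Naive Bayes text classifier written in plain Python. It is not the code from this project (which relies on scikit-learn); the function names, the toy emails, and the labels author_a and author_b are all made up for illustration. It assumes a multinomial model with add-one smoothing, which is one common variant.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count words per author and emails per author (hypothetical helper)."""
    word_counts = defaultdict(Counter)   # author -> word -> count
    doc_counts = Counter(labels)         # author -> number of emails
    vocab = set()
    for text, label in zip(docs, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, doc_counts, vocab


def predict(text, word_counts, doc_counts, vocab):
    """Return the author with the highest log-probability for the text."""
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # log prior: how often this author appears in the training data
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # add-one (Laplace) smoothing avoids log(0) for unseen pairs
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this example only
train_docs = [
    "please review the quarterly trading report",
    "the trading desk needs the report today",
    "lunch on friday sounds great",
    "see you at lunch friday",
]
train_labels = ["author_a", "author_a", "author_b", "author_b"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict("trading report due today", *model)
print(prediction)
```

The classifier picks the author whose word frequencies best explain the email, treating each word as independent of the others. That independence assumption is the "naive" part.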
NOTE: -> To run this project you need Python 3 or later.
-> We will use pip to install some packages. First get and install pip from https://pip.pypa.io/en/latest/installing/. Then use pip to install the required Python packages:
go to your terminal (don't open Python, just the command prompt)
install scikit-learn (sklearn): pip install scikit-learn
-- for reference, the scikit-learn installation instructions: https://scikit-learn.org/stable/install.html
install the Natural Language Toolkit: pip install nltk
-> You will need git to clone the repository: git clone https://github.com/udacity/ud120-projects.git
-> After cloning the repo, keep the files (email_authors, word_data, startup, email_preprocess) in a separate folder named tools.
-> Go into the tools/ directory and run startup.py. It first checks for the required Python modules, then downloads and unzips a large dataset that we will use heavily later. This will take a couple of minutes.
-> As the final step, run the nb_author_id.py file.
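
If a step above fails, a quick environment check can help narrow down the cause. The sketch below is an assumption about what such a check might look like, not the contents of startup.py; the helper names missing_modules and python_is_recent_enough are hypothetical. It only verifies the Python version and whether the two packages installed earlier can be found under their import names (sklearn and nltk).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent_enough(minimum=(3, 0)):
    """True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum


# Import names of the packages installed with pip in the steps above
required = ["sklearn", "nltk"]

print("Python 3 or later:", python_is_recent_enough())
print("Missing modules:", missing_modules(required))
```

An empty list of missing modules means both packages are visible to the interpreter you ran the check with. If a package shows up as missing even though pip reported success, pip may have installed it for a different Python installation.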