This README.md explains how to run the code submitted for the CO553 Intro to Machine Learning coursework 1: Decision Trees, from the group comprising Niusha Alavi, Olle Nilsson, Paween Pitimanaaree and Alfred Tingey.
- desicion_tree.py contains the implementation of the algorithm for training decision trees, including the decision_tree_learning(dataset, depth) function required by the spec.
- evaluation.py contains the implementation of the cross-validation and pruning algorithms, including the evaluate(test_db, trained_tree) function required by the spec.
- visualize.py contains the implementation of decision tree visualisation.
- util.py contains utilities, such as functions for importing data.
- All other .py files are set up to run the code that replicates the results in our report and to allow the marker to run our implementation on other datasets.
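For orientation, here is a minimal sketch of what a function with the decision_tree_learning(dataset, depth) signature might look like. It is not a copy of the submitted code. It assumes the dataset is a NumPy array whose last column is the class label, that splits are chosen by information gain, and that nodes are plain dicts; the helpers entropy and find_split and the dict keys are hypothetical names chosen for illustration.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def find_split(dataset):
    """Return the (attribute, threshold) pair with the highest information gain."""
    labels = dataset[:, -1]
    base = entropy(labels)
    best_gain, best_attr, best_thr = -1.0, None, None
    for attr in range(dataset.shape[1] - 1):
        values = np.unique(dataset[:, attr])  # sorted distinct values
        # Candidate thresholds: midpoints between consecutive distinct values.
        for thr in (values[:-1] + values[1:]) / 2:
            left = labels[dataset[:, attr] < thr]
            right = labels[dataset[:, attr] >= thr]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if gain > best_gain:
                best_gain, best_attr, best_thr = gain, attr, thr
    return best_attr, best_thr


def decision_tree_learning(dataset, depth):
    """Recursively build a tree; return (node, maximum depth reached)."""
    labels = dataset[:, -1]
    if len(np.unique(labels)) == 1:
        return {"leaf": True, "label": labels[0]}, depth
    attr, thr = find_split(dataset)
    if attr is None:
        # No attribute can separate the samples: fall back to the majority label.
        values, counts = np.unique(labels, return_counts=True)
        return {"leaf": True, "label": values[np.argmax(counts)]}, depth
    left, l_depth = decision_tree_learning(dataset[dataset[:, attr] < thr], depth + 1)
    right, r_depth = decision_tree_learning(dataset[dataset[:, attr] >= thr], depth + 1)
    node = {"leaf": False, "attribute": attr, "value": thr,
            "left": left, "right": right}
    return node, max(l_depth, r_depth)


# Toy usage: one feature column, one label column.
toy = np.array([[1.0, 0], [2.0, 0], [3.0, 1], [4.0, 1]])
tree, max_depth = decision_tree_learning(toy, 0)
print(tree["attribute"], tree["value"], max_depth)
```

The dict-based node layout is only one option; the submitted implementation may represent nodes differently.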
After extracting the zip file, navigate to the extracted folder in the terminal.
To run the code that we used to calculate the results in the report, run
python3 main.py
This will first run 10-fold cross-validation on both the clean and noisy datasets, and then run 10-fold cross-validation with pruning on the clean and noisy datasets. Results are printed to the terminal as they are calculated, and the averages are displayed at the end. Note that the dataset is shuffled into a random order on each run, so results may not exactly match those in the report.
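The shuffling mentioned above is why results vary between runs. The sketch below shows one way a shuffled 10-fold split can be produced with NumPy. It is an illustration only: the function name k_fold_indices and the seed parameter are hypothetical and are not part of the submitted scripts.

```python
import numpy as np


def k_fold_indices(n_samples, k=10, seed=None):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split (k >= 2)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)   # random order of row indices
    folds = np.array_split(shuffled, k)     # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx


# With seed=None every run shuffles differently; a fixed seed repeats the split.
for train_idx, test_idx in k_fold_indices(25, k=10, seed=0):
    print(len(train_idx), len(test_idx))
```

Each sample lands in exactly one test fold, so averaging the per-fold accuracies uses every sample for testing once.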
To run 10-fold cross-validation on a new dataset, move the file containing the dataset to the extracted folder. Then run
python3 run_crossvalidation_on_unseen_data.py <yourdataset>
Be sure to replace <yourdataset> with your test dataset's filename and extension (for example: data.txt). This will run 10-fold cross-validation on your new dataset and print the results at the end.
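Before running the scripts on a new file, it can help to check that the file parses as a numeric table. The sketch below assumes a whitespace-separated text file with feature columns followed by a label column in the last position; this README does not state the expected format, so treat that layout and the load_dataset name as assumptions rather than a description of util.py.

```python
import io

import numpy as np


def load_dataset(source):
    """Load a whitespace-separated numeric table; the last column is the label."""
    data = np.loadtxt(source, ndmin=2)  # ndmin=2 keeps single-row files 2-D
    if data.shape[1] < 2:
        raise ValueError("expected at least one feature column plus a label column")
    return data


# In-memory stand-in for a file such as data.txt (values are made up).
sample = io.StringIO("-64 -56 -61 1\n-68 -57 -61 2\n")
dataset = load_dataset(sample)
print(dataset.shape)
```

np.loadtxt also accepts a filename, so load_dataset("data.txt") would work the same way on a file in the extracted folder.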
To run 10-fold cross-validation with pruning on a new dataset, move the file containing the dataset to the extracted folder. Then run
python3 run_pruning_on_unseen_data.py <yourdataset>
Be sure to replace <yourdataset> with your test dataset's filename and extension (for example: data.txt). This will run 10-fold cross-validation with pruning on your dataset. Results are printed to the screen as they are calculated, and the averages are displayed at the end.
To classify the samples in a new dataset, move the file containing the dataset to the extracted folder. Then run
python3 run_classification.py <yourdataset>
Be sure to replace <yourdataset> with your test dataset's filename and extension (for example: data.txt). This will classify the samples in the dataset with our 'best' trained tree, using the evaluate(test_db, trained_tree) function, and display the results.
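As a rough illustration of what a function with the evaluate(test_db, trained_tree) signature can do, the sketch below walks each sample down a tree and returns the fraction classified correctly. It assumes a dict-based tree and a NumPy test set whose last column is the label; the predict helper, the dict keys and the accuracy return value are illustrative assumptions, not a description of the submitted code.

```python
import numpy as np


def predict(tree, sample):
    """Follow the splits from the root until a leaf is reached."""
    node = tree
    while not node["leaf"]:
        go_left = sample[node["attribute"]] < node["value"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


def evaluate(test_db, trained_tree):
    """Return the accuracy of trained_tree on test_db (last column = label)."""
    predictions = np.array([predict(trained_tree, row[:-1]) for row in test_db])
    return float(np.mean(predictions == test_db[:, -1]))


# Hand-built one-split tree: attribute 0 below 2.5 -> label 0, otherwise label 1.
toy_tree = {
    "leaf": False, "attribute": 0, "value": 2.5,
    "left": {"leaf": True, "label": 0.0},
    "right": {"leaf": True, "label": 1.0},
}
toy_test = np.array([[1.0, 0.0], [4.0, 1.0], [2.0, 1.0]])
print(evaluate(toy_test, toy_tree))
```

In this toy case the third sample is misclassified, so the accuracy is two out of three.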