CO553 - Intro to Machine Learning CW 1

This README.md explains how to run the code submitted for the CO553 Intro to Machine Learning coursework 1: Decision Trees, by the group comprising Niusha Alavi, Olle Nilsson, Paween Pitimanaaree and Alfred Tingey.

Important Files

decision_tree.py contains the implementation of the algorithm for training decision trees, including the decision_tree_learning(dataset, depth) function required by the spec.
evaluation.py contains the implementation of the cross-validation and pruning algorithms, including the evaluate(test_db, trained_tree) function required by the spec.
visualize.py contains the implementation of the decision tree visualisation.
util.py contains utilities such as functions for importing data.
All other files with the .py extension are scripts that replicate the results in our report and allow the marker to run our implementation on other datasets.
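
For readers who want a feel for what a function with the signature decision_tree_learning(dataset, depth) might do, here is a minimal sketch. It is not the code in decision_tree.py. It assumes each row is a list of numeric attributes followed by a label, that splits are chosen by information gain, and that the function returns the tree together with the maximum depth reached. The dictionary node layout and the helper names entropy and find_split are hypothetical.

```python
import math
from collections import Counter


def entropy(rows):
    """Shannon entropy (in bits) of the labels, assumed to be the last column."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def find_split(rows):
    """Return (attribute, threshold, left, right) with the highest information gain.

    Returns None when no threshold gives a positive gain.
    """
    best, best_gain = None, 0.0
    base = entropy(rows)
    for attr in range(len(rows[0]) - 1):
        values = sorted({row[attr] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for r in rows if r[attr] < threshold]
            right = [r for r in rows if r[attr] >= threshold]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(rows)
            gain = base - remainder
            if gain > best_gain:
                best, best_gain = (attr, threshold, left, right), gain
    return best


def decision_tree_learning(dataset, depth):
    """Recursively build a tree; returns (node, maximum depth reached)."""
    labels = {row[-1] for row in dataset}
    split = find_split(dataset) if len(labels) > 1 else None
    if split is None:
        # Pure node, or no useful split exists: emit a majority-label leaf.
        majority = Counter(row[-1] for row in dataset).most_common(1)[0][0]
        return {"leaf": True, "label": majority}, depth
    attr, threshold, left_rows, right_rows = split
    left, left_depth = decision_tree_learning(left_rows, depth + 1)
    right, right_depth = decision_tree_learning(right_rows, depth + 1)
    node = {"leaf": False, "attribute": attr, "value": threshold,
            "left": left, "right": right}
    return node, max(left_depth, right_depth)
```

Under these assumptions, calling decision_tree_learning(rows, 0) on a small list of rows returns a nested dictionary and an integer depth.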

How to run the code?

After extracting the zip file, navigate to the extracted folder in your terminal.

To run the code that we used to calculate the results in the report, run

python3 main.py

This will first run 10-fold cross-validation on both the clean and noisy datasets, and then run 10-fold cross-validation with pruning on the same two datasets. Results are printed to the terminal as they are calculated, and the averages are displayed at the end. Note that the dataset is shuffled into a random order on each run, so results may not exactly match those in the report.
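
The shuffling is what makes results vary between runs. The sketch below shows one way a shuffled 10-fold split could be produced, and how an optional seed would make the folds repeatable. It is an illustration only: the function names k_fold_indices and train_test_splits and the seed parameter are hypothetical and are not taken from evaluation.py.

```python
import random


def k_fold_indices(n_rows, k=10, seed=None):
    """Shuffle the row indices and deal them into k nearly equal folds."""
    indices = list(range(n_rows))
    # A seeded generator gives the same shuffle every time; seed=None does not.
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_rows, k=10, seed=None):
    """Yield (train_indices, test_indices) once per fold."""
    folds = k_fold_indices(n_rows, k, seed)
    for i, test in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each row index appears in exactly one test fold, so averaging the per-fold accuracy uses every sample for testing once.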

To run our cross-validation code on a different dataset:

Move the file containing the dataset you wish to run to the extracted folder. Then run

python3 run_crossvalidation_on_unseen_data.py <yourdataset>

Be sure to replace <yourdataset> with your test dataset filename, including the extension (for example: data.txt). This will run 10-fold cross-validation on your new dataset and print the results at the end.

To run our cross-validation code with pruning on a different dataset:

Move the file containing the dataset to the extracted folder. Then run

python3 run_pruning_on_unseen_data.py <yourdataset>

Be sure to replace <yourdataset> with your test dataset filename, including the extension (for example: data.txt). This will run 10-fold cross-validation with pruning on your new dataset. Results are printed to the terminal as they are calculated, and the averages are displayed at the end.
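
This README does not describe the pruning algorithm itself. One common choice for this kind of task is reduced-error pruning against a held-out validation set, sketched below. This is an assumption about the general technique, not a description of evaluation.py. The dictionary node layout and the names predict, accuracy and prune are hypothetical.

```python
from collections import Counter


def predict(node, row):
    """Follow threshold tests from the given node until a leaf is reached."""
    while not node["leaf"]:
        branch = "left" if row[node["attribute"]] < node["value"] else "right"
        node = node[branch]
    return node["label"]


def accuracy(tree, rows):
    """Fraction of rows whose last column matches the predicted label."""
    return sum(predict(tree, r) == r[-1] for r in rows) / len(rows)


def prune(node, root, train_rows, validation_rows):
    """Bottom-up reduced-error pruning; modifies the tree in place."""
    if node["leaf"]:
        return
    attr, value = node["attribute"], node["value"]
    left_rows = [r for r in train_rows if r[attr] < value]
    right_rows = [r for r in train_rows if r[attr] >= value]
    prune(node["left"], root, left_rows, validation_rows)
    prune(node["right"], root, right_rows, validation_rows)
    # Only a node whose two children are both leaves is a pruning candidate.
    if node["left"]["leaf"] and node["right"]["leaf"] and train_rows:
        before = accuracy(root, validation_rows)
        saved = dict(node)
        majority = Counter(r[-1] for r in train_rows).most_common(1)[0][0]
        node.clear()
        node.update({"leaf": True, "label": majority})
        if accuracy(root, validation_rows) < before:
            # Pruning reduced validation accuracy: restore the original split.
            node.clear()
            node.update(saved)
```

In this sketch a split is replaced by a majority-label leaf whenever doing so does not lower validation accuracy, which tends to remove branches that only fit noise in the training data.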

To run classification with our 'best' tree on a dataset:

Move the file containing the dataset to the extracted folder. Then run

python3 run_classification.py <yourdataset>

Be sure to replace <yourdataset> with your test dataset filename, including the extension (for example: data.txt). This will classify the samples in the dataset with our 'best' trained tree, using the evaluate(test_db, trained_tree) function, and display the results.
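
As a rough illustration of what an evaluate(test_db, trained_tree) function could compute, the sketch below returns the accuracy of a tree on a labelled dataset and also counts actual versus predicted labels. It assumes the label is the last element of each row and that the tree is a nested dictionary. The node layout and the helpers classify and confusion_counts are hypothetical, and the real function in evaluation.py may return something different.

```python
from collections import Counter


def classify(row, node):
    """Walk from the root to a leaf and return the leaf's label."""
    while not node["leaf"]:
        branch = "left" if row[node["attribute"]] < node["value"] else "right"
        node = node[branch]
    return node["label"]


def evaluate(test_db, trained_tree):
    """Return the fraction of rows in test_db that the tree labels correctly."""
    correct = sum(classify(row, trained_tree) == row[-1] for row in test_db)
    return correct / len(test_db)


def confusion_counts(test_db, trained_tree):
    """Count (actual, predicted) label pairs across test_db."""
    return Counter((row[-1], classify(row, trained_tree)) for row in test_db)
```

The pair counts can be arranged into a confusion matrix, from which per-class precision and recall follow.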

