C++ implementation of Decision Trees and Random Forests for classification of Insurance Dataset
We build decision trees and random forests for an insurance dataset and evaluate them in several experiments. The dataset is taken from: https://archive.ics.uci.edu/ml/datasets/Insurance+Company+Benchmark+%28COIL+2000%29
- Go to the folder:
cd Final
- Compile the program by entering the following command:
g++ -o ID3 ID3.cpp
- Run the executable by entering the following command:
./ID3 ticdata2000.txt experiment_no
ticdata2000.txt contains the dataset used to build the tree, and experiment_no is the number of the experiment to run.
- Press Enter to print the output.
- Please refer to the Results and Conclusion file for the final results of all the experiments.
- Vary the "stopping criterion" that prevents further splitting of a node, and observe the changes in accuracy and model complexity.
- Add noise to the dataset and evaluate the accuracy of the model along with the change in its complexity (number of nodes).
- Perform "Reduced Error Pruning" on the tree and measure the change in its accuracy.
- Create a random forest using the "Feature Bagging" approach: select a subset of features, build multiple trees, and take a majority vote over their predictions.
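
The feature bagging experiment can be sketched roughly as follows. This is a minimal illustration, not the code in ID3.cpp: the helper names sampleFeatureSubset and majorityVote are hypothetical, class labels are assumed to be integers, and the subset is assumed to be drawn uniformly at random without replacement.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not from ID3.cpp): choose k distinct feature
// indices out of n_features, uniformly at random.
std::vector<int> sampleFeatureSubset(int n_features, int k, std::mt19937& rng) {
    assert(k >= 0 && k <= n_features);          // precondition on the subset size
    std::vector<int> idx(n_features);
    std::iota(idx.begin(), idx.end(), 0);       // 0, 1, ..., n_features - 1
    std::shuffle(idx.begin(), idx.end(), rng);  // random permutation
    idx.resize(k);                              // keep the first k indices
    std::sort(idx.begin(), idx.end());          // sorted only for readability
    return idx;
}

// Hypothetical helper (not from ID3.cpp): majority vote over the class
// predicted by each tree. Ties go to the smallest label, because
// std::map iterates in ascending key order and only a strictly larger
// count replaces the current best.
int majorityVote(const std::vector<int>& votes) {
    std::map<int, int> counts;
    for (int v : votes) ++counts[v];
    int best_label = -1;
    int best_count = 0;
    for (const auto& kv : counts) {
        if (kv.second > best_count) {
            best_label = kv.first;
            best_count = kv.second;
        }
    }
    return best_label;  // -1 if votes is empty
}
```

Under these assumptions, each tree in the forest would be trained on the columns returned by one call to sampleFeatureSubset, and the forest's prediction for a record would be majorityVote applied to the predictions of the individual trees. Using an odd number of trees avoids ties for a two-class target.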