This project develops a random forest classifier from scratch rather than using scikit-learn's implementation. However, the Decision Tree Classifier still relies on scikit-learn due to some issues.
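As a rough illustration of the idea, the sketch below shows the two pieces a random forest adds on top of a tree learner: bootstrap sampling and majority voting. It is a minimal sketch, not this project's actual code. The names `SimpleForest`, `Stump`, and `make_tree` are hypothetical, and the pure-Python `Stump` is only a stand-in so the example runs without dependencies. Anything with `fit(X, y)` and `predict(X)` could be plugged in instead, for example scikit-learn's `DecisionTreeClassifier(max_features="sqrt")`.

```python
import random
from collections import Counter


class Stump:
    """Hypothetical stand-in base learner: one threshold on one feature."""

    def __init__(self, rng, max_features=1):
        self.rng = rng
        self.max_features = max_features

    def fit(self, X, y):
        n_features = len(X[0])
        # Consider only a random subset of the features.
        features = self.rng.sample(range(n_features), min(self.max_features, n_features))
        best = None
        for j in features:
            for threshold in sorted({row[j] for row in X}):
                left = [label for row, label in zip(X, y) if row[j] <= threshold]
                right = [label for row, label in zip(X, y) if row[j] > threshold]
                if not left or not right:
                    continue  # not a real split
                left_label = Counter(left).most_common(1)[0][0]
                right_label = Counter(right).most_common(1)[0][0]
                errors = sum(l != left_label for l in left) + sum(r != right_label for r in right)
                if best is None or errors < best[0]:
                    best = (errors, j, threshold, left_label, right_label)
        if best is None:
            # No usable split: always predict the majority class.
            majority = Counter(y).most_common(1)[0][0]
            best = (0, features[0], float("inf"), majority, majority)
        _, self.feature, self.threshold, self.left_label, self.right_label = best
        return self

    def predict(self, X):
        return [self.left_label if row[self.feature] <= self.threshold else self.right_label
                for row in X]


class SimpleForest:
    """Bagging plus majority vote over any learner with fit/predict."""

    def __init__(self, make_tree, n_estimators=10, seed=0):
        self.make_tree = make_tree  # callable: rng -> unfitted learner
        self.n_estimators = n_estimators
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        self.trees = []
        for _ in range(self.n_estimators):
            # Bootstrap: draw n row indices with replacement.
            idx = [self.rng.randrange(n) for _ in range(n)]
            tree = self.make_tree(self.rng)
            tree.fit([X[i] for i in idx], [y[i] for i in idx])
            self.trees.append(tree)
        return self

    def predict(self, X):
        votes = [tree.predict(X) for tree in self.trees]
        # Each column of `votes` holds every tree's prediction for one sample.
        return [Counter(column).most_common(1)[0][0] for column in zip(*votes)]


# Toy data: the first feature separates the classes, the second is constant.
X_demo = [[0, 5], [1, 5], [2, 5], [3, 5], [10, 5], [11, 5], [12, 5], [13, 5]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]
forest = SimpleForest(lambda rng: Stump(rng, max_features=2), n_estimators=25, seed=0)
forest.fit(X_demo, y_demo)
demo_predictions = forest.predict([[-1, 5], [20, 5]])
```

Note that a full decision tree usually resamples the candidate features at every split, whereas this single-split stand-in draws them only once per tree.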
Aside from the model, I also developed GridSearchCV from scratch, although some features, such as automatically selecting the best parameters, have not yet been added.
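To make the grid search idea concrete, here is a minimal sketch of an exhaustive search with k-fold cross-validation scored by precision. It is an assumption about how such a search can be structured, not the code in this repository. The names `grid_search`, `k_fold_indices`, `precision`, and `ThresholdClassifier` are hypothetical. Like the project's version, it only reports the score of each parameter combination and does not pick the best one automatically.

```python
from itertools import product


def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are truly positive."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0  # convention chosen here when nothing is predicted positive
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)


def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs using interleaved folds."""
    folds = [list(range(i, n_samples, n_splits)) for i in range(n_splits)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx


def grid_search(make_model, param_grid, X, y, n_splits=3):
    """Score every parameter combination; return all results, unranked."""
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for train_idx, test_idx in k_fold_indices(len(X), n_splits):
            model = make_model(**params)
            model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            y_pred = model.predict([X[i] for i in test_idx])
            scores.append(precision([y[i] for i in test_idx], y_pred))
        results.append({"params": params, "mean_precision": sum(scores) / len(scores)})
    return results


class ThresholdClassifier:
    """Hypothetical toy model: predicts 1 when the first feature exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def fit(self, X, y):
        return self  # nothing to learn in this toy model

    def predict(self, X):
        return [1 if row[0] > self.threshold else 0 for row in X]


X_demo = [[i] for i in range(12)]
y_demo = [0] * 6 + [1] * 6
results = grid_search(ThresholdClassifier, {"threshold": [2, 5]}, X_demo, y_demo)
for entry in results:
    print(entry["params"], round(entry["mean_precision"], 3))
```

Selecting the best parameters would then be a matter of taking the entry with the highest `mean_precision`, which this sketch leaves to the caller.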
The objectives are:
- Develop a Random Forest Classifier from scratch
- Tune the hyperparameters of the classifier using the precision metric
- Test the model against a dataset
The dataset used for this project is from Kaggle.
For the report and an explanation of the code, please visit my Medium.
The code and the report are based on the following sources:
- Dietterich, T. G. (2000). Ensemble methods in machine learning. In J. Kittler & F. Roli (Eds.), First International Workshop on Multiple Classifier Systems (pp. 1-15). Springer Verlag.
- Biau, G., & Scornet, E. (2016). A random forest guided tour. Test, 25, 197-227.
- James, G., Witten, D., Hastie, T., Tibshirani, R., & Taylor, J. (2023). An introduction to statistical learning with applications in Python. Springer.
- Hastie, T., Tibshirani, R., & Friedman, J. (2008). The elements of statistical learning: Data mining, inference, and prediction. Springer.
- Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
- Zhou, Z. H. (2021). Ensemble learning (pp. 181-210). Springer Singapore.