A vignette on implementing bootstrapping and bagging and demonstrating their effects on data modeled with random forests; created as a class project for PSTAT197A in Fall 2023.
Contributors: Sarah Liang, Sharanya Sharma, Dannah Golich, Jason Siu
Vignette abstract: This vignette covers the basics of bootstrapping, its applications to machine learning, and related resampling methods. A data set containing grade information for UC Santa Barbara students from 2009 to 2023 is used to demonstrate how bootstrapping affects the estimation of sampling distributions and the prediction of GPA.
Repository contents: The completed vignette, in qmd and html formats, is in the root directory. The data folder contains both the raw and preprocessed data used in the vignette. The scripts folder includes the preprocessing script, all code for the final vignette, and drafts completed by each contributor. Finally, the images folder contains the images used in the vignette.
Biswal, A. (2023, Aug 10). Bagging in Machine Learning: Step to Perform And Its Advantages. Retrieved from https://www.simplilearn.com/tutorials/machine-learning-tutorial/bagging-in-machine-learning
Mwiti, D. (2023, Sep 1). Random Forest Regression: When Does It Fail and Why? Retrieved from https://neptune.ai/blog/random-forest-regression-when-does-it-fail-and-why
Random Forests. (n.d.). AFIT Data Science Lab R Programming Guide. Retrieved from https://afit-r.github.io/random_forests#basic
Hesterberg, T. C. (2015). What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum. The American Statistician, 69(4), 371-386. https://doi.org/10.1080/00031305.2015.1089789
What is Bootstrapping? (n.d.). Retrieved from https://www.mastersindatascience.org/learning/machine-learning-algorithms/bootstrapping/
Yu, G. (2023, Oct 26). Lecture 8: Cross-Validation & Bootstrap. PSTAT 131/231: Introduction to Statistical Machine Learning, UC Santa Barbara.