As part of my Consultation and Communication for Statisticians course (STAT 4893W), I completed a Capstone project that included a statistical analysis and a written report. Of the three options provided, I chose to analyze a dataset of 517 forest fire incidents in Portugal, which includes various spatial, temporal, and meteorological variables.
This repository includes my statistical analysis conducted in RStudio, the forest fire dataset, and my written report.
Below is the abstract for my written report to give you a better idea of the process and results of this project. Thanks!
Using data provided by a U.S. Forest Service Incident Management Team (USFS IMT) in Colorado, our statistical research explored the relationship between the burned area of forest fires and 12 spatial, temporal, and meteorological variables. Our objectives were to create an optimal model for predicting burned area and to simulate burned area over a variety of conditions. For model optimization, we explored the predictive power of a multiple linear regression (MLR) model and a random forest model. For model simulation, we analyzed three regression trees to explore the predicted burned area over various classifications of our predictor variables. Ultimately, we were not able to create an optimal model that provides statistically significant outcomes. That said, we created interpretable visuals in the form of our regression trees, and our random forest model improved on the initial MLR model when comparing test MSE values.
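
The test MSE comparison mentioned in the abstract can be sketched in R roughly as follows. This is a minimal illustration, not the code from the report: it uses simulated stand-in data so that it runs on its own, the column names (`temp`, `RH`, `wind`, `area`) and the 70/30 split are assumptions, and it assumes the `randomForest` package is installed.

```r
# Sketch only: simulated stand-in data, NOT the project's dataset or results.
library(randomForest)  # assumed installed: install.packages("randomForest")

set.seed(1)
n <- 517  # same number of rows as the forest fire dataset

# Hypothetical predictors and response, for illustration only
fires <- data.frame(
  temp = runif(n, 2, 33),
  RH   = runif(n, 15, 100),
  wind = runif(n, 0.4, 9.4)
)
fires$area <- pmax(0, 0.5 * fires$temp - 0.05 * fires$RH + rnorm(n, sd = 5))

# Hold out 30% of the rows as a test set (the split ratio is an assumption)
test_idx <- sample(n, size = round(0.3 * n))
train <- fires[-test_idx, ]
test  <- fires[test_idx, ]

# Fit both models on the training rows only
mlr_fit <- lm(area ~ ., data = train)
rf_fit  <- randomForest(area ~ ., data = train, ntree = 500)

# Mean squared error on the held-out rows
mse <- function(actual, predicted) mean((actual - predicted)^2)

mlr_mse <- mse(test$area, predict(mlr_fit, newdata = test))
rf_mse  <- mse(test$area, predict(rf_fit, newdata = test))

print(c(MLR = mlr_mse, RandomForest = rf_mse))
```

To try this on the real data, replace the simulated `fires` data frame with the dataset in this repository. The lower of the two printed values indicates the model with better out-of-sample predictions for that particular split.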