Predictive policing is becoming an important part of fair policing. We wanted to find out whether we could predict which areas would have crimes from factors such as the victim's age, sex and descent, and the premise type, using decision trees and logistic regression models. The final product is a list of predictors that are strongly associated with severe crime in Los Angeles. This list could be used to create a heat map showing which areas are more prone to crime. This is an academic project for ST309.
For this project, we used the 2020 - Present dataset, which can be found here: L.A crime dataset. (The data was last accessed on 9 February 2022; later updates to the data are not incorporated in the analysis.)
- Make sure the packages listed in the first line of the code are installed before running it
- Download the dataset from here, and name the files "Crime_Data_from_2010_to_2019" and "Crime_Data_from_2020_to_Present" so that they match the names used when reading the csv files
- Run the code in an R script
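As a rough illustration of the loading step, the sketch below reads whichever of the two files is present. The `.csv` extension, the working-directory location and the helper name `load_crime_data` are assumptions made for this example; the project's own script may read the files differently.

```r
# Hypothetical paths: the file names follow the steps above, while the
# ".csv" extension and the working-directory location are assumptions.
files <- c(
  old = "Crime_Data_from_2010_to_2019.csv",
  new = "Crime_Data_from_2020_to_Present.csv"
)

# Illustrative helper (not part of the project code): read only the files
# that exist, returning a named list of data frames.
load_crime_data <- function(paths) {
  present <- paths[file.exists(paths)]
  lapply(present, read.csv, stringsAsFactors = FALSE)
}

crime <- load_crime_data(files)
names(crime)  # which of the two datasets were found and loaded
```

If a file is missing or named differently, it is skipped rather than raising an error, so checking `names(crime)` is a quick way to confirm that the download and renaming worked.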
- Categorical data: Categorical data can be harder to interpret. For instance, when we transformed the premise description column, we kept only the 10 most frequent premises and grouped the rest under the ‘OtherPremise’ category. It is possible that doing so affected our analysis.
- More modelling: In hindsight, using bagging and bootstrapping might have given us more confidence in our results.
- Analysis limitations: We used only the 2020 - Present dataset, which could have affected our results. A merged dataset might have given us higher accuracy rates.
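The premise grouping described in the first point can be sketched in base R as follows. This is a minimal sketch, not the project's actual code: the function name `lump_top_n` and the toy premise values are made up for illustration, and the example keeps the top 2 levels instead of the top 10 to stay short.

```r
# Illustrative helper (hypothetical name): keep the n most frequent levels
# of a categorical vector and relabel everything else as `other`.
lump_top_n <- function(x, n = 10, other = "OtherPremise") {
  counts <- sort(table(x), decreasing = TRUE)             # level frequencies
  keep <- names(counts)[seq_len(min(n, length(counts)))]  # top-n level names
  ifelse(x %in% keep, as.character(x), other)
}

# Toy data, not taken from the L.A. dataset.
premise <- c("STREET", "STREET", "SIDEWALK", "SIDEWALK",
             "STREET", "PARK", "GARAGE")
lumped <- lump_top_n(premise, n = 2)
table(lumped)
```

Note that missing values also fall into the `other` bucket in this sketch, and that every rare premise type becomes indistinguishable after grouping, which is one way the transformation could affect the analysis.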
Our analysis showed that the factors Weapon, Sidewalk, Street, Female and Age have a strong link to severe crimes. However, these models were fitted to training data drawn from past crime records. Previous predictive policing programs such as PredPol have failed because the historical crime records they relied on contained racial biases. Models like ours may further magnify these biases and produce inaccurate predictions. A more accurate analysis would require a dataset that is free of bias.
- Rachel Soh: https://github.com/RS201918703
- Rafay Butt: https://github.com/raf201920011