Investigating the Nature of Severe Crimes in L.A

Overview and motivation

Predictive policing is becoming an important part of fair policing today. We wanted to find out whether we could predict which areas would experience crime based on factors such as the victim's age, sex and descent, premise type and so on, using decision trees and logistic regression models. The final product is a list of predictors that heavily influence severe crime in Los Angeles. This list could be used to create a heat map showing which areas are more prone to crime. This is an academic project for ST309.

Description

For this project, we used the 2020 - Present dataset, which can be found here: L.A crime dataset. (The data was last accessed on 9 February 2022; later updates to the data are not incorporated in the analysis.)

How to run

Make sure the packages listed in the first line of the code are installed before running.
Download the dataset from here, and name the files "Crime_Data_from_2010_to_2019" and "Crime_Data_from_2020_to_Present" so that they match the names used when the code reads the csv files.
Run the code in an R script.
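
The first two steps can be checked before running the full script. The snippet below is only a sketch: the helper `missing_packages`, the `.csv` file extension and the assumption that the files sit in the working directory are ours, not taken from the project code, and `"stats"` is a placeholder for the packages named in the first line of the code.

```r
# Hypothetical helper: returns the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Placeholder list; replace with the packages named in the first line of the code.
required_pkgs <- c("stats")
not_installed <- missing_packages(required_pkgs)
if (length(not_installed) > 0) {
  message("Install first: ", paste(not_installed, collapse = ", "))
}

# File names from the step above; the ".csv" extension is an assumption.
crime_files <- c("Crime_Data_from_2010_to_2019.csv",
                 "Crime_Data_from_2020_to_Present.csv")
not_found <- crime_files[!file.exists(crime_files)]
if (length(not_found) > 0) {
  message("Not found in working directory: ", paste(not_found, collapse = ", "))
}
```

If both messages stay silent, the script should be able to find its packages and input files.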

Improvements

Categorical data: Categorical data can be harder to interpret. For instance, when we transformed the premise description column, we kept only the top 10 premises and classified the rest under the ‘OtherPremise’ category. It is possible that doing so affected our analysis.
More modelling: In hindsight, bagging and bootstrapping may have given us more confidence in our results.
Analysis limitations: We only used the 2020 - Present dataset, which could have affected our results. A merged dataset may have given us higher accuracy rates.
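
To make the categorical data point concrete, the grouping of less frequent premises can be sketched as follows. This is an illustration on made-up values, not the project's actual code: the function `lump_top_n` and the sample vector are hypothetical, and `n = 2` is used here only so the effect is visible on a tiny example (the analysis kept the top 10).

```r
# Made-up premise descriptions standing in for the premise description column.
premises <- c("STREET", "SIDEWALK", "STREET", "PARKING LOT",
              "STREET", "SIDEWALK", "ALLEY")

# Hypothetical helper: keep the `n` most frequent values and relabel
# everything else (including missing values) as `other`.
lump_top_n <- function(x, n = 10, other = "OtherPremise") {
  x <- as.character(x)
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  factor(ifelse(x %in% keep, x, other))
}

lumped <- lump_top_n(premises, n = 2)
print(table(lumped))
```

Every premise outside the kept set ends up in a single level, so any differences between those premises are invisible to the models. This is one way the choice of cut-off could influence the results.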

Conclusion

Our analysis showed that the factors Weapon, Sidewalk, Street, Female and Age have a strong link to severe crimes. However, it is worth noting that these models were built on training data. Previous predictive policing programs such as PredPol have failed because the historical crime records they relied on carried racial biases. Models like ours may further magnify these biases and produce inaccurate predictions. A more accurate analysis would require a dataset that is free of bias.

Team members

Rachel Soh: https://github.com/RS201918703
Rafay Butt: https://github.com/raf201920011
