Vignette on implementing outlier and anomaly detection using breast cancer detection data; created as a class project for PSTAT197A in Fall 2023.
Contributors: Kyle Wu, Jimmy Dysart, Azfal Peermohammed, Ryan Sevilla, Navneet Rajagopal
As the names suggest, outlier and anomaly detection methods identify data points that fall outside the normal range of the data. These anomalous observations are often rare and exhibit patterns that standard data points do not. As with other machine learning models, anomaly detection methods fall into three main categories: supervised, unsupervised, and semi-supervised. This vignette and its supporting documents demonstrate how to use anomaly detection methods.
This repository includes a vignette demonstrating the implementation of several anomaly/outlier detection methods. It also includes the data used, as well as scripts containing the end-to-end implementation of the models.
In this vignette, we compare several models on their ability to identify outliers that may be present in the data. In particular, our models of interest are:
- Isolation Forests
- Local Outlier Factor
- One-Class SVM
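As a rough orientation, the three models listed above can be sketched in a few lines with scikit-learn. This is a minimal illustration under stated assumptions, not the vignette's own implementation: it assumes Python with scikit-learn installed, uses scikit-learn's bundled Wisconsin Diagnostic breast cancer data as a stand-in for the UCI download, and picks a 5% contamination rate and default-style hyperparameters purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Assumption: scikit-learn's bundled breast cancer data stands in for the
# dataset used in the vignette, which may differ.
X = load_breast_cancer().data

# Standardize features so distance- and kernel-based methods are not
# dominated by features with large numeric ranges.
X_scaled = StandardScaler().fit_transform(X)

# Assumption: treat roughly 5% of observations as anomalous (illustrative only).
contamination = 0.05

detectors = {
    "Isolation Forest": IsolationForest(
        contamination=contamination, random_state=0
    ),
    "Local Outlier Factor": LocalOutlierFactor(
        n_neighbors=20, contamination=contamination
    ),
    "One-Class SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=contamination),
}

# fit_predict returns +1 for inliers and -1 for flagged outliers.
labels = {
    name: model.fit_predict(X_scaled) for name, model in detectors.items()
}

for name, pred in labels.items():
    n_flagged = int((pred == -1).sum())
    print(f"{name}: flagged {n_flagged} of {len(pred)} observations")
```

All three detectors are used here in their unsupervised form, so the diagnosis labels are never passed to the models. Comparing which observations each method flags is one simple way to see how the methods differ.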
Our dataset was downloaded from the UC Irvine Machine Learning Repository.
For more resources on outlier and anomaly detection, see the following links: