Stat 666 Term Project
In the field of unsupervised learning, algorithms are used to analyze and cluster unlabeled datasets. Each algorithm follows its own clustering procedure to discover patterns or groupings in the data that may not be evident from simple visual inspection. The patterns and groupings these algorithms find are used in a wide range of applications.
One clustering algorithm that has grown in popularity is density-based spatial clustering of applications with noise (DBSCAN). DBSCAN can discover clusters of arbitrary shape and size while also identifying noise points, that is, observations that do not belong to any cluster. Because of its robustness to noise and its scalability, DBSCAN is often chosen for clustering tasks in domains such as image processing, natural language processing, and social network analysis. This report provides an in-depth exploration of the DBSCAN algorithm, including its core concepts, implementation, advantages, and limitations.
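Both properties, arbitrary-shaped clusters and explicit noise points, come from the way DBSCAN grows clusters outward from dense "core" points. The following is a minimal pure-Python sketch of the standard procedure, offered for illustration only and not as the implementation used in this project. The parameter names `eps` (neighborhood radius) and `min_samples` (neighbors, counting the point itself, needed to be a core point) follow common convention, the helper names are hypothetical, and the brute-force O(n²) neighbor search is chosen for clarity rather than speed.

```python
import math

NOISE = -1  # label assigned to points that belong to no cluster


def region_query(points, idx, eps):
    """Return indices of all points within distance eps of points[idx], including itself."""
    p = points[idx]
    return [j for j, q in enumerate(points) if math.dist(p, q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # provisional; may become a border point later
            continue
        # points[i] is a core point: start a new cluster and expand it
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                seeds.extend(j_neighbors)  # j is also core: keep growing
        cluster_id += 1
    return labels


# Hypothetical toy data: two parallel lines of points and one isolated point.
line_a = [(x * 0.5, 0.0) for x in range(9)]
line_b = [(x * 0.5, 3.0) for x in range(9)]
points = line_a + line_b + [(10.0, 10.0)]

labels = dbscan(points, eps=0.6, min_samples=3)
print(labels)  # nine 0s, nine 1s, then -1
```

Each line of points forms one elongated cluster because points are linked through chains of neighboring core points rather than assigned to the nearest centroid, which is what lets DBSCAN follow non-convex shapes. The endpoints of each line have too few neighbors to be core points but still join as border points, while the isolated point is labeled -1 as noise.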
Specifically, this exploration includes a simulation study that compares the performance of DBSCAN with that of the commonly used k-means algorithm. The simulation study evaluates the two algorithms on their ability to scale to multidimensional data and on their ability to cluster data generated under various settings. In addition, DBSCAN is applied to a real-world dataset from the 2014 season of the Professional Golfers' Association (PGA) to demonstrate its effectiveness and applicability. The insights gained from this exploration may be useful to practitioners and researchers interested in clustering techniques and can serve as a guide for selecting appropriate clustering methods for different types of data.