This repo contains a MATLAB implementation of the RNN-DBSCAN algorithm by Bryant and Cios. This implementation is based upon the graph-based interpretation presented in their paper.
RNN DBSCAN is a density-based clustering algorithm that uses reverse nearest neighbor counts as an estimate of observation density. It is based upon traversals of the directed k-nearest neighbor graph, and can handle clusters of different densities, unlike DBSCAN. For more details about the algorithm, see the original paper.
- My knn-graphs MATLAB library
- Statistics and Machine Learning Toolbox
To use the NN Descent algorithm to construct the KNN graph used by RNN DBSCAN, you need pynndescent and MATLAB's Python language interface. I recommend using Conda to set up an environment, as MATLAB is picky about which Python versions it supports.
Install with mpm:
mpm install knn-graphs
mpm install matlab-rnn-dbscan
- Download knn-graphs from the MATLAB File Exchange
- Download matlab-rnn-dbscan from the MATLAB File Exchange or from the latest GitHub release
- Add both packages to your MATLAB path
RnnDbscan
is a class with a single public method, cluster
. The results of the clustering operation are stored in read-only public properties.
Creating an RnnDbscan object:
% Create an RnnDbscan object using a 5-nearest-neighbor graph.
% nNeighborsIndex is how many neighbors used to create the knn index, and must be >= nNeighbors + 1
% because the index includes self-edges (each point is it's own nearest neighbor).
nNeighors = 5;
nNeighborsIndex = 6;
rnndbscan = RnnDbscan(data, nNeighbors, nNeighborsIndex);
% Use the NN Descent algorithm to create the knn index; this is much faster than an exhaustive search
rnndbscan = RnnDbscan(data, nNeighbors, nNeighborsIndex, 'Method', 'nndescent');
% Explicitly use an exhaustive search, which is the default
rnndbscan = RnnDbscan(data, nNeighbors, nNeighborsIndex, 'Method', 'knnsearch');
% Use a precomputed knn index
knnidx = knnindex(data, nNeighborsIndex);
rnndbscan = RnnDbscan(data, nNeighbors, knnidx);
Clustering:
rnndbscan.cluster();
% Or
cluster(rnndbscan);
% Inspect clusters, outliers, and labels
rnndbscan.Clusters
rnndbscan.Outliers
rnndbscan.Labels
For more details, see the help text: help RnnDbscan
. RNN-DBSCAN tests.ipynb
also contains many tests, which can be used as usage examples.
All contributions are welcome! Just submit a pull request or open an issue.