This folder contains the code for the experiments and their results. It includes the following:
- Zip archives of .pkl.gz files containing the k chosen by each index and the corresponding values of Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and Mean Absolute Relative Error (MARE) between the predicted number of clusters and the ground truth. (Zip files here)
- .csv files with all results from each zip archive (CSV files here)
- .xlsx files with a summary for each dataset (Final Tables)
- Python notebook and Python scripts for the experiments (NB: the experiments require the main package 'mircl' to be installed)
The original data used for the experiments (1.3 GB) and the intermediate NumPy matrices with partitions for each dataset (71 GB) can be provided on request.
Requests: a.g.rykov@glndwr.ru
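
To inspect the archived results without unpacking them by hand, something like the following may help. It is a minimal sketch, not part of the repository: the helper name `iter_pickles` is hypothetical, and the structure of the unpickled objects is not documented here, so the code only returns whatever each file contains. Only unpickle archives obtained from a source you trust, since `pickle` can execute arbitrary code on load.

```python
import gzip
import pickle
import zipfile


def iter_pickles(zip_path):
    """Yield (member_name, unpickled_object) for every .pkl.gz member of a zip archive.

    `zip_path` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".pkl.gz"):
                continue  # skip anything that is not a gzipped pickle
            # Read the compressed member, gunzip it in memory, then unpickle.
            payload = gzip.decompress(archive.read(name))
            yield name, pickle.loads(payload)
```

For example, `for name, obj in iter_pickles("results.zip"): print(name, type(obj))` lists each member and the type of object stored in it; `results.zip` is a placeholder file name.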
- Synthetic data generated with the following parameters:
  - Size (N) = 2500
  - Dimensionality (M/V) = 15, 50
  - Intermix parameter (a) = 0.25 (deprecated in the research), 0.5, 0.75, 0.85
  - Number of clusters (K) = 7, 15, 21
- UCI datasets:
  - Ecoli
  - Iris
  - Optdigits
  - Segmentation
  - Wisconsin Breast Cancer (Prognosis)
  - Wisconsin Breast Cancer (Diagnosis)
  - Wine
  - Zoo
  - Ionosphere (deprecated)
  - Glass (deprecated)
  - Pima Diabetes (deprecated)
- Clustering Benchmark Datasets:
  - G2 (only M/V = 8, 32; var = 10, 50, 90, 100)
  - S
  - Unbalance
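
The synthetic-data parameters listed above can be enumerated as a grid. This is only a sketch: it assumes that every combination of M, a, and K was generated, which the list does not state explicitly, and the variable names are illustrative rather than taken from the experiment scripts.

```python
from itertools import product

N = 2500                       # number of points per dataset
DIMS = (15, 50)                # dimensionality (M/V)
INTERMIX = (0.5, 0.75, 0.85)   # intermix parameter a; 0.25 is deprecated and left out
CLUSTERS = (7, 15, 21)         # number of clusters K

# Assumption: a full cross-product of the parameter values.
configs = [
    {"N": N, "M": m, "a": a, "K": k}
    for m, a, k in product(DIMS, INTERMIX, CLUSTERS)
]
```

Under that assumption the grid has 2 × 3 × 3 = 18 configurations.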
Abbreviation | Expanded |
---|---|
KM | K-Means |
RS | Random Swap |
Eucl | Euclidean |
Conv | Conventional |
HR | Hartigan Rule |
CH | Calinski-Harabasz index |
WB | WB index |
XU | Xu index |
SW | Silhouette Width |
EL XX | Elbow, XX - left and right step size |
ARI | Adjusted Rand Index |
NMI | Normalized Mutual Information |
MARE | Mean Absolute Relative Error (in number of clusters) |
M | Number of dimensions |
a | Interval of cluster width around centers (intermix) |
var | Degree of cluster overlap |
TrueK | Actual number of clusters |
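
As an illustration of the MARE column, the sketch below uses the usual definition of mean absolute relative error applied to cluster counts, i.e. the average of |predicted K - TrueK| / TrueK over runs. The exact formula used in the experiment scripts is not given in this README, so treat this as an assumption; the function name `mare` is hypothetical.

```python
def mare(predicted_k, true_k):
    """Mean absolute relative error between predicted and true cluster counts."""
    if len(predicted_k) != len(true_k) or not true_k:
        raise ValueError("inputs must be non-empty and of equal length")
    # Relative error of each run, averaged over all runs.
    return sum(abs(p - t) / t for p, t in zip(predicted_k, true_k)) / len(true_k)
```

With predictions `[7, 14, 21]` against true values `[7, 15, 21]`, only the second run is off (by 1 out of 15), so the result is (1/15) / 3 = 1/45.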