This repository contains the experimental setup, data, and plots used in the CP'23 paper "Incremental Constrained Clustering by Minimal Weighted Modification". The paper is available open access at https://drops.dagstuhl.de/opus/volltexte/2023/19047.
src
: contains the source code of IAC.
experiments
: contains the experimental results produced by the authors and used in the paper.
comparison
: results of the comparison of IAC with other constrained clustering methods.
params
: results of the sensitivity analysis for the parameters of IAC (anchor generation rate, generalization scope).
relaxing
: results of evaluating the relaxation of IAC against other methods that use soft constraints.
scaling
: results of the runtime analysis of the CP model for modification with respect to the number and type of constraints.
treecut
: results of the use case on satellite image time series.
plots
: visualizations of the results of every experiment.
Each experiment directory contains one subdirectory per tested dataset. These subdirectories contain:
raw
: raw results for each experimental run, identified by its number: partitions, constraints, and runtimes.
compiled
: a compilation of the raw results, evaluated according to each metric used in the paper.
Finally, the folders `clustering-data-v1-1.1.0` and `datasets` contain the datasets used in the experiments. They were pulled from the clustering-benchmarks library, except for Letters, which comes directly from the UCI repository.
The use case folder contains all the data for the tree cut experiment on satellite image time series (SITS) presented in Section 4.3 of the paper (results are in the `treecut` directory).
Plots are interactive HTML files generated with Plotly, which makes the results easier to explore than static images.
Initial partitions were created with KMeans clustering from scikit-learn, with `n_clusters` set to the number of clusters in the ground truth partition and `random_state` set to 9 for reproducibility.
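A minimal sketch of how such an initial partition can be obtained, assuming `X` is a feature matrix and `y_true` holds the ground truth labels. The helper name `initial_partition` and the toy `make_blobs` data are illustrative assumptions, not taken from the repository's code. All other `KMeans` parameters are left at scikit-learn defaults, which may differ between versions (for example, the default of `n_init` changed in recent releases).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def initial_partition(X, y_true, seed=9):
    """Hypothetical helper (not the repository's code): run KMeans with as
    many clusters as the ground truth partition and a fixed random_state."""
    k = len(np.unique(y_true))  # number of clusters in the ground truth
    model = KMeans(n_clusters=k, random_state=seed)
    return model.fit_predict(X)


# Toy data standing in for a benchmark dataset (illustrative assumption).
X, y = make_blobs(n_samples=300, centers=3, random_state=0)
labels = initial_partition(X, y)
print(labels.shape, len(np.unique(labels)))
```

Fixing `random_state` makes repeated calls on the same data return the same partition, which is what allows later modifications to be compared against a stable starting point.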
The experiments can be reproduced by running the main script, which opens a CLI menu for choosing an experiment. Results are written to a folder named `reproduced_results`.
For the use case experiments, the images in the paper were assembled from layers in GIMP. The code reproduces the individual layers, not the composite images.