Implementation of the Incremental and Active Clustering (IAC) framework

aymericb213/IAC

Incremental Constrained Clustering by Minimal Weighted Modification

This repository contains the experimental setup, data, and plots used in the CP'23 contribution titled "Incremental Constrained Clustering by Minimal Weighted Modification". The paper is available in open access (https://drops.dagstuhl.de/opus/volltexte/2023/19047).

Repo structure

  • src: contains the source code of IAC.
  • experiments: contains the experimental results produced by the authors and used in the paper.
    • comparison: results for the comparison of IAC with other constrained clustering methods.
    • params: results of the sensitivity analysis for the parameters of IAC (anchor generation rate, generalization scope).
    • relaxing: results of evaluating the relaxation of IAC against other methods that use soft constraints.
    • scaling: results of the runtime analysis of the CP model for modification with respect to the number and type of constraints.
    • treecut: results of the use case on satellite image time series.
    • plots: visualizations of the results of every experiment.

Each experiment directory contains one subdirectory per dataset tested. Each of these subdirectories contains:

  • raw: raw results for each experimental run, identified by its run number: partitions, constraints, and runtimes.
  • compiled: compilation of raw results and evaluation according to each metric used in the evaluation.

Finally, the folders clustering-data-v1-1.1.0 and datasets contain the datasets used in the experiments, pulled from the clustering-benchmarks library, except for Letters, which comes directly from the UCI repository. The folder use case contains all the data pertaining to the tree cut SITS experiment presented in Section 4.3 of the paper (its results are in the treecut directory).

Technical details

Plots are HTML files generated by Plotly; they are interactive and thus offer a better view of our results.

Initial partitions were created through KMeans clustering from scikit-learn, with n_clusters set to the number of clusters in the ground truth partition, and random_state set to 9 for reproducibility.
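
The setup described above can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name initial_partition and the synthetic make_blobs data are assumptions made for the example, and only n_clusters and random_state are set explicitly, so the remaining KMeans defaults depend on the installed scikit-learn version.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def initial_partition(X, y_true, seed=9):
    """Hypothetical helper: KMeans with as many clusters as the ground truth."""
    # Number of clusters is taken from the ground truth partition.
    n_clusters = len(np.unique(y_true))
    # A fixed random_state makes the initial partition reproducible.
    model = KMeans(n_clusters=n_clusters, random_state=seed)
    return model.fit_predict(X)


# Synthetic stand-in for a benchmark dataset (not one of the repository's datasets).
X, y_true = make_blobs(n_samples=150, centers=3, random_state=0)
labels = initial_partition(X, y_true)
```

Calling initial_partition twice on the same data returns the same labels, which is the property the fixed seed is meant to guarantee.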

The experiments can be reproduced by running the main script, which opens a CLI menu for choosing an experiment. The results are written to a folder named reproduced_results.

For the use case experiments, the images in the paper were assembled from layers with GIMP. The code reproduces the individual layers, not the composite images.
