This repository contains the code for the paper ROhAN: Row-Order Agnostic Null Models for Statistically-sound Knowledge Discovery (PDF), by Maryam Abuissa, Alexander Lee, and Matteo Riondato, appearing in the Data Mining and Knowledge Discovery Special Issue for ECML PKDD'23.
This is an Amherst College Data* Mammoths project. This work was funded, in part, by NSF award IIS-2006765.
The code uses classes from SPMF. As such, it is distributed under the GNU General Public License, Version 3 or later.
Requirements: a recent Java SDK, Maven, and Python 3.
To create the jar:
mvn clean package
All commands below assume that the current working directory is the root of the repository.
To run the driver:
java -cp target/ROhAN-1.0-SNAPSHOT-jar-with-dependencies.jar \
rohan.drivers.SampleAndMineDriver \
<datasetPath> <samplerType> <numSwaps> <numSamples> <minFreq> <numThreads> <seed> <resultsDir>
Driver arguments:
- <datasetPath>: the path to the dataset (a string)
- <samplerType>: the type of sampler, one of NaiveSampler, RefinedSampler, or GmmtSampler
- <numSwaps>: the number of swaps/steps to run in the chain (a positive integer)
- <numSamples>: the number of samples to generate (a positive integer)
- <minFreq>: the minimum frequency threshold for the frequent itemset mining algorithm (a number in the range [0, 1))
- <numThreads>: the number of threads to use to run the algorithm in parallel (a positive integer)
- <seed>: the seed for the random generator, for replication; -1 means use a random seed (a long)
- <resultsDir>: the directory to output the results of the algorithm
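The argument constraints listed above can be checked before launching the JVM. The following is a minimal sketch, not part of the repository: `build_driver_command` is a hypothetical helper, and the dataset path and results directory in the usage lines are placeholders. Only the jar name, the main class, the sampler names, and the value ranges are taken from this README.

```python
import shlex

# Sampler names as listed in the driver arguments.
SAMPLER_TYPES = ("NaiveSampler", "RefinedSampler", "GmmtSampler")

# Jar and main class copied from the driver command in this README.
JAR = "target/ROhAN-1.0-SNAPSHOT-jar-with-dependencies.jar"
MAIN_CLASS = "rohan.drivers.SampleAndMineDriver"


def build_driver_command(dataset_path, sampler_type, num_swaps, num_samples,
                         min_freq, num_threads, seed, results_dir):
    """Hypothetical helper: validate the arguments, then return the argv list."""
    if sampler_type not in SAMPLER_TYPES:
        raise ValueError(f"samplerType must be one of {SAMPLER_TYPES}")
    for name, value in (("numSwaps", num_swaps),
                        ("numSamples", num_samples),
                        ("numThreads", num_threads)):
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer")
    if not 0 <= min_freq < 1:
        raise ValueError("minFreq must be in the range [0, 1)")
    if not isinstance(seed, int):
        raise ValueError("seed must be an integer (-1 means use a random seed)")
    # Arguments are passed in the same order as in the driver usage line.
    return ["java", "-cp", JAR, MAIN_CLASS,
            dataset_path, sampler_type, str(num_swaps), str(num_samples),
            str(min_freq), str(num_threads), str(seed), results_dir]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    command = build_driver_command("path/to/dataset", "GmmtSampler",
                                   1000, 10, 0.5, 4, -1, "path/to/results")
    print(shlex.join(command))  # paste into a shell, or pass to subprocess.run
```

Returning a list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting issues.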
Create a Python virtual environment and install necessary packages in order to generate the figures:
python3 -m venv venv
source venv/bin/activate
pip install -r experiments/figures/requirements.txt
To deactivate the virtual environment later:
deactivate
Experiment results will be written to experiments/results/ and figures will be saved to experiments/figures/images/.
To replicate all our experiments:
./run_experiments.sh -t all
To run a specific type of experiment with all the configuration files in experiments/confs/<experiment_type>/:
./run_experiments.sh -t <experiment_type>
Possible values for <experiment_type>:
- distortion: run distortion experiment
- runtime: run step time experiment
- scalability: run scalability experiment
- convergence: run convergence experiment
- numFreqItemsets: run number of frequent itemsets experiment
- sigFreqItemsets: run significant frequent itemsets experiment
To run a specific type of experiment with a single configuration file:
java -cp target/ROhAN-1.0-SNAPSHOT-jar-with-dependencies.jar \
rohan.experiments.<experiment_class> path/to/configuration/file
Possible values for <experiment_class>:
- DistortionExperiment: run distortion experiment
- RuntimeExperiment: run step time or scalability experiment
- ConvergenceExperiment: run convergence experiment
- NumFreqItemsetsExperiment: run number of frequent itemsets experiment
- SigFreqItemsetsExperiment: run significant frequent itemsets experiment
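The relationship between experiment types and experiment classes can be sketched in Python as follows. This is an illustration under assumptions, not a description of how run_experiments.sh is implemented: the mapping is inferred from the descriptions of the two lists above (both the step time and scalability experiments are described as using RuntimeExperiment), and `experiment_command` and `commands_for_type` are hypothetical helper names.

```python
from pathlib import Path

# Jar name copied from the java command in this README.
JAR = "target/ROhAN-1.0-SNAPSHOT-jar-with-dependencies.jar"

# Assumption: mapping inferred from the descriptions of the experiment
# types and experiment classes; it is not read from the repository.
EXPERIMENT_CLASSES = {
    "distortion": "DistortionExperiment",
    "runtime": "RuntimeExperiment",
    "scalability": "RuntimeExperiment",
    "convergence": "ConvergenceExperiment",
    "numFreqItemsets": "NumFreqItemsetsExperiment",
    "sigFreqItemsets": "SigFreqItemsetsExperiment",
}


def experiment_command(experiment_type, conf_file):
    """Build the argv list that runs one experiment with one configuration file."""
    if experiment_type not in EXPERIMENT_CLASSES:
        raise ValueError(f"unknown experiment type: {experiment_type}")
    main_class = f"rohan.experiments.{EXPERIMENT_CLASSES[experiment_type]}"
    return ["java", "-cp", JAR, main_class, str(conf_file)]


def commands_for_type(experiment_type, confs_root="experiments/confs"):
    """Build one command per file found in <confs_root>/<experiment_type>/."""
    conf_dir = Path(confs_root) / experiment_type
    conf_files = sorted(p for p in conf_dir.iterdir() if p.is_file())
    return [experiment_command(experiment_type, p) for p in conf_files]
```

Each returned list can be passed to `subprocess.run` from the root of the repository, after the jar has been built.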
To run the test suite:
mvn test
Copyright (C) 2023 Alexander Lee, Maryam Abuissa, and Matteo Riondato
This code is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This code is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License (also available online) for more details.