This repository contains the code for the paper SPEck: Mining Statistically-significant Sequential Patterns Efficiently with Exact Sampling (PDF), by Steedman Jenkins, Stefan Walzer-Goldfeld, and Matteo Riondato, appearing in the Data Mining and Knowledge Discovery Special Issue for ECML PKDD'22.
An Amherst College Data* Mammoths project. This work was funded, in part, by NSF award IIS-2006765.
The code is derived from the code for ProMiSe by Andrea Tonon and Fabio Vandin. As such, it is distributed under the GNU General Public License, Version 3 or later.
A recent Java SDK, Maven, R, Python3, and the R packages ggplot2 and tidyverse.
- Enter the SPECK directory:
cd SPECK
- Create the jar
mvn clean package
- Create a configuration file in the following format with all variables as strings:
{
"procs": <NUMBER_OF_PROCESSORS>,
"reps": <NUMBER_OF_REPETITIONS>,
"P": [<P_1>, <P_2>, ..., <P_N>],
"T": [<T1>, <T2>, ..., <TN>],
"seed": <RANDOM_SEED>,
"strategies": [
"completePerm", # Null model #1, EUS
"itemsetsSwaps", # Null model #1, eps-AUS
"sameSizePerm", # Null model #2, EUS
"sameSizeSwaps", # Null model #2, eps-AUS
"sameFreqSwaps", # Null model #3, eps-AUS
"sameSizeSeqSwaps"
],
"simulated": "false",
"datasetFilepath": <DATASET_FILEPATH>,
"dataset": <DATASET_NAME>, # See under SPECK/data
"thetas": [<THETA_1>, <THETA_2>, ..., <THETA_N>],
"outdir": "results/"
}
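The template above can also be filled in programmatically. The following is a minimal sketch, not part of the repository: the helper names `make_config` and `write_config` are hypothetical, and the values in the usage comment are arbitrary placeholders. It converts every value to a string, as the instructions above require, and writes plain JSON without the `#` annotations, which standard JSON parsers would reject.

```python
import json
from pathlib import Path

# Strategy names copied from the template above.
STRATEGIES = [
    "completePerm",
    "itemsetsSwaps",
    "sameSizePerm",
    "sameSizeSwaps",
    "sameFreqSwaps",
    "sameSizeSeqSwaps",
]


def make_config(procs, reps, P, T, seed, dataset_filepath, dataset, thetas,
                strategies=STRATEGIES, outdir="results/"):
    """Build the configuration dictionary, storing every value as a string."""
    return {
        "procs": str(procs),
        "reps": str(reps),
        "P": [str(p) for p in P],
        "T": [str(t) for t in T],
        "seed": str(seed),
        "strategies": list(strategies),
        "simulated": "false",
        "datasetFilepath": str(dataset_filepath),
        "dataset": str(dataset),
        "thetas": [str(theta) for theta in thetas],
        "outdir": outdir,
    }


def write_config(config, path):
    """Write the configuration as plain JSON (no # annotations)."""
    Path(path).write_text(json.dumps(config, indent=2) + "\n")


# Usage with placeholder values (hypothetical; substitute your own):
# cfg = make_config(procs=4, reps=5, P=[100], T=[1000], seed=42,
#                   dataset_filepath="<DATASET_FILEPATH>",
#                   dataset="<DATASET_NAME>", thetas=[0.1])
# write_config(cfg, "config.json")
```

Keeping the strategy list in one constant makes it easy to run a subset, for example by passing `strategies=STRATEGIES[:2]`.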
- Run the experiment
- Sampling runtime experiment:
java -cp target/SPEck-1.0-SNAPSHOT.jar RuntimeExperiment <path to configuration file>
- Full SPEck run experiment:
java -cp target/SPEck-1.0-SNAPSHOT.jar SFSPExperiment <path to configuration file>
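The two commands above can be driven from Python, for instance to run several configuration files in sequence. This is a sketch under assumptions: `build_command` and `run_experiment` are hypothetical helper names, and the script is assumed to be started from inside the SPECK directory after the jar has been built.

```python
import subprocess

# Jar path and main-class names taken from the commands above.
JAR = "target/SPEck-1.0-SNAPSHOT.jar"
EXPERIMENTS = ("RuntimeExperiment", "SFSPExperiment")


def build_command(experiment, config_path, jar=JAR):
    """Return the java invocation for one experiment as an argument list."""
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    return ["java", "-cp", jar, experiment, str(config_path)]


def run_experiment(experiment, config_path):
    """Run one experiment; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(experiment, config_path), check=True)


# Usage (hypothetical configuration file name):
# run_experiment("RuntimeExperiment", "config.json")
```

Passing the command as a list rather than a single string avoids shell quoting problems when the configuration path contains spaces.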
- Enter the SPECK directory:
cd SPECK
- Run the bash script:
bash runExps.sh -e <EXP_TYPE> -d <DATA> -p <NUM_PROCS>
Run bash runExps.sh -h for a complete description of its usage.
At the end of the execution, the figures for the runtime experiments will be found in SPECK/plots/. The CSV files containing the results of the full SPEck run experiments will be found in SPECK/results/csv.
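To check quickly what a run produced, the result files can be listed from Python. The sketch below assumes nothing about the column layout, which is not documented here: it only reports the first row of each CSV file, which may or may not be a header. The function name `first_rows` is hypothetical.

```python
import csv
from pathlib import Path


def first_rows(results_dir="SPECK/results/csv"):
    """Map each CSV file name in results_dir to its first row (empty list if the file is empty)."""
    rows = {}
    for path in sorted(Path(results_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            rows[path.name] = next(csv.reader(handle), [])
    return rows


if __name__ == "__main__":
    for name, row in first_rows().items():
        print(name, row)
```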
Copyright (C) 2021-2022 Steedman Jenkins, Stefan Walzer-Goldfeld, Matteo Riondato
This work is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
The work is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License (also available online) for more details.