The requirements to run the experiments are listed in the requirements.txt file. Currently, you need to install Declare4Py separately; you can follow the package maintainers' instructions at this page. Or, if you trust me:
git clone --recurse-submodules https://github.com/kr2023-6949/experiments.git && cd experiments
python3.10 -m venv venv
source venv/bin/activate
pip3 install declare4py/dist/declare4py-1.0.0.tar.gz
pip3 install -r requirements.txt
The data generation code is available in the cformulae.dataset_generation.generate_cformulae Python package. By running this script:
python3.10 generate_data.py data/sepsis.xes data/
all the formulae needed to repeat the experiments are generated. The cf_*.lp files are needed to run the conformance checking, trace clustering and discriminative discovery experiments, while the qc_*.lp files are needed for the query checking experiment.
The ones used in the paper's experimental section are already in the folder.
The ASP encodings are stored in the cformulae/application/encodings folder. Facts are injected at runtime when the input log is parsed, and the encodings are executed through the clingo Python API. For further details, check the conformance_checking, query_checking, trace_clustering and discriminative_discovery functions in the cformulae.application subpackage.
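To give an idea of what "injecting facts" means, here is a minimal sketch that turns a parsed log into ASP facts. It is not the repository's code: the trace/3 predicate, the function names and the toy log are assumptions made for illustration, and the real encodings may represent traces differently.

```python
from typing import Iterable, List, Sequence


def trace_to_facts(trace_id: int, activities: Sequence[str]) -> List[str]:
    """Encode one trace as facts of the (hypothetical) form trace(T, I, "activity")."""
    facts = []
    for position, activity in enumerate(activities):
        # Quote the activity name so that spaces and capitals stay valid ASP terms.
        escaped = activity.replace("\\", "\\\\").replace('"', '\\"')
        facts.append(f'trace({trace_id},{position},"{escaped}").')
    return facts


def log_to_program(log: Iterable[Sequence[str]]) -> str:
    """Concatenate the facts of every trace into one ASP program fragment."""
    lines: List[str] = []
    for trace_id, activities in enumerate(log):
        lines.extend(trace_to_facts(trace_id, activities))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy log with two short traces (illustrative data only).
    toy_log = [["ER Registration", "ER Triage"], ["ER Registration", "Release A"]]
    print(log_to_program(toy_log))
```

A string built this way could be handed to clingo with Control.add("base", [], program) before grounding and solving, alongside one of the encodings.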
The procedures to evaluate constraints via clingo's @-terms are defined in the LogContext class in the cformulae.backend.template_backend Python module. The experiments were performed using only Declare-based templates (for ease of generation); these are loaded automatically.
The collection of available Declare constraints (and their POSIX regular expression definitions) can be found in the cformulae/backend/declare/minerful_templates.txt file.
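As an illustration of how a Declare template can be defined by a regular expression, the sketch below checks Response(a, b) ("every a is eventually followed by b") with Python's re module. The regular expression is the one commonly given for Response in the Declare/MINERful literature; the layout of minerful_templates.txt is not shown here, so the template string, the helper names and the one-character-per-activity trace encoding are assumptions.

```python
import re

# Response(a, b) as a regular expression over traces written as strings,
# one character per activity (an assumption made to keep the example short).
RESPONSE_TEMPLATE = r"[^{a}]*({a}.*{b})*[^{a}]*"


def instantiate(template: str, **symbols: str) -> "re.Pattern[str]":
    """Replace the template placeholders with concrete activity symbols."""
    escaped = {name: re.escape(symbol) for name, symbol in symbols.items()}
    return re.compile(template.format(**escaped))


def satisfies(pattern: "re.Pattern[str]", trace: str) -> bool:
    """A trace satisfies the constraint if the whole trace matches the pattern."""
    return pattern.fullmatch(trace) is not None


if __name__ == "__main__":
    response_ab = instantiate(RESPONSE_TEMPLATE, a="a", b="b")
    for trace in ["cab", "acbd", "bbb", "aba", "ca"]:
        print(trace, satisfies(response_ab, trace))
```

The first three toy traces satisfy Response(a, b); the last two do not, because their final a is never followed by a b.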
In order to run the experiments in the paper, you can use the following example scripts. The defaults for optional arguments are the configurations used in the paper.
python3.10 cf.py data/sepsis.xes formulae.lp -o formulae.lp.output
This runs the conformance checking task on the sepsis.xes log, using the formulae defined in the formulae.lp file. Call with the -h flag to list the optional arguments.
python3.10 qc.py data/sepsis.xes formulae.lp -o formulae.lp.output
This runs the query checking task on the sepsis.xes log, using the non-ground formulae and variable domains defined in the formulae.lp file. Call with the -h flag to list the optional arguments.
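For intuition about what query checking computes, here is a brute-force sketch: it tries every assignment of the variables over their domains and keeps the assignments for which the grounded formula holds on enough traces. This is only an illustration, not how qc.py works (the repository solves the task in ASP through clingo). The regex-based Response template, the min_support parameter, the function name and the toy log are all assumptions.

```python
import re
from itertools import product
from typing import Dict, List, Sequence

# Non-ground Response(x, y): x and y are variables to be bound to activities.
# Traces are strings with one character per activity (a simplifying assumption).
RESPONSE = r"[^{x}]*({x}.*{y})*[^{x}]*"


def query_check(
    template: str,
    domains: Dict[str, Sequence[str]],
    log: Sequence[str],
    min_support: float = 1.0,  # hypothetical threshold: fraction of traces that must satisfy
) -> List[Dict[str, str]]:
    """Return every variable assignment whose grounding holds on enough traces."""
    names = sorted(domains)
    answers = []
    for values in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, values))
        grounded = template.format(**{n: re.escape(v) for n, v in assignment.items()})
        pattern = re.compile(grounded)
        satisfied = sum(1 for trace in log if pattern.fullmatch(trace))
        if log and satisfied / len(log) >= min_support:
            answers.append(assignment)
    return answers


if __name__ == "__main__":
    toy_log = ["abc", "acb", "cab"]
    toy_domains = {"x": ["a"], "y": ["b", "c"]}
    print(query_check(RESPONSE, toy_domains, toy_log))
```

Enumerating groundings like this grows exponentially with the number of variables, which is one reason to delegate the search to a solver instead.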
python3.10 tc.py data/sepsis.xes formulae.lp.output [num_partitions]
This runs the trace clustering task on the sepsis.xes log, where formulae.lp.output is the output of cf.py on the formulae.lp set of formulae (on the same sepsis.xes log). Call with the -h flag to list the optional arguments.
Warnings about rejects/2 (accepts/2) not appearing in the head of any rule mean that the formulae in formulae.lp do not reject (accept) any control-flow variable in the log sepsis.xes.
python3.10 dd.py data/sepsis.xes formulae.lp.output [num_partitions]
This runs the discriminative discovery task on the sepsis.xes log, where formulae.lp.output is the output of cf.py on the formulae.lp set of formulae (on the same sepsis.xes log). Call with the -h flag to list the optional arguments.
This is only a proof of concept: it randomly generates labels for the traces in the log.
Warnings about rejects/2 (accepts/2) not appearing in the head of any rule mean that the formulae in formulae.lp do not reject (accept) any control-flow variable in the log sepsis.xes.
The following scripts can be used to run the tasks in parallel using GNU Parallel:
parallel -j T --progress --results joblogs/cf_X.csv python3.10 cf.py data/sepsis.xes {1} ">" {1}.output ::: $(ls data/formulae_X/cf_*.lp)
parallel -j T --progress --results joblogs/dd_X.csv python3.10 dd.py data/sepsis.xes {1} {2} ::: $(ls data/formulae_X/cf_*.lp.output) ::: 2 4 6
parallel -j T --progress --results joblogs/tc_X.csv python3.10 tc.py data/sepsis.xes {1} {2} ::: $(ls data/formulae_X/cf_*.lp.output) ::: 2 4 6
parallel -j T --progress --results joblogs/qc_X.csv python3.10 qc.py data/sepsis.xes {1} ::: $(ls data/formulae_X/qc_*.lp)
where T is the number of jobs to be computed in parallel and X = 1 ... 6 selects a subset of the formulae in the data/ folder. The figures.py script generates the plots; this requires some extra libraries such as matplotlib, pandas and seaborn. The folder joblogs contains the output of the executions used in the paper.
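If GNU Parallel is not at hand, the job grid that the tc.py and dd.py commands above expand to can be rebuilt in a few lines of Python. This is a sketch under assumptions: the build_jobs helper and the input file names in the example are made up for illustration, and only the argument order is taken from the commands above.

```python
from itertools import product
from typing import List, Sequence


def build_jobs(
    script: str,
    log_path: str,
    inputs: Sequence[str],
    partitions: Sequence[int],
) -> List[List[str]]:
    """One command line per (input file, number of partitions) pair.

    Mirrors how GNU Parallel combines two ::: input sources: every value of
    the first source is paired with every value of the second one.
    """
    return [
        ["python3.10", script, log_path, path, str(k)]
        for path, k in product(inputs, partitions)
    ]


if __name__ == "__main__":
    # Hypothetical input files; in practice they would come from a glob
    # over data/formulae_X/cf_*.lp.output.
    example_inputs = ["cf_a.lp.output", "cf_b.lp.output"]
    for job in build_jobs("tc.py", "data/sepsis.xes", example_inputs, [2, 4, 6]):
        print(" ".join(job))
```

Each resulting list can be passed to subprocess.run, for example from a concurrent.futures.ThreadPoolExecutor with max_workers set to T.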