
Script for the Environment Generation of the CER's Big Data pipeline.


angelo-casciani/EnvGenCER


CER Environment Generation

Script for the Environment Generation step of the CER's Big Data pipeline. This work is part of my MSc thesis.

About The Project

This script implements a Process Mining technique to extract simulation parameters for the subsequent stages of the pipeline. It first reads the XES event log into a dataframe, so that the data can be processed in a more intuitive and controlled way with pandas, the well-established data manipulation library, together with the state-of-the-art process mining functionalities and algorithms provided by PM4Py. The parameters derived from this analysis are the number of processed instances, the activity and resource counts, the total process duration, activity-related costs, resource-related costs, the average inter-arrival time between cases, the average processing and waiting times of activities, resource calendars, and XOR split branching probabilities. For tasks such as computing the XOR split branching probabilities, the script also discovers the BPMN process model underlying the input log, using the Inductive Miner algorithm available in PM4Py.
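To make several of the listed parameters more concrete, here is a minimal plain-Python sketch that derives the case count, activity and resource counts, total duration, mean inter-arrival time, mean processing and waiting times, and branching probabilities from a small hypothetical event log. It is not the script's implementation: it uses only the standard library instead of pandas and PM4Py, assumes every event carries both a start and an end timestamp, and approximates XOR split probabilities with directly-follows frequencies instead of reading them off a discovered BPMN model. Costs and resource calendars are left out because they depend on inputs not described here. The names `EVENTS` and `simulation_parameters` are illustrative only.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

# Hypothetical toy log: (case_id, activity, resource, start, end).
# Assumes explicit start/end timestamps per activity instance.
EVENTS = [
    ("c1", "Register", "Ann", "2024-01-01 09:00", "2024-01-01 09:10"),
    ("c1", "Check", "Bob", "2024-01-01 09:30", "2024-01-01 10:00"),
    ("c1", "Approve", "Ann", "2024-01-01 10:00", "2024-01-01 10:20"),
    ("c2", "Register", "Ann", "2024-01-01 09:40", "2024-01-01 09:50"),
    ("c2", "Check", "Bob", "2024-01-01 10:10", "2024-01-01 10:30"),
    ("c2", "Reject", "Carl", "2024-01-01 11:00", "2024-01-01 11:05"),
    ("c3", "Register", "Dan", "2024-01-01 10:20", "2024-01-01 10:25"),
    ("c3", "Check", "Bob", "2024-01-01 10:40", "2024-01-01 11:00"),
    ("c3", "Approve", "Ann", "2024-01-01 11:10", "2024-01-01 11:30"),
]


def minutes(a, b):
    """Minutes elapsed from datetime a to datetime b."""
    return (b - a).total_seconds() / 60


def simulation_parameters(events):
    # Parse timestamps and group each case's events in start order.
    rows = [(c, a, r, datetime.strptime(s, FMT), datetime.strptime(e, FMT))
            for c, a, r, s, e in events]
    cases = defaultdict(list)
    for row in rows:
        cases[row[0]].append(row)
    for trace in cases.values():
        trace.sort(key=lambda row: row[3])

    # A case "arrives" when its first activity starts.
    arrivals = sorted(trace[0][3] for trace in cases.values())
    gaps = [minutes(x, y) for x, y in zip(arrivals, arrivals[1:])]

    processing = defaultdict(list)
    waiting = defaultdict(list)
    follows = defaultdict(Counter)
    for trace in cases.values():
        for i, (_, act, _, start, end) in enumerate(trace):
            processing[act].append(minutes(start, end))
            if i > 0:
                prev_act, prev_end = trace[i - 1][1], trace[i - 1][4]
                # Waiting time: idle gap since the previous activity ended.
                waiting[act].append(minutes(prev_end, start))
                follows[prev_act][act] += 1

    # Activities followed by more than one distinct activity are treated
    # as XOR splits; branch probabilities are relative frequencies.
    branching = {
        act: {nxt: n / sum(cnt.values()) for nxt, n in cnt.items()}
        for act, cnt in follows.items() if len(cnt) > 1
    }
    first_start = min(row[3] for row in rows)
    last_end = max(row[4] for row in rows)
    return {
        "cases": len(cases),
        "activity_counts": Counter(row[1] for row in rows),
        "resource_counts": Counter(row[2] for row in rows),
        "total_duration_min": minutes(first_start, last_end),
        "mean_interarrival_min": mean(gaps) if gaps else 0.0,
        "mean_processing_min": {a: mean(v) for a, v in processing.items()},
        "mean_waiting_min": {a: mean(v) for a, v in waiting.items()},
        "xor_branching": branching,
    }


if __name__ == "__main__":
    params = simulation_parameters(EVENTS)
    for name, value in params.items():
        print(name, value)
```

In this sketch the first activity of each case contributes no waiting time, and resource contention is ignored; a real simulation setup would likely need to account for both.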

Requirements

To run the project, the requirements are the following:

Running The Project

Move into the EnvGenCER directory and run the following command, specifying the path of the input event log:

python3 data_analysis.py "<input_log_path>"
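The `<input_log_path>` argument points to an XES event log. PM4Py ships its own importer (for instance `pm4py.read_xes`), but to show which standard XES attributes an analysis like this typically relies on, the following standard-library sketch extracts the case identifier, activity, resource, timestamp, and lifecycle transition of each event. The function name `read_xes_events` and the sample log are hypothetical, and the sketch ignores XES features such as nested attributes and global defaults.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical one-case log using standard XES extension keys.
SAMPLE_XES = """<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="c1"/>
    <event>
      <string key="concept:name" value="Register"/>
      <string key="org:resource" value="Ann"/>
      <date key="time:timestamp" value="2024-01-01T09:10:00.000+01:00"/>
      <string key="lifecycle:transition" value="complete"/>
    </event>
  </trace>
</log>"""


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://www.xes-standard.org/}'."""
    return tag.rsplit("}", 1)[-1]


def read_xes_events(source):
    """Return (case_id, activity, resource, timestamp, lifecycle) tuples.

    `source` is a file path or a file-like object. Timestamps are kept as
    strings; nested attributes and <global> defaults are ignored.
    """
    root = ET.parse(source).getroot()
    events = []
    for trace in root:
        if local_name(trace.tag) != "trace":
            continue
        # The case identifier is the trace-level concept:name attribute.
        case_id = next((child.get("value") for child in trace
                        if local_name(child.tag) != "event"
                        and child.get("key") == "concept:name"), None)
        for event in trace:
            if local_name(event.tag) != "event":
                continue
            attrs = {attr.get("key"): attr.get("value") for attr in event}
            events.append((case_id,
                           attrs.get("concept:name"),
                           attrs.get("org:resource"),
                           attrs.get("time:timestamp"),
                           attrs.get("lifecycle:transition")))
    return events


if __name__ == "__main__":
    for row in read_xes_events(io.StringIO(SAMPLE_XES)):
        print(row)
```

Passing a file path such as the one given on the command line works the same way, since `ET.parse` accepts either a path or a file object.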
