The shared dataset is built from engineered features tailored to improve ransomware detection. These features were derived from a raw set of 413,556 Sysmon events collected during a controlled experiment that compared benign software (goodware) with ransomware activity. Of these events, 302,260 came from regular benign actions like installing and using applications, while 111,296 were triggered by ransomware execution. This extensive raw data served as the basis for creating key indicators that differentiate normal system behavior from the malicious patterns typical of ransomware attacks.
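As a quick sanity check on the figures above, the class balance of the raw events can be worked out directly from the stated counts. The sketch below uses only the three numbers given in the description; the variable names are illustrative and not part of the dataset.

```python
# Event counts as stated in the dataset description.
total_events = 413_556
benign_events = 302_260
ransomware_events = 111_296

# The two classes should account for every raw event.
assert benign_events + ransomware_events == total_events

# Share of each class in the raw Sysmon events.
benign_share = benign_events / total_events
ransomware_share = ransomware_events / total_events

print(f"benign share:     {benign_share:.3f}")
print(f"ransomware share: {ransomware_share:.3f}")
print(f"benign-to-ransomware ratio: {benign_events / ransomware_events:.2f}")
```

The raw events are therefore imbalanced toward benign activity at roughly 73% to 27%, which may be worth keeping in mind when choosing evaluation metrics for a detector trained on the derived features.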
Additionally, special versions of the dataset for data drift detection are included in the dataset folder. The drift dataset is organized in three phases. The first phase uses a balanced distribution of ransomware families to capture diverse behaviors. The second phase introduces abrupt drift by overrepresenting certain strains, such as LockBit and Medusa, while underrepresenting others, such as Akira and BlackBasta, which simulates a sudden shift in ransomware prevalence. The third phase applies gradual drift: the representation of strains such as Akira and LockBit is decreased in small steps and then restored to balance in the final part, so that models can be tested for resilience to incremental changes.
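The three phases can be pictured as three different sampling distributions over ransomware families. The following is a minimal sketch of that idea, not the procedure used to build the drift dataset. The family list is limited to the four strains named above, and the specific weights, the 60% maximum reduction, the triangular shape of the gradual phase, and all function names are assumptions chosen for illustration.

```python
import random
from collections import Counter

# Only the four strains named in the description; the real dataset may contain more.
FAMILIES = ["Akira", "BlackBasta", "LockBit", "Medusa"]


def balanced_weights():
    """Phase 1: every family is equally likely."""
    return {f: 1.0 / len(FAMILIES) for f in FAMILIES}


def abrupt_weights():
    """Phase 2: hypothetical weights that overrepresent LockBit and Medusa."""
    return {"Akira": 0.10, "BlackBasta": 0.10, "LockBit": 0.40, "Medusa": 0.40}


def gradual_weights(progress):
    """Phase 3: reduce Akira and LockBit step by step, then restore balance.

    progress runs from 0.0 (start of the phase) to 1.0 (end of the phase).
    The reduction is zero at both ends and largest at the midpoint.
    """
    base = balanced_weights()
    dip = 1.0 - abs(2.0 * progress - 1.0)  # 0 -> 1 -> 0 over the phase
    raw = dict(base)
    for family in ("Akira", "LockBit"):
        raw[family] = base[family] * (1.0 - 0.6 * dip)  # assumed 60% max reduction
    total = sum(raw.values())
    return {f: w / total for f, w in raw.items()}  # renormalize to sum to 1


def sample_families(weights, n, rng):
    """Draw n family labels according to the given weights."""
    return rng.choices(FAMILIES, weights=[weights[f] for f in FAMILIES], k=n)


if __name__ == "__main__":
    rng = random.Random(0)
    print("balanced:", Counter(sample_families(balanced_weights(), 1000, rng)))
    print("abrupt:  ", Counter(sample_families(abrupt_weights(), 1000, rng)))
    for step in range(5):
        progress = step / 4
        weights = gradual_weights(progress)
        print(f"gradual {progress:.2f}:", {f: round(w, 3) for f, w in weights.items()})
```

A drift detector evaluated on data generated this way would be expected to flag the boundary between phases 1 and 2 quickly, while the slow change in phase 3 tests whether it reacts to small, cumulative shifts.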