Skip to content

Prototypes is a proof-of-concept system combining cybersecurity / SIEM / ML insights into an end-to-end pipeline. It’s designed for SOC analysts, threat researchers, or ML engineers who want to explore data-driven security event detection and anomaly analysis.

License

Notifications You must be signed in to change notification settings

CyberMetrics/Prototypes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 

Repository files navigation

🚀 Prototypes License: MIT Python Version Notebook


📖 Table of Contents

Project Illustration

🎯 About

Prototype-001 is a proof-of-concept system combining cybersecurity / SIEM / ML insights into an end-to-end pipeline.
It’s designed for SOC analysts, threat researchers, or ML engineers who want to explore data-driven security event detection and anomaly analysis.

Prototype-002 introduces a HybridSecurityModel designed to perform three key security analyses in one: anomaly detection, classification, and time-series analysis. This prototype can handle both numeric and log-based data, making it a versatile tool for security monitoring.

Motivation:

  • Many SIEM-related ML prototypes lack modularity, visualization, or reproducibility.
  • Prototype-001 aims to provide a clean foundation: data loading, feature extraction, model training, evaluation & visualization — all in one Jupyter notebook.
  • Multi-Faceted Analysis: It's a base model that can perform anomaly detection, classification, and time-series analysis simultaneously in Prototype_002.
  • Hybrid Data Support: The model includes separate pipelines for analyzing both numeric data and log data.
  • Custom Classifier: It features a SimpleLogisticRegression classifier built from scratch.
  • Rule-Based Logic: The prototype uses rule-based logic for tasks like flagging rare events in logs as anomalies and classifying log content based on keywords.

Key Topics / Tags: training · logs · ml · cybersecurity · siem · socs


✨ Features

  • 🚧 Modular pipeline: Data ingestion → Feature extraction → Model training → Evaluation
  • 📊 Rich visualizations (ROC curves, confusion matrices, feature importances)
  • 📂 Flexible log format support (plug your own datasets)
  • 🔧 Hyperparameter tuning (grid search or alternatives)
  • ✅ Reproducible experiment tracking (seeds, logging)
  • 🧪 Notebook-first demonstration (interactive, exploratory)

🏗 Architecture & Design

[Raw Logs / Events]
↓
[Preprocessing / Parsing]
↓
[Feature Extraction / Aggregation]
↓
[Train / Validate / Test Split]
↓
[Model(s)]
↓
[Evaluation & Visualization]
  • Data Ingestion → Parsers for CSV/JSON/syslog
  • Feature Extraction → Time windows, counts, embeddings
  • Models → Random Forest, XGBoost, Neural Nets, anomaly detectors
  • Evaluation → Precision, Recall, F1, ROC-AUC
  • Visualization → Matplotlib / Seaborn / Plotly plots

🛠 Installation

Prerequisites

  • Python 3.8+
  • pip or conda
  • (Optional) GPU if training deep models

Setup

git clone https://github.com/CyberMetrics/Prototype-001.git
cd Prototype-001

   

🧪 Usage / Demo
Notebook Mode
bash
Copy code
jupyter notebook Prototype_001.ipynb
Walk through the notebook:

Data Loading & Preprocessing

Feature Engineering

Train / Validation / Test Split

Model Training & Evaluation

Visualizations & Analysis

Script Mode (if modularized)
bash
Copy code
python run.py --input path/to/logs.json --model random_forest --output results/
📊 Experiments & Results
Model	Precision	Recall	F1-Score	ROC-AUC
Random Forest	0.85	0.78	0.81	0.92
XGBoost	0.87	0.80	0.83	0.94
Neural Net	0.82	0.75	0.78	0.91

Sample Visualizations
ROC curves for multiple models

Confusion matrices

Feature importance plots (e.g., SHAP)

Time series anomaly detection charts

Insights:

Random Forest provided stable results across datasets.

Feature X showed highest importance.

Future work: reduce false positives, test deep anomaly detection.

###📂 Project Structure

Prototype-001/
├── LICENSE
├── README.md
└── Prototype_001.ipynb       ← main demo notebook

🤝 Contributing

Contributions are welcome!

Fork this repository

Create a feature branch (git checkout -b feature/your-feature)

Commit changes (git commit -m "Add new feature")

Push and open a Pull Request

Ideas:

Add support for more log formats (Zeek, Suricata, syslog)

Add deep models (autoencoders, LSTMs)

Integrate MLflow or W&B for experiment tracking

Improve visualization dashboards (Plotly, Streamlit)

##📜 License This project is licensed under the MIT License — see the LICENSE file.

##🙏 Acknowledgements

Open-source libraries: NumPy, Pandas, scikit-learn, Matplotlib, Seaborn

Inspired by research in ML for SIEM / anomaly detection

Thanks to the open-source community for tools & feedback

About

Prototypes is a proof-of-concept system combining cybersecurity / SIEM / ML insights into an end-to-end pipeline. It’s designed for SOC analysts, threat researchers, or ML engineers who want to explore data-driven security event detection and anomaly analysis.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •