DianaC01/iot-attack-detection

IoT Attack Detection with GAN-based Data Augmentation 🔐📶

Intrusion detection for IoT networks using machine learning / deep learning and GAN-based synthetic data to improve class balance.
This repository contains a Google Colab notebook and a concise report-style README that summarizes the theory and the implementation steps.


📚 Background (Short Theory)

Internet of Things (IoT) deployments are exposed to a wide spectrum of attacks (e.g., port scans, DoS, brute-force, botnet traffic). Signature-based IDSs struggle with novel or rare patterns, while ML/DL classifiers can generalize better, provided the training data is representative.
However, many IoT datasets are imbalanced: some attack classes are under-represented, which hurts recall on those classes. Generative Adversarial Networks (GANs), here CTGAN for tabular data, can synthesize additional samples for the minority classes to balance the training set, which can improve detection metrics.
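To make the imbalance concrete, the following minimal sketch counts labels and computes how many synthetic rows each minority class would need to match the largest class. The helper name `synthetic_budget`, the `target_ratio` parameter, and the toy label names are illustrative assumptions, not taken from the notebook.

```python
from collections import Counter


def synthetic_budget(labels, target_ratio=1.0):
    """Return the number of synthetic rows each class needs to reach
    target_ratio * (size of the largest class).

    Hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    target = int(max(counts.values()) * target_ratio)
    # Only classes below the target need extra rows.
    return {cls: target - n for cls, n in counts.items() if n < target}


# Toy label column (made-up class names and counts).
labels = ["benign"] * 900 + ["ddos"] * 80 + ["port_scan"] * 20
budget = synthetic_budget(labels)
print(budget)  # {'ddos': 820, 'port_scan': 880}
```

Full balancing is not always the best choice; a `target_ratio` below 1.0 generates fewer synthetic rows and keeps the augmented set closer to the real distribution.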


πŸ—„οΈ Dataset

  • Based on the IoT-23 dataset (Stratosphere Laboratory, CTU in Prague) or a cleaned derivative.
  • Expected input: CSV files of network-flow features with a label column (normal / multiple attack types), optionally already split into train/test.

Dataset files are not included in this repo. Place them under data/ when running locally or mount from Drive in Colab.


🧩 Methodology

  1. Preprocessing
    • Load the CSV(s), drop irrelevant columns, handle missing values, encode categorical features, and scale numeric features.
  2. Baseline Model
    • Train a neural network (Keras MLP) or a classic ML model as a baseline; record metrics (Accuracy, Precision/Recall/F1, Confusion Matrix).
  3. Synthetic Data with CTGAN
    • Train CTGAN on the training split, focusing on minority classes, and generate synthetic samples.
  4. Retrain with Augmented Data
    • Concatenate real + synthetic training data; retrain the model on the combined set (e.g., RandomForest or an improved MLP).
  5. Evaluation
    • Compare baseline vs. augmented on the same real test split: class-wise precision/recall/F1, macro-F1, ROC-AUC (if applicable), and visualize the confusion matrix.
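Steps 3 and 4 can be sketched as follows. This is a minimal example under stated assumptions, not the notebook's actual code: it relies on the `ctgan` package (`CTGAN`, `fit`, `sample`), expects `train_df` to be a pandas DataFrame, and the function name `sample_minority` and its parameters are hypothetical.

```python
def sample_minority(train_df, label_col, minority_class, n_samples,
                    discrete_columns=(), epochs=100):
    """Fit CTGAN on one minority class of the training split and
    return n_samples synthetic rows labeled with that class.

    Hypothetical helper; train_df is assumed to be a pandas DataFrame.
    """
    # Imported here so the function can be defined even when the
    # optional dependency is not installed (pip install ctgan).
    from ctgan import CTGAN

    # Use only real training rows of the target class, and drop the
    # label column because it is constant within one class.
    minority = train_df[train_df[label_col] == minority_class]
    features = minority.drop(columns=[label_col])
    discrete = [c for c in discrete_columns if c != label_col]

    model = CTGAN(epochs=epochs)
    model.fit(features, discrete)

    synthetic = model.sample(n_samples)
    synthetic[label_col] = minority_class
    return synthetic


# Example usage (column and class names are made up):
#   synth = sample_minority(train_df, "label", "port_scan", 880, ["proto"])
#   augmented = pd.concat([train_df, synth], ignore_index=True)
```

Fitting one model per minority class keeps the sketch simple. With very few real rows per class the synthetic samples may be of poor quality, so it is worth inspecting them before retraining.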

πŸ› οΈ Implementation Steps (Notebook)

  1. Environment setup (Colab): install libs, mount Google Drive (optional).
  2. Load & preprocess: read CSV(s), encode & scale, split into train/test.
  3. Train baseline: fit model, log metrics, save artifacts to results/.
  4. CTGAN training: fit on minority classes, generate N samples per class.
  5. Augmented training: mix real + synthetic, refit model, log metrics.
  6. Evaluation & plots: classification report, confusion matrix, and (optionally) feature importance for tree-based models.
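The class-wise comparison in the evaluation step can be illustrated without any library. The sketch below computes precision, recall, F1 per class and macro-F1 from two label lists. The function name `per_class_report` and the toy predictions are assumptions made up for illustration; they are not results from this project.

```python
def per_class_report(y_true, y_pred):
    """Return ({class: {precision, recall, f1}}, macro_f1)."""
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[cls] = {"precision": precision, "recall": recall, "f1": f1}
    # Macro-F1 weights every class equally, so rare classes count as
    # much as the majority class.
    macro_f1 = sum(r["f1"] for r in report.values()) / len(report)
    return report, macro_f1


# Toy labels and predictions (illustrative only).
y_true = ["benign"] * 6 + ["scan"] * 4
baseline_pred = ["benign"] * 6 + ["benign", "benign", "benign", "scan"]
augmented_pred = ["benign"] * 5 + ["scan"] + ["scan", "scan", "scan", "benign"]

base_report, base_macro = per_class_report(y_true, baseline_pred)
aug_report, aug_macro = per_class_report(y_true, augmented_pred)
print(base_report["scan"]["recall"], aug_report["scan"]["recall"])  # 0.25 0.75
```

In the notebook itself, scikit-learn's `classification_report` and `confusion_matrix` would normally produce the same numbers; the manual version only shows what is being compared.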

▶️ How to Run (Google Colab)

  1. Open Google Colab and upload notebooks/Untitled10.ipynb (or open from GitHub).
  2. Prepare data:
    • Upload your CSV(s) to Colab, or
    • Mount Google Drive and point the notebook to your data folder.
  3. Run the notebook cells in order (setup → preprocessing → baseline → CTGAN → retrain → evaluation).
  4. Results (figures, CSVs, models) can be saved under results/.
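The data-preparation step might look like the sketch below, which mounts Google Drive when running in Colab and falls back to a local folder otherwise. The helper name `resolve_data_dir` and the Drive subfolder `MyDrive/iot23` are hypothetical; adjust them to wherever the CSVs actually live.

```python
from pathlib import Path


def resolve_data_dir(local_dir="data", drive_subdir="MyDrive/iot23"):
    """Return the folder holding the dataset CSVs.

    In Colab, mount Google Drive and use a subfolder of it
    (drive_subdir is a made-up example path). Elsewhere, use local_dir.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return Path(local_dir)
    drive.mount("/content/drive")
    return Path("/content/drive") / drive_subdir


data_dir = resolve_data_dir()
# List the CSV files if the folder exists; an empty list otherwise.
csv_files = sorted(data_dir.glob("*.csv")) if data_dir.is_dir() else []
print(f"Found {len(csv_files)} CSV file(s) in {data_dir}")
```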

📂 Repository Structure

iot-attack-detection/
├─ notebooks/
│  └─ Untitled10.ipynb      # main Colab notebook
├─ results/                 # generated plots/reports (ignored by Git)
├─ requirements.txt         # Python dependencies
├─ .gitignore
└─ README.md

🔖 Recommended Topics

iot, intrusion-detection, cybersecurity, machine-learning, deep-learning, gan, ctgan, tabular-data


πŸ“ Notes

  • Replace Untitled10.ipynb with a clearer name (e.g., iot23_gan_augmentation.ipynb) once you finalize it.
  • To reproduce on a CPU-only machine, consider RandomForest for both the baseline and the augmented retraining; it trains quickly and is typically a strong choice for tabular data.
  • Keep large datasets outside the repo (data/ is ignored via .gitignore).
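For the CPU-only path mentioned in the notes, a RandomForest baseline with preprocessing can be sketched with scikit-learn as below. The column names and toy rows are hypothetical and do not reflect the IoT-23 schema; the same pipeline object could be refit on the augmented training set.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy flow records (made-up columns and values, for illustration only).
flows = pd.DataFrame({
    "duration": [0.1, 0.2, 5.0, 6.0, 0.15, 5.5],
    "orig_bytes": [100, 120, 9000, 9500, 110, 9100],
    "proto": ["tcp", "tcp", "udp", "udp", "tcp", "udp"],
    "label": ["normal", "normal", "attack", "attack", "normal", "attack"],
})
X, y = flows.drop(columns=["label"]), flows["label"]

# Scale numeric features and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["duration", "orig_bytes"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["proto"]),
])

# n_jobs=-1 uses all available CPU cores for the forest.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)),
])
model.fit(X, y)
predictions = model.predict(X)
print(list(predictions))
```

Keeping preprocessing inside the pipeline means the scaler and encoder are fit only on whatever data is passed to `fit`, which avoids leaking test-set statistics into training.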
