Intrusion detection for IoT networks using machine learning / deep learning and GAN-based synthetic data to improve class balance.
This repository contains a Google Colab notebook and a concise report-style README that summarizes the theory and the implementation steps.
Internet of Things (IoT) deployments are exposed to a wide spectrum of attacks (e.g., port scans, DoS, brute-force, botnet traffic). Signature-based IDSs struggle with novel or rare attack patterns, whereas ML/DL classifiers can generalize better, provided the training data is representative.
However, many IoT datasets are imbalanced: some attack classes are under-represented, which hurts recall on exactly those classes. Generative Adversarial Networks (GANs), here CTGAN for tabular data, can synthesize additional samples for the minority classes to balance the dataset and boost detection metrics.
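To make "balance the dataset" concrete, the sketch below computes how many synthetic rows each class would need. It is a hypothetical helper, not code from the notebook, and it assumes the goal is to top every class up to the majority-class count or, optionally, to a smaller cap (`target`). A cap avoids asking the GAN to produce hundreds of rows from a handful of real examples.

```python
from collections import Counter


# Hypothetical helper (not from the notebook): per-class synthetic-row quota.
def synthetic_quota(labels, target=None):
    """Return how many synthetic rows each class needs to reach `target` rows.

    By default the target is the majority-class count (full balancing).
    Classes already at or above the target get 0.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(target - n, 0) for cls, n in counts.items()}


# Made-up label distribution for illustration only (not IoT-23 statistics).
labels = ["benign"] * 900 + ["DDoS"] * 80 + ["PortScan"] * 20
print(synthetic_quota(labels))              # {'benign': 0, 'DDoS': 820, 'PortScan': 880}
print(synthetic_quota(labels, target=300))  # {'benign': 0, 'DDoS': 220, 'PortScan': 280}
```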
- Based on the IoT-23 dataset (Stratosphere Laboratory, CTU) or a cleaned derivative.
- Typical input: CSVs with network-flow features, a label column (normal / multiple attack types), and optional train/test splits.
Dataset files are not included in this repo. Place them under `data/` when running locally, or mount them from Google Drive in Colab.
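A minimal loader for that layout might look like this. It assumes every dataset file is a CSV with a header row; the helper name `load_flows` and the extra `source_file` column are illustrative, not taken from the notebook. In Colab, mount Drive first with `from google.colab import drive; drive.mount("/content/drive")` and pass the mounted folder as `data_dir`.

```python
from pathlib import Path

import pandas as pd


# Hypothetical helper (not from the notebook): read all CSVs in a folder.
def load_flows(data_dir="data", pattern="*.csv"):
    """Concatenate every CSV matching `pattern` under `data_dir`.

    A `source_file` column records which file each row came from,
    which is handy for per-capture splits or debugging later.
    """
    files = sorted(Path(data_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {str(data_dir)!r}")
    frames = [pd.read_csv(f).assign(source_file=f.name) for f in files]
    return pd.concat(frames, ignore_index=True)
```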
- Preprocessing
  - Load CSV(s), drop irrelevant columns, handle missing values, encode categorical features, scale numeric features.
- Baseline Model
  - Train a neural network (Keras MLP) or a classic ML model as a baseline; record metrics (Accuracy, Precision/Recall/F1, Confusion Matrix).
- Synthetic Data with CTGAN
  - Train CTGAN on the training split, focusing on minority classes, and generate synthetic samples.
- Retrain with Augmented Data
  - Concatenate real + synthetic data; retrain the classifier (e.g., RandomForest or an improved MLP).
- Evaluation
  - Compare baseline vs. augmented: class-wise precision/recall/F1, macro-F1, ROC-AUC (if applicable), and a confusion-matrix plot.
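The preprocessing step can be sketched with scikit-learn as below. This is an assumption about the approach, not a copy of the notebook: numeric columns get median imputation and standard scaling, the remaining columns get most-frequent imputation and one-hot encoding, and the hypothetical `drop_cols` parameter stands in for whatever you consider irrelevant (e.g., connection IDs or IP addresses). The transformer is fitted on the training split only, so test-set statistics do not leak into the scaling or encoding.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


# Hypothetical helpers (not from the notebook).
def make_preprocessor(df, label_col="label", drop_cols=()):
    """Impute + scale numeric columns; impute + one-hot encode the rest."""
    features = df.drop(columns=[label_col, *drop_cols])
    numeric = features.select_dtypes(include="number").columns.tolist()
    categorical = [c for c in features.columns if c not in numeric]
    numeric_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric_pipe, numeric),
        ("cat", categorical_pipe, categorical),
    ])


def split_and_transform(df, label_col="label", drop_cols=(), test_size=0.2, seed=42):
    """Stratified train/test split; the preprocessor is fitted on train only."""
    X = df.drop(columns=[label_col, *drop_cols])
    y = df[label_col]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    pre = make_preprocessor(df, label_col, drop_cols)
    return pre.fit_transform(X_train), pre.transform(X_test), y_train, y_test, pre
```

Stratifying the split keeps rare attack classes present in both partitions, which matters when some classes have only a few dozen rows.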
- Environment setup (Colab): install libs, mount Google Drive (optional).
- Load & preprocess: read CSV(s), encode & scale, split into train/test.
- Train baseline: fit model, log metrics, save artifacts to `results/`.
- CTGAN training: fit on minority classes, generate N samples per class.
- Augmented training: mix real + synthetic, refit model, log metrics.
- Evaluation & plots: classification report, confusion matrix, and (optionally) feature importance for tree-based models.
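One way to implement the CTGAN step is to fit a separate model per minority class and sample the requested number of rows from each. The sketch assumes the `ctgan` package (`from ctgan import CTGAN`, with `fit(data, discrete_columns)` and `sample(n)`); `augment_minority`, `default_synth` and `epochs=300` are illustrative choices, not taken from the notebook. The synthesizer factory is injectable, so the bookkeeping can be checked without training a GAN.

```python
import pandas as pd


def default_synth():
    """Hypothetical factory: a CTGAN model (needs `pip install ctgan`; epochs chosen arbitrarily)."""
    from ctgan import CTGAN  # lazy import so the helper can be exercised without ctgan
    return CTGAN(epochs=300)


# Hypothetical helper (not from the notebook).
def augment_minority(train_df, label_col, quota, discrete_columns=(), make_synth=default_synth):
    """Fit one synthesizer per class with a positive quota and append its samples.

    `quota` maps class -> number of synthetic rows to generate.
    Pass only the training split; the test split should stay purely real.
    """
    synthetic = []
    for cls, n in quota.items():
        if n <= 0:
            continue
        subset = train_df[train_df[label_col] == cls].drop(columns=[label_col])
        model = make_synth()
        model.fit(subset, discrete_columns=list(discrete_columns))
        # Re-attach the class label to the generated feature rows.
        synthetic.append(model.sample(n).assign(**{label_col: cls}))
    if not synthetic:
        return train_df.copy()
    return pd.concat([train_df, *synthetic], ignore_index=True)
```

A per-class model keeps each class's distribution separate but means training several GANs. An alternative is a single CTGAN fitted on the whole training split with the label listed in `discrete_columns`, then sampled per class via `sample(n, condition_column=..., condition_value=...)`.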
- Open Google Colab and upload `notebooks/Untitled10.ipynb` (or open it from GitHub).
- Prepare data:
  - Upload your CSV(s) to Colab, or
  - Mount Google Drive and point the notebook to your data folder.
- Run the notebook cells in order (setup → preprocessing → baseline → CTGAN → retrain → evaluation).
- Results (figures, CSVs, models) can be saved under `results/`.
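For the comparison saved under `results/`, a small helper can tabulate per-class F1 before and after augmentation. It assumes both runs were scored with `sklearn.metrics.classification_report(y_true, y_pred, output_dict=True)`; the function name `compare_reports` and the output filename are hypothetical.

```python
from pathlib import Path

import pandas as pd


# Hypothetical helper (not from the notebook).
def compare_reports(baseline, augmented, out_dir="results"):
    """Tabulate per-class and macro F1 for two classification_report dicts and save as CSV."""
    summary_keys = {"accuracy", "macro avg", "weighted avg", "micro avg"}
    classes = [k for k in baseline if k not in summary_keys]
    table = pd.DataFrame({
        "f1_baseline": [baseline[c]["f1-score"] for c in classes],
        "f1_augmented": [augmented[c]["f1-score"] for c in classes],
    }, index=classes)
    table.loc["macro avg"] = [baseline["macro avg"]["f1-score"],
                              augmented["macro avg"]["f1-score"]]
    table["delta"] = table["f1_augmented"] - table["f1_baseline"]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    table.to_csv(out / "f1_comparison.csv")
    return table
```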
iot-attack-detection/
├── notebooks/
│   └── Untitled10.ipynb    # main Colab notebook
├── results/                # generated plots/reports (ignored by Git)
├── requirements.txt        # Python dependencies
├── .gitignore
└── README.md
Topics: iot, intrusion-detection, cybersecurity, machine-learning, deep-learning, gan, ctgan, tabular-data
- Replace `Untitled10.ipynb` with a clearer name (e.g., `iot23_gan_augmentation.ipynb`) once you finalize it.
- If you need to reproduce on CPU-only machines, consider using RandomForest for both the baseline and the augmented retraining (fast to train and typically strong on tabular data).
- Keep large datasets out of version control (`data/` is ignored via `.gitignore`).
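A CPU-only version of the baseline-vs-augmented comparison might look like this. It assumes the features are already preprocessed into dense NumPy arrays and that `X_syn`/`y_syn` hold synthetic rows generated from the training split; the helper names and `n_estimators=200` are illustrative, not taken from the notebook. Both models are scored on the same real test set, so the comparison isolates the effect of the added training rows.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score


# Hypothetical helpers (not from the notebook).
def rf_macro_f1(X_train, y_train, X_test, y_test, seed=0):
    """Fit a RandomForest on all CPU cores and return macro-F1 on the test set."""
    model = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=seed)
    model.fit(X_train, y_train)
    return f1_score(y_test, model.predict(X_test), average="macro")


def baseline_vs_augmented(X_train, y_train, X_syn, y_syn, X_test, y_test, seed=0):
    """Compare real-only training against real + synthetic training."""
    base = rf_macro_f1(X_train, y_train, X_test, y_test, seed)
    # Synthetic rows are added to the training data only, never to the test set.
    X_aug = np.vstack([X_train, X_syn])
    y_aug = np.concatenate([y_train, y_syn])
    aug = rf_macro_f1(X_aug, y_aug, X_test, y_test, seed)
    return {"baseline_macro_f1": base, "augmented_macro_f1": aug}
```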