Bitcoin Illicit Transaction Detection with Anomaly Detection and Graph Neural Networks using the Elliptic dataset from Kaggle

mstampfer/kaggle-elliptic-dataset

Bitcoin Illicit Transaction Detection with Graph Neural Networks

A Graph Neural Network (GNN) implementation for detecting illicit Bitcoin transactions using the Elliptic dataset from Kaggle. The project uses PyTorch Geometric and PyTorch Lightning to build a GNN that classifies transactions and flags potentially fraudulent ones through graph-based pattern recognition.

Features

  • Graph Neural Network Architecture: Multi-layer GCN with batch normalization and dropout
  • Automated Data Loading: Downloads the Elliptic dataset from Kaggle automatically
  • Model Training: PyTorch Lightning integration with callbacks and logging
  • Anomaly Detection: Graph-based pattern recognition for identifying suspicious transaction behavior
  • Evaluation Metrics: ROC-AUC, classification reports, and confusion matrices
  • Suspicious Node Detection: Ranks unlabeled ("unknown") transactions that are likely to be illicit using anomaly scores
  • Visualization: t-SNE embeddings and subgraph visualization around suspicious nodes
  • Model Checkpointing: Automatic saving of best performing models

Dataset

The Elliptic Dataset contains Bitcoin transaction data with:

  • 203,769 transactions (nodes)
  • 234,355 directed edges representing Bitcoin flows
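  • 166 anonymized node features: 94 local features (including the time step and transaction-level quantities such as amounts) and 72 features aggregated from neighboring transactions
  • Labels: Illicit (class 1), Licit (class 2), or Unknown; most transactions are unlabeled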
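As a minimal sketch of how the label column can be read, the snippet below parses a few rows in the layout of elliptic_txs_classes.csv (assumed here to have a `txId,class` header) and encodes the three label values as integers. The `read_labels` helper and the integer encoding are illustrative assumptions, not code from this repository.

```python
import csv
import io
from collections import Counter

# Hypothetical integer encoding chosen for illustration only:
# illicit ("1") -> 1, licit ("2") -> 0, "unknown" -> -1.
LABEL_MAP = {"1": 1, "2": 0, "unknown": -1}


def read_labels(csv_file):
    """Return {txId: encoded_label} from a file-like object with txId,class columns."""
    reader = csv.DictReader(csv_file)
    return {row["txId"]: LABEL_MAP[row["class"]] for row in reader}


# A tiny in-memory stand-in for the real classes file.
sample = io.StringIO("txId,class\n101,unknown\n102,2\n103,1\n104,2\n")
labels = read_labels(sample)
counts = Counter(labels.values())
print(counts)
```

To read the real file, pass an open file handle (for example `open('./elliptic_data/elliptic_txs_classes.csv', newline='')`) instead of the in-memory sample.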

Requirements

See requirements.txt for full dependencies. Key packages:

  • PyTorch & PyTorch Geometric
  • PyTorch Lightning
  • Pandas, NumPy, Scikit-learn
  • NetworkX, Matplotlib
  • Kaggle API (for automatic dataset download)

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/Kaggle_Elliptic_Dataset.git
cd Kaggle_Elliptic_Dataset
  2. Install dependencies:
pip install -r requirements.txt
  3. Set up Kaggle API credentials (for automatic dataset download).
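The Kaggle API client conventionally reads credentials either from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables or from a `kaggle.json` token file in `~/.kaggle/`. The check below is a small sketch for confirming that one of the two is in place before running the pipeline; the `kaggle_credentials_found` helper is hypothetical and not part of this repository.

```python
import os
from pathlib import Path


def kaggle_credentials_found(env=None, home=None):
    """Return True if Kaggle credentials appear to be configured.

    Checks the KAGGLE_USERNAME/KAGGLE_KEY environment variables first,
    then falls back to looking for <home>/.kaggle/kaggle.json.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True
    return (home / ".kaggle" / "kaggle.json").is_file()


if not kaggle_credentials_found():
    print("No Kaggle credentials found; automatic download will likely fail.")
```

The token file can be created from the Kaggle account settings page. On Unix-like systems it should be readable only by its owner.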

Usage

Basic Usage

from kaggle_elliptic_dataset import BitcoinIllicitGNNDetector

# Initialize detector
detector = BitcoinIllicitGNNDetector(hidden_dim=128, learning_rate=0.001)

# Load data (downloads automatically if needed)
detector.load_elliptic_data(data_dir='./elliptic_data')

# Prepare graph data
detector.prepare_graph_data()

# Train model
detector.train_model(max_epochs=200, patience=20)

# Evaluate on test set
y_true, y_pred, y_prob = detector.evaluate_model()

# Find suspicious unknown nodes through anomaly detection
suspicious_nodes = detector.identify_suspicious_unknown_nodes(threshold=0.7, top_k=50)
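The `threshold` and `top_k` arguments suggest a filter-then-rank step over the unlabeled nodes. The sketch below shows one plausible reading: keep unknown transactions whose predicted illicit probability is at least `threshold`, sort them by that probability, and return the first `top_k`. The `select_suspicious` function and the `illicit_prob` key are assumptions for illustration; the repository's own logic may differ.

```python
def select_suspicious(tx_ids, illicit_probs, is_unknown, threshold=0.7, top_k=50):
    """Rank unlabeled transactions by predicted illicit probability.

    tx_ids, illicit_probs and is_unknown are parallel sequences; only
    entries flagged as unknown and scoring >= threshold are kept.
    """
    candidates = [
        {"txId": tx, "illicit_prob": prob}
        for tx, prob, unknown in zip(tx_ids, illicit_probs, is_unknown)
        if unknown and prob >= threshold
    ]
    # Highest-probability transactions first.
    candidates.sort(key=lambda c: c["illicit_prob"], reverse=True)
    return candidates[:top_k]


ranked = select_suspicious(
    tx_ids=[1, 2, 3, 4],
    illicit_probs=[0.9, 0.95, 0.5, 0.8],
    is_unknown=[True, False, True, True],
)
print([c["txId"] for c in ranked])
```

Transaction 2 is excluded despite its high score because it already has a label, and transaction 3 falls below the threshold.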

Advanced Features

# Analyze learned embeddings
embeddings, embeddings_2d = detector.analyze_node_embeddings()

# Visualize suspicious node neighborhoods
detector.visualize_suspicious_subgraph(suspicious_nodes[0]['txId'], num_hops=2)
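A `num_hops` neighborhood is the set of nodes reachable from the center within that many edges. The following sketch collects it with a breadth-first search, treating edges as undirected for the purpose of visualization (an assumption; the repository may follow edge direction). The `k_hop_nodes` helper is hypothetical.

```python
from collections import defaultdict, deque


def k_hop_nodes(edges, center, num_hops=2):
    """Return {node: hop_distance} for nodes within num_hops of center."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        # Assumption: ignore edge direction when exploring the neighborhood.
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == num_hops:
            continue  # do not expand beyond the requested radius
        for nxt in neighbors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


edges = [(1, 2), (2, 3), (3, 4), (5, 1)]
hood = k_hop_nodes(edges, center=1, num_hops=2)
print(sorted(hood.items()))
```

The returned node set can then be passed to a graph library such as NetworkX to draw the induced subgraph.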

Model Architecture

The GNN model consists of:

  • 3 GCN layers with batch normalization and ReLU activation
  • Dropout layers for regularization
  • Fully connected layers for final classification
  • Adam optimizer with learning rate scheduling
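To make the GCN layers above concrete, here is a dependency-free sketch of a single graph-convolution step with ReLU, following the standard rule of adding self-loops, normalizing by the square roots of the node degrees, and applying a shared weight matrix. It illustrates the idea only: batch normalization and dropout are omitted, edges are symmetrized as an assumption, and the `gcn_layer` function is hypothetical rather than the repository's `GNNModel`.

```python
import math


def gcn_layer(edges, features, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    edges    -- list of (src, dst) node-index pairs
    features -- list of per-node feature lists (X)
    weight   -- in_dim x out_dim matrix as nested lists (W)
    """
    n = len(features)
    # Each node starts with a self-loop; edges are treated as undirected.
    neighbors = [{i} for i in range(n)]
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    degree = [len(nb) for nb in neighbors]

    out_dim = len(weight[0])
    # Linear transform XW, computed once per node.
    transformed = [
        [sum(x * weight[k][j] for k, x in enumerate(row)) for j in range(out_dim)]
        for row in features
    ]

    output = []
    for i in range(n):
        row = [0.0] * out_dim
        for j in neighbors[i]:
            norm = 1.0 / math.sqrt(degree[i] * degree[j])
            for d in range(out_dim):
                row[d] += norm * transformed[j][d]
        output.append([max(0.0, v) for v in row])  # ReLU
    return output


hidden = gcn_layer(edges=[(0, 1)], features=[[1.0], [3.0]], weight=[[1.0]])
print(hidden)
```

In practice a library layer such as PyTorch Geometric's `GCNConv` performs this computation on sparse tensors; the loop version is only meant to show what each layer aggregates.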

Results

The model achieves:

  • ROC-AUC: approximately 0.81 on the test set
  • Precision/Recall: Balanced performance on both the illicit and licit classes
  • Anomaly Detection: Identifies high-risk unknown transactions through pattern analysis
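For readers unfamiliar with the metric, ROC-AUC is the probability that a randomly chosen illicit transaction receives a higher score than a randomly chosen licit one, with ties counted as half. The sketch below computes it directly from that definition; the `roc_auc` function is illustrative and quadratic in the number of samples, so a library routine such as scikit-learn's `roc_auc_score` is the practical choice for real evaluation.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

Because the Elliptic labels are heavily imbalanced, ROC-AUC is best read alongside the per-class precision and recall.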

File Structure

├── kaggle_elliptic_dataset.py    # Main implementation
├── elliptic_data/                # Dataset directory
│   ├── elliptic_txs_features.csv
│   ├── elliptic_txs_classes.csv
│   └── elliptic_txs_edgelist.csv
├── lightning_logs/               # Training logs and checkpoints
├── requirements.txt              # Python dependencies
└── README.md                     # This file

Key Classes

  • BitcoinIllicitGNNDetector: Main wrapper class for the entire pipeline
  • BitcoinGNNDetector: PyTorch Lightning module for training
  • GNNModel: Core GNN architecture with GCN layers

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this code in your research, please cite the Elliptic dataset:

@dataset{elliptic_dataset,
  title={The Elliptic Data Set},
  author={Weber, Mark and Domeniconi, Giacomo and Chen, Jie and Weidele, Daniel Karl I. and Bellei, Claudio and Robinson, Tom and Leiserson, Charles E.},
  url={https://www.kaggle.com/ellipticco/elliptic-data-set},
  year={2019}
}

Acknowledgments
