This is the repository for GALG, a graph-based artificial intelligence approach that links addresses for user tracking on TLS-encrypted traffic.
GALG uses a graph auto-encoder framework with adversarial training to learn user embeddings that capture both semantics and distributions. Using a new approach called link generation, GALG can link all addresses of target users from knowledge of address-service links.
The work was presented at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2022):
Tianyu Cui, Gang Xiong, Chang Liu, Junzheng Shi, Peipei Fu, Gaopeng Gou. GALG: Linking Addresses in Tracking Ecosystem Using Graph Autoencoder with Link Generation. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2022.
Note: this code is based on GAE, ARGA, GAT, and Link Prediction Experiments. Many thanks to the authors.
- python 3
- TensorFlow (1.0 or later)
- gensim
- networkx
- scikit-learn
- scipy
python main.py
For privacy reasons, we provide only the public dataset used in the paper.
- CSTNET: a public dataset collected from March to July 2018 on the China Science and Technology Network (CSTNET).
To use your own data, make sure its format matches data/cstnet.json and specify the data path in main.py.
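One way to compare formats is to print a compact summary of each file's structure and check the two side by side. The sketch below is only an illustration: the helper names `describe_structure` and `load_and_describe` are hypothetical and not part of this repository, and it makes no assumption about the actual schema of data/cstnet.json.

```python
import json


def describe_structure(value, depth=2, max_keys=3):
    """Summarize the shape of a parsed JSON value down to `depth` levels.

    Dicts show at most `max_keys` keys, lists are described by their first
    element, and scalars are replaced by their Python type name.
    """
    if isinstance(value, dict):
        if depth == 0:
            return "dict"
        keys = list(value)[:max_keys]
        return {k: describe_structure(value[k], depth - 1, max_keys) for k in keys}
    if isinstance(value, list):
        if depth == 0 or not value:
            return "list"
        return [describe_structure(value[0], depth - 1, max_keys)]
    return type(value).__name__


def load_and_describe(path, depth=2):
    """Load a JSON file and return its structural summary."""
    with open(path, encoding="utf-8") as handle:
        return describe_structure(json.load(handle), depth)


# Hypothetical sample, used only to show the output shape.
sample = {"example_key": [1, 2, 3], "other_key": {"nested": "text"}}
print(describe_structure(sample))
```

Calling `load_and_describe("data/cstnet.json")` and then the same function on your own file gives two summaries that can be compared by eye before the path is changed in main.py.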
You can choose between the following models:
- GALG: Graph Auto-Encoder for Link Generation
- VGALG: Variational Graph Auto-Encoder for Link Generation
We provide utilities for extended experiments on the user tracking and link generation tasks:
- baselines: all link prediction methods, adapted to the link generation framework.
The link prediction methods include:
- (Variational) Graph Auto-Encoders: An end-to-end trainable convolutional neural network model for unsupervised learning on graphs
- Adversarially Regularized (Variational) Graph Autoencoder: An adversarial graph embedding framework for robust graph embedding learning
- Node2Vec/DeepWalk: A skip-gram based approach to learning node embeddings from random walks within a given graph
- Spectral Clustering: Using spectral embeddings to create node representations from an adjacency matrix
- Heuristics: Common Neighbors, Jaccard, and Preferential Attachment
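The three heuristics in the last item are simple enough to sketch directly. The example below is a dependency-free illustration on a toy graph, not the code in baselines; the function names and the toy edge list are assumptions made for this example. networkx offers equivalents (`nx.common_neighbors`, `nx.jaccard_coefficient`, `nx.preferential_attachment`).

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def preferential_attachment(adj, u, v):
    """Product of the two node degrees."""
    return len(adj[u]) * len(adj[v])


def score_non_edges(adj, scorer):
    """Score every unconnected node pair with the given heuristic."""
    nodes = sorted(adj)
    return {
        (u, v): scorer(adj, u, v)
        for u, v in combinations(nodes, 2)
        if v not in adj[u]
    }


# Toy graph (hypothetical data, for illustration only).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)

for name, scorer in [
    ("common neighbors", common_neighbors),
    ("jaccard", jaccard),
    ("preferential attachment", preferential_attachment),
]:
    scores = score_non_edges(adj, scorer)
    best = max(scores, key=scores.get)
    print(name, "-> highest-scoring candidate link:", best)
```

Higher scores mark node pairs that a heuristic considers more likely to be linked; ranking the unconnected pairs by score yields a link prediction baseline.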