A collection of academic articles, published methodologies, and datasets on the subject of Privacy-Preserving Explainable AI.
A sortable version is available here: https://awesome-privex.github.io/
📌 We are actively tracking the latest research and welcome contributions to our repository and survey paper. If your studies are relevant, please feel free to create an issue or a pull request.
📰 2024-06-27: Our paper *A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures* has been revised to version 2 with new methods and discussions.
If you find this work helpful for your research, please consider citing the paper and giving the repository a ⭐.
Please read and cite our paper:
Nguyen, T.T., Huynh, T.T., Ren, Z., Nguyen, T.T., Nguyen, P.L., Yin, H. and Nguyen, Q.V.H., 2024. A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures. arXiv preprint arXiv:2404.00673.
```bibtex
@article{nguyen2024survey,
  title={A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures},
  author={Nguyen, Thanh Tam and Huynh, Thanh Trung and Ren, Zhao and Nguyen, Thanh Toan and Nguyen, Phi Le and Yin, Hongzhi and Nguyen, Quoc Viet Hung},
  journal={arXiv preprint arXiv:2404.00673},
  year={2024}
}
```
## Datasets

### Image Datasets

Dataset | #Items | Disk Size | Downstream Explanations | #Papers Used |
---|---|---|---|---|
MNIST | 70K | 11MB | Counterfactuals, Gradient | 4 |
CIFAR | 60K | 163MB | Gradient | 4 |
SVHN | 600K | 400MB+ | Gradient | 1 |
Food101 | 100K+ | 10GB | Case-based | 1 |
Flowers102 | 8K+ | 300MB+ | Case-based | 1 |
Cervical | 8K+ | 46GB+ | Case-based, Interpretable Models | 1 |
CheXpert | 220K+ | GBs | Black-box | 1 |
Facial Expression | 12K+ | 63MB | Gradient | 1 |
Celeb | 200K | GBs | Counterfactuals, Shapley, Gradient, Perturbation | 1 |
### Tabular Datasets

Dataset | #Items | Disk Size | Downstream Explanations | #Papers Used |
---|---|---|---|---|
Adult | 48K+ | 10MB | Counterfactuals, Shapley | 10+ |
COMPAS | 7K+ | 25MB | Counterfactuals, Shapley | 2 |
FICO | 10K+ | ≤ 1MB | Counterfactuals, Shapley | 4 |
Boston Housing | 500+ | ≤ 1MB | Counterfactuals, Shapley | 1 |
German Credit | 1K | ≤ 1MB | Counterfactuals, Shapley | 4 |
Student Admission | 500 | ≤ 1MB | Counterfactuals, Shapley, Gradient, Perturbation | 1 |
Student Performance | 10K | ≤ 1MB | Counterfactuals, Shapley | 1 |
GMSC | 150K+ | 15MB | Interpretable models, Counterfactuals | 2 |
Diabetes | 100K+ | 20MB | Feature-based | 5 |
Breast Cancer | 569 | < 1MB | Feature-based | 1 |
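Many of the tabular datasets above are explained downstream with Shapley values. As a minimal, hypothetical sketch (not code from the survey or any listed paper), the snippet below trains a classifier on Adult and computes Shapley-value explanations with the `shap` package; the OpenML dataset name/version, the model choice, and the numeric-only preprocessing are illustrative assumptions.

```python
# Hypothetical sketch: Shapley-value explanations on the Adult dataset.
# Assumes scikit-learn and the `shap` package are installed.
import numpy as np
import shap
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load Adult (census income) from OpenML; version 2 is one hosted copy.
X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)
X = X.select_dtypes("number").fillna(0)  # numeric columns only, for brevity

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te.iloc[:100])
print(np.shape(shap_values))  # per-feature attributions (format varies by shap version)
```

Per-query attributions like these are exactly the explanation surface whose privacy risks the survey studies.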
### Graph Datasets

Dataset | #Items | Disk Size | Downstream Explanations | #Papers Used |
---|---|---|---|---|
Cora | 2K+ | 4.5MB | Feature-based | 1 |
Bitcoin | 30K | ≤ 1MB | Counterfactuals | 1 |
CIC-IDS2017 | 2.8M+ | 500MB | Black-box | 1 |
### Text Datasets

Dataset | #Items | Disk Size | Downstream Explanations | #Papers Used |
---|---|---|---|---|
IMDB Review | 50K | 66MB | Black-box | 1 |
## Evaluation Metrics

Category | Evaluation Metrics | Formula/Description | Usage |
---|---|---|---|
Explanation Utility | Counterfactual validity | Assesses the range of attribute values within k-anonymous counterfactual instances, considering all attributes, including those beyond quasi-identifiers. | |
Explanation Utility | Classification metric | Assesses equivalence classes within anonymized datasets, focusing on class-label uniformity. | |
Explanation Utility | Faithfulness (RDT-Fidelity) | Reflects how often the model's predictions are unchanged despite perturbations to the input, which suggests that the explanation effectively captures the reasoning behind the model's predictions. | A minimal computational sketch is given after this table. |
Explanation Utility | Sparsity | A complete and faithful explanation of the model should inherently be sparse, focusing only on the select subset of features that are most predictive of the model's decision. | |
Information Loss | Normalised Certainty Penalty (NCP) | Higher NCP values indicate a greater degree of generalisation and more information loss. | Helps in assessing the balance between data privacy and utility. |
Information Loss | Discernibility | Measures the penalties on tuples in a dataset after k-anonymization, reflecting how indistinguishable they are post-anonymization. | |
Information Loss | Approximation Loss | Measures the error caused by the randomness added when minimizing the privacy loss, as the expected deviation of the randomized explanation from the best local approximation. | |
Information Loss | Explanation Intersection | The percentage of bits in the original explanation that is retained in the privatised explanation after applying differential privacy. | The higher the better, but due to the privacy-utility trade-off this metric should not reach 100%. |
Privacy Degree | k-anonymity | A person's information is indistinguishable from that of at least k-1 other individuals. | Refers to the number of individuals in the training dataset to whom a given explanation could potentially be linked. |
Privacy Degree | Information Leakage | If an adversary can access model explanations, they should gain no additional information that could help infer anything about the training data beyond what can be learned from the model predictions alone. | |
Privacy Degree | Privacy Budget | The total privacy budget for all queries is fixed at ε. | The explanation algorithm must not exceed the overall budget across all queries, a stricter requirement than bounding each query individually (see the sketch after this table). |
Attack Success | Precision/Recall/F1 | Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = 2·Precision·Recall/(Precision+Recall). | Evaluate an attack's effectiveness in correctly and completely identifying the properties it is designed to infer. |
Attack Success | Balanced Accuracy | (TPR + TNR)/2: measures the accuracy of an attack (e.g., membership prediction in membership inference attacks) on a balanced dataset of members and non-members. | |
Attack Success | ROC/AUC | The ROC curve plots the true positive rate against the false positive rate at various threshold settings; AUC is the area under this curve. | An AUC near 1 indicates a highly successful privacy attack, while an AUC close to 0.5 suggests performance no better than random guessing. |
Attack Success | TPR at Low FPR | Report the TPR at a fixed low FPR (e.g., 0.1%). | If an attack can pinpoint even a minuscule fraction of the training dataset with high precision, it ought to be deemed effective (see the sketch after this table). |
Attack Success | Mean Absolute Error (MAE) | Gives an overview of how accurately an attack can reconstruct private inputs by averaging the absolute differences across all samples and features. | |
Attack Success | Success Rate (SR) | The ratio of successfully reconstructed features to the total number of features across all samples. | |
Attack Success | Model Agreement | The fraction of queries on which the substitute (extracted) model and the original model make the same prediction. | A higher agreement indicates that the substitute model is more similar to the original model; when comparing two model-extraction methods with the same agreement, the one with the lower standard deviation is preferred. |
Attack Success | Average Uncertainty Reduction | The degree to which a data reconstruction attack is accurate, measured by the reduction in uncertainty across all features of all samples in the dataset. | |
## Disclaimer
Feel free to contact us if you have any queries or exciting news. We also welcome all researchers to contribute to this repository and to the knowledge of this field.
If you have other related references, please feel free to create a GitHub issue with the paper information. We will gladly update the repository according to your suggestions. (You can also create a pull request, but it may take some time for us to merge it.)