This repository contains the official PyTorch implementation of Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings.
We extract a variety of speech representations using the OpenSMILE toolkit and pretrained models. You can refer to OpenSMILE and the SUPERB paper for more information.
Below is a list of the features included in the current experiment (a minimal extraction sketch follows the table):
Publication Date | Model | Name | Paper | Input | Stride | Pre-train Data | Official Repo |
---|---|---|---|---|---|---|---|
--- | EmoBase | --- | MM'10 | Speech | --- | --- | EmoBase |
5 Apr 2019 | APC | apc | arxiv | Mel | 10ms | LibriSpeech-360 | APC |
17 May 2020 | VQ-APC | vq_apc | arxiv | Mel | 10ms | LibriSpeech-360 | NPC |
12 Jul 2020 | TERA | tera | arxiv | Mel | 10ms | LibriSpeech-960 | S3PRL |
1 Nov 2020 | NPC | npc | arxiv | Mel | 10ms | LibriSpeech-360 | NPC |
11 Dec 2020 | DeCoAR 2.0 | decoar2 | arxiv | Mel | 10ms | LibriSpeech-960 | speech-representations |
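For reference, here is a minimal sketch of how such features can be extracted, assuming the `opensmile` and `s3prl` Python packages are installed; the file name, the dummy waveform, and the choice of upstream are placeholders, not this repository's exact feature pipeline.

```python
# Minimal feature-extraction sketch (illustrative only)
import torch
import opensmile
import s3prl.hub as hub

# EmoBase functionals: one fixed-length vector per utterance
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.emobase,
    feature_level=opensmile.FeatureLevel.Functionals,
)
emobase_feats = smile.process_file("example.wav")   # pandas DataFrame

# Frame-level self-supervised representations (apc / vq_apc / tera / npc / decoar2)
upstream = getattr(hub, "apc")()
upstream.eval()
with torch.no_grad():
    wavs = [torch.randn(16000)]                     # 1 s of dummy audio at 16 kHz
    reps = upstream(wavs)["hidden_states"]          # list of (batch, time, dim) tensors
```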
Let's recap the basics of FL.
- In a typical FL training round, shown in the figure below, a subset of selected clients receives the global model, which they train locally with their private data.
- Afterward, the clients share only their model updates (model parameters/gradients) with the central server.
- Finally, the server aggregates the model updates to obtain the global model for the next training round (a minimal sketch of one such round follows this list).
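The sketch below illustrates one FedAvg-style round under these steps; the model, data loaders, and helper names are illustrative, not this repository's code.

```python
# One FL round: clients train locally, share parameters, the server averages them.
import copy
import torch
import torch.nn.functional as F

def local_update(global_model, loader, lr=0.0005, local_epochs=1):
    """Client side: train a private copy of the global model and return its parameters."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(local_epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model.state_dict()           # only the model update leaves the client

def aggregate(client_states):
    """Server side: average the clients' parameters into the next global model."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(dim=0)
    return avg
```

Under FedSGD, the clients would instead share the gradients of a single local batch, and the server would average those gradients before taking an optimization step.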
Two common scenarios in FL are FedSGD, where each client shares the gradients computed on its local data, and FedAvg, where each client performs local training and shares the updated model weights.
The table below shows the prediction results of the SER model trained in the two FL scenarios: FedSGD and FedAvg. We report the accuracy and unweighted average recall (UAR) scores of the SER task on each individual data set. In the baseline experiment, we set the learning rate to 0.05 in FedSGD and 0.0005 in FedAvg. The local batch size is 20, the number of global training epochs is set to 200, and 10% of the clients participate in each global training epoch.
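For convenience, the baseline settings quoted above are collected into a plain dictionary below; the key names are illustrative, not this repository's actual config schema.

```python
import random

fl_config = {
    "fedsgd": {"learning_rate": 0.05},
    "fedavg": {"learning_rate": 0.0005},
    "local_batch_size": 20,
    "global_epochs": 200,
    "client_fraction": 0.1,            # 10% of clients join each global training epoch
}

# Example: sampling 10% of the clients for one global epoch
clients = list(range(100))             # placeholder client ids
selected = random.sample(
    clients, k=max(1, int(fl_config["client_fraction"] * len(clients)))
)
```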
The figure below shows the problem setup of the attribute inference attack in this work. The primary application is SER, where an adversary (an outside attacker or the curious server) attempts to infer the client's gender (the sensitive attribute) from the model updates shared while training the SER model.
Our attack framework mimics the framework commonly used in membership inference attacks (MIA). It consists of training shadow models, forming the attack training data set, and training the attack model, as shown below.
The idea of shadow training is to mimic the private training: we train each shadow model with the same hyperparameters used in the private FL training. We train 5 shadow models in our experiment.
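A minimal sketch of shadow training is given below; `run_fl_training` and `shadow_partitions` are hypothetical helpers (not functions in this repository) standing in for an FL run on auxiliary data that records the shared updates together with each client's gender label.

```python
num_shadow_models = 5
shadow_updates, shadow_genders = [], []

for s in range(num_shadow_models):
    # Same hyperparameters as the private FL training (see the baseline settings above)
    updates, genders = run_fl_training(
        data=shadow_partitions[s],      # auxiliary data held by the attacker
        learning_rate=0.0005,
        local_batch_size=20,
        global_epochs=200,
        client_fraction=0.1,
    )
    shadow_updates.extend(updates)      # e.g., one gradient/weight-delta tensor per client per round
    shadow_genders.extend(genders)      # the corresponding client's gender label
```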
Here, we construct the attack training data set using the shared gradients as input data and the client's gender as the label. We use 80% of the data for training and the rest for validation. The test set consists of the model updates shared in the private training, and the attack model aims to predict the gender of the clients in the private training data set.
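The sketch below shows one way to form this data set, assuming each shared update has already been turned into a tensor (e.g., the gradient of one layer); the label encoding and shapes are illustrative.

```python
import torch
from torch.utils.data import TensorDataset, random_split

X = torch.stack([u.flatten() for u in shadow_updates])   # one flattened update per sample
y = torch.tensor(shadow_genders, dtype=torch.long)       # e.g., 0 = female, 1 = male

dataset = TensorDataset(X, y)
n_train = int(0.8 * len(dataset))                        # 80% train / 20% validation
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
```

The test set would then be the flattened updates collected from the private training run, labeled with the true gender of the private clients.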
Our attack model architecture is shown below:
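For orientation only, here is a placeholder binary classifier over flattened updates. It is not the attack architecture shown in the figure, just a stand-in that makes the pipeline sketched above runnable end to end.

```python
import torch.nn as nn

class AttackModel(nn.Module):
    """Placeholder attack classifier; NOT the architecture in the figure."""

    def __init__(self, input_dim, hidden_dim=256, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 2),    # binary output: the client's gender
        )

    def forward(self, x):
        return self.net(x)
```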
The short answer is: inferring the gender of a client from the shared model updates is a trivial task when training the SER model in both FedSGD and FedAvg (see the UAR scores in the table).
The short answer is: the updates shared between the feature input and the first dense layer (see the UAR scores in the table).
The short answer is: increasing the dropout makes this attack stronger (see the UAR scores in the table).
@misc{feng2021attribute,
title={Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings},
author={Tiantian Feng and Hanieh Hashemi and Rajat Hebbar and Murali Annavaram and Shrikanth S. Narayanan},
year={2021},
eprint={2112.13416},
archivePrefix={arXiv}
}
@inproceedings{eyben2010opensmile,
title={Opensmile: the munich versatile and fast open-source audio feature extractor},
author={Eyben, Florian and W{\"o}llmer, Martin and Schuller, Bj{\"o}rn},
booktitle={Proceedings of the 18th ACM international conference on Multimedia},
pages={1459--1462},
year={2010}
}
@inproceedings{yang21c_interspeech,
author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
title={{SUPERB: Speech Processing Universal PERformance Benchmark}},
year={2021},
booktitle={Proc. Interspeech 2021},
pages={1194--1198},
doi={10.21437/Interspeech.2021-1775}
}