Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey

Mang Ye, Wei Shen, Bo Du, Eduard Snezhko, Vassili Kovalev, Pong C. Yuen

Wuhan University, National Academy of Sciences of Belarus, Hong Kong Baptist University

Abstract: Vertical Federated Learning (VFL) is a privacy-preserving distributed learning paradigm in which different parties collaboratively learn models using partitioned features of shared samples, without leaking private data. Recent research has shown promising results in addressing various challenges in VFL, highlighting its potential for practical applications in cross-domain collaboration. However, the corresponding research is scattered and lacks organization. To advance VFL research, this survey offers a systematic overview of recent developments. First, we provide a history and background introduction, along with a summary of the general training protocol of VFL. We then revisit the taxonomy in recent reviews and analyze their limitations in depth. For a comprehensive and structured discussion, we synthesize recent research from three fundamental perspectives: effectiveness, security, and applicability. Finally, we discuss several critical future research directions in VFL, which will facilitate development in this field.

Survey for Vertical Federated Learning, by MARS Group at Wuhan University, led by Prof. Mang Ye.

Table of Contents

Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey
- Our Works
- Vertical Federated Learning Survey

Our Works

Survey

Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey arXiv 2024 [Code]
Federated Learning for Generalization, Robustness, Fairness: A Survey and Benchmark TPAMI 2024 [Code]
Heterogeneous Federated Learning: State-of-the-art and Research Challenges ACM Computing Surveys 2023 [Code]

Generalized Federated Learning

AbrFun - Revisiting Federated Learning with Label Skew: An Over-Confidence Perspective SCIS 2024

We investigate the label skew of federated learning from an over-confidence perspective.
RUCR - Federated Learning with Long-Tailed Data via Representation Unification and Classifier Rectification TIFS 2024 [Code]

We handle the long-tail problem in federated learning by representation unification and classifier rectification.
FedHEAL — Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity CVPR 2024 [Code]

We investigate the fairness of federated learning under domain skew with local consistency and domain diversity.
FCCL+ — Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning TPAMI 2023 [Code]

We handle model-heterogeneous federated learning from the feature and logit perspectives.
FPL — Rethinking Federated Learning with Domain Shift: A Prototype View CVPR 2023 [Code]

We handle federated learning with domain shift from the prototype view.
FCCL — Learn from Others and Be Yourself in Heterogeneous Federated Learning CVPR 2022 [Code]

We investigate heterogeneity problems and catastrophic forgetting in federated learning.
FSMAFL — Few-Shot Model Agnostic Federated Learning ACMMM 2022 [Code]

We study a challenging problem, namely few-shot model agnostic federated learning.

Robust Federated Learning

SDEAHFL — Self-Driven Entropy Aggregation for Byzantine-Robust Heterogeneous Federated Learning ICML 2024 [Code]

We deal with robust heterogeneous federated learning under Byzantine attacks.
AugHFL — Robust Heterogeneous Federated Learning under Data Corruption ICCV 2023 [Code]

We deal with robust heterogeneous federated learning under data corruption.
RHFL — Robust Federated Learning With Noisy and Heterogeneous Clients CVPR 2022 [Code]

We deal with robust federated learning with noisy and heterogeneous clients.

Personalized Federated Learning

FedAS — FedAS: Bridging Inconsistency in Personalized Federated Learning CVPR 2024 [Code]

We present a novel PFL framework with federated parameter-alignment and client-synchronization.
FedDPA — Dynamic Personalized Federated Learning with Adaptive Differential Privacy NeurIPS 2023 [Code]

We propose a novel adaptive method for personalized federated learning with differential privacy.

Federated Graph Learning

FGGP — Federated Graph Learning under Domain Shift with Generalizable Prototypes AAAI 2024 [Code]

We deal with federated graph learning under domain shift with generalizable prototypes.
FGSSL — Federated Graph Semantic and Structural Learning IJCAI 2023 [Code]

We handle federated graph learning from node-level semantic and graph-level structure.

Vertical Federated Learning Survey

Survey Outline

Preliminary

Example of VFL

We present a practical cross-domain collaboration among three participants: a mall, a video platform, and a bank. The mall acts as the active client and collaborates with the video platform and the bank, which act as passive clients. Each client holds its own local features and a local model for the same set of users. The active client also holds the task labels, e.g., whether a user buys a cigar. A global model makes the final prediction for the shared (aligned) users by aggregating the feature embeddings from all clients. Given the predictions and labels, gradients are computed to update both the global and local models. In addition, a third-party coordinator can be employed for secure communication and sample alignment.
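To make the notion of shared/aligned users concrete, the following is a minimal sketch of sample alignment for the three parties in this example. The user IDs, feature names, and the `align_ids` helper are hypothetical and exist only for illustration. The plain set intersection used here reveals IDs to whoever computes it, so it is only a stand-in for the secure alignment protocols listed under Secure Alignment below.

```python
# Hypothetical toy tables: each party indexes its own features by user ID.
mall = {"u1": {"spend": 120.0}, "u2": {"spend": 35.5}, "u3": {"spend": 80.0}}
video = {"u2": {"watch_hours": 4.0}, "u3": {"watch_hours": 11.5}, "u4": {"watch_hours": 2.0}}
bank = {"u1": {"balance": 900.0}, "u2": {"balance": 4200.0}, "u3": {"balance": 150.0}}

# Task labels are held only by the active client (the mall).
labels = {"u1": 0, "u2": 1, "u3": 0}


def align_ids(*parties):
    """Return the sorted user IDs that appear in every party's table.

    NOTE: this is a plain, non-private intersection used as a placeholder
    for a secure alignment protocol. It is NOT secure.
    """
    common = set(parties[0])
    for table in parties[1:]:
        common &= set(table)
    return sorted(common)


aligned = align_ids(mall, video, bank)

# Each party keeps only its own feature columns for the aligned users.
mall_rows = [mall[uid] for uid in aligned]
video_rows = [video[uid] for uid in aligned]
bank_rows = [bank[uid] for uid in aligned]
aligned_labels = [labels[uid] for uid in aligned]

print(aligned)  # ['u2', 'u3']
```

Only users present at all three parties take part in training. Features never leave the party that owns them, and the labels stay with the mall.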

The Training and Testing Flow

(a) During training, aligned sample embeddings are sent to the active client, where gradients are calculated based on task labels. The overall objective is to optimize for collaborative prediction. These gradients are then sent back to each client for model updating. (b) During testing, predictions on aligned samples are made utilizing the trained global and local models.
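The flow above can be sketched in a few lines of NumPy. This is an illustrative simulation in a single process, not the survey's reference implementation. It assumes linear local models, aggregation by concatenation, a logistic-regression global model, synthetic data, and no encryption or coordinator. All names (`forward`, `train_step`, `W`, `v`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dims, h = 64, [3, 2, 4], 2  # aligned samples, per-client feature widths, embedding width

# Synthetic, already-aligned feature blocks: one per client.
X = [rng.normal(size=(n, d)) for d in dims]
# Labels are held only by the active client (client 0 in this sketch).
y = (X[0][:, 0] + X[1][:, 0] - X[2][:, 0] > 0).astype(float)

W = [rng.normal(scale=0.1, size=(d, h)) for d in dims]  # local models, one per client
v = rng.normal(scale=0.1, size=h * len(dims))           # global model on the active client


def forward(X, W, v):
    embeds = [x @ w for x, w in zip(X, W)]  # computed locally, then sent to the active client
    H = np.concatenate(embeds, axis=1)      # aggregation (assumed: concatenation)
    p = 1.0 / (1.0 + np.exp(-(H @ v)))      # global model prediction
    return H, p


def bce(p, y, eps=1e-12):
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def train_step(X, W, v, y, lr=0.1):
    H, p = forward(X, W, v)
    dz = (p - y) / len(y)      # gradient of the mean BCE loss w.r.t. the logits
    grad_v = H.T @ dz          # gradient for the global model
    grad_H = np.outer(dz, v)   # gradient w.r.t. all embeddings, shape (n, h * num_clients)
    for k, x in enumerate(X):
        g_k = grad_H[:, k * h:(k + 1) * h]  # slice sent back to client k
        W[k] -= lr * (x.T @ g_k)            # client k updates its local model
    v -= lr * grad_v                        # active client updates the global model
    return bce(p, y)                        # loss measured before this update


losses = [train_step(X, W, v, y) for _ in range(300)]

# (b) Testing: predictions on aligned samples use the trained local and global models.
_, p_test = forward(X, W, v)
accuracy = float(np.mean((p_test > 0.5) == (y > 0.5)))
```

Only embeddings travel to the active client and only embedding gradients travel back, so raw features and labels stay where they are. The works listed under Security below examine how much these exchanged values can still leak.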

Basic Research Directions

☀️Effectiveness

Model Design

Tree-based Model

Privacy preserving vertical federated learning for tree-based models VLDB 2020
Securegbm: Secure multi-party gradient boosting IEEE International Conference on Big Data (Big Data) 2019
Secureboost: A lossless federated learning framework IEEE Intelligent Systems 2021
SecureBoost+ : A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning arXiv 2021
Large-scale Secure XGB for Vertical Federated Learning CIKM 2021
Federboost: Private federated learning for gbdt arXiv 2022
Federated Forest IEEE Transactions on Big Data 2022
An efficient and robust system for vertically federated random forest arXiv 2022
Verifiable privacy-preserving scheme based on vertical federated random forest IEEE Internet of Things Journal 2022
Squirrel: A Scalable Secure two-party Computation Framework for Training Gradient Boosting Decision Tree USENIX Security 2023
Effective and Efficient Federated Tree Learning on Hybrid Data ICLR 2024

Neural Network-based Model

Multi-participant multi-class vertical federated learning arXiv 2020
A secure federated transfer learning framework IEEE Intelligent Systems 2020
Pyvertical: A vertical federated learning framework for multi-headed splitnn ICLR 2021 Workshop
Fedsl: Federated split learning on distributed sequential data in recurrent neural networks Multimedia Tools and Applications 2024

Feature & Client Selection

Feature Selection

Vertical federated learning-based feature selection with non-overlapping sample utilization Expert Systems with Applications 2022
Secure Feature Selection for Vertical Federated Learning in eHealth Systems IEEE International Conference on Communications (ICC) 2022
FEAST: A Communication-efficient Federated Feature Selection Framework for Relational Data Proceedings of the ACM on Management of Data 2023
An embedded vertical-federated feature selection algorithm based on particle swarm optimisation CAAI Transactions on Intelligence Technology 2023
FedSDG-FS: Efficient and Secure Feature Selection for Vertical Federated Learning INFOCOM 2023
LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning ICML 2023

Client Selection

Measure Contribution of Participants in Federated Learning IEEE International Conference on Big Data (Big Data) 2019
VF-PS: How to Select Important Participants in Vertical Federated Learning, Efficiently and Securely? NeurIPS 2022
Fair and efficient contribution valuation for vertical federated learning ICLR 2024

☀️Security

Privacy Leakage

Secure Alignment

Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption arXiv 2017
Multi-party private set intersection in vertical federated learning TrustCom 2020
Vertical federated learning without revealing intersection membership arXiv 2021

Secure Embedding Transportation

Secure Gradient Transportation

FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data AISec 2021
Vertically Federated Learning with Correlated Differential Privacy Electronics 2022
Adaptive differential privacy in vertical federated learning for mobility forecasting Future Generation Computer Systems 2023

Inference Attack

Feature Inference Attack

Feature inference attack on model predictions in vertical federated learning ICDE 2021
Cafe: Catastrophic data leakage in vertical federated learning NeurIPS 2021
Privacy against inference attacks in vertical federated learning arXiv 2022
Feature reconstruction attacks and countermeasures of dnn training in vertical federated learning arXiv 2022
Practical feature inference attack in vertical federated learning during prediction in artificial internet of things IEEE Internet of Things Journal 2023

Label Inference Attack

Label leakage and protection in two-party split learning arXiv 2021
Batch label inference and replacement attacks in black-boxed vertical federated learning arXiv 2022
Label leakage and protection from forward embedding in vertical federated learning arXiv 2022
Label inference attacks against vertical federated learning USENIX Security 2022
Exploit: Extracting private labels in split learning
Your Labels are Selling You Out: Relation Leaks in Vertical Federated Learning TDSC 2022

Destructive Attack

Backdoor Attack

Backdoor attacks and defenses in feature-partitioned collaborative learning ICML Workshop 2020
Graph-fraudster: Adversarial attacks on graph neural network-based vertical federated learning IEEE Transactions on Computational Social Systems 2022
LR-BA: Backdoor attack against vertical federated learning using local latent representations Computers & Security 2023
Backdoor Attack Against Split Neural Network-Based Vertical Federated Learning TIFS 2023
Villain: Backdoor Attacks Against Vertical Split Learning USENIX Security 2023
Practical and general backdoor attacks against vertical federated learning ECML PKDD 2023
Universal adversarial backdoor attacks to fool vertical federated learning Computers & Security 2024

Poison Attack

Defense

Defense Against Feature Inference Attack

Defending against reconstruction attack in vertical federated learning ICML Workshop 2021
Secure Split Learning against Property Inference and Data Reconstruction Attacks arXiv 2022
FedPass: Privacy-Preserving Vertical Federated Deep Learning with Adaptive Obfuscation IJCAI 2023
Vulnerabilities of Data Protection in Vertical Federated Learning Training and Countermeasures TIFS 2024
Gradient-based defense methods for data leakage in vertical federated learning Computers & Security 2024

Defense Against Label Inference Attack

Label leakage and protection in two-party split learning arXiv 2021
Rvfr: Robust vertical federated learning via feature subspace recovery NeurIPS Workshop 2021
Label leakage and protection from forward embedding in vertical federated learning arXiv 2022
Defending Batch-Level Label Inference and Replacement Attacks in Vertical Federated Learning IEEE Transactions on Big Data 2022
Making split learning resilient to label leakage by potential energy loss arXiv 2022
Differentially private label protection in split learning arXiv 2022
Eliminating Label Leakage in Tree-Based Vertical Federated Learning arXiv 2023
Beyond model splitting: Preventing label inference attacks in vertical federated learning with dispersed training World Wide Web Journal 2023
FLSG: A Novel Defense Strategy Against Inference Attacks in Vertical Federated Learning IEEE Internet of Things Journal 2024
ProjPert: Projection-based Perturbation for Label Protection in Split Learning based Vertical Federated Learning TKDE 2024
HashVFL: Defending Against Data Reconstruction Attacks in Vertical Federated Learning TIFS 2024
Vulnerabilities of Data Protection in Vertical Federated Learning Training and Countermeasures TIFS 2024

Defense Against Destructive Attack

Rvfr: Robust vertical federated learning via feature subspace recovery NeurIPS Workshop 2021
Backdoor Attack Against Split Neural Network-Based Vertical Federated Learning TIFS 2023
VFedAD: A Defense Method Based on the Information Mechanism Behind the Vertical Federated Data Poisoning Attack CIKM 2023
A GAN-based data poisoning framework against anomaly detection in vertical federated learning arXiv 2024
Hijack Vertical Federated Learning Models As One Party TDSC 2024

☀️Applicability

Limited Data

Limited Aligned Samples

FedCVT: Semi-supervised Vertical Federated Learning with Cross-view Training TIST 2022
Multi-View Federated Learning with Data Collaboration ICMLC 2022
Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples ICCV 2023

Limited Labels

Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study JMIR Medical Informatics 2021
Practical Vertical Federated Learning with Unsupervised Representation Learning IEEE Transactions on Big Data 2022
Self-supervised vertical federated learning NeurIPS Workshop 2022
A hybrid self-supervised learning framework for vertical federated learning arXiv 2023

Large Communication Burden

Federated doubly stochastic kernel learning for vertically partitioned data KDD 2020
AsySQN: Faster Vertical Federated Learning Algorithms with Better Computation Resource Utilization KDD 2021
Communication-Efficient Vertical Federated Learning Algorithms 2022
FedBCD: A communication-efficient collaborative learning framework for distributed features IEEE Transactions on Signal Processing 2022
Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data ICML 2022
Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Updates VLDB 2022
Cheetah: Lean and fast secure two-party deep neural network inference USENIX Security 2022
SparseVFL: Communication-Efficient Vertical Federated Learning Based on Sparsification of Embeddings and Gradients KDD FL4Data-Mining 2023
A Unified Solution for Privacy and Communication Efficiency in Vertical Federated Learning NeurIPS 2023

Client Asynchrony

VAFL: a Method of Vertical Asynchronous Federated Learning ICML Workshop 2020
Efficient Asynchronous Vertical Federated Learning via Gradient Prediction and Double-End Sparse Compression ICARCV 2020
Secure Bilevel Asynchronous Vertical Federated Learning with Backward Updating AAAI 2021
Privacy-Preserving Asynchronous Vertical Federated Learning Algorithms for Multiparty Collaborative Learning TNNLS 2022
Efficient Asynchronous Multi-Participant Vertical Federated Learning IEEE Transactions on Big Data 2022
vfedsec: Efficient secure aggregation for vertical federated learning via secure layer arXiv 2023
Robust and ip-protecting vertical federated learning against unexpected quitting of parties arXiv 2023
Fedvs: Straggler-resilient and privacy-preserving vertical federated learning for split models ICML 2023

Future Directions

Effectiveness/Applicability and Security Trade-off

A framework for evaluating privacy-utility trade-off in vertical federated learning arXiv 2022
Privacy Tradeoffs in Vertical Federated Learning Federated Learning Systems (FLSys) Workshop 2023

Effectiveness Facilitates Security and Applicability

No recent works address this direction yet; it will be a critical direction in the future.

Open Issues

Practical Datasets

Robustness and Generalization

Generalization to Unfair Prediction Bias

Achieving model fairness in vertical federated learning arXiv 2021
Fairvfl: A fair vertical federated learning framework with contrastive adversarial learning NeurIPS 2022

VFL on Different Data Variants

Multi-Modal Data

A Multi-Modal Vertical Federated Learning Framework Based on Homomorphic Encryption TIFS 2023

Graph Data

Fedsgc: Federated simple graph convolution for node classification IJCAI Workshop 2021
A vertical federated learning framework for graph convolutional network arXiv 2021
Vertically federated graph neural network for privacy-preserving node classification IJCAI 2022
Graph-fraudster: Adversarial attacks on graph neural network-based vertical federated learning IEEE Transactions on Computational Social Systems 2023
Glasu: A communication-efficient algorithm for federated learning with vertically distributed graph data arXiv 2023
Vertical federated graph neural network for recommender system ICML 2023
Privacy-preserving design of graph neural networks with applications to vertical federated learning NeurIPS Workshop 2023

VFL with Foundation Models

Input Reconstruction Attack against Vertical Federated Large Language Models arXiv 2023

Please kindly cite the paper if it helps your research, thanks!

@article{ye2024vertical,
  title={Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey},
  author={Ye, Mang and Shen, Wei and Snezhko, Eduard and Kovalev, Vassili and Yuen, Pong C and Du, Bo},
  journal={arXiv preprint arXiv:2405.17495},
  year={2024}
}
