- Our FLHetBench consists of 1) two sampling methods, DPGMM for continuous device databases and DPCSM for discrete state databases, to sample real-world device and state datasets with varying heterogeneity; and 2) a set of metrics (DevMC-R/T for device heterogeneity, StatMC-R/T for state heterogeneity, and InterMC-R/T for their interplay) to assess device/state heterogeneity in FL.
- `metric.py` contains the methods for calculating DevMC-R/T (device heterogeneity), StatMC-R/T (state heterogeneity), and InterMC-R/T (their interplay).
- `sampling.py` contains DPGMM and DPCSM.
- `data/` contains all the databases used and the sampled heterogeneous datasets.
- `bench/` contains the benchmark framework.
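
Given this layout, a typical script imports the two samplers from `sampling.py` and the metric utilities from `metric.py` (a minimal sketch, assuming the repository root is your working directory):

```python
# Minimal import sketch, assuming you run from the repository root.
from sampling import DPGMM_sampling, DPCSM_sampling  # heterogeneity samplers
import metric  # DevMC-R/T, StatMC-R/T, InterMC-R/T implementations
```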
```bash
pip install -r requirements.txt
```
- Baseline real-world device database (each device is characterized by both computational latency and communication latency)
  - For communication latency, we use MobiPerf:
    - `data/mobiperf_tcp_down_2018.json`
    - `data/mobiperf_tcp_down_2019.json`
  - For computational latency, we use AI-Benchmark and our proposed training latency data. Our data will be dynamically updated, and we sincerely invite more people to participate; if you are interested, click the link to learn more.
    - `data/device_latency.json`
- Baseline real-world state database for state heterogeneity, from FLASH:
  - `data/cached_timers.json`
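
As a quick sanity check, these databases are plain JSON files and can be loaded directly (a minimal sketch; the per-record schema is dataset-specific, so inspect the parsed objects before relying on particular fields):

```python
import json

# Load the bundled databases; each is a plain JSON file.
comm_db = json.load(open("data/mobiperf_tcp_down_2018.json", "r"))
comp_db = json.load(open("data/device_latency.json", "r"))
state_db = json.load(open("data/cached_timers.json", "r"))

# Report only top-level sizes; field names inside records vary by database.
for name, db in [("mobiperf_2018", comm_db),
                 ("device_latency", comp_db),
                 ("cached_timers", state_db)]:
    print(name, type(db).__name__, len(db))
```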
- OpenImage
  - Download the data partition from OpenImage.
  - Put the downloaded `openImg.npy` under the sub-folder `bench/data`.
- COVID-FL dataset
  - Download the data and partition files from COVID-FL.
TL;DR: For simplicity, you can directly use the device and state heterogeneity data provided in `bench/cached_sample_data` (please refer to Sec 2), or build your own heterogeneous environment with DPGMM and DPCSM.
- Device speed (device heterogeneity) and client states (state heterogeneity) of the sampled datasets with different heterogeneity levels (mild, middle, and severe).
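
If a ready-made environment is enough, you can load one of the cached cases directly (a sketch; the `case1.json`/`case2.json` names follow the benchmark example below, so list the folders to see the cases actually shipped):

```python
import json
import os

# Cached heterogeneity data used in the paper; the case file names below are
# illustrative -- list the folders to see which cases are actually shipped.
device_dir = "bench/cached_sample_data/device"
state_dir = "bench/cached_sample_data/state"
print(sorted(os.listdir(device_dir)), sorted(os.listdir(state_dir)))

device_data = json.load(open(os.path.join(device_dir, "case1.json"), "r"))
state_data = json.load(open(os.path.join(state_dir, "case2.json"), "r"))
```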
DPGMM (Device heterogeneity) can generate device databases with varying degrees of heterogeneity while maintaining a consistent average speed across the sampled devices, via the following steps:

1. Set the average speed of devices `mu` based on your experimental settings.
2. Control heterogeneity using the total number of distinct devices `K` allocated to `n` clients (`K <= n`). A larger value of `K` indicates more distinct samples.
3. Use `sigma` to control the variation among the selected `K` devices. A larger value of `sigma` indicates a greater speed difference between the `K` distinct devices.
Here is an example of sampling 2,466 clients with K_n=50, σ=0.
```python
import json

from sampling import DPGMM_sampling

# network speed information, see 'data/mobiperf_tcp_down_2018.json'
speed_info = json.load(open("data/mobiperf_tcp_down_2018.json", "r"))

# Here is an example of sampling 2,466 clients with K_n=50, σ=0
n = 2466         # number of clients
mu = 4000        # expected average speed
K = 50           # number of distinct clusters, the same as K_n in the paper
sigma = 0.       # control of divergence, the same as σ in the paper
random_seed = 42

_, sampled_speed_mean, sampled_speed_std, samples = DPGMM_sampling(
    speed_info, mu0=mu, K=K, sigma=sigma, n=n, seed=random_seed
)
```
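
Continuing the snippet above: since DPGMM is designed to keep the average speed consistent across heterogeneity levels, a quick check of the returned statistics is worthwhile (a sketch; variable names come from the call above):

```python
# One sampled speed per client, with the sample mean close to the target mu.
assert len(samples) == n
print(f"sampled mean: {sampled_speed_mean:.1f} (target mu = {mu}), "
      f"std: {sampled_speed_std:.1f}")
```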
DPCSM (State heterogeneity) accepts two parameters (`start_rank` and `alpha`) and can generate state databases with varying heterogeneity levels. DPCSM first sorts the state data by their scores in `score_dict`. You can control the heterogeneity level of the sampled state data as follows:

- `start_rank` represents the rank of the optimal state in the baseline dataset, i.e., selecting states $D_{(startRank)} > \cdots > D_{(N)}$ from $D_{(1)} > \cdots > D_{(N)}$, where $D_{(i)}$ indicates the score of state data. A lower `start_rank` indicates a higher score for the optimal state.
- `alpha` controls the divergence of the sampled state data. A smaller `alpha` leads to a lower probability of selecting subsequent states, concentrating the samples on states with higher scores.
```python
from sampling import DPCSM_sampling

# state score dict used for sampling by DPCSM
score_dict = {
    '681': 0.1,
    '573': 0.2,
    ...
}

n = 2466        # number of clients
alpha = 100     # control of divergence, the same as α in the paper
start_rank = 0  # control of start rank, the same as StartRank in the paper

# returns a list of length n=2466 whose elements are keys of score_dict
samples = DPCSM_sampling(score_dict, n=n, alpha=alpha, start_rank=start_rank)
```
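
Continuing the snippet above: since a smaller `alpha` concentrates the samples on a few high-score states, counting the returned keys makes the heterogeneity level visible (a minimal sketch):

```python
from collections import Counter

# Each element of `samples` is a key of score_dict; fewer distinct states
# across the n clients indicates stronger concentration (smaller alpha).
state_counts = Counter(samples)
print(f"{len(state_counts)} distinct states across {len(samples)} clients")
print(state_counts.most_common(5))
```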
We use DevMC-R/T to assess device heterogeneity, StatMC-R/T to assess state heterogeneity, and InterMC-R/T for their interplay.
Please refer to metric_example.ipynb for snippets.
In our paper, we used different heterogeneous device and state data from the folder `bench/cached_sample_data` and combined them in pairs to obtain environments with varying degrees of device and state heterogeneity.
- Vision Transformer: download the file from ViT-B_16 and put it under the sub-folder `bench/checkpoint`.
```bash
# cmd for benchmark
# set aggregation strategy, heterogeneous device data and state data
# Here is an example of benchmarking FedAvg with heterogeneous device data = case1.json and heterogeneous state data = case2.json
python main.py --config configs/default.cfg --aggregation_strategy deadline --device_path cached_sample_data/device/case1.json --state_path cached_sample_data/state/case2.json
```
Some important tags for training settings (a readiness-based example follows this list):

- `--dataset_name`: we provide implementations of OpenImage and COVID-FL in `main.py`.
- `--aggregation_strategy`: type of server aggregation strategy; supports ["deadline", "readiness"].
- `--deadline`: round deadline for the deadline-based strategy.
- `--num_rounds`: total communication rounds for the deadline-based strategy.
- `--target_acc`: target performance for the readiness-based strategy.
- `--device_path`: path to the sampled heterogeneous device data file.
- `--state_path`: path to the sampled heterogeneous state data file.
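
For example, a readiness-based run swaps the deadline flags for `--target_acc` (a sketch using the flags documented above; the case files and the 0.8 target are illustrative):

```bash
# readiness-based aggregation: train until the target accuracy is reached
# (case files and the 0.8 target value are illustrative placeholders)
python main.py --config configs/default.cfg --aggregation_strategy readiness --target_acc 0.8 --device_path cached_sample_data/device/case1.json --state_path cached_sample_data/state/case2.json
```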
BiasPrompt+ comprises two modules: a gradient surgery-based staleness-aware aggregation strategy (`bench/helpers/gradient_surgery_helper.py`) and a communication-efficient module, BiasPrompt (`bench/models/BiasPrompt.py`), based on fast weights.
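
To give a flavor of the gradient-surgery idea used for staleness-aware aggregation, below is a minimal PCGrad-style sketch: when a client update conflicts with a reference gradient (negative inner product), the conflicting component is projected out. This illustrates the general technique only, not the exact logic in `bench/helpers/gradient_surgery_helper.py`.

```python
import numpy as np

def project_conflicting(g_client: np.ndarray, g_ref: np.ndarray) -> np.ndarray:
    """PCGrad-style surgery: if g_client conflicts with g_ref (negative inner
    product), remove the component of g_client along g_ref."""
    dot = float(np.dot(g_client, g_ref))
    if dot < 0:
        g_client = g_client - dot / (float(np.dot(g_ref, g_ref)) + 1e-12) * g_ref
    return g_client

# Toy usage: a conflicting (e.g., stale) update gets its opposing part removed.
g_ref = np.array([1.0, 0.0])
g_stale = np.array([-1.0, 1.0])
print(project_conflicting(g_stale, g_ref))  # -> [0. 1.]
```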
```bash
# shell script for the BiasPrompt baseline on cuda:0
bash shells/biasprompt.sh 0
```
If you find our code or paper useful, please consider citing:
```
@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Junyuan and Zeng, Shuang and Zhang, Miao and Wang, Runxi and Wang, Feifei and Zhou, Yuyin and Liang, Paul Pu and Qu, Liangqiong},
    title     = {FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {12098-12108}
}
```
- Our code is based on PKU-Chengxu/FLASH (github.com).
- ResNet50 and ViT implementations are based on https://github.com/rwightman/pytorch-image-models and vpt.