keywords: federated-learning, asynchronous, synchronous, semi-asynchronous, personalized
One codebase adapts to multiple operating modes: thread, process, MPMT, distributed.
One-click start; change the experimental environment without modifying the code.
Support random seeds for reproducible experiments.
The FL framework has been redesigned to be modular and highly extensible, supporting various mainstream federated learning paradigms: synchronous, asynchronous, semi-asynchronous, personalized, etc.
wandb integration synchronizes experimental data to the cloud to avoid data loss.
For more project information, please see the wiki.
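As an illustration of what seeding for reproducibility usually involves, here is a minimal sketch. The helper name `set_seed` is hypothetical and is not taken from this framework; the framework's own seeding logic may differ. NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random


def set_seed(seed: int) -> None:
    """Seed every available random number generator (hypothetical helper)."""
    random.seed(seed)
    # PYTHONHASHSEED only affects interpreters started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass  # PyTorch is not installed; nothing to seed


set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True: the same seed yields the same sequence
```

Seeding alone may not make GPU runs bit-identical, since some CUDA kernels are nondeterministic.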
Python 3.8 + PyTorch + Linux
It has also been validated on macOS.
It supports both single-GPU and multi-GPU setups.
Install dependencies into an existing Python environment with pip install -r requirements.txt, or create a new Python environment with conda:
conda env create -f environment.yml
You can run python main.py (the main file in the fl directory) directly. The program automatically reads the config.json file in the root directory and stores the results, along with the configuration file, in the specified path under results.
You can also specify the configuration file, for example python main.py ../../config.json. Note that the path to config.json is relative to main.py.
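To make "relative to main.py" concrete, the following sketch resolves a config argument against the directory that contains the script rather than the current working directory. The function name `resolve_config_path` and the `/project/src/fl` layout are assumptions for illustration only; they are not taken from the framework.

```python
import os


def resolve_config_path(config_arg: str, script_file: str) -> str:
    """Resolve config_arg relative to the directory containing script_file."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.normpath(os.path.join(script_dir, config_arg))


# Hypothetical layout: <root>/src/fl/main.py, so "../../config.json"
# points at <root>/config.json regardless of where the command is run.
print(resolve_config_path("../../config.json", "/project/src/fl/main.py"))
```

Inside a real script, `__file__` would typically be passed as `script_file`.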
The config folder in the root directory provides configuration files for algorithms proposed in papers. The following algorithm implementations are currently available:
Centralized Learning
FedAvg
FedAsync
FedProx
FedAT
FedLC
FedDL
M-Step AsyncFL
FedBuff
FedAdam
FedNova
FedBN
TWAFL
For more methods, refer to the wiki.
You can also pull and run a Docker image directly:
docker pull desperadoccy/async-fl
docker run -it desperadoccy/async-fl config/FedAvg-config.json
Similarly, it supports passing a config file path as a parameter. You can also build the Docker image yourself.
cd docker
docker build -t async-fl .
docker run -it async-fl config/FedAvg-config.json
- Asynchronous Federated Learning
- Support model and dataset replacement
- Support scheduling algorithm replacement
- Support aggregation algorithm replacement
- Support loss function replacement
- Support client replacement
- Synchronous federated learning
- Semi-asynchronous federated learning
- Provide test loss information
- Custom label heterogeneity
- Custom data heterogeneity
- Support Dirichlet distribution
- wandb visualization
- Support for multiple GPUs
- Docker deployment
- Process thread switching
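Dirichlet-based label heterogeneity is commonly implemented by drawing, for each class, a proportion vector over clients and splitting that class's samples accordingly. The sketch below illustrates the idea using only the standard library. The names `dirichlet_sample` and `dirichlet_partition` are hypothetical and not part of this framework, whose own partitioning code may differ.

```python
import random
from collections import defaultdict


def dirichlet_sample(alpha, n, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over n categories."""
    # Normalized independent Gamma(alpha, 1) draws follow a Dirichlet law.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, one class at a time."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        proportions = dirichlet_sample(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for client_id, p in enumerate(proportions):
            cumulative += p
            if client_id == num_clients - 1:
                end = len(indices)  # last client takes the remainder
            else:
                end = min(len(indices), int(round(cumulative * len(indices))))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


labels = [i % 10 for i in range(1000)]  # toy dataset with 10 balanced classes
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5)
print([len(p) for p in parts])
```

Smaller values of `alpha` concentrate each class on fewer clients, while larger values approach an even split.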
Please refer to the wiki
Currently, the framework has a core issue: communication between clients and the server is implemented with multiprocessing queues. When a CUDA tensor is put on a queue and retrieved by another thread, it can leak memory and may crash the program.
This bug arises from the interaction between PyTorch and the multiprocessing queue. The current workaround is to upload non-CUDA tensors to the queue and convert them to CUDA tensors during aggregation. Therefore, when adding an aggregation algorithm, the following code is needed:
import torch

# Copy each client weight, then move it to the GPU when one is available.
updated_parameters = {}
for key, var in client_weights.items():
    updated_parameters[key] = var.clone()
    if torch.cuda.is_available():
        updated_parameters[key] = updated_parameters[key].cuda()
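The sending side of this workaround can be sketched as follows. This is an illustration under stated assumptions, not the framework's actual client code: `to_cpu_payload` and `upload` are hypothetical helpers. They rely only on each value exposing `.detach()` and `.cpu()`, as `torch.Tensor` does, so the block itself does not import PyTorch.

```python
import queue


def to_cpu_payload(state_dict):
    """Return a new dict whose tensors have been moved to the CPU.

    Works with any value exposing .detach() and .cpu(), such as torch.Tensor.
    """
    return {key: value.detach().cpu() for key, value in state_dict.items()}


def upload(update_queue, client_id, state_dict):
    """Put a CPU-only update on the queue (hypothetical client-side helper)."""
    update_queue.put((client_id, to_cpu_payload(state_dict)))


# A queue.Queue stands in here for the multiprocessing queue; both offer put().
update_queue = queue.Queue()
```

With PyTorch, a client would call something like `upload(update_queue, client_id, model.state_dict())`, and the aggregation code would then move the received tensors back to the GPU.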
Desperadoccy | Jzj007 | Cauchy
Please cite our paper in your publications if this code helps your research.
@misc{chen2024fedmodulemodularfederatedlearning,
title={FedModule: A Modular Federated Learning Framework},
author={Chuyi Chen and Zhe Zhang and Yanchao Zhao},
year={2024},
eprint={2409.04849},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2409.04849},
}
We have created a QQ group to discuss the asyncFL framework and FL. Everyone is welcome to join.
Group number: 895896624
QQ: 527707607
email: desperado@qq.com
Suggestions for the project are welcome.
If you'd like to contribute to this project, please contact us.