Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence
This repository contains the code for reproducing the experimental results and simulations as presented in the paper:
Philip Jordan, Florian Grötschla, Flint Xiaofeng Fan and Roger Wattenhofer. "Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence." 2024.
Our paper introduces a decentralized Byzantine fault-tolerant federated reinforcement learning method: each agent acts in its own local environment and, exchanging only policies and policy gradients rather than raw trajectories, the agents aim to collaboratively learn a common task. The merit of our method is its provable convergence (with competitive sample complexity bounds) despite a fraction of agents sending Byzantine updates, and without relying on a central trusted server.
`ByzPG`, the first method presented in our paper, serves as a warm-up for combining Byzantine filtering/aggregation with variance-reduced policy gradients.
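As background, the probabilistic variance-reduction idea behind PAGE-style policy gradient estimators can be sketched as follows. This is a simplified illustration: `full_gradient_estimate`, `correction_term`, and the plain gradient-ascent step are hypothetical placeholders, and the actual importance-weighted estimator used by `ByzPG` is specified in the paper.

```python
import random

def page_style_update(theta, g_prev, p, full_gradient_estimate, correction_term, lr):
    """One PAGE-style variance-reduced policy-gradient step (illustrative sketch).

    With probability p, a large-batch gradient estimate is recomputed from scratch;
    otherwise the previous estimate is reused and corrected with a small-batch term.
    `full_gradient_estimate` and `correction_term` stand in for trajectory sampling
    and policy-gradient estimation, which are not spelled out here.
    """
    if random.random() < p:
        g = full_gradient_estimate(theta)      # occasional large-batch restart
    else:
        g = g_prev + correction_term(theta)    # recursive small-batch correction
    return theta + lr * g, g                   # gradient ascent on the expected return
```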
The `DecByzPG` algorithm, our main contribution, is a decentralized variant of `ByzPG`. It combines the aggregation methods from `ByzPG` with a decentralized communication and agreement protocol that enables Byzantine fault-tolerance despite the lack of a central trusted entity. For details of our algorithm, we refer to the paper.
Our experiments aim to demonstrate
- speed-up when increasing the number of agents, and
- resilience against various Byzantine attacks (a generic illustration of such an attack is sketched below).
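The concrete attack models evaluated in our experiments are defined in the paper. As a generic illustration only, one common Byzantine behaviour is to report a scaled, sign-flipped version of the honest gradient:

```python
import numpy as np

def sign_flip_attack(honest_gradient, scale=1.0):
    """Generic sign-flipping Byzantine update (illustration only; not necessarily
    one of the attacks used in our experiments)."""
    return -scale * np.asarray(honest_gradient)
```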
We apply our algorithms to the `CartPole-v1` and `LunarLander-v2` environments from the OpenAI Gym toolkit.
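For reference, the environments can be instantiated directly. The following minimal random-policy rollout is an illustration only and assumes the `gym>=0.26` API; older gym releases (depending on what `requirements.txt` pins) return slightly different values from `reset` and `step`.

```python
import gym

# Minimal random rollout in one of the environments used in our experiments.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()                           # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print(f"episode return: {total_reward}")
env.close()
```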
The setup has been tested with Python 3.8.16. To install the necessary dependencies, run `pip install -r requirements.txt`.
Then, running `run_experiments.sh` reproduces all experiments contained in the paper with the same choice of seeds. As described in the paper, a single experiment terminates within a few hours on a consumer-grade CPU; running all experiments sequentially, however, may take a long time. We therefore recommend running as many experiments as possible in parallel on a cluster.
Note that the results of the experiments will be stored in an `experiments` directory. Each experiment is associated with a unique ID derived from the set of chosen hyperparameters. The results (the history of rewards obtained during training) of an experiment with ID `x` and seed `i` can be found in `experiments/x/seed_i/rew_hist.npy`, and the respective choice of parameters is contained in `experiments/x/config.json`.
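For example, the reward history and configuration of a finished run can be inspected as follows (the experiment ID and seed are placeholders to be replaced with values from your `experiments` directory):

```python
import json
import numpy as np

exp_id = "x"   # placeholder: an actual experiment ID from the experiments/ directory
seed = 0       # placeholder: one of the seeds the experiment was run with

rew_hist = np.load(f"experiments/{exp_id}/seed_{seed}/rew_hist.npy")
with open(f"experiments/{exp_id}/config.json") as f:
    config = json.load(f)

print(config)
print(f"{len(rew_hist)} recorded rewards, last value: {rew_hist[-1]}")
```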
Below, `K` denotes the number of agents in the system, among which an 𝛼-fraction is Byzantine.
Speed-up with an increasing number of agents in both `CartPole` and `LunarLander`, as suggested by our theoretical sample complexity results.
Comparison of `DecByzPG` for `CartPole` under the aforementioned Byzantine attacks to (a) `PAGE-PG`, the state-of-the-art (in terms of sample complexity) single-agent policy gradient method that `DecByzPG` reduces to for `K=1`, and (b) `Dec-PAGE-PG`, a naive decentralized (but not fault-tolerant) version of `PAGE-PG` in which gradients are aggregated by plain averaging and no agreement mechanism is used.
Equivalent to the above, but for `LunarLander`. In both environments, we observe that `DecByzPG` performs nearly on par with the unattacked `Dec-PAGE-PG`. In all cases, `DecByzPG` with `K=13` and 𝛼 > 0 outperforms `PAGE-PG` with `K=1` (and 𝛼 = 0), meaning that despite the presence of Byzantine agents, joining the federation is empirically beneficial for faster convergence.
For plots of `ByzPG` we refer to Appendix E of the paper.
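To plot learning curves from your own runs, a minimal sketch (assuming `matplotlib` is available, the experiment ID and seeds are placeholders, and all reward histories for the chosen experiment have equal length):

```python
import numpy as np
import matplotlib.pyplot as plt

exp_id = "x"        # placeholder experiment ID
seeds = [0, 1, 2]   # placeholder seed indices

# Average the reward histories over seeds and plot the resulting learning curve.
histories = [np.load(f"experiments/{exp_id}/seed_{s}/rew_hist.npy") for s in seeds]
mean_curve = np.mean(histories, axis=0)

plt.plot(mean_curve, label=exp_id)
plt.xlabel("training progress")
plt.ylabel("reward")
plt.legend()
plt.show()
```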
To cite our work, please use the following BibTeX entry:
@article{jordan2024decentralized,
  title={Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence},
  author={Philip Jordan and Florian Grötschla and Flint Xiaofeng Fan and Roger Wattenhofer},
  url={https://arxiv.org/abs/2401.03489},
  publisher={arXiv},
  year={2024},
}
The code contained in this repository includes parts that are built on top of the following implementations (and respective publications):
- [https://github.com/epfml/byzantine-robust-noniid-optimizer] Karimireddy, Sai Praneeth, Lie He, and Martin Jaggi. "Byzantine-robust learning on heterogeneous datasets via bucketing." arXiv preprint arXiv:2006.09365 (2020).
- [https://github.com/flint-xf-fan/Byzantine-Federeated-RL] Fan, Xiaofeng, et al. "Fault-tolerant federated reinforcement learning with theoretical guarantee." Advances in Neural Information Processing Systems 34 (2021): 1007-1021.
Some auxiliary methods are borrowed directly from existing code. Whenever this is the case, we have added a comment referencing its source.