-
This repository showcases my first published paper titled Adaptive Worker Grouping for Communication-Efficient and Straggler-Tolerant Distributed SGD.
-
The paper was presented at the IEEE International Symposium on Information Theory (ISIT) in 2022.
-
About the paper
- We propose a novel algorithm, G-CADA, designed to enhance both time and communication efficiencies in distributed learning systems.
- Building upon the widely recognized CADA algorithm, which aggregates gradients from workers lazily to improve communication efficiency, G-CADA introduces adaptive worker grouping. Specifically:
- Workers are partitioned into groups dynamically at each iteration, and each group is assigned the same dataset shard.
- By utilizing additional storage at the workers, the server interacts with only the fastest worker in each group, significantly improving robustness against stragglers.
- Numerical simulations demonstrate substantial gains in both time and communication efficiency.
-
Group formation:
- A total of
$M$ workers are divided into$G$ groups, with each group containing an equal number of workers. - The dataset is split into
$G$ shards, and all workers within a group access the same shard.
- A total of
-
Age parameter:
- The server maintains a parameter,
$τ_g^k$ for each group, representing the “age” (i.e., the number of rounds the group has not communicated with the server) at iteration$k$ .
- The server maintains a parameter,
-
Group selection:
-
Age update:
- Groups that communicate with the server reset their age parameter.
An example visualization of the algorithm:
The pseudocode below provides a step-by-step representation of the G-CADA algorithm:
- Numerical examples:
- G-CADA demonstrates clear advantages over benchmark algorithms in terms of time efficiency, communication cost, and computational cost.
For more detailed information, please refer to our full paper: IEEE Xplore or arXiv.
- The implementation in
linear_regression_ps.py
simulates a linear regression model using the MNIST dataset and a quadratic error loss function. - Benchmark comparisons:
- G-CADA is compared against state-of-the-art algorithms, including distributed SGD, CADA, and distributed Adam.
- Results consistently show that G-CADA outperforms these methods in both time and communication efficiencies.