Here I house the MXNet code for the original signSGD paper (ICML 2018). I've put the code here to facilitate reproducing the results in the paper; it isn't intended for development purposes. In particular, this implementation does not gain any speedups from gradient compression. Some links:
- arXiv version of the paper.
- more information about the paper on my personal website.
- my coauthors: Yu-Xiang Wang, Kamyar Azizzadenesheli, Anima Anandkumar.
[Update Jan 2021] As noted in this issue, this codebase used an implementation of the sign function that maps sign(0) --> 0. A test in this notebook suggests there may be little difference from an implementation that maps sign(0) --> ±1 at random. In the codebase for the ICLR 2019 paper, we used an implementation that maps sign(0) --> +1 deterministically.
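For reference, the three sign(0) conventions mentioned in the update can be written in NumPy as below. This is an illustrative sketch only: the function names are hypothetical and the code is not taken from either codebase.

```python
import numpy as np

def sign_zero_to_zero(x):
    # np.sign maps 0 -> 0 (the convention this codebase used, per the update note)
    return np.sign(np.asarray(x, dtype=float))

def sign_zero_to_random(x, rng):
    # Map 0 -> +1 or -1 with equal probability; nonzero entries keep their sign
    s = np.sign(np.asarray(x, dtype=float))
    zeros = s == 0
    s[zeros] = rng.choice([-1.0, 1.0], size=int(zeros.sum()))
    return s

def sign_zero_to_plus_one(x):
    # Map 0 -> +1 deterministically (the convention used for the ICLR 2019 paper)
    return np.where(np.asarray(x, dtype=float) >= 0, 1.0, -1.0)

x = np.array([-2.0, 0.0, 3.0])
print(sign_zero_to_zero(x))       # [-1.  0.  1.]
print(sign_zero_to_plus_one(x))   # [-1.  1.  1.]
print(sign_zero_to_random(x, np.random.default_rng(0)))
```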
General instructions:
- Signum is implemented as an official optimiser in MXNet, so to use Signum in this codebase, we pass the string 'signum' as a command-line argument.
- If you do not use our suggested hyperparameters, be careful to tune them yourself.
- Signum hyperparameters are typically similar to Adam hyperparameters, not SGD!
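The Signum update keeps an exponential moving average of the gradients and steps along its sign. One way to see why its hyperparameters resemble Adam's: every coordinate moves by exactly `lr` per step, regardless of the gradient's scale. Below is a minimal NumPy sketch of a single step, assuming no weight decay; `signum_step` and its default values are hypothetical placeholders, not the MXNet implementation or the paper's tuned hyperparameters.

```python
import numpy as np

def signum_step(weight, grad, momentum_buf, lr=1e-4, beta=0.9):
    """One Signum step (no weight decay).

    momentum_buf <- beta * momentum_buf + (1 - beta) * grad
    weight       <- weight - lr * sign(momentum_buf)
    With beta = 0 this reduces to plain signSGD.
    Default lr and beta are placeholders, not tuned values.
    """
    momentum_buf = beta * momentum_buf + (1.0 - beta) * grad
    weight = weight - lr * np.sign(momentum_buf)
    return weight, momentum_buf

# Usage: a few steps on f(w) = ||w||^2, whose gradient is 2w
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
for _ in range(3):
    w, m = signum_step(w, 2.0 * w, m, lr=0.1, beta=0.9)
print(w)  # each coordinate has moved 0.3 towards zero
```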
There are four folders:
- cifar/ -- code to train ResNet-20 on CIFAR-10.
- gradient_expts/ -- code to compute gradient statistics as in Figures 1 and 2. Includes an implementation of Welford's algorithm.
- imagenet/ -- code to train ResNet-50 on ImageNet. Implementation inspired by that of Wei Wu.
- toy_problem/ -- simple example where signSGD is more robust than SGD.
More information can be found within each folder.
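Welford's algorithm computes a running mean and variance in a single pass without storing all samples, which suits accumulating gradient statistics over many minibatches. The generic sketch below is not the code in gradient_expts/, and the class name `RunningStats` is hypothetical. Because the updates use only arithmetic, it also works elementwise when fed NumPy arrays (for example, one gradient vector per minibatch).

```python
class RunningStats:
    """Welford's online algorithm for the mean and sample variance of a stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n    # incremental mean update
        self.m2 += delta * (x - self.mean)  # uses both old and new mean

    def variance(self):
        # Unbiased sample variance; undefined for fewer than two samples
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
print(stats.mean, stats.variance())  # 5.0 4.571428571428571
```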
Any questions / comments? Don't hesitate to get in touch: bernstein@caltech.edu.