NG+ is a multi-step matrix-product natural gradient method for deep learning.
This is an example of training ResNet-50 V1.5 on the ImageNet1K (ILSVRC2012) dataset. NG+ can finish the training within 40 epochs to top-1 accuracy of 75.9% using 16 Tesla V100 GPUs with batch size 4,096.
The overall network architecture of ResNet-50 is shown below: link
python 3.8, pytorch (>= 1.5.0), tensorboardX, NVIDIA DALI, CUDA, CUDNN, and NCCL.
For batch size 4096 (256 x 16), run the following code:
python -m torch.distributed.launch --master_port 12226 --nproc_per_node=16 main.py --fp16 --batch_size 256 --lr-decay-rate 0.75 --damping 0.35 --lr_init 3.8 --method 'poly' --epoch_end 60 --lr_exponent 6 --warmup_epoch 5 --curvature_momentum 0.9 --datadir /mnt/ILSVRC2012 --logdir your_log_file --decay_epochs 37 --inv-update-freq 1000 --cov-update-freq 1000
- NGPlus.py: Our NG+ optimizer.
- dali_pipe.py: A wrapper over DALI.
- data_manager.py: Utilities about loading datasets.
- logging_utils.py: Utilities about logging.
- main.py: The main script for training.
- nvidia_dali_utils2.py: Utilities about DALI.
- resnet_ngplus.py: The definition of resnet50 model.
- utils.py: Miscellaneous utilities.
We hope that the package is useful for your application. If you have any bug reports or comments, please feel free to email one of the toolbox authors:
- Minghan Yang, yangminghan at pku.edu.cn
- Dong Xu, taroxd at pku.edu.cn
- Zaiwen Wen, wenzw at pku.edu.cn
Minghan Yang, Dong Xu, Qiwen Cui, Zaiwen Wen, Pengxiang Xu, An Efficient Fisher Matrix Approximation Method for Large-Scale Neural Network Optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.