3D-Shampoo is a distributed preconditioning-based optimizer intended to be used with the DeepSpeed library. Depending on the level of data parallelism configured in DeepSpeed, it automatically distributes the preconditioning matrices across all available workers.
3D-Shampoo is a modified version of Google-Research's Shampoo.
This code was created as part of my Master's thesis "Distributed Gradient Preconditioning for Training Large-Scale Models".
For more information about 3D-Shampoo, check out my Master's thesis, which is publicly available in the ETH Research Collection.
The pseudocode of 3D-Shampoo is shown below.
3D-Shampoo distributes the preconditioning matrices according to whichever level of parallelism is active in DeepSpeed.
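To make the update itself concrete, here is a minimal single-worker sketch of the underlying Shampoo step for one 2D parameter. This is an illustration only, not the 3D-Shampoo implementation: the names shampoo_step and inv_fourth_root, the eps damping, and the eigendecomposition-based inverse fourth root are assumptions made for readability.

import torch

def shampoo_step(W, G, L, R, lr=1e-1, eps=1e-4):
    # accumulate the left/right second-moment statistics
    L = L + G @ G.T
    R = R + G.T @ G

    def inv_fourth_root(M):
        # inverse fourth root via eigendecomposition, with a small damping term
        eye = torch.eye(M.shape[0], dtype=M.dtype, device=M.device)
        eigvals, eigvecs = torch.linalg.eigh(M + eps * eye)
        return eigvecs @ torch.diag(eigvals.pow(-0.25)) @ eigvecs.T

    # preconditioned gradient step: W <- W - lr * L^{-1/4} @ G @ R^{-1/4}
    W = W - lr * inv_fourth_root(L) @ G @ inv_fourth_root(R)
    return W, L, R

# usage inside a training loop: W, L, R = shampoo_step(W, G, L, R),
# with L and R initialized to zero matrices of shape (m, m) and (n, n)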
NOTE: ZeRO optimization is not supported because of the way the preconditioning matrices are stored; a future update may add support for it.
If there are more layers to precondition than available GPUs, the layers are distributed according to a custom expected-cost function that balances the workload as evenly as possible across all available GPUs (a rough sketch of this kind of assignment follows below). If there are more GPUs than layers, #GPUs - #layers GPUs will idle during preconditioning.
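The exact cost model is defined in the source code; the following is only a rough sketch of this kind of greedy, cost-based assignment. It assumes, purely for illustration, that preconditioning a layer of shape (m, n) costs roughly m^3 + n^3 (the two inverse-root computations); the function name and the cost formula are not the actual 3D-Shampoo cost function.

def assign_layers_to_gpus(shapes, num_gpus):
    # greedy load balancing: place the most expensive layers first,
    # each on the currently least-loaded GPU
    costs = [m ** 3 + n ** 3 for (m, n) in shapes]   # assumed cost model
    load = [0] * num_gpus
    assignment = [None] * len(shapes)
    for idx in sorted(range(len(shapes)), key=lambda i: -costs[i]):
        gpu = min(range(num_gpus), key=lambda g: load[g])
        assignment[idx] = gpu
        load[gpu] += costs[idx]
    return assignment

# e.g. assign_layers_to_gpus([(4096, 1024), (1024, 1024), (512, 512)], num_gpus=2)
# gives each GPU a roughly balanced share of the preconditioning work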
At the moment there is no installation step; you only need to make the source folder visible to your Python script (e.g. by appending it to sys.path, as in the example below). You can use 3D-Shampoo like any other PyTorch optimizer. 3D-Shampoo distributes the preconditioners when it is initialized together with DeepSpeed; otherwise it behaves like the basic Shampoo from Google-Research.
# loading libraries
import torch
import torch.distributed as dist
import deepspeed
...
# loading 3d-shampoo optimizer
import sys
sys.path.append('../3d-shampoo/src/')
import shampoo_3d
# initialize torch.distributed, define model, load datasets, etc.
...
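# world_rank / world_size are assumed to come from torch.distributed here
# (any equivalent source of the global rank and world size works):
world_rank = dist.get_rank()
world_size = dist.get_world_size()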
optimizer = shampoo_3d.Shampoo_3D(
    params=model.parameters(),
    world_rank=world_rank,
    world_size=world_size,
    topology=model.topology(),
    shapes=[tuple(p.shape) for p in model.parameters() if p.requires_grad],
    lr=1e-1,
    momentum=0.9,
    hyperparams=shampoo_3d.ShampooHyperParams(ignore_embedding_layer=True))

model_engine, optimizer, _, _ = deepspeed.initialize(args=cmd_args,
                                                     model=model,
                                                     optimizer=optimizer)
# train your model
...
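Assuming cmd_args carries the standard DeepSpeed arguments (for example added via deepspeed.add_config_arguments), the script can then be started with the usual DeepSpeed launcher, e.g. deepspeed train.py --deepspeed_config ds_config.json; the script and config file names here are only placeholders.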