This repository contains the code to reproduce the results from the paper *MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence*.
We provide code to reproduce the following experiments:
- BERT-Base/Large and OPT-1.3B on GLUE/MNLI using the HuggingFace repository
- Llama-2 7B on GSM8k using `llm-foundry` from MosaicML
The MicroAdam optimizer is implemented in the ISTA-DASLab-Optimizers repository, along with other optimizers. It is installable via `pip install ista-daslab-optimizers` (done automatically in the `install.sh` script). Follow the steps below to set up the environment for MicroAdam:
```bash
cd ~
git clone git@github.com:IST-DASLab/MicroAdam.git
cd ~/MicroAdam
source install.sh
```
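Once installed, MicroAdam can be used as a drop-in replacement for a standard PyTorch optimizer. Below is a minimal sketch: the constructor arguments (`m`, `quant_block_size`, `k_init`) follow the ISTA-DASLab-Optimizers README, but the values are illustrative rather than the paper's tuned settings, so consult that repository for the authoritative API.

```python
import torch
from ista_daslab_optimizers import MicroAdam

# The package ships CUDA kernels, so a GPU is assumed here.
model = torch.nn.Linear(10, 2).cuda()  # stand-in for your model

# Hyper-parameter names follow the ISTA-DASLab-Optimizers README;
# the values are illustrative, not the paper's tuned settings.
optimizer = MicroAdam(
    model.parameters(),
    m=10,                      # sliding-window size (number of stored gradients)
    lr=1e-5,                   # learning rate
    quant_block_size=100_000,  # block size for error-feedback quantization
    k_init=0.01,               # fraction of gradient entries kept (1%)
)

# Standard PyTorch training step
loss = model(torch.randn(4, 10, device='cuda')).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```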
For the GLUE/MNLI experiments, we provide the scripts `run_hf_glue_mnli_OPTIM.sh`, where `OPTIM` is the optimizer name: `microadam`, `adamw`, `galore`, `came` or `adamw8b`.
```bash
cd ~/MicroAdam/huggingface_glue_mnli
# bash run_hf_glue_mnli_adamw.sh
# bash run_hf_glue_mnli_adamw8b.sh
# bash run_hf_glue_mnli_came.sh
# bash run_hf_glue_mnli_galore.sh
bash run_hf_glue_mnli_microadam.sh
```
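Internally, these scripts construct the optimizer and hand it to the HuggingFace Trainer. As a hedged sketch of that pattern (the model, dataset handling, and hyper-parameters below are illustrative, not the scripts' exact settings), a custom optimizer such as MicroAdam can be passed through the Trainer's `optimizers` argument:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from ista_daslab_optimizers import MicroAdam

# Illustrative setup: BERT-Base on MNLI (3 labels); not the exact script config.
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=3)
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

dataset = load_dataset('glue', 'mnli')
def tokenize(batch):
    return tokenizer(batch['premise'], batch['hypothesis'],
                     truncation=True, max_length=128)
dataset = dataset.map(tokenize, batched=True)

# Hyper-parameter values are illustrative.
optimizer = MicroAdam(model.parameters(), m=10, lr=1e-5,
                      quant_block_size=100_000, k_init=0.01)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='out', per_device_train_batch_size=32),
    train_dataset=dataset['train'],
    tokenizer=tokenizer,            # enables padding via the default collator
    optimizers=(optimizer, None),   # (optimizer, lr_scheduler)
)
trainer.train()
```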
We can run the GSM8k experiments with llm-foundry using the following commands:
```bash
cd ~/MicroAdam/llm-foundry/scripts/train
bash run_llama2-7b_gsm8k_microadam.sh
```
The baselines can be launched by calling `train.py` directly and overriding the optimizer settings in the same yaml:

```bash
# AdamW 8-bit baseline
python3 train.py yamls/finetune/llama2-7b_microadam_gsm8k.yaml \
    task=gsm8k \
    optimizer.name=adamw8b \
    optimizer.defaults.lr=5e-5 \
    save_folder=./llama2_7b_gsm8k_adamw8b \
    seed=42

# Decoupled AdamW baseline
python3 train.py yamls/finetune/llama2-7b_microadam_gsm8k.yaml \
    task=gsm8k \
    optimizer.name=decoupled_adamw \
    optimizer.defaults.lr=5e-5 \
    save_folder=./llama2_7b_gsm8k_decoupled_adamw \
    seed=42
```
To integrate MicroAdam with llm-foundry, we made the following changes:
- changes in the `build_optimizer` method to construct the new optimizers
- changes in `llm-foundry/scripts/train/train.py`:
  - set `run_name` and `save_folder` depending on the wandb group, job_type and name
  - added evaluation and time elapsed to be logged to wandb
- changes in the finetuning yaml file:
  - added the `task` variable
  - added the `wandb_groups` section (`wandb_groups_config`)
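For context, the sketch below shows the shape of the `build_optimizer` change. The function's signature and config handling vary across llm-foundry versions, so the shell here is a hedged illustration; only the idea of mapping a new config name to MicroAdam reflects our change.

```python
import torch
from ista_daslab_optimizers import MicroAdam

# Illustrative shell: llm-foundry's real build_optimizer has a
# version-dependent signature and handles many more optimizer names.
def build_optimizer(model: torch.nn.Module, name: str, optimizer_config: dict):
    if name == 'microadam':
        # Forward the yaml's optimizer settings (e.g. lr, m, k_init) directly.
        return MicroAdam(model.parameters(), **optimizer_config)
    raise ValueError(f'Unknown optimizer name: {name}')
```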
If you find our work useful, please consider citing:
```bibtex
@misc{modoranu2024microadam,
      title={MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence},
      author={Ionut-Vlad Modoranu and Mher Safaryan and Grigory Malinovsky and Eldar Kurtic and Thomas Robert and Peter Richtarik and Dan Alistarh},
      year={2024},
      eprint={2405.15593},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```