Merging optimizer states from different pipeline parallel size to resume training #38
Labels
enhancement
New feature or request
good first issue
Good for newcomers
help wanted
Extra attention is needed
Suppose training starts with a pipeline parallel size of 4. We need to support resuming training with a different pipeline parallel size, such as 2, by merging the optimizer states saved by the original pipeline stages.
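A minimal sketch of the idea, assuming each pipeline stage saved its optimizer state as a dict keyed by parameter name, and that consecutive stages are grouped when shrinking the pipeline (e.g. stages 0+1 become new stage 0). The function name `merge_pp_shards` and the shard layout are hypothetical, not an existing API:

```python
def merge_pp_shards(shards, new_pp_size):
    """Merge optimizer-state shards from len(shards) pipeline stages
    down to new_pp_size stages by grouping consecutive stages."""
    old_pp_size = len(shards)
    if old_pp_size % new_pp_size != 0:
        raise ValueError("old pipeline size must be divisible by the new one")
    group = old_pp_size // new_pp_size
    merged = []
    for i in range(new_pp_size):
        combined = {}
        for shard in shards[i * group:(i + 1) * group]:
            # Parameter names are disjoint across stages, so update() suffices.
            combined.update(shard)
        merged.append(combined)
    return merged

# Example: 4 stages, each holding the states of its own layers.
shards = [
    {"layer0.weight": {"step": 10}},
    {"layer1.weight": {"step": 10}},
    {"layer2.weight": {"step": 10}},
    {"layer3.weight": {"step": 10}},
]
new_shards = merge_pp_shards(shards, new_pp_size=2)
print(len(new_shards))               # → 2
print(sorted(new_shards[0].keys()))  # → ['layer0.weight', 'layer1.weight']
```

A real implementation would also have to remap any stage-local metadata (e.g. parameter-group indices) so the merged state matches the layer partitioning of the new pipeline.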