This draft code adds two arguments to `catwalk.train`:

1. The `--use_fairscale` flag uses a `FairScaleTrainingEngine` instead of the normal `TorchTrainingEngine`.
2. The `--modules_to_wrap` argument takes a list of regexes for module names to wrap using Tango's `with_wrapped_modules`.

Neither of these currently works fully.

## Problems
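For reference, the two arguments could be declared roughly as follows. This is a minimal sketch, assuming a standard `argparse` setup; the actual `catwalk.train` parser likely differs in detail.

```python
import argparse

# Sketch of the two new arguments; names and defaults beyond the flags
# themselves are assumptions, not the actual catwalk.train implementation.
parser = argparse.ArgumentParser("catwalk.train")
parser.add_argument(
    "--use_fairscale",
    action="store_true",
    help="Use a FairScaleTrainingEngine instead of the normal TorchTrainingEngine.",
)
parser.add_argument(
    "--modules_to_wrap",
    nargs="+",
    default=None,
    help="Regexes for module names to wrap using Tango's with_wrapped_modules.",
)

args = parser.parse_args(
    ["--use_fairscale", "--modules_to_wrap", r"inner_module\.transformer\.h\.[0-9]+"]
)
print(args.use_fairscale)   # True
print(args.modules_to_wrap)
```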
### Without wrapping modules

By itself, `--use_fairscale` will train and reproduce validation metrics as follows.

However, it must run with amp/mixed precision disabled to avoid NaNs in training. As a result there are no memory-footprint savings, and the compute speed is lower than simply not using FairScale. For instance, with `gpt2` on `piqa` with 2 GPUs and batch size 16, memory usage is 31150MiB / 40536MiB without FairScale and 30952MiB / 40536MiB with `--use_fairscale`.

### With module wrapping
The modules to wrap can be specified like this:

```
python -m catwalk.train --model rc::gpt2 --task piqa --device_count 2 --use_fairscale --modules_to_wrap inner_module\.transformer\.h\.[0-9]+
```
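The regex above is meant to select each transformer block by name. A minimal sketch of that matching, where the module names are hypothetical stand-ins for what `model.named_modules()` would yield for GPT-2 under catwalk's `inner_module`:

```python
import re

# Hypothetical module names, purely illustrative of what
# model.named_modules() might report.
module_names = [
    "inner_module.transformer.wte",
    "inner_module.transformer.h.0",
    "inner_module.transformer.h.0.attn",
    "inner_module.transformer.h.11",
    "inner_module.lm_head",
]

pattern = re.compile(r"inner_module\.transformer\.h\.[0-9]+")

# fullmatch so that sub-modules like ...h.0.attn are not also selected.
to_wrap = [name for name in module_names if pattern.fullmatch(name)]
print(to_wrap)  # → ['inner_module.transformer.h.0', 'inner_module.transformer.h.11']
```

Note the shell escaping in the command above exists only to deliver this same pattern through to the regex engine.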
However, inside `catwalk_model.predict()` in `training_callback.py` the following error occurs:

The issue, as I understand it, is that our custom `.predict` code does not support the necessary distributed communication. But my understanding of the problem is limited.