
Make tpu codepath work w/ hydra. #6

Merged
merged 2 commits into w2v2_rebased
Feb 5, 2021
Conversation

taylanbil
Owner

  • Share and pass down model-related args to RawAudioDataset correctly.
  • fp16 bug fix on non-xla devices (self._inftensor change).
  • Use index_put to avoid dynamic shapes in the model's forward (see the first sketch below).
  • Get rid of some unnecessary warnings for TPUs to clean up stderr.
  • Send logging outputs to CPU before logging to reduce atens.
  • Util function to move CPU tensors to TPU (see the second sketch below).
  • Use the util function to handle dummy batches to avoid a crash at the
    end of an epoch in distributed training.
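As a first sketch of the index_put point above (illustrative only, not the exact fairseq helper): on XLA, masked assignment such as `tensor[mask] = value` introduces data-dependent shapes that force recompilation, so the fill can be expressed with shape-preserving ops instead.

```python
import torch

def index_put(tensor, indices, value):
    # Sketch: `indices` is a bool mask broadcastable to `tensor`,
    # `value` a scalar. torch.where keeps the output shape identical
    # to the input, so XLA does not recompile per batch.
    if tensor.device.type == "xla":
        return torch.where(indices, torch.full_like(tensor, value), tensor)
    # On CPU/GPU, plain advanced indexing is fine.
    tensor[indices] = value
    return tensor
```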

With these changes, I ran a w2v2 workload on a small dataset, and it still takes ~150 seconds to do 200 steps, same as before:

2020-12-16 01:35:18 | INFO | train_inner | epoch 005:    616 / 1046 loss=4.942, ntokens=7792, nsentences=32, prob_perplexity=348.304, code_perplexity=345.724, temp=15.621, loss_0=4.792, loss_1=0.133, loss_2=0.017, accuracy=0.22857, wps=53.1, ups=0.01, wpb=7792, bsz=32, num_updates=4800, lr=7.5e-05, gnorm=0.897, train_wall=45, gb_free=7.7, gb_total=16, wall=4661
2020-12-16 01:37:47 | INFO | train_inner | epoch 005:    816 / 1046 loss=5.243, ntokens=10480, nsentences=32, prob_perplexity=409.905, code_perplexity=408.027, temp=15.605, loss_0=5.092, loss_1=0.133, loss_2=0.018, accuracy=0.18139, wps=70.1, ups=0.01, wpb=10480, bsz=32, num_updates=5000, lr=7.8125e-05, gnorm=0.746, train_wall=47, gb_free=7.7, gb_total=16, wall=4810
2020-12-16 01:40:16 | INFO | train_inner | epoch 005:   1016 / 1046 loss=4.945, ntokens=8476, nsentences=32, prob_perplexity=406.209, code_perplexity=403.644, temp=15.589, loss_0=4.794, loss_1=0.133, loss_2=0.019, accuracy=0.21885, wps=57.1, ups=0.01, wpb=8476, bsz=32, num_updates=5200, lr=8.125e-05, gnorm=0.823, train_wall=46, gb_free=7.7, gb_total=16, wall=4959
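
The "util function to move CPU tensors to TPU" could look roughly like this (a sketch assuming torch_xla; the name and signature are illustrative, not necessarily the helper in this PR):

```python
import torch

def move_to_tpu(sample, device):
    # Recursively move tensors in nested dicts/lists/tuples to the XLA
    # device (typically torch_xla.core.xla_model.xla_device()), leaving
    # non-tensor values untouched.
    if torch.is_tensor(sample):
        return sample.to(device)
    if isinstance(sample, dict):
        return {k: move_to_tpu(v, device) for k, v in sample.items()}
    if isinstance(sample, (list, tuple)):
        return type(sample)(move_to_tpu(x, device) for x in sample)
    return sample
```

Pushing the dummy batch through such a helper keeps all replicas stepping in lockstep at the end of an epoch instead of crashing.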

@taylanbil taylanbil requested a review from bilgeacun December 16, 2020 19:23
@bilgeacun bilgeacun merged commit 146781c into w2v2_rebased Feb 5, 2021
bilgeacun added a commit that referenced this pull request Feb 6, 2021
input shape temp update

clean up

dataset updates

clean up

move tensor idx to matrix op inside apply_mask

use tensor operators to replace tensor indexing, passed consistency test verification

Minor improvements

Fix BucketPadLengthDataset

Moved mask matrix creation to dataset prep.

Remove dynamism, apply mask correctly, add some guardrails, some cleanups.

Send device data to cpu before logging.
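
(A minimal sketch of what sending device data to CPU before logging can look like; the helper name here is illustrative:)

```python
import torch

def prepare_logging_output(logging_output):
    # Detach XLA tensors and move them to CPU in one place, so later
    # metric aggregation does not issue many small device-to-host
    # transfers (aten fallbacks).
    return {
        k: v.detach().cpu() if torch.is_tensor(v) else v
        for k, v in logging_output.items()
    }
```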

Fix data bucketing for RawAudioDataset, refactor bucketing functions, fix filling w/ -inf in wav2vec2, minor cleanups

Sample size computation during data prep to reduce atens, don't call item in log_scalar, minor cleanups

Remove extra validation atens, clean up marking step and sending to cpu.

Correct loss computation for w2v2 criterion + refactor index_put

Fix bug in index_put + fix integer division
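
(On the integer-division half of this commit, a plausible sketch of the fix, not the exact diff: true division `/` on integer tensors returns floats in recent PyTorch, so counts that feed into sizes or indices need explicit floor division:)

```python
import torch

ntokens, bsz = torch.tensor(7792), torch.tensor(32)

buggy = ntokens / bsz                                   # tensor(243.5000), float result
fixed = torch.div(ntokens, bsz, rounding_mode="floor")  # tensor(243), integer result
```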

Don't call float on extra logs, clean up comment.

Correct accuracy computation, refactor xla tensor check.

Adjust loss computation so it works w/ binary cross entropy.

Remove sending log outputs back to cpu after allreduce.

Don't sample padded states when sampling negatives + correct mi in loss computation.

Fixing config issues after rebase

Fix bug in negatives from everywhere

Fixing config issue for TPU after rebase

Taylan's changes on top of rebase

Use float on cpu if fp16 when filling w/ -inf in w2v2 (#5)

* Use float on cpu if fp16 when filling w/ -inf in w2v2

* xla -> self.xla

* make logging_output_can_be_summed a regular method instead of staticmethod.
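
(A sketch of the fp16 -inf fill idea from #5, with illustrative names; CPU half-precision kernels were incomplete at the time, so the fill is done in float32 on CPU and cast back:)

```python
import torch

def fill_neg_inf(logits, mask):
    # On CPU with fp16 inputs, do the -inf fill in float32 and cast
    # back; on accelerators, fill in the tensor's own dtype.
    if logits.device.type == "cpu" and logits.dtype == torch.float16:
        return logits.float().masked_fill(mask, float("-inf")).half()
    return logits.masked_fill(mask, float("-inf"))
```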

Make tpu codepath work w/ hydra. (#6)

* Make tpu codepath work w/ hydra.

* Share and pass down model-related args to RawAudioDataset correctly.
* fp16 bug fix on non-xla devices (self._inftensor change)
* use index_put to avoid dynamic shapes in model's fwd.
* Get rid of some unnecessary warnings for tpus to clean up stderr.
* Send logging outputs to cpu before logging to reduce atens.
* Util function to move cpu tensors to tpu.
* Use the util function to handle dummy batches to avoid crash at the
end of epoch in distributed training.

* fixing configs for precompute mask indices

Co-authored-by: Bilge Acun <acun@fb.com>
bilgeacun added a commit that referenced this pull request Feb 11, 2021