
Make tpu codepath work w/ hydra. #6

Merged
merged 2 commits into w2v2_rebased
Feb 5, 2021
Conversation

taylanbil
Owner

  • Share and pass down model-related args to RawAudioDataset correctly.
  • fp16 bug fix on non-xla devices (self._inftensor change).
  • Use index_put to avoid dynamic shapes in the model's forward (see the first sketch below).
  • Get rid of some unnecessary warnings for TPUs to clean up stderr.
  • Send logging outputs to CPU before logging to reduce atens.
  • Util function to move CPU tensors to TPU (see the second sketch below).
  • Use the util function to handle dummy batches to avoid a crash at the
    end of an epoch in distributed training.
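As a first sketch of the index_put point above (illustrative only, not the exact fairseq helper): on XLA, masked assignment such as `tensor[mask] = value` introduces data-dependent shapes that force recompilation, so the fill can be expressed with shape-preserving ops instead.

```python
import torch

def index_put(tensor, indices, value):
    # Sketch: `indices` is a bool mask broadcastable to `tensor`,
    # `value` a scalar. torch.where keeps the output shape identical
    # to the input, so XLA does not recompile per batch.
    if tensor.device.type == "xla":
        return torch.where(indices, torch.full_like(tensor, value), tensor)
    # On CPU/GPU, plain advanced indexing is fine.
    tensor[indices] = value
    return tensor
```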

With these changes, I ran a w2v2 workload on a small dataset, and it still takes ~150 seconds to do 200 steps, same as before:

2020-12-16 01:35:18 | INFO | train_inner | epoch 005:    616 / 1046 loss=4.942, ntokens=7792, nsentences=32, prob_perplexity=348.304, code_perplexity=345.724, temp=15.621, loss_0=4.792, loss_1=0.133, loss_2=0.017, accuracy=0.22857, wps=53.1, ups=0.01, wpb=7792, bsz=32, num_updates=4800, lr=7.5e-05, gnorm=0.897, train_wall=45, gb_free=7.7, gb_total=16, wall=4661
2020-12-16 01:37:47 | INFO | train_inner | epoch 005:    816 / 1046 loss=5.243, ntokens=10480, nsentences=32, prob_perplexity=409.905, code_perplexity=408.027, temp=15.605, loss_0=5.092, loss_1=0.133, loss_2=0.018, accuracy=0.18139, wps=70.1, ups=0.01, wpb=10480, bsz=32, num_updates=5000, lr=7.8125e-05, gnorm=0.746, train_wall=47, gb_free=7.7, gb_total=16, wall=4810
2020-12-16 01:40:16 | INFO | train_inner | epoch 005:   1016 / 1046 loss=4.945, ntokens=8476, nsentences=32, prob_perplexity=406.209, code_perplexity=403.644, temp=15.589, loss_0=4.794, loss_1=0.133, loss_2=0.019, accuracy=0.21885, wps=57.1, ups=0.01, wpb=8476, bsz=32, num_updates=5200, lr=8.125e-05, gnorm=0.823, train_wall=46, gb_free=7.7, gb_total=16, wall=4959
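
The "util function to move CPU tensors to TPU" could look roughly like this (a sketch assuming torch_xla; the name and signature are illustrative, not necessarily the helper in this PR):

```python
import torch

def move_to_tpu(sample, device):
    # Recursively move tensors in nested dicts/lists/tuples to the XLA
    # device (typically torch_xla.core.xla_model.xla_device()), leaving
    # non-tensor values untouched.
    if torch.is_tensor(sample):
        return sample.to(device)
    if isinstance(sample, dict):
        return {k: move_to_tpu(v, device) for k, v in sample.items()}
    if isinstance(sample, (list, tuple)):
        return type(sample)(move_to_tpu(x, device) for x in sample)
    return sample
```

Pushing the dummy batch through such a helper keeps all replicas stepping in lockstep at the end of an epoch instead of crashing.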

@taylanbil taylanbil requested a review from bilgeacun December 16, 2020 19:23
@bilgeacun bilgeacun merged commit 146781c into w2v2_rebased Feb 5, 2021
bilgeacun added a commit that referenced this pull request Feb 6, 2021
input shape temp update

clean up

dataset updates

clean up

move tensor idx to matrix op inside apply_mask

use tensor operators to replace tensor indexing, passed consistency test verification

Minor improvements

Fix BucketPadLengthDataset

Moved mask matrix creation to dataset prep.

Remove dynamism, apply mask correctly, add some guardrails, some cleanups.

Send device data to cpu before logging.
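
(A minimal sketch of what sending device data to CPU before logging can look like; the helper name here is illustrative:)

```python
import torch

def prepare_logging_output(logging_output):
    # Detach XLA tensors and move them to CPU in one place, so later
    # metric aggregation does not issue many small device-to-host
    # transfers (aten fallbacks).
    return {
        k: v.detach().cpu() if torch.is_tensor(v) else v
        for k, v in logging_output.items()
    }
```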

Fix data bucketing for RawAudioDataset, refactor bucketing functions, fix filling w/ -inf in wav2vec2, minor cleanups

Sample size computation during data prep to reduce atens, don't call item in log_scalar, minor cleanups

Remove extra validation atens, clean up marking step and sending to cpu.

Correct loss computation for w2v2 criterion + refactor index_put

Fix bug in index_put + fix integer division
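
(On the integer-division half of this commit, a plausible sketch of the fix, not the exact diff: true division `/` on integer tensors returns floats in recent PyTorch, so counts that feed into sizes or indices need explicit floor division:)

```python
import torch

ntokens, bsz = torch.tensor(7792), torch.tensor(32)

buggy = ntokens / bsz                                   # tensor(243.5000), float result
fixed = torch.div(ntokens, bsz, rounding_mode="floor")  # tensor(243), integer result
```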

Don't call float on extra logs, clean up comment.

Correct accuracy computation, refactor xla tensor check.

Adjust loss computation so it works w/ binary cross entropy.

Remove sending log outputs back to cpu after allreduce.

Don't sample padded states when sampling negatives + correct mi in loss computation.

Fixing config issues after rebase

Fix bug in negatives from everywhere

Fixing config issue for TPU after rebase

Taylan's changes on top of rebase

Use float on cpu if fp16 when filling w/ -inf in w2v2 (#5)

* Use float on cpu if fp16 when filling w/ -inf in w2v2

* xla -> self.xla

* make logging_output_can_be_summed a regular method instead of staticmethod.
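
(A sketch of the fp16 -inf fill idea from #5, with illustrative names; CPU half-precision kernels were incomplete at the time, so the fill is done in float32 on CPU and cast back:)

```python
import torch

def fill_neg_inf(logits, mask):
    # On CPU with fp16 inputs, do the -inf fill in float32 and cast
    # back; on accelerators, fill in the tensor's own dtype.
    if logits.device.type == "cpu" and logits.dtype == torch.float16:
        return logits.float().masked_fill(mask, float("-inf")).half()
    return logits.masked_fill(mask, float("-inf"))
```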

Make tpu codepath work w/ hydra. (#6)

* Make tpu codepath work w/ hydra.

* Share and pass down model-related args to RawAudioDataset correctly.
* fp16 bug fix on non-xla devices (self._inftensor change)
* use index_put to avoid dynamic shapes in model's fwd.
* Get rid of some unnecessary warnings for tpus to clean up stderr.
* Send logging outputs to cpu before logging to reduce atens.
* Util function to move cpu tensors to tpu.
* Use the util function to handle dummy batches to avoid crash at the
end of epoch in distributed training.

* fixing configs for precompute mask indices

Co-authored-by: Bilge Acun <acun@fb.com>
bilgeacun added a commit that referenced this pull request Feb 11, 2021