Fairseq-TPU #1282
Thanks @Eric-Wallace! RoBERTa is definitely in our plans. Unfortunately, when we created the fork, RoBERTa had not been added to Fairseq yet. We do intend to keep up with Fairseq's latest and greatest functionality; however, it's also a balancing act between making new models work and optimizing the performance of the existing ones. That said, we would definitely appreciate any help from the community, since our bandwidth is limited :)
Great, thanks! How difficult do you estimate it would be for me to just merge with upstream Fairseq and run RoBERTa on TPU? Would it require significant changes to the input padding and data loading?
Those changes (input padding, etc.) already exist in our TPU branch. There might be merge conflicts that need resolution; that could get pretty ugly, or it might be easy, it depends. The real issue is that at one point, while I was developing against the master branch, a commit landed that caused a pretty big performance regression for us. In order to continue with our experiments, we had to cut a branch before that commit and keep going. A lot has happened since then, so the underlying cause may already be fixed; I'm not sure what the issue was exactly. It may require some debugging.
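For readers wondering why input padding matters here: XLA compiles a separate graph for every distinct input shape, so the TPU branch pads batches to a small set of fixed lengths to avoid constant recompilation. The snippet below is only an illustrative sketch of that bucketing idea, not code from the fork; the bucket sizes and the helper name are made up.

```python
# Illustrative only: pad each batch up to the nearest fixed "bucket" length so
# XLA sees a small, finite set of input shapes instead of recompiling the
# graph for every new sequence length. Bucket sizes here are arbitrary.
import torch

BUCKETS = [64, 128, 256, 512]  # hypothetical fixed lengths

def pad_batch_to_bucket(tokens: torch.Tensor, pad_idx: int) -> torch.Tensor:
    """Pad a (batch, seq_len) tensor up to the smallest bucket >= seq_len."""
    seq_len = tokens.size(1)
    target = next((b for b in BUCKETS if b >= seq_len), BUCKETS[-1])
    if seq_len >= target:
        return tokens[:, :target]  # truncate if longer than the largest bucket
    padding = tokens.new_full((tokens.size(0), target - seq_len), pad_idx)
    return torch.cat([tokens, padding], dim=1)
```

Fewer buckets mean fewer compilations but more wasted padding, so the bucket list is a throughput trade-off.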
Cool, I will try to merge with upstream and see if it's reasonable. What are the different branches?
I would use the tpu-r0.5 branch. Big thanks for taking this on! Feel free to be in touch if you have questions or comments.
Update: we got NMT running in the latest Fairseq master. It was basically a matter of resolving some simple merge conflicts. However, it is indeed about 25-50% slower than using the tpu-r0.5 branch, so the problem persists. I will continue looking into this.
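One way to narrow down a regression like this (not necessarily what was done here) is to compare torch_xla's debug metrics between the two branches: a jump in compile counts or in aten::* counters (ops falling back to CPU) between otherwise identical runs usually points at the offending change. A minimal sketch, assuming the torch_xla debug metrics API:

```python
# Inspect what the XLA runtime is doing during training. Run the same number
# of steps on both branches and diff these reports: a large increase in
# compile-related metrics or in aten::* counters (CPU fallbacks) is a common
# cause of TPU slowdowns.
import torch_xla.debug.metrics as met

def dump_xla_metrics(step: int, every: int = 100) -> None:
    if step % every == 0:
        print(f"--- XLA metrics at step {step} ---")
        print(met.metrics_report())
```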
Thanks Eric!
@taylanbil do you have any idea roughly when the commit that caused the slowdown landed?
FYI @Eric-Wallace: pytorch-tpu#19 enables RoBERTa on the tpu branch.
Summary: Gate psutil import to make tests pass
Pull Request resolved: fairinternal/fairseq-py#1282
Reviewed By: tangyuq
Differential Revision: D23822037
Pulled By: myleott
fbshipit-source-id: c652c7931147ecd377d78322840e343c55cb85a2
(More of a discussion than an issue)
Hi, I am investigating training Fairseq models on TPUs. I followed the tutorial at https://cloud.google.com/tpu/docs/tutorials/transformer-pytorch, which worked nicely for training an NMT model. The tutorial uses a development branch of fairseq with TPU support, developed by @taylanbil: https://github.com/pytorch-tpu/fairseq.
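For context, the fork builds on the standard PyTorch/XLA training pattern: place the model on an XLA device, feed batches through a ParallelLoader, and step the optimizer via xm.optimizer_step, with xmp.spawn launching one process per TPU core. The sketch below illustrates that generic pattern only; the tiny model and random dataset are stand-ins, not fairseq code.

```python
# Generic PyTorch/XLA training loop (illustrative; not the fairseq entry point).
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl
import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(index):
    device = xm.xla_device()                      # this process's TPU core
    model = nn.Linear(512, 512).to(device)        # stand-in for a real model
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
    # Stand-in dataset; fixed shapes matter on TPU to avoid recompilation.
    data = torch.utils.data.TensorDataset(torch.randn(1024, 512))
    loader = torch.utils.data.DataLoader(data, batch_size=32)
    device_loader = pl.ParallelLoader(loader, [device]).per_device_loader(device)
    for (batch,) in device_loader:
        optimizer.zero_grad()
        loss = model(batch).pow(2).mean()
        loss.backward()
        xm.optimizer_step(optimizer)              # all-reduce grads + step

if __name__ == "__main__":
    xmp.spawn(_mp_fn, nprocs=8)                   # one process per TPU core
```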
Currently, I am wondering what the plan is for merging this development branch into fairseq master. In particular, my use case is training models based on RoBERTa on Cloud TPUs. The TPU branch is fairly out of date with fairseq master, so recent features are not present. It is also unclear which parts of fairseq (e.g., models, losses, tasks) are supported under pytorch-tpu/fairseq.