Fairseq-TPU #1282
Thanks @Eric-Wallace! RoBERTa is definitely in our plans. Unfortunately, when we created the fork, RoBERTa had not been added to Fairseq yet. We do intend to keep up with Fairseq's latest and greatest functionality; however, it's also a balancing act between making new models work and optimizing the performance of the existing ones. That said, we would definitely appreciate any help from the community, since our bandwidth is limited :)
Great, thanks! How difficult do you estimate it would be for me to just merge with upstream Fairseq and run RoBERTa on TPU? Would it require significant changes to the input padding and data loading?
Those changes (input padding, etc.) already exist in our TPU branch. There might be merge conflicts that need resolution; that could get pretty ugly, or it might be easy, it depends. The real issue is that at one point, while I was developing against the master branch, a commit landed that caused a pretty big performance regression for us. In order to continue with our experiments, we had to cut a branch before that commit and keep going. A lot has happened since then, so the underlying cause may already be fixed; I'm not sure what the issue was exactly. It may require some debugging.
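For readers wondering why input padding matters here: XLA compiles a separate graph for every distinct input shape, so the TPU branch pads batches to a small set of fixed lengths to avoid constant recompilation. The snippet below is only an illustrative sketch of that bucketing idea, not code from the fork; the bucket sizes and the helper name are made up.

```python
# Illustrative only: pad each batch up to the nearest fixed "bucket" length so
# XLA sees a small, finite set of input shapes instead of recompiling the
# graph for every new sequence length. Bucket sizes here are arbitrary.
import torch

BUCKETS = [64, 128, 256, 512]  # hypothetical fixed lengths

def pad_batch_to_bucket(tokens: torch.Tensor, pad_idx: int) -> torch.Tensor:
    """Pad a (batch, seq_len) tensor up to the smallest bucket >= seq_len."""
    seq_len = tokens.size(1)
    target = next((b for b in BUCKETS if b >= seq_len), BUCKETS[-1])
    if seq_len >= target:
        return tokens[:, :target]  # truncate if longer than the largest bucket
    padding = tokens.new_full((tokens.size(0), target - seq_len), pad_idx)
    return torch.cat([tokens, padding], dim=1)
```

Fewer buckets mean fewer compilations but more wasted padding, so the bucket list is a throughput trade-off.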
Cool, I will try to merge with upstream and see if it's reasonable. What are the different branches?
I would use the tpu-r0.5 branch. Big thanks for taking this on! Feel free to be in touch if you have questions or comments.
Update: we got NMT running in the latest Fairseq master. It was basically a matter of resolving some simple merge conflicts. However, it is indeed about 25-50% slower than using the tpu-r0.5 branch, so the problem persists. I will continue looking into this.
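One way to narrow down a regression like this (not necessarily what was done here) is to compare torch_xla's debug metrics between the two branches: a jump in compile counts or in aten::* counters (ops falling back to CPU) between otherwise identical runs usually points at the offending change. A minimal sketch, assuming the torch_xla debug metrics API:

```python
# Inspect what the XLA runtime is doing during training. Run the same number
# of steps on both branches and diff these reports: a large increase in
# compile-related metrics or in aten::* counters (CPU fallbacks) is a common
# cause of TPU slowdowns.
import torch_xla.debug.metrics as met

def dump_xla_metrics(step: int, every: int = 100) -> None:
    if step % every == 0:
        print(f"--- XLA metrics at step {step} ---")
        print(met.metrics_report())
```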
Thanks Eric!
@taylanbil do you have any idea roughly when the commit that caused the slowdown landed?
FYI @Eric-Wallace: pytorch-tpu#19 enables RoBERTa on the tpu branch.
Summary: Gate psutil import to make tests pass
Pull Request resolved: fairinternal/fairseq-py#1282
Reviewed By: tangyuq
Differential Revision: D23822037
Pulled By: myleott
fbshipit-source-id: c652c7931147ecd377d78322840e343c55cb85a2
(More of a discussion than an issue)
Hi, I am investigating training Fairseq models on TPUs. I followed the tutorial at https://cloud.google.com/tpu/docs/tutorials/transformer-pytorch, which worked nicely for training an NMT model. The tutorial uses a development branch of fairseq with TPU support, developed by @taylanbil: https://github.com/pytorch-tpu/fairseq.
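For context, the fork builds on the standard PyTorch/XLA training pattern: place the model on an XLA device, feed batches through a ParallelLoader, and step the optimizer via xm.optimizer_step, with xmp.spawn launching one process per TPU core. The sketch below illustrates that generic pattern only; the tiny model and random dataset are stand-ins, not fairseq code.

```python
# Generic PyTorch/XLA training loop (illustrative; not the fairseq entry point).
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl
import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(index):
    device = xm.xla_device()                      # this process's TPU core
    model = nn.Linear(512, 512).to(device)        # stand-in for a real model
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
    # Stand-in dataset; fixed shapes matter on TPU to avoid recompilation.
    data = torch.utils.data.TensorDataset(torch.randn(1024, 512))
    loader = torch.utils.data.DataLoader(data, batch_size=32)
    device_loader = pl.ParallelLoader(loader, [device]).per_device_loader(device)
    for (batch,) in device_loader:
        optimizer.zero_grad()
        loss = model(batch).pow(2).mean()
        loss.backward()
        xm.optimizer_step(optimizer)              # all-reduce grads + step

if __name__ == "__main__":
    xmp.spawn(_mp_fn, nprocs=8)                   # one process per TPU core
```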
Currently, I am wondering what the plan is for merging this development branch into fairseq master. In particular, my use case is training models based on RoBERTa on Cloud TPUs. The TPU branch is fairly out of date with fairseq master, so recent features are not present. It is also unclear which parts of fairseq (e.g., models, losses, tasks) are supported under pytorch-tpu/fairseq.