[feature] Embedding weight tying (#169) #172
Codecov Report
@@            Coverage Diff             @@
##             main     #172      +/-   ##
==========================================
+ Coverage   90.56%   90.58%   +0.01%
==========================================
  Files          56       56
  Lines        2829     2835       +6
==========================================
+ Hits         2562     2568       +6
  Misses        267      267
Looks good! Huge thanks, @blefaudeux. I can pull in the changes to my project using xformers and try it out.
Looks like there's a tiny bit of performance improvement (on my silly CPU machine):
➜ python train.py tie
Epoch 1 step: 1 Loss: 9.666109 Took 5.800530 seconds. bsz (toks): 2438
Epoch 1 step: 2 Loss: 8.875856 Took 15.910649 seconds. bsz (toks): 3595
Epoch 1 step: 3 Loss: 7.488206 Took 15.342067 seconds. bsz (toks): 3866
^C
...
➜ python train.py
Epoch 1 step: 1 Loss: 9.688322 Took 5.884865 seconds. bsz (toks): 2438
Epoch 1 step: 2 Loss: 8.820903 Took 16.065957 seconds. bsz (toks): 3595
Epoch 1 step: 3 Loss: 7.255448 Took 15.518760 seconds. bsz (toks): 3866
I think this is generally the right direction. I can also take a look at comparisons of memory util between them, too. That said, printing out the number of trainable parameters shows that this seems to work well:
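As a rough illustration of what a trainable-parameter check like this would be expected to show, here is the arithmetic for the two weight matrices involved. It is only a sketch: the function name `embedding_param_counts`, the vocabulary size, and the model dimension are hypothetical, it assumes a bias-free output projection, and it is neither the original `train.py` nor its output.

```python
def embedding_param_counts(vocab_size: int, dim: int) -> dict:
    """Weight counts for an input embedding plus an output projection.

    Hypothetical helper for illustration; assumes the projection has no bias.
    """
    embedding = vocab_size * dim   # input embedding table: (vocab_size, dim)
    projection = vocab_size * dim  # output projection weight: (vocab_size, dim)
    untied = embedding + projection
    tied = embedding               # the projection reuses the embedding matrix
    return {"untied": untied, "tied": tied, "saved": untied - tied}


# Hypothetical sizes, chosen only to make the difference visible.
counts = embedding_param_counts(vocab_size=32_000, dim=512)
for name, value in counts.items():
    print(f"{name:>6}: {value:,}")
```

With tying, the count for these two layers should drop by exactly one `vocab_size * dim` matrix; the rest of the model is unaffected.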
Would you have a small enough task in mind? It could be added to the examples and can be useful for sanity checking and catching perf regressions.
This example is somewhat involved (machine translation), but I could probably make something smaller. If that's of interest, I'm happy to try to contribute something!
Only if it's not too much work! There are two examples here if that helps. Also, you'll really need a GPU at some point :D
LGTM!
Option to pass in in/out projections
What does this PR do?
Tentative implementation of #169, fairly minor, with a matching unit test update.
cc @erip
See for a reference and more context
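For readers unfamiliar with the technique, embedding weight tying can be sketched in plain PyTorch as below. This is a minimal illustration under stated assumptions, not the implementation in this PR and not the xformers API: `TinyTiedLM`, `tie_weights`, and `count_trainable` are hypothetical names, and the toy model has no attention layers at all.

```python
import torch
import torch.nn as nn


class TinyTiedLM(nn.Module):
    """Hypothetical toy model: embedding, then a projection back to the vocabulary."""

    def __init__(self, vocab_size: int, dim: int, tie_weights: bool = True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)               # weight: (vocab_size, dim)
        self.out_proj = nn.Linear(dim, vocab_size, bias=False)   # weight: (vocab_size, dim)
        if tie_weights:
            # Both modules now hold the same Parameter object.
            self.out_proj.weight = self.embed.weight

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.out_proj(self.embed(tokens))


def count_trainable(model: nn.Module) -> int:
    # nn.Module.parameters() yields a shared Parameter only once.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


tied = TinyTiedLM(vocab_size=100, dim=16, tie_weights=True)
untied = TinyTiedLM(vocab_size=100, dim=16, tie_weights=False)

logits = tied(torch.randint(0, 100, (2, 5)))  # shape: (batch, seq, vocab_size)
print(count_trainable(tied), count_trainable(untied))
```

The tying works without a transpose because `nn.Linear(dim, vocab_size)` stores its weight as `(vocab_size, dim)`, the same shape as the embedding table.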
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.