[models] Vit: fix intermediate size scale and unify TF to PT #1063
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main    #1063      +/-   ##
==========================================
+ Coverage   95.16%   95.17%   +0.01%
==========================================
  Files         141      141
  Lines        5827     5821       -6
==========================================
+ Hits         5545     5540       -5
- Misses        282      281       -1
Great work Felix 👏
One comment related to the ViT PRs, and another on a docstring typo!
Regarding TF: from the graph, it looks like the patch embedding is not memory-efficient (it's the only structural difference).
Thanks @felixdittrich92! 👍
Thanks Felix 🙏
This PR:
Any feedback is welcome 🤗
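On the intermediate size scale from the title: ViT feed-forward (MLP) blocks conventionally derive the intermediate size from the embedding dimension via a fixed ratio, rather than hard-coding it. A minimal sketch of the convention (values are the standard ViT-B settings, not necessarily doctr's exact configuration):

```python
# Standard ViT convention: the MLP intermediate (feed-forward) size is a
# fixed multiple of the embedding dimension, not an independent constant.
embed_dim = 768   # ViT-B embedding dimension
mlp_ratio = 4.0   # conventional scale factor
intermediate_size = int(embed_dim * mlp_ratio)
print(intermediate_size)  # 3072
```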
PT (thanks @frgfm for torch-scan 👍):
TF:
As you can see, the models are similar; only the patch embedding differs (PT uses a linear projection, TF uses a Conv2D projection).
Comparing the PT model with timm's implementation: ours uses ~6.5 GB VRAM, timm's ~7 GB VRAM.
The TF model uses ~15 GB VRAM. @frgfm, do you know of any reason why? 😅
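The patch-embedding difference mentioned above can be sketched as follows. This is a hypothetical illustration in PyTorch (identifiers are mine, not doctr's): a linear projection over flattened patches and a strided Conv2D with kernel size equal to the stride compute the same result when their weights are tied.

```python
import torch
import torch.nn as nn

# Hypothetical sketch (not the actual doctr code): the two patch-embedding
# styles are mathematically equivalent when their weights are tied.
B, C, H, W, P, D = 1, 3, 32, 32, 4, 64  # batch, channels, height, width, patch size, embed dim

x = torch.randn(B, C, H, W)

# PT style: cut the image into patches, flatten them, apply a linear projection
linear = nn.Linear(C * P * P, D)
patches = (
    x.unfold(2, P, P)             # (B, C, H/P, W, P)
     .unfold(3, P, P)             # (B, C, H/P, W/P, P, P)
     .permute(0, 2, 3, 1, 4, 5)   # (B, H/P, W/P, C, P, P)
     .reshape(B, -1, C * P * P)   # (B, num_patches, C*P*P)
)
out_linear = linear(patches)

# TF style (written in torch for comparison): strided Conv2D, kernel = stride = patch size
conv = nn.Conv2d(C, D, kernel_size=P, stride=P)
with torch.no_grad():  # tie the weights so both paths compute the same projection
    conv.weight.copy_(linear.weight.reshape(D, C, P, P))
    conv.bias.copy_(linear.bias)
out_conv = conv(x).flatten(2).transpose(1, 2)  # (B, num_patches, D)

print(out_linear.shape, torch.allclose(out_linear, out_conv, atol=1e-4))
```

Since both formulations are equivalent, any memory gap between them would come from how the framework materializes the intermediate tensors, not from the math itself.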
Additionally, timm's implementation:
With this PR (TF is mostly identical):