Adding nnx Gemma2-2b (including overall fixes) to examples/gemma #4587
base: main
Conversation
Hey Martin! Thanks for doing this.
Ahhh - brings back memories of PRs for TensorFlow. Good times! /s I'll attempt to fix the white-space check failures when you let me know that it isn't a waste of my time.
@@ -128,30 +128,31 @@ def gemma_7b(cls):
    )

  @classmethod
  def gemma_27b(cls):
    num_layers = 46
  def gemma2_2b(cls):
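(For context, here is a rough sketch of what a per-model classmethod constructor on a config dataclass can look like. The field names and hyper-parameter values below are illustrative assumptions and not necessarily the exact `TransformerConfig` fields in examples/gemma; the existing `gemma_27b` constructor would sit alongside such a method.)

```python
# Illustrative sketch only: field names and values are assumptions, not the
# actual TransformerConfig in flax's examples/gemma.
import dataclasses


@dataclasses.dataclass(frozen=True)
class TransformerConfig:
    num_layers: int
    num_embed: int     # vocabulary size
    embed_dim: int     # model width
    hidden_dim: int    # feed-forward width
    num_heads: int
    head_dim: int
    num_kv_heads: int  # fewer KV heads than query heads enables GQA

    @classmethod
    def gemma2_2b(cls):
        # Commonly cited Gemma2-2b hyper-parameters (illustrative values).
        return cls(
            num_layers=26,
            num_embed=256_128,
            embed_dim=2304,
            hidden_dim=9216,
            num_heads=8,
            head_dim=256,
            num_kv_heads=4,
        )
```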
Why don't we add a new method for the `gemma2_2b` configuration instead of removing the `gemma_27b` one?
They are both in the file - it's just that git didn't pick up the diff properly.
But also, it makes sense to name the different 'generations' (gemma, gemma2, gemma3, etc.) as separate classes, rather than relying on implicit knowledge about which size came from where.
Moreover, the gemma2 27b didn't have the right normalisation in the attention - that will need a separate fix.
I also see that the Google-generated PR adopted the same fix as mine to the Block module. Good for you!
Thanks @mdda for doing this; it took a while because there were other pending changes to the model in the background. I think this is great. Can you please add a test to check that the GQA configuration works and matches the base version? You'll also need to resolve the conflicts now, sorry about that; this code is being used by a couple of users internally.
Surprised that the code was being used internally prior to my PR, since the Block module was entirely borked.
@mdda please take a look at CI; you probably need to run:
pip install pre-commit
pre-commit run --all-files
Adds Gemma2-2b (including GQA to Attention and fixes to Block)
Fixes #4567
It includes:
- `TransformerConfig` for the gemma2-2b model
- `gemma` (flax linen) model
- `nnx` documentation work (did not work before)
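(Since the headline change is adding GQA to the Attention module, and a GQA-consistency test was requested above, here is a rough, self-contained sketch of the grouped-query trick and of such a test. Function names, shapes, and the test structure are illustrative assumptions and do not mirror the examples/gemma API.)

```python
# Sketch: grouped-query attention (GQA) shares each KV head across several
# query heads; repeating the KV heads should reproduce plain multi-head
# attention. Names below are illustrative, not the examples/gemma API.
import jax
import jax.numpy as jnp
import numpy as np


def gqa_attention(q, k, v):
    """q: [T, H, D]; k, v: [T, K, D] with K dividing H."""
    num_heads, num_kv_heads = q.shape[1], k.shape[1]
    rep = num_heads // num_kv_heads
    k = jnp.repeat(k, rep, axis=1)  # share each KV head across `rep` query heads
    v = jnp.repeat(v, rep, axis=1)
    logits = jnp.einsum('thd,shd->hts', q, k) / jnp.sqrt(q.shape[-1])
    probs = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum('hts,shd->thd', probs, v)


def test_gqa_matches_mha_on_repeated_kv_heads():
    rng = np.random.default_rng(0)
    seq, heads, kv_heads, dim = 5, 8, 4, 16
    q = jnp.asarray(rng.normal(size=(seq, heads, dim)))
    k = jnp.asarray(rng.normal(size=(seq, kv_heads, dim)))
    v = jnp.asarray(rng.normal(size=(seq, kv_heads, dim)))
    # Baseline: plain MHA over explicitly repeated KV heads (rep == 1 inside).
    k_full = jnp.repeat(k, heads // kv_heads, axis=1)
    v_full = jnp.repeat(v, heads // kv_heads, axis=1)
    expected = gqa_attention(q, k_full, v_full)
    np.testing.assert_allclose(np.asarray(gqa_attention(q, k, v)),
                               np.asarray(expected), rtol=1e-5, atol=1e-5)
```

A test in this spirit, written against the actual Attention module, would set `num_kv_heads < num_heads`, tile the KV projection weights, and check the outputs against the ungrouped baseline.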