IA3 adaptors #403
Conversation
Only minor comments.
:param config:
    A :class:`~tango.integrations.transformers.ia3.WithIA3Config` that specifies the layers to modify.
Is there any chance we could automatically detect the right config, at least in some cases?
So we could make it look up the known configs in MODEL_NAME_TO_CONFIG in tango/integrations/transformers/ia3.py.
But if you mean trying to figure out a config from scratch just by looking at the model architecture, that might be pretty difficult. Even among the few models we support right now, they all use very different names for the layers we need to modify. And there's enough variation in how the modules can be nested that we can't find the layers we need just by looking for a Linear layer at a certain position in the model graph.
No, I mean: given the model, can we just find the model's name and do the lookup that way?
I checked: we can look up transformer_model.config.name_or_path. If it matches anything in the dictionary of configs, we can use it automatically.
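A rough sketch of the lookup being discussed, assuming MODEL_NAME_TO_CONFIG maps Hugging Face model names to WithIA3Config instances as described in this thread; the helper name infer_ia3_config is hypothetical:

from tango.integrations.transformers.ia3 import MODEL_NAME_TO_CONFIG  # assumed location, per this thread

def infer_ia3_config(transformer_model):
    # Hypothetical helper: look up a default IA3 config by the model's name.
    model_name = transformer_model.config.name_or_path
    if model_name in MODEL_NAME_TO_CONFIG:
        return MODEL_NAME_TO_CONFIG[model_name]
    raise ValueError(
        f"No default IA3 config known for {model_name!r}; please pass a config explicitly."
    )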
Yup working on a commit to do that! c1e27f1
Okay it's working now and I also updated the Catwalk end to use this: allenai/catwalk@9800e12
input_seq = tokenizer(["A tiny test on a tiny model."], return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(model_name)
I'm surprised this test works, since you're not setting the models into eval() mode.
Oops good catch!
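A minimal sketch of the fix being discussed, reusing the names from the quoted test snippet; the tiny model name here is an assumption, not necessarily what the PR's test actually uses:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sshleifer/tiny-gpt2"  # assumed placeholder for the test's tiny model
tokenizer = AutoTokenizer.from_pretrained(model_name)
input_seq = tokenizer(["A tiny test on a tiny model."], return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()  # disable dropout so repeated forward passes give deterministic outputs

with torch.no_grad():
    output = model(**input_seq)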
Co-authored-by: Dirk Groeneveld <dirkg@allenai.org>
Changes proposed in this pull request:
Results on piqa
A related PR in catwalk implements an example of how these adaptors can be trained. While the results are hardly impressive, the IA3 implementation manages to reduce validation loss and recover much of the accuracy of the fully tuned equivalent for all of the architectures for which default configurations are provided. The gpt-j-6b full tune is not able to run on a single GPU, while the IA3 training fits because it has far fewer trainable parameters and therefore far fewer optimizer states.
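To illustrate the memory argument above, here is a hedged sketch (not the PR's actual training code) of handing only the adaptor parameters to the optimizer, so optimizer states are only allocated for the small set of trainable IA3 weights; identifying adaptor parameters by an "ia3" substring in their names is an assumption made for this example:

import torch

# Assumes `model` has already been modified with IA3 adaptors and that the
# adaptor parameters have "ia3" in their names -- both assumptions for illustration.
for name, param in model.named_parameters():
    param.requires_grad = "ia3" in name  # freeze everything except the adaptors

trainable = [p for p in model.parameters() if p.requires_grad]
print(f"trainable parameters: {sum(p.numel() for p in trainable):,}")

# AdamW keeps two moment tensors per trainable parameter, so restricting the
# optimizer to the adaptor weights keeps its state small.
optimizer = torch.optim.AdamW(trainable, lr=3e-4)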