
IA3 adaptors #403

Merged
merged 16 commits into from
Sep 19, 2022

Conversation

IanMagnusson
Collaborator

@IanMagnusson IanMagnusson commented Sep 14, 2022

Changes proposed in this pull request:

  • Adds a function to modify a Hugging Face transformer with IA3 adaptors

Results on piqa

A related PR in catwalk implements an example of how these adaptors can be trained. While the results are hardly impressive, the IA3 implementation reduces validation loss and recovers much of the accuracy of the fully tuned equivalent for every architecture that has a default configuration. The gpt-j-6b full tune cannot run on a single GPU, while IA3 training fits because its far fewer trainable parameters require far fewer optimizer states.
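To make the mechanism concrete, here is a minimal sketch of the (IA)^3 idea: freeze a transformer's existing linear layers and learn only a per-feature scaling vector on their outputs. The class name `IA3Linear` and the wrapping strategy are illustrative assumptions, not the PR's actual API.

```python
import torch
import torch.nn as nn

class IA3Linear(nn.Module):
    """Wraps a frozen Linear layer and rescales its output with a learned vector.

    Illustrative sketch only; the real integration rewrites specific
    attention/FFN layers selected by a config.
    """
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad = False  # freeze the original weights
        # One learned scale per output feature, initialized to 1 (identity).
        self.ia3_scale = nn.Parameter(torch.ones(linear.out_features))

    def forward(self, x):
        return self.linear(x) * self.ia3_scale

layer = IA3Linear(nn.Linear(8, 4))
out = layer(torch.randn(2, 8))
# Only the 4 scale parameters remain trainable.
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
```

Because only the scaling vectors receive gradients, the optimizer holds state for a tiny fraction of the model's parameters, which is why the 6B-parameter training fits on a single GPU.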

(Screenshot: piqa results table, 2022-09-13)

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

@IanMagnusson IanMagnusson marked this pull request as ready for review September 14, 2022 02:07
Contributor

@AkshitaB AkshitaB left a comment


Only minor comments.

@IanMagnusson IanMagnusson enabled auto-merge (squash) September 15, 2022 16:05
Comment on lines +206 to +207
:param config:
A :class:`~tango.integrations.transformers.ia3.WithIA3Config` that specifies the layers to modify.
Member


Is there any chance we could automatically detect the right config, at least in some cases?

Collaborator Author


So we could make it look up the known configs in MODEL_NAME_TO_CONFIG in tango/integrations/transformers/ia3.py.

But if you mean trying to figure out a config from scratch just by inspecting the model architecture, that might be pretty difficult. Even among the few architectures we support right now, the layers we need have very different names. And there's enough variation in how the modules can be nested that we can't find the target layers just by looking for a Linear layer at a certain position in the model graph.

Member


No, I mean, can we just find the name of the model given the model, and do the lookup that way?

Member


I checked: We can look up transformer_model.config.name_or_path. If it matches anything in the dictionary of configs, we can use it automatically.
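The lookup described here can be sketched in a few lines. The dictionary entries and the helper name `auto_select_config` below are placeholders for illustration; the real `MODEL_NAME_TO_CONFIG` lives in tango/integrations/transformers/ia3.py.

```python
# Hedged sketch of auto-detecting an IA3 config from a model's pretrained
# name. The mapping contents here are placeholders, not the module's
# actual entries.
MODEL_NAME_TO_CONFIG = {
    "gpt2": "gpt_config",
    "EleutherAI/gpt-j-6B": "gptj_config",
}

def auto_select_config(transformer_model):
    """Return a known IA3 config keyed by the model's pretrained name."""
    name = getattr(transformer_model.config, "name_or_path", None)
    if name in MODEL_NAME_TO_CONFIG:
        return MODEL_NAME_TO_CONFIG[name]
    raise ValueError(
        f"No default IA3 config for '{name}'; please supply one explicitly."
    )
```

Falling back to an explicit error keeps the behavior predictable for models the integration has never seen.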

Collaborator Author

@IanMagnusson IanMagnusson Sep 17, 2022


Yup working on a commit to do that! c1e27f1

Collaborator Author


Okay it's working now and I also updated the Catwalk end to use this: allenai/catwalk@9800e12


input_seq = tokenizer(["A tiny test on a tiny model."], return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(model_name)
Member


I'm surprised this test works, since you're not setting the models into eval() mode.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops good catch!
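The fix flagged in this thread is a one-line change: call `.eval()` before comparing model outputs, so train-time layers like dropout don't make forward passes nondeterministic. A minimal sketch (using a toy model rather than the PR's actual test fixtures):

```python
import torch
import torch.nn as nn

# A toy model with dropout, standing in for the transformer under test.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.randn(1, 4)

model.eval()  # the fix: disables dropout, making forward passes deterministic
with torch.no_grad():
    out1 = model(x)
    out2 = model(x)
# In eval mode the two forward passes produce identical outputs, so
# comparing a modified model against a baseline is meaningful.
```

Without `.eval()`, dropout would randomly zero activations on each pass and the comparison could pass or fail by chance.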

IanMagnusson and others added 2 commits September 16, 2022 18:14
Co-authored-by: Dirk Groeneveld <dirkg@allenai.org>
@IanMagnusson IanMagnusson merged commit 7382019 into main Sep 19, 2022
@IanMagnusson IanMagnusson deleted the ia3-adaptors branch September 19, 2022 19:09