
4D mask documentation updates #28151

Closed · wants to merge 4 commits
Conversation

@poedator (Contributor) commented Dec 19, 2023

Following #27539, this PR updates the transformers documentation to reflect the possibility of passing 4D attention masks (see the sketch after the plan below).

Plan:

  • add updates for the Llama model docstring(s)
  • identify other models that can use 4D masks in their present form (which requires the ability to accept a custom position_ids argument) and update their docstrings. Classes that need updates:
    • FalconModel
    • [TODO identify more]
  • update code comments that may need corrections, e.g. cases where the mask may now be either 2D or 4D; one example is based on this comment by @shentianxiao
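For illustration, here is a minimal sketch of the use case these docstrings describe: packing two independent sequences into one batch row with a block-diagonal causal 4D mask and per-sequence position_ids. The checkpoint name is just an example, and the exact 4D mask value convention (0/1 vs. additive float) may vary across transformers versions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # example checkpoint; any Llama model works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Pack two independent sequences into a single batch row.
seq_a = tokenizer("The capital of France is", return_tensors="pt").input_ids[0]
seq_b = tokenizer("Two plus two equals", return_tensors="pt").input_ids[0]
input_ids = torch.cat([seq_a, seq_b]).unsqueeze(0)  # (1, total_len)
total_len = input_ids.shape[1]
len_a = seq_a.shape[0]

# Block-diagonal causal mask, shape (batch_size, 1, query_length, key_value_length):
# each token attends only to earlier tokens of its own packed sequence.
mask_4d = torch.zeros(1, 1, total_len, total_len)
for start, end in [(0, len_a), (len_a, total_len)]:
    mask_4d[0, 0, start:end, start:end] = torch.tril(torch.ones(end - start, end - start))

# position_ids restart at 0 for each packed sequence.
position_ids = torch.cat(
    [torch.arange(len_a), torch.arange(total_len - len_a)]
).unsqueeze(0)

with torch.no_grad():
    out = model(input_ids=input_ids, attention_mask=mask_4d, position_ids=position_ids)
```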

Update 20.12.2023:
To find out which models require docstring changes, I scanned all model classes in transformers using inspect (a rough sketch of this scan follows below).

  • excluded tf and jax classes
  • excluded models without a position_ids argument in .forward() - these can't use a 4D mask effectively
  • excluded models that do not use the _prepare_4d_attention_mask method - these need a different code change to use 4D masks
  • excluded multi-modal models (clip, clvp, vit, bark, git)

What is left is LlamaModel, FalconModel, and XGLMModel.
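The PR doesn't include the exact scan script; a rough reconstruction of the filters described above might look like the following (the multimodal exclusions were presumably applied by hand):

```python
import inspect
import transformers

candidates = []
for name in dir(transformers):
    if name.startswith(("TF", "Flax")):  # exclude tf and jax classes
        continue
    try:
        obj = getattr(transformers, name)  # lazy import may fail for optional deps
    except Exception:
        continue
    if not (inspect.isclass(obj) and name.endswith("Model") and hasattr(obj, "forward")):
        continue
    try:
        params = inspect.signature(obj.forward).parameters
        module_source = inspect.getsource(inspect.getmodule(obj))
    except (TypeError, OSError):
        continue
    # without a custom position_ids argument, a 4D mask can't be used effectively
    if "position_ids" not in params:
        continue
    # models not routing masks through _prepare_4d_attention_mask need other changes
    if "_prepare_4d_attention_mask" not in module_source:
        continue
    candidates.append(name)

print(candidates)  # multimodal models (clip, clvp, vit, bark, git) still filtered by hand
```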

cc @ArthurZucker

@poedator poedator marked this pull request as draft December 19, 2023 21:46
@ArthurZucker (Collaborator) commented
Feel free to ping me for a review whenever this is ready 🤗

@poedator (Contributor, Author) commented

> Feel free to ping me for a review whenever this is ready 🤗

@ArthurZucker, I identified only 3 applicable model classes and made the changes. Please check my class-selection logic in my big first message above.

@poedator poedator marked this pull request as ready for review December 20, 2023 13:14
@ArthurZucker (Collaborator) left a comment


#28132 might be relevant to you! I'll make sure 4D masks can still be used, but I think it will make things a lot easier.
LGTM otherwise!

@huggingface huggingface deleted a comment from github-actions bot Jan 19, 2024
github-actions (bot) commented
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Feb 21, 2024
@alex-hh commented Aug 29, 2024

Has something like this, or an equivalent, been merged? I don't see much documentation for 4D masks but would find it useful!

@ArthurZucker (Collaborator) commented
Yep, sorry, it's supported for some models.

> A 2D attention mask of shape `(batch_size, key_value_length)` or a 4D attention mask of shape `(batch_size, 1, query_length, key_value_length)`.

The doc is a bit scarce, feel free to open a PR to add this!
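To make the quoted shapes concrete, a small sketch (using the 1 = attend / 0 = masked convention of 2D masks; the exact 4D value semantics may depend on your transformers version):

```python
import torch

batch_size, query_length, key_value_length = 2, 5, 5

# 2D mask, shape (batch_size, key_value_length): per-token padding info,
# broadcast internally over all heads and query positions.
mask_2d = torch.ones(batch_size, key_value_length, dtype=torch.long)

# 4D mask, shape (batch_size, 1, query_length, key_value_length):
# full control over which query position sees which key, e.g. a hand-built causal pattern.
mask_4d = torch.tril(torch.ones(query_length, key_value_length))
mask_4d = mask_4d.expand(batch_size, 1, query_length, key_value_length)
```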
