
DETR: changing position_embedding and key_value_position_embedding args #23091

Closed
wants to merge 607 commits into from
Conversation

Lorenzobattistela
Contributor

What does this PR do?

This PR refers to #19833 and just updates some variable and docstring names. Quoting the issue: the paper mentions that the position_embeddings argument of the cross-attention layer corresponds to the input embeddings called object queries, and that key_value_position_embeddings is referred to as spatial_position_embeddings.

This PR is limited to the DETR model.
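
In short, the change boils down to two keyword renames in DETR's attention layers. The sketch below is illustrative only and is not code from this PR: the rename_detr_attention_kwargs helper is hypothetical and simply spells out the old-to-new mapping.

    # Hypothetical shim, for illustration only: maps the old keyword names to the new ones.
    OLD_TO_NEW = {
        "position_embeddings": "object_queries",
        "key_value_position_embeddings": "spatial_position_embeddings",
    }

    def rename_detr_attention_kwargs(**kwargs):
        """Return the same kwargs with the old argument names replaced by the new ones."""
        return {OLD_TO_NEW.get(name, name): value for name, value in kwargs.items()}

    # A call written with the old names maps onto a call with the new names.
    new_kwargs = rename_detr_attention_kwargs(
        position_embeddings=...,            # the object queries
        key_value_position_embeddings=...,  # the spatial position embeddings
    )
    assert set(new_kwargs) == {"object_queries", "spatial_position_embeddings"}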

Notes

This is my first contribution, so I'm happy to adjust anything in this PR. I ran all tests and style checks, and everything passed except for one: make fixup, which gave the following output:

(screenshot of the make fixup output)

Reading the output, I assume it is about other files that use classes from modeling_detr. I'll wait for updates, and also for review regarding the doc updates or further guidance.

Fixes # (issue)

Before submitting

Who can review?

@NielsRogge
@amyeroberts

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@A-guridi

Hi,

I don't know if this issue is still open.

I believe you need to change the names in the files mentioned by fixup too, since the Conditional DETR paper uses the same nomenclature (although object queries are sometimes also called content queries).

For example, in modeling_conditional_detr.py the forward function arguments are still named position_embeddings, so you would need to change them to object_queries for consistency.

The same applies to the other file mentioned by fixup. I am also new to contributing to this repo, so I'll leave the decision to the reviewers, but I believe it makes sense to apply the changes if you're up for it, @Lorenzobattistela. If not, another issue could be opened for that.

@Lorenzobattistela
Contributor Author

@A-guridi Hey, I understand these names could be changed for consistency, and I'm up for doing it. I'm just not sure it's the right move here, since the issue is specific to DETR. I'll try what you suggested; let's wait for the reviewers.

@amyeroberts
Collaborator

@NielsRogge Could you review and confirm if it aligns with your suggestion in #19833?

  key_value_states=encoder_hidden_states,
  attention_mask=encoder_attention_mask,
- key_value_position_embeddings=position_embeddings,
+ spatial_position_embeddings=object_queries,
Contributor

Hmm, this is confusing. We'd like to make the distinction between object queries (which are added to the inputs of each attention layer of the decoder) and spatial position embeddings (which are added to the keys and values of each cross-attention layer of the decoder).

Contributor Author

Yeah, it does seem a little confusing: I was assigning object_queries to an argument other than the object_queries arg, so I mixed things up. It should be:

self.encoder_attn(
                hidden_states=hidden_states,
                object_queries=object_queries,  
                key_value_states=encoder_hidden_states,
                attention_mask=encoder_attention_mask,
                spatial_position_embeddings=query_position_embeddings,
                output_attentions=output_attentions,
            )

@NielsRogge
Contributor

Thanks for working on this, left some comments.

Specifically, DETR's decoder uses 2 types of position embeddings:

  • the ones that are added to the inputs i.e. hidden states of each cross-attention layer (the object_queries)
  • the ones that are added to the keys and values of each cross-attention layer (the spatial_position_embeddings)
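
To make this concrete, here is a minimal, self-contained sketch of where each kind of embedding enters a cross-attention step. It assumes the usual DETR formulation in which position information is added to queries and keys before projection; SimpleCrossAttention is purely illustrative and is not the actual DetrAttention implementation.

    import torch
    from torch import nn

    class SimpleCrossAttention(nn.Module):
        def __init__(self, d_model: int, num_heads: int):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

        def forward(self, hidden_states, encoder_hidden_states,
                    object_queries, spatial_position_embeddings):
            # object queries are added to the decoder inputs, i.e. the queries
            q = hidden_states + object_queries
            # spatial position embeddings are added to the encoder features, i.e. the keys
            k = encoder_hidden_states + spatial_position_embeddings
            # values are taken from the raw encoder features
            v = encoder_hidden_states
            out, _ = self.attn(q, k, v)
            return out

    # Toy shapes: 100 object queries attending over 49 flattened image features.
    layer = SimpleCrossAttention(d_model=256, num_heads=8)
    out = layer(
        hidden_states=torch.zeros(1, 100, 256),
        encoder_hidden_states=torch.randn(1, 49, 256),
        object_queries=torch.randn(1, 100, 256),
        spatial_position_embeddings=torch.randn(1, 49, 256),
    )
    print(out.shape)  # torch.Size([1, 100, 256])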

@Lorenzobattistela
Contributor Author

> Thanks for working on this, left some comments.
>
> Specifically, DETR's decoder uses 2 types of position embeddings:
>
>   • the ones that are added to the inputs i.e. hidden states of each cross-attention layer (the object_queries)
>   • the ones that are added to the keys and values of each cross-attention layer (the spatial_position_embeddings)

working on it

pacman100 and others added 19 commits July 4, 2023 13:42
* fix mixed precision prep during eval only mode

* update to address comments

* update to reflect the changes in accelerate
* Min accelerate

* Also min version

* Min accelerate

* Also min version

* To different minor version

* Empty
* Add AzureOpenAiAgent

* quality

* Update src/transformers/tools/agents.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
…24103)

* v1

* some refactor

- add ST format as well

* fix

* add `ADAPTER_WEIGHTS_NAME` & `ADAPTER_SAFE_WEIGHTS_NAME`
* fix get_keys_to_not_convert funct

* Fix style
…all child classes of `GPT2PreTrainedModel` (#24113)

* add correct keys on `_keys_to_ignore_on_load_unexpected`

* oops
* add trust_remote_code option

* require_torch
* Fix typo in Llama docstrings

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Update

Signed-off-by: Serge Panev <spanev@nvidia.com>

* make style

Signed-off-by: Serge Panev <spanev@nvidia.com>

---------

Signed-off-by: Serge Panev <spanev@nvidia.com>
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
…kens [core] (#24042)

* preventllama fast from returning token type ids

* remove type hints

* normalised False
* fix bnb config json serialization

* forward contrib credits from discussions

---------

Co-authored-by: Andrechang <Andrechang@users.noreply.github.com>
* fix the deepspeed test failures

* apex fix

* FSDP save ckpt fix

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
amyeroberts and others added 26 commits July 4, 2023 13:42
…#24570)

* Removal of deprecated methods and specify versions

* Fix tests
* Fix ESM models buffers

* Remove modifs

* Tied weights keys are needed silly

* quality
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations

* fix annotations
* fix push to hub for peft ckpts

* oops
* don't add space before single letter chars that don't have a merge

* fix the fix

* fixup

* add a test

* more testing

* fixup

* hack to make sure fast is also fixed

* update switch transformers test

* revert convert slow

* Update src/transformers/models/t5/tokenization_t5.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add typechecking

* quality

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Udate link to RunHouse hardware setup documentation.

* Fix link to hardware setup in other location as well
…one (#24510)

* Adding warning messages to BERT for missing attention masks

These warning messages when there are pad tokens within the input ids and
no attention masks are given. The warning message should only show up once.

* Adding warning messages to BERT for missing attention masks

These warning messages are shown when the pad_token_id is not None
and no attention masks are given. The warning message should only
show up once.

* Ran fix copies to copy over the changes to some of the other models

* Add logger.warning_once.cache_clear() to the test

* Shows warning when there are no attention masks and input_ids start/end with pad tokens

* Using warning_once() instead and fix indexing in input_ids check

---------

Co-authored-by: JB Lau <hckyn@voyager2.local>
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* hidden layers, huh, what are they good for (absolutely nothing)

* Some tests break with 1 hidden layer, use 2

* Use 1 hidden layer in a few slow models

* Use num_hidden_layers=2 everywhere

* Slightly higher tol for groupvit

* Slightly higher tol for groupvit
* [modeling_clip.py] improve readability

* apply to other models

* fix
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Limit Pydantic to V1 in dependencies

Pydantic is about to release V2 release which will break a lot of things. This change prevents `transformers` to be used with Pydantic V2 to avoid breaking things.

* more

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* add tokenization template

* update conversion script

* update modeling code

* update

* update convert checkpoint

* update modeling

* revert changes on convert script

* new conversion script for new format

* correct position bias

* cleaning a bit

* Credit co authors

Co-authored-by: agemagician
<ahmed.elnaggar@tum.de>

Co-authored-by: stefan-it
<>

* styling

* Add docq

* fix copies

* add co author

* Other Author

* Merge branch 'main' of https://github.com/huggingface/transformers into add-umt5

* add testing

* nit

* Update docs/source/en/model_doc/umt5.mdx

Co-authored-by: Stefan Schweter <stefan@schweter.it>

* fix t5

* actual fix?

* revert wrong changes

* remove

* update test

* more fixes

* revert some changes

* add SPIECE_UNDERLINE

* add a commone xample

* upfate

* fix copies

* revert changes on t5 conversion script

* revert bytefallback changes since there was no addition yet

* fixup

* fixup

* ingore umt5 cutom testing folder

* fix readmes

* revertT5 changes

* same outputs

* fixup

* update example

* Apply suggestions from code review

* style

* draft addition of all new files

* current update

* fix attention and stuff

* finish refactoring

* auto config

* fixup

* more nits

* add umt5 to init

* use md format

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* revert changes on mt5

* revert mt4 changes

* update test

* more fixes

* add to mapping

* fix-copies

* fix copies

* foix retain grad

* fix some tests

* nits

* done

* Update src/transformers/models/umt5/modeling_umt5.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/umt5.md

* Update src/transformers/models/umt5/__init__.py

* Update docs/source/en/model_doc/umt5.md

Co-authored-by: Stefan Schweter <stefan@schweter.it>

* Update src/transformers/models/umt5/modeling_umt5.py

* update conversion script + use google checkpoints

* nits

* update test and modelling

* stash slow convert

* update fixupd

* don't change slow

---------

Co-authored-by: stefan-it <>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs: ko: `perplexity.mdx`

* translate comment

* reference english file

* change extension

* update toctree
* [Time-Series] Added blog-post to tips

* added Resources to time series models docs

* removed "with Bert"
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix loading dataset link

* Update examples/tensorflow/translation/run_translation.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Update examples/tensorflow/translation/run_translation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
…for XLNetTokenizerFast conversion. (#24618)

* precompiled_charsmap checking before adding to the normalizers' list.

* precompiled_charsmap checking for all Sentencepiece tokenizer models

* precompiled_charsmap checking for SPM tokenizer models - correct formatting
* Fix audio feature extractor deps

* use audio utils window over torch window
* open llama fp16 bug fix

* bug fix

* bug fixed

* make style

* Update modeling_llama.py

* apply formatting

* Address amy's comment

---------

Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: root <root@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
* Sort filenames alphabetically

* Add check for order
@Lorenzobattistela
Contributor Author

The git history got messed up; I'll open a new PR with just the correct changes.

@Lorenzobattistela
Contributor Author

Reopened PR #24652
