Fixing gradient in sincos positional encoding in monai/networks/blocks/patchembedding.py #7564
Merged
Conversation
…/blocks/patchembedding.py and associated tests
Signed-off-by: Lucas Robinet <robinet.lucas@iuct-oncopole.fr>
marksgraham approved these changes (Mar 22, 2024)
LGTM, I agree the sincos embeddings should be fixed; it doesn't make sense to have them learnable.
/build
Merged now, feel free to create another PR for #7564 (comment).
KumoLiu added a commit that referenced this pull request (Apr 4, 2024)
…#7605)

Fixes #7564.

### Description

As discussed, this is a small simplification of the creation of the sincos positional encoding: we don't need to use the `torch.no_grad()` context or copy the tensor with torch's `copy_`, which doesn't preserve the `requires_grad` attribute here. The changes are simple and are linked to the corresponding comment in #7564; the output is already in float32, so the conversion previously applied doesn't seem necessary.

### Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [x] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [x] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

Signed-off-by: Lucas Robinet <robinet.lucas@iuct-oncopole.fr>
Co-authored-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
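To illustrate the simplification described above, here is a hypothetical before/after sketch; `build_sincos_position_embedding_stub`, the shapes, and the surrounding code are illustrative stand-ins, not the exact MONAI source:

```python
import torch
import torch.nn as nn


def build_sincos_position_embedding_stub(n_tokens: int, dim: int) -> nn.Parameter:
    """Stand-in for MONAI's build_sincos_position_embedding: returns a
    frozen (requires_grad=False) parameter; the sin/cos values are elided."""
    param = nn.Parameter(torch.zeros(1, n_tokens, dim))
    param.requires_grad = False
    return param


# Before: copy values into a pre-existing trainable parameter. copy_ only
# transfers values, so the destination keeps its default requires_grad=True.
position_embeddings = nn.Parameter(torch.zeros(1, 64, 96))
with torch.no_grad():
    position_embeddings.copy_(build_sincos_position_embedding_stub(64, 96))
print(position_embeddings.requires_grad)  # True

# After: assign the returned parameter directly; requires_grad=False survives
# and no torch.no_grad() context is needed.
position_embeddings = build_sincos_position_embedding_stub(64, 96)
print(position_embeddings.requires_grad)  # False
```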
Yu0610 pushed a commit to Yu0610/MONAI that referenced this pull request (Apr 11, 2024)
…s/patchembedding.py (Project-MONAI#7564)
Signed-off-by: Lucas Robinet <robinet.lucas@iuct-oncopole.fr>
Co-authored-by: YunLiu <55491388+KumoLiu@users.noreply.github.com>
Signed-off-by: Yu0610 <612410030@alum.ccu.edu.tw>
Description
When you pass the argument `pos_embed_type='sincos'` to the `PatchEmbeddingBlock` class, it still returns a learnable positional encoding. To reproduce:

```python
from monai.networks.blocks import PatchEmbeddingBlock

patcher = PatchEmbeddingBlock(
    in_channels=1,
    img_size=(32, 32, 32),
    patch_size=(8, 8, 8),
    hidden_size=96,
    num_heads=8,
    pos_embed_type="sincos",
    dropout_rate=0.5,
)
print(patcher.position_embeddings.requires_grad)
# >>> True
```
In the literature, we use either a sincos positional encoding, which is fixed and non-trainable as in the original Attention Is All You Need [paper](https://arxiv.org/abs/1706.03762), or a learnable positional embedding as in the ViT [paper](https://arxiv.org/abs/2010.11929). If you choose sincos, it must be fixed, which is not the case here.
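For reference, a minimal 1-D version of the fixed encoding from Attention Is All You Need; this is an illustrative sketch (assuming an even `dim`), not MONAI's multi-dimensional `build_sincos_position_embedding`:

```python
import math

import torch


def sincos_pos_embed_1d(n_positions: int, dim: int) -> torch.Tensor:
    """Classic fixed sinusoidal encoding: sin on even channels, cos on odd."""
    position = torch.arange(n_positions, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float32) * (-math.log(10000.0) / dim)
    )
    pe = torch.zeros(n_positions, dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # fixed values: this tensor should never receive gradients
```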
I'm not completely sure of the desired result in MONAI, since there's already a learnable option; but if we choose sincos, we'd expect gradient-free parameters. However, the documentation of `build_sincos_position_embedding` in the `pos_embed_utils.py` file stipulates "The sin-cos position embedding as a learnable parameter", which seems a bit confusing, especially as the construction function seems to aim to set `requires_grad` to False (see below):

```python
pos_embed = nn.Parameter(pos_emb)
pos_embed.requires_grad = False
return pos_embed
```

But this setting is not preserved by torch's `copy_` function, which does not copy gradient attributes (see the C++ code: https://github.com/pytorch/pytorch/blob/148a8de6397be6e4b4ca1508b03b82d117bfb03c/torch/csrc/lazy/ts_backend/tensor_aten_ops.cpp#L51). This `copy_` is used in the `PatchEmbeddingBlock` class to instantiate the positional embedding. I propose a small fix to overcome this problem, as well as test cases to ensure that the positional embedding behaves correctly.
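A minimal standalone demonstration of the pitfall (hypothetical shapes, not the MONAI source): `copy_` transfers values into the destination parameter but leaves the destination's own `requires_grad` flag untouched.

```python
import torch
import torch.nn as nn

# Frozen source, as build_sincos_position_embedding intends to return.
src = nn.Parameter(torch.randn(1, 8, 16))
src.requires_grad = False

# Trainable destination, as created by default before the copy.
dst = nn.Parameter(torch.zeros(1, 8, 16))

# copy_ transfers values only; the gradient settings of dst are untouched.
with torch.no_grad():
    dst.copy_(src)

print(src.requires_grad)  # False
print(dst.requires_grad)  # True -- the embedding silently stays learnable
```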
Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [x] New tests added to cover the changes.
- [x] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [x] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.