Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[metaformers] handling different normalizations + layer repetition #345

Merged
merged 3 commits into from
Jul 14, 2022

Conversation

blefaudeux
Copy link
Contributor

@blefaudeux blefaudeux commented Jul 1, 2022

What does this PR do?

Some changes, on the way to a proper EfficientFormer support. This is limited to the factory side of the repo, no changes in the actual parts

  • Make it possible to repeat layers in the config generator helper
  • Make it possible to define a different MLP per layer
  • Make it possible to skip the layernorm (will be extended to support other normalizations)
    related to EfficientFormer #330

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you update the changelog? (if needed)
    • N/A

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 1, 2022
@blefaudeux blefaudeux changed the title handling different normalizations + layer repetition [DRAFT] handling different normalizations + layer repetition Jul 1, 2022
@blefaudeux blefaudeux marked this pull request as draft July 1, 2022 21:07

from examples.microViT import Classifier, VisionTransformer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

min/micro was a follow up from Karpathy's minGPT, does not really apply here, I figured that cifar_ViT was probably more transparent ?

@@ -131,34 +146,16 @@ def forward(self, x):
torch.cuda.manual_seed_all(42)
torch.manual_seed(42)

train_transforms = transforms.Compose(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lightning-bolt datamodule already does all that

@@ -205,33 +203,15 @@ def test_step(self, batch, _):
NUM_WORKERS = 4
GPUS = 1

train_transforms = transforms.Compose(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, these are actually the default transforms in the lightning bolt datamodule, not useful

outputs = wrap(inputs=[x, x, x])

assert id(outputs[0]) == id(outputs[1])

# Check the BW pass
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better code cov, and good idea in any case I believe

@blefaudeux blefaudeux force-pushed the hierachical_models_improvement branch from 22f45db to 41b8516 Compare July 1, 2022 21:25
@@ -18,7 +18,7 @@
from collections import namedtuple


class LayerNormStyle(str, Enum):
class ResidualNormStyle(str, Enum):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of the new transformer variants for vision (metaformer, efficientformer,..) alter the actual normalization, can be something else than layernorm. RMSnorm is also used in NLP

class NormalizationType(str, Enum):
LayerNorm = "layernorm"
Skip = "skip"
# TODO: BatchNorm = "batchnorm"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably to be done in another PR, requires deferred init or some clever way to get the embedding size at that point, so for now just handle the "no normalization" path

@blefaudeux blefaudeux changed the title [DRAFT] handling different normalizations + layer repetition [metaformers] handling different normalizations + layer repetition Jul 2, 2022
@blefaudeux blefaudeux requested review from dianaml0 and fmassa July 2, 2022 11:53
@blefaudeux blefaudeux marked this pull request as ready for review July 2, 2022 11:53
@blefaudeux blefaudeux force-pushed the hierachical_models_improvement branch from 41b8516 to e5f22e4 Compare July 2, 2022 11:59
@blefaudeux blefaudeux force-pushed the hierachical_models_improvement branch from e5f22e4 to ebc4f6f Compare July 3, 2022 13:31
Copy link
Contributor

@dianaml0 dianaml0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -190,6 +195,7 @@ def __init__(
multi_head_config_cross: Dict[str, Any],
position_encoding_config: Optional[Dict[str, Any]] = None,
layer_norm_style: str = "post",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might the distinction between the two variable names be confusing? layer_norm_style and normalization

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh good point, you're right, it's more of a "residual path style" I think. Do you think I can rename that ? It would break all the existing configs, it's a user facing change unfortunately

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah true, may not be great to break. I guess if the difference is well documented should be okay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also catch the name for some time and fix it with a warning, then remove this failsafe in a few releases ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry missed this comment, that sounds good!

@blefaudeux blefaudeux force-pushed the hierachical_models_improvement branch from 68d3a84 to 78f8b7b Compare July 8, 2022 19:01
@blefaudeux blefaudeux force-pushed the hierachical_models_improvement branch from 78f8b7b to 87cd5a4 Compare July 8, 2022 19:15
@codecov-commenter
Copy link

Codecov Report

Merging #345 (87cd5a4) into main (6c003f1) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #345      +/-   ##
==========================================
+ Coverage   93.91%   93.95%   +0.03%     
==========================================
  Files          70       70              
  Lines        3961     3984      +23     
==========================================
+ Hits         3720     3743      +23     
  Misses        241      241              
Flag Coverage Δ
Python 93.95% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
xformers/components/__init__.py 100.00% <100.00%> (ø)
xformers/components/residual.py 98.73% <100.00%> (+0.18%) ⬆️
xformers/factory/block_configs.py 90.81% <100.00%> (+0.19%) ⬆️
xformers/factory/block_factory.py 97.03% <100.00%> (ø)
xformers/factory/model_factory.py 98.16% <100.00%> (ø)
xformers/helpers/hierarchical_configs.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c003f1...87cd5a4. Read the comment docs.

@blefaudeux blefaudeux merged commit 3a7b713 into main Jul 14, 2022
@matthias-weissenbacher
Copy link

Potential bug report.
It works fine on device: "cuda:0" but gives an error when employed at "cuda:1":
"""
RuntimeError: CUDA error: an illegal memory access was encountered” at “File “/lib/python3.8/site-packages/xformers/triton/softmax.py", line 200, in _softmax_dispatch
return torch.softmax(x, dim=-1)”.
"""

We have Installed xformers 0.0.11 from pypi and triton-2.0.0(hash: 5b04331dd2efdd23f4475823761fa975de60a514) from source.
Also xformers-0.0.12.dev0(hash: 3a7b713) and got same error.

@blefaudeux
Copy link
Contributor Author

Potential bug report. It works fine on device: "cuda:0" but gives an error when employed at "cuda:1": """ RuntimeError: CUDA error: an illegal memory access was encountered” at “File “/lib/python3.8/site-packages/xformers/triton/softmax.py", line 200, in _softmax_dispatch return torch.softmax(x, dim=-1)”. """

We have Installed xformers 0.0.11 from pypi and triton-2.0.0(hash: 5b04331dd2efdd23f4475823761fa975de60a514) from source. Also xformers-0.0.12.dev0(hash: 3a7b713) and got same error.

Hi @matthias-weissenbacher , I just saw this report buried in my mails, does that still happen ? Could you tell me more about the GPUs on 0 and 1, are they the same ?

@blefaudeux
Copy link
Contributor Author

also, could you report what you get with CUDA_LAUNCH_BLOCKING=1 {your command} ? probably that it does not really fail on torch.softmax() but earlier

bertmaher pushed a commit to bertmaher/xformers that referenced this pull request Dec 20, 2024
Benchmark: Add possibility to compare benchmarks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants