-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[metaformers] handling different normalizations + layer repetition #345
Conversation
|
||
from examples.microViT import Classifier, VisionTransformer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
min/micro was a follow up from Karpathy's minGPT, does not really apply here, I figured that cifar_ViT was probably more transparent ?
@@ -131,34 +146,16 @@ def forward(self, x): | |||
torch.cuda.manual_seed_all(42) | |||
torch.manual_seed(42) | |||
|
|||
train_transforms = transforms.Compose( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the lightning-bolt datamodule already does all that
@@ -205,33 +203,15 @@ def test_step(self, batch, _): | |||
NUM_WORKERS = 4 | |||
GPUS = 1 | |||
|
|||
train_transforms = transforms.Compose( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, these are actually the default transforms in the lightning bolt datamodule, not useful
outputs = wrap(inputs=[x, x, x]) | ||
|
||
assert id(outputs[0]) == id(outputs[1]) | ||
|
||
# Check the BW pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better code cov, and good idea in any case I believe
22f45db
to
41b8516
Compare
@@ -18,7 +18,7 @@ | |||
from collections import namedtuple | |||
|
|||
|
|||
class LayerNormStyle(str, Enum): | |||
class ResidualNormStyle(str, Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some of the new transformer variants for vision (metaformer, efficientformer,..) alter the actual normalization, can be something else than layernorm. RMSnorm is also used in NLP
class NormalizationType(str, Enum): | ||
LayerNorm = "layernorm" | ||
Skip = "skip" | ||
# TODO: BatchNorm = "batchnorm" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably to be done in another PR, requires deferred init or some clever way to get the embedding size at that point, so for now just handle the "no normalization" path
41b8516
to
e5f22e4
Compare
e5f22e4
to
ebc4f6f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
xformers/factory/block_configs.py
Outdated
@@ -190,6 +195,7 @@ def __init__( | |||
multi_head_config_cross: Dict[str, Any], | |||
position_encoding_config: Optional[Dict[str, Any]] = None, | |||
layer_norm_style: str = "post", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might the distinction between the two variable names be confusing? layer_norm_style
and normalization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh good point, you're right, it's more of a "residual path style" I think. Do you think I can rename that ? It would break all the existing configs, it's a user facing change unfortunately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah true, may not be great to break. I guess if the difference is well documented should be okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can also catch the name for some time and fix it with a warning, then remove this failsafe in a few releases ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry missed this comment, that sounds good!
68d3a84
to
78f8b7b
Compare
78f8b7b
to
87cd5a4
Compare
Codecov Report
@@ Coverage Diff @@
## main #345 +/- ##
==========================================
+ Coverage 93.91% 93.95% +0.03%
==========================================
Files 70 70
Lines 3961 3984 +23
==========================================
+ Hits 3720 3743 +23
Misses 241 241
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Potential bug report. We have Installed xformers 0.0.11 from pypi and triton-2.0.0(hash: 5b04331dd2efdd23f4475823761fa975de60a514) from source. |
Hi @matthias-weissenbacher , I just saw this report buried in my mails, does that still happen ? Could you tell me more about the GPUs on 0 and 1, are they the same ? |
also, could you report what you get with CUDA_LAUNCH_BLOCKING=1 {your command} ? probably that it does not really fail on torch.softmax() but earlier |
Benchmark: Add possibility to compare benchmarks
What does this PR do?
Some changes, on the way to a proper EfficientFormer support. This is limited to the factory side of the repo, no changes in the actual parts
related to EfficientFormer #330
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.