forcing conv subsampling to 32 bit #4293
Conversation
This pull request introduces 1 alert when merging 37ac555 into f6936ce - view on LGTM.com new alerts:
# added in order to prevent slowdown in bfloat16 with CUDNN v8 API
# to be removed once the above is fixed in cudnn
with torch.cuda.amp.autocast(dtype=torch.float32):
    audio_signal, length = self.pre_encode(audio_signal, length)
You need to explicitly cast the input to float here otherwise it does nothing.
It is my understanding that there's no need; see, e.g., https://pytorch.org/docs/stable/amp.html
There is. Only autocast is disabled; if the input to the function is still fp16, the operation will still run in fp16. That's why we needed the explicit cast in SE too.
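The PyTorch amp docs linked above illustrate the same point: disabling autocast in a nested region does not cast already-lower-precision tensors back to float32, so the caller must cast explicitly. A minimal sketch of that behavior (CPU bfloat16 autocast is used here only so it runs without a GPU; tensor names are illustrative):

```python
import torch

a = torch.randn(4, 4)  # float32
b = torch.randn(4, 4)  # float32

with torch.autocast("cpu", dtype=torch.bfloat16):
    # matmul is an autocast-eligible op, so the result comes back bfloat16
    c = torch.mm(a, b)
    assert c.dtype == torch.bfloat16

    with torch.autocast("cpu", enabled=False):
        # c is STILL bfloat16 here: disabling autocast does not cast it back.
        # The explicit .float() is what forces the mm below to run in float32.
        d = torch.mm(a, c.float())
        assert d.dtype == torch.float32
```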
See the response and discussion regarding this SE PR - #2030
Hmm, can you print out the input tensor, weight tensor, and output tensor dtypes with and without the explicit cast, and see which one works properly?
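A concrete version of that check: print the three dtypes under autocast with and without an explicit cast. The toy conv below is a hypothetical stand-in for the subsampling module, and CPU bfloat16 autocast is used only so the script runs without a GPU:

```python
import torch
import torch.nn as nn

# Hypothetical toy conv standing in for the conv subsampling layer.
conv = nn.Conv2d(1, 4, kernel_size=3)
x = torch.randn(1, 1, 8, 8)  # float32 input

with torch.autocast("cpu", dtype=torch.bfloat16):
    y = conv(x)  # conv2d is autocast-eligible, so the output is bfloat16
    print("autocast:      in", x.dtype, "| weight", conv.weight.dtype, "| out", y.dtype)

    with torch.autocast("cpu", enabled=False):
        y32 = conv(x.float())  # explicit cast keeps the op in float32
        print("explicit cast: in", torch.float32, "| weight", conv.weight.dtype, "| out", y32.dtype)
```

Note that the parameter dtype stays float32 throughout: autocast casts per-op, not in place, which is why only the output dtypes differ between the two cases.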
Lgtm.
LGTM!
* forcing conv subsampling to 32 bit
* extra character
* comments added
* moving autocast into subsampling
* lint

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Signed-off-by: Georg Kucsko <gkucsko@gmail.com>
What does this PR do?
Forces 2D conv subsampling to fp32.
Collection: ASR
Changelog
One-line change in nemo/collections/asr/modules/conformer_encoder.py
Usage
No changes
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs to various areas.
Additional Information