Tensor cores used only for fp16 in interleaved multihead attention #17994
Conversation
Hey @blchu, thanks for submitting the PR.
CI supported jobs: [centos-cpu, unix-cpu, edge, centos-gpu, miscellaneous, windows-gpu, clang, sanity, unix-gpu, website, windows-cpu]
Is this tested on CUDA architectures < 5?
I've tested on a CUDA architecture < 5 (specifically, a K80); there's no issue running the operator.
@mxnet-bot run ci [all]
@ChaiBapchya is something happening to CI right now? The bot did not trigger the CI. Thanks!
@mxnet-bot run ci [all]
@blchu can you rebase on master and force push?
Force-pushed from 8862691 to 9a99b87
@mxnet-bot run ci [unix-cpu, unix-gpu]
Jenkins CI successfully triggered: [unix-gpu, unix-cpu]
@mxnet-bot run ci [unix-cpu] (download of cifar failed in the Perl test)
Jenkins CI successfully triggered: [unix-cpu]
…iction (apache#17994) (cherry picked from commit afae030)
Description
Fixed an issue where fp32 inputs used tensor cores in the interleaved multihead attention operators, resulting in lower-precision calculations and a potential reduction in accuracy. Tensor cores are now used only for fp16 inputs.
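The fix amounts to gating the tensor-core code path on the input dtype. A minimal Python sketch of that guard, under stated assumptions: the helper name `select_gemm_algo` and the string constants are illustrative only (the real change selects cuBLAS algorithm enums in MXNet's C++ operator code), but the dispatch logic mirrors the behavior described above.

```python
# Hypothetical sketch of the dtype guard: request tensor-op GEMM math
# only for fp16 inputs, so fp32 inputs keep full-precision arithmetic.
# The constant names echo cuBLAS enums but are plain strings here.

CUBLAS_GEMM_DEFAULT = "CUBLAS_GEMM_DEFAULT"                      # default fp32 math path
CUBLAS_GEMM_DEFAULT_TENSOR_OP = "CUBLAS_GEMM_DEFAULT_TENSOR_OP"  # tensor-core path

def select_gemm_algo(dtype: str) -> str:
    """Return the GEMM algorithm flag appropriate for the input dtype."""
    if dtype == "float16":
        # fp16 inputs: tensor cores give a large speedup with no extra
        # precision loss, since the inputs are already half precision.
        return CUBLAS_GEMM_DEFAULT_TENSOR_OP
    # fp32 (and any other) inputs: stay on the default math path so the
    # multiplication is not silently performed at reduced precision.
    return CUBLAS_GEMM_DEFAULT
```

With this guard in place, only `float16` inputs take the tensor-core path; `float32` inputs fall through to the default algorithm, which is the behavior change this PR describes.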
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments