This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[API] use softmax with length, and interleaved matmul for BERT #1136

Merged
merged 7 commits into dmlc:master on Feb 7, 2020

Conversation

eric-haibin-lin
Member

eric-haibin-lin commented Feb 1, 2020

Description

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

eric-haibin-lin and others added 3 commits January 31, 2020 23:50
…1091)

* use softmax with length, and interleaved matmul

* push backward compatibility fix

* fix failing unittests for output_all_encodings, and valid-len=None

* fix lint

* Update bert.py

* amp patch

* Update MXNet 1.6 pre-release version tested on CI

* Update bert.py

Co-authored-by: Leonard Lausen <leonard@lausen.nl>
eric-haibin-lin requested a review from a team as a code owner February 1, 2020 21:29
@codecov

codecov bot commented Feb 1, 2020

Codecov Report

Merging #1136 into master will decrease coverage by 0.41%.
The diff coverage is 98.8%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1136      +/-   ##
==========================================
- Coverage   87.76%   87.34%   -0.42%     
==========================================
  Files          67       67              
  Lines        6310     6386      +76     
==========================================
+ Hits         5538     5578      +40     
- Misses        772      808      +36
Impacted Files Coverage Δ
src/gluonnlp/model/bert.py 94.28% <98.8%> (+1.63%) ⬆️
src/gluonnlp/data/batchify/embedding.py 45.16% <0%> (-52.42%) ⬇️
src/gluonnlp/utils/files.py 42.62% <0%> (-3.28%) ⬇️
src/gluonnlp/vocab/subwords.py 85.1% <0%> (-2.13%) ⬇️
src/gluonnlp/data/question_answering.py 100% <0%> (ø) ⬆️
src/gluonnlp/model/attention_cell.py 91.06% <0%> (+0.55%) ⬆️
src/gluonnlp/model/transformer.py 91.66% <0%> (+4.48%) ⬆️
src/gluonnlp/model/utils.py 77.69% <0%> (+6.92%) ⬆️
src/gluonnlp/model/seq2seq_encoder_decoder.py 80% <0%> (+30%) ⬆️

@fhieber

fhieber commented Feb 1, 2020

Hi @eric-haibin-lin,
could you help me understand the difference between this PR and the previously reverted commit? That is, what has changed to address #1127?
We are about to switch to the fast multi-head attention ops in Sockeye as well. I also observed perplexity differences there, so I'd like to understand the root cause behind the reverted commit in gluon-nlp. Thanks!

@mli
Member

mli commented Feb 1, 2020

Job PR-1136/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1136/1/index.html

@eric-haibin-lin
Member Author

eric-haibin-lin commented Feb 2, 2020

Hi @fhieber, the main difference is in this code block:

        query_weight = query_weight.reshape(shape=(self._num_heads, -1, 0), reverse=True)
        key_weight = key_weight.reshape(shape=(self._num_heads, -1, 0), reverse=True)
        value_weight = value_weight.reshape(shape=(self._num_heads, -1, 0), reverse=True)
        in_weight = F.concat(query_weight, key_weight, value_weight, dim=-2)
        in_weight = in_weight.reshape(shape=(-1, 0), reverse=True)

I think the new ops assume the projection is done with interleaved weights for q/k/v. The concatenated weight should have shape (num_heads, C_out/num_heads * 3, C_in).
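For concreteness, here is a minimal numpy sketch (not the PR code; sizes and variable names are illustrative) of the layout the reshape/concat above produces: each projection weight of shape (C_out, C_in) is split per head, the q/k/v blocks are stacked head by head, and the result is flattened back into a (3 * C_out, C_in) matrix whose rows are interleaved per head.

    import numpy as np

    num_heads, c_out, c_in = 4, 8, 8          # illustrative sizes
    head_dim = c_out // num_heads

    # Hypothetical projection weights, laid out (C_out, C_in) as in Gluon Dense layers.
    query_weight = np.random.randn(c_out, c_in)
    key_weight = np.random.randn(c_out, c_in)
    value_weight = np.random.randn(c_out, c_in)

    # Split each weight per head: (num_heads, C_out/num_heads, C_in).
    q = query_weight.reshape(num_heads, head_dim, c_in)
    k = key_weight.reshape(num_heads, head_dim, c_in)
    v = value_weight.reshape(num_heads, head_dim, c_in)

    # Concat along the second axis: (num_heads, 3 * C_out/num_heads, C_in),
    # then flatten back to 2-D. The rows are now ordered
    # head0-q, head0-k, head0-v, head1-q, ... i.e. the interleaved layout.
    in_weight = np.concatenate([q, k, v], axis=1).reshape(-1, c_in)
    print(in_weight.shape)  # (24, 8) == (3 * C_out, C_in)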

@mli
Member

mli commented Feb 2, 2020

Job PR-1136/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1136/2/index.html

@mli
Member

mli commented Feb 2, 2020

Job PR-1136/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1136/3/index.html

@mli
Member

mli commented Feb 2, 2020

Job PR-1136/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1136/4/index.html

leezu merged commit 75c29a3 into dmlc:master on Feb 7, 2020
@eric-haibin-lin
Member Author

replaced #1091

value_weight = value_weight.reshape(shape=(self._num_heads, -1, 0), reverse=True)
in_weight = F.concat(query_weight, key_weight, value_weight, dim=-2)
in_weight = in_weight.reshape(shape=(-1, 0), reverse=True)
in_bias = F.concat(query_bias, key_bias, value_bias, dim=0)
Member


Is it possible to avoid the concat on every iteration? At least for inference, we should only need to concat once, right?

Member Author


For inference, yes, that's true. It's similar to the RNN case. If we figure out a way to avoid the weight concat in RNN, we can apply it here, too. @TaoLv do you have any ideas/suggestions?
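A minimal sketch of what "concat once for inference" could look like, assuming the fused weight can simply be cached after it is built the first time (hypothetical helper in numpy, not part of this PR):

    import numpy as np

    class CachedQKVWeight:
        """Hypothetical cache: build the interleaved q/k/v weight once and reuse it."""

        def __init__(self, query_weight, key_weight, value_weight, num_heads):
            self._weights = (query_weight, key_weight, value_weight)
            self._num_heads = num_heads
            self._in_weight = None  # filled lazily on first use

        def get(self):
            # The concat runs only on the first call; later calls reuse the result.
            if self._in_weight is None:
                c_in = self._weights[0].shape[1]
                parts = [w.reshape(self._num_heads, -1, c_in) for w in self._weights]
                self._in_weight = np.concatenate(parts, axis=1).reshape(-1, c_in)
            return self._in_weight

    # Usage: build once, reuse for every inference step.
    qw, kw, vw = (np.random.randn(8, 8) for _ in range(3))
    cache = CachedQKVWeight(qw, kw, vw, num_heads=4)
    fused = cache.get()   # concat happens here
    fused = cache.get()   # cached result, no concat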
