This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[API] use softmax with length, and interleaved matmul for BERT #1136

Merged
merged 7 commits into dmlc:master on Feb 7, 2020

Conversation

eric-haibin-lin
Member

eric-haibin-lin commented Feb 1, 2020

Description

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

eric-haibin-lin and others added 3 commits January 31, 2020 23:50
…1091)

* use softmax with length, and interleaved matmul

* push backward compatibility fix

* fix failing unittests for output_all_encodings, and valid-len=None

* fix lint

* Update bert.py

* amp patch

* Update MXNet 1.6 pre-release version tested on CI

* Update bert.py

Co-authored-by: Leonard Lausen <leonard@lausen.nl>
eric-haibin-lin requested a review from a team as a code owner February 1, 2020 21:29
@codecov

codecov bot commented Feb 1, 2020

Codecov Report

Merging #1136 into master will decrease coverage by 0.41%.
The diff coverage is 98.8%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1136      +/-   ##
==========================================
- Coverage   87.76%   87.34%   -0.42%     
==========================================
  Files          67       67              
  Lines        6310     6386      +76     
==========================================
+ Hits         5538     5578      +40     
- Misses        772      808      +36
Impacted Files Coverage Δ
src/gluonnlp/model/bert.py 94.28% <98.8%> (+1.63%) ⬆️
src/gluonnlp/data/batchify/embedding.py 45.16% <0%> (-52.42%) ⬇️
src/gluonnlp/utils/files.py 42.62% <0%> (-3.28%) ⬇️
src/gluonnlp/vocab/subwords.py 85.1% <0%> (-2.13%) ⬇️
src/gluonnlp/data/question_answering.py 100% <0%> (ø) ⬆️
src/gluonnlp/model/attention_cell.py 91.06% <0%> (+0.55%) ⬆️
src/gluonnlp/model/transformer.py 91.66% <0%> (+4.48%) ⬆️
src/gluonnlp/model/utils.py 77.69% <0%> (+6.92%) ⬆️
src/gluonnlp/model/seq2seq_encoder_decoder.py 80% <0%> (+30%) ⬆️

@fhieber

fhieber commented Feb 1, 2020

Hi @eric-haibin-lin,
could you help me understand the difference between this PR and the previously reverted commit? That is, what has changed to address #1127?
We are about to switch to the fast multi-head attention ops in Sockeye as well. I also observed perplexity differences there, so I'd like to understand the root cause behind the reverted commit in gluon-nlp. Thanks!

@mli
Member

mli commented Feb 1, 2020

Job PR-1136/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1136/1/index.html

@eric-haibin-lin
Member Author

eric-haibin-lin commented Feb 2, 2020

Hi @fhieber, the main difference is in this code block:

        query_weight = query_weight.reshape(shape=(self._num_heads, -1, 0), reverse=True)
        key_weight = key_weight.reshape(shape=(self._num_heads, -1, 0), reverse=True)
        value_weight = value_weight.reshape(shape=(self._num_heads, -1, 0), reverse=True)
        in_weight = F.concat(query_weight, key_weight, value_weight, dim=-2)
        in_weight = in_weight.reshape(shape=(-1, 0), reverse=True)

I think the new ops assume the projection is done with interleaved weights for q/k/v. The concatenated weight should have shape (num_heads, C_out/num_heads * 3, C_in).
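For concreteness, here is a minimal numpy sketch (not the PR code; sizes and variable names are illustrative) of the layout the reshape/concat above produces: each projection weight of shape (C_out, C_in) is split per head, the q/k/v blocks are stacked head by head, and the result is flattened back into a (3 * C_out, C_in) matrix whose rows are interleaved per head.

    import numpy as np

    num_heads, c_out, c_in = 4, 8, 8          # illustrative sizes
    head_dim = c_out // num_heads

    # Hypothetical projection weights, laid out (C_out, C_in) as in Gluon Dense layers.
    query_weight = np.random.randn(c_out, c_in)
    key_weight = np.random.randn(c_out, c_in)
    value_weight = np.random.randn(c_out, c_in)

    # Split each weight per head: (num_heads, C_out/num_heads, C_in).
    q = query_weight.reshape(num_heads, head_dim, c_in)
    k = key_weight.reshape(num_heads, head_dim, c_in)
    v = value_weight.reshape(num_heads, head_dim, c_in)

    # Concat along the second axis: (num_heads, 3 * C_out/num_heads, C_in),
    # then flatten back to 2-D. The rows are now ordered
    # head0-q, head0-k, head0-v, head1-q, ... i.e. the interleaved layout.
    in_weight = np.concatenate([q, k, v], axis=1).reshape(-1, c_in)
    print(in_weight.shape)  # (24, 8) == (3 * C_out, C_in)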

@mli
Member

mli commented Feb 2, 2020

Job PR-1136/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1136/2/index.html

@mli
Member

mli commented Feb 2, 2020

Job PR-1136/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1136/3/index.html

@mli
Member

mli commented Feb 2, 2020

Job PR-1136/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1136/4/index.html

leezu merged commit 75c29a3 into dmlc:master on Feb 7, 2020
@eric-haibin-lin
Member Author

replaced #1091

value_weight = value_weight.reshape(shape=(self._num_heads, -1, 0), reverse=True)
in_weight = F.concat(query_weight, key_weight, value_weight, dim=-2)
in_weight = in_weight.reshape(shape=(-1, 0), reverse=True)
in_bias = F.concat(query_bias, key_bias, value_bias, dim=0)
Member


Is it possible to avoid the concat on every iteration? At least for inference, we should only need to concat once, right?

Member Author


For inference, yes, that's true. It's similar to the RNN case. If we figure out a way to avoid the weight concat in RNN, we can apply it here, too. @TaoLv do you have any ideas/suggestions?
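A minimal sketch of what "concat once for inference" could look like, assuming the fused weight can simply be cached after it is built the first time (hypothetical helper in numpy, not part of this PR):

    import numpy as np

    class CachedQKVWeight:
        """Hypothetical cache: build the interleaved q/k/v weight once and reuse it."""

        def __init__(self, query_weight, key_weight, value_weight, num_heads):
            self._weights = (query_weight, key_weight, value_weight)
            self._num_heads = num_heads
            self._in_weight = None  # filled lazily on first use

        def get(self):
            # The concat runs only on the first call; later calls reuse the result.
            if self._in_weight is None:
                c_in = self._weights[0].shape[1]
                parts = [w.reshape(self._num_heads, -1, c_in) for w in self._weights]
                self._in_weight = np.concatenate(parts, axis=1).reshape(-1, c_in)
            return self._in_weight

    # Usage: build once, reuse for every inference step.
    qw, kw, vw = (np.random.randn(8, 8) for _ in range(3))
    cache = CachedQKVWeight(qw, kw, vw, num_heads=4)
    fused = cache.get()   # concat happens here
    fused = cache.get()   # cached result, no concat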
