This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[FEATURE] INT8 Quantization for BERT Sentence Classification and Question Answering #1080

Merged
merged 38 commits into from
Feb 3, 2020

Conversation

xinyu-intel
Member

@xinyu-intel xinyu-intel commented Dec 24, 2019

Description

Quantization solution for BERT sentence classification (SC) and question answering (QA) with Intel DL Boost.

Main Code Changes:

  • Change the input order in the BERT SC dataloader so it aligns with the input order of the symbolic model (data0=input_ids, data1=segment_ids, data2=valid_length).
  • Implement BertLayerCollector to support output clipping during calibration. By default we now clip the max_range of the GeLU output to 10 and the min_range of the layer_norm output to -50.
  • Add a calibration pass and a SymbolBlock inference pass to finetune_classification.py.
  • Add a calibration pass and a SymbolBlock inference pass to finetune_squad.py.
  • Quantization README
  • Documentation
  • Accuracy still to be remeasured
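For readers unfamiliar with the clipping idea above, here is a minimal, framework-agnostic sketch of what a calibration collector does. The class and layer names below are illustrative assumptions, not the actual GluonNLP `BertLayerCollector` implementation, which hooks into MXNet layer outputs during calibration:

```python
# Illustrative sketch of calibration-time min/max collection with clipping.
# Assumption: layers are identified by name substrings (e.g. 'gelu',
# 'layer_norm'); the real collector registers output hooks in MXNet.

class LayerCollector:
    """Record per-layer (min, max) ranges during calibration, with clipping."""

    def __init__(self, clip_min=None, clip_max=None):
        # clip_min / clip_max map layer-name substrings to thresholds,
        # mirroring the PR defaults: GeLU max_range clipped to 10,
        # layer_norm min_range clipped to -50.
        self.clip_min = clip_min or {}
        self.clip_max = clip_max or {}
        self.ranges = {}  # layer name -> (min, max)

    def collect(self, name, outputs):
        lo, hi = min(outputs), max(outputs)
        for key, v in self.clip_max.items():
            if key in name:
                hi = min(hi, v)  # clip away rare large activations
        for key, v in self.clip_min.items():
            if key in name:
                lo = max(lo, v)  # clip away rare very negative activations
        old = self.ranges.get(name, (lo, hi))
        # Keep the widest (post-clipping) range seen across calibration batches.
        self.ranges[name] = (min(old[0], lo), max(old[1], hi))


collector = LayerCollector(clip_min={'layer_norm': -50.0},
                           clip_max={'gelu': 10.0})
collector.collect('bert0_gelu0_fwd', [-3.0, 25.0])        # max clipped to 10
collector.collect('bert0_layer_norm0_fwd', [-80.0, 4.0])  # min clipped to -50
```

Clipping outliers like this narrows the quantization range, which improves INT8 resolution for the bulk of the activation distribution at the cost of saturating rare extreme values.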

Dependency:

apache/mxnet#17161
apache/mxnet#17187
#1091
#1127
#1124
...

FP32 and INT8 Accuracy:

Will remeasure on c5 instances when the pending PRs are ready.

| Task  | maxLength | FP32 Accuracy | INT8 Accuracy | FP32 F1 | INT8 F1 |
|-------|-----------|---------------|---------------|---------|---------|
| SQuAD | 128       | 77.32         | 76.61         | 84.84   | 84.26   |
| SQuAD | 384       | 80.86         | 80.56         | 88.31   | 88.14   |
| MRPC  | 128       | 87.75         | 87.25         | 70.50   | 70.56   |

@pengzhao-intel @TaoLv @eric-haibin-lin @szha

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented


@xinyu-intel xinyu-intel requested a review from a team as a code owner December 24, 2019 05:14
@codecov

codecov bot commented Dec 24, 2019

Codecov Report

Merging #1080 into master will decrease coverage by 1.31%.
The diff coverage is 92.42%.

Impacted file tree graph

```text
@@            Coverage Diff             @@
##           master    #1080      +/-   ##
==========================================
- Coverage   87.29%   85.98%   -1.32%
==========================================
  Files          69       69
  Lines        6343     6349       +6
==========================================
- Hits         5537     5459      -78
- Misses        806      890      +84
```
| Impacted Files | Coverage Δ |
|---|---|
| src/gluonnlp/calibration/collector.py | 26.66% <100%> (ø) ⬆️ |
| src/gluonnlp/data/question_answering.py | 100% <100%> (ø) ⬆️ |
| src/gluonnlp/model/bert.py | 92.65% <100%> (+4.08%) ⬆️ |
| src/gluonnlp/data/transforms.py | 83.05% <100%> (ø) ⬆️ |
| src/gluonnlp/data/utils.py | 85.79% <89.58%> (ø) ⬆️ |
| src/gluonnlp/data/translation.py | 100% <0%> (ø) ⬆️ |
| src/gluonnlp/model/train/cache.py | 97.67% <0%> (ø) ⬆️ |
| src/gluonnlp/model/transformer.py | 86.85% <0%> (-0.33%) ⬇️ |
| src/gluonnlp/embedding/evaluation.py | 95.79% <0%> (ø) ⬆️ |
| src/gluonnlp/data/batchify/embedding.py | 45.16% <0%> (-52.42%) ⬇️ |
| ... and 21 more | |

@eric-haibin-lin eric-haibin-lin self-assigned this Dec 25, 2019
@eric-haibin-lin eric-haibin-lin added the release focus Progress focus for release label Jan 5, 2020
```diff
@@ -32,7 +32,7 @@ dependencies:
 - flaky==3.6.1
 - flake8==3.7.9
 - mock<3
-- https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-29/dist/mxnet_cu100-1.6.0b20191229-py2.py3-none-manylinux1_x86_64.whl
+- https://lllausen-data.s3.amazonaws.com/mxnet_cu100-1.6.0b20191231-py2.py3-none-manylinux1_x86_64.whl
```
Contributor

Why is this change in this PR?

@leezu
Contributor

leezu commented Jan 6, 2020

Please merge master due to #1096

@xinyu-intel
Member Author

@leezu OK, because we need the new API in apache/mxnet#17161.

@leezu
Contributor

leezu commented Jan 15, 2020

I'm removing the release-focus label because this PR depends on MXNet 1.7 and thus can't be included for the GluonNLP 0.9 release.

We can merge it soon after the release.

Thanks @xinyu-intel for the contribution!

@leezu leezu removed the release focus Progress focus for release label Jan 15, 2020
@eric-haibin-lin eric-haibin-lin added release focus Progress focus for release and removed release focus Progress focus for release labels Jan 15, 2020
Member

@eric-haibin-lin eric-haibin-lin left a comment

What are the pending/work-in-progress items for this PR?

Member

@eric-haibin-lin eric-haibin-lin left a comment

You can annotate the test. For example, in this PR we skip some tests based on the day of the week: `@pytest.mark.skipif(datetime.date.today().weekday() != 0)` (see https://github.com/dmlc/gluon-nlp/pull/1126/files).
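A sketch of that annotation pattern in context. The test name and skip reason below are illustrative placeholders, not the actual test from PR #1126 (note that pytest requires `reason=` when the skipif condition is a plain boolean):

```python
import datetime

import pytest


# Run this expensive integration test only on Mondays (weekday() == 0),
# following the day-based skipping pattern referenced above. The test
# body is a placeholder, not the real GluonNLP quantization test.
@pytest.mark.skipif(datetime.date.today().weekday() != 0,
                    reason='expensive test; run weekly on Mondays')
def test_bert_int8_inference():
    assert True
```

On any day other than Monday, pytest collects the test but reports it as skipped with the given reason, so CI time is only spent on it once a week.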

@mli
Member

mli commented Jan 27, 2020

Job PR-1080/21 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/21/index.html

@mli
Member

mli commented Jan 27, 2020

Job PR-1080/22 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/22/index.html

@xinyu-intel xinyu-intel changed the title [WIP][FEATURE] INT8 Quantization for BERT Sentence Classification and Question Answering [FEATURE] INT8 Quantization for BERT Sentence Classification and Question Answering Jan 28, 2020
@mli
Member

mli commented Jan 28, 2020

Job PR-1080/23 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/23/index.html

Contributor

@leezu leezu left a comment

Could you add a test case that runs the script in deploy mode and verifies it works as expected?
There are currently other PRs that plan to modify the same scripts touched by this PR, and there is a chance of introducing a regression if the script is not tested automatically.

@mli
Member

mli commented Jan 30, 2020

Job PR-1080/25 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/25/index.html

@mli
Member

mli commented Jan 30, 2020

Job PR-1080/24 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/24/index.html

@eric-haibin-lin
Member

@xinyu-intel is this PR ready for review?

@xinyu-intel
Member Author

@eric-haibin-lin Sure. I'll remeasure the INT8 accuracy and add the numbers to https://gluon-nlp.mxnet.io/model_zoo/bert/index.html

@xinyu-intel
Member Author

xinyu-intel commented Feb 1, 2020

Measured FP32 accuracy with MXNet 1.6rc2 and INT8 accuracy with the MXNet nightly build.

| Dataset | SQuAD 1.1 | MRPC |
|---|---|---|
| Model | bert_12_768_12 | bert_12_768_12 |
| FP32 EM / F1 | 81.18 / 88.58 | 87.01 / 90.97 |
| INT8 EM / F1 | 80.32 / 88.10 | 87.01 / 90.88 |

Will remeasure with @zburning's params later.

@zburning
Contributor

zburning commented Feb 1, 2020

Hi, I think it should be EM 87.01 / F1 90.97?

@mli
Member

mli commented Feb 1, 2020

Job PR-1080/26 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/26/index.html

@xinyu-intel
Member Author

@zburning Thanks.

There might be accuracy variance due to different SW/HW configurations.

| Dataset | Model | FP32 EM | INT8 EM | FP32 F1 | INT8 F1 |
|---|---|---|---|---|---|
| SQuAD 1.1 | bert_12_768_12 | 81.18 | 80.32 | 88.58 | 88.10 |
| MRPC | bert_12_768_12 | 87.01 | 87.01 | 90.97 | 90.88 |
| SST | bert_12_768_12 | 93.23 | 93.00 | | |

cc @TaoLv @eric-haibin-lin

@mli
Member

mli commented Feb 2, 2020

Job PR-1080/27 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/27/index.html

Member

@eric-haibin-lin eric-haibin-lin left a comment

@xinyu-intel The finetune_xx scripts are getting fat and contain a lot of code. I'm wondering whether we could have quantize_classifier.py and quantize_squad.py so that the target users can easily find them. We would probably need to duplicate the dataset preprocessing function. What do you think?
Also, for tutorials, I'm thinking about creating multiple deployment tutorials, with INT8 quantization as one of them. The current quantization tutorial could actually stand alone: we would just download a trained model from S3 and start quantizing. If needed, I can help upload the model.

docs/examples/sentence_embedding/bert.md Outdated Show resolved Hide resolved
scripts/bert/index.rst Outdated Show resolved Hide resolved
scripts/bert/index.rst Outdated Show resolved Hide resolved
src/gluonnlp/calibration/collector.py Outdated Show resolved Hide resolved
@TaoLv
Member

TaoLv commented Feb 2, 2020

@eric-haibin-lin Let's focus on the feature itself in this PR so we can make sure to catch the 0.9 release as an experimental feature. We can refactor the scripts and tutorials in follow-up PRs, and it will be better if we first get some feedback from users about the quantization workflow. What do you think?

@mli
Member

mli commented Feb 2, 2020

Job PR-1080/28 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/28/index.html

@leezu leezu merged commit 645fe30 into dmlc:master Feb 3, 2020