This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[FEATURE] INT8 Quantization for BERT Sentence Classification and Question Answering #1080

Merged
merged 38 commits into from
Feb 3, 2020

Conversation

xinyu-intel
Member

@xinyu-intel xinyu-intel commented Dec 24, 2019

Description

Quantization solution for BERT sentence classification (SC) and question answering (QA) with Intel DL Boost.

Main Code Changes:

  • Change the input order in the BERT SC dataloader so it aligns with the input order of the symbolic model (data0=input_ids, data1=segment_ids, data2=valid_length).
  • Implement BertLayerCollector to support output clipping during calibration. By default we now clip the max_range of the GeLU output to 10 and the min_range of the layer_norm output to -50.
  • Add a calibration pass and a SymbolBlock inference pass to finetune_classification.py.
  • Add a calibration pass and a SymbolBlock inference pass to finetune_squad.py.
  • Quantization README
  • Documentation
  • Accuracy still to be remeasured
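For readers unfamiliar with the clipping idea above, here is a minimal, framework-agnostic sketch of what a calibration collector does. The class and layer names below are illustrative assumptions, not the actual GluonNLP `BertLayerCollector` implementation, which hooks into MXNet layer outputs during calibration:

```python
# Illustrative sketch of calibration-time min/max collection with clipping.
# Assumption: layers are identified by name substrings (e.g. 'gelu',
# 'layer_norm'); the real collector registers output hooks in MXNet.

class LayerCollector:
    """Record per-layer (min, max) ranges during calibration, with clipping."""

    def __init__(self, clip_min=None, clip_max=None):
        # clip_min / clip_max map layer-name substrings to thresholds,
        # mirroring the PR defaults: GeLU max_range clipped to 10,
        # layer_norm min_range clipped to -50.
        self.clip_min = clip_min or {}
        self.clip_max = clip_max or {}
        self.ranges = {}  # layer name -> (min, max)

    def collect(self, name, outputs):
        lo, hi = min(outputs), max(outputs)
        for key, v in self.clip_max.items():
            if key in name:
                hi = min(hi, v)  # clip away rare large activations
        for key, v in self.clip_min.items():
            if key in name:
                lo = max(lo, v)  # clip away rare very negative activations
        old = self.ranges.get(name, (lo, hi))
        # Keep the widest (post-clipping) range seen across calibration batches.
        self.ranges[name] = (min(old[0], lo), max(old[1], hi))


collector = LayerCollector(clip_min={'layer_norm': -50.0},
                           clip_max={'gelu': 10.0})
collector.collect('bert0_gelu0_fwd', [-3.0, 25.0])        # max clipped to 10
collector.collect('bert0_layer_norm0_fwd', [-80.0, 4.0])  # min clipped to -50
```

Clipping outliers like this narrows the quantization range, which improves INT8 resolution for the bulk of the activation distribution at the cost of saturating rare extreme values.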

Dependency:

apache/mxnet#17161
apache/mxnet#17187
#1091
#1127
#1124
...

FP32 and INT8 Accuracy:

Will remeasure on c5 instances when the pending PRs are ready.

| Task  | maxLength | FP32 Accuracy | INT8 Accuracy | FP32 F1 | INT8 F1 |
|-------|-----------|---------------|---------------|---------|---------|
| SQuAD | 128       | 77.32         | 76.61         | 84.84   | 84.26   |
| SQuAD | 384       | 80.86         | 80.56         | 88.31   | 88.14   |
| MRPC  | 128       | 87.75         | 87.25         | 70.50   | 70.56   |

@pengzhao-intel @TaoLv @eric-haibin-lin @szha

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented


@xinyu-intel xinyu-intel requested a review from a team as a code owner December 24, 2019 05:14
@codecov

codecov bot commented Dec 24, 2019

Codecov Report

Merging #1080 into master will decrease coverage by 1.31%.
The diff coverage is 92.42%.

Impacted file tree graph

```text
@@            Coverage Diff             @@
##           master    #1080      +/-   ##
==========================================
- Coverage   87.29%   85.98%   -1.32%
==========================================
  Files          69       69
  Lines        6343     6349       +6
==========================================
- Hits         5537     5459      -78
- Misses        806      890      +84
```
| Impacted Files | Coverage Δ |
|---|---|
| src/gluonnlp/calibration/collector.py | 26.66% <100%> (ø) ⬆️ |
| src/gluonnlp/data/question_answering.py | 100% <100%> (ø) ⬆️ |
| src/gluonnlp/model/bert.py | 92.65% <100%> (+4.08%) ⬆️ |
| src/gluonnlp/data/transforms.py | 83.05% <100%> (ø) ⬆️ |
| src/gluonnlp/data/utils.py | 85.79% <89.58%> (ø) ⬆️ |
| src/gluonnlp/data/translation.py | 100% <0%> (ø) ⬆️ |
| src/gluonnlp/model/train/cache.py | 97.67% <0%> (ø) ⬆️ |
| src/gluonnlp/model/transformer.py | 86.85% <0%> (-0.33%) ⬇️ |
| src/gluonnlp/embedding/evaluation.py | 95.79% <0%> (ø) ⬆️ |
| src/gluonnlp/data/batchify/embedding.py | 45.16% <0%> (-52.42%) ⬇️ |
| ... and 21 more | |

@eric-haibin-lin eric-haibin-lin self-assigned this Dec 25, 2019
@eric-haibin-lin eric-haibin-lin added the release focus Progress focus for release label Jan 5, 2020
```diff
@@ -32,7 +32,7 @@ dependencies:
 - flaky==3.6.1
 - flake8==3.7.9
 - mock<3
-- https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-29/dist/mxnet_cu100-1.6.0b20191229-py2.py3-none-manylinux1_x86_64.whl
+- https://lllausen-data.s3.amazonaws.com/mxnet_cu100-1.6.0b20191231-py2.py3-none-manylinux1_x86_64.whl
```
Contributor

Why is this change in this PR?

@leezu
Contributor

leezu commented Jan 6, 2020

Please merge master due to #1096

@xinyu-intel
Member Author

@leezu OK, because we need the new API in apache/mxnet#17161.

@leezu
Contributor

leezu commented Jan 15, 2020

I'm removing the release-focus label because this PR depends on MXNet 1.7 and thus can't be included for the GluonNLP 0.9 release.

We can merge it soon after the release.

Thanks @xinyu-intel for the contribution!

@leezu leezu removed the release focus Progress focus for release label Jan 15, 2020
@eric-haibin-lin eric-haibin-lin added release focus Progress focus for release and removed release focus Progress focus for release labels Jan 15, 2020
Member

@eric-haibin-lin eric-haibin-lin left a comment

What are the pending/work-in-progress items for this PR?

Member

@eric-haibin-lin eric-haibin-lin left a comment

You can annotate the test. For example, in this PR we skip some tests based on the day of the week: `@pytest.mark.skipif(datetime.date.today().weekday() != 0)` (see https://github.com/dmlc/gluon-nlp/pull/1126/files).
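A sketch of that annotation pattern in context. The test name and skip reason below are illustrative placeholders, not the actual test from PR #1126 (note that pytest requires `reason=` when the skipif condition is a plain boolean):

```python
import datetime

import pytest


# Run this expensive integration test only on Mondays (weekday() == 0),
# following the day-based skipping pattern referenced above. The test
# body is a placeholder, not the real GluonNLP quantization test.
@pytest.mark.skipif(datetime.date.today().weekday() != 0,
                    reason='expensive test; run weekly on Mondays')
def test_bert_int8_inference():
    assert True
```

On any day other than Monday, pytest collects the test but reports it as skipped with the given reason, so CI time is only spent on it once a week.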

@mli
Member

mli commented Jan 27, 2020

Job PR-1080/21 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/21/index.html

@mli
Member

mli commented Jan 27, 2020

Job PR-1080/22 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/22/index.html

@xinyu-intel xinyu-intel changed the title [WIP][FEATURE] INT8 Quantization for BERT Sentence Classification and Question Answering [FEATURE] INT8 Quantization for BERT Sentence Classification and Question Answering Jan 28, 2020
@mli
Member

mli commented Jan 28, 2020

Job PR-1080/23 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/23/index.html

Contributor

@leezu leezu left a comment

Could you add a test case that runs the script in deploy mode and verifies it works as expected?
There are currently other PRs that plan to modify the same scripts touched by this PR, and there is a chance of introducing a regression if the script is not tested automatically.

@mli
Member

mli commented Jan 30, 2020

Job PR-1080/25 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/25/index.html

@mli
Member

mli commented Jan 30, 2020

Job PR-1080/24 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/24/index.html

@eric-haibin-lin
Member

@xinyu-intel is this PR ready for review?

@xinyu-intel
Member Author

@eric-haibin-lin Sure. I'll remeasure the INT8 accuracy and add the numbers to https://gluon-nlp.mxnet.io/model_zoo/bert/index.html

@xinyu-intel
Member Author

xinyu-intel commented Feb 1, 2020

Measured FP32 accuracy with MXNet 1.6rc2 and INT8 accuracy with the MXNet nightly build.

| Dataset | SQuAD 1.1 | MRPC |
|---|---|---|
| Model | bert_12_768_12 | bert_12_768_12 |
| FP32 EM / F1 | 81.18 / 88.58 | 87.01 / 90.97 |
| INT8 EM / F1 | 80.32 / 88.10 | 87.01 / 90.88 |

Will remeasure with @zburning's params later.

@zburning
Contributor

zburning commented Feb 1, 2020

Hi, I think it should be EM 87.01 / F1 90.97?

@mli
Member

mli commented Feb 1, 2020

Job PR-1080/26 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/26/index.html

@xinyu-intel
Member Author

@zburning Thanks.

There might be accuracy variance due to different SW/HW configurations.

| Dataset | Model | FP32 EM | INT8 EM | FP32 F1 | INT8 F1 |
|---|---|---|---|---|---|
| SQuAD 1.1 | bert_12_768_12 | 81.18 | 80.32 | 88.58 | 88.10 |
| MRPC | bert_12_768_12 | 87.01 | 87.01 | 90.97 | 90.88 |
| SST | bert_12_768_12 | 93.23 | 93.00 | | |

cc @TaoLv @eric-haibin-lin

@mli
Member

mli commented Feb 2, 2020

Job PR-1080/27 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/27/index.html

Member

@eric-haibin-lin eric-haibin-lin left a comment

@xinyu-intel The finetune_xx scripts are getting fat and contain a lot of code. I'm wondering whether we could have quantize_classifier.py and quantize_squad.py so that the target users can easily find them. We would probably need to duplicate the dataset preprocessing function. What do you think?
Also, for tutorials, I'm thinking about creating multiple deployment tutorials, with INT8 quantization as one of them. The current quantization tutorial could actually stand alone: we would just download a trained model from S3 and start quantizing. If needed, I can help upload the model.

docs/examples/sentence_embedding/bert.md Outdated Show resolved Hide resolved
scripts/bert/index.rst Outdated Show resolved Hide resolved
scripts/bert/index.rst Outdated Show resolved Hide resolved
src/gluonnlp/calibration/collector.py Outdated Show resolved Hide resolved
@TaoLv
Member

TaoLv commented Feb 2, 2020

@eric-haibin-lin Let's focus on the feature itself in this PR so we can make sure to catch the 0.9 release as an experimental feature. We can refactor the scripts and tutorials in follow-up PRs, and it will be better if we first get some feedback from users about the quantization workflow. What do you think?

@mli
Member

mli commented Feb 2, 2020

Job PR-1080/28 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1080/28/index.html

@leezu leezu merged commit 645fe30 into dmlc:master Feb 3, 2020