[FEATURE] INT8 Quantization for BERT Sentence Classification and Question Answering #1080
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1080 +/- ##
==========================================
- Coverage 87.29% 85.98% -1.31%
==========================================
Files 69 69
Lines 6343 6349 +6
==========================================
- Hits 5537 5459 -78
- Misses 806 890 +84
env/docker/py3.yml
@@ -32,7 +32,7 @@ dependencies:
   - flaky==3.6.1
   - flake8==3.7.9
   - mock<3
-  - https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-29/dist/mxnet_cu100-1.6.0b20191229-py2.py3-none-manylinux1_x86_64.whl
+  - https://lllausen-data.s3.amazonaws.com/mxnet_cu100-1.6.0b20191231-py2.py3-none-manylinux1_x86_64.whl
Why this change in this PR?
Please merge master due to #1096.
@leezu OK, because we need the new API from apache/mxnet#17161.
I'm removing the release-focus label because this PR depends on MXNet 1.7 and thus can't be included in the GluonNLP 0.9 release. We can merge it soon after the release. Thanks @xinyu-intel for the contribution!
What are the pending/work-in-progress items for this PR?
You can annotate the test. For example, in this PR we skip some tests based on the day of the week: @pytest.mark.skipif(datetime.date.today().weekday() != 0)
https://github.com/dmlc/gluon-nlp/pull/1126/files
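Concretely, that pattern looks like the sketch below (the test name is illustrative; note that pytest requires an explicit reason string when skipif is given a plain boolean condition):

```python
import datetime

import pytest

# Run this slow test only on Mondays (weekday() == 0); pytest requires
# a reason string when skipif is given a boolean condition.
@pytest.mark.skipif(datetime.date.today().weekday() != 0,
                    reason='Slow integration test; runs weekly on Mondays.')
def test_weekly_integration():  # illustrative test name
    assert True
```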
Could you add a test case that runs the script in deploy mode and verifies it works as expected?
There are currently other PRs that plan to modify the same scripts modified by this PR, and there's a chance of introducing a regression if the script is not tested automatically.
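A sketch of such a smoke test, assuming the script gains a deploy flag; the flags and parameter path below are hypothetical placeholders, not the script's actual CLI:

```python
import subprocess
import sys

def test_finetune_squad_deploy_mode():
    # '--deploy' and '--model_parameters' are hypothetical placeholders
    # for whatever flags the script actually exposes.
    result = subprocess.run(
        [sys.executable, 'scripts/bert/finetune_squad.py',
         '--deploy', '--model_parameters', './output_dir/net.params'],
        capture_output=True, text=True)
    assert result.returncode == 0, result.stderr
```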
@xinyu-intel is this PR ready for review?
@eric-haibin-lin Sure, I'll remeasure the INT8 accuracy and add the numbers to the page https://gluon-nlp.mxnet.io/model_zoo/bert/index.html
Measured FP32 accuracy with MXNet 1.6rc2 and INT8 accuracy with the MXNet nightly build.
Will remeasure with @zburning's params later.
Hi, I think it should be EM 87.01 / F1 90.97?
@zburning Thanks. There might be accuracy variance due to different SW/HW configurations.
@xinyu-intel The finetune_xx scripts are getting fat and contain lots of code. I'm wondering whether we could have quantize_classifier.py and quantize_squad.py so that the target users can easily find them. We would probably need to duplicate the dataset preprocessing function. What do you think?
Also, for the tutorial, I am thinking about creating multiple deployment tutorials, with INT8 quantization as one of them. The current quantization tutorial can actually be a standalone tutorial: in it, we can just download a trained model from S3 and start doing quantization. If needed, I can help upload the model.
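A minimal sketch of what such a standalone tutorial flow might look like, assuming MXNet's contrib quantization API (mx.contrib.quantization.quantize_net, whose keyword names may differ slightly across MXNet versions); the file names and dummy calibration data are placeholders, not part of this PR:

```python
import mxnet as mx
from mxnet.contrib import quantization

ctx = mx.cpu()

# Load a previously fine-tuned and exported BERT classifier
# (hypothetical file names).
net = mx.gluon.nn.SymbolBlock.imports(
    'bert_sst-symbol.json', ['data0', 'data1', 'data2'],
    'bert_sst-0000.params', ctx=ctx)

# Tiny dummy calibration set standing in for real dev data:
# (input_ids, segment_ids, valid_length), 8 samples of seq len 128.
dataset = mx.gluon.data.ArrayDataset(
    mx.nd.ones((8, 128)), mx.nd.zeros((8, 128)), mx.nd.full((8,), 128))
calib_data = mx.gluon.data.DataLoader(dataset, batch_size=4)

# 'naive' calibration collects per-layer min/max over the calibration batches.
quantized_net = quantization.quantize_net(
    net, quantized_dtype='auto', calib_data=calib_data,
    calib_mode='naive', num_calib_examples=8, ctx=ctx)
quantized_net.export('bert_sst_int8', epoch=0)
```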
@eric-haibin-lin Let's focus on the feature itself in this PR so we can make sure to catch the 0.9 release as an experimental feature. We can refactor the scripts and tutorials in follow-up PRs; it would be better if we get some feedback from users about the quantization workflow first. What do you think?
Description
Quantization solution for BERT sentence classification (SC) and question answering (QA) with Intel DL Boost.
Main Code Changes:
- Pass the model inputs as (data0=input_ids, data1=segment_ids, data2=valid_length).
- Add BertLayerCollector to support output clipping during calibration. By default, we clip the max_range of the GeLU output to 10 and the min_range of the layer_norm output to -50 (see the sketch after this list).
- Add quantization/calibration support to finetune_classification.py.
- Add quantization/calibration support to finetune_squad.py.
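For reference, a rough, self-contained sketch of the clipping idea behind such a collector; the real BertLayerCollector hooks into MXNet's calibration machinery, which is omitted here, and the layer-name matching below is an assumption:

```python
class ClippingMinMaxCollector:
    """Simplified stand-in for BertLayerCollector: records per-layer output
    min/max during calibration, clipping the ranges of selected layers."""

    def __init__(self, clip_min=-50, clip_max=10):
        self.clip_min = clip_min  # floor for layer_norm output ranges
        self.clip_max = clip_max  # ceiling for GeLU output ranges
        self.min_max_dict = {}    # layer name -> (min, max) thresholds

    def collect(self, name, arr):
        """Called once per layer output `arr` (an MXNet NDArray) during a
        calibration forward pass."""
        data = arr.asnumpy()
        cur_min, cur_max = float(data.min()), float(data.max())
        # Clip outliers so that rare extreme activations do not blow up
        # the quantization range (values are the PR's defaults).
        if 'gelu' in name:
            cur_max = min(cur_max, self.clip_max)
        if 'layernorm' in name:
            cur_min = max(cur_min, self.clip_min)
        prev_min, prev_max = self.min_max_dict.get(name, (cur_min, cur_max))
        self.min_max_dict[name] = (min(prev_min, cur_min),
                                   max(prev_max, cur_max))
```

The collected (min, max) pairs would then serve as the quantization thresholds for the corresponding layers.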
Dependency:
apache/mxnet#17161
apache/mxnet#17187
#1091
#1127
#1124
...
FP32 and INT8 Accuracy:
Will remeasure on C5 instances when the pending PRs are ready.
@pengzhao-intel @TaoLv @eric-haibin-lin @szha