
[Torch, QNN] Remove FP32 piggy back and use QNN add/mul/concatenate #5061

Merged 2 commits into apache:master on Mar 13, 2020

Conversation

masahi (Member) commented Mar 13, 2020

Previously we fell back to FP32 ops for add/mul/concatenate, because accuracy on mobilenet v2 dropped when we used QNN's add for Torch's quantized::add, and because that is how Torch currently implements some of its quantized ops internally.

But I found that the accuracy loss had a different cause (it turns out that for mobilenet v2 only, the torchvision people trained the model with quantization aware training, and I was doing post training calibration on top of it). Now that the accuracy loss is fixed properly, we don't need to piggyback on FP32 ops the way Torch does. There is no loss of accuracy after this change.
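The two approaches contrasted above can be sketched in plain Python (this is an illustration of the idea, not TVM or Torch code; the scales and zero points are made-up example values):

```python
# Contrast the old "FP32 piggyback" with a QNN-style integer add for
# quantized uint8 tensors. All quantization parameters below are
# hypothetical example values, not from any real model.

def dequantize(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]

def quantize(x, scale, zero_point):
    return [max(0, min(255, round(v / scale) + zero_point)) for v in x]

def add_fp32_fallback(qa, qb, sa, za, sb, zb, so, zo):
    """Old approach: dequantize both inputs, add in FP32, requantize."""
    a = dequantize(qa, sa, za)
    b = dequantize(qb, sb, zb)
    return quantize([x + y for x, y in zip(a, b)], so, zo)

def add_qnn_style(qa, qb, sa, za, sb, zb, so, zo):
    """QNN-style add: requantize each input to the output quantization
    parameters first, then add in the integer domain."""
    ra = quantize(dequantize(qa, sa, za), so, zo)
    rb = quantize(dequantize(qb, sb, zb), so, zo)
    return [max(0, min(255, x + y - zo)) for x, y in zip(ra, rb)]

qa, qb = [10, 128, 200], [30, 128, 60]
out1 = add_fp32_fallback(qa, qb, 0.1, 0, 0.1, 0, 0.2, 0)
out2 = add_qnn_style(qa, qb, 0.1, 0, 0.1, 0, 0.2, 0)
```

On this example both paths produce the same quantized output; in general they can differ by a rounding step, which is why the accuracy regression looked like it came from QNN add when the real culprit was the calibration setup.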

please review @anijain2305
cc @jwfromm @jjohnson-arm

anijain2305 (Contributor) left a comment

LGTM

Review comment on python/tvm/relay/frontend/qnn_torch.py (outdated, resolved)
masahi (Member, Author) commented Mar 13, 2020

@anijain2305 @jwfromm @jjohnson-arm

Here is the current result on mobilenet v2, using QNN add together with post training calibration (the incorrect setup):

Model name: mobilenet_v2, per channel quantization
PyTorch accuracy: Top1 = 67.87, Top5 = 88.15
TVM accuracy: Top1 = 62.47, Top5 = 84.67
PyTorch top5 label: [101 386  51 385  69]
TVM top5 label: [101 386  51 385 340]
PyTorch top5 raw output: [18.233843 16.314491 15.674707 13.115572 12.795679]
TVM top5 raw output: [27.510712 26.231144 21.752655 20.153194 17.274168]
max abs diff: 9.916653
mean abs_diff: 2.0649028
50 in 1000 raw outputs correct.

We lose about 5 points of top-1 accuracy compared to Torch.

And here is the result without post training calibration, again using QNN add. Now the top-1 accuracy is much better and almost the same as Torch. Moreover, the raw output of the network (1000 floating point values) is much closer to Torch: the former run has only 50 of 1000 outputs matching, while this correct one has 376 of 1000.

Model name: mobilenet_v2, per channel quantization
PyTorch accuracy: Top1 = 71.32, Top5 = 89.86
TVM accuracy: Top1 = 71.27, Top5 = 89.86
PyTorch top5 label: [101 386 385  51 340]
TVM top5 label: [101 386 385  51 340]
PyTorch top5 raw output: [20.168097 18.80845  17.222195 13.59647   9.290921]
TVM top5 raw output: [19.941488 18.581842 16.995586 13.823077  9.064313]
max abs diff: 0.9064312
mean abs_diff: 0.17562106
376 in 1000 raw outputs correct.
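A comparison like the one above can be computed directly from the two raw output vectors. This is a hypothetical sketch of such a metric (the exact tolerance used to count an output as "correct" is an assumption, and the vectors here are tiny stand-ins, not the real logits):

```python
# Compare two model output vectors element-wise: max/mean absolute
# difference, plus a count of elements matching within a tolerance.
# The tolerance and the sample data are illustrative assumptions.

def compare_raw_outputs(pt_out, tvm_out, tol=1e-1):
    diffs = [abs(p - t) for p, t in zip(pt_out, tvm_out)]
    max_abs_diff = max(diffs)
    mean_abs_diff = sum(diffs) / len(diffs)
    num_correct = sum(d < tol for d in diffs)
    return max_abs_diff, mean_abs_diff, num_correct

# Tiny made-up stand-ins for the 1000-element logit vectors.
pt_out = [18.2, 16.3, 15.7, 13.1]
tvm_out = [18.25, 16.3, 15.0, 13.1]
mx, mean, n = compare_raw_outputs(pt_out, tvm_out)
```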

@masahi masahi merged commit 4fbc2fb into apache:master Mar 13, 2020
masahi (Member, Author) commented Mar 13, 2020

Thanks @anijain2305

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020