
Quantization : Converting Yolo3 from fp32 to int8 : output always -1 #1046

Closed
djaym7 opened this issue Nov 19, 2019 · 28 comments

djaym7 commented Nov 19, 2019

import mxnet as mx
import gluoncv as gcv
from mxnet.contrib.quantization import quantize_model #,quantize_net
import logging
net = gcv.model_zoo.yolo3_darknet53_voc(pretrained=True,ctx=mx.gpu())

ctx = [mx.gpu(i) for i in range(mx.context.num_gpus())]
net.hybridize()
_=net(mx.nd.random.randn(1,3,608,608,ctx=mx.gpu()))
net.export('yolo')

def save_symbol(fname, sym, logger=None):
    if logger is not None:
        logger.info('Saving symbol into file at %s' % fname)
    sym.save(fname)

def save_params(fname, arg_params, aux_params, logger=None):
    if logger is not None:
        logger.info('Saving params into file at %s' % fname)
    save_dict = {('arg:%s' % k): v.as_in_context(mx.gpu()) for k, v in arg_params.items()}
    save_dict.update({('aux:%s' % k): v.as_in_context(mx.gpu()) for k, v in aux_params.items()})
    mx.nd.save(fname, save_dict)

sym, arg_params, aux_params = mx.model.load_checkpoint('yolo',0)

qsym, qarg_params, qaux_params = quantize_model(sym=sym, arg_params=arg_params, aux_params=aux_params,
                                                ctx=mx.gpu(),
                                                excluded_sym_names=['yolov30_yolooutputv32_conv0_fwd',
                                                                    'yolov30_yolooutputv31_conv0_fwd',
                                                                    'darknetv30_conv0_fwd',
                                                                    'yolov30_yolooutputv30_conv0_fwd'],
                                                calib_mode=None, quantized_dtype='uint8',
                                                logger=logging)
save_symbol('yolo-int8-symbol.json',qsym)
save_params('yolo-int8-0000.params',qarg_params,aux_params)

#####################################################
The results are completely different for net(x) and mod.forward(x).
Check the output for:
data1 = mx.nd.random.randn(1,3,608,608,ctx = ctx[0])
mod = mx.mod.Module.load("yolo-int8",0)
mod.bind(for_training=False, data_shapes=[('data', (1,3,608,608))],
         label_shapes=mod._label_shapes)
mod.load_params('yolo-int8-0000.params')
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])
mod.forward(Batch([data1]))
mod.get_outputs()
data1 = mx.nd.random.randn(1,3,608,608,ctx = ctx[0])
mod.forward(Batch([data1]))
mody.forward(Batch([data1]))

# mody is the same as mod but loaded from the fp32 yolo export; it gives the same output as net(x), which is expected
net(data1)[0][0][:5],mod.get_outputs()[0][0][:5],mody.get_outputs()[0][0][:5]

OUT:
(
[[19.]
[-1.]
[-1.]
[-1.]
[-1.]]
<NDArray 5x1 @gpu(0)>,
[[-1.]
[-1.]
[-1.]
[-1.]
[-1.]]
<NDArray 5x1 @cpu(0)>,
[[19.]
[-1.]
[-1.]
[-1.]
[-1.]]
<NDArray 5x1 @cpu(0)>)


The quantized model always gives -1s for the classes. Also, is there any source where I can learn how to do this conversion manually? (I'm tired of waiting on the small number of devs working on this; also, if there is a Slack channel please add me. I work at Amazon and have many issues with mxnet/onnx/quantization/tensorrt/...)

djaym7 (Author) commented Nov 19, 2019

For mask_rcnn_resnet50_v1b_coco:
#instance
sym, arg_params, aux_params = mx.model.load_checkpoint('instance',0)

qsym, qarg_params, qaux_params = quantize_model(sym=sym, arg_params=arg_params, aux_params=aux_params,
                                                ctx=mx.gpu(),
                                                excluded_sym_names=['resnetv1b_conv0_fwd',
                                                                    #'maskrcnn1_rpn0_conv1_fwd',
                                                                    'maskrcnn0_rpn0_conv1_fwd'],
                                                calib_mode=None, quantized_dtype='uint8',
                                                logger=logging)
save_symbol('instance-int8-symbol.json',qsym)
save_params('instance-int8-0000.params',qarg_params,aux_params)

data1 = mx.nd.random.randn(1,3,608,608,ctx = ctx[0])

mod = mx.mod.Module.load("instance-int8",0)

mod.bind(for_training=False, data_shapes=[('data', (1,3,608,608))],
         label_shapes=mod._label_shapes)

mod.load_params('instance-int8-0000.params')

from collections import namedtuple
Batch = namedtuple('Batch', ['data'])
mod.forward(Batch([data1]))
mod.get_outputs()

OUT: MXNetError: [01:36:19] src/operator/quantization/mkldnn/mkldnn_quantized_conv.cc:41: Check failed: in_data[0].dtype() == mshadow::kUint8 (5 vs. 3) : mkldnn_quantized_conv op only supports uint8 as input type
Stack trace: ...

The dtype is uint8, so I don't understand how this is happening. When a layer is excluded, how is it supposed to compute in MKL-DNN int8 when that layer is fp32?
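As a side note, the excluded layers stay in fp32 and the quantization pass inserts quantize/dequantize nodes at the fp32/int8 boundaries. A rough way to see those boundaries in the saved graph (my own check, not from the thread, assuming the instance-int8-symbol.json file written above):

# Not from the thread: list the quantize/dequantize nodes that were inserted
# around the excluded (fp32) layers in the saved quantized graph.
import mxnet as mx

qsym = mx.sym.load('instance-int8-symbol.json')
boundary_nodes = [name for name in qsym.get_internals().list_outputs()
                  if 'quantize' in name or 'dequantize' in name]
print('\n'.join(boundary_nodes))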

djaym7 (Author) commented Nov 19, 2019

@pengzhao-intel, @xinyu-intel, can you help?

Jerryzcn (Contributor) commented:
Are you trying to run it on GPU? I think it only works on CPU.

djaym7 (Author) commented Nov 20, 2019

net is running on GPU (p3.16x); mod and mody are running on CPU. Here is better-formatted code:

Untitled (1).py.txt

djaym7 (Author) commented Nov 20, 2019

UPDATE: I tried adding the concat and reshape layers to the excluded symbol names and still get the same output.

pengzhao-intel commented:
@wuxun-zhang will take a look at this issue.
@Jerryzcn feel free to ping us on WeChat; we don't track GluonCV issues frequently.

pengzhao-intel commented:
@djaym7 could you try a CPU device, e.g. a C5.12xlarge? Currently, the GPU quantization solution doesn't work very well.

xinyu-intel (Member) commented:
@djaym7 sorry for the late response. The GPU quantization solution doesn't work very well, so you can try our CPU quantization solution instead. You can try my open PR #1004 for yolov3 quantization. If you want to quantize a model manually, you can refer to https://mxnet.incubator.apache.org/api/python/docs/tutorials/performance/backend/mkldnn/mkldnn_quantization
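For reference, a rough sketch of the CPU path with quantize_model (not the author's code and not PR #1004): it reuses the yolo checkpoint exported in the first comment, targets mx.cpu(), and uses naive calibration over a random NDArrayIter that merely stands in for real calibration images. The calib_data / num_calib_examples / label_names keyword names are assumed to match the MKL-DNN quantization tutorial linked above.

# Sketch only, under the assumptions stated above.
import logging
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

sym, arg_params, aux_params = mx.model.load_checkpoint('yolo', 0)

# Stand-in calibration data: replace with real preprocessed images.
calib_data = mx.io.NDArrayIter(data=mx.nd.random.uniform(0, 1, (8, 3, 608, 608)),
                               batch_size=1)

qsym, qarg_params, qaux_params = quantize_model(
    sym=sym, arg_params=arg_params, aux_params=aux_params,
    ctx=mx.cpu(),                              # MKL-DNN int8 kernels run on CPU
    excluded_sym_names=['yolov30_yolooutputv30_conv0_fwd',
                        'yolov30_yolooutputv31_conv0_fwd',
                        'yolov30_yolooutputv32_conv0_fwd',
                        'darknetv30_conv0_fwd'],
    label_names=None,                          # the detector symbol has no label input
    calib_mode='naive', calib_data=calib_data, num_calib_examples=8,
    quantized_dtype='auto', logger=logging)

# Reuse the save_symbol / save_params helpers defined in the first comment.
save_symbol('yolo-int8-cpu-symbol.json', qsym)
save_params('yolo-int8-cpu-0000.params', qarg_params, qaux_params)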

djaym7 (Author) commented Nov 20, 2019

MXNet versions 1.4.1cu101mkl, 1.5.1mkl and 1.5.0mkl: I tried all three but can't find quantize_net.
I copied the quantization.py containing quantize_net from https://mxnet.apache.org/api/python/docs/_modules/mxnet/contrib/quantization.html#quantize_net

and every time the kernel dies. Is there a specific git commit of mxnet that you are running for testing #1004 @xinyu-intel?

xinyu-intel (Member) commented:
@djaym7 pip install --pre --upgrade xxx

djaym7 (Author) commented Nov 21, 2019

dtype='auto' does not change any dtype in the params,
dtype='int8' kills the kernel,
dtype='uint8' does not change any dtype in the params and the file size stays the same -- check the image

image
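For reference, a quick check (mine, not from the thread) of which dtypes actually end up in the saved parameter file, using the yolo-int8-0000.params name from above:

# Not from the thread: print the dtype of every array stored in the params file.
import mxnet as mx

params = mx.nd.load('yolo-int8-0000.params')
for name, arr in sorted(params.items()):
    print(name, arr.dtype, arr.shape)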

xinyu-intel (Member) commented:
@djaym7 yes, the params will still be saved in fp32 format; they are quantized in the first iteration and then cached in memory. So, to find out whether you are running into the int8 kernels, you can !export MKLDNN_VERBOSE=1 before running inference and check whether int8 conv kernels show up.
BTW, why does the int8 dtype die?

djaym7 (Author) commented Nov 21, 2019

The int8 problem was solved by reducing the batch size (it was probably running out of memory, though 'auto' and 'uint8' worked).

When trying to run a forward pass, it gives the following error:
ValueError: The argument structure of HybridBlock does not match the cached version. Stored format = [0], input format = [0, 0]

And when trying to run the forward pass in quantize_net with calib='naive', it throws:
ValueError: You created Module with Module(..., data_names=['data', 'data0']) but input with name 'data' is not found in symbol.list_arguments(). Did you mean one of: .. (both errors are shown in the images below)

image

image

xinyu-intel (Member) commented:
@djaym7 Have you tried my patch?

djaym7 (Author) commented Nov 26, 2019

1. Tried the yolo3.py from the patch and int8 works 👍 (but the uint8 kernel dies even with batch size = 1).

2. I still don't see the dtype in the params changing after inference (as you said, they are cached in memory). Is there any way to save the params in int8 format to reduce size and memory footprint? (That is the reason I want to quantize.)

xinyu-intel (Member) commented:
export MKLDNN_VERBOSE=1 before launching inference to see if it runs into int8 kernels.

djaym7 (Author) commented Nov 26, 2019

Yes, I did the export after your initial comment, but I can't see any logs/verbose output.
image

djaym7 (Author) commented Dec 4, 2019

@xinyu-intel is there anything I am doing wrong here?

xinyu-intel (Member) commented:
@djaym7 you should use %env MKLDNN_VERBOSE=1 instead of !export ... in a Jupyter notebook.
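As a side note (not from the thread): %env sets the variable in the notebook kernel's own process, whereas a ! command only affects a temporary subshell. A plain-Python equivalent, ideally done before importing mxnet, is:

# Not from the thread: set the MKL-DNN verbose flag from Python itself,
# before importing mxnet, so the library picks it up.
import os
os.environ['MKLDNN_VERBOSE'] = '1'

import mxnet as mx  # int8 convolutions executed after this should log verbose lines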

xinyu-intel (Member) commented:
@djaym7 any updates here?

djaym7 (Author) commented Dec 10, 2019

Update: I ran it in the console as a .py file and it shows the verbose output.
In: f32, out: f32, and the values are float... and the params are still the same size. So it is not running into int8 @xinyu-intel
image

xinyu-intel (Member) commented:
@djaym7 I saw s8u8s32 convolutions in your verbose output, so it does run into int8. Regarding the param size: the current design converts convolution params to int8 during the first inference loop and caches them in memory, while fully-connected params are saved as int8 directly.

djaym7 (Author) commented Dec 11, 2019

@xinyu-intel fully-connected params (fully connected layers?)?
Running in int8 makes it faster, but if the device doesn't have enough memory to load the fp32 params in the first place, it is useless. Isn't that a bad design? int8 is mostly used for edge devices with low memory.

xinyu-intel (Member) commented:
Currently, our int8 solution targets cloud computing platforms, which usually have enough memory. We may evaluate and re-design this part later...

djaym7 (Author) commented Dec 11, 2019

Thanks for the info. Is there a way to use the older quantize_model method (which didn't have excluded operators) for this, since it saved the params in int8? @xinyu-intel

djaym7 (Author) commented Dec 12, 2019

Is there a way I can add a feature request for saving int8 params?

xinyu-intel (Member) commented:
Yes, you can request this feature by opening an MXNet issue. We are planning this feature.

djaym7 (Author) commented Dec 12, 2019

Awesome, thanks.

djaym7 closed this as completed Dec 12, 2019