
Quantization : Converting Yolo3 from fp32 to int8 : output always -1 #1046

Closed
djaym7 opened this issue Nov 19, 2019 · 28 comments

djaym7 commented Nov 19, 2019

import mxnet as mx
import gluoncv as gcv
from mxnet.contrib.quantization import quantize_model #,quantize_net
import logging
net = gcv.model_zoo.yolo3_darknet53_voc(pretrained=True,ctx=mx.gpu())

ctx = [mx.gpu(i) for i in range(mx.context.num_gpus())]
net.hybridize()
_=net(mx.nd.random.randn(1,3,608,608,ctx=mx.gpu()))
net.export('yolo')

def save_symbol(fname, sym, logger=None):
    if logger is not None:
        logger.info('Saving symbol into file at %s' % fname)
    sym.save(fname)

def save_params(fname, arg_params, aux_params, logger=None):
    if logger is not None:
        logger.info('Saving params into file at %s' % fname)
    save_dict = {('arg:%s' % k): v.as_in_context(mx.gpu()) for k, v in arg_params.items()}
    save_dict.update({('aux:%s' % k): v.as_in_context(mx.gpu()) for k, v in aux_params.items()})
    mx.nd.save(fname, save_dict)

sym, arg_params, aux_params = mx.model.load_checkpoint('yolo',0)

qsym, qarg_params, qaux_params = quantize_model(sym=sym, arg_params=arg_params, aux_params=aux_params,
                                                ctx=mx.gpu(),
                                                excluded_sym_names=['yolov30_yolooutputv32_conv0_fwd',
                                                                    'yolov30_yolooutputv31_conv0_fwd',
                                                                    'darknetv30_conv0_fwd',
                                                                    'yolov30_yolooutputv30_conv0_fwd'],
                                                calib_mode=None, quantized_dtype='uint8',
                                                logger=logging)
save_symbol('yolo-int8-symbol.json',qsym)
save_params('yolo-int8-0000.params',qarg_params,aux_params)

#####################################################
The results are completely different for net(x) and mod.forward(x).
Check the output for:
data1 = mx.nd.random.randn(1,3,608,608,ctx = ctx[0])
mod = mx.mod.Module.load("yolo-int8",0)
mod.bind(for_training=False, data_shapes=[('data', (1,3,608,608))],
         label_shapes=mod._label_shapes)
mod.load_params('yolo-int8-0000.params')
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])
mod.forward(Batch([data1]))
mod.get_outputs()
data1 = mx.nd.random.randn(1,3,608,608,ctx = ctx[0])
mod.forward(Batch([data1]))
mody.forward(Batch([data1]))

# mody is the same as mod but loaded from the fp32 yolo export; it gives the same output as net(x), which is expected
net(data1)[0][0][:5],mod.get_outputs()[0][0][:5],mody.get_outputs()[0][0][:5]

OUT:
(
[[19.]
[-1.]
[-1.]
[-1.]
[-1.]]
<NDArray 5x1 @gpu(0)>,
[[-1.]
[-1.]
[-1.]
[-1.]
[-1.]]
<NDArray 5x1 @cpu(0)>,
[[19.]
[-1.]
[-1.]
[-1.]
[-1.]]
<NDArray 5x1 @cpu(0)>)


The quantized model always gives -1s for the classes. Also, is there any source where I can learn how to do this conversion manually? (I'm tired of waiting on the small number of devs working on this; also, if there is a Slack channel please add me. I work at Amazon and have many issues with mxnet/onnx/quantization/tensorrt/...)

djaym7 (Author) commented Nov 19, 2019

For mask_rcnn_resnet50_v1b_coco:
#instance
sym, arg_params, aux_params = mx.model.load_checkpoint('instance',0)

qsym, qarg_params, qaux_params = quantize_model(sym=sym, arg_params=arg_params, aux_params=aux_params,
                                                ctx=mx.gpu(),
                                                excluded_sym_names=['resnetv1b_conv0_fwd',
                                                                    #'maskrcnn1_rpn0_conv1_fwd',
                                                                    'maskrcnn0_rpn0_conv1_fwd'],
                                                calib_mode=None, quantized_dtype='uint8',
                                                logger=logging)
save_symbol('instance-int8-symbol.json',qsym)
save_params('instance-int8-0000.params',qarg_params,aux_params)

data1 = mx.nd.random.randn(1,3,608,608,ctx = ctx[0])

mod = mx.mod.Module.load("instance-int8",0)

mod.bind(for_training=False, data_shapes=[('data', (1,3,608,608))],
         label_shapes=mod._label_shapes)

mod.load_params('instance-int8-0000.params')

from collections import namedtuple
Batch = namedtuple('Batch', ['data'])
mod.forward(Batch([data1]))
mod.get_outputs()

OUT: MXNetError: [01:36:19] src/operator/quantization/mkldnn/mkldnn_quantized_conv.cc:41: Check failed: in_data[0].dtype() == mshadow::kUint8 (5 vs. 3) : mkldnn_quantized_conv op only supports uint8 as input type
Stack trace: ...

The dtype is uint8, so I don't understand how this is happening. When a layer is excluded, how is it supposed to compute in MKL-DNN int8 when that layer is fp32?
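As a side note, the excluded layers stay in fp32 and the quantization pass inserts quantize/dequantize nodes at the fp32/int8 boundaries. A rough way to see those boundaries in the saved graph (my own check, not from the thread, assuming the instance-int8-symbol.json file written above):

# Not from the thread: list the quantize/dequantize nodes that were inserted
# around the excluded (fp32) layers in the saved quantized graph.
import mxnet as mx

qsym = mx.sym.load('instance-int8-symbol.json')
boundary_nodes = [name for name in qsym.get_internals().list_outputs()
                  if 'quantize' in name or 'dequantize' in name]
print('\n'.join(boundary_nodes))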

djaym7 (Author) commented Nov 19, 2019

@pengzhao-intel, @xinyu-intel, can you help?

Jerryzcn (Contributor) commented:
Are you trying to run it on GPU? I think it only works on CPU.

djaym7 (Author) commented Nov 20, 2019

net is running on GPU (p3.16x); mod and mody are running on CPU. Here is better-formatted code:

Untitled (1).py.txt

djaym7 (Author) commented Nov 20, 2019

UPDATE: I tried adding the concat and reshape layers to the excluded symbol names and still get the same output.

pengzhao-intel commented:
@wuxun-zhang will take a look at this issue.
@Jerryzcn feel free to ping us on WeChat; we don't track GluonCV issues frequently.

pengzhao-intel commented:
@djaym7 could you try a CPU device, e.g. a C5.12xlarge? Currently, the GPU quantization solution doesn't work very well.

xinyu-intel (Member) commented:
@djaym7 sorry for the late response. The GPU quantization solution doesn't work very well, so you can try our CPU quantization solution instead. You can try my open PR #1004 for yolov3 quantization. If you want to quantize a model manually, you can refer to https://mxnet.incubator.apache.org/api/python/docs/tutorials/performance/backend/mkldnn/mkldnn_quantization
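For reference, a rough sketch of the CPU path with quantize_model (not the author's code and not PR #1004): it reuses the yolo checkpoint exported in the first comment, targets mx.cpu(), and uses naive calibration over a random NDArrayIter that merely stands in for real calibration images. The calib_data / num_calib_examples / label_names keyword names are assumed to match the MKL-DNN quantization tutorial linked above.

# Sketch only, under the assumptions stated above.
import logging
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

sym, arg_params, aux_params = mx.model.load_checkpoint('yolo', 0)

# Stand-in calibration data: replace with real preprocessed images.
calib_data = mx.io.NDArrayIter(data=mx.nd.random.uniform(0, 1, (8, 3, 608, 608)),
                               batch_size=1)

qsym, qarg_params, qaux_params = quantize_model(
    sym=sym, arg_params=arg_params, aux_params=aux_params,
    ctx=mx.cpu(),                              # MKL-DNN int8 kernels run on CPU
    excluded_sym_names=['yolov30_yolooutputv30_conv0_fwd',
                        'yolov30_yolooutputv31_conv0_fwd',
                        'yolov30_yolooutputv32_conv0_fwd',
                        'darknetv30_conv0_fwd'],
    label_names=None,                          # the detector symbol has no label input
    calib_mode='naive', calib_data=calib_data, num_calib_examples=8,
    quantized_dtype='auto', logger=logging)

# Reuse the save_symbol / save_params helpers defined in the first comment.
save_symbol('yolo-int8-cpu-symbol.json', qsym)
save_params('yolo-int8-cpu-0000.params', qarg_params, qaux_params)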

djaym7 (Author) commented Nov 20, 2019

MXNet versions 1.4.1cu101mkl, 1.5.1mkl and 1.5.0mkl: I tried all three but can't find quantize_net.
I copied the quantization.py containing quantize_net from https://mxnet.apache.org/api/python/docs/_modules/mxnet/contrib/quantization.html#quantize_net

and every time the kernel dies. Is there a specific git commit of mxnet that you are running for testing #1004 @xinyu-intel?

xinyu-intel (Member) commented:
@djaym7 pip install --pre --upgrade xxx

djaym7 (Author) commented Nov 21, 2019

dtype='auto' does not change any dtype in the params,
dtype='int8' kills the kernel,
dtype='uint8' does not change any dtype in the params and the file size stays the same -- check the image

image
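For reference, a quick check (mine, not from the thread) of which dtypes actually end up in the saved parameter file, using the yolo-int8-0000.params name from above:

# Not from the thread: print the dtype of every array stored in the params file.
import mxnet as mx

params = mx.nd.load('yolo-int8-0000.params')
for name, arr in sorted(params.items()):
    print(name, arr.dtype, arr.shape)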

xinyu-intel (Member) commented:
@djaym7 yes, the params will still be saved in fp32 format; they are quantized in the first iteration and then cached in memory. So, to find out whether you are running into the int8 kernels, you can !export MKLDNN_VERBOSE=1 before running inference and check whether int8 conv kernels show up.
BTW, why does the int8 dtype die?

djaym7 (Author) commented Nov 21, 2019

The int8 problem was solved by reducing the batch size (it was probably running out of memory, though 'auto' and 'uint8' worked).

When trying to run a forward pass, it gives the following error:
ValueError: The argument structure of HybridBlock does not match the cached version. Stored format = [0], input format = [0, 0]

And when trying to run the forward pass in quantize_net with calib='naive', it throws:
ValueError: You created Module with Module(..., data_names=['data', 'data0']) but input with name 'data' is not found in symbol.list_arguments(). Did you mean one of: .. (both errors are shown in the images below)

image

image

xinyu-intel (Member) commented:
@djaym7 Have you tried my patch?

djaym7 (Author) commented Nov 26, 2019

1. Tried the yolo3.py from the patch and int8 works 👍 (but the uint8 kernel dies even with batch size = 1).

2. I still don't see the dtype in the params changing after inference (as you said, they are cached in memory). Is there any way to save the params in int8 format to reduce size and memory footprint? (That is the reason I want to quantize.)

xinyu-intel (Member) commented:
export MKLDNN_VERBOSE=1 before launching inference to see if it runs into int8 kernels.

djaym7 (Author) commented Nov 26, 2019

Yes, I did the export after your initial comment, but I can't see any logs/verbose output.
image

djaym7 (Author) commented Dec 4, 2019

@xinyu-intel is there anything I am doing wrong here?

xinyu-intel (Member) commented:
@djaym7 you should use %env MKLDNN_VERBOSE=1 instead of !export ... in a Jupyter notebook.
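As a side note (not from the thread): %env sets the variable in the notebook kernel's own process, whereas a ! command only affects a temporary subshell. A plain-Python equivalent, ideally done before importing mxnet, is:

# Not from the thread: set the MKL-DNN verbose flag from Python itself,
# before importing mxnet, so the library picks it up.
import os
os.environ['MKLDNN_VERBOSE'] = '1'

import mxnet as mx  # int8 convolutions executed after this should log verbose lines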

xinyu-intel (Member) commented:
@djaym7 any updates here?

djaym7 (Author) commented Dec 10, 2019

Update: I ran it in the console as a .py file and it shows the verbose output.
In: f32, out: f32, and the values are float... and the params are still the same size. So it is not running into int8 @xinyu-intel
image

xinyu-intel (Member) commented:
@djaym7 I saw s8u8s32 convolutions in your verbose output, so it does run into int8. Regarding the param size: the current design converts convolution params to int8 during the first inference loop and caches them in memory, while fully-connected params are saved as int8 directly.

djaym7 (Author) commented Dec 11, 2019

@xinyu-intel fully-connected params (fully connected layers?)?
Running in int8 makes it faster, but if the device doesn't have enough memory to load the fp32 params in the first place, it is useless. Isn't that a bad design? int8 is mostly used for edge devices with low memory.

xinyu-intel (Member) commented:
Currently, our int8 solution targets cloud computing platforms, which usually have enough memory. We may evaluate and re-design this part later...

djaym7 (Author) commented Dec 11, 2019

Thanks for the info. Is there a way to use the older quantize_model method (which didn't have excluded operators) for this, since it saved the params in int8? @xinyu-intel

djaym7 (Author) commented Dec 12, 2019

Is there a way I can add a feature request for saving int8 params?

xinyu-intel (Member) commented:
Yes, you can request this feature by opening an MXNet issue. We are planning this feature.

djaym7 (Author) commented Dec 12, 2019

Awesome, thanks.

djaym7 closed this as completed Dec 12, 2019