Quantization: Converting Yolo3 from fp32 to int8: output always -1 #1046
For `mask_rcnn_resnet50_v1b_coco`:

```python
# the quantize_model and bind calls were truncated in the original post
qsym, qarg_params, qaux_params = quantize_model(sym=sym, arg_params=arg_params,
                                                aux_params=aux_params)
data1 = mx.nd.random.randn(1, 3, 608, 608, ctx=ctx[0])
mod = mx.mod.Module.load("instance-int8", 0)
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 608, 608))])
mod.load_params('instance-int8-0000.params')
from collections import namedtuple
```

OUT:

```
MXNetError: [01:36:19] src/operator/quantization/mkldnn/mkldnn_quantized_conv.cc:41: Check failed: in_data[0].dtype() == mshadow::kUint8 (5 vs. 3) : mkldnn_quantized_conv op only supports uint8 as input type
```

The dtype is uint8; I don't understand how this is working. When any layer is excluded, how is it supposed to compute in MKL-DNN int8 when it is fp32?
@pengzhao-intel, @xinyu-intel help?
Are you trying to run it on GPU? I think it only works on CPU.
The net is running on a GPU (p3.16x); mod and mody are running on CPU. Here is better-formatted code (see the full listing further down):
UPDATE: Tried adding the concat and reshape layers to the excluded symbol names, and still got the same output.
@wuxun-zhang to take a look at this issue.
@djaym7 could you try a CPU device on a C5.12xlarge? Currently, the GPU quantization solution doesn't work very well.
@djaym7 sorry for the late response. The GPU quantization solution doesn't work very well; you can try our CPU quantization solution instead. You can try my open PR #1004 for yolov3 quantization. If you want to quantize a model manually, you can refer to https://mxnet.incubator.apache.org/api/python/docs/tutorials/performance/backend/mkldnn/mkldnn_quantization
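For reference, the Gluon flow from that tutorial looks roughly like the sketch below. This is a hedged sketch, not the exact PR code: it assumes a build where `quantize_net` is available (per the later comments, only newer/nightly builds have it), and the keyword arguments follow the linked tutorial and may differ between versions.

```python
import mxnet as mx
import gluoncv as gcv
from mxnet.contrib.quantization import quantize_net  # not present in the 1.4/1.5 releases

# The MKL-DNN int8 path is CPU-only, so build the fp32 net on CPU
net = gcv.model_zoo.yolo3_darknet53_voc(pretrained=True, ctx=mx.cpu())
net.hybridize(static_alloc=True, static_shape=True)

# A small calibration set; real images would give better scales than random data
calib_data = mx.gluon.data.DataLoader(
    mx.gluon.data.ArrayDataset(mx.nd.random.uniform(shape=(8, 3, 608, 608))),
    batch_size=1)

# quantize_net inserts the quantize/dequantize ops and calibrates in one call
qnet = quantize_net(net, quantized_dtype='auto',
                    calib_mode='naive', calib_data=calib_data,
                    num_calib_examples=8, ctx=mx.cpu())
qnet.export('yolo3-int8', 0)
```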
Tried MXNet versions 1.4.1cu101mkl, 1.5.1mkl and 1.5.0mkl, but I can't find `quantize_net` in any of them, and every time the kernel dies. Is there a specific git commit of mxnet that you are running for testing #1004, @xinyu-intel?
@djaym7 `pip install --pre --upgrade xxx`
@djaym7 yes, params will still be saved in fp32 format; they are quantized in the first iteration and then cached in memory. So, to find out whether you are running into the int8 kernel, you can check the MKL-DNN verbose log.
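A minimal sketch of that check, assuming an MKL-DNN-enabled MXNet build:

```python
import os
# Set this before importing mxnet so MKL-DNN picks it up when it loads
os.environ['MKLDNN_VERBOSE'] = '1'

import mxnet as mx  # noqa: E402

# ... load the quantized module and run one forward pass as in the code
# further down; MKL-DNN then prints one line per executed primitive to
# stdout. int8 convolutions carry int8 markers in those lines (e.g.
# jit_int8, or the s8u8s32 gemm mentioned in the reply below), while
# fp32 kernels show plain f32/jit entries instead.
```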
The int8 problem was solved by reducing the batch size (it was probably running out of memory, though 'auto' worked and uint8 worked). When trying to run a forward pass, it gives the following error: And when trying to run forward mode in quantize_net with calib='naive' it throws:
1. Tried the yolo3.py from the path and int8 works 👍 (but the uint8 kernel dies even with batch size = 1).
@xinyu-intel is there anything I am doing wrong here?
@djaym7 you should use
@djaym7 any updates here?
Update: ran it in the console as a .py script and it shows the verbose output.
@djaym7 I saw an s8u8s32 convolution in your verbose log... so it does run the int8 kernel. Considering the param size, the current design is to convert convolution params to int8 during the first inference loop and cache them in memory, while FullyConnected params are saved as int8 directly.
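In practice this means the first forward pass pays the weight-quantization cost and later passes reuse the cached int8 weights. A minimal warm-up sketch (assuming `mod` is the quantized Module loaded as in the code at the bottom of this issue):

```python
import time
from collections import namedtuple

import mxnet as mx

Batch = namedtuple('Batch', ['data'])
data = mx.nd.random.randn(1, 3, 608, 608)

# mod: the quantized mx.mod.Module, loaded as in the issue body below.
# First pass: fp32 convolution weights are quantized to int8 and cached
mod.forward(Batch([data]))
mod.get_outputs()[0].wait_to_read()

# Later passes reuse the cached int8 weights, so time these instead
tic = time.time()
mod.forward(Batch([data]))
mod.get_outputs()[0].wait_to_read()
print('int8 forward: %.1f ms' % ((time.time() - tic) * 1000))
```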
@xinyu-intel by fully connect params, do you mean the fully connected layers?
Currently, our int8 solution targets cloud computing platforms, which should have enough memory. We may evaluate and re-design this part later...
Thanks for the info. Is there a way the older `quantize_model` method (which didn't have exclude operators) can be used for this, since it saved params in int8? @xinyu-intel
Is there a way I can add a feature request for saving int8 params?
Yes, you can request this feature through an MXNet issue. We have been planning this feature recently.
Awesome, thanks.
```python
import mxnet as mx
import gluoncv as gcv
from mxnet.contrib.quantization import quantize_model  # , quantize_net
import logging

net = gcv.model_zoo.yolo3_darknet53_voc(pretrained=True, ctx=mx.gpu())
ctx = [mx.gpu(i) for i in range(mx.context.num_gpus())]
net.hybridize()
_ = net(mx.nd.random.randn(1, 3, 608, 608, ctx=mx.gpu()))
net.export('yolo')

def save_symbol(fname, sym, logger=None):
    if logger is not None:
        logger.info('Saving symbol into file at %s' % fname)
    sym.save(fname)

def save_params(fname, arg_params, aux_params, logger=None):
    if logger is not None:
        logger.info('Saving params into file at %s' % fname)
    save_dict = {('arg:%s' % k): v.as_in_context(mx.gpu()) for k, v in arg_params.items()}
    save_dict.update({('aux:%s' % k): v.as_in_context(mx.gpu()) for k, v in aux_params.items()})
    mx.nd.save(fname, save_dict)

sym, arg_params, aux_params = mx.model.load_checkpoint('yolo', 0)
qsym, qarg_params, qaux_params = quantize_model(sym=sym, arg_params=arg_params, aux_params=aux_params,
                                                ctx=mx.gpu(),
                                                excluded_sym_names=['yolov30_yolooutputv32_conv0_fwd',
                                                                    'yolov30_yolooutputv31_conv0_fwd',
                                                                    'darknetv30_conv0_fwd',
                                                                    'yolov30_yolooutputv30_conv0_fwd'],
                                                calib_mode=None, quantized_dtype='uint8',
                                                logger=logging)
save_symbol('yolo-int8-symbol.json', qsym)
# save the aux params returned by quantize_model
save_params('yolo-int8-0000.params', qarg_params, qaux_params)
```
**The results are completely different for `net(x)` and `mod.forward(x)`.** Check the output for:
```python
data1 = mx.nd.random.randn(1, 3, 608, 608, ctx=ctx[0])
mod = mx.mod.Module.load("yolo-int8", 0)
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 608, 608))],
         label_shapes=mod._label_shapes)
mod.load_params('yolo-int8-0000.params')

from collections import namedtuple
Batch = namedtuple('Batch', ['data'])
mod.forward(Batch([data1]))
mod.get_outputs()

data1 = mx.nd.random.randn(1, 3, 608, 608, ctx=ctx[0])
mod.forward(Batch([data1]))
# mody is the same as mod but for the fp32 yolo; it gives the same output as net(x), as expected
mody.forward(Batch([data1]))
net(data1)[0][0][:5], mod.get_outputs()[0][0][:5], mody.get_outputs()[0][0][:5]
```
OUT:

```
(
 [[19.]
  [-1.]
  [-1.]
  [-1.]
  [-1.]]
 <NDArray 5x1 @gpu(0)>,
 [[-1.]
  [-1.]
  [-1.]
  [-1.]
  [-1.]]
 <NDArray 5x1 @cpu(0)>,
 [[19.]
  [-1.]
  [-1.]
  [-1.]
  [-1.]]
 <NDArray 5x1 @cpu(0)>)
```
The quantized model always gives -1s for the classes. Also, is there any source where I can learn how to do this conversion manually? (I'm sick of waiting on the small number of devs working on this; also, if there is any Slack channel, please add me. I work at Amazon and have so many issues with mxnet/onnx/quantization/tensorrt/...)