
MXNet 1.5.0 is slower than 1.3.0 when inputs are variable #13928

Closed
wkcn opened this issue Jan 18, 2019 · 22 comments

Comments

@wkcn
Member

wkcn commented Jan 18, 2019

Description

Hi! I have an experiment on object counting that requires variable-sized inputs.
I wrote the code with Gluon and hybridized the model with static_alloc=True.
I found an obvious performance difference between MXNet 1.5.0 and MXNet 1.3.0, and I verified it on two servers.

I suspect the memory-allocation strategy for Gluon changed after MXNet 1.3.0.

Thanks!

Update:
When the model contains dilated convolutional layers and the input size varies, the performance drops.
I think it may be related to one of these two PRs: #11742 #12722
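
For reference, a minimal sketch of the setup described above (the layers below are hypothetical, not the actual experiment code): a Gluon model containing a dilated convolution, hybridized with static_alloc=True, and fed batches whose spatial size changes every iteration.

```python
import random
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

ctx = mx.gpu(0)  # assumes a CUDA build of MXNet and at least one GPU

net = nn.HybridSequential()
net.add(nn.Conv2D(64, kernel_size=3, padding=1, activation='relu'))
# the dilated convolution suspected of causing the slowdown
net.add(nn.Conv2D(64, kernel_size=3, padding=2, dilation=2, activation='relu'))
net.initialize(ctx=ctx)
net.hybridize(static_alloc=True)

for _ in range(20):
    h, w = random.randint(300, 512), random.randint(300, 512)
    x = nd.random.uniform(shape=(9, 3, h, w), ctx=ctx)
    net(x).wait_to_read()  # block until the forward pass finishes
```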

Environment info (Required)

OS: Ubuntu 14.04
GPU: Tesla M40 x 4

Minimum reproducible example

I wrote a minimal reproducible example that does not require a dataset.
Code

  • Performance for the test code [a fully convolutional model (VGG16 without FC layers), variable-sized inputs]:
    MXNet 1.5.0: 10 images / sec
    MXNet 1.3.0: 40+ images / sec

The performance is the same on both versions when the input shape is fixed.

Input shape: (9, 3, 300~512, 300~512) in NCHW order
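
A rough way to measure images/sec under the two regimes (the helper below is a hypothetical sketch, not part of the linked gist); passing variable=False reproduces the fixed-shape case where both versions behave the same.

```python
import random
import time
from mxnet import nd

def images_per_sec(net, ctx, iters=50, variable=True):
    # feed (9, 3, H, W) batches; H and W are resampled each iteration when variable=True
    tic = time.time()
    for _ in range(iters):
        h = random.randint(300, 512) if variable else 384
        w = random.randint(300, 512) if variable else 384
        x = nd.random.uniform(shape=(9, 3, h, w), ctx=ctx)
        net(x).wait_to_read()  # synchronize so GPU time is included in the timing
    return 9 * iters / (time.time() - tic)
```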

Package used (Python/R/Scala/Julia):
Python 2.7.12, 3.7.1

MXNet was installed with pip:

# MXNet 1.5.0
pip install mxnet-cu80 --pre
# MXNet 1.3.0
pip install mxnet-cu80==1.3.0

Steps to reproduce

Download the test code.
Run the test code with different versions of MXNet (1.3.0 and 1.5.0).

Performance

I tested several versions of MXNet.

version performance
1.4.0b20181207 slow
1.3.1b20181101 slow
1.3.1b20181010 slow
1.3.1b20181004 fast
1.3.1b20181001 fast

Some pre-built versions don't support CUDA 9.0, so I couldn't test them.
The performance drop happened between the 20181004 and 20181010 builds.

If I change the dilation of the dilated convolutions to 1, the performance returns to normal.
It seems the problem occurs in dilated convolution.
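
A sketch of the observation above (illustrative layer parameters, not taken from the gist): the dilated variant is the one that slows down with variable input sizes, while an ordinary convolution performs normally.

```python
from mxnet.gluon import nn

# slow with variable input sizes on the affected builds (per this issue)
dilated_conv = nn.Conv2D(64, kernel_size=3, padding=2, dilation=2)

# dilation set back to 1 (padding adjusted to keep the output size): performance is normal
plain_conv = nn.Conv2D(64, kernel_size=3, padding=1, dilation=1)
```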

@piyushghai
Contributor

@wkcn Thanks for raising this issue. The performance degradation is indeed concerning.
I'm labelling it so that other community members can have a look at it.

@mxnet-label-bot Add [Gluon, Performance]

@szha Any thoughts here?

@zhreshold
Member

@wkcn
Performance:
MXNet 1.5.0: 20 images / sec
MXNet 1.3.0: 70+ images / sec

What are these numbers specifically? Training speed for Faster R-CNN? If so, what is the network?

@adaaaaaa

adaaaaaa commented Jan 18, 2019

What is the difference between 1.3.0 and 1.5.0 in memory allocation?

@wkcn
Member Author

wkcn commented Jan 18, 2019

@piyushghai Thanks.
@zhreshold In my experiment, it's a fully convolutional network (VGG16 without FC layers) whose input size varies.
The performance I reported is for that fully convolutional network, not a Faster R-CNN model.
I guess the performance of Faster R-CNN also drops in MXNet 1.5.0.
I will check the performance of Faster R-CNN, or write a minimal reproducible example.

@wkcn
Member Author

wkcn commented Jan 19, 2019

@adaaaaaa I don't know. I found the speeds of the two versions are the same when the input shape is fixed.
In my code, I call hybridize() first, then call hybridize(static_alloc=True).
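
In code form, the call order described above (a sketch, where net stands for the hybrid block in question):

```python
net.hybridize()                   # first call with default settings
net.hybridize(static_alloc=True)  # then re-hybridize with static memory allocation
```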

@szha
Member

szha commented Jan 19, 2019

what are the typical input sizes?

@wkcn
Member Author

wkcn commented Jan 19, 2019

@szha
In my experiment, the input size is (9, 3, 300~512, 300~512), where 9 is the batch size and 3 is the number of channels.
I will write a minimal reproducible example later.

@wkcn
Member Author

wkcn commented Jan 19, 2019

@zhreshold @szha
Hello! I have written a minimal reproducible example which doesn't need a dataset.
Code

I tested it on a machine with Tesla M40 (22945 MiB) x 4.
Here is the result:
MXNet 1.5.0: 10 images / sec
MXNet 1.3.0: 40+ images / sec

MXNet was installed via pip install mxnet-cu90 --pre or pip install mxnet-cu90==1.3.0

I tested several versions of MXNet.

version performance
1.4.0b20181207 slow
1.3.1b20181101 slow
1.3.1b20181010 slow
1.3.1b20181004 fast
1.3.1b20181001 fast

Some pre-built versions don't support CUDA 9.0, so I couldn't test them.
The performance drop happened between the 20181004 and 20181010 builds.

@zhreshold
Member

@wkcn I've tested it using V100 x4
There is no visible difference between the 1.3.1 release, 1.4.0b20181207, and the 1.5.0b20190122 nightly; all are around 140 (+-20) images/sec.

Actually, I also tested 1.3.1b20181001, and it is slower (120+-20 images/sec on average) than any of the previous three builds. In summary, my experimental results are the reverse of @wkcn's.

@wkcn
Member Author

wkcn commented Jan 22, 2019

@zhreshold Thank you!

It’s flaky.
I tested it on a server with Ubuntu 14.04, Tesla M40 (24 GB) x 4, and CUDA 9.0.
When I remove all dilated convolutions (convolutions whose dilation is greater than 1), there is no obvious difference between MXNet 1.3 and 1.5.

@wkcn
Member Author

wkcn commented Jan 23, 2019

@zhreshold
I just tested it on the same server with Ubuntu 14.04, Tesla M40 (24 GB) x 4, and CUDA 8.0.
The training speed is 40+ samples/sec.

I think the performance drop is caused by the driver rather than MXNet.
The CUDA 9.0 driver installed on the server does not match the latest MXNet.

@zhreshold
Member

@wkcn

Thanks for the update. Can we resolve this issue?

@wkcn
Member Author

wkcn commented Jan 23, 2019

@zhreshold Solved. Thank you!

@wkcn wkcn closed this as completed Jan 23, 2019
@mikeobr

mikeobr commented Mar 18, 2019

@wkcn

The CUDA 9.0 driver installed on the server does not match the latest MXNet.
What exactly did you check to diagnose this?

I'm currently seeing some of my inference workloads slow down a lot on MXNet versions above 1.3.1 with CUDA 9.2 (running in a Docker container), but I do not know how to check whether it is the same thing you ran into.

@wkcn
Member Author

wkcn commented Mar 18, 2019

@mikeobr You can run this code:
https://gist.githubusercontent.com/wkcn/69f0f6d2ca467816dc481a00c225104f/raw/2899896f42a920ff0fde5ff93b9a16d16aec507f/test_fcn_for_mxnet.py

It seems that the performance of dilated convolutional layers drops with CUDA 9.

@PapaMadeleine2022

PapaMadeleine2022 commented Mar 25, 2019

Hello, I have a problem:
I compare libmxnet.so compiled with MXNet v0.7 against v1.0 (or v1.3 and v1.4) when running my code to infer a batch of images, and I find that the inference speed with the higher MXNet versions is slower than with v0.7. What causes this problem? How can I fix it? Can anyone give some advice?

Environment: P40 / CUDA 8 / cuDNN 5.1.10 / NVIDIA driver 384.81

@wkcn
Member Author

wkcn commented Mar 25, 2019

@IvyGongoogle Are there any dilated convolutional layers in your model?

@vc384

vc384 commented Mar 25, 2019

I met the same problem. I have a project with dilated convolution (ResNet backbone). If I use mxnet-cu80 1.3.1 (pip install), the speed is 0.18-0.19 s per iteration. However, when I switch to mxnet-cu80 1.4.0 (pip install), the speed drops to 0.19-0.20 s per iteration. The drop is slight, but it confuses me.
OS: Ubuntu 16.04
Driver: 384.130
CUDA: 8.0
cuDNN: maybe 7.4.1 or 6.0.21

@wkcn
Member Author

wkcn commented Mar 25, 2019

Could anyone try MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION=1 python test.py?
There are some PRs that may be related to this issue:
#11742 #12722
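
If editing the launch command is inconvenient, the same flag can presumably be set from inside the script before MXNet is imported (a sketch, assuming the variable is read at runtime like other MXNET_* settings):

```python
import os

# must be set before any CUDA convolution kernels are selected
os.environ['MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION'] = '1'

import mxnet as mx
print(mx.__version__)
```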

@PapaMadeleine2022

PapaMadeleine2022 commented Mar 25, 2019

@wkcn There are no dilated convolutional layers in my model, which is an OCR recognition model with a simple CNN and RNN.

@chinakook
Contributor

chinakook commented Mar 27, 2019

Based on experience, you should use newer versions of CUDA and cuDNN to get better performance. In my opinion, CUDA 8.0 is obsolete.
P.S.: Dilated convolution is not optimized in old cuDNN versions (< 6.0, or maybe 6.5).

@wkcn
Member Author

wkcn commented Apr 20, 2019

Closing this since dilated convolution is not optimized in old versions of cuDNN, as chinakook said.

@wkcn wkcn closed this as completed Apr 20, 2019