
MXNet 1.5.0 is slower than 1.3.0 when inputs are variable #13928

Closed
wkcn opened this issue Jan 18, 2019 · 22 comments

Comments

@wkcn
Member

wkcn commented Jan 18, 2019

Description

Hi! I have an experiment on object counting that requires variable-sized inputs.
I wrote the code with Gluon and hybridized the model with static_alloc=True.
I found an obvious performance difference between MXNet 1.5.0 and MXNet 1.3.0, and I verified it on two servers.

I suspect the memory-allocation strategy for Gluon changed after MXNet 1.3.0.

Thanks!

Update:
When the model contains dilated convolutional layers and the input size varies, the performance drops.
I think it may be related to one of these two PRs: #11742 #12722
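
For reference, a minimal sketch of the setup described above (the layers below are hypothetical, not the actual experiment code): a Gluon model containing a dilated convolution, hybridized with static_alloc=True, and fed batches whose spatial size changes every iteration.

```python
import random
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

ctx = mx.gpu(0)  # assumes a CUDA build of MXNet and at least one GPU

net = nn.HybridSequential()
net.add(nn.Conv2D(64, kernel_size=3, padding=1, activation='relu'))
# the dilated convolution suspected of causing the slowdown
net.add(nn.Conv2D(64, kernel_size=3, padding=2, dilation=2, activation='relu'))
net.initialize(ctx=ctx)
net.hybridize(static_alloc=True)

for _ in range(20):
    h, w = random.randint(300, 512), random.randint(300, 512)
    x = nd.random.uniform(shape=(9, 3, h, w), ctx=ctx)
    net(x).wait_to_read()  # block until the forward pass finishes
```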

Environment info (Required)

OS: Ubuntu 14.04
GPU: Tesla M40 x 4

Minimum reproducible example

I wrote a minimal reproducible example that does not require a dataset.
Code

  • Performance for the test code [a fully convolutional model (VGG16 without FC layers), variable-sized inputs]:
    MXNet 1.5.0: 10 images / sec
    MXNet 1.3.0: 40+ images / sec

The performance is the same on both versions when the input shape is fixed.

Input shape: (9, 3, 300~512, 300~512) in NCHW order
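
A rough way to measure images/sec under the two regimes (the helper below is a hypothetical sketch, not part of the linked gist); passing variable=False reproduces the fixed-shape case where both versions behave the same.

```python
import random
import time
from mxnet import nd

def images_per_sec(net, ctx, iters=50, variable=True):
    # feed (9, 3, H, W) batches; H and W are resampled each iteration when variable=True
    tic = time.time()
    for _ in range(iters):
        h = random.randint(300, 512) if variable else 384
        w = random.randint(300, 512) if variable else 384
        x = nd.random.uniform(shape=(9, 3, h, w), ctx=ctx)
        net(x).wait_to_read()  # synchronize so GPU time is included in the timing
    return 9 * iters / (time.time() - tic)
```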

Package used (Python/R/Scala/Julia):
Python 2.7.12, 3.7.1

MXNet was installed with pip:

# MXNet 1.5.0
pip install mxnet-cu80 --pre
# MXNet 1.3.0
pip install mxnet-cu80==1.3.0

Steps to reproduce

Download the test code.
Run the test code with different versions of MXNet (1.3.0 and 1.5.0).

Performance

I tested several versions of MXNet.

version performance
1.4.0b20181207 slow
1.3.1b20181101 slow
1.3.1b20181010 slow
1.3.1b20181004 fast
1.3.1b20181001 fast

Some pre-built versions don't support CUDA 9.0, so I couldn't test them.
The performance drop happened between the 20181004 and 20181010 builds.

If I change the dilation of the dilated convolutions to 1, the performance returns to normal.
It seems the problem occurs in dilated convolution.
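
A sketch of the observation above (illustrative layer parameters, not taken from the gist): the dilated variant is the one that slows down with variable input sizes, while an ordinary convolution performs normally.

```python
from mxnet.gluon import nn

# slow with variable input sizes on the affected builds (per this issue)
dilated_conv = nn.Conv2D(64, kernel_size=3, padding=2, dilation=2)

# dilation set back to 1 (padding adjusted to keep the output size): performance is normal
plain_conv = nn.Conv2D(64, kernel_size=3, padding=1, dilation=1)
```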

@piyushghai
Contributor

@wkcn Thanks for raising this issue. The performance degradation is indeed concerning.
I'm labelling it so that other community members can have a look at it.

@mxnet-label-bot Add [Gluon, Performance]

@szha Any thoughts here?

@zhreshold
Member

@wkcn
Performance:
MXNet 1.5.0: 20 images / sec
MXNet 1.3.0: 70+ images / sec

What are these numbers specifically? Training speed for Faster R-CNN? If so, what is the network?

@adaaaaaa

adaaaaaa commented Jan 18, 2019

What is the difference between 1.3.0 and 1.5.0 in memory allocation?

@wkcn
Member Author

wkcn commented Jan 18, 2019

@piyushghai Thanks.
@zhreshold In my experiment, it's a fully convolutional network (VGG16 without FC layers) whose input size varies.
The performance I reported is for that fully convolutional network, not a Faster R-CNN model.
I guess the performance of Faster R-CNN also drops in MXNet 1.5.0.
I will check the performance of Faster R-CNN, or write a minimal reproducible example.

@wkcn
Member Author

wkcn commented Jan 19, 2019

@adaaaaaa I don't know. I found the speeds of the two versions are the same when the input shape is fixed.
In my code, I call hybridize() first, then call hybridize(static_alloc=True).
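
In code form, the call order described above (a sketch, where net stands for the hybrid block in question):

```python
net.hybridize()                   # first call with default settings
net.hybridize(static_alloc=True)  # then re-hybridize with static memory allocation
```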

@szha
Member

szha commented Jan 19, 2019

what are the typical input sizes?

@wkcn
Member Author

wkcn commented Jan 19, 2019

@szha
In my experiment, the input size is (9, 3, 300~512, 300~512), where 9 is the batch size and 3 is the number of channels.
I will write a minimal reproducible example later.

@wkcn
Member Author

wkcn commented Jan 19, 2019

@zhreshold @szha
Hello! I have written a minimal reproducible example which doesn't need a dataset.
Code

I tested it on a machine with Tesla M40 (22945 MiB) x 4.
Here is the result:
MXNet 1.5.0: 10 images / sec
MXNet 1.3.0: 40+ images / sec

MXNet was installed via pip install mxnet-cu90 --pre or pip install mxnet-cu90==1.3.0

I tested several versions of MXNet.

version performance
1.4.0b20181207 slow
1.3.1b20181101 slow
1.3.1b20181010 slow
1.3.1b20181004 fast
1.3.1b20181001 fast

Some pre-built versions don't support CUDA 9.0, so I couldn't test them.
The performance drop happened between the 20181004 and 20181010 builds.

@zhreshold
Member

@wkcn I've tested it using V100 x4
There is no visible difference between the 1.3.1 release, 1.4.0b20181207, and the 1.5.0b20190122 nightly; all are around 140 (+-20) images/sec.

Actually, I also tested 1.3.1b20181001, and it is slower (120+-20 images/sec on average) than any of the previous three builds. In summary, my experimental results are the reverse of @wkcn's.

@wkcn
Member Author

wkcn commented Jan 22, 2019

@zhreshold Thank you!

It’s flaky.
I tested it on a server with Ubuntu 14.04, Tesla M40 (24 GB) x 4, and CUDA 9.0.
When I remove all dilated convolutions (convolutions whose dilation is greater than 1), there is no obvious difference between MXNet 1.3 and 1.5.

@wkcn
Member Author

wkcn commented Jan 23, 2019

@zhreshold
I just tested it on the same server with Ubuntu 14.04, Tesla M40 (24 GB) x 4, and CUDA 8.0.
The training speed is 40+ samples/sec.

I think the performance drop is caused by the driver rather than MXNet.
The CUDA 9.0 driver installed on the server does not match the latest MXNet.

@zhreshold
Member

@wkcn

Thanks for the update. Can we resolve this issue?

@wkcn
Member Author

wkcn commented Jan 23, 2019

@zhreshold Solved. Thank you!

@wkcn wkcn closed this as completed Jan 23, 2019
@mikeobr

mikeobr commented Mar 18, 2019

@wkcn

The CUDA 9.0 driver installed on the server does not match the latest MXNet.
What exactly did you check to diagnose this?

I'm currently seeing some of my inference workloads slow down a lot on MXNet versions above 1.3.1 with CUDA 9.2 (running in a Docker container), but I do not know how to check whether it is the same thing you ran into.

@wkcn
Member Author

wkcn commented Mar 18, 2019

@mikeobr You can run this code:
https://gist.githubusercontent.com/wkcn/69f0f6d2ca467816dc481a00c225104f/raw/2899896f42a920ff0fde5ff93b9a16d16aec507f/test_fcn_for_mxnet.py

It seems that the performance of dilated convolutional layers drops with CUDA 9.

@PapaMadeleine2022

PapaMadeleine2022 commented Mar 25, 2019

Hello, I have a problem:
I compare libmxnet.so compiled with MXNet v0.7 against v1.0 (or v1.3 and v1.4) when running my code to infer a batch of images, and I find that the inference speed with the higher MXNet versions is slower than with v0.7. What causes this problem? How can I fix it? Can anyone give some advice?

Environment: P40 / CUDA 8 / cuDNN 5.1.10 / NVIDIA driver 384.81

@wkcn
Member Author

wkcn commented Mar 25, 2019

@IvyGongoogle Are there any dilated convolutional layers in your model?

@vc384

vc384 commented Mar 25, 2019

I met the same problem. I have a project with dilated convolution (ResNet backbone). If I use mxnet-cu80 1.3.1 (pip install), the speed is 0.18-0.19 s per iteration. However, when I switch to mxnet-cu80 1.4.0 (pip install), the speed drops to 0.19-0.20 s per iteration. The drop is slight, but it confuses me.
OS: Ubuntu 16.04
Driver: 384.130
CUDA: 8.0
cuDNN: maybe 7.4.1 or 6.0.21

@wkcn
Member Author

wkcn commented Mar 25, 2019

Could anyone try MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION=1 python test.py?
There are some PRs that may be related to this issue:
#11742 #12722
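
If editing the launch command is inconvenient, the same flag can presumably be set from inside the script before MXNet is imported (a sketch, assuming the variable is read at runtime like other MXNET_* settings):

```python
import os

# must be set before any CUDA convolution kernels are selected
os.environ['MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION'] = '1'

import mxnet as mx
print(mx.__version__)
```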

@PapaMadeleine2022

PapaMadeleine2022 commented Mar 25, 2019

@wkcn There are no dilated convolutional layers in my model, which is an OCR recognition model with a simple CNN and RNN.

@chinakook
Contributor

chinakook commented Mar 27, 2019

Based on experience, you should use newer versions of CUDA and cuDNN to get better performance. In my opinion, CUDA 8.0 is obsolete.
P.S.: Dilated convolution is not optimized in old cuDNN versions (< 6.0, or maybe 6.5).

@wkcn
Member Author

wkcn commented Apr 20, 2019

Closing this since dilated convolution is not optimized in old versions of cuDNN, as chinakook said.

@wkcn wkcn closed this as completed Apr 20, 2019