This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Use CPUPinned context in ImageRecordIOParser2 #13980

Merged: 5 commits, Jan 25, 2019


yuxihu (Member) commented Jan 24, 2019

This PR fixes a performance regression introduced by #12666.

We observed the following performance regression from 1.3.1 to both 1.4.x and master when training resnet50_v1 (symbolic_fp16) with the MXNet kvstore on a single p3.xlarge instance:

===1.3.1===

INFO:root:Epoch[0] Batch [50]   Speed: 5749.09 samples/sec      accuracy=0.008119
INFO:root:Epoch[0] Batch [100]  Speed: 6180.00 samples/sec      accuracy=0.018252
INFO:root:Epoch[0] Batch [150]  Speed: 6243.93 samples/sec      accuracy=0.026289
INFO:root:Epoch[0] Batch [200]  Speed: 6234.02 samples/sec      accuracy=0.036436
INFO:root:Epoch[0] Batch [250]  Speed: 6135.59 samples/sec      accuracy=0.043975
INFO:root:Epoch[0] Batch [300]  Speed: 6169.00 samples/sec      accuracy=0.053945
INFO:root:Epoch[0] Batch [350]  Speed: 6076.10 samples/sec      accuracy=0.061250
INFO:root:Epoch[0] Batch [400]  Speed: 6134.20 samples/sec      accuracy=0.071016
INFO:root:Epoch[0] Batch [450]  Speed: 6201.94 samples/sec      accuracy=0.076260
INFO:root:Epoch[0] Batch [500]  Speed: 6140.77 samples/sec      accuracy=0.082266
INFO:root:Epoch[0] Batch [550]  Speed: 6149.55 samples/sec      accuracy=0.088701
INFO:root:Epoch[0] Batch [600]  Speed: 6149.39 samples/sec      accuracy=0.088584

===1.4.x===

INFO:root:Epoch[0] Batch [0-50] Speed: 4606.70 samples/sec      accuracy=0.008004
INFO:root:Epoch[0] Batch [50-100]       Speed: 4776.33 samples/sec      accuracy=0.018809
INFO:root:Epoch[0] Batch [100-150]      Speed: 4747.80 samples/sec      accuracy=0.027314
INFO:root:Epoch[0] Batch [150-200]      Speed: 4812.72 samples/sec      accuracy=0.036582
INFO:root:Epoch[0] Batch [200-250]      Speed: 5090.36 samples/sec      accuracy=0.046816
INFO:root:Epoch[0] Batch [250-300]      Speed: 4885.30 samples/sec      accuracy=0.055801
INFO:root:Epoch[0] Batch [300-350]      Speed: 4912.66 samples/sec      accuracy=0.064531
INFO:root:Epoch[0] Batch [350-400]      Speed: 4897.15 samples/sec      accuracy=0.073496
INFO:root:Epoch[0] Batch [400-450]      Speed: 4957.09 samples/sec      accuracy=0.080127
INFO:root:Epoch[0] Batch [450-500]      Speed: 4875.13 samples/sec      accuracy=0.085576
INFO:root:Epoch[0] Batch [500-550]      Speed: 4943.11 samples/sec      accuracy=0.092383
INFO:root:Epoch[0] Batch [550-600]      Speed: 4873.64 samples/sec      accuracy=0.094678

===master===

INFO:root:Epoch[0] Batch [0-50] Speed: 4957.82 samples/sec      accuracy=0.008464
INFO:root:Epoch[0] Batch [50-100]       Speed: 5011.30 samples/sec      accuracy=0.018232
INFO:root:Epoch[0] Batch [100-150]      Speed: 4903.28 samples/sec      accuracy=0.026904
INFO:root:Epoch[0] Batch [150-200]      Speed: 4871.87 samples/sec      accuracy=0.036299
INFO:root:Epoch[0] Batch [200-250]      Speed: 4793.67 samples/sec      accuracy=0.044736
INFO:root:Epoch[0] Batch [250-300]      Speed: 4835.09 samples/sec      accuracy=0.054385
INFO:root:Epoch[0] Batch [300-350]      Speed: 4732.73 samples/sec      accuracy=0.062266
INFO:root:Epoch[0] Batch [350-400]      Speed: 4794.80 samples/sec      accuracy=0.070781
INFO:root:Epoch[0] Batch [400-450]      Speed: 4785.00 samples/sec      accuracy=0.077178
INFO:root:Epoch[0] Batch [450-500]      Speed: 4776.88 samples/sec      accuracy=0.082549
INFO:root:Epoch[0] Batch [500-550]      Speed: 4795.37 samples/sec      accuracy=0.088789
INFO:root:Epoch[0] Batch [550-600]      Speed: 4718.87 samples/sec      accuracy=0.090850

By changing the default context to CPUPinned(0) in ImageRecordIOParser2, this PR brings performance back to a level comparable with 1.3.1 under the same training settings.

In addition, we add a device_id parameter to the ImageRecParserParam struct so that the device id of the CPUPinned context can be set when creating an ImageRecordIter. The newly exposed device_id allows training with a large batch_size using Horovod. Users can also set device_id to -1 to use the CPU(0) context instead, which saves GPU memory.
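As a rough illustration of the device_id behaviour described above, here is a minimal Python sketch. The helper names (parser_context, record_iter_kwargs, make_iter) and the file name train.rec are hypothetical and exist only for this example. The mapping from device_id to context is taken from the description in this PR, not from reading the C++ source. Passing device_id to mx.io.ImageRecordIter assumes an MXNet build that includes this change.

```python
def parser_context(device_id=0):
    """Describe the context the parser allocates batches in,
    following the behaviour stated in this PR description:
    -1 selects CPU(0), any other value selects CPUPinned(device_id)."""
    if device_id == -1:
        return "cpu(0)"
    return "cpu_pinned({})".format(device_id)


def record_iter_kwargs(path_imgrec, batch_size, data_shape, device_id=0):
    """Collect keyword arguments for mx.io.ImageRecordIter.
    path_imgrec, batch_size and data_shape are existing iterator
    parameters; device_id is the one added by this PR."""
    return {
        "path_imgrec": path_imgrec,
        "batch_size": batch_size,
        "data_shape": data_shape,
        "device_id": device_id,
    }


def make_iter(**kwargs):
    """Create the iterator. mxnet is imported lazily so the helpers
    above can be used (and tested) without MXNet installed."""
    import mxnet as mx
    return mx.io.ImageRecordIter(**kwargs)


if __name__ == "__main__":
    # Hypothetical settings: one process per GPU, where each process
    # would pass its own local rank (for example from Horovod) as device_id.
    local_rank = 0
    kwargs = record_iter_kwargs("train.rec", 256, (3, 224, 224),
                                device_id=local_rank)
    print(parser_context(kwargs["device_id"]), kwargs)
    # train_iter = make_iter(**kwargs)  # needs MXNet and a real .rec file
```

With one process per GPU, each process would pass its own local rank so that its pinned buffers are associated with the GPU it trains on. Passing -1 trades the faster host-to-device copies for lower GPU memory use.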

===1.4.x with this PR===

INFO:root:Epoch[0] Batch [0-50]	Speed: 5870.13 samples/sec	accuracy=0.008301
INFO:root:Epoch[0] Batch [50-100]	Speed: 6316.20 samples/sec	accuracy=0.017910
INFO:root:Epoch[0] Batch [100-150]	Speed: 6344.68 samples/sec	accuracy=0.028066
INFO:root:Epoch[0] Batch [150-200]	Speed: 6272.05 samples/sec	accuracy=0.037432
INFO:root:Epoch[0] Batch [200-250]	Speed: 6311.87 samples/sec	accuracy=0.046504
INFO:root:Epoch[0] Batch [250-300]	Speed: 6214.49 samples/sec	accuracy=0.056309
INFO:root:Epoch[0] Batch [300-350]	Speed: 6267.78 samples/sec	accuracy=0.062568
INFO:root:Epoch[0] Batch [350-400]	Speed: 6259.29 samples/sec	accuracy=0.071963
INFO:root:Epoch[0] Batch [400-450]	Speed: 6273.53 samples/sec	accuracy=0.080605
INFO:root:Epoch[0] Batch [450-500]	Speed: 6255.01 samples/sec	accuracy=0.087021
INFO:root:Epoch[0] Batch [500-550]	Speed: 6314.75 samples/sec	accuracy=0.091787
INFO:root:Epoch[0] Batch [550-600]	Speed: 6302.38 samples/sec	accuracy=0.093311
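To put a number on the gap between the runs above, the following sketch averages the reported throughput of each run and prints the change relative to 1.3.1. The speed values are copied from the four logs in this description; the helper names (speeds, relative_change) are made up for this example.

```python
import re
from statistics import mean

SPEED_RE = re.compile(r"Speed:\s*([0-9.]+)\s*samples/sec")


def speeds(log_text):
    """Return every 'Speed: N samples/sec' value found in a training log."""
    return [float(value) for value in SPEED_RE.findall(log_text)]


# Throughput readings copied from the logs above (12 per run).
RUNS = {
    "1.3.1": [5749.09, 6180.00, 6243.93, 6234.02, 6135.59, 6169.00,
              6076.10, 6134.20, 6201.94, 6140.77, 6149.55, 6149.39],
    "1.4.x": [4606.70, 4776.33, 4747.80, 4812.72, 5090.36, 4885.30,
              4912.66, 4897.15, 4957.09, 4875.13, 4943.11, 4873.64],
    "master": [4957.82, 5011.30, 4903.28, 4871.87, 4793.67, 4835.09,
               4732.73, 4794.80, 4785.00, 4776.88, 4795.37, 4718.87],
    "1.4.x with this PR": [5870.13, 6316.20, 6344.68, 6272.05, 6311.87,
                           6214.49, 6267.78, 6259.29, 6273.53, 6255.01,
                           6314.75, 6302.38],
}


def relative_change(baseline, other):
    """Fractional change of the mean of `other` versus the mean of `baseline`."""
    base = mean(baseline)
    return (mean(other) - base) / base


if __name__ == "__main__":
    for name, values in RUNS.items():
        change = relative_change(RUNS["1.3.1"], values)
        print("{:<20} mean={:8.2f} samples/sec  vs 1.3.1: {:+.1%}".format(
            name, mean(values), change))
```

The speeds helper can be pointed at a raw log instead of the hard-coded lists, for example speeds(open("train.log").read()), where train.log is a hypothetical file name.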

yuxihu (Member, Author) commented Jan 24, 2019

@mxnet-label-bot update [pr-work-in-progress]

marcoabreu added the pr-work-in-progress label (PR is still work in progress) Jan 24, 2019
apeforest (Contributor) commented

@eric-haibin-lin Could you please help review this PR? You raised some concerns about this change in #12666 earlier.

Review thread on src/io/image_iter_common.h (outdated, resolved)
yuxihu (Member, Author) commented Jan 24, 2019

@mxnet-label-bot update [pr-awaiting-review]

marcoabreu added the pr-awaiting-review label (PR is waiting for code review) and removed the pr-work-in-progress label (PR is still work in progress) Jan 24, 2019
yuxihu (Member, Author) commented Jan 24, 2019

@apeforest @ctcyang @eric-haibin-lin @szha please help review and merge.

yuxihu changed the title from "[WIP] Use CPUPinned context in ImageRecordIOParser2" to "Use CPUPinned context in ImageRecordIOParser2" Jan 24, 2019
apeforest (Contributor) left a comment

LGTM

@eric-haibin-lin eric-haibin-lin merged commit 49e8c57 into apache:master Jan 25, 2019
yuxihu added a commit to yuxihu/incubator-mxnet that referenced this pull request Jan 25, 2019
* create NDArray with CPUPinned context in ImageRecordIOParser2

* update document

* use -1 device_id as an option to create CPU(0) context

* retrigger CI

* fix cpplint error
eric-haibin-lin pushed a commit that referenced this pull request Jan 25, 2019
@yuxihu yuxihu deleted the cpu_pinned branch January 25, 2019 18:29
jessr92 pushed a commit to jessr92/incubator-mxnet that referenced this pull request Jan 27, 2019
stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
lanking520 pushed a commit to lanking520/incubator-mxnet that referenced this pull request Feb 18, 2019
…13990)

haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019