[C++] Improve inference script to support benchmark on Imagenet #15164
Conversation
@wuxun-zhang ping me by this ID :)
@wuxun-zhang Thanks! That's a great addition! @mxnet-label-bot Add [pr-awaiting-review, C++]
@wuxun-zhang could you rebase the code?
@wuxun-zhang force-pushed the branch from 1b4784f to 19ced82.
Now CI has passed, could you please take a look at this PR again? @pengzhao-intel @ZhennanQin @TaoLv @ciyongch Thanks
|[ResNet152-V2](#8)|[MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|76.65%|76.36%|
|[Inception-BN](#9)|[MXNet ModelZoo](http://data.mxnet.io/models/imagenet/inception-bn/)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|72.28%|72.20%|

The following performance numbers are collected by using Skylake 6148 with 20 physical cores.
Refresh the data by c5.18xlarge?
Thanks for your suggestion. Already done.
@szha is it ok to change the file name from inception_inference.cpp to imagenet_inference.cpp to cover more models? Any backward compatibility issue?
@anirudh2290 would you mind reviewing this PR?
*/
void Predictor::InitParameters() {
std::vector<mx_uint> data_shape;
for (index_t i=0; i < input_shape.ndim(); i++) {
code style
// initializer to call
Xavier xavier(Xavier::uniform, Xavier::avg, 2.0f);
index_t i = 0;
for (auto& name : net.ListArguments()) {
use `const` if possible.
}

i = 0;
for (auto& name : net.ListAuxiliaryStates()) {
use `const` if possible.

// initializer to call
Xavier xavier(Xavier::uniform, Xavier::avg, 2.0f);
index_t i = 0;
Why declare `i`?
This is for indexing the inferred shape of each argument (see line 334), which is needed for `Xavier` to initialize. So does the `i = 0` at line 339.
Quite strange usage. If so, please declare `i` in the for-loop header.
Done
args_map[name] = tmp_arr.Copy(global_ctx);
}

i = 0;
Seems redundant.

// infer and create ndarrays according to the given input ndarrays.
net.InferExecutorArrays(global_ctx, &arg_arrays, &grad_arrays, &grad_reqs, &aux_arrays, args_map,
std::map<std::string, NDArray>(), std::map<std::string, OpReqType>(),
indent

double get_msec() {
struct timeval time;
gettimeofday(&time, NULL);
This is a Linux function; if you want to support this example on Windows, you need to define a Windows code path with a macro. Of course, it's not mandatory. Maybe you can add a comment for this.
Maybe I can use `std::chrono::system_clock::now()` to cover Windows and Linux.
std::map<std::string, NDArray> aux_map;
Symbol net;
Executor *executor;
Shape input_shape;
Add suffix `_` to these private vars.

// infer and create ndarrays according to the given input ndarrays.
net.InferExecutorArrays(global_ctx, &arg_arrays, &grad_arrays, &grad_reqs, &aux_arrays, args_map,
std::map<std::string, NDArray>(), std::map<std::string, OpReqType>(),
indent.
net.InferExecutorArrays(global_ctx, &arg_arrays, &grad_arrays, &grad_reqs, &aux_arrays, args_map,
std::map<std::string, NDArray>(), std::map<std::string, OpReqType>(),
aux_map);
for (auto& i : grad_reqs) i = OpReqType::kNullOp;
Useless code here?
In the `InferExecutorArrays` function, if a user didn't specify an `OpReqType` for an argument, it will be assigned a default value: `kNullOp` for data and label, `kWriteTo` for others. However, we don't need to consider updating gradients in a forward-only pass, so we need to reassign `kNullOp` for all arguments.
return false;
}

std::vector<index_t> shape_vec {input_shape[1], input_shape[2], input_shape[3]};
What if `input_shape` is not 4-D? Better not to hard-code indices here to get the shape values from `input_shape`.
std::vector<float> dummy_data(input_shape.Size());
std::default_random_engine generator;
std::uniform_real_distribution<float> val(0.0f, 1.0f);
for (int i = 0; i < static_cast<int>(input_shape.Size()); ++i) {
For a large input array, `int` might not be enough to index the array.
}
}
ms = get_msec() - ms;
num_inference_batches = (nBatch == num_inference_batches) ? num_inference_batches : nBatch;
Same as `num_inference_batches = nBatch`?
num_inference_batches = (nBatch == num_inference_batches) ? num_inference_batches : nBatch;

auto args_name = net.ListArguments();
std::cout << "INFO:" << "Dataset for inference: " << dataset_ << std::endl
Why not use `LG <<` here?
Done

dst_vec.push_back(elem);
while (*p_next) {
if (!bFloat) {
indent.
int index = 1;
while (index < argc) {
if (strcmp("--symbol_file", argv[index]) == 0) {
index++;
indent.
std::string input_shape("3 224 224");
int seed = 48564309;
int shuffle_chunk_seed = 3982304;
int data_nthreads = 60;
If this is not the recommended `num_threads` by default, please change it to a proper one.
Actually, `num_threads` for data decoding is determined by two parameters. One is `data_nthreads` and the other is the user-defined `OMP_NUM_THREADS` env variable, through calling `omp_get_num_procs()`; please refer to iter_image_recordio_2.cc#L146. So here I want to set a large number to make use of all cores. Also, `data_nthreads = 60` aligns with the current Python implementation.
# under the License.

# Downloading the data and model
mkdir -p model
Can we change this to only download the files when they don't already exist in the specified directory?
Of course. I used `wget -nc` below to avoid re-downloading the dataset and models.
# specific language governing permissions and limitations
# under the License.

# Downloading the data and model
add `set -ex`
```
Alternatively, The script [unit_test_inception_inference.sh](<https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/inference/unit_test_inception_inference.sh>) downloads the pre-trained **Inception** model and a test image. The users can invoke this script as follows:
For a quickly inference test, users can directly run [unit_test_imagenet_inference.sh](<https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/inference/unit_test_imagenet_inference.sh>) by using the below command. This script will automatically download the pre-trained **Inception-Bn** and **resnet50_v1_int8** model and **validation dataset** which are required for inference.
For a quickly inference test -> For a quick inference test
}
virtual void InitQuantizedBias(NDArray* arr) {
(*arr) = (int8_t)0;
}
Doesn't `operator=` only support float currently?
Thanks for your comments. Currently `operator=` does not support int8. If I add support for int8, there will be ambiguity due to implicit type conversion. For example, when I try to use `(*arr) = 0` to initialize an NDArray, the compiler doesn't know which `operator=` should be called. So, as a workaround, I will assign a float value to an int8 NDArray temporarily.
@wuxun-zhang force-pushed the branch from 3f3efd1 to 7fde2f2.
@anirudh2290 @ZhennanQin @ciyongch please review again if all concerns are resolved :)
LGTM
|[ResNet152-V2](#8)|[MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|76.65%|76.36%|
|[Inception-BN](#9)|[MXNet ModelZoo](http://data.mxnet.io/models/imagenet/inception-bn/)|[Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec)|72.28%|72.20%|

The following performance numbers are collected by using C5.18xlarge.
Suggested change: "The following performance numbers are collected by using AWS EC2 C5.18xlarge."

This example demonstrates image classification workflow with pre-trained models using MXNet C++ API. Now this script also supports inference with quantized CNN models generated by Intel® MKL-DNN (see this [quantization flow](https://github.com/apache/incubator-mxnet/blob/master/example/quantization/README.md)). By using C++ API, the latency of most models can get **2%~17%** speedup compared with current Python implementation. The below tables show accuracy and latency of several CNN models.

The following models have been tested on Linux systems. And 50000 images are used to collect the following accuracy numbers.
I would expect that C++ inference should have the same accuracy as python inference. But seems the data here is different from https://github.com/apache/incubator-mxnet/tree/master/example/quantization. I guess these two tables might use different versions of MXNet or GluonCV. But it's still confusing.
Actually, I have tested accuracy for C++ and Python locally. There is no difference. I think the numbers on that page may be too old and need to be updated.

# Running inference on imagenet.
if [ "$(uname)" == "Darwin" ]; then
echo ">>> INFO: FP32 real data"
indentation.
echo ">>> INFO: FP32 dummy data"
DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH}:../../../lib ./imagenet_inference --symbol_file "./model/Inception-BN-symbol.json" --batch_size 1 --num_inference_batches 500 --benchmark
else
echo ">>> INFO: FP32 real data"
indentation
lib_name=$(ls -a ../../../lib | grep -oE 'mkldnn' | tail -1)
if [[ -n ${lib_name} ]] && [[ 'mkldnn' =~ ${lib_name} ]]; then
echo ">>> INFO: INT8 dummy data"
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:../../../lib ./imagenet_inference --symbol_file "./model/resnet50_v1_int8-symbol.json" --batch_size 1 --num_inference_batches 500 --benchmark
Do we have int8 real data inference?
Currently I cannot find anywhere to download the INT8 params file, which is required for real-data inference. So I just added a dummy-data inference here for the INT8 resnet50 model.
@TaoLv I have updated the accuracy numbers in https://github.com/apache/incubator-mxnet/tree/master/example/quantization. There is no difference in Top-1 accuracy between C++ and Python. Please help review again. Thanks.
@wuxun-zhang Thank you for addressing the comments. The problem is whether we want to maintain two int8 accuracy tables in different places. @pengzhao-intel what do you think?
Agree, so add a link pointing the accuracy to the quantization README and the performance data to perf.md.
## [imagenet_inference.cpp](<https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/inference/imagenet_inference.cpp>)

This example demonstrates image classification workflow with pre-trained models using MXNet C++ API. Now this script also supports inference with quantized CNN models generated by Intel® MKL-DNN (see this [quantization flow](https://github.com/apache/incubator-mxnet/blob/master/example/quantization/README.md)). By using C++ API, the latency of most models can get **2%~17%** speedup compared with current Python implementation. The below tables show accuracy and latency of several CNN models.
Don't mention the exact number (2%~15%) since this number may change with the evolution of the Python and C++ interfaces.
After talking offline, I think we can leave the performance of CLX (c5.12xlarge/24xlarge) on this page :)
LGTM
@wuxun-zhang force-pushed the branch from b01653b to b265dca.
Seems all checks have passed but it's hanging now.
Thanks for the nice PR; merging now.
Description
This PR is used to improve the C++ inference script. Now users can run benchmarks with FP32/INT8 models on Imagenet. Simultaneously, there is a slight latency improvement of C++ over Python.
The below table shows the latency improvement of C++ over Python for several CNN models.
(Update numbers based on C5.18xlarge)
@pengzhao-intel @ZhennanQin @TaoLv @ciyongch Please help to review. Thanks.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments