optimization of max_pool3d forward #45820

s5u13b · 2022-09-07T02:30:41Z

PR types

Performance optimization

PR changes

OPs

Describe

Environment:
- V100-32G, CUDA 11.2, cuDNN 8
Features:
- replace the div and mod operations with the fast_divmod operation
- replace the 1D GPU launch with a 3D GPU launch
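For readers unfamiliar with the first item, here is a minimal host-side C++ sketch of the general fast division technique: division by a divisor that is fixed for the whole launch is replaced by a multiply-high, an add, and a shift. The struct below and its member names are hypothetical and written for illustration only; it is not a copy of Paddle's fast_divmod helper, and it assumes 32-bit unsigned indices.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helper (hypothetical, not Paddle's actual implementation).
// Precomputes a multiplier and shift so that n / divisor can be evaluated
// without a hardware integer division.
struct FastDivMod {
  uint32_t divisor;
  uint32_t multiplier;
  uint32_t shift;

  explicit FastDivMod(uint32_t d) : divisor(d), multiplier(0), shift(0) {
    assert(d != 0);
    // shift = ceil(log2(d)), computed in 64 bits to avoid overflow.
    while ((uint64_t{1} << shift) < d) ++shift;
    const uint64_t one = 1;
    // multiplier = floor(2^32 * (2^shift - d) / d) + 1, which fits in 32 bits.
    multiplier = static_cast<uint32_t>(
        ((one << 32) * ((one << shift) - d)) / d + 1);
  }

  uint32_t Div(uint32_t n) const {
    // High 32 bits of the 64-bit product. In a CUDA kernel the __umulhi
    // intrinsic would typically compute this value.
    const uint64_t hi = (static_cast<uint64_t>(n) * multiplier) >> 32;
    // The sum is kept in 64 bits so it cannot overflow for large n.
    return static_cast<uint32_t>((hi + n) >> shift);
  }

  void DivMod(uint32_t n, uint32_t* quotient, uint32_t* remainder) const {
    *quotient = Div(n);
    *remainder = n - *quotient * divisor;
  }
};
```

Because the divisor (for example an output width) is the same for every thread, an object like this can be built once on the host and passed to the kernel by value, so each thread pays only for the multiply and shift.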
Performance (OP Benchmark):

| Paddle Kernel | Config ID | Performance Before | Performance After | Improvement |
|---|---|---|---|---|
| KernelMaxPool3DWithIdx | 1 | 1103.2us | 966.28us | 14.16% |
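The second change, moving from a 1D to a 3D launch, can be pictured with the host-side C++ sketch below. With a flat 1D index, every thread has to recover its (d, h, w) coordinates with divisions and modulos; with a 3D grid, the coordinates come directly from the x, y, and z components. The mapping of x to width, y to height, and z to depth, along with all names here, is an assumption made for illustration and may differ from the layout used in the PR.

```cpp
#include <cassert>

struct Coord3 {
  int d, h, w;
};

// Hypothetical stand-in for CUDA's dim3.
struct Dim3 {
  unsigned x, y, z;
};

// 1D launch: one flat index per output element, split with div and mod.
Coord3 DecomposeFlat(int index, int out_height, int out_width) {
  Coord3 c;
  c.w = index % out_width;
  c.h = (index / out_width) % out_height;
  c.d = index / (out_width * out_height);
  return c;
}

// 3D launch: the grid is sized per axis with a ceiling division, so each
// thread reads its coordinates from (x, y, z) without any div or mod.
Dim3 GridFor(int out_depth, int out_height, int out_width, Dim3 block) {
  Dim3 grid;
  grid.x = (static_cast<unsigned>(out_width) + block.x - 1) / block.x;
  grid.y = (static_cast<unsigned>(out_height) + block.y - 1) / block.y;
  grid.z = (static_cast<unsigned>(out_depth) + block.z - 1) / block.z;
  return grid;
}
```

In a real kernel, threads whose coordinates fall outside the output extents (because of the ceiling division) would still need a bounds check before doing any work.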

CLAassistant · 2022-09-07T02:30:45Z

All committers have signed the CLA.

paddle-bot · 2022-09-07T02:30:45Z

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the CI test results first. See the Paddle CI Manual for details.

JamesLim-sy · 2022-09-07T07:39:00Z

I think it is better to list the perf value with a table.

JamesLim-sy · 2022-09-07T07:49:06Z

paddle/phi/kernels/funcs/pooling.cu

+            if (ele <
+                input_data_cur[(d * input_height + h) * input_width + w]) {
+              max_index = (d * input_height + h) * input_width + w;
+              ele = input_data_cur[max_index];


The formula (d * input_height + h) * input_width + w appears twice. Would it be better to assign its result to a local variable? For example:

T1 cur_data = input_data_cur[(d * input_height + h) * input_width + w];
ele = ele < cur_data ? cur_data : ele;

max_index should be assigned only when the condition ele < input_data_cur[(d * input_height + h) * input_width + w] is true, and it is used at the end of the function to write mask_data. So I have to assign max_index inside the if statement, which is why the formula (d * input_height + h) * input_width + w appears twice.

Would the cost of computing (d * input_height + h) * input_width + w shrink with the code below?

for (int w = wstart; w < wend; ++w) {
  max_index = (d * input_height + h) * input_width + w;
  if (ele < input_data[max_index]) {
    ele = input_data[max_index];
  }
}

max_index cannot be computed correctly with this code, so the mask_data returned by the function would be incorrect.
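The point of this exchange can be checked with a small host-side C++ sketch. It compares a loop that updates max_index only when a larger element is found against one that overwrites max_index on every iteration, as in the suggestion above. The function names, the Window and PoolResult types, and the use of float are hypothetical; the guarded variant also stores the index in a local variable so the formula is evaluated once per element, which is one possible way to address both concerns and is not necessarily what was merged.

```cpp
#include <limits>
#include <vector>

struct Window {
  int dstart, dend, hstart, hend, wstart, wend;
};

struct PoolResult {
  float value;
  int index;
};

// Index is computed once per element and recorded only when a new maximum
// is found, so it ends up pointing at the maximum element.
PoolResult MaxPoolGuarded(const std::vector<float>& input, int input_height,
                          int input_width, const Window& win) {
  float ele = std::numeric_limits<float>::lowest();
  int max_index = -1;
  for (int d = win.dstart; d < win.dend; ++d) {
    for (int h = win.hstart; h < win.hend; ++h) {
      for (int w = win.wstart; w < win.wend; ++w) {
        const int cur_index = (d * input_height + h) * input_width + w;
        const float cur_data = input[cur_index];
        if (ele < cur_data) {
          max_index = cur_index;
          ele = cur_data;
        }
      }
    }
  }
  return {ele, max_index};
}

// max_index is overwritten on every iteration, so after the loops it holds
// the last index visited in the window, not the index of the maximum.
PoolResult MaxPoolUnguarded(const std::vector<float>& input, int input_height,
                            int input_width, const Window& win) {
  float ele = std::numeric_limits<float>::lowest();
  int max_index = -1;
  for (int d = win.dstart; d < win.dend; ++d) {
    for (int h = win.hstart; h < win.hend; ++h) {
      for (int w = win.wstart; w < win.wend; ++w) {
        max_index = (d * input_height + h) * input_width + w;
        if (ele < input[max_index]) {
          ele = input[max_index];
        }
      }
    }
  }
  return {ele, max_index};
}
```

Both variants return the same maximum value; only the recorded index differs, which is why the error would show up in the mask output rather than in the pooled values.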

s5u13b · 2022-09-07T08:16:01Z

> I think it is better to list the perf value with a table.

Thanks for your advice. I have updated the description to present the perf values as a table.

JamesLim-sy · 2022-09-09T05:28:34Z

paddle/phi/kernels/funcs/pooling.cu

-    int blocks = (nthreads + thread_num - 1) / thread_num;
-    dim3 threads(thread_num, 1);
-    dim3 grid(blocks, 1);
+    int thread_x = 32;


@fengxiaoshuai Hi, this PR deletes the NV_JETSON-specific code for the thread configuration. However, thread_num is 256 in total, which is within the limit for NV_JETSON.
Can this PR be merged?

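The lower-end GPUs in the Jetson series have very few registers, so they sometimes run out of resources and the kernel launch fails. The QA backend has CE monitoring for this.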

JamesLim-sy

LGTM

paddle-bot added the contributor (External developers) and status: proposed labels Sep 7, 2022

optimization of max_pool3d forward

10f2932

s5u13b force-pushed the maxpool3d branch from 1d00a6d to 10f2932 September 7, 2022 06:14

luotao1 assigned luotao1, ZzSean and JamesLim-sy and unassigned ZzSean Sep 7, 2022

JamesLim-sy reviewed Sep 7, 2022

JamesLim-sy reviewed Sep 9, 2022

b3602sss approved these changes Sep 9, 2022

JamesLim-sy approved these changes Sep 9, 2022

JamesLim-sy merged commit 2632d77 into PaddlePaddle:develop Sep 9, 2022
