fix reduce_any kernel data race on sharedMem #47233
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open-source project!
```diff
- int tid = threadIdx.y * blockDim.x + threadIdx.x;
- int wid = tid / kWarpSize;
+ int lane, tid, wid, n;
+ if (kWarpSize == 32 || kWarpSize == 64) {
```
Doesn't kWarpSize always satisfy this condition? Is this check still needed?
I kept the original computation as a fallback branch in case the warp size changes in future NVIDIA architectures; it will be removed later.
done
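(An aside for illustration: if the only goal of the branch was to guard against a future warp size other than 32 or 64, a common alternative is a compile-time assertion rather than a runtime check. This is a hedged sketch of that pattern, not what the PR actually does; it relies on `kWarpSize` being a compile-time constant, which the review below confirms.)

```cpp
// Sketch only: fail the build, rather than branching at runtime, if a
// future architecture ever uses a warp size other than 32 or 64.
static_assert(kWarpSize == 32 || kWarpSize == 64,
              "shift/mask indexing assumes a 32- or 64-lane warp");
```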
…pd/Paddle into fix_reduce_any_data_race
LGTM for CI-OP-Benchmark
```diff
+ int lane, tid, wid, bid, n;
+ // Bit operation can be used when kWarpSize is 32 or 64 now
+ n = kWarpSize == 32 ? 5 : 6;
+ block_dim_x = blockDim.x >> n;
```
Two further improvements are possible for `n = kWarpSize == 32 ? 5 : 6;` here:
- `n` is a right-shift amount, so it could be renamed to `int rshift_val`.
- The value can be determined at compile time, so it could be written as
  `constexpr int rshift_val = (kWarpSize != 32) ? ((kWarpSize == 64) ? 6 : 5) : 5;` (see the sketch below).
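For illustration, a minimal sketch of the suggested pattern, assuming `kWarpSize` is a compile-time constant equal to 32 or 64 (the helper name `WarpIndices` is hypothetical, not part of the PR):

```cpp
// Sketch only: with a power-of-two kWarpSize fixed at compile time, the
// shift amount is a constexpr and division/modulo become shift/mask.
template <int kWarpSize>
__device__ __forceinline__ void WarpIndices(int tid, int* wid, int* lane) {
  constexpr int rshift_val = (kWarpSize == 64) ? 6 : 5;  // log2(kWarpSize)
  *wid = tid >> rshift_val;       // equivalent to tid / kWarpSize
  *lane = tid & (kWarpSize - 1);  // equivalent to tid % kWarpSize
}
```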
done
```diff
- int tid = threadIdx.y * blockDim.x + threadIdx.x;
- int wid = tid / kWarpSize;
- int bid = threadIdx.y;
+ int lane, tid, wid, bid, n;
```
A small code-style suggestion: separating declarations from assignments, and leaving variables uninitialized at declaration, hurts maintainability and debugging. Please revert to the original style of initializing each variable where it is declared (see the sketch below).
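For illustration, the two styles being contrasted (a sketch with illustrative variables inside a kernel body, not the exact PR code):

```cpp
{  // Discouraged here: declarations separated from assignments, uninitialized.
  int tid, wid;
  tid = threadIdx.y * blockDim.x + threadIdx.x;
  wid = tid / kWarpSize;
}
{  // Suggested: initialize each variable at the point of declaration.
  int tid = threadIdx.y * blockDim.x + threadIdx.x;
  int wid = tid / kWarpSize;
}
```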
The declarations and assignments were separated earlier because of variable lifetimes across the if branches; now that the branch has been removed, I have changed it as suggested. done
LGTM for CI-OP-Benchmark
@niuliling123 Please review the changes to Kps.
LGTM
This can be merged once the remaining reviewers all approve.
LGTM
Verified, no more errors reported from …
PR types
Bug fixes
PR changes
OPs
Describe
Analyzed and fixed the shared-memory data race reported in issue: Potential race-condition in phi::funcs::ReduceAnyKernel #46974. `val = shared[bid * block_dim_x + lane];` reads data from shared memory; the subsequent `CudaShuffleDownSync` synchronizes threads within a warp, but threads in different warps remain unsynchronized, so the following write `shared[threadIdx.y] = val;` races with those earlier shared-memory reads.
The kps change touches many files and the CI job timed out; benchmark performance data measured locally is shown in the figure below (some GPUs were intermittently occupied by other jobs, so part of the data is inaccurate).
Replacing division and modulo operations with bit operations brings a small performance improvement.
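For context, a minimal sketch of the race described above and of the fix, a block-level barrier between the cross-warp read and the write that reuses the buffer. This is a hypothetical simplification, not the actual Paddle kernel: the kernel name `RowAnyReduce` is illustrative, the names `shared`, `block_dim_x`, `bid`, `wid`, and `lane` follow the snippets quoted in this thread, and it assumes `blockDim.x` is a multiple of 32.

```cpp
#include <cuda_runtime.h>

// Sketch only: a 2D block reduces each row with warp shuffles, stages
// per-warp partials in one shared buffer, and then reuses that same buffer
// for the per-row results. __shfl_down_sync (wrapped by CudaShuffleDownSync
// in the real kernel) synchronizes lanes within ONE warp only, so a
// block-wide __syncthreads() is required before the buffer is reused.
__global__ void RowAnyReduce(const int* in, int* out) {
  __shared__ int shared[64];                 // staging buffer, reused below
  const int kWarpSize = 32;
  int block_dim_x = blockDim.x / kWarpSize;  // warps per row
  int tid = threadIdx.y * blockDim.x + threadIdx.x;
  int wid = tid / kWarpSize;                 // warp id within the block
  int lane = tid % kWarpSize;
  int bid = threadIdx.y;                     // row id

  int val = in[blockIdx.x * blockDim.y * blockDim.x + tid];
  for (int offset = kWarpSize / 2; offset > 0; offset >>= 1)
    val |= __shfl_down_sync(0xffffffffu, val, offset);  // warp-level "any"
  if (lane == 0) shared[wid] = val;          // one partial per warp
  __syncthreads();                           // partials visible block-wide

  if (threadIdx.x < kWarpSize) {             // first warp of each row
    val = (threadIdx.x < block_dim_x)
              ? shared[bid * block_dim_x + threadIdx.x]  // cross-warp read
              : 0;
    for (int offset = kWarpSize / 2; offset > 0; offset >>= 1)
      val |= __shfl_down_sync(0xffffffffu, val, offset);
  }
  __syncthreads();  // THE FIX: without this barrier, the write below can
                    // overwrite slots that other rows are still reading
                    // in the statement above -- the data race in #46974.
  if (threadIdx.x == 0) shared[threadIdx.y] = val;  // buffer reuse
  __syncthreads();
  if (tid == 0) {
    int result = 0;
    for (unsigned y = 0; y < blockDim.y; ++y) result |= shared[y];
    out[blockIdx.x] = result;
  }
}
```

Without the marked barrier, for example, row 1's thread with `threadIdx.x == 0` can write `shared[1]` while row 0's threads are still reading `shared[0 * block_dim_x + 1]`, which is the same slot.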