Use StridedMemCpy in Concat/Split Kernel #4188
Conversation
float* dest = b + b_offset * after * i;
cudaMemcpy(dest, src, len, cudaMemcpyDeviceToDevice);
  }
}
There is a memory::Copy interface; can it be used here directly?
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/memory/memcpy.h
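For illustration, a minimal sketch of what that substitution might look like. It assumes the (dst_place, dst, src_place, src, num[, stream]) overloads declared in paddle/memory/memcpy.h; the enclosing function, gpu_place, and stream are placeholder names, not code from the PR.

```cpp
// Sketch only: route the device-to-device copy through memory::Copy instead of
// calling cudaMemcpy directly. gpu_place and stream are placeholders; the
// overload signature is assumed from paddle/memory/memcpy.h.
#include <cuda_runtime.h>
#include "paddle/memory/memcpy.h"
#include "paddle/platform/place.h"

void CopyRowSketch(float* b, const float* src, size_t b_offset, size_t after,
                   size_t i, size_t len, paddle::platform::GPUPlace gpu_place,
                   cudaStream_t stream) {
  float* dest = b + b_offset * after * i;
  paddle::memory::Copy(gpu_place, static_cast<void*>(dest), gpu_place,
                       static_cast<const void*>(src), len, stream);
}
```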
Done.
void copy_matrix<platform::CPUPlace, float>(const float* a, size_t a_offset,
                                            float* b, size_t b_offset,
                                            size_t len, size_t before,
                                            size_t after) {
What exactly do `before` and `after` mean here? If this function is meant to be a general `Copy` over a `Matrix`, the parameter names should probably be more matrix-like, e.g. `row`, `col`, `width`, `height`.

Looking purely at the implementation of `copy_matrix`, my understanding is that the function encodes the following:

- matrix `a` has size X x `after`
- matrix `b` has size Y x `after`
- the submatrix being copied has size `before` x (`len` / sizeof(T))
- the start position in `a` is (`a_offset`, 0)
- the start position in `b` is (`b_offset`, 0)

Is this similar to the old subMatrix? caffe2's CopyMatrix.

Also, @qingqing01, I don't think this function really counts as a `math` operation; would it be better to implement it in a separate file such as `matrix.cc`? `matrix.cc` could also hold other functions that are generic over `Matrix`.
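To make the discussion above concrete, here is a sketch of the copy pattern implied by the CUDA fragment earlier in the thread. The source-pointer line is an assumption mirrored from the visible destination line; this is not the actual PR implementation.

```cpp
// Sketch of the presumed copy_matrix semantics: each of `before` iterations
// copies `len` bytes, with the source advancing by a_offset * after elements
// and the destination by b_offset * after elements per iteration.
#include <cstring>

template <typename T>
void copy_matrix_sketch(const T* a, size_t a_offset, T* b, size_t b_offset,
                        size_t len, size_t before, size_t after) {
  for (size_t i = 0; i < before; ++i) {
    const T* src = a + a_offset * after * i;  // assumed, by symmetry with dest
    T* dest = b + b_offset * after * i;       // as in the CUDA fragment above
    std::memcpy(dest, src, len);              // CPU specialization; the GPU one used cudaMemcpy
  }
}
```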
Thanks @Xreki. `copy_matrix` is similar to `CopyMatrix` in caffe2; making it a separate functor is mainly to simplify the code so that the CPU and GPU can share the same Kernel.
Following @qingqing01's suggestion, perhaps we don't need `copy_matrix` at all and can call `memory::Copy` directly inside the Kernel to do the host/device memory copy.
According to https://devblogs.nvidia.com/parallelforall/how-overlap-data-transfers-cuda-cc/, we could use cudaMemcpyAsync to copy device memory asynchronously and improve performance. However, that would require a separate GPU Kernel, because cudaMemcpyAsync needs a cudaStream, which would make the CPU and GPU Kernel code diverge. Perhaps this optimization can be done in a follow-up PR.
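A minimal illustration of the asynchronous copy the comment refers to; the buffer names and sizes are made up for the example, and this is not code from the PR.

```cpp
// cudaMemcpyAsync queues the copy on a stream and returns immediately; overlap
// with other work happens across streams (and, for host<->device transfers,
// requires pinned host memory).
#include <cuda_runtime.h>

void async_copy_example(float* dst, const float* src, size_t bytes) {
  cudaStream_t stream;
  cudaStreamCreate(&stream);
  // Device-to-device copy queued on `stream`.
  cudaMemcpyAsync(dst, src, bytes, cudaMemcpyDeviceToDevice, stream);
  // ... other kernels or copies can be issued here and may overlap ...
  cudaStreamSynchronize(stream);  // wait for the copy before reusing the buffers
  cudaStreamDestroy(stream);
}
```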
paddle/operators/split_op.h (Outdated)
const T* src =
    in->data<T>() + input_offset + input_axis_dim * after * j;
memcpy(dest, src, len);
paddle::memory::Copy<Place, Place>(
    boost::get<Place>(ctx.GetPlace()), static_cast<void*>(dst),
Maybe we can add an interface to get Place #4203
Please see #4205, let's give a general function for
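The note above is truncated, but in line with the PR title, the general helper being discussed is presumably a strided copy shared by the Concat and Split kernels. A possible shape for such a helper, purely as an illustration (the name and signature are assumptions, not the API from #4205):

```cpp
// Illustrative only: a generic strided copy that both concat and split could
// share on the CPU path. Names and signature are assumed, not from #4205.
#include <cstddef>
#include <cstring>

template <typename T>
void StridedMemcpySketch(const T* src, size_t src_stride,  // elements between source rows
                         T* dst, size_t dst_stride,        // elements between destination rows
                         size_t rows, size_t row_elems) {  // rows to copy, elements per row
  for (size_t r = 0; r < rows; ++r) {
    std::memcpy(dst + r * dst_stride, src + r * src_stride,
                row_elems * sizeof(T));
  }
}
```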
Fixed #4166
Fixed #3929
Fixed #3772