
Add multiplex operator #4064

Merged: 7 commits merged into PaddlePaddle:develop on Sep 25, 2017
Conversation

@kuke (Contributor) commented on Sep 13, 2017:

Resolve #4010

@kuke added the OpPorting label on Sep 13, 2017.
@qingqing01 requested review from QiJune and pkuyym and removed the review request for dzhwinter, qingqing01, and reyoung on Sep 18, 2017.

auto num_ins = ins.size();
PADDLE_ENFORCE(num_ins > 2,
               "multiplex operator should have more than 2 inputs.");
PADDLE_ENFORCE_EQ(ins[0]->dims().size(), 1,
Member:

We also have to check the index values in ins[0]; each index in ins[0] must be less than ins[0]->dims().

@kuke (Contributor, Author):

Done. Added the index check in the forward compute function.
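
For reference, a minimal sketch of the kind of bounds check being discussed, placed in the forward compute loop; the exact condition and message are assumptions, not the merged code:

for (auto i = 0; i < rows; i++) {
  int k = (int)index[i] + 1;
  // Assumed check: the shifted index must name one of the candidate
  // inputs ins[1] .. ins[num_ins - 1].
  PADDLE_ENFORCE(k >= 1 && k < (int)ins.size(),
                 "index exceeds the number of candidate tensors.");
  ...
}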

"Input(Out@GRAD) shouldn't be null.");
auto d_ins = ctx.MultiOutput<LoDTensor>(framework::GradVarName("X"));
auto ins = ctx.MultiInput<Tensor>("X");
// don;t compute gradient for index
Member:

don;t --> don't

@kuke (Contributor, Author):

Done.

auto index = index_t_cpu.data<T>();
for (auto i = 0; i < rows; i++) {
  int k = (int)index[i] + 1;
  cudaMemcpy(out->data<T>() + i * cols, ins[k]->data<T>() + i * cols,
Member:

Please use a CUDA stream:

auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
                  ctx.device_context())
                  .stream();
platform::GPUPlace place = boost::get<platform::GPUPlace>(ctx.GetPlace());
memory::Copy(place, out->data<T>() + i * cols, place,
             ins[k]->data<T>() + i * cols, cols * sizeof(T), stream);

@kuke (Contributor, Author):

Done.
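
Applied to the row loop quoted above, the suggestion would look roughly like this; a sketch assembled from the reviewer's snippet, not necessarily the merged code:

auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
                  ctx.device_context())
                  .stream();
platform::GPUPlace place = boost::get<platform::GPUPlace>(ctx.GetPlace());
for (auto i = 0; i < rows; i++) {
  int k = (int)index[i] + 1;
  // Device-to-device row copy enqueued on the context's stream rather
  // than a blocking cudaMemcpy.
  memory::Copy(place, out->data<T>() + i * cols, place,
               ins[k]->data<T>() + i * cols, cols * sizeof(T), stream);
}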

auto cols = ins[1]->dims()[1];
for (auto i = 0; i < rows; i++) {
  int k = (int)index[i] + 1;
  memcpy(out->data<T>() + i * cols, ins[k]->data<T>() + i * cols,
Member:

Maybe we can combine the CPU code and the CUDA code in one file:

template <typename Place, typename T>
class MultiplexKernel : public framework::OpKernel

We can use

t.device(context.GetEigenDevice<Place>()) = t.constant(static_cast<T>(0));

to set CPU/GPU memory to zero, and we can use

memory::Copy

for both CPU and GPU copies.

@kuke (Contributor, Author):

Done.
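
A rough sketch of the combined-kernel idea from the comment above; the class name, loop bounds, and helper calls here are assumptions, and, as the next comment notes, this approach was later reverted:

template <typename Place, typename T>
class MultiplexGradKernel : public framework::OpKernel {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    auto d_ins = ctx.MultiOutput<LoDTensor>(framework::GradVarName("X"));
    // Skip d_ins[0]: no gradient is computed for the index input.
    for (size_t i = 1; i < d_ins.size(); i++) {
      d_ins[i]->mutable_data<T>(ctx.GetPlace());
      auto t = framework::EigenVector<T>::Flatten(*d_ins[i]);
      // Eigen dispatches the zero-fill to either the CPU or GPU device.
      t.device(ctx.GetEigenDevice<Place>()) = t.constant(static_cast<T>(0));
    }
    // The row-wise gradient scatter could then go through memory::Copy,
    // which handles both CPU and GPU placements.
  }
};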

Member:

It seems that merging the CPU/GPU code together is not a good idea here; I made a mistake. If CPU and GPU both use Eigen, we can reuse code easily. But if not, it's actually better to split the CPU and GPU implementations.

@kuke (Contributor, Author):

Done. Split the CPU/GPU code again.


class MultiplexOp : public framework::OperatorWithKernel {
 public:
  MultiplexOp(const std::string &type, const framework::VariableNameMap &inputs,
Contributor:

Why not use using framework::OperatorWithKernel::OperatorWithKernel; instead?

@kuke (Contributor, Author):

Modified.
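
What the suggestion amounts to, sketched with the rest of the class body elided:

class MultiplexOp : public framework::OperatorWithKernel {
 public:
  // Inherit the base-class constructor instead of redeclaring a
  // pass-through constructor by hand.
  using framework::OperatorWithKernel::OperatorWithKernel;
  ...
};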

@kuke (Contributor, Author) left a comment:

Thanks for the valuable comments. Please review the changes.



@QiJune (Member) left a comment:

LGTM

@kuke merged commit 47fbc96 into PaddlePaddle:develop on Sep 25, 2017.