
Add multiplex operator #4064

Merged: 7 commits merged into PaddlePaddle:develop on Sep 25, 2017
Conversation

@kuke (Contributor) commented on Sep 13, 2017:

Resolve #4010

@kuke added the OpPorting label on Sep 13, 2017.
@qingqing01 requested review from QiJune and pkuyym and removed the review request for dzhwinter, qingqing01, and reyoung on Sep 18, 2017.

auto num_ins = ins.size();
PADDLE_ENFORCE(num_ins > 2,
               "multiplex operator should have more than 2 inputs.");
PADDLE_ENFORCE_EQ(ins[0]->dims().size(), 1,
Member:

We also have to check the index values in ins[0]; each index in ins[0] must be less than ins[0]->dims().

@kuke (Contributor, Author):

Done. Added the index check in the forward compute function.
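
For reference, a minimal sketch of the kind of bounds check being discussed, placed in the forward compute loop; the exact condition and message are assumptions, not the merged code:

for (auto i = 0; i < rows; i++) {
  int k = (int)index[i] + 1;
  // Assumed check: the shifted index must name one of the candidate
  // inputs ins[1] .. ins[num_ins - 1].
  PADDLE_ENFORCE(k >= 1 && k < (int)ins.size(),
                 "index exceeds the number of candidate tensors.");
  ...
}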

"Input(Out@GRAD) shouldn't be null.");
auto d_ins = ctx.MultiOutput<LoDTensor>(framework::GradVarName("X"));
auto ins = ctx.MultiInput<Tensor>("X");
// don;t compute gradient for index
Member:

don;t --> don't

@kuke (Contributor, Author):

Done.

auto index = index_t_cpu.data<T>();
for (auto i = 0; i < rows; i++) {
  int k = (int)index[i] + 1;
  cudaMemcpy(out->data<T>() + i * cols, ins[k]->data<T>() + i * cols,
Member:

Please use a CUDA stream:

auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
                  ctx.device_context())
                  .stream();
platform::GPUPlace place = boost::get<platform::GPUPlace>(ctx.GetPlace());
memory::Copy(place, out->data<T>() + i * cols, place,
             ins[k]->data<T>() + i * cols, cols * sizeof(T), stream);

@kuke (Contributor, Author):

Done.
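
Applied to the row loop quoted above, the suggestion would look roughly like this; a sketch assembled from the reviewer's snippet, not necessarily the merged code:

auto stream = reinterpret_cast<const platform::CUDADeviceContext&>(
                  ctx.device_context())
                  .stream();
platform::GPUPlace place = boost::get<platform::GPUPlace>(ctx.GetPlace());
for (auto i = 0; i < rows; i++) {
  int k = (int)index[i] + 1;
  // Device-to-device row copy enqueued on the context's stream rather
  // than a blocking cudaMemcpy.
  memory::Copy(place, out->data<T>() + i * cols, place,
               ins[k]->data<T>() + i * cols, cols * sizeof(T), stream);
}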

auto cols = ins[1]->dims()[1];
for (auto i = 0; i < rows; i++) {
  int k = (int)index[i] + 1;
  memcpy(out->data<T>() + i * cols, ins[k]->data<T>() + i * cols,
Member:

Maybe we can combine the CPU code and the CUDA code in one file:

template <typename Place, typename T>
class MultiplexKernel : public framework::OpKernel

We can use

t.device(context.GetEigenDevice<Place>()) = t.constant(static_cast<T>(0));

to set CPU/GPU memory to zero, and we can use

memory::Copy

for both CPU and GPU copies.

@kuke (Contributor, Author):

Done.
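
A rough sketch of the combined-kernel idea from the comment above; the class name, loop bounds, and helper calls here are assumptions, and, as the next comment notes, this approach was later reverted:

template <typename Place, typename T>
class MultiplexGradKernel : public framework::OpKernel {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    auto d_ins = ctx.MultiOutput<LoDTensor>(framework::GradVarName("X"));
    // Skip d_ins[0]: no gradient is computed for the index input.
    for (size_t i = 1; i < d_ins.size(); i++) {
      d_ins[i]->mutable_data<T>(ctx.GetPlace());
      auto t = framework::EigenVector<T>::Flatten(*d_ins[i]);
      // Eigen dispatches the zero-fill to either the CPU or GPU device.
      t.device(ctx.GetEigenDevice<Place>()) = t.constant(static_cast<T>(0));
    }
    // The row-wise gradient scatter could then go through memory::Copy,
    // which handles both CPU and GPU placements.
  }
};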

Member:

It seems that merging the CPU/GPU code together is not a good idea here; I made a mistake. If CPU and GPU both use Eigen, we can reuse code easily. But if not, it's actually better to split the CPU and GPU implementations.

@kuke (Contributor, Author):

Done. Split the CPU/GPU code again.


class MultiplexOp : public framework::OperatorWithKernel {
 public:
  MultiplexOp(const std::string &type, const framework::VariableNameMap &inputs,
Contributor:

Why not use using framework::OperatorWithKernel::OperatorWithKernel; instead?

@kuke (Contributor, Author):

Modified.
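
What the suggestion amounts to, sketched with the rest of the class body elided:

class MultiplexOp : public framework::OperatorWithKernel {
 public:
  // Inherit the base-class constructor instead of redeclaring a
  // pass-through constructor by hand.
  using framework::OperatorWithKernel::OperatorWithKernel;
  ...
};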

@kuke (Contributor, Author) left a comment:

Thanks for the valuable comments. Please review the changes.



@QiJune (Member) left a comment:

LGTM

@kuke merged commit 47fbc96 into PaddlePaddle:develop on Sep 25, 2017.