
Cudnn conv op #4195

Merged: 25 commits merged into PaddlePaddle:develop on Oct 12, 2017
Conversation

typhoonzero
Contributor

@typhoonzero commented Sep 19, 2017

Fix #4194
Fix #4187

The TODO list is all done, please review, thanks~

@typhoonzero changed the title from "Cudnn conv op" to "[WIP] Cudnn conv op" on Sep 19, 2017
@typhoonzero changed the title from "[WIP] Cudnn conv op" to "Cudnn conv op" on Sep 25, 2017
@qingqing01 requested review from jacquesqiao and reyoung and removed the request for jacquesqiao on September 25, 2017 10:03
@chengduoZH
Contributor

fix #3691

@chengduoZH chengduoZH closed this Sep 25, 2017
@chengduoZH chengduoZH reopened this Sep 25, 2017
@@ -19,6 +19,7 @@ limitations under the License. */
#include <stdexcept>
#include <string>
#include <vector>
#include "paddle/gserver/dataproviders/MultiDataProvider.h"
Contributor

revert this file in this PR.

@@ -0,0 +1,110 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Contributor

File naming: the conv implementations could all start with the conv prefix so that all conv-related code is grouped together.

return output_size;
}

class CudnnConvOp : public framework::OperatorWithKernel {
Contributor

The op definition could probably be shared with ConvGemm.

}
};

class CudnnConvGradOp : public framework::OperatorWithKernel {
Contributor

Same as the forward definition; it can be shared with the conv gemm op.

@@ -0,0 +1,260 @@
/* Copyright (c) 2016 PaddlePaddle Authors All Rights Reserve.
Contributor

The cuDNN implementation can be compiled with g++, so a plain .cc might be enough, but registering the GPU kernel still has to happen in a .cu file, so let's keep it this way.

Contributor Author

Yes, the cuDNN APIs are all called on the host, but the work they launch runs on the device.
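A minimal host-side sketch of what such a call looks like (assumed usage of the standard cuDNN API, not code from this PR; the descriptor and buffer names are placeholders):

    // The call returns on the host as soon as the convolution has been
    // enqueued on the handle's stream; the device does the actual work.
    PADDLE_ENFORCE(platform::dynload::cudnnConvolutionForward(
        handle, &alpha, cudnn_input_desc, input_data, cudnn_filter_desc,
        filter_data, cudnn_conv_desc, algo, cudnn_workspace,
        workspace_size_in_bytes, &beta, cudnn_output_desc, output_data));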

cudnn_output_grad_desc, cudnn_conv_desc,
// dxDesc: Handle to the previously initialized output tensor
// descriptor.
cudnn_input_grad_desc, CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST,
Contributor

CUDNN_CONVOLUTION_BWD_DATA_PREFER_FASTEST has the same problem here as in the forward pass.
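For reference, a sketch of how a workspace-limited algorithm choice could look for the backward-data path (an assumption based on the cuDNN v5/v6 API, not code from this PR; workspace_size_limit is a hypothetical variable):

    cudnnConvolutionBwdDataAlgo_t data_algo;
    // Let cuDNN pick the fastest algorithm whose workspace fits the limit,
    // instead of PREFER_FASTEST, which ignores the memory-limit argument.
    PADDLE_ENFORCE(platform::dynload::cudnnGetConvolutionBackwardDataAlgorithm(
        handle, cudnn_filter_desc, cudnn_output_grad_desc, cudnn_conv_desc,
        cudnn_input_grad_desc,
        CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT,
        workspace_size_limit, &data_algo));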

cudnn_workspace = paddle::memory::Alloc(gpu, workspace_size_in_bytes);
// ------------------- cudnn conv backward data ---------------------
// FIXME(typhoonzero): template type T may not be the same as cudnn call.
float alpha = 1.0f, beta = 0.0f;
Contributor

same as above
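One way to address the FIXME above, sketched under the assumption that cuDNN expects float scaling factors for float/half data and double for double data (the CudnnScaleType helper is hypothetical, not part of this PR):

    // Hypothetical helper: the scaling-factor type cuDNN expects for a
    // given tensor element type T.
    template <typename T>
    struct CudnnScaleType {
      using type = float;  // float and half tensors both use float alpha/beta
    };
    template <>
    struct CudnnScaleType<double> {
      using type = double;
    };

    // Usage sketch:
    // typename CudnnScaleType<T>::type alpha = 1.0, beta = 0.0;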

T* input_grad_data = input_grad->mutable_data<T>(ctx.GetPlace());
PADDLE_ENFORCE(platform::dynload::cudnnConvolutionBackwardData(
h, &alpha, cudnn_filter_desc, filter_data + i * group_offset_filter,
cudnn_output_grad_desc, output_grad_data + i * group_offset_Y,
Contributor

The cudnn_ prefix on the cudnn_xxx_desc names could be dropped to keep the names shorter.

Contributor Author

The cudnn_xxx_desc variables have raw cuDNN types such as cudnnTensorDescriptor_t, while the xxx_desc names are wrapper types such as ScopedTensorDescriptor. cudnn_xxx_desc is also used several times, so keeping it in a temporary variable is worthwhile.

using Tensor = framework::Tensor;

// FIXME(typhoonzer): If CudnnConvOp is running on CPU
// reuse the code from gemm_conv2d_op.h.
Contributor

I don't think this is needed; the choice between GEMM conv and cuDNN conv on GPU can be made automatically later in Python (or in some other way).

Contributor Author

This was added so that if a user configures the cudnn conv op and then needs to test execution on CPU, the user's program does not need any changes. It also avoids having to register something like USE_GPU_ONLY_OP in pybind.

const std::vector<int>& dims) {
// the format is not used now, but it maybe useful feature
const std::vector<int>& dims,
const int groups = 1) {
Contributor

ScopedTensorDescriptor is used not only by the conv layer but also by Pooling/BatchNorm, so shouldn't groups be handled inside the conv op rather than in this interface?

Contributor

I still think this groups handling should be moved into cudnn_conv_op.

void InferShape(const framework::InferShapeContext &ctx) const override {
auto in = ctx.Input<Tensor>("Input");
auto filter = ctx.Input<Tensor>("Filter");
auto out = ctx.Output<framework::Tensor>("Output");
Contributor

Data obtained from ctx should be checked for null:

    PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("Input"),
                            "Input(Input) of CudnnConvOp should not be null.");
    PADDLE_ENFORCE_NOT_NULL(ctx.OutputVar("Output"),
                            "Output(Output) of CudnnConvOp should not be null.");

void InferShape(const framework::InferShapeContext &ctx) const override {
auto in = ctx.Input<Tensor>("Input");
auto filter = ctx.Input<Tensor>("Filter");
auto out = ctx.Output<framework::Tensor>("Output");
Contributor

auto out = ctx.Output<framework::Tensor>("Output"); => auto out = ctx.Output<Tensor>("Output");

auto d_in = ctx.Output<framework::Tensor>(framework::GradVarName("Input"));
auto d_filter =
ctx.Output<framework::Tensor>(framework::GradVarName("Filter"));
if (d_in) d_in->Resize(in->dims());
Contributor

1. Same as above, check the data obtained from ctx for null.
2. Also:

    auto d_in = ctx.Output<framework::Tensor>(framework::GradVarName("Input"));
    auto d_filter = ctx.Output<framework::Tensor>(framework::GradVarName("Filter"));

    =>

    auto d_in = ctx.Output<Tensor>(framework::GradVarName("Input"));
    auto d_filter = ctx.Output<Tensor>(framework::GradVarName("Filter"));

AddAttr<std::vector<int>>("strides", "").SetDefault(std::vector<int>{});
AddAttr<std::vector<int>>("paddings", "paddings of convolution operator.")
.SetDefault(std::vector<int>{});
// FIXME(typhoonzero): cudnn doesn't support "group" Attributes.
Contributor

dilations should default to {1, 1}, strides to {1, 1}, and paddings to {0, 0}.
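A sketch of how these defaults could be set in the op maker (attribute names follow the diff context above; the description strings are placeholders, not from this PR):

    AddAttr<std::vector<int>>("strides", "strides of convolution operator.")
        .SetDefault(std::vector<int>{1, 1});
    AddAttr<std::vector<int>>("paddings", "paddings of convolution operator.")
        .SetDefault(std::vector<int>{0, 0});
    AddAttr<std::vector<int>>("dilations", "dilations of convolution operator.")
        .SetDefault(std::vector<int>{1, 1});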

@@ -0,0 +1,110 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Contributor

conv => conv2d, same below.

Contributor Author

Renamed to conv_cudnn_op.x. Should we consider merging this with the 3d version later?

Contributor

I think the 2d and 3d versions can be merged.

@typhoonzero
Contributor Author

typhoonzero commented Sep 29, 2017

@qingqing01 I've run into a weird problem. I moved the groups-aware dims update to conv_cudnn_op.cu instead of keeping it in cudnn_helper.h:

// TensorDimWithGroups converts 4d Tensor dims to dims split by groups.
std::vector<int> TensorDimWithGroups(const framework::DDim& dims,
                                     const int groups) {
  // Update tensor descriptor dims setting if groups > 1
  // FIXME(typhoonzero): Assume using NCHW order
  std::vector<int> ret = Dims2Vector(dims);
  if (groups > 1) {
    ret[1] = ret[1] / groups;
  }
  return ret;
}

// FilterDimWithGroups converts filter dims to dims split by groups.
std::vector<int> FilterDimWithGroups(const framework::DDim& dims,
                                     const int groups) {
  // filter layout: MCHW
  std::vector<int> ret = Dims2Vector(dims);
  if (groups > 1) {
    ret[0] = ret[0] / groups;
  }
  return ret;
}

and call filter_desc.descriptor<T>(layout, FilterDimWithGroups(filter->dims(), groups));

Then the gradient check fails with:

AssertionError: Gradient Check On GPUPlace(0) Variable Filter@GRAD max gradient diff 0.148416 over limit 0.050000, the first error element is 0

But keeping the dims modification in cudnn_helper.h works fine, and the updated dims are identical with both approaches.

I tested this, and inline functions have nothing to do with the error.

REGISTER_OP(conv_cudnn, ops::CudnnConvOp, ops::CudnnConvOpMaker,
conv_cudnn_grad, ops::CudnnConvGradOp);
REGISTER_OP_CPU_KERNEL(conv_cudnn,
ops::CudnnConvKernel<paddle::platform::CPUPlace, float>);
Contributor

I think this is unnecessary; you can directly register GemmConv2DKernel, like this:

    REGISTER_OP_CPU_KERNEL(conv_cudnn,
                           ops::GemmConv2DKernel<paddle::platform::CPUPlace, float>);

And

    REGISTER_OP(conv_cudnn, ops::Conv2DOp, ops::CudnnConvOpMaker,
                conv_cudnn_grad, ops::Conv2DOpGrad);

Contributor Author

Good point, will try.

void ConvInferShape(framework::InferShapeContextBase* ctx);
void ConvGradInferShape(framework::InferShapeContextBase* ctx);

template <typename Place, typename T>
Contributor

I don't think it is appropriate to split this function out on its own.

Contributor Author

Done.

@typhoonzero
Contributor Author

This PR needs to be updated to get the cudnn handle once #4593 is merged.
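A hypothetical sketch of what that could look like once the device context exposes the handle (the cuda_device_context().cudnn_handle() accessor is an assumption about #4593, not confirmed here):

    // Assumed accessor: fetch the per-device cuDNN handle owned by the
    // CUDA device context rather than creating one inside the kernel.
    cudnnHandle_t handle = ctx.cuda_device_context().cudnn_handle();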

T alpha = 1.0f, beta = 0.0f;
if (input_grad) {
for (int i = 0; i < groups; i++) {
T* input_grad_data = input_grad->mutable_data<T>(ctx.GetPlace());
Contributor

@chengduoZH commented Oct 9, 2017

Line 238 should be moved out of the loop.
Also, input_grad should be cleared first, like this: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/lookup_table_op.h#L60

Contributor Author

Thanks! Done.
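For reference, a rough sketch of the zero-initialization pattern the linked lookup_table_op.h uses, adapted to input_grad (an assumption about the framework's Eigen helpers at the time, not code from this PR):

    // Allocate once, outside the groups loop, then clear before accumulating.
    T* input_grad_data = input_grad->mutable_data<T>(ctx.GetPlace());
    auto flat = framework::EigenVector<T>::Flatten(*input_grad);
    flat.device(ctx.GetEigenDevice<Place>()) = flat.constant(static_cast<T>(0));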

// ------------------- cudnn conv backward filter ---------------------
if (filter_grad) {
for (int i = 0; i < groups; i++) {
T* filter_grad_data = filter_grad->mutable_data<T>(ctx.GetPlace());
Contributor

Same as above.

Contributor

@qingqing01 left a comment

Also, we should later add a unit test that verifies different implementations produce consistent results. This is needed not just for conv but for some other ops as well; it can be done in a follow-up.

@@ -24,6 +24,38 @@ namespace operators {

using Tensor = framework::Tensor;

// Base convolution operator definations for other conv
// like operators to reuse the implementation.
inline int outputSize(int input_size, int filter_size, int padding,
Contributor

outputSize -> OutputSize
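For context, the usual convolution output-size formula that such a helper computes (a sketch; the fourth parameter is assumed to be the stride, which is not visible in the truncated diff above):

    inline int OutputSize(int input_size, int filter_size, int padding,
                          int stride) {
      // Standard formula for a strided, zero-padded convolution.
      return (input_size - filter_size + 2 * padding) / stride + 1;
    }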

const std::vector<int>& dims) {
// the format is not used now, but it maybe useful feature
const std::vector<int>& dims,
const int groups = 1) {
Contributor

I still think this groups handling should be moved into cudnn_conv_op.


namespace paddle {
namespace operators {} // namespace operators
} // namespace paddle
Contributor

paddle/operators/conv_cudnn_op.h can probably be removed; conv_cudnn.cc/cu can just use paddle/operators/conv2d_op.h directly.

Contributor Author

"I still think this groups handling should be moved into cudnn_conv_op."

As mentioned in an earlier comment, moving the groups handling out triggers a strange bug; I'll follow up on it later. Can it stay here for now? Other ops that use this interface behave the same as before when they do not pass the group argument.

After this is merged, I'll create an issue and fix the problem in a follow-up PR.

@@ -14,6 +14,7 @@ limitations under the License. */

#pragma once

#include <iostream>
Contributor

remove this line.

CudnnConvOpMaker(framework::OpProto* proto,
framework::OpAttrChecker* op_checker)
: Conv2DOpMaker(proto, op_checker) {
AddAttr<std::vector<int>>("dilations", "paddings of convolution operator.")
Contributor

The description string is not correct for the dilations attribute.

: Conv2DOpMaker(proto, op_checker) {
AddAttr<std::vector<int>>("dilations", "paddings of convolution operator.")
.SetDefault(std::vector<int>{1, 1});
AddAttr<int>("workspace_size_MB", "workspace size for cudnn, in MB.")
Contributor

This attribute needs a more detailed description.
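One possible, more descriptive version (the wording and the default value shown are only suggestions, not from this PR):

    AddAttr<int>("workspace_size_MB",
                 "Upper bound (in MB) on the workspace this operator may ask "
                 "cuDNN to allocate when selecting and running convolution "
                 "algorithms; larger limits allow faster algorithms at the "
                 "cost of extra GPU memory.")
        .SetDefault(4096);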

distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
Contributor

Keep the license header consistent with conv_cudnn_op.cc and remove the indentation.

Contributor Author

To keep things consistent with the other files, the indentation was added to conv_cudnn_op.cc instead.


int group_offset_X = input_channels / groups * input_height * input_width;
int group_offset_Y =
output_channels / groups * output_height * output_width;
Contributor

group_offset_X -> group_offset_in
group_offset_Y -> group_offset_out

Keep the naming consistent; don't mix X with input or Y with output.

Contributor Author

All Done.
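To make the renaming concrete, a small sketch of how the per-group offsets index into contiguous NCHW buffers (variable names mirror the diff context above, including group_offset_filter for the filter slice; the loop body is illustrative only):

    int group_offset_in = input_channels / groups * input_height * input_width;
    int group_offset_out =
        output_channels / groups * output_height * output_width;
    for (int i = 0; i < groups; ++i) {
      // Each group convolves its own channel slice of the buffers.
      const T* in_slice = input_data + i * group_offset_in;
      const T* filter_slice = filter_data + i * group_offset_filter;
      T* out_slice = output_data + i * group_offset_out;
      // ... run the cuDNN convolution for this group on these slices ...
    }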

Contributor

@qingqing01 left a comment

Approving this PR; the remaining issues can be addressed later.

@typhoonzero typhoonzero merged commit a3ccbdb into PaddlePaddle:develop Oct 12, 2017
@typhoonzero typhoonzero deleted the cudnn_conv_op branch December 22, 2017 05:43