
conv2d support bfloat16 #32221

Merged: 16 commits, Jun 2, 2021

Conversation

@Avin0323 (Contributor) commented Apr 13, 2021

PR types

Others

PR changes

Others

Describe

PR features

  • Add bfloat16 data type support to four OPs: conv2d, conv2d_grad, conv2d_grad_grad, and depthwise_conv2d_grad_grad;
  • In the test_conv2d_op.py unit test, add create_test_cudnn_bf16_class to run the conv2d tests under the bfloat16 data type;
  • In the OpTest framework, when check_output_with_place verifies forward results, use relative error instead of absolute error for the bfloat16 type;
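The relative-error choice follows from how bfloat16 stores values: only 8 exponent-matching significant bits, so the gap between representable values (and thus the absolute error) grows with magnitude while the relative error stays bounded. A self-contained sketch of the uint16 round trip (the helper names are my own, modeled loosely on the convert_float_to_uint16 utility mentioned below, not Paddle's actual implementation):

```python
import numpy as np

def float_to_bf16_bits(x):
    # Keep the upper 16 bits of each float32: sign, 8 exponent bits, 7 mantissa bits.
    return (np.asarray(x, dtype=np.float32).view(np.uint32) >> 16).astype(np.uint16)

def bf16_bits_to_float(b):
    # Re-expand the uint16 bit pattern into a float32.
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([0.001, 1.0, 1234.5], dtype=np.float32)
y = bf16_bits_to_float(float_to_bf16_bits(x))
abs_err = np.abs(x - y)
rel_err = abs_err / np.abs(x)
# abs_err spans several orders of magnitude across the three values,
# but rel_err stays below 2**-7, which is why a fixed rtol is a
# sounder forward check for bfloat16 than a fixed atol.
```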

PR changes

  • Add bfloat16 data type support to conv2d under cuDNN
  1. In the SearchAlgorithm<cudnnConvolutionFwdAlgoPerf_t> logic, handle the bfloat16 data type: when bfloat16 is in use, pass CUDNN_DEFAULT_MATH in the cudnnSetConvolutionMathType call;
  2. Add a runtime check to GetExpectedKernelType: when the bfloat16 type is used, require library to be framework::LibraryType::kCUDNN and platform::CudnnVersion() to be greater than 8100;
  3. At compile time, use CUDNN_VERSION to decide whether the bfloat16 kernel registrations are compiled in; they are registered only when building against cuDNN 8.1 or later;
  • Add bfloat16 unit tests in test_conv2d_op.py
  1. Building on the bfloat16 support added to the OpTest framework in relu supports bfloat16 data type #32542, add the create_test_cudnn_bf16_class method to extend the conv2d unit tests to the bfloat16 type; test_check_output exercises the forward computation, while test_check_grad_no_filter and test_check_filter_no_grad exercise the backward computation;
  2. bfloat16-specific handling in TestConv2DOP:
    1. The reference output is computed and stored in float; input and filter are converted from float to uint16 via convert_float_to_uint16 (inside Paddle, uint16 is treated as bfloat16);
    2. self.inputs_fp32 records the original float input and filter, which are used later when computing the backward reference values;
  3. When test_check_grad_no_filter and test_check_filter_no_grad verify the backward pass, they use the get_numeric_gradient method provided by the OpTest framework; the difference from the checks for other types is that the inputs argument uses self.inputs_fp32, reducing the error introduced by repeated data conversions;
  • Use relative error when the OpTest framework checks forward results

In check_output_with_place, accuracy is verified with numpy.allclose; an rtol argument is added to that call, set to rtol=1e-2 when the data type is bfloat16 and rtol=1e-5 (the default) otherwise;
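That selection logic can be sketched as follows (a hypothetical standalone helper; the real change lives inside check_output_with_place):

```python
import numpy as np

def forward_close(actual, expected, dtype):
    # Use a looser relative tolerance only when the outputs are bfloat16,
    # which Paddle carries as uint16 bit patterns.
    rtol = 1e-2 if dtype == np.uint16 else 1e-5
    return np.allclose(actual, expected, rtol=rtol)
```

So a 0.5% deviation passes for a bfloat16 output but fails the default float32 check.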

Self-test results

  • With rtol=1e-2 and atol=1e-2 set in the unit tests, the forward and backward results for the bfloat16 data type pass both locally and on CI;
  • The network used by the unit tests to check the bfloat16 forward and backward results is shown below; note that only conv2d runs in bfloat16 (currently uint16 represents bfloat16):

[images: program graphs of the test network]

@paddle-bot-old bot commented:

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@Avin0323 Avin0323 marked this pull request as draft April 22, 2021 15:12
@Avin0323 Avin0323 marked this pull request as ready for review April 22, 2021 15:12
@Avin0323 Avin0323 marked this pull request as draft April 22, 2021 15:14
@Avin0323 Avin0323 marked this pull request as ready for review April 22, 2021 15:14
@Avin0323 Avin0323 marked this pull request as draft April 23, 2021 02:35
@Avin0323 Avin0323 marked this pull request as ready for review April 23, 2021 02:35
@AshburnLee (Contributor):

Could you include the unit test results in the Describe section?

static const cudnnDataType_t type = CUDNN_DATA_BFLOAT16;
#else
static const cudnnDataType_t type = CUDNN_DATA_HALF;
#endif
@AshburnLee (Contributor) commented Apr 23, 2021:

The #else branch should not be needed. When the cuDNN version is < 8.1, the whole class should not be compiled at all, so perhaps wrap the entire class in #if / #endif instead.

@Avin0323 (Contributor, Author):

Because conv2d must still compile successfully with bfloat16, the CudnnDataType<bfloat16> part of the code gets instantiated; guarding the entire template specialization with the preprocessor would cause a compilation failure.

A reviewer (Contributor):

half should not be used below cuDNN 8.1 either; it should fail outright. Also, all the checks you added are compile-time #ifdef checks; shouldn't a runtime check be added as well? The following can serve as a reference (it is also incomplete; a cuDNN version check still needs to be added):

if (input_data_type == framework::proto::VarType::FP16) {
  PADDLE_ENFORCE_EQ(library, framework::LibraryType::kCUDNN,
                    platform::errors::InvalidArgument(
                        "float16 can only be used when CUDNN is used"));
}

@Avin0323 Avin0323 changed the title [WIP]conv2d support bfloat16 conv2d support bfloat16 Apr 26, 2021
@Avin0323 (Contributor, Author):

> Could you include the unit test results in the Describe section?

done

@Xreki (Contributor) left a comment:

cuDNN bf16 uses Tensor Cores for computation; does cudnnSetConvolutionMathType need to be set to a specific value?

@@ -51,6 +51,13 @@ template <typename T>
class CUDNNConvOpKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
#if CUDNN_VERSION_MIN(8, 1, 0)
A reviewer (Contributor):

Could this check be moved somewhere common, for example into CudnnDataType<bfloat16>?

@Avin0323 (Contributor, Author):

CudnnDataType<bfloat16> can only perform compile-time checks; this was changed so that the bfloat16 kernels are simply not registered for cuDNN versions below 8.1.


create_test_cudnn_bf16_class(TestWithStride, grad_check=False)
create_test_cudnn_bf16_class(TestWithGroup, grad_check=False)
create_test_cudnn_bf16_class(TestWith1x1, grad_check=False)
create_test_cudnn_bf16_class(TestWithInput1x1Filter1x1, grad_check=False)
A reviewer (Contributor):

No gradient checks at all?

@Avin0323 (Contributor, Author):

This previously followed the bf16 tests on CPU; the newly committed code now enables the backward tests by default.

@@ -167,6 +167,37 @@ def test_check_grad_no_input(self):
globals()[cls_name] = TestConv2DCUDNNFp16


def create_test_cudnn_bf16_class(parent, grad_check=True):
A reviewer (Contributor):

Don't the conv tests depend on the enhancements to the OpTest framework?

@Avin0323 (Contributor, Author):

They do; the latest code has now been merged in to sync the OpTest framework changes.

@paddle-bot-old bot commented May 3, 2021:

Sorry to inform you that cd612c5's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@@ -32,7 +32,8 @@ def set_confs(self):
def test_check_output(self):
for use_seq in {True, False}:
self.attrs['use_seq'] = use_seq
self.check_output(check_dygraph=False, no_check_set=["Cell"])
self.check_output(
check_dygraph=False, no_check_set=["Cell"], atol=2e-2)
A reviewer (Contributor):

atol is specified here because you changed the atol value in op_test.py. That will still affect the unit tests of other ops; I think it is better not to modify op_test.py at all, since overriding the OpTest function would leave the other ops' tests unaffected.

@Avin0323 (Contributor, Author):

This only affects the bfloat16 forward-accuracy check. The test framework previously hard-coded 0.03; this PR merely removes that fixed value so that each unit test that needs a looser bound can specify its own.

A reviewer (Contributor):

The change to the accuracy check in the op unit tests affects the mkldnn tests; please have @luotao1 review.

platform::errors::InvalidArgument(
"bfloat16 can only be used when CUDNN is used"));
#else
PADDLE_ENFORCE_NE(
A reviewer (Contributor):

  • The logic in this else branch seems unnecessary; it looks like it can never be reached.
  • The runtime cuDNN version also needs to be checked to be >= 8.1.0

@Avin0323 (Contributor, Author):

done

@@ -167,6 +168,52 @@ def test_check_grad_no_input(self):
globals()[cls_name] = TestConv2DCUDNNFp16


def create_test_cudnn_bf16_class(parent, check_grad=False):
A reviewer (Contributor):

Shouldn't this default to check_grad=True, and shouldn't the gradient-check variable no_need_check_grad be set accordingly?

@Avin0323 (Contributor, Author):

done

}
self.inputs_fp32 = {
'Input': OpTest.np_dtype_to_fluid_dtype(input),
'Filter': OpTest.np_dtype_to_fluid_dtype(filter)
A reviewer (Contributor):

So an fp32 conv2d is constructed as well? Please explain the unit-test checking logic in the PR description.

@Avin0323 (Contributor, Author):

done


@@ -32,7 +32,8 @@ def set_confs(self):
def test_check_output(self):
for use_seq in {True, False}:
self.attrs['use_seq'] = use_seq
self.check_output(check_dygraph=False, no_check_set=["Cell"])
self.check_output(
check_dygraph=False, no_check_set=["Cell"], atol=2e-2)
@Avin0323 (Contributor, Author):

In python/paddle/fluid/tests/unittests/op_test.py, atol = 0.03 was not a good way to check forward accuracy. This PR changes the relative-error check for the bfloat16 data type and removes the hard-coded 0.03 limit; atol = 2e-2 is added here to keep the same accuracy bound as before so that this test still passes.
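The trade-off being described can be sketched as follows (an illustrative signature, not the real OpTest API): the framework stops imposing a hidden absolute-tolerance floor, and the one test that relied on the old 0.03 cap opts in to a looser bound explicitly.

```python
import numpy as np

def check_output(actual, expected, atol=1e-5, rtol=1e-5):
    # Framework default: tight tolerances, no hidden 0.03 floor.
    return np.allclose(actual, expected, atol=atol, rtol=rtol)

# The affected test passes atol=2e-2 itself to keep its previous limit:
loose = check_output(np.array([0.515]), np.array([0.50]), atol=2e-2)
strict = check_output(np.array([0.515]), np.array([0.50]))
```

With a 0.015 deviation, the explicit atol=2e-2 call passes while the default check fails, which is exactly why the per-test override is needed after the global cap is removed.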

luotao1 previously approved these changes May 26, 2021

@luotao1 (Contributor) left a comment:

LGTM

@Xreki (Contributor) left a comment:

LGTM.

@@ -1413,6 +1413,31 @@ REGISTER_OP_KERNEL(
paddle::operators::CUDNNConvDoubleGradOpKernel<float>,
paddle::operators::CUDNNConvDoubleGradOpKernel<plat::float16>);
#else
#if CUDNN_VERSION_MIN(8, 1, 0)
A reviewer (Contributor):

The registration code now exceeds 100 lines and could be simplified. The registrations boil down to three cases:

  • CUDA, cuDNN < 8.1: supports float, double, float16
  • CUDA, cuDNN >= 8.1: supports float, double, float16, bfloat16
  • ROCM: supports float, float16

Some registration macros could be defined, e.g. REGISTER_CONV_CUDNN_KERNEL_WITH_FP64_BF16, REGISTER_CONV_CUDNN_KERNEL_WITH_FP64, REGISTER_CONV_CUDNN_KERNEL_WITH_BF16?

@Avin0323 (Contributor, Author):

OK, will follow up in a later change.


def init_kernel_type(self):
self.use_cudnn = True
self.no_need_check_grad = True
A reviewer (Contributor):

self.no_need_check_grad = True is still kept; does it have any effect?

@Avin0323 (Contributor, Author):

It mainly prevents the unit tests in the parent class from being executed.

@luotao1 (Contributor) left a comment:

LGTM for skip unittest

@phlrain (Collaborator) left a comment:

LGTM for check_dygraph

@Xreki Xreki merged commit 5981bee into PaddlePaddle:develop Jun 2, 2021
@Avin0323 Avin0323 deleted the conv2d-support-bf16 branch June 2, 2021 02:56