CUDNN v8 Implementation of Convolution Kernels #47454
Conversation
namespace phi {
namespace autotune {

class CudnnFrontendPlanCache {
In the current version, AlgorithmsCache and CudnnAlgorithmsCacheMap share too much duplicated code. #47667 attempts to rewrite them via inheritance; this can be updated in a follow-up.
}

 private:
  static cudnn_frontend::feature_vector_t MakeKey(
Doesn't the cudnn frontend API already provide a way to compute the cache key? Compared with the current ConvCacheKey, what is the main difference?
struct ConvCacheKey {
Essentially there is no big difference; it is only a difference in APIs. In the cudnn v7 programming model you maintain the convolution-related descriptors yourself, build your own data structures to store the various parameters, and implement the key and cache yourself. The CUDNN Frontend API provides more encapsulation and an object-oriented design. In v7, this ConvCacheKey is produced by ConvertToConvCacheKey in ConvArgs. In v8 we no longer use ConvArgs to manage the descriptors but use the v8 classes instead, so the most convenient approach is to use the key implementation they already provide.
      std::make_pair(MakeKey(op_graph, use_addto), plan.GetEngineConfig()));
}

bool IsStable(const cudnn_frontend::OperationGraph& op_graph,
I don't quite follow; what is this function for?
The results of exhaustive search may be unstable. For example, five searches might return the algos [1, 1, 0, 0, 0]. Instead of trusting the result of a single search, we provide an option to set a saturation count N: an algo is added to the cache only after it has been the best result in at least N searches. In the example above, with N = 3, algo 1 would never be cached; algo 0 would be treated as the fastest and added to the cache. This function checks whether the saturation count has been reached; the plan is cached only when it returns true.
      padding_common[i] = paddings[2 * i];
    }
  }
}
This large block of code duplicates the v7 branch, which will make future maintenance very difficult. Please consider a finer-grained encapsulation scheme.
Refactored.
Great work~
bool ret = false;
std::lock_guard<std::mutex> lock(*cache_mutex_);
auto key = op_graph.getFeatureVector();
if (map_.count(MakeKey(op_graph, use_addto)) > 0) {
Does MakeKey need to be called repeatedly? Is there any overhead?
It does need to be called multiple times. The function definition is implicitly inline, so the overhead should be acceptable.
CUDNNv8 implementation
- move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernel.cu
- add const specifier for input tensor
- add logging when plans fail to execute
- move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h
LGTM. I suggest further optimizing the related functionality and code in follow-ups: on one hand to keep the behavior consistent with cudnn v7, and on the other to make it easier to apply the Frontend API to more operators.
#include <vector>

#include "paddle/fluid/framework/convert_utils.h"
The PHI operator library has recently been decoupling its Fluid dependencies, and convert_utils.h has just been cleaned up; see PR #48001. I suggest removing this header following the approach in that PR.
#include <vector>

#include "paddle/fluid/framework/convert_utils.h"
#include "paddle/fluid/platform/device/gpu/cuda/cudnn_desc.h"
It is still not recommended to introduce additional Fluid headers under PHI. You could first create the same header under phi's backends/gpu/cuda directory and copy over the functions that are needed; from what I see, this file uses only a few of them, so it should be easy to handle.
    .setStrides(strides.size(), strides.data())
    .setId(id)
    .setAlignment(GetAlignment(tensor))
    .setDataType(paddle::platform::ToCudnnDataType(
Perhaps this could be changed as follows: add a new ToCudnnDataType under phi with the same logic as the Fluid one, except that the input datatype is phi's DataType rather than proto::VarType.
Let's merge this PR first; the Fluid headers can be cleaned up by the relevant colleagues later.
PR types
New features
PR changes
OPs
Describe
We are transitioning from the legacy CUDNN v7 APIs to the latest CUDNN frontend API, which is recommended for CUDNN v8 and later. The CUDNN frontend API provides an easier programming interface along with modern functionality such as autotuning and an errata filter (which blocks certain engine configs via JSON files), offering much more flexibility.
As a first step, this merge implements CUDNNv8 APIs for the convolution operator, both forward and backward.
To use the new implementation, build Paddle with WITH_CUDNN_FRONTEND set to ON. Currently, the default value of this option is OFF. At runtime, enable the new kernels by setting FLAGS_enable_cudnn_frontend=1.
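For reference, the build-time and runtime switches described above might be combined like this (the build directory and training script are placeholders):

```shell
# Build Paddle with the CUDNN frontend kernels compiled in
# (the WITH_CUDNN_FRONTEND option defaults to OFF):
cmake .. -DWITH_CUDNN_FRONTEND=ON

# Opt in at runtime via the flag:
FLAGS_enable_cudnn_frontend=1 python train.py
```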