Design Of Refactor Topology #1665
# Supporting Multi-Type Dictionary Fields in Protobuf

## Background

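The motivation for this work is that we want to generate the model configuration functions automatically, either with a code generator or at runtime, and to check the correctness of a configuration automatically at runtime.
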
How do you write a Layer today? See [this article](http://www.paddlepaddle.org/doc/dev/new_layer/index.html). The process consists of the following steps:

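* In the [Protobuf file](https://github.com/PaddlePaddle/Paddle/blob/develop/proto/ModelConfig.proto#L284), add the parameters this Layer needs. If the Layer only needs common settings such as `size`, the message already contains them and they can be reused. If the Layer has other custom parameters, however, new fields must be added to this file.
* In other words, creating a new Layer is currently tightly coupled to modifying the Protobuf file, and this protobuf message already has 52 fields.
* Implement the Layer in C++.
* Implement the Layer's parsing function, wrapper, V2 layer, and so on in Python.
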
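This design document aims to remove the coupling between the Protobuf file and Layers, so that users do not need to modify the Protobuf when they create a new Layer, and to greatly simplify the Protobuf file.
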
## Implementation

Use Protobuf's [map](https://developers.google.com/protocol-buffers/docs/proto#maps) and [oneof](https://developers.google.com/protocol-buffers/docs/proto#oneof) to reduce the configuration in Paddle's Protobuf to a `map<string, variant>` form.

A simple example:

```protobuf
syntax = "proto2";

message Attribute {
  oneof AttributeField {
    string s_value = 1;
    int32 i_value = 2;
    float f_value = 3;
    double d_value = 4;
    // ... other value types
  }
}

message LayerConfig {
  required string name = 1;
  required string type = 2;
  map<string, Attribute> attributes = 3;
}
```

> **Review** (on `message LayerConfig`): This is called "a simple example", so `LayerConfig` isn't complete, is it? Will all the attributes in https://github.com/PaddlePaddle/Paddle/blob/develop/proto/ModelConfig.proto#L284 eventually be code-generated? I mainly mean `LayerInputConfig`, `OperatorConfig`, and the like.
>
> **Reply:** Good point. I hadn't fully thought through `LayerInputConfig` and `OperatorConfig`; those are `repeated`, so this approach doesn't work for them.
>
> **Reply:** An example has been added.

> **Review** (on the `type` field): Do we need a `description` field that describes the layer in text?
>
> **Reply:** `LayerConfig` here holds the actual parameters of each layer instance, for example the `size` and `activation` of a particular `fc_layer`. The description belongs in `LayerDef`, which specifies which parameters an FC layer can take; adding a `description` there as the Layer's documentation is enough.
>
> **Reply:** Got it!

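Each kind of Layer has its own `type`, and `attributes` is a `map` whose keys each Layer can define for itself. Common configuration parameters such as `activation` can share a key, while layer-specific attributes can be namespaced with a `.`. For example, CTCLayer can set a `blank` attribute under the key `ctc.blank`.
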
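To make the key convention concrete, here is a minimal Python sketch of a `LayerConfig` whose `attributes` map mixes a shared key (`activation`) with a namespaced CTC key (`ctc.blank`). It uses plain dataclasses as a hypothetical stand-in for the protoc-generated classes, so `Attribute.kind` plays the role of the `oneof` case; the helpers `make_attribute` and `split_key` and all values are illustrative assumptions, not Paddle APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple, Union

# Hypothetical plain-Python stand-in for the proposed messages; this is not
# protoc-generated code, only a model of the messages' shape.
Value = Union[str, int, float]


@dataclass
class Attribute:
    kind: str  # which oneof member is set: "s_value", "i_value" or "d_value"
    value: Value


def make_attribute(value: Value) -> Attribute:
    """Choose the oneof member from the Python type of `value`."""
    if isinstance(value, bool):
        raise TypeError("this sketch has no bool member in the oneof")
    if isinstance(value, str):
        return Attribute("s_value", value)
    if isinstance(value, int):
        return Attribute("i_value", value)
    if isinstance(value, float):
        return Attribute("d_value", value)  # Python floats are doubles
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")


def split_key(key: str) -> Tuple[Optional[str], str]:
    """'ctc.blank' -> ('ctc', 'blank'); a shared key like 'activation' has no prefix."""
    prefix, dot, name = key.rpartition(".")
    return (prefix, name) if dot else (None, name)


@dataclass
class LayerConfig:
    name: str
    type: str
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def set(self, key: str, value: Value) -> None:
        self.attributes[key] = make_attribute(value)

    def get(self, key: str) -> Value:
        return self.attributes[key].value


cfg = LayerConfig(name="cost", type="ctc")
cfg.set("activation", "linear")  # common key shared by many layers
cfg.set("ctc.blank", 0)          # CTC-specific key, namespaced with "."
```

Because each value records which `oneof` member it occupies, a consumer could dispatch on that case instead of needing a dedicated field per Layer.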
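This way, users no longer need to modify the Protobuf messages to implement a new Layer. Moreover, when writing a new Layer, users can declare which attributes it needs and what range of values each attribute accepts. When we generate the code for the Python configuration functions, we can then also generate runtime checks, which prevent users from misconfiguring the neural network.
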
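A minimal sketch of the runtime checking described above, assuming each Layer ships a declaration of its attributes (type, whether it is required, a default, and a value-range check). `AttrSpec`, `CTC_LAYER_DEF`, and `make_layer_function` are hypothetical names; the document does not fix where such declarations live or what form they take. The generated function returns a plain dict shaped like `LayerConfig`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class AttrSpec:
    """Hypothetical declaration of one attribute that a Layer accepts."""
    value_type: type
    required: bool = False
    default: Any = None
    check: Optional[Callable[[Any], bool]] = None  # value-range constraint
    doc: str = ""


# Hypothetical declaration a layer author might write for a CTC layer.
CTC_LAYER_DEF: Dict[str, AttrSpec] = {
    "size": AttrSpec(int, required=True, check=lambda v: v > 0, doc="output size"),
    "ctc.blank": AttrSpec(int, default=0, check=lambda v: v >= 0, doc="blank label id"),
}


def make_layer_function(
    layer_type: str, spec: Dict[str, AttrSpec]
) -> Callable[[str, Dict[str, Any]], Dict[str, Any]]:
    """Build a Python config function that validates its attributes at runtime."""

    def layer_fn(name: str, attrs: Dict[str, Any]) -> Dict[str, Any]:
        unknown = set(attrs) - set(spec)
        if unknown:
            raise ValueError(f"{layer_type}: unknown attributes {sorted(unknown)}")
        resolved: Dict[str, Any] = {}
        for key, s in spec.items():
            if key in attrs:
                value = attrs[key]
            elif s.required:
                raise ValueError(f"{layer_type}: missing required attribute {key!r}")
            elif s.default is None:
                continue  # optional attribute without a default: leave it unset
            else:
                value = s.default
            if not isinstance(value, s.value_type):
                raise TypeError(f"{layer_type}: {key!r} must be {s.value_type.__name__}")
            if s.check is not None and not s.check(value):
                raise ValueError(f"{layer_type}: {key!r}={value!r} is out of range")
            resolved[key] = value
        # Same shape as LayerConfig: name, type, and the attribute map.
        return {"name": name, "type": layer_type, "attributes": resolved}

    layer_fn.__name__ = f"{layer_type}_layer"
    return layer_fn


ctc_layer = make_layer_function("ctc", CTC_LAYER_DEF)
```

Calling `ctc_layer("cost", {"size": 10})` fills in `ctc.blank` from its default, while `{"size": -1}`, a missing `size`, or an unknown key raises an error while the configuration is still being built in Python.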
> **Review:** Is this where users declare the Layer's attributes, in the C++ code?
>
> **Reply:** See the first comment.

## Implementation Concerns

As things stand, this work has the following prerequisites that must be addressed first:

* This work changes the protobuf message definitions between `Python <==> Paddle core`. The Python-side Layer parsing functions therefore need unit tests with complete coverage, so that we can be sure the system still behaves correctly after this step. Otherwise, modifying the Protobuf directly is risky.
* `oneof` and `map` are both `proto2` syntax, but library support for them arrived late: `oneof` in Protobuf 2.6 and `map` only in Protobuf 3.0. Because of `map`, Paddle would have to depend on Protobuf 3.0 or later.

> **Review:** I've read somewhere that `oneof` is only supported by fairly late protobuf2 versions. Is there an article explaining this? I'd like to understand the details.
>
> **Reply:** https://developers.google.com/protocol-buffers/docs/proto#oneof. I remember the official site used to have some notes on this, but I can't find them now; you could try searching again. I believe neither `map` nor `oneof` is supported by the protobuf2 library, only by the protobuf3 library, even though `map` and `oneof` are proto2 syntax.
>
> **Reply:** OK. I searched just now and couldn't find it either, but it's clear enough now.

## Summary

* Ultimate goal: users write only the Layer's C++ implementation; the rest of the Python code is generated automatically.
* Intermediate goal: decouple the Protobuf definitions from the Layers' C++ implementation.
* Approach: use `map` and `oneof` to turn the attributes into a multi-type dictionary.
* Open issues:
  * The `config_parser` unit tests must be completed first, to raise test coverage.
  * This would force Paddle to depend on `Protobuf 3.0+`.

> **Review** (on the multi-type dictionary): Could this dictionary be implemented directly with a C++ `map`? That is, give each Layer a map member variable describing its attributes, and remove the `LayerConfig` and `Attribute` messages from the proto altogether.
>
> **Reply:** In theory, yes, but in practice it would be rather cumbersome. Protobuf is used here only as the communication protocol between languages. Without it, we would have to call C++ functions directly, which is quite awkward for complex message types.
>
> **Review:** We need to support dynamic graphs, so we also have to keep the overhead of constructing a model small. Python protobuf is slow, so it is worth considering whether the graph could be constructed directly through a C++ API.

> **Review** (on the unit tests): Could the old and new approaches coexist? For example, keep the old one, add the new one elsewhere, concatenate the two at parse time, and replace the old one gradually. Or is it more reasonable to modify the old `config_parser` directly?
>
> **Reply:** In the actual implementation we will certainly have to change things one Attribute at a time (probably not one Layer at a time), so it will be a gradual process without abrupt changes.

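The review thread on the unit tests proposes migrating one attribute at a time rather than one Layer at a time. A minimal sketch of how a reader could cope with that transition, assuming configs are modeled as plain dicts (a hypothetical stand-in for `LayerConfig`) and that `LEGACY_FIELD` (also hypothetical) maps each new map key to the dedicated field it replaces: prefer the `attributes` map, and fall back to the old field until that attribute has been migrated.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping from a new map key to the old dedicated field it
# replaces; the field names here are illustrative, not taken from ModelConfig.proto.
LEGACY_FIELD: Dict[str, str] = {"activation": "active_type", "ctc.blank": "blank"}


def read_attribute(config: Dict[str, Any], key: str, default: Optional[Any] = None) -> Any:
    """Prefer the new `attributes` map; fall back to the legacy field."""
    attributes = config.get("attributes", {})
    if key in attributes:
        return attributes[key]
    legacy = LEGACY_FIELD.get(key)
    if legacy is not None and legacy in config:
        return config[legacy]
    return default


def migrate(config: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Move one attribute from its legacy field into the map, returning a new dict."""
    new_config = dict(config)
    attributes = dict(new_config.get("attributes", {}))
    legacy = LEGACY_FIELD[key]
    if legacy in new_config:
        attributes[key] = new_config.pop(legacy)
    new_config["attributes"] = attributes
    return new_config


old = {"name": "cost", "type": "ctc", "active_type": "linear", "blank": 0}
new = migrate(old, "ctc.blank")  # only `ctc.blank` moves; `activation` stays legacy
```

Readers see the same values before and after each step, which is what lets the old and new representations coexist while attributes are moved over one by one.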
> **Review** (on the list of steps in the Background section): Typo: `一下` => `以下`.
>
> **Reply:** Done.