From 0e92fbfb2bf21287bfd6270b793165c371d46f1f Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Tue, 21 Mar 2017 12:20:30 +0800
Subject: [PATCH 01/27] Design Draft for using map in protobuf.

---
 .../01.use_map_in_protobuf.md                 | 62 +++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 doc/design/layer_generation/01.use_map_in_protobuf.md

diff --git a/doc/design/layer_generation/01.use_map_in_protobuf.md b/doc/design/layer_generation/01.use_map_in_protobuf.md
new file mode 100644
index 0000000000000..4a3753533eb5a
--- /dev/null
+++ b/doc/design/layer_generation/01.use_map_in_protobuf.md
@@ -0,0 +1,62 @@
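+# Supporting multi-typed dictionary fields in Protobuf
+
+## Background
+
+The motivation for this work is that we want to generate model configuration functions automatically, either with a code generator or at runtime, and to check configurations for correctness automatically at runtime.
+
+
+How is a Layer written today? See [this article](http://www.paddlepaddle.org/doc/dev/new_layer/index.html). The process consists of the follwing steps:
+
+* In the [Protobuf file](https://github.com/PaddlePaddle/Paddle/blob/develop/proto/ModelConfig.proto#L284), add the parameters this Layer needs. If the Layer only needs common settings such as size, the protobuf already contains them and they can be reused. If the Layer has other custom parameters, however, fields must be added to this file.
+	* In other words, creating a new Layer is currently tightly coupled to modifying the Protobuf file, and this protobuf message already has 52 fields.
+* Implement the Layer in C++.
+* Implement the Layer's parsing function, Wrapper, V2Layer, and so on in Python.
+
+
+This design document aims to remove the coupling between the Protobuf file and Layers, so that users do not need to change Protobuf when creating a new Layer. It also greatly simplifies the Protobuf file.
+
+## Implementation
+
+Use Protobuf's [map](https://developers.google.com/protocol-buffers/docs/proto#maps) and [oneof](https://developers.google.com/protocol-buffers/docs/proto#oneof) to reduce the configuration in the Paddle Protobuf to a `map<string, variant>` form.
+
+A simple example: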
+
+```protobuf
+message Attribute {
+    oneof AttributeField {
+        string s_value = 1;
+        int32  i_value = 2;
+        float  f_value = 3;
+        double d_value = 4;
+        ...
+    }
+}
+
+message LayerConfig {
+   required string name = 1;
+   required string type = 2;
+   map<string, Attribute> attributes = 3;
+}
+```
+
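+Each kind of Layer has a different `type`. `attributes` is a `map` whose keys are defined by each Layer. Common configuration parameters such as `activation` can share one key. Layer-specific attributes can be namespaced with `.`: for example, CTCLayer has a `blank` attribute, whose key could be `ctc.blank`.
+
+This way, users no longer need to modify the Protobuf message to implement a new Layer. In addition, when writing a new Layer, users can declare which attributes it needs and the valid range of each. When we generate the Python configuration functions, we can then also generate runtime checks that keep users from misconfiguring the neural network.
+
+
+## Implementation issues
+
+As things stand, the following prerequisites must be resolved before this work can proceed:
+
+* This work changes the protobuf message definitions that sit between `Python <==> Paddle core`. The Python-side Layer parsing functions need unit tests with complete coverage to ensure that system behavior is unchanged once this step is done. Without them, modifying the Protobuf directly is risky.
+* `oneof` and `map` are usable from `protobuf2` syntax, but `map` was only added to the codebase in `Protobuf 3.0` (`oneof` arrived slightly earlier, in 2.6). If Paddle relies on this feature, Paddle must depend on Protobuf 3.0 or later.
+
+
+## Summary
+
+* Final goal: users only write the C++ implementation of a Layer; the remaining Python code is generated automatically.
+* Intermediate goal: decouple Protobuf from the C++ implementation of Layers.
+* Approach: use `map` and `oneof` to turn attributes into a multi-typed dictionary.
+* Issues:
+	* The unit tests for config_parser must be improved first to increase test coverage.
+	* Paddle would be forced to depend on `Protobuf 3.0+`.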
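
Note on `01.use_map_in_protobuf.md`: the `map<string, Attribute>` idea above can be sketched in Python, the language of the configuration parser. This is only an illustration under stated assumptions. The `LayerConfig` dataclass, the `get_attr` helper and the `__ctc_0__` layer name are hypothetical stand-ins, not part of Paddle or of the proposed Protobuf.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

# Hypothetical stand-in for the proposed `Attribute` oneof:
# every value is exactly one of a string, an int, or a float.
AttributeValue = Union[str, int, float]


@dataclass
class LayerConfig:
    """Mirrors the proposed message: name, type, and a multi-typed map."""
    name: str
    type: str
    attributes: Dict[str, AttributeValue]


def get_attr(config: LayerConfig, key: str, expected: type,
             default: Optional[AttributeValue] = None) -> AttributeValue:
    """Look up an attribute and check that it has exactly the expected type."""
    if key not in config.attributes:
        if default is None:
            raise KeyError(f"{config.type} layer {config.name!r} needs {key!r}")
        return default
    value = config.attributes[key]
    if type(value) is not expected:
        raise TypeError(f"{key!r} must be of type {expected.__name__}")
    return value


# A shared key ("activation") next to a layer-specific, dotted key ("ctc.blank").
ctc = LayerConfig(
    name="__ctc_0__",
    type="ctc",
    attributes={"activation": "linear", "ctc.blank": 0},
)
blank = get_attr(ctc, "ctc.blank", int)
```

Because every attribute travels through one map, the message itself never changes when a Layer adds a key; only the per-Layer checks do.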

From f5a14b4c6e41dce4ce03817330b24afaddea09ec Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Tue, 21 Mar 2017 14:42:40 +0800
Subject: [PATCH 02/27] Add whole design

---
 .../00.how_to_write_a_layer.md                | 158 ++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 doc/design/layer_generation/00.how_to_write_a_layer.md

diff --git a/doc/design/layer_generation/00.how_to_write_a_layer.md b/doc/design/layer_generation/00.how_to_write_a_layer.md
new file mode 100644
index 0000000000000..9c8de53773496
--- /dev/null
+++ b/doc/design/layer_generation/00.how_to_write_a_layer.md
@@ -0,0 +1,158 @@
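+# How to write a Layer
+
+This is a conceptual document describing how users will write a Layer after the refactoring.
+
+## Goal
+
+Users only need to write a Layer's computation logic. They do not need to write a configuration parser or modify any Protobuf content to complete a Layer.
+
+## Implementation
+
+### Overview
+
+* When registering a Layer, register not only its C++ type but also information about the Layer, represented as Protobuf.
+* Use a static function to generate the Protobuf that describes the Layer.
+
+
+### LayerDef/LayerOutputDef Protobuf.
+
+Paddle registers each Layer's information on the C++ side and declares it as Protobuf. A Layer's information has two main parts:
+
+* Information about the Layer itself
+	* Which input types the Layer supports
+	* Which attributes can be set on the Layer's parameters and bias
+	* Which attributes can be set on the Layer itself
+* The Layer's output type
+	* What the Layer's output type is for a given input.
+	* Because a single Paddle Layer can accept and produce several types of input and output, its output type (for example, size) depends on its input. This information is therefore produced by a runtime call while the configuration file is parsed.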
+
+```protobuf
+enum DataType {
+  DENSE = 0;
+  SPARSE_INT = 1;
+  SPARSE = 2;
+  INT = 3;
+}
+
+enum AttributeType {
+  STRING = 0;
+  INT = 1;
+  FLOAT = 2;
+  DOUBLE = 3;
+  ...
+}
+
+message Attribute {
+  oneof value {
+    string s_value = 1;
+    int32  i_value = 2;
+    float  f_value = 3;
+    ...
+  }
+}
+
+message AttributeDef {
+  required string name = 1;  // supported attribute name.
+  required AttributeType type = 2;  // supported type.
+  required string description = 3; // Attribute description & comments.
+  
+  optional Attribute default_value = 4; // default value.
+  optional Attribute max_value = 5;    // max value.
+  optional Attribute min_value = 6;   // min value.
+}
+
+// Argument Define the Supported InputTypes.
+message ArgumentDef {
+   	// Supported Input Type.
+   	// The data type of input/output.
+   	repeated DataType data_type = 1; 
+   	// 0 means it is not a sequence. 1 means a plain sequence. 2 means a nested sequence.  One layer could support many sequence type.
+   	repeated uint32 seq_nested_level = 2;
+    	
+   	// In paddle, some layer can handle variable length input.
+   	// If some input is repeatable, it means there are one or many inputs as the same input type.
+   	required bool repeatable = 3;
+    	
+	// In Paddle, a layer could return many outputs. Each output contains a different name.
+   	required string name = 4;
+   	
+   	// Comments
+  	required string description = 5;
+}
+
+message LayerDef {
+    required string type = 1;  // Layer type, such as 'fc', 'conv'
+    required string description = 2;  // Layer description & comments.
+    
+    
+    repeated ArgumentDef inputs = 3;
+    
+    
+    message ParameterDef {
+        repeated AttributeDef attributes = 1;  // Parameter Attributes Definition.
+    }
+    
+    // Each input of Paddle Layer should contain zero or one parameter.
+    // so parameter_attr.size() == inputs.size()
+    repeated ParameterDef parameter_attr = 5;
+    
+    // Set the bias attribute, If this layer support bias.
+    optional ParameterDef bias_attr = 6;
+    
+    // The Layer Attributes.
+    repeated AttributeDef layer_attr = 7;
+}
+
+// Define the layer's output types by given input types.
+message LayerOutputDef {
+	// Output name, Each Paddle Layer could have multiple outputs.
+	optional string name = 1;
+	
+	// Output type
+	required DataType type = 2;
+	required uint32 size = 3;
+	required uint32 seq_nested_level = 4;
+	
+}
+```
+
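+### Exposing the LayerDef/LayerOutputDef Protobuf from C++
+
+Basic idea:
+
+* For each type of Layer, Paddle derives the names of two global functions from the Layer's name. For the FC Layer, for example, they are `__get_fc_layer_definition__` and `__get_fc_layer_output_definition__`. Both global functions are generated automatically by `REGISTER_LAYER`.
+* Each Layer implementation provides two static (`static`) functions that implement these two global functions.
+
+For example, a possible implementation for FCLayer is: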
+
+```C++
+
+class FCLayer :public Layer {
+public:
+  void init() { ... }
+  void forward() { ... }
+  void backward() { ... }
+  
+  static void getLayerDefinition(LayerDef& def) {
+    LayerDefinition::supportSize(def);
+    LayerDefinition::supportDropout(def);
+    LayerDefinition::addInput()
+        .setRepeatable(true)
+        .addSupport({ InputType::Dense, InputType::SparseInt, InputType::Sparse })
+        .addSupportSeqLevel({0, 1, 2})
+        .addDoc("FC Layer is fully connected. Blah blah blah...");
+  }
+  
+  static std::vector<LayerOutputDef> getLayerOutputDefinition(const std::vector<LayerOutputDef>& inputs,
+  	const LayerConfig& self) {
+    LayerOutputDef out;
+    out.set_size(self.size());
+    out.set_type(InputType::Dense);
+    out.set_seq_nested_level(inputs[0].seq_nested_level());
+    return { out };
+  }
+};
+
+
+REGISTER_LAYER(fc, FCLayer);
+```
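
Note on `00.how_to_write_a_layer.md`: the registration convention above, in which two functions are published under names derived from the Layer type, can be mimicked in a few lines of Python. This is a sketch under assumptions, not Paddle's `REGISTER_LAYER`: the `register_layer` decorator, the `REGISTRY` dictionary and the plain dictionaries standing in for `LayerDef` and `LayerOutputDef` are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical global table playing the role of the generated global functions.
REGISTRY: Dict[str, Callable] = {}


def register_layer(type_name: str):
    """Publish a Layer's two static functions under names derived from its type."""
    def decorator(cls):
        REGISTRY[f"__get_{type_name}_layer_definition__"] = cls.get_layer_definition
        REGISTRY[f"__get_{type_name}_layer_output_definition__"] = (
            cls.get_layer_output_definition)
        return cls
    return decorator


@register_layer("fc")
class FCLayer:
    @staticmethod
    def get_layer_definition() -> dict:
        # Static description of this *kind* of Layer.
        return {
            "type": "fc",
            "inputs": [{
                "name": "input",
                "repeatable": True,
                "data_type": ["DENSE", "SPARSE_INT", "SPARSE"],
                "seq_nested_level": [0, 1, 2],
            }],
        }

    @staticmethod
    def get_layer_output_definition(inputs: List[dict], config: dict) -> List[dict]:
        # An FC layer emits a dense vector of the configured size and keeps
        # the sequence nesting level of its first input.
        return [{
            "type": "DENSE",
            "size": config["size"],
            "seq_nested_level": inputs[0]["seq_nested_level"],
        }]


definition = REGISTRY["__get_fc_layer_definition__"]()
outputs = REGISTRY["__get_fc_layer_output_definition__"](
    [{"type": "DENSE", "size": 784, "seq_nested_level": 1}], {"size": 200})
```

A code generator only needs the naming convention to find both functions for any Layer type, which is what lets the parser be generated without per-Layer Python code.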

From cf2d77c77042cf881fd4ef5eb0f7c210126fc1b5 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Tue, 21 Mar 2017 14:55:36 +0800
Subject: [PATCH 03/27] Typo

---
 doc/design/layer_generation/01.use_map_in_protobuf.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/design/layer_generation/01.use_map_in_protobuf.md b/doc/design/layer_generation/01.use_map_in_protobuf.md
index 4a3753533eb5a..3c471bb04bc66 100644
--- a/doc/design/layer_generation/01.use_map_in_protobuf.md
+++ b/doc/design/layer_generation/01.use_map_in_protobuf.md
@@ -5,7 +5,7 @@
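 The motivation for this work is that we want to generate model configuration functions automatically, either with a code generator or at runtime, and to check configurations for correctness automatically at runtime.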
 
 
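-How is a Layer written today? See [this article](http://www.paddlepaddle.org/doc/dev/new_layer/index.html). The process consists of the follwing steps:
+How is a Layer written today? See [this article](http://www.paddlepaddle.org/doc/dev/new_layer/index.html). The process consists of the following steps: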
 
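 * In the [Protobuf file](https://github.com/PaddlePaddle/Paddle/blob/develop/proto/ModelConfig.proto#L284), add the parameters this Layer needs. If the Layer only needs common settings such as size, the protobuf already contains them and they can be reused. If the Layer has other custom parameters, however, fields must be added to this file.
 	* In other words, creating a new Layer is currently tightly coupled to modifying the Protobuf file, and this protobuf message already has 52 fields.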

From a09299aff0436ca3b2e66390878daf2b5a42a66a Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Tue, 21 Mar 2017 15:05:51 +0800
Subject: [PATCH 04/27] Make self mutable

---
 doc/design/layer_generation/00.how_to_write_a_layer.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/design/layer_generation/00.how_to_write_a_layer.md b/doc/design/layer_generation/00.how_to_write_a_layer.md
index 9c8de53773496..747517dc0aa7e 100644
--- a/doc/design/layer_generation/00.how_to_write_a_layer.md
+++ b/doc/design/layer_generation/00.how_to_write_a_layer.md
@@ -144,7 +144,8 @@ public:
   }
   
   static std::vector<LayerOutputDef> getLayerOutputDefinition(const std::vector<LayerOutputDef>& inputs,
-  	const LayerConfig& self) {
+  	    LayerConfig& self) {
+  	 // self could be modified, for calculating parameter size, etc.
     LayerOutputDef out;
     out.set_size(self.size());
     out.set_type(InputType::Dense);

From 52d43cd5a9d248a74561afea829bbcbb62737475 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Tue, 21 Mar 2017 15:43:03 +0800
Subject: [PATCH 05/27] Add invoke graph

---
 .../00.how_to_write_a_layer.md                | 20 ++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/doc/design/layer_generation/00.how_to_write_a_layer.md b/doc/design/layer_generation/00.how_to_write_a_layer.md
index 747517dc0aa7e..809f4ee8bf11b 100644
--- a/doc/design/layer_generation/00.how_to_write_a_layer.md
+++ b/doc/design/layer_generation/00.how_to_write_a_layer.md
@@ -122,6 +122,7 @@ message LayerOutputDef {
 
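 * For each type of Layer, Paddle derives the names of two global functions from the Layer's name. For the FC Layer, for example, they are `__get_fc_layer_definition__` and `__get_fc_layer_output_definition__`. Both global functions are generated automatically by `REGISTER_LAYER`.
 * Each Layer implementation provides two static (`static`) functions that implement these two global functions.
+* The function that returns the LayerOutputDef has one more job: at runtime it sets ParameterSize, dynamically adds auxiliary inputs, and so on.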
 
 For example, a possible implementation for FCLayer is:
 
@@ -145,7 +146,7 @@ public:
   
   static std::vector<LayerOutputDef> getLayerOutputDefinition(const std::vector<LayerOutputDef>& inputs,
   	    LayerConfig& self) {
-  	 // self could be modified, for calculating parameter size, etc.
+  	    // self could be modified, for calculating parameter size, etc.
     LayerOutputDef out;
     out.set_size(self.size());
     out.set_type(InputType::Dense);
@@ -157,3 +158,20 @@ public:
 
 REGISTER_LAYER(fc, FCLayer);
 ```
+
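+### Configuration parsing flow
+
+The configuration parser (config parser) runs as shown in the figure below:
+
+![Configuration parsing flow](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/0a3d7bfb44e45d61d7bd80b26ca18fbc/raw/4177e2ca56f0410a65338a089cf4e37b9bb87c93/gistfile1.txt)
+
+1. Read the Layer Defs of all Layers in Paddle Core.
+1. Generate the parser, ConfigParser, from all the LayerDefs.
+	* How the parser is generated is up to each language.
+	* This can be an offline process: first write every Layer's LayerDef to a file, then have the other language read that file to generate code.
+	* It can also be an online process. For a dynamically typed language such as Python, generating functions at runtime is simple, so there is no need to generate code first and functions afterwards.
+1. Use ConfigParser to parse the user's configuration file, `trainer_config.conf`.
+	* At this point the parser returns only a call graph, that is, the call relationships between Layers, not the real `ModelConfig`.
+1. Pass this call graph to Paddle Core to generate the real `ModelConfig`.
+	* For each Layer, run `getLayerOutputDefinition` in order to obtain the Layer's output and pass it to the next Layer.
+	* The LayerConfig of each Layer is actually generated on the C++ side. Inside `getLayerOutputDefinition`, users can modify the generated LayerConfig, for example to add auxiliary inputs or set parameter sizes.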
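
Note on the configuration parsing flow: the last step above (walk the Layers in order, compute each output from its inputs, and let the output function fill in derived fields) can be sketched as follows. The design places this step in the C++ core; Python is used here only for brevity. The function names, the dictionary layout and the `parameter_size` key are assumptions made for this illustration.

```python
from typing import Dict, List


def data_output(inputs: List[dict], config: dict) -> dict:
    # A data layer has no inputs; its output size comes from the user config.
    return {"type": "DENSE", "size": config["size"], "seq_nested_level": 0}


def fc_output(inputs: List[dict], config: dict) -> dict:
    # The output function may also complete the config, e.g. the weight size.
    config["parameter_size"] = inputs[0]["size"] * config["size"]
    return {"type": "DENSE", "size": config["size"],
            "seq_nested_level": inputs[0]["seq_nested_level"]}


# Hypothetical table from Layer type to its output function.
OUTPUT_FUNCS = {"data": data_output, "fc": fc_output}


def complete_model_config(call_graph: List[dict]) -> List[dict]:
    """Visit Layers in definition order, inferring each output from its inputs."""
    outputs: Dict[str, dict] = {}
    for layer in call_graph:
        inputs = [outputs[name] for name in layer.get("inputs", [])]
        outputs[layer["name"]] = OUTPUT_FUNCS[layer["type"]](inputs, layer)
    return call_graph


graph = [
    {"name": "pixel", "type": "data", "size": 784},
    {"name": "hidden", "type": "fc", "size": 200, "inputs": ["pixel"]},
]
model_config = complete_model_config(graph)
```

The user-facing parser therefore only records what was called with which settings; sizes such as the 784 x 200 weight matrix are derived afterwards.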

From b79af86aa1bffda67acd5e3f8a916f25a00f19a2 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Tue, 21 Mar 2017 17:08:25 +0800
Subject: [PATCH 06/27] Add More details

---
 .../00.how_to_write_a_layer.md                | 116 ++++++++++++++----
 1 file changed, 94 insertions(+), 22 deletions(-)

diff --git a/doc/design/layer_generation/00.how_to_write_a_layer.md b/doc/design/layer_generation/00.how_to_write_a_layer.md
index 809f4ee8bf11b..fa9e36c1c6cf6 100644
--- a/doc/design/layer_generation/00.how_to_write_a_layer.md
+++ b/doc/design/layer_generation/00.how_to_write_a_layer.md
@@ -10,21 +10,92 @@
 
 ### Overview
 
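-* When registering a Layer, register not only its C++ type but also information about the Layer, represented as Protobuf.
-* Use a static function to generate the Protobuf that describes the Layer.
+* When registering a Layer, register not only its C++ type but also the Layer's meta information, represented as Protobuf.
+* Use global static functions to generate a Layer's meta information. The code generator reads this meta information through the Layer to generate the configuration parser (ConfigParser).
+* Move neural network parameter inference (how large each parameter is, what the output size is) into the Paddle C++ Core.
+
+### Layer meta information
+
+Paddle registers meta information for **each kind** of Layer on the C++ side and declares it as Protobuf.
+
+There are two main pieces of meta information.
+
+####  LayerDef
+* LayerDef describes the meta information of each **kind** of Layer: its type name, documentation, accepted input types, parameter types, and other attributes of the Layer. It does not include the Layer's output type.
+* Note that this is **meta information**. A `LayerDef` describes one **kind** of `Layer`, not the concrete parameters of one `Layer` **instance**.
+* Likewise, the `ArgumentDef` used in LayerDef describes **the type of one kind of input argument**, not a concrete input argument. An `AttributeDef` describes the **type** of an attribute (Attribute), not its concrete value.
+* The LayerDef of a fully connected layer (FullyConnected, abbreviated FC below) might be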
+
+```json
+{
+  "type": "fc",
+  "description": "Fully Connected Layer is the simplest layer in neural network. ...",
+  "inputs" : [
+    {
+      "name": "input",
+      "description": "The input of fully connected layer, could be several.",
+      "data_type": ["Dense", "Sparse", "SparseInt", "Int"],
+      "seq_nested_level": [0, 1, 2],
+      "repeatable": true
+    }
+  ],
+  "parameter_attr": [
+    {
+      "attributes": [{
+        "name": "weight_decay",
+        "type": "float",
+        "description": "The weight decay rate of parameter, used to implement L2 Norm",
+        "default_value": 0.0,
+        "max_value": 1.0,
+        "min_value": 0.0
+      }, {
+        "name": "gradient_clipping",
+        "type": "float",
+        "description": "The gradient clipping threshold",
+        "default_value": 0.0,
+        "min_value": 0.0
+      }]
+    }
+  ],
+  "bias_attr": {
+    "attributes": [{
+      "name": "weight_decay",
+      "type": "float",
+      "description": "The weight decay rate of parameter, used to implement L2 Norm",
+      "default_value": 0.0,
+      "max_value": 1.0,
+      "min_value": 0.0
+    }]
+  },
+  "layer_attr":  [
+    {
+      "name": "dropout_rate",
+      "type": "float",
+      "description": "The dropout rate of this layer",
+      "default_value": 0.0,
+      "max_value": 1.0,
+      "min_value": 0.0
+    }
+  ]
+}
+```
+
+#### LayerOutputType
 
+* LayerOutputType describes the concrete types of a Layer's input and output (not their concrete values). It is computed at runtime.
+* The LayerOutputType of an FC Layer might be
 
-### LayerDef/LayerOutputDef Protobuf.
+```json
+{
+	"type": "Dense",
+	"size": 200,
+	"seq_nested_level": 2
+}
+```
 
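-Paddle registers each Layer's information on the C++ side and declares it as Protobuf. A Layer's information has two main parts:
+#### Protobuf definition of Layer meta information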
 
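-* Information about the Layer itself
-	* Which input types the Layer supports
-	* Which attributes can be set on the Layer's parameters and bias
-	* Which attributes can be set on the Layer itself
-* The Layer's output type
-	* What the Layer's output type is for a given input.
-	* Because a single Paddle Layer can accept and produce several types of input and output, its output type (for example, size) depends on its input. This information is therefore produced by a runtime call while the configuration file is parsed.
+The Protobuf definition of Layer meta information is given below.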
 
 ```protobuf
 enum DataType {
@@ -104,7 +175,7 @@ message LayerDef {
 }
 
 // Define the layer's output types by given input types.
-message LayerOutputDef {
+message LayerOutputType {
 	// Output name, Each Paddle Layer could have multiple outputs.
 	optional string name = 1;
 	
@@ -116,13 +187,13 @@ message LayerOutputDef {
 }
 ```
 
-### Exposing the LayerDef/LayerOutputDef Protobuf from C++
+### Exposing the LayerDef/LayerOutputType Protobuf from C++
 
 Basic idea:
 
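-* For each type of Layer, Paddle derives the names of two global functions from the Layer's name. For the FC Layer, for example, they are `__get_fc_layer_definition__` and `__get_fc_layer_output_definition__`. Both global functions are generated automatically by `REGISTER_LAYER`.
+* For each type of Layer, Paddle derives the names of two global functions from the Layer's name. For the FC Layer, for example, they are `__get_fc_layer_definition__` and `__get_fc_layer_output_type__`. Both global functions are generated automatically by `REGISTER_LAYER`.
 * Each Layer implementation provides two static (`static`) functions that implement these two global functions.
-* The function that returns the LayerOutputDef has one more job: at runtime it sets ParameterSize, dynamically adds auxiliary inputs, and so on.
+* The function that returns the LayerOutputType also performs **neural network parameter inference**: at runtime it sets ParameterSize, dynamically adds the Layer's auxiliary inputs, and so on.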
 
 For example, a possible implementation for FCLayer is:
 
@@ -144,9 +215,9 @@ public:
         .addDoc("FC Layer is fully connected. Blah blah blah...");
   }
   
-  static std::vector<LayerOutputDef> getLayerOutputDefinition(const std::vector<LayerOutputDef>& inputs,
+  static std::vector<LayerOutputType> getLayerOutputType(const std::vector<LayerOutputType>& inputs,
   	    LayerConfig& self) {
-  	    // self could be modified, for calculating parameter size, etc.
+  	 // self could be modified, for calculating parameter size, etc.
     LayerOutputDef out;
     out.set_size(self.size());
     out.set_type(InputType::Dense);
@@ -165,13 +236,14 @@ REGISTER_LAYER(fc, FCLayer);
 
 ![Configuration parsing flow](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/0a3d7bfb44e45d61d7bd80b26ca18fbc/raw/4177e2ca56f0410a65338a089cf4e37b9bb87c93/gistfile1.txt)
 
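-1. Read the Layer Defs of all Layers in Paddle Core.
-1. Generate the parser, ConfigParser, from all the LayerDefs.
+1. Read the meta information, LayerDef, of all Layers in Paddle Core.
+1. Generate the parser, ConfigParser, from the meta information (LayerDefs) of all Layers.
 	* How the parser is generated is up to each language.
 	* This can be an offline process: first write every Layer's LayerDef to a file, then have the other language read that file to generate code.
 	* It can also be an online process. For a dynamically typed language such as Python, generating functions at runtime is simple, so there is no need to generate code first and functions afterwards.
 1. Use ConfigParser to parse the user's configuration file, `trainer_config.conf`.
-	* At this point the parser returns only a call graph, that is, the call relationships between Layers, not the real `ModelConfig`.
+	* At this point the parser returns only a call graph, that is, the call relationships between Layers (`Graph Protobuf`), not the real `ModelConfig`.
+	* This Graph Protobuf is very simple: it only records which Layer was called and which Attributes were set.
 1. Pass this call graph to Paddle Core to generate the real `ModelConfig`.
-	* For each Layer, run `getLayerOutputDefinition` in order to obtain the Layer's output and pass it to the next Layer.
-	* The LayerConfig of each Layer is actually generated on the C++ side. Inside `getLayerOutputDefinition`, users can modify the generated LayerConfig, for example to add auxiliary inputs or set parameter sizes.
+	* For each item in `GraphProtobuf`, generate a LayerConfig.
+	* Then run `getLayerOutputType` in order to obtain each Layer's output and complete neural network parameter inference, and pass this LayerConfig on to the next Layer.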
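
Note on online generation: for a dynamically typed language such as Python, the text above says a config function can be built at runtime straight from a LayerDef. A minimal sketch of that idea, including the range checks that `AttributeDef` makes possible, might look like the following. The helper `make_layer_function` is hypothetical, and `FC_DEF` is a trimmed copy of the FC example in this patch.

```python
from typing import Callable


def make_layer_function(layer_def: dict) -> Callable[..., dict]:
    """Build a config function for one kind of Layer from its LayerDef."""
    attr_defs = {attr["name"]: attr for attr in layer_def["layer_attr"]}

    def layer(name: str, **attrs) -> dict:
        unknown = sorted(set(attrs) - set(attr_defs))
        if unknown:
            raise TypeError(f"{layer_def['type']}: unknown attributes {unknown}")
        config = {"name": name, "type": layer_def["type"], "attributes": {}}
        for key, attr_def in attr_defs.items():
            # Fall back to the declared default when the user gives no value.
            value = attrs.get(key, attr_def.get("default_value"))
            if value is None:
                raise TypeError(f"{layer_def['type']}: missing attribute {key!r}")
            low = attr_def.get("min_value")
            high = attr_def.get("max_value")
            if (low is not None and value < low) or (high is not None and value > high):
                raise ValueError(f"{key}={value} is outside [{low}, {high}]")
            config["attributes"][key] = value
        return config

    # Carry the meta information over so the function documents itself.
    layer.__name__ = layer_def["type"]
    layer.__doc__ = layer_def["description"]
    return layer


FC_DEF = {
    "type": "fc",
    "description": "Fully Connected Layer",
    "layer_attr": [{
        "name": "dropout_rate", "type": "float",
        "description": "The dropout rate of this layer",
        "default_value": 0.0, "max_value": 1.0, "min_value": 0.0,
    }],
}

fc = make_layer_function(FC_DEF)
hidden = fc("hidden", dropout_rate=0.5)
```

An offline generator would emit equivalent source text from the same dictionary instead of building a closure.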

From cab093d6e22f5e6c84f67150274a7e467eef2859 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Mon, 27 Mar 2017 11:35:13 +0800
Subject: [PATCH 07/27] Update Design.

GraphProto is a partial ModelConfig. Nothing else.
---
 doc/design/layer_generation/00.how_to_write_a_layer.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/doc/design/layer_generation/00.how_to_write_a_layer.md b/doc/design/layer_generation/00.how_to_write_a_layer.md
index fa9e36c1c6cf6..0e599eead234a 100644
--- a/doc/design/layer_generation/00.how_to_write_a_layer.md
+++ b/doc/design/layer_generation/00.how_to_write_a_layer.md
@@ -234,7 +234,7 @@ REGISTER_LAYER(fc, FCLayer);
 
 The configuration parser (config parser) runs as shown in the figure below:
 
-![Configuration parsing flow](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/0a3d7bfb44e45d61d7bd80b26ca18fbc/raw/4177e2ca56f0410a65338a089cf4e37b9bb87c93/gistfile1.txt)
+![Configuration parsing flow](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/0a3d7bfb44e45d61d7bd80b26ca18fbc/raw/7ad64cdfc31ba5a427a9d599e837af9fd3774138/parsing.dot)
 
 1. Read the meta information, LayerDef, of all Layers in Paddle Core.
 1. Generate the parser, ConfigParser, from the meta information (LayerDefs) of all Layers.
@@ -242,8 +242,7 @@ REGISTER_LAYER(fc, FCLayer);
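 	* This can be an offline process: first write every Layer's LayerDef to a file, then have the other language read that file to generate code.
 	* It can also be an online process. For a dynamically typed language such as Python, generating functions at runtime is simple, so there is no need to generate code first and functions afterwards.
 1. Use ConfigParser to parse the user's configuration file, `trainer_config.conf`.
-	* At this point the parser returns only a call graph, that is, the call relationships between Layers (`Graph Protobuf`), not the real `ModelConfig`.
-	* This Graph Protobuf is very simple: it only records which Layer was called and which Attributes were set.
+	* At this point the parser returns only an incomplete `ModelConfig`. This `ModelConfig` contains only what the user set in the configuration file; inference of the network's parameter sizes is completed in the next parsing step.
 1. Pass this call graph to Paddle Core to generate the real `ModelConfig`.
-	* For each item in `GraphProtobuf`, generate a LayerConfig.
+	* For each incomplete LayerConfig in the `ModelConfig`, fill in the default values.
 	* Then run `getLayerOutputType` in order to obtain each Layer's output and complete neural network parameter inference, and pass this LayerConfig on to the next Layer.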

From d30c0336c2c3ab908d453eea48c00e80404fd1b3 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Mon, 27 Mar 2017 13:53:34 +0800
Subject: [PATCH 08/27] Follow luotao's tips, add more description.

---
 .../00.how_to_write_a_layer.md                | 22 ++++-
 .../01.use_map_in_protobuf.md                 | 92 ++++++++++++++++++-
 2 files changed, 111 insertions(+), 3 deletions(-)

diff --git a/doc/design/layer_generation/00.how_to_write_a_layer.md b/doc/design/layer_generation/00.how_to_write_a_layer.md
index 0e599eead234a..922ebbdb0fc07 100644
--- a/doc/design/layer_generation/00.how_to_write_a_layer.md
+++ b/doc/design/layer_generation/00.how_to_write_a_layer.md
@@ -197,8 +197,28 @@ message LayerOutputType {
 
 For example, a possible implementation for FCLayer is:
 
+LayerDefinition.h is a shared header file. Its interface is
+
 ```C++
 
+class LayerDefinition {
+public:
+  // Mark a layer as supporting the size attribute.
+  static void supportSize(LayerDef& );
+
+  // Mark a layer as supporting the dropout attribute.
+  static void supportDropout(LayerDef& );
+
+  // Add an input to the layer.
+  static LayerInputDefinition& addInput(LayerDef& );
+  ...
+};
+
+```
+
+FullyConnectedLayer.h is the header that implements the fully connected layer:
+
+```C++
 class FCLayer :public Layer {
 public:
   void init() { ... }
@@ -208,7 +228,7 @@ public:
   static void getLayerDefinition(LayerDef& def) {
     LayerDefinition::supportSize(def);
     LayerDefinition::supportDropout(def);
-    LayerDefinition::addInput()
+    LayerDefinition::addInput(def)
         .setRepeatable(true)
         .addSupport({ InputType::Dense, InputType::SparseInt, InputType::Sparse })
         .addSupportSeqLevel({0, 1, 2})
diff --git a/doc/design/layer_generation/01.use_map_in_protobuf.md b/doc/design/layer_generation/01.use_map_in_protobuf.md
index 3c471bb04bc66..8399f57c4521a 100644
--- a/doc/design/layer_generation/01.use_map_in_protobuf.md
+++ b/doc/design/layer_generation/01.use_map_in_protobuf.md
@@ -13,7 +13,7 @@
 * Implement the Layer's parsing function, Wrapper, V2Layer, and so on in Python.
 
 
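-This design document aims to remove the coupling between the Protobuf file and Layers, so that users do not need to change Protobuf when creating a new Layer. It also greatly simplifies the Protobuf file.
+This design document aims to remove the coupling between the Protobuf file and Layers, so that users do not need to change Protobuf when creating a new Layer. It also greatly simplifies the Protobuf file and cleans up redundant fields in the existing protobuf, for example by merging the image-related fields in LayerInputConfig (`ConvConfig`, `PoolConfig`, `NormConfig`, and so on).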
 
 ## Implementation
 
@@ -32,10 +32,16 @@ message Attribute {
     }
 }
 
+message LayerInputConfig {
+  required string name = 1;
+  map<string, Attribute> attributes = 2;
+};
+
 message LayerConfig {
    required string name = 1;
    required string type = 2;
    map<string, Attribute> attributes = 3;
+   repeated LayerInputConfig inputs = 4;
 }
 ```
 
@@ -43,6 +49,87 @@ message LayerConfig {
 
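 This way, users no longer need to modify the Protobuf message to implement a new Layer. In addition, when writing a new Layer, users can declare which attributes it needs and the valid range of each. When we generate the Python configuration functions, we can then also generate runtime checks that keep users from misconfiguring the neural network.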
 
+## Sample configuration
+
+"""json
+{
+  "layers": [
+    {
+      "name": "image",
+      "type": "data",
+      "attributes": {
+        "size": 65536
+      }
+    },
+    {
+      "name": "__conv_0__",
+      "type": "exconv", 
+      "attributes": {
+        "size": 3297856,
+        "activation": "linear",
+        "num_filters": 64,
+        "out.x": 227,
+        "out.y": 227,
+        "bias.name": "___conv_0__.wbias",
+        "bias.shared": true
+      },
+      "inputs" : [{
+        "name": "image",
+        "attributes": {
+          "parameter_name": "___conv_0__.w0",
+          "conv.filter_size": 32,
+          "conv.stride.x": 1,
+          "conv.padding.x": 1,
+          "conv.stride.y": 1,
+          "conv.padding.y": 1,
+          "conv.groups": 1,
+          "conv.filter_channels": 1,
+
+          "img.channels": 1,
+          "img.x": 256,
+          "img.y": 256
+        }
+      }]
+    },
+    {
+      "name": "__batch_norm_0__",
+      "type": "batch_norm",
+      "attributes": {
+        "size": 3297856,
+        "activation": "relu",
+        "out.x": 227,
+        "out.y": 227,
+        "bias.name": "___batch_norm_0__.wbias",
+        "moving_average_fraction": 0.9
+      },
+      "inputs": [
+        {
+          "name": "__conv_0__",
+          "attributes": {
+            "parameter_name": "___batch_norm_0__.w0",
+            "img.x": 227,
+            "img.y": 227,
+            "img.channels": 64
+          }
+        },
+        {
+          "name": "__conv_0__",
+          "attributes": {
+            "parameter_name": "___batch_norm_0__.w1"
+          }
+        },
+        {
+          "name": "__conv_0__",
+          "attributes": {
+            "parameter_name": "___batch_norm_0__.w2"
+          }
+        }
+      ]
+    }
+  ]
+}
+
+"""
 
 ## Implementation issues
 
@@ -50,7 +137,8 @@ message LayerConfig {
 
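 * This work changes the protobuf message definitions that sit between `Python <==> Paddle core`. The Python-side Layer parsing functions need unit tests with complete coverage to ensure that system behavior is unchanged once this step is done. Without them, modifying the Protobuf directly is risky.
 * `oneof` and `map` are usable from `protobuf2` syntax, but `map` was only added to the codebase in `Protobuf 3.0` (`oneof` arrived slightly earlier, in 2.6). If Paddle relies on this feature, Paddle must depend on Protobuf 3.0 or later.
-
+* This stage keeps Paddle's configuration interface backward compatible, although the generated Protobuf binary changes. A new Protobuf binary can always be regenerated with the command
+`python -m paddle.utils.dump_config trainer_config.conf "" --binary > trainer_config.bin`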
 
 ## Summary
 

From f001bc976ab5ea112930c45bcdceb4d3c550d827 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Fri, 31 Mar 2017 18:42:53 +0800
Subject: [PATCH 09/27] Fix wrong code style.

---
 doc/design/layer_generation/01.use_map_in_protobuf.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/design/layer_generation/01.use_map_in_protobuf.md b/doc/design/layer_generation/01.use_map_in_protobuf.md
index 8399f57c4521a..d01e44cfc8983 100644
--- a/doc/design/layer_generation/01.use_map_in_protobuf.md
+++ b/doc/design/layer_generation/01.use_map_in_protobuf.md
@@ -51,7 +51,7 @@ message LayerConfig {
 
 ## Sample configuration
 
-"""json
+```json
 {
   "layers": [
     {
@@ -129,7 +129,7 @@ message LayerConfig {
   ]
 }
 
-"""
+```
 
 ## Implementation issues
 

From 857f752d1721de99c10f3aa681df00f68e94cfc8 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 5 Apr 2017 10:08:12 +0800
Subject: [PATCH 10/27] Rearrange Documentation

---
 .../00.how_to_implenment_dynamic_net.md       | 36 +++++++++++++++++++
 .../00.ways_to_define_layer.md                | 35 ++++++++++++++++++
 ...=> 01.how_to_write_a_layer_in_protobuf.md} |  0
 ..._protobuf.md => 02.use_map_in_protobuf.md} |  0
 4 files changed, 71 insertions(+)
 create mode 100644 doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
 create mode 100644 doc/design/layer_generation/00.ways_to_define_layer.md
 rename doc/design/layer_generation/{00.how_to_write_a_layer.md => 01.how_to_write_a_layer_in_protobuf.md} (100%)
 rename doc/design/layer_generation/{01.use_map_in_protobuf.md => 02.use_map_in_protobuf.md} (100%)

diff --git a/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md b/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
new file mode 100644
index 0000000000000..7091ca293e322
--- /dev/null
+++ b/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
@@ -0,0 +1,36 @@
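+# Implementing dynamic neural networks
+
+Dynamic networks are a frontier topic for neural network frameworks today. They address a major limitation of conventional frameworks: **the definition of a neural network is separate from its computation**. A conventional framework first defines the network's computation graph and then evaluates that graph with a computation engine. A dynamic neural network instead evaluates each operation immediately, defining the computation graph implicitly, and then backpropagates through that implicit graph.
+
+A typical usage looks like: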
+
+
+```python
+x = paddle.dyn.data(type=DenseVector(784))
+x.fill([0.058, 0.548, ...])
+
+y = paddle.dyn.data(type=Integer(10))
+y.fill(9)
+
+hidden = paddle.dyn.fc(input=x, size=200)
+
+# You can use hidden.npvalue() to get this layer's value now.
+
+prediction = paddle.dyn.fc(input=hidden, size=10, act=Softmax())
+
+cost = paddle.dyn.classification_cost(input=prediction, label=y)
+
+if cost.npvalue() < 0.001:
+    cost *= 100  # scale up the cost when it is small; just a demo of a dynamic network.
+
+print 'Cost = ', cost.npvalue()
+
+cost.backward()
+parameters.update()
+```
+
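+## Problems solved by dynamic neural networks
+
+A dynamic neural network has only the computation step and hides the definition step. The problems it solves are:
+
+* Non-linear control flow such as `if` can be added anywhere during computation, and the computation graph can differ for different data, for example in tree-structured neural networks.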
diff --git a/doc/design/layer_generation/00.ways_to_define_layer.md b/doc/design/layer_generation/00.ways_to_define_layer.md
new file mode 100644
index 0000000000000..22f01e543bf8d
--- /dev/null
+++ b/doc/design/layer_generation/00.ways_to_define_layer.md
@@ -0,0 +1,35 @@
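+# Comparing ways to define a Layer/OP
+
+
+This article describes how users currently define configuration files in Paddle and the problems with that, then compares several ways to define a Layer/OP so that everyone can make a decision.
+
+
+## Why refactor the configuration definition process
+
+Parsing user configuration in Paddle is currently very convoluted, a legacy of Paddle being a roughly four-year-old project. To stay **compatible** with every earlier Paddle configuration file format, and to simplify configuration for users, Paddle currently has three configuration styles: the original configuration file format (`config_parser.py`), `trainer_config_helper`, and `paddle.v2.layer`. They call one another as follows: `paddle.v2` calls `trainer_config_helper`, which in turn calls `config_parser.py`. Although this code is not duplicated, the multiple layers of wrapping make it hard to maintain.
+
+The main pain points are:
+
+* A user who wants to find out how a Layer parameter should be used will find Paddle's code very confusing when digging into it. Of these code paths, only `trainer_config_helper` has good comments and good style. `paddle.v2` also has comments and documentation, but its functions are generated dynamically rather than written as static code, so they cannot be **read**, and `config_parser.py` lacks documentation and comments.
+* A developer who wants to write a new Layer has to modify several files.
+	* First, the developer adds the parameters the Layer needs to Paddle's [Protobuf file](https://github.com/PaddlePaddle/Paddle/blob/develop/proto/ModelConfig.proto).
+	* Next, the developer writes the C/C++ files for the Layer, implementing its forward and backward code.
+	* Finally, the developer implements configuration parsing for the Layer in `config_parser.py`, `trainer_config_helpers`, and `paddle.v2`.
+* Paddle is expensive to maintain, especially when the implementation of a kind of Layer has to change.
+* Supporting any other language binding requires too much development work.
+
+The goal of this design is therefore to **clean up** the current confusion and complexity in how Paddle defines Layers and configuration, and to reach a clean result: **users only need to** write a `C/C++` implementation to develop a Layer.
+
+This design also takes the following concerns into account:
+
+* Backward compatibility ---- whether earlier configuration styles remain supported
+* Dynamic network development ---- this configuration will also serve as a basic building block of Paddle's dynamic network configuration.
+
+
+
+## Approaches to refactoring the configuration files
+
+Two approaches to refactoring the configuration files have been identified so far:
+
+* Simplify the current Protobuf-based configuration, user configuration and
+* Use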
diff --git a/doc/design/layer_generation/00.how_to_write_a_layer.md b/doc/design/layer_generation/01.how_to_write_a_layer_in_protobuf.md
similarity index 100%
rename from doc/design/layer_generation/00.how_to_write_a_layer.md
rename to doc/design/layer_generation/01.how_to_write_a_layer_in_protobuf.md
diff --git a/doc/design/layer_generation/01.use_map_in_protobuf.md b/doc/design/layer_generation/02.use_map_in_protobuf.md
similarity index 100%
rename from doc/design/layer_generation/01.use_map_in_protobuf.md
rename to doc/design/layer_generation/02.use_map_in_protobuf.md
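
Note on `00.how_to_implenment_dynamic_net.md`: the "evaluate immediately, record the graph implicitly, then backpropagate" idea described in that file can be shown with a tiny scalar example. This is a generic define-by-run sketch, not the `paddle.dyn` API, which the document only proposes. The `Value` class and everything in it are hypothetical.

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents      # the Values this one was computed from
        self._grad_fns = grad_fns    # one local-gradient function per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Order the implicitly recorded graph so each node comes after its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the result back to the leaves, accumulating gradients.
        for node in reversed(order):
            for parent, grad_fn in zip(node._parents, node._grad_fns):
                parent.grad += grad_fn(node.grad)


w = Value(3.0)
x = Value(2.0)
cost = w * x + w
if cost.data < 10.0:              # ordinary Python control flow shapes the graph
    cost = cost * Value(100.0)
cost.backward()
```

The `if` is never represented in a graph definition; only the operations that actually ran are recorded, which is why the graph can differ from one input to the next.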

From b922b006eef554d98c8599eed50b75058455f1c8 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 5 Apr 2017 10:32:11 +0800
Subject: [PATCH 11/27] Add how to write a layer in pure cpp.

---
 doc/design/layer_generation/00.ways_to_define_layer.md   | 9 +++++++--
 .../03.how_to_write_a_layer_in_pure_cpp.md               | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)
 create mode 100644 doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md

diff --git a/doc/design/layer_generation/00.ways_to_define_layer.md b/doc/design/layer_generation/00.ways_to_define_layer.md
index 22f01e543bf8d..bbe687b5a3893 100644
--- a/doc/design/layer_generation/00.ways_to_define_layer.md
+++ b/doc/design/layer_generation/00.ways_to_define_layer.md
@@ -31,5 +31,10 @@
 
 Two approaches to refactoring the configuration files have been identified so far:
 
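-* Simplify the current Protobuf-based configuration, user configuration and
-* Use
+* Simplify the current Protobuf-based configuration. The user's configuration file is still ultimately serialized to Protobuf.
+	* The Protobuf should be as simple as possible, however, and accept only the user's input parameters. All parameter inference is done on the C++ side.
+	* For the overall design, see [Generate Layer By Protobuf](./01.how_to_write_a_layer_in_protobuf.md)
+* Expose the network configuration API from C/C++. Other languages configure the network by reading and writing C/C++ variables directly.
+	* For the overall design, see [Generate Layer By C/C++](./03.how_to_write_a_layer_in_pure_cpp.md)
+
+The two approaches compare as follows: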
diff --git a/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md b/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md
new file mode 100644
index 0000000000000..67b03d91847c0
--- /dev/null
+++ b/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md
@@ -0,0 +1 @@
+TODO(qiaolongfei): Complete this documentation.

From b3a3b0ea265d629fac5cbcce80cedd2e5a06aefc Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 5 Apr 2017 10:38:50 +0800
Subject: [PATCH 12/27] Add skeleton of dynet

* Also add TODO comments
---
 .../dynamic_net/00.how_to_implenment_dynamic_net.md  | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md b/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
index 7091ca293e322..26c3f052d5c90 100644
--- a/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
+++ b/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
@@ -34,3 +34,15 @@ parameters.update()
 A dynamic neural network has only the computation step and hides the definition step. The problems it solves are:
 
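 * Non-linear control flow such as `if` can be added anywhere during computation, and the computation graph can differ for different data, for example in tree-structured neural networks.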
+
+// TODO(qijun): Complete this docs
+
+TBD
+
+## Implementation approach for dynamic neural networks
+
+TBD
+
+## Requirements dynamic neural networks place on the framework
+
+TBD

From 4a94baafc9936515c45a5f9423bdf80687356339 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 5 Apr 2017 10:52:05 +0800
Subject: [PATCH 13/27] Add comparation of ways to define layer.

---
 .../00.ways_to_define_layer.md                 | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/doc/design/layer_generation/00.ways_to_define_layer.md b/doc/design/layer_generation/00.ways_to_define_layer.md
index bbe687b5a3893..baf27427fc111 100644
--- a/doc/design/layer_generation/00.ways_to_define_layer.md
+++ b/doc/design/layer_generation/00.ways_to_define_layer.md
@@ -22,9 +22,8 @@
 
 This design also takes the following concerns into account:
 
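-* Backward compatibility ---- whether earlier configuration styles remain supported
-* Dynamic network development ---- this configuration will also serve as the basis of Paddle's dynamic network configuration.
-
+* Backward compatibility ---- whether earlier configuration styles remain supported.
+* Dynamic network development ---- parsing the neural network configuration is a basic building block of dynamic networks. Dynamic networks require that **configuration parsing be fast**.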
 
 
 ## Ways to refactor the configuration files
@@ -35,6 +34,19 @@
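 	* However, the Protobuf should be kept as simple as possible and accept only the user's input arguments. All parameter inference is done on the C++ side.
 	* For the overall design, see [Generate Layer By Protobuf](./01.how_to_write_a_layer_in_protobuf.md)
 * Expose the network configuration API in C/C++. Third-party languages configure the network by reading and writing C/C++ variables directly.
+	* Layer configuration construction and parsing would no longer depend on Protobuf at all, which breaks backward compatibility. This means new versions could not build `paddle_trainer`.
 	* For the overall design, see [Generate Layer By C/C++](./03.how_to_write_a_layer_in_pure_cpp.md)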
 
 The pros and cons of the two approaches compare as follows:
+
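+|  | Serialize user config to Protobuf | User manipulates C/C++ objects directly |
+| --- | --- | --- |
+| Parsing speed | Slow | Fast |
+| Serialization support | Supported directly | Not supported directly, but can be added |
+| Implementation difficulty | Simple, but backward compatibility takes a lot of work | Moderate, but no backward-compatibility burden |
+| Backward compatibility | Achievable | Not achievable; `paddle_trainer` cannot be implemented |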
+
+
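+## Conclusion
+
+After discussion, the Paddle developers consider XXXX feasible; even with the XXX problem it is an acceptable option. XXXX is therefore adopted as the way to refactor Paddle Layer configuration.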

From 3e5d22add988eb136616580aff88f18aacecc50e Mon Sep 17 00:00:00 2001
From: qijun <qijun1994@hotmail.com>
Date: Wed, 5 Apr 2017 23:44:03 +0800
Subject: [PATCH 14/27] add dynamic net doc

---
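Note: this patch describes layer evaluation as lazy, meaning nothing is computed until the user calls value(). The sketch below illustrates that idea only. `LazyValue`, `constant`, and `scale` are hypothetical names invented for this note, the values are plain scalars, and none of it is Paddle API.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical lazily evaluated layer output (illustration only).
class LazyValue {
 public:
  explicit LazyValue(std::function<double()> compute)
      : state_(std::make_shared<State>(std::move(compute))) {}

  // Runs the recorded computation on the first call, then reuses the result.
  double value() const {
    if (!state_->evaluated) {
      state_->result = state_->compute();
      state_->evaluated = true;
    }
    return state_->result;
  }

 private:
  struct State {
    explicit State(std::function<double()> c) : compute(std::move(c)) {}
    std::function<double()> compute;  // deferred computation
    double result = 0.0;              // cached output
    bool evaluated = false;
  };
  std::shared_ptr<State> state_;  // copies of a LazyValue share one cache
};

// A leaf that simply yields a constant.
inline LazyValue constant(double v) {
  return LazyValue([v] { return v; });
}

// A "layer" that records its input and only evaluates it on demand.
inline LazyValue scale(const LazyValue& input, double factor) {
  return LazyValue([input, factor] { return input.value() * factor; });
}
```

Calling `scale(constant(2.0), 3.0)` only records the computation; the multiplication runs on the first `value()` call and the result is cached afterwards.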
 .../00.how_to_implenment_dynamic_net.md       | 31 +++++++++++++------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md b/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
index 26c3f052d5c90..7934ebec97780 100644
--- a/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
+++ b/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
@@ -1,6 +1,6 @@
 # Implementing dynamic neural networks
 
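-Dynamic networks are a frontier topic for current neural network frameworks. Their advantage is that they address a major limitation of conventional frameworks: **the definition of a network is separated from its computation**. That is, an ordinary neural network framework first defines a computation graph and then uses a compute engine to evaluate it. A dynamic neural network instead evaluates each operation directly, defining the computation graph implicitly, and then back-propagates through that implicit graph.
+Dynamic networks are a frontier topic for current neural network frameworks. Their advantage is that they address a major limitation of conventional frameworks: **the definition of a network is separated from its computation**. That is, a static neural network framework first defines a computation graph and then uses a compute engine to evaluate it. A dynamic neural network instead evaluates each operation directly, defining the computation graph implicitly, and then back-propagates through that implicit graph.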
 
 Typical usage looks like this:
 
@@ -12,7 +12,7 @@ x.fill([0.058, 0.548, ...])
 y = paddle.dyn.data(type=Integer(10))
 y.fill(9)
 
-hidden = paddle.dyn.fc(input=y, size=200)
+hidden = paddle.dyn.fc(input=x, size=200)
 
 # You can use hidden.npvalue() to get this layer's value now.
 
@@ -31,18 +31,31 @@ parameters.update()
 
 ## Problems solved by dynamic neural networks
 
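-A dynamic neural network has only a computation step and hides the network definition step. The problems it solves are:
+A dynamic neural network has only a computation step and hides the definition step, so users can define a different network for each sample or batch. Compared with static neural networks, dynamic neural networks solve the following problems: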
 
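-* Non-linear operations such as `if` can be added freely during computation, and the computation graph can differ for different data, e.g. tree-structured neural networks.
+* Complex control logic such as iteration, recursion, and conditional selection can be added freely during computation, and all of it can be written in the host language (C++/Python).
+* More complex data types can be supported, and the computation graph can differ for different data.
+* Executing a dynamic neural network is the same as defining it, so users can evaluate parameters, intermediate results, and other values directly, which makes debugging easier.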
 
-// TODO(qijun): Complete this docs
-
-TBD
 
 ## Implementation approach for dynamic neural networks
 
-TBD
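+The computation graph of a dynamic neural network is defined implicitly; the design philosophy follows autograd libraries (for example https://github.com/HIPS/autograd). The implementation approach is as follows: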
+
+
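+1. For each sample, the user composes layers to define the network structure. Each sample owns a graph structure that records its computation graph.
+2. The graph holds information about every layer, including where its input data comes from, the operation the layer performs, and the size of its output. Information about each newly connected layer is appended to the graph.
+3. Layer evaluation is lazy: the computation graph recorded in the graph is not actually run by the execute engine until the user explicitly calls value(), which computes that layer's output. Normally the network is evaluated when forward() is executed.
+4. Users can add control logic while composing layers; the branch that is taken is also recorded in the graph.
+5. During backward(), the graph's execute engine differentiates along the recorded computation graph to compute gradients.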
+
+
+
 
 ## Requirements dynamic neural networks place on the framework
 
-TBD
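+* The core requirement is that building the computation graph be lightweight enough: implement the backend in C++ and consider dedicated memory/GPU-memory management strategies. The Python front-end wrapper should also be small and call the C++ backend interfaces directly.
+
+* Since layer evaluation is lazy, expression templates can be used to optimize the computation.
+
+* Consider batching data of different sizes or networks of different structures for training. In a dynamic network every sample owns its own computation graph, so parallelizing on the GPU is harder than in a static network.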

From 4ac8719ef7ba8db7fd05804476050c4922562002 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Thu, 6 Apr 2017 12:44:02 +0800
Subject: [PATCH 15/27] Simplize dynamic net implementation

---
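Note: the tape-style graph described in this patch (every Layer/OP call appends one node, host-language loops and branches never appear in the graph, and backward() walks the record in reverse) can be illustrated with a small sketch. `Tape`, `leaf`, `mul`, and `add` are hypothetical names made up for this note; the sketch is scalar-only and evaluates eagerly for brevity, so it is not Paddle or PyTorch API.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical scalar tape: each op call appends exactly one node.
struct Tape {
  std::vector<double> value;                            // forward results
  std::vector<double> grad;                             // accumulated gradients
  std::vector<std::function<void(Tape&)>> backwardOps;  // one entry per node

  // Records an input node; it has nothing to propagate backwards.
  std::size_t leaf(double v) {
    value.push_back(v);
    grad.push_back(0.0);
    backwardOps.push_back([](Tape&) {});
    return value.size() - 1;
  }

  // Records out = a * b together with its local gradient rule.
  std::size_t mul(std::size_t a, std::size_t b) {
    std::size_t out = leaf(value[a] * value[b]);
    backwardOps[out] = [a, b, out](Tape& t) {
      t.grad[a] += t.grad[out] * t.value[b];
      t.grad[b] += t.grad[out] * t.value[a];
    };
    return out;
  }

  // Records out = a + b together with its local gradient rule.
  std::size_t add(std::size_t a, std::size_t b) {
    std::size_t out = leaf(value[a] + value[b]);
    backwardOps[out] = [a, b, out](Tape& t) {
      t.grad[a] += t.grad[out];
      t.grad[b] += t.grad[out];
    };
    return out;
  }

  // Replays the record in reverse order, starting from the given node.
  void backward(std::size_t root) {
    grad.assign(grad.size(), 0.0);
    grad[root] = 1.0;
    for (std::size_t i = value.size(); i-- > 0;) backwardOps[i](*this);
  }
};
```

A host-language loop such as `for (...) z = t.mul(z, y);` leaves only a flat sequence of `mul` nodes on the tape, which matches the point that the Graph records invocations rather than control flow.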
 .../00.how_to_implenment_dynamic_net.md          | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md b/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
index 7934ebec97780..b17049919f045 100644
--- a/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
+++ b/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
@@ -40,22 +40,24 @@ parameters.update()
 
 ## Implementation approach for dynamic neural networks
 
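-The computation graph of a dynamic neural network is defined implicitly; the design philosophy follows autograd libraries (for example https://github.com/HIPS/autograd). The implementation approach is as follows:
+The computation graph of a dynamic neural network is defined implicitly; the design philosophy follows [autograd libraries](https://github.com/HIPS/autograd). The implementation approach is as follows: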
 
 
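-1. For each sample, the user composes layers to define the network structure. Each sample owns a graph structure that records its computation graph.
+1. For each batch, the user composes layers or OPs to define the network's operations. Each batch owns a graph structure that records that batch's computation graph.
+	* This graph is often a global variable and goes by different names in different libraries. In PyTorch, for example, the object is called a [Tape](https://github.com/pytorch/pytorch#dynamic-neural-networks-tape-based-autograd).
+	* The Graph is built dynamically on the fly: each time the user calls a Layer or OP, a node is added to the Graph.
 2. The graph holds information about every layer, including where its input data comes from, the operation the layer performs, and the size of its output. Information about each newly connected layer is appended to the graph.
-3. Layer evaluation is lazy: the computation graph recorded in the graph is not actually run by the execute engine until the user explicitly calls value(), which computes that layer's output. Normally the network is evaluated when forward() is executed.
+3. Layer evaluation can be lazy: the computation graph recorded in the graph is not actually run by the execute engine until the user explicitly calls value(), which computes that layer's output.
 4. Users can add control logic while composing layers; the branch that is taken is also recorded in the graph.
-5. During backward(), the graph's execute engine differentiates along the recorded computation graph to compute gradients.
-
+	* User code may contain branches, loops, and so on, but the Graph structure itself has no loop or branch operations. The Graph only records Layer or Op invocations.
+5. During backward(), the graph's execute engine runs back-propagation along the recorded computation graph to compute gradients.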
 
 
 
 ## Requirements dynamic neural networks place on the framework
 
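-* The core requirement is that building the computation graph be lightweight enough: implement the backend in C++ and consider dedicated memory/GPU-memory management strategies. The Python front-end wrapper should also be small and call the C++ backend interfaces directly.
+* The core requirement is that building the computation graph be **lightweight** enough. The Python front-end wrapper should also be thin and fast, calling the C++ backend interfaces directly.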
 
 * Since layer evaluation is lazy, expression templates can be used to optimize the computation.
 
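-* Consider batching data of different sizes or networks of different structures for training. In a dynamic network every sample owns its own computation graph, so parallelizing on the GPU is harder than in a static network.
+* Consider batching data of different sizes or networks of different structures for training. In a dynamic network every batch/sample can own its own computation graph, so parallelizing on the GPU is harder than in a static network.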

From 386133a9d029d814d66e4169e46513519ce5a68a Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Thu, 6 Apr 2017 12:50:32 +0800
Subject: [PATCH 16/27] Add link to dynet

---
 doc/design/layer_generation/00.ways_to_define_layer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/design/layer_generation/00.ways_to_define_layer.md b/doc/design/layer_generation/00.ways_to_define_layer.md
index baf27427fc111..34a5bfa1c15e1 100644
--- a/doc/design/layer_generation/00.ways_to_define_layer.md
+++ b/doc/design/layer_generation/00.ways_to_define_layer.md
@@ -23,7 +23,7 @@
 This design also takes the following concerns into account:
 
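 * Backward compatibility ---- whether earlier configuration styles remain supported.
-* Dynamic network development ---- parsing the neural network configuration is a basic building block of dynamic networks. Dynamic networks require that **configuration parsing be fast**.
+* Dynamic network development ---- parsing the neural network configuration is a basic building block of dynamic networks. Dynamic networks require that **configuration parsing be fast**. For a detailed introduction to dynamic networks, see [DynamicNet](../dynamic_net/00.how_to_implenment_dynamic_net.md).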
 
 
 ## Ways to refactor the configuration files

From ff63670b2755c05031fe1b23d0495a851eb48115 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Thu, 6 Apr 2017 16:24:11 +0800
Subject: [PATCH 17/27] Add how to write a layer in pure cpp

---
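Note: the `checkCallback` idea in this patch (a per-attribute callback of shape `(T* attr, bool setted) => paddle::Error`, plus predefined helpers such as `inRange`) can be sketched in C++11 without `std::any`. This is only an illustration under stated assumptions: `Error` is a plain string standing in for `paddle::Error`, the callback is type-erased through `void*` instead of `std::any`, and `inRange` and `makeDropoutDef` are hypothetical helpers written for this note.

```cpp
#include <functional>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical stand-in for paddle::Error: an empty string means OK.
using Error = std::string;

struct AttributeDef {
  std::string name;
  std::type_index type;  // copyable handle to the attribute's C++ type
  std::string description;
  // Type-erased form of (T* attr, bool setted) => Error.
  std::function<Error(void* attr, bool setted)> checkCallback;
};

// Builds a checker that applies a default when the attribute is unset
// and validates the closed range [lo, hi] when it is set.
template <typename T>
std::function<Error(void*, bool)> inRange(const std::string& name, T lo, T hi,
                                          T defaultValue) {
  return [=](void* raw, bool setted) -> Error {
    T* attr = static_cast<T*>(raw);  // caller must pass a pointer to T
    if (!setted) {
      *attr = defaultValue;
      return Error();
    }
    if (lo <= *attr && *attr <= hi) return Error();
    return name + " is out of range";
  };
}

// Example definition mirroring the dropout attribute from the document.
inline AttributeDef makeDropoutDef() {
  return AttributeDef{"dropout", std::type_index(typeid(float)),
                      "Drop out rate of the layer",
                      inRange<float>("dropout", 0.0f, 1.0f, 0.0f)};
}
```

Storing a `std::type_index` lets a caller compare the attribute's declared type with the type of the value it is about to pass before invoking the callback, which is what keeps the `void*` cast safe in this sketch.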
 .../01.how_to_write_a_layer_in_protobuf.md    |   4 +-
 .../03.how_to_write_a_layer_in_pure_cpp.md    | 460 +++++++++++++++++-
 2 files changed, 461 insertions(+), 3 deletions(-)

diff --git a/doc/design/layer_generation/01.how_to_write_a_layer_in_protobuf.md b/doc/design/layer_generation/01.how_to_write_a_layer_in_protobuf.md
index 922ebbdb0fc07..e3de2aa9cbd6d 100644
--- a/doc/design/layer_generation/01.how_to_write_a_layer_in_protobuf.md
+++ b/doc/design/layer_generation/01.how_to_write_a_layer_in_protobuf.md
@@ -1,6 +1,6 @@
-# How to write a Layer
+# How to write a Layer [Protobuf Version]
 
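-This is a conceptual document describing how users write a Layer after the refactoring.
+This is a conceptual document describing how users write a Layer after the Protobuf-simplifying refactoring.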
 
 ## Basic goals
 
diff --git a/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md b/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md
index 67b03d91847c0..97f849f54508f 100644
--- a/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md
+++ b/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md
@@ -1 +1,459 @@
-TODO(qiaolongfei): Complete this documentation.
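+# How to write a Layer (Pure CPP)
+
+This is a conceptual document describing how to refactor Paddle's configuration parsing in pure C++, and how users implement a Layer on top of it.
+
+## Basic goals
+
+Users only write a Layer's computation; no configuration parser needs to be written to complete the Layer.
+
+In addition, Layer parsing should be as fast as possible.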
+
+
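+## Implementation overview
+
+* When a Layer is registered, register not only its C++ type but also its meta-information. The meta-information is a C++ object, a simple composition of `std::unordered_map<std::string, std::any>` maps.
+* Use global static functions to generate the Layer's meta-information.
+* Provide a unified C-API so that users can create (new) a layer from the meta-information and validate its arguments.
+* Network parameter inference runs immediately after a Layer is created.
+
+Note: meta-information is information that describes other information. A Layer's meta-information describes how a kind of Layer may be described, not the actual description of one particular Layer.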
+
+
+## Implementation details
+
+### Prerequisites
+
+#### std::any
+
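+* [std::any](http://en.cppreference.com/w/cpp/utility/any) is part of the C++17 standard library, while the minimum C++ standard Paddle supports is C++11, so a simple hand-written `std::any` may be needed.
+	* std::any is an object that can hold a value of any copy-constructible type.
+
+### Meta-information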
+
+#### AttributeDef
+
+The meta-information of a single attribute: it describes the attribute's type and which values it accepts. It is defined as follows:
+
+
+```cpp
+struct AttributeDef {
+   std::string name;
+   std::type_index type;  // std::type_info is not copyable, so hold a std::type_index
+   std::string description;
+   std::any checkCallback;
+};
+```
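+Here checkCallback is a callback of type `(T* attr, bool setted) => paddle::Error`. Because attr is generic and may be of any type, the callback is stored as a std::any. `setted` indicates whether the user has set this attribute.
+
+For example, the AttributeDef for dropout_rate could be: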
+
+```json
+{
+	"name": "dropout",
+	"type": "float",
+	"description": "Set drop out rate of layer. 1 means all activations are dropped, 0 means do not drop any activation",
+	"checkCallback": function (float* attr, bool setted) => paddle::Error {
+		if (!setted) {
+			*attr = 0.0;  // default value.
+			return paddle::Error::OK;
+		} else {
+			if (0.0 <= *attr && *attr <= 1.0) {
+				return paddle::Error::OK;
+			} else {
+				return paddle::Error("Dropout should be in [0.0, 1.0].");
+			}
+		}
+	}
+}
+```
+
+Some commonly used checkCallbacks can be predefined. For dropout again, for instance, it could be defined as
+
+```json
+{
+	...
+	"checkCallback": paddle::AttributeDef::inRange<float>(name="dropout", min=0.0, max=1.0, default=0.0)
+}
+```
+
+#### ParameterDef
+
+Defines the meta-information of an input parameter (Parameter), i.e. which attributes the parameter supports.
+
+```cpp
+struct ParameterDef {
+	std::vector<AttributeDef> attributes;
+};
+```
+
+A typical ParameterDef looks like:
+
+```json
+[
+	{
+		"name": "name",
+		"description": "The name of this parameter",
+		"checkCallback": paddle::AttributeDef::notNull<std::string>()
+	},
+	{
+		"name": "weight_decay",
+		"description": "The weight decay rate of parameter, used to implement L2 Norm",
+		"checkCallback": paddle::AttributeDef::inRange<float>(name="weight_decay", min=0.0, max=1.0, default=0.0)
+	},
+	{
+		"name": "dims",
+		"description": "The dimension of parameter",
+		"checkCallback": function (std::vector<size_t>* dims, bool setted) {
+			if (!setted) {
+				return "Dims must be set";
+			}
+			if (dims->size() != 2) {
+				return "Dims must be 2 in this parameter. They are height * width.";
+			}
+			return OK;
+		}
+	}
+]
+
+```
+
+
+#### InputDef
+
+Defines the meta-information of an input of a Layer or Op, i.e. which input types are accepted.
+
+```cpp
+struct InputDef {
+	std::vector<DataType> dataType;  // accepted input data types
+	std::vector<SeqType> seqType; // accepted sequence types
+	bool repeatable;  // whether this input may appear multiple times; an fc layer, for example, accepts any number of inputs.
+	std::string name;
+	std::string description;
+	
+	std::unique_ptr<ParameterDef> paramAttr; // meta-information of this input's parameter; may be null, meaning the input has no parameter
+};
+```
+
+For example, the input of an FC Layer might be defined as:
+
+```json
+{
+    "name": "input",
+    "description": "input fields of fc layer",
+    "dataType": [Dense, Sparse, SparseBinary],
+    "seqType": [0, 1, 2],
+    "repeatable": true,
+    "paramAttr" : [
+    	{
+    		name: "dims",
+    		...
+    	},
+    	{
+    		name: "weight_decay",
+    		...
+    	},
+    	{
+    		name: "gradient_clipping",
+    		...
+    	}
+    ]
+}
+```
+
+#### LayerDef
+
+Defines the meta-information a Layer needs, i.e. which arguments the Layer accepts.
+
+```cpp
+struct LayerDef {
+	std::string type;
+	std::string description;
+	std::vector<InputDef> inputs;
+	std::unique_ptr<ParameterDef> biasAttr;
+	std::vector<AttributeDef> attrs;
+};
+```
+
+For example, an FC Layer is represented as follows
+
+```json
+{
+	"type": "fc",
+	"description": "Fully connected layer",
+	"inputs": [...],  # just like InputDef example
+	"biasAttr": [{
+		"name": "dims",
+		...
+	},{
+		"name": "weight_decay",
+		...
+	},
+		...
+	],
+	"attrs": [{
+		"name": "dropout",
+		...
+	},
+		...
+	]
+}
+```
+
+#### GraphDef
+
+Defines the meta-information a computation graph needs.
+
+```cpp
+struct GraphDef {
+	std::unordered_map<std::string, AttributeDef> attrs;
+};
+```
+
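+### Concrete objects
+
+A concrete object is the description of one specific layer or OP inside a computation graph.
+
+#### ParameterAttributes
+
+Represents the concrete attributes of one parameter (Parameter) in the neural network.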
+
+```cpp
+struct ParameterAttributes {
+	std::unordered_map<std::string, std::any> attrs;
+};
+```
+
+A possible value:
+
+```json
+{
+	"name": "fc.w",
+	"dims": [784, 200],
+	...
+}
+```
+#### OutputAttribute
+
+Represents an output attribute of a Layer/OP. This attribute has no meta-information. The output of one Layer or OP is the input of other Layers/OPs.
+
+```cpp
+struct OutputAttribute {
+	std::string name;  // a Layer may have several outputs, each with a different name.
+	DataType dataType;
+	SeqType seqType;
+	size_t size;
+	
+	std::unordered_map<std::string, std::any> attrs;
+};
+```
+
+A possible value:
+
+```json
+{
+	"name": "", // empty means the Layer's default output
+	"DataType": Dense,
+	"SeqType": 0,
+	"size": 784,
+	"attrs" : {
+		"num_channels": 1
+	}
+}
+```
+
+
+#### InputAttribute
+
+Represents the attributes of a Layer's input data.
+
+```cpp
+struct InputAttribute {
+	std::string name;
+	std::shared_ptr<LayerAttribute> inputLayer;
+	std::string inputName;
+	std::shared_ptr<ParameterAttributes> paramAttr;  // same param could be shared by multiple inputs
+	std::unordered_map<std::string, std::any> attrs;  // extra input attr.
+};
+```
+
+A possible value:
+
+```json
+{
+	"name": "input",
+	"inputLayer": {
+		"name": "fc1"
+	},
+	"inputName": "",
+	"paramAttr": {
+		"name": "fc.w",
+		...
+	},
+	"attrs" : {
+		"num_channels": 1
+	}
+}
+```
+
+#### LayerAttribute
+
+Represents the attributes of a Layer.
+
+```cpp
+struct LayerAttribute {
+	std::vector<InputAttribute> inputs;
+	std::vector<OutputAttribute> outputs;
+	std::vector<std::shared_ptr<LayerAttribute>> preDepends;  // most dependencies are written in outputs. But we should add preDepends for some situation. for example, RecurrentLayerGroup.
+	std::unordered_map<std::string, std::any> attrs;
+};
+```
+
+A possible value:
+
+```json
+{
+	"inputs": [{
+		name: input,
+		inputLayer: {
+			name: pixel,
+			...
+		},
+		paramAttr: {
+			name: fc.w,
+			...
+		},
+		...
+	}],
+	"outputs": [{
+		name: "",
+		size: 200,
+		...
+	}],
+	attrs: {
+		"size": 200,
+		"activation": tanh,
+		...
+	},
+	...
+}
+```
+
+#### ComputationGraph
+
+ComputationGraph is the current computation graph.
+
+```cpp
+struct ComputationGraph {
+	std::unordered_map<std::string, std::any> attrs;
+	std::vector<std::shared_ptr<OutputAttribute>> outputs;
+	std::vector<std::shared_ptr<LayerAttribute>> extraLayers;  // extra layers are attached to this computation graph, but they are not outputs.
+};
+```
+
+
+## Defining a new Layer
+
+### Defining the arguments a Layer accepts
+
+To define a new Layer, the user defines its meta-information, a LayerDef, as follows:
+
+```cpp
+class FCLayer :public Layer {
+public:
+  ...
+  
+  static void getLayerDefinition(LayerDef& def) {
+    LayerDefinition::supportSize(def);
+    LayerDefinition::supportDropout(def);
+    LayerDefinition::addInput(def)
+        .setRepeatable(true)
+        .addSupport({ InputType::Dense, InputType::SparseInt, InputType::Sparse })
+        .addSupportSeqLevel({0, 1, 2})
+        .addDoc("FC Layer is fully connected. Blah blah blah...");
+  }
+};
+```
+
+### Defining parameter inference
+
+The user defines parameter inference as follows:
+
+```cpp
+class FCLayer :public Layer {
+public:
+  ...
+  
+  static paddle::Error calculateOutputAndParam(LayerAttribute& self) {
+    // fill self.outputs by self.inputs.
+    // also calculate self.inputs.parameters's size. etc.
+  }
+};
+```
+
+### Layer initialization
+
+A LayerAttribute that has passed validation is used to initialize a Layer, as follows:
+
+```cpp
+class FCLayer :public Layer {
+public:
+	void init(LayerAttribute& attr, Parameters& params) {
+	...
+	}
+	...
+};
+```
+
+### Registering the Layer
+
+Simply register the Layer's type:
+
+```cpp
+REGISTER_LAYER(fc, FCLayer);
+```
+
+## User configuration and parsing
+
+Layer configuration is done through just the following interfaces:
+
+```cpp
+extern "C" {
+
+typedef void* Symbol;
+
+Symbol newSymbol(int symbolType);
+void setAttribute(void* symbol, const char* path, int type_id, void* value);
+void appendAttribute(void* symbol, const char* path, int type_id, void* value);
+void destroySymbol(Symbol sym);
+}
+```
+Here Symbol is the generic name for `ParameterAttributes`, `InputAttribute`, `OutputAttribute`, `LayerAttribute`, and `ComputationGraph`.
+
+A simple usage example:
+
+```cpp
+auto paramAttr = newSymbol(PARAMETER_ATTRIBUTE);
+setAttribute(paramAttr, "name", STRING_TYPE, "fc.w");
+appendAttribute(paramAttr, "dims", INT_TYPE, 784);
+appendAttribute(paramAttr, "dims", INT_TYPE, 200);
+
+auto inputAttr = newSymbol(INPUT_ATTRIBUTE);
+setAttribute(inputAttr, "name", STRING_TYPE, "input");
+setAttribute(inputAttr, "paramAttr", SYMBOL_TYPE, paramAttr);
+setAttribute(inputAttr, "inputLayer", SYMBOL_TYPE, dataLayerAttr);
+
+auto layerAttr = newSymbol(LAYER_ATTRIBUTE);
+setAttribute(layerAttr, "type", STRING_TYPE, "fc");
+setAttribute(layerAttr, "name", STRING_TYPE, "fc");
+setAttribute(layerAttr, "size", INT_TYPE, 200);
+setAttribute(layerAttr, "activation", INT_TYPE, 0);  // 0 means sigmoid, etc.
+appendAttribute(layerAttr, "inputs", SYMBOL_TYPE, inputAttr);
+
+
+auto graph = newSymbol(GRAPH_ATTRIBUTE);
+appendAttribute(graph, "outputs", SYMBOL_TYPE, layerAttr);  // attach the layer to the graph's outputs
+
+auto engine = newExecuteEngine(graph);
+engine.forward();
+engine.backward();
+parameters.update();
+```

From 12a430ae4cc13ae7fc1b2b4dbbccbbb6093b626d Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Thu, 6 Apr 2017 17:43:38 +0800
Subject: [PATCH 18/27] Change highlight to text

---
 .../03.how_to_write_a_layer_in_pure_cpp.md     | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md b/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md
index 97f849f54508f..7651b5fe90b3c 100644
--- a/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md
+++ b/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md
@@ -47,7 +47,7 @@ struct AttributeDef {
 
 For example, the AttributeDef for dropout_rate could be:
 
-```json
+```text
 {
 	"name": "dropout",
 	"type": "float",
@@ -68,7 +68,7 @@ struct AttributeDef {
 
 Some commonly used checkCallbacks can be predefined. For dropout again, for instance, it could be defined as
 
-```json
+```text
 {
 	...
 	"checkCallback": paddle::AttributeDef::inRange<float>(name="dropout", min=0.0, max=1.0, default=0.0)
@@ -87,7 +87,7 @@ struct ParameterDef {
 
 A typical ParameterDef looks like:
 
-```json
+```text
 [
 	{
 		"name": "name",
@@ -135,7 +135,7 @@ struct InputDef {
 
 For example, the input of an FC Layer might be defined as:
 
-```json
+```text
 {
     "name": "input",
     "description": "input fields of fc layer"
@@ -175,7 +175,7 @@ struct LayerDef {
 
 For example, an FC Layer is represented as follows
 
-```json
+```text
 {
 	"type": "fc",
 	"description": "Fully connected layer",
@@ -224,7 +224,7 @@ struct ParameterAttributes {
 
 A possible value:
 
-```json
+```text
 {
 	"name": "fc.w",
 	"dims": [784, 200],
@@ -248,7 +248,7 @@ struct OutputAttribute {
 
 A possible value:
 
-```json
+```text
 {
 	"name": "", // empty means the Layer's default output
 	"DataType": Dense,
@@ -277,7 +277,7 @@ struct InputAttribute {
 
 A possible value:
 
-```json
+```text
 {
 	"name": "input",
 	"inputLayer": {
@@ -309,7 +309,7 @@ struct LayerAttribute {
 
 A possible value:
 
-```json
+```text
 {
 	"inputs": [{
 		name: input,

From 03184c14d52e3d2c711505a1b8b30d733e7f3e21 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Sat, 6 May 2017 13:17:18 +0800
Subject: [PATCH 19/27] Unify topology design in CPP

---
 .../00.ways_to_define_layer.md                |  52 --
 .../01.how_to_write_a_layer_in_protobuf.md    | 268 ----------
 .../02.use_map_in_protobuf.md                 | 150 ------
 .../03.how_to_write_a_layer_in_pure_cpp.md    | 459 ------------------
 doc/design/topology_in_cpp.md                 | 264 ++++++++++
 5 files changed, 264 insertions(+), 929 deletions(-)
 delete mode 100644 doc/design/layer_generation/00.ways_to_define_layer.md
 delete mode 100644 doc/design/layer_generation/01.how_to_write_a_layer_in_protobuf.md
 delete mode 100644 doc/design/layer_generation/02.use_map_in_protobuf.md
 delete mode 100644 doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md
 create mode 100644 doc/design/topology_in_cpp.md

diff --git a/doc/design/layer_generation/00.ways_to_define_layer.md b/doc/design/layer_generation/00.ways_to_define_layer.md
deleted file mode 100644
index 34a5bfa1c15e1..0000000000000
--- a/doc/design/layer_generation/00.ways_to_define_layer.md
+++ /dev/null
@@ -1,52 +0,0 @@
-# Comparing ways to define a Layer/OP
-
-
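-This article describes the current state and problems of user-defined configuration files in Paddle and compares several ways to define a Layer/OP, to help everyone reach a decision.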
-
-
-## Why refactor the configuration definition process
-
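-Parsing user configuration in Paddle is currently very convoluted, a legacy of Paddle being a roughly four-year-old project. To stay **compatible** with every earlier Paddle configuration format, and to simplify configuration for users, Paddle currently has three configuration styles: the original configuration format (`config_parser.py`), `trainer_config_helper`, and `paddle.v2.layer`. They call each other as follows: `paddle.v2` calls `trainer_config_helper`, which calls `config_parser.py`. The code is not duplicated, but the multiple layers of wrapping make it hard to maintain.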
-
-The main pain points are:
-
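-* Users who want to find out how a Layer argument should be used get lost when digging into Paddle's code. Of this code, only `trainer_config_helper` is well commented and well styled. `paddle.v2` has comments and documentation, but its functions are generated dynamically rather than written as static code, so it cannot be **read**; `config_parser.py` lacks documentation and comments.
-* Developers who want to write a new Layer have to modify several files.
-	* First, the developer adds the arguments the Layer needs to Paddle's [Protobuf file](https://github.com/PaddlePaddle/Paddle/blob/develop/proto/ModelConfig.proto).
-	* Next, write the C/C++ files for the Layer, implementing its forward and backward code.
-	* Finally, implement configuration parsing for the Layer in `config_parser.py`, `trainer_config_helpers`, and `paddle.v2`.
-* Paddle is expensive to maintain, especially when the implementation of a Layer has to change.
-* Supporting any other language binding would take too much development work.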
-
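-The goal of this design is therefore to **clean up** the messy, complicated way Paddle currently defines and configures Layers and arrive at a clean result: **users only need** to write a `C/C++` implementation to develop a Layer.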
-
-This design also takes the following concerns into account:
-
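-* Backward compatibility ---- whether earlier configuration styles remain supported.
-* Dynamic network development ---- parsing the neural network configuration is a basic building block of dynamic networks. Dynamic networks require that **configuration parsing be fast**. For a detailed introduction to dynamic networks, see [DynamicNet](../dynamic_net/00.how_to_implenment_dynamic_net.md).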
-
-
-## Ways to refactor the configuration files
-
-Two ways to refactor the configuration files have been identified so far:
-
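-* Simplify the current Protobuf-based configuration. The user's configuration is still ultimately serialized to Protobuf.
-	* However, the Protobuf should be kept as simple as possible and accept only the user's input arguments. All parameter inference is done on the C++ side.
-	* For the overall design, see [Generate Layer By Protobuf](./01.how_to_write_a_layer_in_protobuf.md)
-* Expose the network configuration API in C/C++. Third-party languages configure the network by reading and writing C/C++ variables directly.
-	* Layer configuration construction and parsing would no longer depend on Protobuf at all, which breaks backward compatibility. This means new versions could not build `paddle_trainer`.
-	* For the overall design, see [Generate Layer By C/C++](./03.how_to_write_a_layer_in_pure_cpp.md)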
-
-The pros and cons of the two approaches compare as follows:
-
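-|  | Serialize user config to Protobuf | User manipulates C/C++ objects directly |
-| --- | --- | --- |
-| Parsing speed | Slow | Fast |
-| Serialization support | Supported directly | Not supported directly, but can be added |
-| Implementation difficulty | Simple, but backward compatibility takes a lot of work | Moderate, but no backward-compatibility burden |
-| Backward compatibility | Achievable | Not achievable; `paddle_trainer` cannot be implemented |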
-
-
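-## Conclusion
-
-After discussion, the Paddle developers consider XXXX feasible; even with the XXX problem it is an acceptable option. XXXX is therefore adopted as the way to refactor Paddle Layer configuration.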
diff --git a/doc/design/layer_generation/01.how_to_write_a_layer_in_protobuf.md b/doc/design/layer_generation/01.how_to_write_a_layer_in_protobuf.md
deleted file mode 100644
index e3de2aa9cbd6d..0000000000000
--- a/doc/design/layer_generation/01.how_to_write_a_layer_in_protobuf.md
+++ /dev/null
@@ -1,268 +0,0 @@
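-# How to write a Layer [Protobuf Version]
-
-This is a conceptual document describing how users write a Layer after the Protobuf-simplifying refactoring.
-
-## Basic goals
-
-Users only write a Layer's computation; they neither write a configuration parser nor modify the Protobuf definitions to complete the Layer.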
-
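-## Implementation
-
-### Overview
-
-* When a Layer is registered, register not only its C++ type but also its meta-information, which is represented in Protobuf.
-* Use global static functions to generate the Layer's meta-information. A code generator reads the meta-information through the Layer to generate the configuration parser (ConfigParser).
-* Move neural network parameter inference (the size of each parameter, the output size) into Paddle C++ Core.
-
-### Layer meta-information
-
-Paddle registers meta-information for **each kind** of Layer on the C++ side and declares it as Protobuf.
-
-There are two main pieces of meta-information:
-
-####  LayerDef
-* LayerDef describes the meta-information of each **kind** of Layer: its type name, comments, accepted input types, parameter types, and other Layer attributes. It does not include the type the Layer outputs.
-* Note that this is **meta-information**. A `LayerDef` describes a **kind** of `Layer`, not the concrete arguments of **one** `Layer`.
-* Likewise, the `ArgumentDef` used in LayerDef describes **the type of a kind of input argument**, not what a specific input argument is. `AttributeDef` represents the **type** of an attribute (Attribute), not its concrete value.
-* The LayerDef of a fully connected layer (FullyConnected, abbreviated FC below) might be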
-
-```json
-{
-  "type": "fc",
-  "description": "Fully Connected Layer is the simplest layer in nerual network. ...",
-  "inputs" : [
-    {
-      "name": "input",
-      "description": "The input of fully connected layer, could be several.",
-      "data_type": ["Dense", "Sparse", "SparseInt", "Int"],
-      "seq_nested_level": [0, 1, 2],
-      "repeatable": true
-    }
-  ],
-  "parameter_attr": [
-    {
-      "attributes": [{
-        "name": "weight_decay",
-        "type": "float",
-        "description": "The weight decay rate of parameter, used to implement L2 Norm",
-        "default_value": 0.0,
-        "max_value": 1.0,
-        "min_value": 0.0
-      }, {
-        "name": "gradient_clipping",
-        "type": "float",
-        "description": "The gradient clipping threshold",
-        "default_value": 0.0,
-        "min_value": 0.0
-      }]
-    }
-  ],
-  "bias_attr": {
-    "attributes": [{
-      "name": "weight_decay",
-      "type": "float",
-      "description": "The weight decay rate of parameter, used to implement L2 Norm",
-      "default_value": 0.0,
-      "max_value": 1.0,
-      "min_value": 0.0
-    }]
-  },
-  "layer_attr":  [
-    {
-      "name": "dropout_rate",
-      "type": "float",
-      "description": "The dropout rate of this layer",
-      "default_value": 0.0,
-      "max_value": 1.0,
-      "min_value": 0.0
-    }
-  ]
-}
-```
-
-#### LayerOutputType
-
-* LayerOutputType describes the concrete type of a Layer's input or output (not its concrete value). It is computed at runtime.
-* The LayerOutputType of an FC Layer might be
-
-```json
-{
-	"type": "Dense",
-	"size": 200,
-	"seq_nested_level": 2
-}
-```
-
-#### Protobuf definition of Layer meta-information
-
-The Protobuf definition of Layer meta-information is shown below.
-
-```protobuf
-enum DataType {
-  DENSE=0,
-  SPARSE_INT=1,
-  SPARSE=2,
-  INT=3,
-}
-
-enum AttributeType {
-  STRING=0,
-  INT=1,
-  FLOAT=2,
-  DOUBLE=3,
-  ...
-}
-
-message Attribute {
-  oneof {
-    string s_value = 1;
-    int    i_value = 2;
-    float  f_value = 3;
-    ...
-  }
-}
-
-message AttributeDef {
-  required string name = 1;  // supported attribute name.
-  required AttributeType type = 2;  // supported type.
-  required string description = 3; // Attribute description & comments.
-  
-  optional Attribute default_value = 4; // default value.
-  optional Attribute max_value = 5;    // max value.
-  optional Attribute min_value = 6;   // min value.
-}
-
-// Argument Define the Supported InputTypes.
-message ArgumentDef {
-   	// Supported Input Type.
-   	// The data type of input/output.
-   	repeated DataType data_type = 1; 
-   	// 0 means it is not a sequence. 1 means a plain sequence. 2 means a nested sequence.  One layer could support many sequence type.
-   	repeated uint32 seq_nested_level = 2;
-    	
-   	// In paddle, some layer can handle variable length input.
-   	// If some input is repeatable, it means there are one or many inputs as the same input type.
-   	required bool repeatable = 3;
-    	
-	// In Paddle, a layer could return many outputs. Each output contains a different name.
-   	required string name = 4;
-   	
-   	// Comments
-  	required string description = 5;
-}
-
-message LayerDef {
-    required string type = 1;  // Layer type, such as 'fc', 'conv'
-    required string description = 2;  // Layer description & comments.
-    
-    
-    repeated ArgumentDef inputs = 3;
-    
-    
-    message ParameterDef {
-        repeated AttributeDef attributes = 1;  // Parameter Attributes Definition.
-    }
-    
-    // Each input of Paddle Layer should contain zero or one parameter.
-    // so parameter_attr.size() == inputs.size()
-    repeated ParameterDef parameter_attr = 5;
-    
-    // Set the bias attribute, If this layer support bias.
-    optional ParameterDef bias_attr = 6;
-    
-    // The Layer Attributes.
-    repeated AttributeDef layer_attr = 7;
-}
-
-// Define the layer's output types by given input types.
-message LayerOutputType {
-	// Output name, Each Paddle Layer could have multiple outputs.
-	optional string name = 1;
-	
-	// Output type
-	required DataType type = 2;
-	required uint32 size = 3;
-	required uint32 seq_nested_level = 4;
-	
-}
-```
-
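-### Exposing the LayerDef/LayerOutputType Protobuf from C++
-
-Basic idea:
-
-* For each kind of Layer, Paddle derives the names of two global functions from the Layer's name. For an FC Layer, for example, they are `__get_fc_layer_definition__` and `__get_fc_layer_output_type__`. Both global functions are generated automatically by `REGISTER_LAYER`.
-* Each Layer implementation provides two static (`static`) functions that implement these two functions.
-* The function that returns the LayerOutputType also performs **neural network inference**, i.e. it sets ParameterSize at runtime, dynamically adds the Layer's auxiliary inputs, and so on.
-
-For example, a possible implementation for FCLayer is:
-
-LayerDefinition.h is a shared header file with the following interface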
-
-```C++
-
-class LayerDefinition {
-public:
-  // Mark a layer support size attribute.
-  static void supportSize(LayerDef& );
-
-  // Make a layer support dropout attribute.
-  static void supportDropout(LayerDef& );
-
-  // Add a input of layer.
-  static LayerInputDefinition& addInput(LayerDef& );
-  ...
-};
-
-```
-
-FullyConnectedLayer.h is the header that implements the fully connected layer:
-
-```C++
-class FCLayer :public Layer {
-public:
-  void init() { ... }
-  void forward() { ... }
-  void backward() { ... }
-  
-  static void getLayerDefinition(LayerDef& def) {
-    LayerDefinition::supportSize(def);
-    LayerDefinition::supportDropout(def);
-    LayerDefinition::addInput(def)
-        .setRepeatable(True)
-        .addSupport({ InputType::Dense, InputType::SparseInt, InputType::Sparse })
-        .addSupportSeqLevel({0, 1, 2})
-        .addDoc("FC Layer is fully connected. Blah blah blah...");
-  }
-  
-  static std::vector<LayerOutputType> getLayerOutputType(const std::vector<LayerOutputDef>& inputs,
-  	    LayerConfig& self) {
-  	 // self could be modified, for calculating parameter size, etc.
-    LayerOutputDef out;
-    out.set_size(self.size());
-    out.set_type(InputType::Dense);
-    out.set_seq_nested_level(inputs[0].seq_nested_level);
-    return { out };
-  }
-};
-
-
-REGISTER_LAYER(fc, FCLayer);
-```
-
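-### Configuration parsing flow
-
-The config parser runs as shown in the figure below:
-
-![Configuration parsing flow](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/0a3d7bfb44e45d61d7bd80b26ca18fbc/raw/7ad64cdfc31ba5a427a9d599e837af9fd3774138/parsing.dot)
-
-1. Read the meta-information (LayerDef) of every Layer in Paddle Core.
-1. Generate the parser, ConfigParser, from the LayerDefs of all Layers.
-	* How the parser is generated is up to each language.
-	* This can be an offline process: write every Layer's LayerDef to a file first, then have other languages read that file to generate code.
-	* It can also be an online process: for a dynamically typed language such as Python, generating functions at runtime is easy, so there is no need to generate code first and functions afterwards.
-1. Use ConfigParser to parse the user's configuration file `trainer_config.conf`.
-	* At this point the parser returns only an incomplete `ModelConfig`, containing just what the user wrote in the configuration file; inference of parameter sizes is completed in the next step.
-1. Pass this call graph to Paddle Core to generate the real `ModelConfig`.
-	* For each incomplete LayerConfig in the `ModelConfig`, fill in the default values.
-	* Then run `getLayerOutputType` in order to obtain each Layer's output and complete parameter inference, and pass the LayerConfig on to the next Layer.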
diff --git a/doc/design/layer_generation/02.use_map_in_protobuf.md b/doc/design/layer_generation/02.use_map_in_protobuf.md
deleted file mode 100644
index d01e44cfc8983..0000000000000
--- a/doc/design/layer_generation/02.use_map_in_protobuf.md
+++ /dev/null
@@ -1,150 +0,0 @@
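-# Supporting dictionary fields of multiple types in Protobuf
-
-## Background
-
-The background of this work is that we want to use a code generator, or runtime generation, to produce model configuration functions automatically and to check configuration correctness at runtime.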
-
-
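-How is a Layer written today? See [this article](http://www.paddlepaddle.org/doc/dev/new_layer/index.html). The main steps are:
-
-* Add the arguments the Layer needs to the [Protobuf file](https://github.com/PaddlePaddle/Paddle/blob/develop/proto/ModelConfig.proto#L284). If the Layer needs only common settings such as size, the protobuf already contains them and they can be reused. If the Layer has other custom arguments, fields must be added to this file.
-	* In other words, creating a Layer and modifying the Protobuf file are currently tightly coupled, and this protobuf message already has 52 fields.
-* Implement the Layer in C++.
-* Implement the Layer's parsing function, Wrapper, V2Layer, and so on in Python.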
-
-
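-This design document aims to remove the coupling between the Protobuf file and Layers, so that users need not change the Protobuf when creating a Layer. It also greatly simplifies the Protobuf file and cleans up redundant fields in the existing protobuf, for example by merging the image-related fields in LayerInputConfig (`ConvConfig`, `PoolConfig`, `NormConfig`, etc.).
-
-## Implementation
-
-Use Protobuf's [map](https://developers.google.com/protocol-buffers/docs/proto#maps) and [oneof](https://developers.google.com/protocol-buffers/docs/proto#oneof) to simplify the configuration in Paddle's Protobuf into a `map<string, variant>` form.
-
-A simple example: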
-
-```protobuf
-message Attribute {
-    oneof AttributeField {
-   	     string s_value = 1;
-   	     int    i_value = 2;
-   	     float  f_value = 3;
-   	     double d_value = 4;
-   	     ...
-    }
-}
-
-message LayerInputConfig {
-  required string name = 1;
-  map<string, Attribute> attributes = 2;
-};
-
-message LayerConfig {
-   required string name = 1;
-   required string type = 2;
-   map<string, Attribute> attributes = 3;
-   repeated LayerInputConfig inputs = 4;
-}
-```
-
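-Each kind of Layer has a different `type`. `attributes` is a `map` whose keys each Layer can define for itself. Common configuration arguments such as `activation` can share a key. Layer-specific attributes can be namespaced with `.`. For example, CTCLayer can set a `blank` attribute whose key is `ctc.blank`.
-
-With this, implementing a new Layer no longer requires modifying the Protobuf message. Moreover, when writing a new Layer the user can declare which attributes it needs and their valid ranges, so that the generated Python configuration functions can include runtime checks and prevent users from misconfiguring the network.
-
-## Sample configuration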
-
-```json
-{
-  "layers": [
-    {
-      "name": "image",
-      "type": "data",
-      "attributes": {
-        "size": 65536
-      }
-    },
-    {
-      "name": "__conv_0__",
-      "type": "exconv", 
-      "attributes": {
-        "size": 3297856,
-        "activation": "linear",
-        "num_filters": 64,
-        "out.x": 227,
-        "out.y": 227,
-        "bias.name": "___conv_0__.wbias",
-        "bias.shared": true
-      },
-      "inputs" : [{
-        "name": "image",
-        "attributes": {
-          "parameter_name": "___conv_0__.w0",
-          "conv.filter_size": 32,
-          "conv.stride.x": 1,
-          "conv.padding.x": 1,
-          "conv.stride.y": 1,
-          "conv.padding.y": 1,
-          "conv.groups": 1,
-          "conv.filter_channels": 1,
-
-          "img.channels": 1,
-          "img.x": 256,
-          "img.y": 256
-        }
-      }]
-    },
-    {
-      "name": "__batch_norm_0__",
-      "type": "batch_norm",
-      "attributes": {
-        "size": 3297856,
-        "activation": "relu",
-        "out.x": 227,
-        "out.y": 227,
-        "bias.name": "___batch_norm_0__.wbias",
-        "moving_average_fraction": 0.9
-      },
-      "inputs": [
-        {
-          "name": "__conv_0__",
-          "attributes": {
-            "parameter_name": "___batch_norm_0__.w0",
-            "img.x": 227,
-            "img.y": 227,
-            "img.channels": 64
-          }
-        },
-        {
-          "name": "__conv_0__",
-          "attributes": {
-            "parameter_name": "___batch_norm_0__.w1"
-          }
-        },
-        {
-          "name": "__conv_0__",
-          "attributes": {
-            "parameter_name": "___batch_norm_0__.w2"
-          }
-        }
-      ]
-    },
-  ]
-}
-
-```
-
-## 实现问题
-
-实现这项工作目前来看有如下几个先决条件需要解决:
-
-* 这项工作会修改 `Python <==> Paddle core`中间的protobuf消息定义，对于Python端Layer解析函数，需要有覆盖完整的单元测试，才能保证这一步工作进行完之后，系统行为没有问题。否则，直接修改 Protobuf 风险较高。
-* `oneof`与`map`是`protobuf2`语法，但是这是在`Protobuf 3.0`之后的代码库中添加的功能，如果Paddle依赖这个功能，那么Paddle必须依赖Protobuf 3.0以上的Protobuf版本。
-* 这个阶段保证Paddle的配置接口向后兼容，但是生成的Protobuf二进制有所修改。但保证可以新生成一个Protobuf二进制，使用命令 `
-python -m paddle.utils.dump_config trainer_config.conf "" --binary > trainer_config.bin`
-
-## 总结
-
-* 最终目的: 用户只需要写Layer的C++实现，剩下的Python代码自动生成
-* 阶段目的: 解耦合 Protobuf与Layer的C++实现
-* 解决办法: 用`map`和`oneof`，将属性变成一个多种类型的字典
-* 问题:
-	* 需要先完善config_parser的单测，增加单测覆盖率
-	* 这会让Paddle强制依赖`Protobuf 3.0+`的Protobuf
diff --git a/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md b/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md
deleted file mode 100644
index 7651b5fe90b3c..0000000000000
--- a/doc/design/layer_generation/03.how_to_write_a_layer_in_pure_cpp.md
+++ /dev/null
@@ -1,459 +0,0 @@
-# 如何写一个Layer (Pure CPP)
-
-这个文档是一个概念性的文档，描述如何使用Pure Cpp重构Paddle的配置解析，在此之上用户如何实现一个Layer。
-
-## 基本目标
-
-用户只需要写Layer的计算信息，不需要写配置解析器，就可以完成Layer的书写。
-
-同时，Layer解析过程尽可能的快。
-
-
-## 实现总体概览
-
-* 在注册Layer的时候，不只注册Layer的C++类型，同时注册Layer的元信息。元信息是一个C++对象，是简单的 `std::unordered_map<std::string, std::any>`的组合。
-* 使用全局静态函数生成Layer的元信息。
-* 使用统一的C-API，让用户可以根据元信息，new出来一个layer，并做参数正确性检查
-* 网络参数推导在new出一个Layer之后立即进行。
-
-注意: 元信息是指描述某种信息的信息。譬如Layer的元信息是指描述某种Layer可以如何被描述的信息。而不是具体某一个Layer的实际描述。
-
-
-## 实现细节
-
-### 前置依赖
-
-#### std::any
-
-* [std::any](http://en.cppreference.com/w/cpp/utility/any)由于是 CPP 17的标准库，Paddle支持的最低CPP标准是CPP 11。所以可能需要手写一个简单的`std::any`。
-	* std::any是一个可以放置任何类型的对象。
-
-### 元信息
-
-#### AttributeDef
-
-某一个属性的元信息。即描述一个属性是何种类型，可以接受何种参数。其定义如下:
-
-
-```cpp
-struct AttributeDef {
-   std::string name;
-   std::type_info type;
-   std::string description;
-   std::any checkCallback;
-};
-```
-其中 checkCallback是一个回调函数，他的类型是`(T* attr, bool setted) => paddle::Error`，因为输入的attr可以是任意类型的泛型，故这里用std::any表示类型。其中`setted`是表示这个参数是不是被用户设置过。
-
-例如，对于dropout_rate的AttributeDef可以是:
-
-```text
-{
-	"name": "dropout",
-	"type": "float",
-	"description": "Set drop out rate of layer. 1 means all activations are dropped, 0 means do not drop any activation",
-	"checkCallback": function (float* attr, bool setted) => paddle::Error {
-		if (!setted) {
-			*attr = 0.0;  // default value.
-		} else {
-			if (0.0 <= *attr <= 1.0) {
-				return paddle::Error::OK;
-			} else {
-				return paddle::Error("Dropout should be in [0.0, 1.0].");
-			}
-		}
-	}
-}
-```
-
-其中，对于`checkCallback`可以预定义一些常用checkCallback。譬如，同样对于dropout，可以定义为
-
-```text
-{
-	...
-	"checkCallback": paddle::AttributeDef::inRange<float>(name="dropout", min=0.0, max=1.0, default=0.0)
-}
-```
-
-#### ParameterDef
-
-定义一个输入参数(Parameter)的元信息。即这个参数可以支持哪些属性
-
-```cpp
-struct ParameterDef {
-	std::vector<AttributeDef> attributes;
-};
-```
-
-对于常见的ParameterDef为:
-
-```text
-[
-	{
-		"name": "name",
-		"description": "The name of this parameter",
-		"checkCallback": paddle::AttributeDef::notNull<std::string>()
-	},
-	{
-		"name": "weight_decay",
-		"description": "The weight decay rate of parameter, used to implement L2 Norm",
-		"checkCallback": paddle::AttributeDef::inRange<float>(name="weight_decay", min=0.0, max=1.0, default=0.0)
-	},
-	{
-		"name": "dims",
-		"description": "The dimension of parameter",
-		"checkCallback": function (std::vector<size_t>* dims, bool setted) {
-			if (!setted) {
-				return "Dims must be set".
-			}
-			if (dims->size() != 2) {
-				return "Dims must be 2 in this parameter. They are height * width.";
-			}
-			return OK;
-		}
-	}
-]
-
-```
-
-
-#### InputDef
-
-定义一个Layer或者一个Op输入的元信息。即描述可以接受何种输入类型。
-
-```cpp
-struct IutputDef {
-	std::vector<DataType> dataType;  // 可以接受的输入类型
-	std::vector<SeqType> seqType; // 可以接受的sequence type
-	bool repeatable;  // 该输入是否可以为多个。例如，对于fc layer，它的输入就是无限多个的。
-	std::string name;
-	std::string description;
-	
-	std::unique_ptr<ParameterDef> paramAttr; // 该输入参数的元信息，可以为空。为空表示该输入没有参数
-};
-```
-
-例如，对于FC Layer的输入，可能定义为:
-
-```text
-{
-    "name": "input",
-    "description": "input fields of fc layer"
-    "dataType": [Dense, Sparse, SparseBinary],
-    "seqType": [0, 1, 2],
-    "repeatable": true,
-    "paramAttr" : [
-    	{
-    		name: "dims",
-    		...
-    	},
-    	{
-    		name: "weight_decay",
-    		...
-    	},
-    	{
-    		name: "gradient_clipping",
-    		...
-    	}
-    ]
-}
-```
-
-#### LayerDef
-
-定义一个Layer需要的元信息，即描述这个Layer需要接受的参数有哪些。
-
-```cpp
-struct LayerDef {
-	std::string type;
-	std::string description;
-	std::vector<IutputDef> inputs;
-	std::unique_ptr<ParameterDef> biasAttr;
-	std::vector<AttributeDef> attrs;
-};
-```
-
-例如 FC Layer的如下表示
-
-```text
-{
-	"type": "fc",
-	"description": "Fully connected layer",
-	"inputs": [...],  # just like IutputDef example
-	"biasAttr": [{
-		"name": "dims",
-		...
-	},{
-		"name": "weight_decay",
-		...
-	},
-		...
-	],
-	"attrs": [{
-		"name": "dropout",
-		...
-	},
-		...
-	]
-}
-```
-
-#### GraphDef
-
-定义一个计算图需要的元信息
-
-```cpp
-struct GraphDef {
-	std::unordered_map<std::string, AttributeDef> attrs;
-};
-```
-
-### 具体对象
-
-具体对象表示一个计算图里面里面一个具体层或者OP的描述。
-
-#### ParameterAttributes
-
-表示神经网络中，某一个参数(Parameter)的具体属性。
-
-```cpp
-struct ParameterAttributes {
-	std::unordered_map<std::string, std::any> attrs;
-};
-```
-
-可能的取值为:
-
-```text
-{
-	"name": "fc.w",
-	"dims": [784, 200],
-	...
-}
-```
-#### OutputAttribute
-
-表示一个Layer/OP的输出属性。这个属性没有元信息。一个Layer或者OP的输出就是其他Layer/OP的输入。
-
-```cpp
-struct OutputAttribute {
-	std::string name;  // 一个Layer可以有多个输出，但是他们的name不同。
-	DataType dataType;
-	SeqType seqType;
-	size_t size;
-	
-	std::unordered_map<std::string, std::any> attrs;
-}
-```
-
-可能的取值为
-
-```text
-{
-	"name": "", // 空为这个Layer的默认输出
-	"DataType": Dense,
-	"SeqType": 0,
-	"size": 784,
-	"attrs" : {
-		"num_channels": 1
-	}
-}
-```
-
-
-#### InputAttribute
-
-表示一个Layer的输入数据属性。
-
-```cpp
-struct InputAttribute {
-	std::string name;
-	std::shared_ptr<LayerAttribute> inputLayer;
-	std::string inputName;
-	std::shared_ptr<ParameterAttributes> paramAttr;  // same param could be shared by multiple inputs
-	std::unordered_map<std::string, std::any> attrs;  // extra input attr.
-};
-```
-
-可能的取值为
-
-```text
-{
-	"name": "input",
-	"inputLayer": {
-		"name": "fc1"
-	},
-	"inputName": "",
-	"paramAttr": {
-		"name": "fc.w",
-		...
-	},
-	"attrs" : {
-		"num_channels": 1
-	}
-}
-```
-
-#### LayerAttribute
-
-表示一个Layer的属性
-
-```cpp
-struct LayerAttribute {
-	std::vector<InputAttribute> inputs;
-	std::vector<OutputAttribute> outputs;
-	std::vector<std::shared_ptr<LayerAttribute>> preDepends;  # most dependencies are written in outputs. But we should add preDepends for some situation. for example, RecurrentLayerGroup.
-	std::unordered_map<std::string, std::any> attrs;
-};
-```
-
-可能的取值为:
-
-```text
-{
-	"inputs": [{
-		name: input,
-		inputLayer: {
-			name: pixel,
-			...
-		},
-		paramAttr: {
-			name: fc.w,
-			...
-		},
-		...
-	}],
-	"outputs": [{
-		name: "",
-		size: 200,
-		...
-	}],
-	attrs: {
-		"size": 200,
-		"activation": tanh,
-		...
-	},
-	...
-}
-```
-
-#### ComputationGraph
-
-ComputationGraph为当前的计算图
-
-```cpp
-struct ComputationGraph {
-	std::unordered_map<std::string, std::any> attrs;
-	std::vector<std::shared_ptr<OutputAttribute>> outputs;
-	std::vector<<std::shared_ptr<LayerAttribute>> extraLayers;  // extra layers are attached to this computation graph, but it is not the outputs.
-};
-```
-
-
-## 用户定义一个新的Layer
-
-### 定义Layer接受的参数
-
-用户定义一个新的Layer需要定义这个Layer的元信息 LayerDef。定义方法为:
-
-```cpp
-class FCLayer :public Layer {
-public:
-  ...
-  
-  static void getLayerDefinition(LayerDef& def) {
-    LayerDefinition::supportSize(def);
-    LayerDefinition::supportDropout(def);
-    LayerDefinition::addInput(def)
-        .setRepeatable(True)
-        .addSupport({ InputType::Dense, InputType::SparseInt, InputType::Sparse })
-        .addSupportSeqLevel({0, 1, 2})
-        .addDoc("FC Layer is fully connected. Blah blah blah...");
-  }
-};
-```
-
-### 定义参数推导过程
-
-用户定义参数推导过程，定义方法为:
-
-```cpp
-class FCLayer :public Layer {
-public:
-  ...
-  
-  static paddle::Error calculateOutputAndParam(LayerAttribute& self) {
-    // fill self.outputs by self.inputs.
-    // also calculate self.inputs.parameters's size. etc.
-  }
-};
-```
-
-### Layer初始化
-
-经过正确性检查的LayerAttribute会被用来初始化某一个Layer，初始化方法为:
-
-```cpp
-class FCLayer :public Layer {
-public:
-	void init(LayerAttribute& attr, Parameters& params) {
-	...
-	}
-	...
-};
-```
-
-### 注册Layer
-
-直接注册这个Layer的类型即可
-
-```cpp
-REGISTER_LAYER(fc, FCLayer);
-```
-
-## 用户配置与解析
-
-对于Layer的配置，只有如下几个接口。他们是
-
-```cpp
-extern "C" {
-
-typedef void* Symbol;
-
-Symbol newSymbol(int symbolType);
-void setAttribute(void* symbol, const char* path, int type_id, void* value);
-void appendAttribute(void* symbol, const char* path, int type_id, void* value);
-void destroySymbol(Symbol sym);
-}
-```
-其中Symbol是`ParameterAttribute`, `InputAttribute`, `OutputAttribute`, `LayerAttribute`, `ComputationGraph`的通称
-
-简单的使用样例如下:
-
-```cpp
-auto paramAttr = newSymbol(PARAMETER_ATTRIBUTE)
-setAttribute(paramAttr, "name", STRING_TYPE, "fc.w");
-appendAttribute(paramAttr, "dims", INT_TYPE, 784);
-appendAttribute(paramAttr, "dims", INT_TYPE, 200);
-
-auto inputAttr = newSymbol(INPUT_ATTRIBUTE);
-setAttribute(inputAttr, "name", STRING_TYPE, "input");
-setAttribute(inputAttr, "paramAttr", SYMBOL_TYPE, paramAttr);
-setAttribute(inputAttr, "inputLayer", SYMBOL_TYPE, dataLayerAttr);
-
-auto layerAttr = newSymbol(LAYER_ATTRIBUTE);
-setAttribute(layerAttr, "type", STRING_TYPE, "fc");
-setAttribute(layerAttr, "name", STRING_TYPE, "fc");
-setAttribute(layerAttr, "size", INT_TYPE, 200);
-setAttribute(layerAttr, "activation", INT_TYPE, 0);  // 0 means sigmoid, etc.
-appendAttribute(layerAttr, "inputs", SYMBOL_TYPE, inputAttr);
-
-
-auto graph = newSymbol(GRAPH_ATTRIBUTE);
-appendAttribute(layerAttr, "outputs", SYMBOL_TYPE, layerAttr);
-
-auto engine = newExecuteEngine(graph);
-engine.forward()
-engine.backward()
-parameters.update()
-```
diff --git a/doc/design/topology_in_cpp.md b/doc/design/topology_in_cpp.md
new file mode 100644
index 0000000000000..cf389ff11bf6c
--- /dev/null
+++ b/doc/design/topology_in_cpp.md
@@ -0,0 +1,264 @@
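+# Refactoring How Paddle Represents Neural Network Topology
+
+## Background
+
+Parsing user configurations in Paddle is currently very convoluted, a legacy of Paddle being a roughly four-year-old project. To stay **compatible** with every earlier Paddle configuration format, and to simplify configuration for users, Paddle now has three configuration styles: the original configuration file format (`config_parser.py`), `trainer_config_helper`, and `paddle.v2.layer`. They call one another in a chain: `paddle.v2` calls `trainer_config_helper`, which in turn calls `config_parser.py`. Although this avoids duplicating code, the multiple layers of wrapping make the code hard to maintain.
+
+The main pain points are:
+
+* A user who wants to find out how a particular Layer parameter should be used gets lost when digging into the Paddle code. Of these three code bases, only `trainer_config_helper` is well commented and well styled. `paddle.v2` has comments and documentation too, but its functions are generated dynamically rather than written as static code, so they cannot be **read**, and `config_parser.py` lacks both documentation and comments.
+* A developer who wants to write a new Layer has to modify several files, and a developer who needs to change an existing Layer's implementation faces the same problem.
+   * First, to write a new Layer, the developer adds the parameters the Layer needs to Paddle's [protobuf file](https://github.com/PaddlePaddle/Paddle/blob/develop/proto/ModelConfig.proto).
+   * Next, the developer writes the C/C++ files for the Layer, implementing its forward and backward passes.
+   * Finally, the developer implements configuration parsing for the Layer in `config_parser.py`, `trainer_config_helpers`, and `paddle.v2`.
+* Supporting any other language binding takes too much development work.
+* Using protobuf as the intermediate protocol between languages makes serialization slow, and the library is too large for embedded devices.
+
+The goal of this design is therefore to **clean up** the messy, complicated way Paddle currently defines and configures Layers, so that **a user only needs** to write a `C/C++` implementation to develop a Layer.
+
+The design also takes the following concerns into account:
+
+* Backward compatibility: whether the earlier configuration styles remain supported.
+* Dynamic network development: configuration parsing is a foundation of dynamic networks, which require that **configuration parsing be fast**. For details on dynamic networks, see [DynamicNet](./dynamic_net/00.how_to_implenment_dynamic_net.md).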
+
+
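+## Implementation Plan
+
+### Main Ideas
+
+* Develop new data structures in the Paddle C++ Core to represent the topology of a neural network. Third-party languages can then drive the C API **directly** to configure the topology.
+	* Create a new C++ namespace called `paddle::topology`.
+	* The new data structures can be serialized to `json` rather than `protobuf`, which makes embedded deployment easier.
+* Develop a new data structure that stores meta information (`meta`), describing which parameters a neural network topology can have.
+	* This meta information
+	* The data structure that stores meta information does not need to support serialization or deserialization, because meta information does not affect model deployment. It can, however, be exposed to third-party languages (for example Python) through the C API.
+	* A third-party language reads the topology meta information and uses it to generate the functions that configure a topology in that language.
+* Paddle's current `protobuf` format can be converted into the new topology structure, which keeps existing code backward compatible and lets Paddle be improved incrementally.
+
+### Implementation Details
+
+#### Meta Information
+
+All meta information classes live in the `paddle::topology::meta` namespace. The basic meta information covers the following:
+
+##### AttributeMeta
+
+`AttributeMeta` holds the meta information of an attribute in the topology. An attribute is a parameter configured in the topology, such as the size of a layer or the form of its activation function. The meta information of every such attribute is represented by `AttributeMeta`.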
+
+```cpp
+class AttributeMeta {
+public:
+  std::string name;  // attribute name, e.g., 'size' in layer.
+  std::type_info type;  // attribute type, e.g., 'uint64_t' about 'size'
+  std::string description; // the description of this attribute, e.g., 'the size of layer'.
+  std::any checkCallback; // The function check whether this attribute is valid or not.
+};
+```
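+`checkCallback` is a callback whose type is `(T* attr, bool setted) => paddle::Error`. Because `attr` can be of any type, `std::any` is used to hold the callback. `setted` indicates whether the user has set this attribute.
+
+For example, the `AttributeMeta` for a layer's `dropout_rate` attribute could be set up as follows: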
+
+```cpp
+auto dropoutMeta = new AttributeMeta();
+dropoutMeta->name = "dropout";
+dropoutMeta->type = typeid(float);
+dropoutMeta->description = "Set drop out rate of layer. "
+                            "1 means all activations are dropped, 0 means do not drop any activation";
+dropoutMeta->checkCallback = [](float* attr, bool setted) -> paddle::Error {
+   if (!setted) {
+     *attr = 0.0f;  // default value;
+     return paddle::Error::OK;
+   }
+   // Check whether input is valid.
+   if (*attr < 0.0f || *attr > 1.0f) {
+   	 return paddle::Error("dropout rate should be in [0.0, 1.0]");
+   } else {
+     return paddle::Error::OK;
+   }
+};
+```
+
+##### TensorMeta
+
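+`TensorMeta` holds the meta information for the inputs and parameters of each layer in a topology (it corresponds to the meta information of `Parameter` and `Argument` in the current Paddle C++ Core).
+
+A `TensorMeta` is made up of a number of `AttributeMeta` objects.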
+
+```cpp
+class TensorMeta {
+public:
+  std::vector<AttributeMeta> attributes;
+};
+```
+
+For example, the input of a fully connected layer might have the following `TensorMeta`:
+
+```cpp
+enum DataType {
+  DENSE=0,
+  SPARSE_BINARY,
+  SPARSE,
+  INTEGER
+};
+
+enum SequenceType {
+  NO_SEQUENCE=0,
+  SEQUENCE,
+  NESTED_SEQUENCE
+};
+...
+
+auto inputMeta = new TensorMeta();
+auto dataTypeMeta = new AttributeMeta("type", typeid(std::pair<DataType, SequenceType>), "Data type of this tensor");
+dataTypeMeta->checkCallback = [](std::pair<DataType, SequenceType>* type, bool setted) -> paddle::Error {
+  if (!setted) {
+    return paddle::Error("Type of tensor should be set");
+  }
+  if (*type != std::make_pair(DENSE, NO_SEQUENCE)) {
+    return paddle::Error("FC Layer only supports dense, no_sequence data type as input.");
+  }
+  return paddle::Error::OK;
+};
+inputMeta->attributes.push_back(dataTypeMeta);
+
+auto shapeMeta = new AttributeMeta("shape", typeid(std::vector<uint32_t>), "The shape of this tensor");
+shapeMeta->checkCallback = [](std::vector<uint32_t>* shape, bool setted) -> paddle::Error {
+  if (!setted) {
+    return paddle::Error("Shape of tensor should be set");
+  }
+  if (shape->size() != 2) {
+    return paddle::Error("FC Layer only supports 2-dim tensors (a.k.a. matrices) as input.");
+  }
+  if (shape->at(1) == 0) {
+    return paddle::Error("The width of fc layer input should be larger than 0.");
+  }
+  return paddle::Error::OK;
+};
+inputMeta->attributes.push_back(shapeMeta);
+```
+
+For the parameter that accompanies a fully connected layer's input, the `TensorMeta` might be:
+
+```cpp
+auto inputParamMeta = new TensorMeta();
+inputParamMeta->attributes.push_back(new AttributeMeta("shape", ...));
+inputParamMeta->attributes.push_back(new AttributeMeta("type", ...));
+inputParamMeta->attributes.push_back(new AttributeMeta("weight_decay", ...)); // support weight decay;
+inputParamMeta->attributes.push_back(new AttributeMeta("initial_mean", ...)); // init strategy;
+inputParamMeta->attributes.push_back(new AttributeMeta("initial_std", ...)); // init std.
+...
+```
+
+##### LayerInputMeta
+
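+Meta information for a layer's input. A layer input comprises both the value fed into the layer and the parameter that goes with that input. Some inputs have no parameter. Some layers accept an unlimited number of inputs, but every one of them uses the same `LayerInputMeta`.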
+
+```cpp
+
+class LayerInputMeta {
+public:
+  std::string name;
+  std::string description;
+  bool canBeMany; // some layer can have unlimited number of input, but share same meta.
+  TensorMeta inputTensorMeta;
+  std::unique_ptr<TensorMeta> paramTensorMeta;  // could be null;
+};
+```
+
+For example, the `LayerInputMeta` of the FC Layer is:
+
+```cpp
+auto fcInputMeta = new LayerInputMeta();
+fcInputMeta->name = "input";
+fcInputMeta->description = "The input of fully connected layer";
+fcInputMeta->canBeMany = true;
+fcInputMeta->inputTensorMeta = fcInputTensorMeta;
+fcInputMeta->paramTensorMeta = fcInputParamMeta;
+```
+
+##### LayerMeta
+
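+`LayerMeta` holds the meta information of a neural network layer. It includes the layer's type, its description, the meta information of its inputs, the meta information of its bias parameter, and the layer's other attributes (for example `dropout_rate`).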
+
+```cpp
+struct LayerMeta {
+	std::string type;
+	std::string description;
+	std::vector<LayerInputMeta> inputs;
+	std::unique_ptr<TensorMeta> bias;
+	std::vector<AttributeMeta> attrs;
+};
+```
+
+A concrete example is omitted; it is similar to the one for `LayerInputMeta`.
+
+##### TopologyMeta
+
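+`TopologyMeta` describes the attributes that can be set on a topology. It includes information about the layers Paddle supports, as well as other attributes that can be configured on the topology, such as the names of the input layers.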
+
+```cpp
+struct TopologyMeta {
+  std::vector<LayerMeta> layers;
+  std::vector<AttributeMeta> attrs;
+};
+
+```
+
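+#### Concrete Information
+
+Concrete information is the actual description of a particular neural network topology, including the actual parameters of that topology. It is a data structure that can be serialized to `json`, and it is the model configuration Paddle actually loads for each training or inference run. These classes live in the `paddle::topology` namespace.
+
+##### Attribute
+
+An attribute of a neural network can be of any type and is represented by `std::pair<std::string, std::any>`. No new type is created for it.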
+
+
+##### Tensor
+
+`Tensor` holds the attributes actually configured for a tensor in the topology.
+
+```cpp
+class Tensor {
+public:
+  std::unordered_map<std::string, std::any> attributes;
+};
+```
+
+##### LayerInput
+
+`LayerInput` is the actual input configuration of a layer in the network.
+
+```cpp
+class LayerInput {
+public:
+  std::string name;
+  Tensor inputTensor;
+  std::unique_ptr<Tensor> paramTensor;  // could be null;
+};
+```
+
+##### Layer
+
+`Layer` is the actual configuration of a layer in the network.
+
+```cpp
+class Layer {
+public:
+	std::vector<LayerInput> inputs;
+	std::unique_ptr<Tensor> bias;
+	std::unordered_map<std::string, std::any> attrs;
+};
+
+```
+
+##### Topology
+
+`Topology` is the complete configuration of a neural network topology, that is, the `Topology` class in Paddle.
+
+```cpp
+class Topology {
+public:
+  std::vector<Layer> layers;
+  std::unordered_map<std::string, std::any> attrs;
+};
+```

From 4acd5798418da60e5e95d14a2d170aa41b3ff4f7 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 10 May 2017 14:34:49 +0800
Subject: [PATCH 20/27] Add topology user stories and goals

---
 .../00.how_to_implenment_dynamic_net.md       |  63 -----
 doc/design/topology.md                        |  86 ++++++
 doc/design/topology_in_cpp.md                 | 264 ------------------
 3 files changed, 86 insertions(+), 327 deletions(-)
 delete mode 100644 doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
 create mode 100644 doc/design/topology.md
 delete mode 100644 doc/design/topology_in_cpp.md

diff --git a/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md b/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
deleted file mode 100644
index b17049919f045..0000000000000
--- a/doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
+++ /dev/null
@@ -1,63 +0,0 @@
-# 动态神经网络的实现
-
-动态网络是目前神经网络框架的前沿课题。动态神经网络的优势解决了普通神经网络框架的一个重要问题，**神经网络的定义和计算是分离的**。即静态神经网络框架的计算步骤是，先定义一个神经网络的计算图，再使用计算引擎计算这个计算图。而动态神经网络的特点是，直接对每个操作求值，隐式的定义计算图，从而再对这个隐式的计算图反向传播。
-
-常见的使用方式为:
-
-
-```python
-x = paddle.dyn.data(type=DenseVector(784))
-x.fill([0.058, 0.548, ...])
-
-y = paddle.dyn.data(type=Integer(10))
-y.fill(9)
-
-hidden = paddle.dyn.fc(input=x, size=200)
-
-# You can use hidden.npvalue() to get this layer's value now.
-
-prediction = paddle.dyn.fc(input=hidden, size=10, act=Softmax())
-
-cost = paddle.dyn.classification_cost(input=prediction, label=y)
-
-if cost.npvalue() < 0.001:
-	cost *= 100 # scale up cost if cost is little, just a demo for dynamic network.
-
-print 'Cost = ', cost.npvalue()
-
-cost.backward()
-parameters.update()
-```
-
-## 动态神经网络解决的问题
-
-动态神经网络只有神经网络的计算步骤，而隐藏了神经网络的定义步骤，用户可以为每一个sample或者batch定义一个不同的网络。相对于静态神经网络而言，动态神经网络解决了以下几个问题：
-
-* 可以任意的在计算过程中添加复杂的控制逻辑，例如迭代，递归，条件选择等，这些控制逻辑都可以由host language(C++/Python)来实现。
-* 可以支持更复杂的数据类型，并且对于不同的数据，神经网络的计算图可以不同。
-* 动态神经网络的执行过程就是其定义过程，用户可以对神经网络中的参数，中间结果等信息直接求值，方便debug的过程。
-
-
-## 动态神经网络的实现思路
-
-动态神经网络计算图的定义是隐式的，其设计哲学可以参考一些[autograd库](https://github.com/HIPS/autograd)。具体实现思路如下：
-
-
-1. 对于每一个batch，用户使用layer或者OP组合来定义神经网络的操作。每个batch都拥有一个graph结构来记录该sample的计算图。
-	* 这个graph经常是一个全局变量。在不同的库中名称不一样。例如对于PyTorch，这个对象叫做[Tape](https://github.com/pytorch/pytorch#dynamic-neural-networks-tape-based-autograd)
-	* Graph 是实时动态构造的。即用户调用一次Layer或者OP，就会向这个Graph里面添加一个节点。
-2. graph中包含每一层layer的信息，包括输入数据来源，该层layer进行的操作，输出数据大小等。新连接上的layer的相关信息会被持续追加到graph中。
-3. layer求值操作可以是lazy的，直到用户显式的调用value()方法，graph中记录的计算图才会被execute engine真正执行，计算得到该层layer的输出结果。
-4. 用户可以在组合layer的时候加入控制逻辑，被选择的分支信息也会记录到graph中。
-	* 用户代码可以有分支，循环等等。但对于Graph这个结构并没有循环或者分支的操作。Graph只记录Layer或者Op调用的行为。
-5. 在进行backward()操作时，graph的execute engine会根据记录的计算图执行反向传播，计算梯度。
-
-
-
-## 动态神经网络对神经网络框架的要求
-
-* 最核心的要求就是构建计算图的过程要足够**轻量**。前端的Python wrapper也要足够薄和快，可以直接使用后端C++提供的接口。
-
-* 考虑到layer的求值是lazy的，可以使用表达式模板对计算过程进行优化。
-
-* 考虑对不同大小数据/不同网络结构组batch进行训练。在动态网络中，每一个batch/sample都可以拥有自己的计算图，相比于静态网络，在GPU上进行并行操作是比较困难的。
diff --git a/doc/design/topology.md b/doc/design/topology.md
new file mode 100644
index 0000000000000..08be81acbedd1
--- /dev/null
+++ b/doc/design/topology.md
@@ -0,0 +1,86 @@
+Topology is the concept Paddle uses to represent a neural network. A neural network consists of one topology, which describes how layers are connected to each other, and many parameters. Other deep learning frameworks may call this concept a computation graph or a neural network configuration.
+
+The topology is not only an API-level concept; it also determines how Paddle organizes the computation code for each `Layer` or `Function`. Paddle holds a dictionary from `Layer Type` to Layer implementation, e.g. from the string `mul` to the function `void tensor_multiply(Tensor& ins, Tensor& outs)`. So how users manipulate a topology, and how Paddle maps a user topology to the implementations of `Layer` and `Function`, is a fundamental problem for refactoring Paddle.
+
+## User Stories and examples
+
+### Kernel Developers
+
+Alan is a professional CPU and GPU developer. He can write the kernel functions of a new `Layer` with the best performance. However, he is not familiar with the languages of Paddle's bindings, e.g. Python, which Paddle uses as its API language. Alan only needs to write the kernel functions and register them with Paddle; Paddle then generates the user-side APIs for these kernel functions without Alan writing any further code.
+
+```cpp
+template <DeviceType devType>
+void cos_kernel(std::vector<Tensor>& ins, std::vector<Tensor>& outs,  double scale) {
+    // implementation here.
+}
+
+BEGIN_REGISTER_FUNCTION(cos, cos_kernel)
+// The parameter of cos function. 
+func.addAttribute("scale", "The scale of cos layer").defaultValue(1.0).largerThan(0.0);
+
+// Two inputs
+func.addInput().dataType(Dense).dimension(2).supportSeqType();
+func.addInput().dataType(Dense).dimension(2).supportSeqType();
+
+// One output
+func.addOutput().dataType(Dense).dimension(2).supportSeqType();
+
+// Tell Paddle how to infer the output shape.
+func.setShapeInferer([](std::vector<Dims>& ins, std::vector<Dims>& outs){
+    outs[0] = {ins[0][0], 1};  // output dimension = batch_size * 1
+});
+
+END_REGISTER_FUNCTION()
+```
+
+### QA developer
+
+Bob is a QA developer of Paddle. He wants to test every `Function` and `Layer` that Paddle supports. However, each layer has different configuration parameters, e.g. `scale` in the `cos` function, and each configuration parameter has its own value range and data type. By using the Topology registry, Bob can easily test all boundary conditions of a Layer or Function.
+
+```
+auto cos = function::Register("cos");
+
+for each_attribute in cos.attributes:
+    each_attribute = each_attribute.min_value
+
+test(cos);
+
+for each_attribute in cos.attributes:
+    each_attribute = each_attribute.max_value
+test(cos);
+```
+
+### Language Binding developer
+
+Carol is a language binding developer of Paddle. She wants to develop a new language binding, but she is not familiar with the Paddle C++ core and does not want to dig deep into it. She just wants a clear list of the Layers Paddle supports and the configuration parameters of each Layer.
+
+Also, as a language binding developer, Carol does not want to write any configuration validation code in the language binding, because the Paddle C++ Core could be in flux and a layer's configuration could change.
+
+She can simply access the registered information of `Topology` and use it in another language. She can use either reflection or code generation in that language to generate the configuration APIs.
+
+```python
+import paddle
+
+for layer_name in paddle.topology.meta.all_registed_layers:
+    def __func__(layer_name=layer_name, **kwargs):  # bind the current name; closures capture loop variables late
+        layer_meta = paddle.topology.meta.all_registed_layers[layer_name]
+        return layer_meta.new_configration(kwargs)
+
+    globals()[layer_name] = __func__
+```
+
+### API End-Users
+
+David is a new user of Paddle who is not familiar with Paddle or deep learning. He writes a Python program that configures a neural network. When he runs the program, he expects a clear error message if his configuration is wrong, such as `cosine layer's scale parameter should be larger than 0.0.`, not just a `check error` from our computation kernel. Because we register the meta information of every parameter, this goal is easy to achieve.
+
+
+## Goals
+
+After considering these user stories, we conclude what we want from the Topology design:
+
+* Users should operate on the C++ topology configuration directly, because we need to keep all language bindings consistent and keep the language binding layer thin and easy to develop.
+* Our topology configuration should be able to validate the user's input and give a reasonable error message. We should also maintain meta information for each configuration attribute, e.g. the `scale` attribute of the `cos` layer is a `double` value, should be larger than 0.0, and defaults to 1.0.
+* We should serialize our topology into a portable format, so users can use the models they trained earlier for inference.
+* We should make it easy for kernel developers to register their kernel functions with Paddle. The only information they should need to provide is the configuration attributes of the function, what can be passed in as its inputs, and what it can produce as outputs.
+
+## Implementation
diff --git a/doc/design/topology_in_cpp.md b/doc/design/topology_in_cpp.md
deleted file mode 100644
index cf389ff11bf6c..0000000000000
--- a/doc/design/topology_in_cpp.md
+++ /dev/null
@@ -1,264 +0,0 @@
-# Paddle神经网络拓扑表示方式重构
-
-## 背景
-
-目前Paddle中，解析用户配置的过程非常繁复。这也是因为Paddle作为一个四年左右项目的遗留问题。为了**兼容**之前所有的Paddle配置文件格式，也为了简化用户配置流程，现阶段Paddle共有三种配置风格。最原始的配置文件格式(`config_parser.py`)，`trainer_config_helper`和`paddle.v2.layer`。三者的调用关系为 `paddle.v2` 调用 `trainer_config_helper`再调用`config_parser.py`。虽然我们没有重复的写这些代码，但是多层的封装让代码很难维护。
-
-主要痛点在于:
-
-* 用户使用Layer，想去查询某一个参数应该如何使用。深入调研Paddle的代码会非常迷惑。同时，这些代码中只有`trainer_config_helper`是具有良好注释和良好风格的。`paddle.v2`虽然也有注释与文档，但其函数是动态生成的，而不是静态的代码，所以也不能**阅读**，而`config_parser.py`缺乏文档和注释。
-* 开发者如果想要新写一个Layer，需要修改多个文件。开发者如果需要修改一个Layer的实现，也有同样的问题。
-   * 首先，新写一个Layer，开发者需要在Paddle的[protobuf文件](https://github.com/PaddlePaddle/Paddle/blob/develop/proto/ModelConfig.proto)中，添加这个Layer需要的参数。
-   * 其次，完成这个Layer需要的C/C++文件。完成这个Layer的前向后向代码。
-   * 最后完成这个Layer配置文件解析`config_parser.py`，`trainer_config_helpers`和`paddle.v2`。
-* 如果有其他Language Binding，需要开发的工作量太高。
-* 使用protobuf作为多语言接口的中间协议，序列化速度慢，且对于嵌入式设备库大小过大。
-
-所以这个设计的目标就是**治理**目前Paddle定义Layer和配置混乱复杂的问题，得到一个清爽的结果, **用户只需要**写一个`C/C++`实现即可完成一个Layer的开发。
-
-同时，这个设计还会兼顾的问题有:
-
-* 向后兼容性 ---- 即是否兼容之前的配置方式.
-* 动态网络开发 ---- 神经网络配置解析为动态网络的基础部分。动态网络要求**配置解析必须快**。详细关于动态网络介绍，请参考[DynamicNet](./dynamic_net/00.how_to_implenment_dynamic_net.md)
-
-
-## 实现方案
-
-### 主要思路
-
-* 在Paddle C++ Core中，开发新的数据结构表示神经网络的拓扑结构。第三方语言可以**直接**操纵C API来配置神经网络拓扑结构。
-	* C++中新建一个namespace，叫做`paddle::topology`。
-	* 新建的数据结构可以被序列化成`json`而不是`protobuf`，方便嵌入式部署。
-* 开发一个新的存储元信息(`meta`)的数据结构，表示一个神经网络拓扑结构可以有哪些参数。
-	* 这个元信息
-	* 存储元信息的数据结构不要求可以被序列化和反序列化，因为元信息不影响模型的部署。但存储元信息的数据结构可以被C API暴露给第三方语言(例如Python)。
-	* 第三方语言读取拓扑结构元信息，进而在第三方语言中生成配置拓扑结构的函数。
-* 目前Paddle中的`protobuf`格式，可以转换为新的拓扑结构。进而维持原有代码的向后兼容性。让Paddle可以渐进式优化。
-
-### 具体实现
-
-#### 元信息
-
-所有的元信息类放置到`paddle::topology::meta`名字空间下。基本的元信息包括如下几个方面:
-
-##### AttributeMeta
-
-`AttributeMeta`表示拓扑结构中所有属性的元信息。属性是指拓扑结构中配置的参数。例如层的大小，激活函数的形式等等。这些属性的元信息都由`AttributeMeta`表示。
-
-```cpp
-class AttributeMeta {
-public:
-  std::string name;  // attribute name, e.g., 'size' in layer.
-  std::type_info type;  // attribute type, e.g., 'uint64_t' about 'size'
-  std::string description; // the description of this attribute, e.g., 'the size of layer'.
-  std::any checkCallback; // The function check whether this attribute is valid or not.
-};
-```
-其中 checkCallback是一个回调函数，他的类型是(T* attr, bool setted) => paddle::Error，因为输入的attr可以是任意类型的泛型，故这里用std::any表示类型。其中setted是表示这个参数是不是被用户设置过。
-
-举例对于layer的`dropout_rate`这个属性的AttributeMeta可以设置为:
-
-```cpp
-auto dropoutMeta = new AttributeMeta();
-dropoutMeta->name = "dropout";
-dropoutMeta->type = typeid(float);
-dropoutMeta->description = "Set drop out rate of layer. "
-                            "1 means all activations are dropped, 0 means do not drop any activation";
-dropoutMeta->checkCallback = [](float* attr, bool setted) -> paddle::Error {
-   if (!setted) {
-     *attr = 0.0f;  // default value;
-     return paddle::Error::OK;
-   }
-   // Check whether input is valid.
-   if (*attr < 0.0f || *attr > 1.0f) {
-   	 return paddle::Error("dropout rate should be in [0.0, 1.0]");
-   } else {
-     return paddle::Error::OK;
-   }
-};
-```
-
-##### TensorMeta
-
-`TensorMeta`表示一个神经网络拓扑结构中，每一层的输入信息和参数信息的元信息(对于于目前Paddle C++ Core中的Parameter和Argument的元信息)。
-
-`TensorMeta`由许多`AttributeMeta`构成。
-
-```cpp
-class TensorMeta {
-public:
-  std::vector<AttributeMeta> attributes;
-};
-```
-
-举例说明，对于全连接层的输入，可能的`TensorMeta`值为:
-
-```cpp
-enum DataType {
-  DENSE=0,
-  SPARSE_BINARY,
-  SPARSE,
-  INTEGER
-};
-
-enum SequenceType {
-  NO_SEQUENCE=0,
-  SEQUENCE,
-  NESTED_SEQUENCE
-};
-...
-
-auto inputMeta = new TensorMeta();
-auto dataTypeMeta = new AttributeMeta("type", typeid(std::pair<DataType, SequenceType>), "Data type of this tensor");
-dataTypeMeta->checkCallback = [](std::pair<DataType, SequenceType>* type, bool setted) -> paddle::Error {
-  if (!setted) {
-    return paddle::Error("Type of tensor must be set");
-  }
-  if (*type != std::make_pair(DENSE, NO_SEQUENCE)) {
-    return paddle::Error("FC Layer only supports dense, no_sequence data type as input.");
-  }
-  return paddle::Error::OK;
-};
-inputMeta->attributes.push_back(dataTypeMeta);
-
-auto shapeMeta = new AttributeMeta("shape", typeid(std::vector<uint32_t>), "The shape of this tensor");
-shapeMeta->checkCallback = [](std::vector<uint32_t>* shape, bool setted) -> paddle::Error {
-  if (!setted) {
-    return paddle::Error("Shape of tensor must be set");
-  }
-  if (shape->size() != 2) {
-    return paddle::Error("FC Layer only supports 2 dim tensor (a.k.a. matrix) as input.");
-  }
-  if (shape->at(1) == 0) {
-    return paddle::Error("The width of fc layer input should be larger than 0.");
-  }
-  return paddle::Error::OK;
-};
-inputMeta->attributes.push_back(shapeMeta);
-```
-
-For the parameter that goes with a fully connected layer's input, a possible `TensorMeta` is:
-
-```cpp
-auto inputParamMeta = new TensorMeta();
-inputParamMeta->attributes.push_back(new AttributeMeta("shape", ...));
-inputParamMeta->attributes.push_back(new AttributeMeta("type", ..));
-inputParamMeta->attributes.push_back(new AttributeMeta("weight_decay", ...)); // support weight decay;
-inputParamMeta->attributes.push_back(new AttributeMeta("initial_mean", ...)); // init strategy;
-inputParamMeta->attributes.push_back(new AttributeMeta("initial_std", ...)); // init std.
-...
-```
-
-##### LayerInputMeta
-
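-The meta information of a layer's input. A layer input covers both the value fed into the layer and the parameter that goes with that input. Some layer inputs have no parameter. Some layers accept an unbounded number of inputs, but every one of them uses the same `LayerInputMeta`.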
-
-```cpp
-
-class LayerInputMeta {
-public:
-  std::string name;
-  std::string description;
-  bool canBeMany; // some layer can have unlimited number of input, but share same meta.
-  TensorMeta inputTensorMeta;
-  std::unique_ptr<TensorMeta> paramTensorMeta;  // could be null;
-};
-```
-
-For example, the `LayerInputMeta` of the FC layer is:
-
-```cpp
-auto fcInputMeta = new LayerInputMeta();
-fcInputMeta->name = "input";
-fcInputMeta->description = "The input of fully connected layer";
-fcInputMeta->canBeMany = true;
-fcInputMeta->inputTensorMeta = fcInputTensorMeta;
-fcInputMeta->paramTensorMeta = fcInputParamMeta;
-```
-
-##### LayerMeta
-
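-`LayerMeta` is the meta information of a neural network layer. It contains the layer's type, its description, the meta information of its inputs, the meta information of its bias parameter, and other attributes of the layer (e.g. `dropout_rate`).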
-
-```cpp
-struct LayerMeta {
-  std::string type;
-  std::string description;
-  std::vector<LayerInputMeta> inputs;
-  std::unique_ptr<TensorMeta> bias;
-  std::vector<AttributeMeta> attrs;
-};
-```
-
-A concrete example is omitted; it is similar to that of `LayerInputMeta`.
-
-##### TopologyMeta
-
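-`TopologyMeta` describes the attributes that can be set on a topology. It contains the information of the layers Paddle supports, as well as other attributes configurable on the topology, such as the names of the input layers.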
-
-```cpp
-struct TopologyMeta {
-  std::vector<LayerMeta> layers;
-  std::vector<AttributeMeta> attrs;
-};
-
-```
-
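-#### Actual Information
-
-Actual information is the concrete description of a particular neural network topology, including its actual parameters. It is a data structure that can be serialized to `json`, and it is the model configuration that Paddle loads for each training or inference run. This information lives in the `paddle::topology` namespace.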
-
-##### Attribute
-
-An attribute of a neural network can be of any type and is represented by `std::pair<std::string, std::any>`. No new type is created for it.
-
-
-##### Tensor
-
-`Tensor` holds the actual configured attributes of one tensor in the topology.
-
-```cpp
-class Tensor {
-public:
-  std::unordered_map<std::string, std::any> attributes;
-};
-```
-
-##### LayerInput
-
-`LayerInput` is the actual input configuration of one layer in the neural network.
-
-```cpp
-class LayerInput {
-public:
-  std::string name;
-  Tensor inputTensor;
-  std::unique_ptr<Tensor> paramTensor;  // could be null;
-};
-```
-
-##### Layer
-
-`Layer` is the actual configuration of one layer in the neural network.
-
-```cpp
-class Layer {
-public:
-  std::vector<LayerInput> inputs;
-  std::unique_ptr<Tensor> bias;
-  std::unordered_map<std::string, std::any> attrs;
-};
-
-```
-
-##### Topology
-
-`Topology` is the complete configuration of a neural network topology, i.e. the `Topology` class in Paddle.
-
-```cpp
-class Topology {
-public:
-  std::vector<Layer> layers;
-  std::unordered_map<std::string, std::any> attrs;
-};
-```

From a109c544bd06e3045c1995e9f0a116f7506ddee8 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 10 May 2017 14:36:12 +0800
Subject: [PATCH 21/27] Add title

---
 doc/design/topology.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/design/topology.md b/doc/design/topology.md
index 08be81acbedd1..60352d5900cb8 100644
--- a/doc/design/topology.md
+++ b/doc/design/topology.md
@@ -1,3 +1,5 @@
+# Topology Overview
+
 Topology is a concept in Paddle for representing Neural Network.  A neural network contains one topology, which describes how layers connected to each other, and many parameters. The other deep learning frameworks may call this concept a computation graph, neural network configurations.
 
 The topology is not only an API level concept but also how Paddle organizes the computation codes for each `Layer` or `Function`. The Paddle hold a dictionary from `Layer Type` to Layer implementation, e.g.  from string `mul` to function `void tensor_multiply(Tensor& ins, Tensor& outs)'. So the mechanism about how to manipulate topology by Users, how Paddle maps user topology to implementations of `Layer` and `Function` is a fundamental problem for refactoring Paddle.

From e99e19c7120919129a9dfe58ab0bb119dc4baa30 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 10 May 2017 14:37:32 +0800
Subject: [PATCH 22/27] Fix typo

---
 doc/design/topology.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/design/topology.md b/doc/design/topology.md
index 60352d5900cb8..8b5b350811e3c 100644
--- a/doc/design/topology.md
+++ b/doc/design/topology.md
@@ -1,6 +1,6 @@
 # Topology Overview
 
-Topology is a concept in Paddle for representing Neural Network.  A neural network contains one topology, which describes how layers connected to each other, and many parameters. The other deep learning frameworks may call this concept a computation graph, neural network configurations.
+Topology is a concept in Paddle for representing neural networks.  A neural network contains one topology, which describes how layers are connected to each other, and many parameters. Other deep learning frameworks may call this concept a computation graph or a neural network configuration.
 
 The topology is not only an API level concept but also how Paddle organizes the computation codes for each `Layer` or `Function`. The Paddle hold a dictionary from `Layer Type` to Layer implementation, e.g.  from string `mul` to function `void tensor_multiply(Tensor& ins, Tensor& outs)'. So the mechanism about how to manipulate topology by Users, how Paddle maps user topology to implementations of `Layer` and `Function` is a fundamental problem for refactoring Paddle.
 

From 6b8893e974c4a57f4b4b6f1f218b54cee7add030 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 10 May 2017 14:50:31 +0800
Subject: [PATCH 23/27] Refine English

---
 doc/design/topology.md | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/doc/design/topology.md b/doc/design/topology.md
index 8b5b350811e3c..aae7a38fd2104 100644
--- a/doc/design/topology.md
+++ b/doc/design/topology.md
@@ -1,14 +1,13 @@
 # Topology Overview
-
 Topology is a concept in Paddle for representing neural networks.  A neural network contains one topology, which describes how layers are connected to each other, and many parameters. Other deep learning frameworks may call this concept a computation graph or a neural network configuration.
 
-The topology is not only an API level concept but also how Paddle organizes the computation codes for each `Layer` or `Function`. The Paddle hold a dictionary from `Layer Type` to Layer implementation, e.g.  from string `mul` to function `void tensor_multiply(Tensor& ins, Tensor& outs)'. So the mechanism about how to manipulate topology by Users, how Paddle maps user topology to implementations of `Layer` and `Function` is a fundamental problem for refactoring Paddle.
+The topology is not only an API-level concept but also how we organize the computation code for each `Layer` or `Function` in Paddle. Paddle should maintain a dictionary from `Layer Type` to Layer implementation, e.g. from the string `mul` to the function `void tensor_multiply(Tensor& ins, Tensor& outs)`. How users manipulate a topology, and how Paddle maps a user's topology to the implementations of `Layer` and `Function`, is a fundamental problem for refactoring Paddle.
 
 ## User Stories and examples
 
 ### Kernel Developers
 
-Alan is a professional developer in CPU and GPU. He can write the kernel functions of a new `Layer` with the best performance. However, he is not a familiar with Paddle third-party language, e.g. Python. However, Paddle uses Python as its API language. Alan just needs to write the kernel function and register them to Paddle, and then Paddle generates the user-side APIs for this kernel functions without any codes written by Alan.
+Alan is a professional CPU and GPU developer. He can write the kernel functions of a new `Layer` with the best performance. However, he is not familiar with Paddle's API language, Python. Alan just needs to write the kernel functions and register them in Paddle, and then Paddle should generate the user-side APIs for these kernel functions without any code written by Alan.
 
 ```cpp
 template <DeviceType devType>
@@ -37,7 +36,7 @@ END_REGISTER_FUNCTION()
 
 ### QA developer
 
-Bob is a QA developer of Paddle.  He wants to tests all Paddle supported `Function` and `Layer`.  However, each layer has different configuration parameters, e.g. `scale` in `cosine` function. Each configuration parameter has different value range, data type. By using Topology registers, Bob can easily test all boundary conditions of one Layer or Functions.
+Bob is a QA developer of Paddle.  He wants to test every `Function` and `Layer` that Paddle supports.  However, each layer has different configuration attributes, e.g. `scale` in the `cosine` function, and each configuration attribute has its own value range and data type. With the new topology mechanism, Bob should be able to test all boundary conditions of a Layer or Function easily.
 
 ```
 auto cos = function::Register("cos");
@@ -54,11 +53,11 @@ test(cos);
 
 ### Language Binding developer
 
-Carol is a language binding developer of Paddle. She wants to develop a language binding of Paddle, and she is not familiar with Paddle C++ core and does not want to go so deep in Paddle. She just wants a clear list of what Layer Paddle supports, the configuration parameters of each Layer. 
+Carol is a language binding developer of Paddle. She wants to develop a new language binding for Paddle. She is not familiar with the Paddle C++ core and does not want to dig deep into Paddle. She just wants a clear list of the Layers Paddle supports and the configuration parameters of each Layer.
 
-Also as a language binding developer, Carol does not want to write any configuration validation code in language binding because Paddle C++ Core could be in flux and layer's configuration could be changed.
+Also, as a language binding developer, Carol does not want to write any topology validation code in the language binding, because the Paddle C++ Core could be in flux and a layer's API could change.
 
-She just can access the register information of `Topology` and uses this information in another language. She can either uses reflection or code generation in that language to generate configuration APIs.
+She can simply access the registered information of `Topology` and use it in another language. She can use either reflection or code generation in that language to generate end-user APIs.
 
 ```python
 import paddle
@@ -73,7 +72,7 @@ for layer_name in paddle.topology.meta.all_registed_layers:
 
 ### API End-Users
 
-David is a new user of Paddle, who are not familiar with Paddle and deep learning. He writes a Python program and configures a neural network. When he run this program, he expects a clear error message when his configuration is wrong, such as `cosine layer's scale parameter should be larger than 0.0.`, not just a `check error` in our computation kernel. Because we register all parameter's meta information, it is easy to achieve this goal.
+David is a new user of Paddle who is not familiar with Paddle or deep learning. He writes a Python program and configures a neural network. When he runs this program, he expects a clear error message if his configuration is wrong. The error message should be like `cosine layer's scale parameter should be larger than 0.0.`, not just a `check error` in our computation kernel. Because we register the meta information of every parameter, this goal is easy to achieve.
 
 
 ## Goals
@@ -83,6 +82,6 @@ After thinking lots of user stories, we make the conclusion of what we want in T
 * User should directly operate C++ topology configuration because we should maintain the consistency between each language bindings, and make language binding layer thin and easily to develop.
 * Our topology configuration should be able to validate user's input and give a reasonable error message. Also, we should maintain some meta information of each configuration attribute, e.g. `scale` attribute in `cos` layer is a `double` value, should be larger than 0.0, and the default value is 1.0.
 * We should serialize our topology into a portable format, so users can use the model they trained before for inference.
-* We should let our kernel developer easily to register their kernel functions to Paddle. The only information they should provide is what are the configuration attribute of this function, what could be inputted to this function, what could be outputs of this function.
+* We should let kernel developers easily register their kernel functions to Paddle without making them write configuration APIs in Python.
 
 ## Implementation

From 726ba0528db8ced9d193f2e15f0b99700a88b225 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 10 May 2017 18:30:58 +0800
Subject: [PATCH 24/27] Add implementation steps.

---
 doc/design/topology.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/doc/design/topology.md b/doc/design/topology.md
index aae7a38fd2104..6b1502c3cff8d 100644
--- a/doc/design/topology.md
+++ b/doc/design/topology.md
@@ -85,3 +85,35 @@ After thinking lots of user stories, we make the conclusion of what we want in T
 * We should let kernel developers easily register their kernel functions to Paddle without making them write configuration APIs in Python.
 
 ## Implementation
+
+### Meta Information
+To achieve the goals above, we need a place to store the meta information of each layer. The meta information describes how a layer can be configured: which attributes it accepts and what its input types can be.
+
+For example, the cosine layer should have two inputs, and the two inputs should have the same shape. Both inputs should be dense matrices. The cosine layer should have only one output, whose shape should be [batch_size, 1] because the cosine similarity of each pair of input samples is a scalar. The cosine layer has one configurable argument, `scale`, the scalar multiplied with the cosine similarity.  `scale` should be a `double` value larger than 0.0, with a default value of 1.0.
+
+All this meta information should be defined in the namespace `paddle::topology::meta`. There are several basic classes in this namespace.
+
+* Constraints:  A list of functions that stores the constraints of one attribute. It is used to validate that user input is correct.
+* AttributeMeta:  It represents the meta information of an attribute, e.g. `scale`. It contains the attribute name, description, type information and `Constraints`.
+* TensorMeta: A tensor is the input/output of a Layer or Function. `TensorMeta` contains a vector of `AttributeMeta`. The data type and sequence type are just attributes of the tensor.
+* FunctionMeta: It represents the meta information of a `paddle::Function`. It contains two vectors of `TensorMeta`, the inputs and the outputs. `FunctionMeta` also contains a vector of `AttributeMeta`, to which kernel developers can add the attributes used by their kernel.
+* LayerMeta: A concept similar to FunctionMeta, but used to represent a `Layer`.
+* TopologyMeta: A topology meta contains a vector of `AttributeMeta`, which represents the attributes that can be set globally in a topology.
+
+### Topology information
+
+The topology information is the actual information of a neural network. It corresponds one-to-one to the meta information. We use `std::any` (a.k.a. `boost::any`) to represent the value of each attribute because an attribute could be of any type (`double`, `int`, `vector<int>`, etc.).
+
+So `topology::Tensor` contains an attribute map, e.g. `map<string, any>`.  The `Function` contains an attribute map, input tensors, and output tensors. The remaining types of topology information correspond to their meta information.
+
+## Step by step approach
+
+After building the `Topology` concept in C++, Paddle's Python code could be cleaned up. However, the development process should be broken down into carefully completed steps, to keep the Paddle code stable and avoid introducing bugs.
+
+The steps are:
+
+1. Add `Constraints`, `AttributeMeta`, `TensorMeta`, `FunctionMeta` to refactor the `paddle::Function` package. Make `paddle::Function` just a plain function registered to `FunctionMeta`. Use a small-scope experiment to make sure we can use `topology::meta` and `topology` to represent a piece of a neural network.
+
+2. Complete `LayerMeta`, `TopologyMeta`, etc., and write a conversion method from `protobuf::LayerConfig`/`protobuf::ModelConfig` to `topology::Layer`/`topology::Topology`, so that `paddle_trainer` can use and test the `topology` package. A side effect of this job is that `paddle_trainer` validates users' `trainer_config.conf` files and gives a reasonable error message when a user provides a wrong configuration.
+
+3. Clean up the implementation of the `paddle.v2` topology. Let the `v2` package stop invoking `trainer_config_helper` and invoke the `topology` package directly through the C-API.
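
Reviewer sketch (not part of the patch): the `Constraints` function list described under "Meta Information" could look roughly like the code below. This is a minimal sketch under stated assumptions. The `Error` struct is a hypothetical stand-in for `paddle::Error`, the `validate` method is an illustrative addition, and the flag passed to each callback is the user's original "set" state rather than one updated by earlier callbacks.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for paddle::Error: an empty message means success.
struct Error {
  std::string msg;
  Error() = default;
  explicit Error(std::string m) : msg(std::move(m)) {}
  bool isOK() const { return msg.empty(); }
};

// T is the attribute type. Each constraint appends one check to callbacks_.
template <typename T>
class Constraints {
public:
  // Fail when the user did not provide the attribute.
  Constraints<T>& mustSet() {
    callbacks_.push_back([](T*, bool alreadySet) {
      return alreadySet ? Error() : Error("attribute must be set");
    });
    return *this;
  }

  // Fill in `val` when the user did not provide the attribute.
  Constraints<T>& defaultValue(const T& val) {
    callbacks_.push_back([val](T* attr, bool alreadySet) {
      if (!alreadySet) *attr = val;
      return Error();
    });
    return *this;
  }

  // Fail unless the attribute is strictly larger than `val`.
  Constraints<T>& largerThan(const T& val) {
    callbacks_.push_back([val](T* attr, bool) {
      return *attr > val ? Error() : Error("attribute is too small");
    });
    return *this;
  }

  // Illustrative addition: run all checks in registration order and
  // stop at the first failure.
  Error validate(T* attr, bool alreadySet) const {
    for (const auto& check : callbacks_) {
      Error err = check(attr, alreadySet);
      if (!err.isOK()) return err;
    }
    return Error();
  }

private:
  std::vector<std::function<Error(T*, bool)>> callbacks_;
};
```

Chaining `defaultValue(1.0).largerThan(0.0)` on a `Constraints<double>` mirrors the `scale` attribute of the cosine layer: an unset value is filled with the default first, and the range check then runs on the result.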

From d4ccdea0e2fe77ac0e6a7e3fb14098887b260513 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 10 May 2017 18:46:44 +0800
Subject: [PATCH 25/27] Typo

---
 doc/design/topology.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/design/topology.md b/doc/design/topology.md
index 6b1502c3cff8d..874ef6509b852 100644
--- a/doc/design/topology.md
+++ b/doc/design/topology.md
@@ -3,7 +3,7 @@ Topology is a concept in Paddle for representing neural networks.  A neural netw
 
 The topology is not only an API-level concept but also how we organize the computation code for each `Layer` or `Function` in Paddle. Paddle should maintain a dictionary from `Layer Type` to Layer implementation, e.g. from the string `mul` to the function `void tensor_multiply(Tensor& ins, Tensor& outs)`. How users manipulate a topology, and how Paddle maps a user's topology to the implementations of `Layer` and `Function`, is a fundamental problem for refactoring Paddle.
 
-## User Stories and examples
+## User Stories and Examples
 
 ### Kernel Developers
 
@@ -34,7 +34,7 @@ func.setShapeInferer([](std::vector<Dims>& ins, std::vector<Dims>& outs){
 END_REGISTER_FUNCTION()
 ```
 
-### QA developer
+### QA Developers
 
 Bob is a QA developer of Paddle.  He wants to test every `Function` and `Layer` that Paddle supports.  However, each layer has different configuration attributes, e.g. `scale` in the `cosine` function, and each configuration attribute has its own value range and data type. With the new topology mechanism, Bob should be able to test all boundary conditions of a Layer or Function easily.
 
@@ -51,7 +51,7 @@ for each_attribute in cos.attributes:
 test(cos);
 ```
 
-### Language Binding developer
+### Language Binding Developers
 
 Carol is a language binding developer of Paddle. She wants to develop a new language binding for Paddle. She is not familiar with the Paddle C++ core and does not want to dig deep into Paddle. She just wants a clear list of the Layers Paddle supports and the configuration parameters of each Layer.
 

From bb68fdaf654d0a35b5083850c9f1c4d88ba3c4ab Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 17 May 2017 22:14:45 +0800
Subject: [PATCH 26/27] Update developer code example

---
 doc/design/topology.md | 52 ++++++++++++++++++++++++++++--------------
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/doc/design/topology.md b/doc/design/topology.md
index 874ef6509b852..7e2b4130ebf41 100644
--- a/doc/design/topology.md
+++ b/doc/design/topology.md
@@ -10,28 +10,46 @@ The topology is not only an API level concept but also how we organize the compu
 Alan is a professional CPU and GPU developer. He can write the kernel functions of a new `Layer` with the best performance. However, he is not familiar with Paddle's API language, Python. Alan just needs to write the kernel functions and register them in Paddle, and then Paddle should generate the user-side APIs for these kernel functions without any code written by Alan.
 
 ```cpp
+struct CosSimAttribute : public topology::Attribute {
+  double scale;
+  REGISTER_FUNC_ATTRIBUTE() {
+    regAttr(&CosSimAttribute::scale, "scale", "the scale of cosine operator").defaultValue(1.0).largerThan(0.0);
+  }
+};
+
 template <DeviceType devType>
-void cos_kernel(std::vector<Tensor>& ins, std::vector<Tensor>& outs,  double scale) {
+void cos_kernel(std::vector<Tensor>& ins, std::vector<Tensor>& outs, const CosSimAttribute& attr) {
     // implemetation here.
 }
 
-BEGIN_REGISTER_FUNCTION(cos, cos_kernel)
-// The parameter of cos function. 
-func.addAttribute("scale", "The scale of cos layer").defaultValue(1.0).largerThan(0.0);
-
-// Two inputs
-func.addInput().dataType(Dense).dimension(2).supportSeqType();
-func.addInput().dataType(Dense).dimension(2).supportSeqType();
-
-// One outputs
-func.addOutput().dataType(Dense).dimension(2).supportSeqType();
-
-// Tell Paddle how to inference the output shape?
-func.setShapeInferer([](std::vector<Dims>& ins, std::vector<Dims>& outs){
-    outs[0] = {ins[0][0], 1};  // output dimension = batch_size * 1
+BEGIN_REGISTER_FUNCTION(cosFwd, cosineForward, CosSimAttribute)
+addTensor<INPUT>(/*dim*/ 2);
+addTensor<INPUT>(/*dim*/ 2);
+addTensor<OUTPUT>(/*shape = */ {topology::meta::kTensorShape_BATCH_SIZE, 1},
+                  /*arg_type*/ ASSIGN_TO);
+
+setDescription(R"DOC(Cosine similarity forward function.
+There are two inputs of this function. The first input is a [h*w] matrix; the
+second input is a [h*w] matrix or a [1*w] matrix. The output matrix will be a
+[h*1] matrix.
+)DOC");
+
+setShapeInferer([](std::vector<topology::TensorPtr>& ins,
+                   std::vector<topology::TensorPtr>& outs) {
+  auto& shape0 = ins[0]->shape();
+  auto& shape1 = ins[1]->shape();
+
+  if (shape0 != shape1 && (shape0[1] != shape1[1] || shape1[0] != 1))
+    return Error(
+        "Input shape should be same, or the second height should be 1");
+  if (ins[0]->sequenceType() != ins[1]->sequenceType())
+    return Error("Input sequence type should be same");
+  outs[0]->setShape({ins[0]->shape()[0], 1});
+  outs[0]->setSequenceType(ins[0]->sequenceType());
+  outs[0]->setDataType(ins[0]->dataType());
+  return Error();
 });
-
-END_REGISTER_FUNCTION()
+END_REGISTER_FUNCTION(cosFwd);
 ```
 
 ### QA Developers

From ccf5d7d787a791d3b5159f0ac36463e598dba888 Mon Sep 17 00:00:00 2001
From: Yu Yang <yuyang18@baidu.com>
Date: Wed, 17 May 2017 23:19:48 +0800
Subject: [PATCH 27/27] Add implementation details

---
 doc/design/topology.md | 90 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 88 insertions(+), 2 deletions(-)

diff --git a/doc/design/topology.md b/doc/design/topology.md
index 7e2b4130ebf41..9be202ad23e6a 100644
--- a/doc/design/topology.md
+++ b/doc/design/topology.md
@@ -112,12 +112,98 @@ For example, the cosine layer should have two inputs, and the two inputs should
 All this meta information should be defined in the namespace `paddle::topology::meta`. There are several basic classes in this namespace.
 
 * Constraints:  A list of functions that stores the constraints of one attribute. It is used to validate that user input is correct.
+    ```cpp
+    template <typename T>  // T is attribute type
+    class Constraints {
+    private:
+      // (T* attr, bool alreadySet) -> Error
+      // attr is an in/out parameter for the attribute.
+      // alreadySet indicates whether this attribute has been set by the user or by previous callbacks.
+      // Returns an Error if the attribute is not valid.
+      std::vector<std::function<Error(T*, bool)>> callbacks_;
+    public:
+
+      // Each constraint function will add a check function to callbacks_;
+      Constraints<T>& mustSet();
+      Constraints<T>& defaultValue(const T& val);
+      Constraints<T>& largerThan(const T& val);
+
+      // More constraint functions below.
+    };
+    ```
+
 * AttributeMeta:  It represents the meta information of an attribute, e.g. `scale`. It contains the attribute name, description, type information and `Constraints`.
-* TensorMeta: A tensor is the input/output of a Layer or Function. `TensorMeta` contains a vector of `AttributeMeta`. The data type and sequence type are just attributes of the tensor.
-* FunctionMeta: It represents the meta information of a `paddle::Function`. It contains two vectors of `TensorMeta`, the inputs and the outputs. `FunctionMeta` also contains a vector of `AttributeMeta`, to which kernel developers can add the attributes used by their kernel.
+    ```cpp
+    class AttributeMeta {
+    public:
+      std::string name;         // e.g. "scale"
+      std::string description;  // e.g. "the scale of cosine operator"
+      const std::type_info& type;  // e.g. typeid(double)
+
+      std::any constraints;  // the constraints of this attribute. In the implementation,
+                             // `std::any` is used to get rid of the template argument.
+                             // It actually holds a Constraints<T>, where T's type
+                             // is stored in the `type` field.
+    };
+    typedef std::shared_ptr<AttributeMeta> AttributeMetaPtr;
+    ```
+* AttributeMetaMap: The attribute meta map contains many `AttributeMeta` objects. Each Layer, FunctionMeta, and TensorMeta is an AttributeMetaMap. Users can call `addAttribute` on an AttributeMetaMap.
+    ```cpp
+    class AttributeMetaMap: public std::unordered_map<std::string, AttributeMetaPtr> {
+    public:
+
+      /// Add an attribute to the map and return its constraints object, to which the user can add constraints.
+      /// @code
+      ///   attr_map.addAttribute<double>("scale", "the scale of cosine operator").defaultValue(1.0).largerThan(0.0);
+      /// @endcode
+      template <typename T>
+      Constraints<T>& addAttribute(const std::string& name, const std::string& description);
+    };
+    ```
+
+* AttributeMap: The attribute map is the data structure that stores attributes. AttributeMap is used not only by user-defined topology information but also by some meta information, which lets meta information store attributes of any type and stay decoupled from the upper-level invoker.
+    ```cpp
+    typedef std::unordered_map<std::string, std::any> AttributeMap;
+    ```
+
+* TensorMeta: A tensor is the input/output of a Layer or Function. `TensorMeta` is an `AttributeMetaMap`. The data type and sequence type are just attributes of the tensor.
+    ```cpp
+    enum DataType {DENSE, SPARSE, ...};
+    enum SequenceType {NO_SEQUENCE, SEQUENCE, ...};
+    class TensorMeta: public AttributeMetaMap {
+    public:
+      TensorMeta& setValidDataType(std::set<DataType> dataType);
+      TensorMeta& setValidSeqType(std::set<SequenceType> seqType);
+      TensorMeta& setShapeDim(size_t dim);
+
+    private:
+      // nothing! TensorMeta is just an AttributeMetaMap with some helper functions added.
+    };
+    ```
+
+* FunctionMeta: It represents the meta information of a `paddle::Function`. It contains two vectors of `TensorMeta`, the inputs and the outputs. `FunctionMeta` is an `AttributeMetaMap`, so kernel developers can add the attributes used by their kernel.
+    ```cpp
+    class FunctionMeta : public AttributeMetaMap {
+    public:
+      std::vector<TensorMetaPtr> inputs;
+      std::vector<TensorMetaPtr> outputs;
+
+      /// `cpuKernel`, `gpuKernel`, `shapeInferer`, `estimateFlops` are just stored in metaAttrs.
+      /// The invoker of the function meta decides what type and name each entry has.
+      /// Also, each field could have many types.
+      /// For example,
+      ///   shapeInferer is a function from `std::vector<Tensor>& in` to `std::vector<Tensor>& out`,
+      ///   but different functions could use different Attribute classes.
+      ///   The cosine layer's shapeInferer could be
+      ///   (std::vector<Tensor>& in, std::vector<Tensor>& out, const CosSimAttribute& attr) {...}
+      AttributeMap metaAttrs;
+    };
+    ```
+
 * LayerMeta: A concept similar to FunctionMeta, but used to represent a `Layer`.
 * TopologyMeta: A topology meta contains a vector of `AttributeMeta`, which represents the attributes that can be set globally in a topology.
 
+
 ### Topology information
 
 The topology information is the actual information of a neural network. It corresponds one-to-one to the meta information. We use `std::any` (a.k.a. `boost::any`) to represent the value of each attribute because an attribute could be of any type (`double`, `int`, `vector<int>`, etc.).
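
Reviewer sketch (not part of the patch): storing attribute values in a `std::any` map, as the paragraph above describes, might be used as shown below. This is only a sketch assuming C++17. The `AttributeMap` typedef is taken from the design, while `getAttr` is a hypothetical helper introduced here for illustration.

```cpp
#include <any>
#include <string>
#include <unordered_map>
#include <vector>

// Same typedef as the AttributeMap in the design above.
typedef std::unordered_map<std::string, std::any> AttributeMap;

// Hypothetical helper: read an attribute as type T.
// Returns false if the key is missing or the stored type is not exactly T.
template <typename T>
bool getAttr(const AttributeMap& attrs, const std::string& name, T* out) {
  auto it = attrs.find(name);
  if (it == attrs.end()) return false;
  // The pointer overload of any_cast returns nullptr on a type mismatch
  // instead of throwing std::bad_any_cast.
  const T* value = std::any_cast<T>(&it->second);
  if (value == nullptr) return false;
  *out = *value;
  return true;
}
```

Because `std::any` matches on the exact stored type, a `double` attribute cannot be read back as an `int`. That is one reason the design keeps the expected type in `AttributeMeta::type`, so a mismatch can be reported as a configuration error.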