Caffe2 Survey for Python Topology #3502

Closed · wants to merge 1 commit

Conversation

@reyoung (Collaborator) commented Aug 15, 2017

No description provided.

* Parameter-initialization network
* Device information

The Model does not directly provide `XXX_layer` interfaces; instead, it exposes interfaces such as `create_parameter` and `get_param_info` to layer developers.
Member:

"Provides interfaces to application-layer developers"?

Collaborator (Author):

To layer developers. It really means developers of layers, not the "application layer".
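To make the reviewed sentence above concrete, here is a rough sketch of how a layer developer might use such interfaces; the exact signatures of `create_parameter` / `get_param_info` below are assumptions for illustration, not Caffe2's verbatim API:

```python
# Hypothetical layer function written by a layer developer: instead of getting a
# ready-made `fc_layer` from Model, it builds one out of the parameter APIs.
def my_fc(model, input, dim_in, dim_out):
    # assumed signatures -- the survey only names these interfaces
    weight = model.create_parameter(name="w", shape=[dim_out, dim_in])
    bias = model.create_parameter(name="b", shape=[dim_out])
    print(model.get_param_info(weight))  # e.g. shape / initializer / device info
    return model.net.FC([input, weight, bias], "fc_out")
```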


```python
model = model_helper.ModelHelper(name="train_net")
fc1 = brew.fc(model, input, output, dim_in, dim_out, **kwargs)
```
Member:

Is the return value a BlobReference?

# The ground truth parameters.
W_gt = init_net.GivenTensorFill(
    [], "W_gt", shape=[1, 2], values=[2.0, 1.5])
B_gt = init_net.GivenTensorFill([], "B_gt", shape=[1], values=[0.5])
@helinwang (Contributor) Aug 15, 2017:

Curious what the first argument `[]` of each `xxFill()` is for? I tried searching but could not find the corresponding Caffe2 documentation.

@jacquesqiao (Member) Aug 15, 2017:

It eventually calls `_CreateAndAddToSelf`, and the first argument is the inputs.

Member:

Because these operators are used to initialize parameters, they have no input variables, only output variables, which are given by the second argument.
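A small runnable sketch of the calling convention just described, using Caffe2's Python API as shown in the snippets above (the blob names are made up for illustration):

```python
from caffe2.python import core, workspace

# Initialization net: fill ops take an empty input list ([]) and the name of
# the output blob they create as the second argument.
init_net = core.Net("init")
W = init_net.GivenTensorFill([], "W", shape=[1, 2], values=[2.0, 1.5])
B = init_net.ConstantFill([], "B", shape=[1], value=0.5)
X = init_net.GivenTensorFill([], "X", shape=[1, 2], values=[1.0, 1.0])

# A regular op, by contrast, lists its input blobs first and its output second.
train_net = core.Net("train")
Y = train_net.FC([X, W, B], "Y")

workspace.RunNetOnce(init_net)   # fills W, B, X once
workspace.RunNetOnce(train_net)  # computes Y = X * W^T + B
print(workspace.FetchBlob("Y"))  # [[4.0]]
```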

@helinwang (Contributor) Aug 15, 2017:

Thanks! Just curious, are these lower-level APIs? (They do not seem very user-friendly.)

Member:

These APIs are for the end user; we can try to do better than Caffe2 :)

Collaborator (Author):

@helinwang These are Caffe2's low-level APIs. For the higher-level API, see
https://github.com/caffe2/caffe2/blob/master/caffe2/python/tutorials/MNIST.ipynb

https://github.com/caffe2/caffe2/blob/master/caffe2/python/examples/char_rnn.py

But, as Caffe2's char_rnn example shows, a reasonably complex model (and an RNN probably already counts as complex in Caffe2) inevitably has to use the low-level API. Also, Caffe2's Model is a very thin wrapper.

My personal feeling (which may not be right) is that Caffe2 is deliberately pursuing minimalism, so it does not want to wrap the API layer too much. Users can call all of the APIs directly; nothing stops them from doing so, and they are even encouraged to. You do hit the lowest-level API as soon as you start reading the code, but once you understand the logic of that lowest-level API, writing your own extensions becomes very simple.


## Takeaways

1. A Model can manage multiple topologies, but the functions that construct layers need not live inside Model. They can go under a `layer` module (`brew` is not a good name). We agree that one of the parameters is `model`, but its default value can be a global model (see the sketch below).
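A rough sketch of what this proposal could look like; every name here (the `layer` module, `default_model`, `create_parameter`, `add_op`) is hypothetical and only illustrates the calling convention:

```python
# layer.py -- hypothetical module holding layer constructors
_default_model = None

def set_default_model(model):
    """Register a global model used whenever `model` is not passed explicitly."""
    global _default_model
    _default_model = model

def fc(input, dim_in, dim_out, model=None, **kwargs):
    """Build an FC layer on `model`, falling back to the global default model."""
    model = model if model is not None else _default_model
    w = model.create_parameter(shape=[dim_out, dim_in])
    b = model.create_parameter(shape=[dim_out])
    return model.add_op("FC", inputs=[input, w, b], **kwargs)
```

With this, both `layer.fc(data, 100, 10, model=m)` and a bare `layer.fc(data, 100, 10)` (after `layer.set_default_model(m)`) would work.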
@Superjomn (Contributor) Aug 15, 2017:

Could we put both layers and ops into the Model namespace? An implementation based on `__getattr__` seems to cost about the same (there can still be a separate module holding the pile of layer implementations, just pulled into the model's namespace). And if FC is called `fc_layer`, there seems to be no need to introduce a new name like `layer.xxx`.

m = Model(....)
fc0 = m.fc(xxx)
sum = m.add(a, b)
seq = m.rnn(xxx)

versus

m = Model(...)
fc0 = layer.fc(m, xxx)
sum = layer.add(m, a,b)
seq = layer.rnn(m, xxx)

The former feels a bit clearer to use, because users may not know which things are layers and which are ops. If layers and ops have similar interfaces, they could be unified under `model.xxx`: users only need to know that whenever they want some op/layer, they fetch it from the `model.` namespace.

Just like how we previously wrote fc as `fc_op` but have now decided to implement it as a layer, the user should not have to notice this; the distinction between layers and ops is not something users need to be deliberately aware of.

@reyoung (Collaborator, Author) Aug 16, 2017:

> Just like how we previously wrote fc as `fc_op` but have now decided to implement it as a layer, the user should not have to notice this; the distinction between layers and ops is not something users need to be deliberately aware of.

That makes sense.

However, the benefit of Caffe2's implementation is that it is extremely extensible. A user can write an arbitrary Python module and, by imitating the functions in brew, implement a whole set of layers they want to wrap themselves, without having to put the "layer" implementations into some fixed files or paths.

That said, comparing

`brew.fc(model, ...)`

with

`model.fc(...)`,

the latter is clearly more readable for users (who knows what `brew` means? You would have to go look it up).

So I actually think we should try to do both well: let users define layers in any Python file, and also let them call those layers directly as `model.xxx_layer`.

A simple approach is to add a `register_module` function to the model, which registers a Python module's functions as the model's own methods (see the sketch below).
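A minimal sketch of the `register_module` idea, assuming layer functions take the model as their first argument; all names here are hypothetical:

```python
import functools
import types

class Model(object):
    def register_module(self, module):
        """Expose each public callable of `module` on this model, partially
        applied with the model itself as the first argument."""
        for name in dir(module):
            fn = getattr(module, name)
            if callable(fn) and not name.startswith("_"):
                setattr(self, name, functools.partial(fn, self))

# Layers can live in any Python file / module the user writes...
my_layers = types.ModuleType("my_layers")
def fc(model, input, dim_in, dim_out):
    # ...create parameters on `model` and add the FC op (omitted here)...
    return "%s_fc_out" % input
my_layers.fc = fc

# ...and both calling styles then work:
m = Model()
m.register_module(my_layers)
print(m.fc("data", 100, 10))             # model.xxx style
print(my_layers.fc(m, "data", 100, 10))  # layer.fc(model, ...) style
```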

@jacquesqiao (Member):

We could draw on the high-level interface design of https://keras.io/. Keras has spread widely in part because its high-level API abstraction is well done.

import keras
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()

model.add(Dense(units=64, input_dim=100))
model.add(Activation('relu'))
model.add(Dense(units=10))
model.add(Activation('softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True))

# x_train and y_train are Numpy arrays -- just like in the scikit-learn API.
model.fit(x_train, y_train, epochs=5, batch_size=32)
# or
model.train_on_batch(x_batch, y_batch)

@Superjomn (Contributor) commented Aug 16, 2017:

Some of Keras's features are indeed excellent, and we can certainly borrow part of them while keeping an appropriate level of abstraction.

I have used Keras before. Simpler features like Sequential are very nice, but the high-level abstraction conflicts somewhat with TF's op-centric design; complex models still have to be mixed with TF's raw interfaces, such as tf.scope and the like.

@jacquesqiao

@helinwang (Contributor) commented Aug 16, 2017:

There is a lot worth borrowing here. At the same time, I feel the Caffe2 API has room for improvement in places, for example:

Caffe2's API requires creating a separate net just to initialize variables, which in this respect feels even harder to use than TF (friends of mine at Google are already complaining that TF is hard to use). Torch and PyTorch can both be used to develop advanced models, yet their APIs seem much easier to use than TF's.

`workspace.RunNet(train_net.Proto().name)`: running a network requires `train_net.Proto().name`, which exposes implementation details to the user and is also inconsistent with the earlier `workspace.CreateNet(train_net)`. `workspace.RunNet(train_net)` would probably be much better.

The following snippet is also hard to understand at first glance (see the comments for why):

# ITER is the iterator count.
ITER = init_net.ConstantFill([], "ITER", shape=[1], value=0, dtype=core.DataType.INT32)
train_net.Iter(ITER, ITER)  # Not clear what (ITER, ITER) does; it reads as very convoluted.
LR = train_net.LearningRate(ITER, "LR", base_lr=-0.1,
                            policy="step", stepsize=20, gamma=0.9)

train_net.WeightedSum([W, ONE, gradient_map[W], LR], W)  # If every parameter needs LR added like this, isn't it too verbose?
train_net.WeightedSum([B, ONE, gradient_map[B], LR], B)

@Superjomn (Contributor):

PyTorch is indeed great to use. Besides its natively Pythonic design, it also seems to have picked up some Keras-like goodies, for example:

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

This borrows Keras's strengths. But PyTorch does not go that high-level; things like forward and backward are still exposed:

for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Variable of input data to the Module and it produces
    # a Variable of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Variables containing the predicted and true
    # values of y, and the loss function returns a Variable containing the
    # loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.data[0])

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Variables with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

This is actually flexible and natural. A Keras-style library for PyTorch has not appeared quickly, probably because it simply is not needed. Keras does feel more pleasant to use than TF, but its one-stop, overly high abstraction is inflexible: slightly more complex models have to be combined with TF's low-level interfaces, and mixing the two styles of code is a bit odd.

My feeling is that if we want to borrow from Keras, going to PyTorch's level is enough. The low level still has to rely on ops for flexibility, and at the high level we already have v2 to keep using / stay compatible with.
@helinwang

@helinwang (Contributor) commented Aug 16, 2017:

@Superjom Yeah, agreed. I don't think we necessarily have to be as high-level as Keras, but let's at least try to be easier to use than TF.

@reyoung (Collaborator, Author) commented Aug 16, 2017:

ITER = init_net.ConstantFill([], "ITER", shape=[1], value=0, dtype=core.DataType.INT32)
train_net.Iter(ITER, ITER)  # Not clear what (ITER, ITER) does; it reads as very convoluted.
LR = train_net.LearningRate(ITER, "LR", base_lr=-0.1,
                            policy="step", stepsize=20, gamma=0.9)
train_net.WeightedSum([W, ONE, gradient_map[W], LR], W)  # If every parameter needs LR added like this, isn't it too verbose?

This is the most low-level API, so it is fairly verbose.
ITER is a variable that records which batch we are currently on.
LR is the variable used for the learning-rate schedule; it is computed from ITER, base_lr, and other information, giving a different LR for each mini-batch (usually it decreases as ITER grows). See the sketch below.
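For reference, a tiny sketch of what a "step" learning-rate schedule with the arguments above (base_lr=-0.1, stepsize=20, gamma=0.9) typically computes; this illustrates the formula, not Caffe2's exact implementation (base_lr is presumably negative so that WeightedSum effectively subtracts the scaled gradient):

```python
def step_lr(base_lr, iter, stepsize=20, gamma=0.9):
    """lr = base_lr * gamma^(iter // stepsize): decays every `stepsize` iterations."""
    return base_lr * (gamma ** (iter // stepsize))

print(step_lr(-0.1, 0))    # -0.1
print(step_lr(-0.1, 20))   # ~-0.09
print(step_lr(-0.1, 40))   # ~-0.081
```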

@reyoung (Collaborator, Author) commented Aug 16, 2017:

@Superjom @helinwang For the high-level API, see the two Caffe2 demos attached at the end of the markdown document.

```python
data = net.GivenTensorFill(...)
hidden = data.FC(...)
hidden = hidden.FC(...)
```
Contributor:

The point is that you can keep creating ops from the variable returned by `Net.xxxOp`, e.g. `hidden = data.FC(...)`.

I don't think this is very intuitive. `A.B` would naturally be read as calling A's method or attribute B, but here A is being treated as B's input, right? That is, `data.FC(...)` passes `data` as the input to the FC op? That feels counter-intuitive.

Collaborator (Author):

This only shows what Caffe2's implementation looks like.

I also think doing it this way is not great.
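For context, a toy sketch of the mechanism as I understand it from Caffe2's `core.BlobReference` (heavily simplified, not the actual implementation): the blob's `__getattr__` turns `data.FC(...)` into an op on the blob's net, with the blob prepended as the first input.

```python
class Net(object):
    """Toy stand-in for caffe2.python.core.Net: just records op creations."""
    def __getattr__(self, op_type):
        def op_creator(inputs, *args, **kwargs):
            print("add %s op with inputs %s" % (op_type, inputs))
            return BlobReference("%s_out" % op_type, self)
        return op_creator

class BlobReference(object):
    """Toy stand-in: a blob remembers the net it belongs to."""
    def __init__(self, name, net):
        self._name = name
        self._net = net

    def __getattr__(self, op_type):
        # data.FC(...) -> net.FC([data, ...], ...): the blob becomes the
        # op's first input.
        def op_creator(*inputs, **kwargs):
            return getattr(self._net, op_type)([self._name] + list(inputs), **kwargs)
        return op_creator

net = Net()
data = BlobReference("data", net)
hidden = data.FC("W", "b")  # behaves like net.FC(["data", "W", "b"], ...)
```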

train_net.WeightedSum([B, ONE, gradient_map[B], LR], B)

workspace.RunNetOnce(init_net)
workspace.CreateNet(train_net)
Contributor:

@reyoung Not sure whether the run_once of FillOp in #3505 was designed for parameter initialization: https://github.com/PaddlePaddle/Paddle/pull/3505/files#diff-500814883c9884d672f742194113a98dR29

But I feel splitting this into init_net and train_net, and calling RunNetOnce only on init_net, is the better approach.

Collaborator (Author):

No. FillOp can actually also be used to define constants, or to let users initialize parameters themselves.
Defining constants is also very useful for the framework we are building ourselves. For example, the gradient of a subtraction op is just an IdentityOp plus a ScaleOp, and the other input of that ScaleOp is the constant -1.
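A small hedged sketch, in Caffe2 terms, of how a fill op can provide such a constant to another net; the blob and op choices here (e.g. using `Mul` with `broadcast=1` on a made-up gradient blob) are illustrative, not the actual subtraction-gradient code:

```python
from caffe2.python import core, workspace

# init_net runs once and materializes the constant (and, here, a fake gradient).
init_net = core.Net("init")
NEG_ONE = init_net.ConstantFill([], "NEG_ONE", shape=[1], value=-1.0)
GRAD = init_net.GivenTensorFill([], "GRAD", shape=[3], values=[1.0, 2.0, 3.0])

# train_net reuses the constant blob on every iteration, e.g. to negate a gradient.
train_net = core.Net("train")
NEG_GRAD = train_net.Mul([GRAD, NEG_ONE], "NEG_GRAD", broadcast=1)

workspace.RunNetOnce(init_net)
workspace.CreateNet(train_net)
workspace.RunNet(train_net.Proto().name)
print(workspace.FetchBlob("NEG_GRAD"))  # [-1. -2. -3.]
```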

@luotao1 (Contributor) commented Feb 1, 2019:

Thanks for contributing to PaddlePaddle! Since documents have been moved to the FluidDoc repo, we are closing this PR. You are welcome to contribute to the FluidDoc repo instead.

@luotao1 closed this on Feb 1, 2019