This repository has been archived by the owner on Oct 30, 2019. It is now read-only.

Add inception-resnet-v2 model #64

Open
wants to merge 5 commits into
base: master
from

@lim0606 lim0606 commented Jun 9, 2016

Out of personal interest, my friend Sunghun Kang (shuni@kaist.ac.kr) and I trained inception-resnet-v2 (http://arxiv.org/abs/1602.07261) from scratch in Torch, building on Facebook's training scripts (https://github.com/facebook/fb.resnet.torch).

This PR may not be a good fit for this repository, but I'd like to share it for anyone else who is interested in this model.

See https://github.com/lim0606/torch-inception-resnet-v2 for more details about the trained model.

@ghost ghost added the CLA Signed label Jun 9, 2016
@colesbury
Contributor

Cool!

@colesbury
Contributor

Regarding momentum, I think we match Caffe-style momentum by setting the dampening to 0.

i.e. by setting dampening to 0 we compute:
v := momentum*v + lr*g
instead of
v := momentum*v + (1-momentum)*lr*g
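
To see how much the two rules above differ, here is a minimal Python sketch of a single velocity update. The function name `velocity_update` and the scalar values are illustrative assumptions; the formula follows the comment as written and is not taken from the optimizer's source.

```python
def velocity_update(v, g, lr, momentum, dampening):
    """One velocity step: v := momentum*v + (1 - dampening)*lr*g."""
    return momentum * v + (1.0 - dampening) * lr * g


# Hypothetical scalar values, chosen only for illustration.
v, g, lr, momentum = 0.0, 1.0, 0.1, 0.9

# dampening = 0 gives the first rule: v := momentum*v + lr*g
v_undamped = velocity_update(v, g, lr, momentum, dampening=0.0)

# dampening = momentum gives the second rule:
# v := momentum*v + (1 - momentum)*lr*g
v_damped = velocity_update(v, g, lr, momentum, dampening=momentum)

print(v_undamped, v_damped)
```

With momentum 0.9, the second rule scales each gradient contribution by a factor of 0.1 relative to the first.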

@lim0606
Author

lim0606 commented Jun 11, 2016

@colesbury

Thank you for the comments!

I will check the training result with 0.9 momentum.

@ghost ghost added the CLA Signed label Jul 12, 2016
@superzrx

@lim0606 Why change CAddTable(true) to CAddTable()?

@ghost ghost added the CLA Signed label Aug 14, 2016
@lim0606
Author

lim0606 commented Aug 14, 2016

@superzrx

Hi,

Since identity layers pass their input tensors directly to the next layer without copying them, the in-place CAddTable(true) seems to cause a problem: it overwrites the values of the inputs to the residual layers.

For the classification task the effect was minor, so I didn't notice the problem for a long time... ;(

However, when I applied the model to other kinds of tasks, with additional layers that branch the feature output into several paths, the model produced NaN after some iterations.
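
The aliasing described above can be reproduced in a few lines of plain Python, using lists as stand-ins for tensors. This is only a toy model: `identity`, `branch`, and `add_table` are hypothetical helpers that imitate the behaviour discussed here, not the Torch modules themselves.

```python
def identity(x):
    # Like an identity layer: hands back the very same buffer, no copy.
    return x


def branch(x):
    # Stand-in for the residual branch: writes its result to a fresh buffer.
    return [0.5 * xi for xi in x]


def add_table(inputs, inplace=False):
    # Toy element-wise sum. With inplace=True the sum is accumulated
    # into inputs[0], which is also returned as the output.
    out = inputs[0] if inplace else list(inputs[0])
    for other in inputs[1:]:
        for i, value in enumerate(other):
            out[i] += value
    return out


# In-place add with the identity path as first child: x is overwritten.
x = [1.0, 2.0, 3.0]
y = add_table([identity(x), branch(x)], inplace=True)
print(y is x, x)    # True [1.5, 3.0, 4.5]

# Out-of-place add: the block input survives for any other consumer.
x2 = [1.0, 2.0, 3.0]
y2 = add_table([identity(x2), branch(x2)], inplace=False)
print(y2 is x2, x2)  # False [1.0, 2.0, 3.0]
```

Any extra path that reads `x` after the in-place add sees the summed values rather than the original features, which is consistent with the effect only showing up once the feature output is branched.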

Sincerely,

Jaehyun Lim

@superzrx

@lim0606
Hi Jaehyun,
Since CAddTable(true) saves the result in its first child, it seems OK to make the identity path the second child.

@lim0606
Author

lim0606 commented Aug 15, 2016

@superzrx

Hi,

Since the first child is the identity layer, which returns its input directly as its output, writing values over the first child of CAddTable(true) amounts to writing values over the input of the second child.

Furthermore, ResNet as well as inception-resnet-v2 consist of residual layers with identity paths; therefore, the CAddTable(true) modules at different layers access the same memory addresses and change their values within a single forward pass.
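
A small sketch of the stacked case, again with Python lists standing in for tensors. The helpers `branch`, `residual_block`, and `run` are hypothetical and only imitate the sharing pattern described here; they are not the model's code.

```python
def branch(x):
    # Stand-in for the residual branch: result goes to a fresh buffer.
    return [0.5 * xi for xi in x]


def residual_block(x, inplace):
    # Identity path is the first child, so an in-place add writes into x.
    out = x if inplace else list(x)
    for i, value in enumerate(branch(x)):
        out[i] += value
    return out


def run(num_blocks, inplace):
    x = [1.0, 2.0]
    kept = [x]  # references that later layers might still want to read
    h = x
    for _ in range(num_blocks):
        h = residual_block(h, inplace)
        kept.append(h)
    return kept


inplace_feats = run(3, inplace=True)
copy_feats = run(3, inplace=False)

# In-place: every kept reference is the same buffer, holding the final values.
print(all(f is inplace_feats[0] for f in inplace_feats), inplace_feats[0])
# Out-of-place: each stage keeps its own values.
print(copy_feats)
```

In the in-place run, the network input and every intermediate output end up as one shared buffer, so each block overwrites what the earlier stages produced.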

Best regards,

Jaehyun

@ghost ghost added the CLA Signed label Aug 15, 2016
@superzrx

@lim0606
Hi lim,
In my last reply, I meant switching the identity path and the inception path, so that identity becomes the second child and CAddTable(true) does not cause a problem (which saves some memory).
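
The proposed ordering can be checked on a toy model with Python lists as tensors. `branch` and `add_table` are hypothetical helpers, and the sketch covers only the forward pass; it says nothing about gradient buffers or shareGradInput in the real library.

```python
def branch(x):
    # Stand-in for the inception path: writes its result to a fresh buffer.
    return [0.5 * xi for xi in x]


def add_table(inputs, inplace=False):
    # Toy element-wise sum; inplace=True accumulates into inputs[0].
    out = inputs[0] if inplace else list(inputs[0])
    for other in inputs[1:]:
        for i, value in enumerate(other):
            out[i] += value
    return out


x = [1.0, 2.0, 3.0]
# Branch output first, identity (x itself) second: the in-place add
# reuses the branch's buffer and leaves x untouched.
y = add_table([branch(x), x], inplace=True)
print(y is x, x, y)  # False [1.0, 2.0, 3.0] [1.5, 3.0, 4.5]
```

In this toy setting the in-place add only reuses the branch's own output buffer, so no extra buffer is allocated and the block input stays intact.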
Also, I see you use ConcatTable and JoinTable instead of Concat. I tried Concat on some networks but ran into training problems. Is there a problem with Concat and shareGradInput that you work around by using ConcatTable and JoinTable?

4 participants