nn4.v2 Training Progress #55

Closed · 7 tasks done
bamos opened this issue Nov 17, 2015 · 5 comments

bamos commented Nov 17, 2015

bamos added this to the v0.2.0 milestone Nov 17, 2015
bamos added a commit that referenced this issue Dec 21, 2015
Reduces the number of parameters from 7472144 to 6959088.
bamos commented Dec 29, 2015

Hi all,

I started training a new model on Dec 20 and it's still improving now (Dec 28).
Here's an in-progress loss plot of the experiment.
Epoch 103 gives 73% accuracy on LFW.

[image: in-progress loss plot]

It looks a lot different from the loss plot for nn4.v1:
[image: nn4.v1 loss plot]

I think the variance at the beginning of the in-progress experiment shows
@melgor's bug fix for semi-hard triplets (#48) is working.
The nn4.v1 experiment sampled triplets almost at random, so its loss
decreases smoothly from the beginning. The in-progress experiment, in
contrast, is saturated with semi-hard triplets at the beginning, so the
loss only reflects progress on semi-hard triplets, not overall progress
on random triplets.
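
To make the selection rule concrete, here's a minimal NumPy sketch of the semi-hard condition from the FaceNet paper; the names are illustrative, not our actual Lua code:

```python
import numpy as np

# Semi-hard negatives satisfy d(a,p) < d(a,n) < d(a,p) + alpha: they are
# farther from the anchor than the positive, but still violate the margin.
def semi_hard_negatives(anchor, positive, negatives, alpha=0.2):
    d_ap = np.sum((anchor - positive) ** 2)           # squared a-p distance
    d_an = np.sum((negatives - anchor) ** 2, axis=1)  # squared a-n distances
    mask = (d_an > d_ap) & (d_an < d_ap + alpha)
    return negatives[mask]
```

Early in training almost every triplet satisfies this condition, which is why the loss stays noisy until the embedding starts to separate identities.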

Also, I think the LRN next to the pooling in the early layers makes
training more difficult by constraining the activations, but hopefully it
will result in better generalization.
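
For reference, cross-channel LRN divides each activation by the energy of nearby channels; a rough NumPy sketch (the parameter values here are illustrative, not necessarily nn4.v2's):

```python
import numpy as np

# Cross-channel LRN (Krizhevsky et al.): normalize each channel by the
# squared activations of its `size` neighboring channels.
def lrn(x, size=5, alpha=1e-4, beta=0.75, k=1.0):
    """x: feature map of shape (channels, height, width)."""
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        lo, hi = max(0, i - size // 2), min(x.shape[0], i + size // 2 + 1)
        out[i] = x[i] / (k + alpha * np.sum(x[lo:hi] ** 2, axis=0)) ** beta
    return out
```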

Interested to hear anybody else's interpretations.

-Brandon.

bamos added a commit that referenced this issue Jan 6, 2016
bamos added a commit that referenced this issue Jan 7, 2016
bamos commented Jan 7, 2016

nn4.v2 released! Details in this mailing list post:
https://groups.google.com/forum/#!topic/cmu-openface/wsUp0p0F960

bamos closed this as completed Jan 7, 2016
myme5261314 commented

@bamos @melgor.
I've been checking the OpenFace code against my facenet implementation line by line, and I just found what's stated in this issue:

The 5x5 kernels are removed in the last 2 layers.

I'd like to know the reason for this. Is it to reduce memory, or
something else? I can't find any text that mentions the reason.

By the way, my TensorFlow facenet can reach around 92% now, but I've noticed something weird: OpenFace's LFW accuracy increases rapidly, while my facenet's doesn't.

For example, with the same epoch_size setting (say 250), OpenFace reaches around 70% after 2 or 3 epochs, while my facenet only reaches around 55%, even on the identical dataset. My facenet eventually converges to around 91~92% on LFW, but the slow increase in accuracy seems really weird. Here's a figure from one of my experiments.

[image: LFW accuracy curve, screenshot 2016-05-19]

Got any ideas about potential reasons?

bamos commented May 24, 2016

Hi @myme5261314,

The 5x5 kernels are removed in the last 2 layers.

I didn't include these because section 3.3 of the FaceNet paper says:

In addition to the reduced input size [the nn4 model]
does not use 5x5 convolutions in the higher layers as the
receptive field is already too small by then.

Unfortunately the phrasing is vague and doesn't say which layers the 5x5 convolutions are omitted from. I haven't tried training with the 5x5 convolutions added back here.

-Brandon.

melgor commented May 24, 2016

Hi,
I have just checked the feature map size at each level of the Inception layers. The input to the last Inception layer is 3x3, so a 5x5 kernel doesn't make sense there, as Brandon quoted:

as the receptive field is already too small by then
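
A quick sanity check of that arithmetic, assuming OpenFace's 96x96 inputs and the five stride-2 stages of the nn4 architecture (stage names simplified):

```python
# Each stride-2 convolution/pooling stage roughly halves the spatial size.
size = 96
for stage in ["conv1", "pool1", "pool2", "inception_3c", "inception_4e"]:
    size = (size + 1) // 2
    print("after %s: %dx%d" % (stage, size, size))
# after inception_4e: 3x3 -- too small for a 5x5 kernel
```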

About the slow convergence:

Got any ideas about potential reasons?

What about the optimizer? OpenFace uses AdaDelta; are you using the same one?
Do you use BatchNorm? (The original FaceNet does not.)
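
If the optimizer turns out to be the difference, switching to AdaDelta in TF 1.x could look like the minimal sketch below; the toy loss is just a stand-in for your triplet loss, and the hyperparameters are TensorFlow's defaults, not necessarily OpenFace's:

```python
import tensorflow as tf  # TF 1.x API

# Toy stand-in graph; replace `loss` with your triplet loss.
w = tf.Variable(1.0)
x = tf.placeholder(tf.float32, shape=[None])
loss = tf.reduce_mean(tf.square(w * x))

# AdaDelta, the optimizer OpenFace trains with.
train_op = tf.train.AdadeltaOptimizer(
    learning_rate=1.0, rho=0.95, epsilon=1e-6).minimize(loss)
```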
