
Model equivalent to nn4.small2.v1.t7 #108

Closed

mhaghighat opened this issue Jan 10, 2017 · 14 comments

@mhaghighat
Contributor

mhaghighat commented Jan 10, 2017

I wonder why the size of the model files (meta and ckpt) is so big compared to the Torch model (nn4.small2.v1.t7) provided in the OpenFace library?

  • model-20161116-234200.ckpt-80000 (600MB)
  • model-20161116-234200.meta (63MB)

compared to:

  • nn4.small2.v1.t7 (31.5MB)

And is there any model equivalent to the nn4.small2.v1.t7 that is small and can be used in the TensorFlow implementation?

Thanks.

@mhaghighat mhaghighat changed the title Equivalent model to nn4.small2.v1.t7 Model equivalent to nn4.small2.v1.t7 Jan 10, 2017
@davidsandberg
Owner

Hi @mhaghighat,
I haven't looked at the difference between the file sizes, but it's definitely an interesting thing to look at. I guess there are mainly two factors contributing to the difference:

  • The nn4.small2.v1 is a model with fewer parameters compared to the inception_resnet_v1 model (don't know how much smaller though). The nn4.small2.v1 model for tensorflow can be found in the model directory, but I haven't used it for a while. Don't expect the performance to be super-impressive though.
  • The tensorflow checkpoint format is maybe not as size-efficient as the .t7 format. It would be interesting to compare the two formats for the same model.

So my best advice is to try the nn4.small2.v1 model in the facenet repo to see how it performs and what file size you get for the checkpoint file.

@mhaghighat
Contributor Author

Thank you for your reply, David.

The model being more than 20 times larger made me curious whether any redundant data is stored. So I checked the contents of the model file as follows:

import tensorflow as tf

# Restore the graph from the meta file and the latest checkpoint,
# then list the names of all trainable variables
sess = tf.Session()
new_saver = tf.train.import_meta_graph('model-20161116-234200.meta')
new_saver.restore(sess, tf.train.latest_checkpoint('./'))
all_vars = tf.trainable_variables()
for v in all_vars:
    print(v.name)

Below is a screenshot of part of the printed list. You can see several repetitions of the blocks and branches stored with extra _N suffixes. Do you think these might be the result of the variables being recreated and added to the same graph? Please refer to the answer in this StackOverflow post for a similar issue.

Thanks again for your time and support.

[screenshot of the printed variable list]

@davidsandberg
Owner

Ok, what you are seeing is just the structure of the model. A residual network (resnet) consists of a bunch of blocks which in tensorflow are created using slim.repeat. Check out models.inception_resnet_v1 to see how the model is created.
I still think you should compare the same model in Torch and Tensorflow to figure out why the size differs that much.
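For illustration (a toy sketch, not the actual facenet code), slim.repeat stacks the same layer under numbered sub-scopes, which is where the _N style suffixes in the variable names come from:

import tensorflow as tf
import tensorflow.contrib.slim as slim

# Three identical 3x3 convolutions created with a single slim.repeat call
inputs = tf.placeholder(tf.float32, [None, 32, 32, 3])
net = slim.repeat(inputs, 3, slim.conv2d, 64, [3, 3], scope='conv')

for v in tf.trainable_variables():
    print(v.name)
# -> conv/conv_1/weights:0, conv/conv_1/biases:0, conv/conv_2/weights:0, ...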

@mhaghighat
Contributor Author

Unfortunately, I don't have the FaceScrub and the CASIA-WebFace databases to train the nn4.small2.v1. I wonder if anyone has done it and can share the meta and ckpt files.
I will do it as soon as I can get the databases.

@davidsandberg
Owner

To check the size of the checkpoint you don't need to run any training. Just initialize the model and store the parameters.
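Roughly along these lines (a minimal sketch; the module path, input size, and inference signature are assumptions and may differ in the current repo):

import os
import tensorflow as tf
from models import nn4_small2_v1 as network  # import path is an assumption

# Build the inference graph on a dummy input, initialize, and save a checkpoint
images = tf.placeholder(tf.float32, [None, 96, 96, 3], name='input')
net = network.inference(images, keep_probability=1.0, phase_train=True,
                        weight_decay=0.0)

saver = tf.train.Saver(tf.trainable_variables())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, './nn4_small2_v1_test.ckpt')

# Compare the resulting file sizes with the 31.5MB nn4.small2.v1.t7
for f in os.listdir('.'):
    if f.startswith('nn4_small2_v1_test.ckpt'):
        print(f, '%.1f MB' % (os.path.getsize(f) / 1e6))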

@crockpotveggies

How many parameters is the resnet model? The nn4.small2.v1 is approx 3.7 million parameters.
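For reference, one way to get a count from a restored graph (e.g. after tf.train.import_meta_graph as above) is a sketch like this:

import numpy as np
import tensorflow as tf

# Sum the number of elements over all trainable variables in the current graph
num_params = sum(np.prod(v.get_shape().as_list())
                 for v in tf.trainable_variables())
print('Trainable parameters: %d' % num_params)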

@mhaghighat
Contributor Author

mhaghighat commented Jan 28, 2017

@davidsandberg,

Following your advice, I tried to train with the nn4_small2_v1 model. But the current facenet_train_classifier.py gives an error when using this model. The error is:

ValueError: Variable conv1_7x7/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?

Can you please advise?

@davidsandberg
Owner

This specific problem was fixed when the input pipeline was refactored, so you need to update your repo. But there is still a problem

tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'phase_train' with dtype bool

when global variables are initialized. I'm not sure why this problem happens but it has to do with batch normalization. It can be fixed by changing

phase_train_placeholder = tf.placeholder(tf.bool, name='phase_train')

to

phase_train_placeholder = tf.placeholder_with_default(tf.convert_to_tensor(True, dtype=tf.bool), shape=(), name='phase_train')

And then it seems to work fine.

@lodemo

lodemo commented Mar 2, 2017

I think I have an error related to the one above, but I can't resolve it with that solution.

I froze the 20170216-091149 model with the freeze_graph.py script and used it as in compare.py, only with a different loading routine for the frozen graph (resnet is the name under which it is loaded; see the loading sketch at the end of this comment).

images_placeholder = tf.get_default_graph().get_tensor_by_name("resnet/input:0")
embeddings = tf.get_default_graph().get_tensor_by_name("resnet/embeddings:0")
phase_train_placeholder = tf.placeholder_with_default(tf.convert_to_tensor(True, dtype=tf.bool), shape=(), name='resnet/phase_train')


feed_dict = { images_placeholder: imgs, phase_train_placeholder:False }
emb = session.run(embeddings, feed_dict=feed_dict)

results in the error

InvalidArgumentError: You must feed a value for placeholder tensor 'resnet/phase_train' with dtype bool [[Node: resnet/phase_train = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Using the frozen graph with

phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("resnet/phase_train:0")

results in

FailedPreconditionError: Attempting to use uninitialized value resnet/Bottleneck/BatchNorm/moving_variance [[Node: resnet/Bottleneck/BatchNorm/moving_variance/read = Identity[T=DT_FLOAT, _class=["loc:@resnet/Bottleneck/BatchNorm/moving_variance"], _device="/job:localhost/replica:0/task:0/cpu:0"](resnet/Bottleneck/BatchNorm/moving_variance)]]

Then again, using the unfrozen graph works normally.
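For reference, the loading routine for the frozen graph mentioned above is roughly this (a sketch; the .pb file name is just a placeholder):

import tensorflow as tf

# Read the frozen GraphDef from disk (file name is a placeholder)
with tf.gfile.GFile('20170216-091149-frozen.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Import it under the 'resnet' name scope, so tensors become 'resnet/input:0', etc.
with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='resnet')

session = tf.Session(graph=graph)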

@ugtony

ugtony commented Mar 2, 2017

@lodemo
You can check #161 to see how to use a frozen model. The batchNorm error was discussed and solved there.

@lodemo

lodemo commented Mar 2, 2017

Thanks, but I am already using the latest revision, which should include the bug fix, and the loading routine is the same as discussed in #161. (The only difference is name='resnet'; I can try without it.)

For reference, older frozen models (20170117-215115) run fine.

@ugtony

ugtony commented Mar 2, 2017

@lodemo ,
Sorry for misinterpreting your question.
I think the error occurs because freeze_graph.py does not include the newly added Bottleneck layer in the whitelist.

@lodemo

lodemo commented Mar 2, 2017

@ugtony
Thank you for the suggestion. I added the Bottleneck layer to the whitelist, and that seems to have resolved my issue with the latest model. I don't know if it is entirely correct, though.

I added Bottleneck to the condition like this:

if node.name.startswith('InceptionResnetV1') or node.name.startswith('embeddings') or node.name.startswith('phase_train') or node.name.startswith('Bottleneck'):

@ugtony

ugtony commented Mar 3, 2017

Good to know that it helped.
I think @davidsandberg should know about this patch.
