
[INFRA] Reduce size of TFLEARN models for brain extraction #55

Merged: 11 commits into v2.0.1-dev from 52_tflearn_model_size on Dec 7, 2020

Conversation

@sebastientourbier (Member) commented on Dec 5, 2020

This PR investigates ways to address #52 by optimizing the model size so that pymialsrtk can be published to PyPI.

The code is contained in https://nbviewer.jupyter.org/github/Medical-Image-Analysis-Laboratory/mialsuperresolutiontoolkit/blob/52_tflearn_model_size/notebooks/optimize_tensorflow_checkpoint.ipynb.

However, since we rely on TFLearn, it seems impossible or very challenging to optimize the models the way we can optimize plain TensorFlow models for serving predictions. I do not think it is worth dedicating more time to this.

For more details, see the report in #55 (comment).

@sebastientourbier (Member Author) commented on Dec 5, 2020

Based on a number of discussions, it seems that it is indeed challenging to properly save TFLearn models in the TensorFlow SavedModel format, which would allow the graph to be frozen and optimized.

Discussions and resources

I made several attempts, so far without success (see the results in https://nbviewer.jupyter.org/github/Medical-Image-Analysis-Laboratory/mialsuperresolutiontoolkit/blob/52_tflearn_model_size/notebooks/optimize_tensorflow_checkpoint.ipynb).

One last thing left to try would be to freeze the graph directly after training, following the post in tflearn/tflearn#964 (comment) that I quote below. They also say it is really important to run `del tf.get_collection_ref(tf.GraphKeys.TRAIN_OPS)[:]` before saving the model, as sketched below.
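
For context, a minimal sketch of where that deletion would sit relative to `model.save(...)`, assuming a toy TFLearn network (layer sizes and file names here are illustrative, not the actual brain-extraction model):

```python
import tensorflow as tf  # TensorFlow 1.x, as required by TFLearn
import tflearn

# Toy network standing in for the real model
net = tflearn.input_data(shape=[None, 64, 64, 1], name="input")
net = tflearn.fully_connected(net, 2, activation="softmax")
net = tflearn.regression(net)
model = tflearn.DNN(net)
# ... model.fit(...) would run here ...

# Drop the training ops from the graph collection so that optimizer
# bookkeeping is not serialized along with the model
del tf.get_collection_ref(tf.GraphKeys.TRAIN_OPS)[:]
model.save("tflearn_checkpoint")
```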

It also seems that tf.train.Saver() is better suited for resuming training than for serving predictions. Moreover, TFLearn does not appear to have been developed with the cleaner TensorFlow builder / SavedModel approach in mind, which explains the challenges (https://stackoverflow.com/questions/33759623/tensorflow-how-to-save-restore-a-model/47235448#47235448).
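
For comparison, this is roughly what that cleaner TF1 export path looks like for a plain TensorFlow graph (a hedged sketch with a stand-in network; tensor names and shapes are made up):

```python
import tensorflow as tf  # TensorFlow 1.x

graph = tf.Graph()
with graph.as_default():
    # Stand-ins for the real network's input and output tensors
    x = tf.placeholder(tf.float32, shape=[None, 64, 64, 1], name="input_image")
    logits = tf.layers.dense(tf.layers.flatten(x), 2)
    y = tf.nn.softmax(logits, name="softmax_out")

    with tf.Session(graph=graph) as sess:
        sess.run(tf.global_variables_initializer())
        # Exports a servable SavedModel with named input/output signatures
        tf.saved_model.simple_save(
            sess, "exported_model",
            inputs={"image": x}, outputs={"mask": y},
        )
```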

This is an additional argument in favor of using only TensorFlow to create all the layers, which would give us more control for producing optimized prediction (i.e., application) models. It adds to the first limitation of TFLearn: its development has stopped and it depends on TensorFlow 1, which causes compatibility issues; we found only one combination of packages that works, and only in a Python 3.6 environment.

However, I still managed to reduce both model sizes from ~90 MB down to ~30 MB, which allows us to create packages publishable on PyPI 🥳

Testing on the first scan, the brain masks appear to be exactly the same. Let's see if it passes all tests!
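
The notebook linked above documents the actual procedure; one plausible sketch of this kind of reduction (assuming the extra size comes from optimizer slot variables stored in the checkpoint alongside the weights) is to rewrite the checkpoint keeping only the trainable variables. Paths here are illustrative:

```python
import tensorflow as tf  # TensorFlow 1.x

# Locate the latest checkpoint and restore its graph
ckpt = tf.train.get_checkpoint_state("model_dir").model_checkpoint_path
saver = tf.train.import_meta_graph(ckpt + ".meta", clear_devices=True)

with tf.Session() as sess:
    saver.restore(sess, ckpt)
    # Re-save only the model weights, dropping optimizer state variables
    slim_saver = tf.train.Saver(var_list=tf.trainable_variables())
    slim_saver.save(sess, "model_dir/slim_checkpoint")
```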

@sebastientourbier (Member Author) commented on Dec 5, 2020

From tflearn/tflearn#964 (comment):

Hi @fffupeng!
I use an additional script to do that.

```python
"""Tensorflow graph freezer.

Converts Tensorflow trained models in .pb

Code adapted from:
https://gist.github.com/morgangiraud/249505f540a5e53a48b0c1a869d370bf#file-medium-tffreeze-1-py
"""

import os, argparse
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
from tensorflow.python.framework import graph_util

def freeze_graph(model_folder, output_graph="frozen_model.pb"):
    # We retrieve our checkpoint fullpath
    try:
        checkpoint = tf.train.get_checkpoint_state(model_folder)
        input_checkpoint = checkpoint.model_checkpoint_path
        print("[INFO] input_checkpoint:", input_checkpoint)
    except AttributeError:
        # No checkpoint state found: assume model_folder is the checkpoint prefix
        input_checkpoint = model_folder
        print("[INFO] Model folder", model_folder)

    # Before exporting our graph, we need to specify our output node.
    # This is how TF decides which part of the graph it has to keep
    # and which part it can dump.
    output_node_names = "FullyConnected/Softmax"  # NOTE: Change here

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We import the meta graph and retrieve a Saver
    saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

    # We retrieve the protobuf graph definition
    graph = tf.get_default_graph()
    input_graph_def = graph.as_graph_def()

    # We start a session and restore the graph weights
    with tf.Session() as sess:
        saver.restore(sess, input_checkpoint)

        # We use a built-in TF helper to export variables to constants
        output_graph_def = graph_util.convert_variables_to_constants(
            sess,                         # The session is used to retrieve the weights
            input_graph_def,              # The graph_def is used to retrieve the nodes
            output_node_names.split(",")  # The output node names are used to select the useful nodes
        )

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))

        print("[INFO] output_graph:", output_graph)
        print("[INFO] all done")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Tensorflow graph freezer\nConverts trained models to .pb file",
                                     prefix_chars='-')
    parser.add_argument("--mfolder", type=str, help="model folder to export")
    parser.add_argument("--ograph", type=str, help="output graph name", default="frozen_model.pb")

    args = parser.parse_args()
    print(args, "\n")

    freeze_graph(args.mfolder, args.ograph)
```

However, before doing model.save(...) on TFLearn, I have to do:

```python
del tf.get_collection_ref(tf.GraphKeys.TRAIN_OPS)[:]
```

Then I call this command:

```sh
python tf_freeze.py --mfolder=<path_to_tflearn_model>
```

Note

* The <path_to_tflearn_model> must not include the ".data-00000-of-00001" suffix.

* The `output_node_names` variable may change depending on your architecture. The point is that you must reference the layer that has the softmax activation function.
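
As an aside (our addition, not part of the quoted post), a common way to locate the right output node name is to restore the meta graph and list its nodes:

```python
import tensorflow as tf  # TensorFlow 1.x

# "model.ckpt.meta" is an illustrative checkpoint path
tf.train.import_meta_graph("model.ckpt.meta", clear_devices=True)
for node in tf.get_default_graph().as_graph_def().node:
    if "softmax" in node.name.lower():
        print(node.name)  # e.g. "FullyConnected/Softmax"
```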

@sebastientourbier sebastientourbier linked an issue Dec 5, 2020 that may be closed by this pull request
@sebastientourbier (Member Author) commented

> However, I still managed to reduce both model sizes from ~90 MB down to ~30 MB, which allows us to create packages publishable on PyPI 🥳
>
> Testing on the first scan, the brain masks appear to be exactly the same. Let's see if it passes all tests!

It seems that even though the checkpoints are correctly used on my local installation on macOS (where I regenerated the new checkpoints), TensorFlow is not happy and raises a DataLoss error.

I do not think it is worth dedicating more time here, so I reverted the changes to use the original checkpoints. Merging this PR will still add the notebook with a full report of the study, for the sake of transparency; it could help future investigations if someone wishes to pursue this.

@sebastientourbier (Member Author) left a comment

Merging this PR adds the notebook that investigates how to reduce the size of the TFLEARN models.
It also adds names for the inputs of the graph in _extractBrain().
As it does not include any major changes and passes all tests, I will merge it.
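
For illustration only (the actual change lives in _extractBrain(), whose code is not shown here; the shape and name below are made up), naming a graph input makes it retrievable by name at prediction time:

```python
import tensorflow as tf  # TensorFlow 1.x

x = tf.placeholder(tf.float32, shape=[None, 128, 128, 1], name="input_image")
# ...later, e.g. after restoring the graph, the tensor can be fetched by name:
x_restored = tf.get_default_graph().get_tensor_by_name("input_image:0")
```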

@sebastientourbier sebastientourbier merged commit 67d0fb7 into v2.0.1-dev Dec 7, 2020
@sebastientourbier sebastientourbier deleted the 52_tflearn_model_size branch December 8, 2020 20:45

Successfully merging this pull request may close the following issue: Optimizing tensorflow model size (#52).