
[INFRA] Reduce size of TFLEARN models for brain extraction #55

Merged: 11 commits into v2.0.1-dev from 52_tflearn_model_size on Dec 7, 2020

Conversation

@sebastientourbier (Member) commented on Dec 5, 2020

This PR investigates ways to address #52 by optimizing the model size so that pymialsrtk can be published to PyPI.

The code is contained in https://nbviewer.jupyter.org/github/Medical-Image-Analysis-Laboratory/mialsuperresolutiontoolkit/blob/52_tflearn_model_size/notebooks/optimize_tensorflow_checkpoint.ipynb.

However, since we rely on TFLearn, it seems impossible or very challenging to optimize the models the way we can optimize plain TensorFlow models for serving predictions. I do not think it is worth dedicating more time to this.

For more details, see the report in #55 (comment).

@sebastientourbier (Member Author) commented on Dec 5, 2020

Based on a number of discussions, it seems that it is indeed challenging to properly save TFLearn models in the TensorFlow SavedModel format, which would allow the graph to be frozen and optimized.

Discussions and resources

I made several attempts, so far without success (see the results in https://nbviewer.jupyter.org/github/Medical-Image-Analysis-Laboratory/mialsuperresolutiontoolkit/blob/52_tflearn_model_size/notebooks/optimize_tensorflow_checkpoint.ipynb).

One last thing left to try would be to freeze the graph directly after training, following the post in tflearn/tflearn#964 (comment) that I quote below. They also say it is really important to run `del tf.get_collection_ref(tf.GraphKeys.TRAIN_OPS)[:]` before saving the model, as sketched below.
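
For context, a minimal sketch of where that deletion would sit relative to `model.save(...)`, assuming a toy TFLearn network (layer sizes and file names here are illustrative, not the actual brain-extraction model):

```python
import tensorflow as tf  # TensorFlow 1.x, as required by TFLearn
import tflearn

# Toy network standing in for the real model
net = tflearn.input_data(shape=[None, 64, 64, 1], name="input")
net = tflearn.fully_connected(net, 2, activation="softmax")
net = tflearn.regression(net)
model = tflearn.DNN(net)
# ... model.fit(...) would run here ...

# Drop the training ops from the graph collection so that optimizer
# bookkeeping is not serialized along with the model
del tf.get_collection_ref(tf.GraphKeys.TRAIN_OPS)[:]
model.save("tflearn_checkpoint")
```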

It also seems that tf.train.Saver() is better suited for resuming training than for serving predictions. Moreover, TFLearn does not appear to have been developed with the cleaner TensorFlow builder / SavedModel approach in mind, which explains the challenges (https://stackoverflow.com/questions/33759623/tensorflow-how-to-save-restore-a-model/47235448#47235448).
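
For comparison, this is roughly what that cleaner TF1 export path looks like for a plain TensorFlow graph (a hedged sketch with a stand-in network; tensor names and shapes are made up):

```python
import tensorflow as tf  # TensorFlow 1.x

graph = tf.Graph()
with graph.as_default():
    # Stand-ins for the real network's input and output tensors
    x = tf.placeholder(tf.float32, shape=[None, 64, 64, 1], name="input_image")
    logits = tf.layers.dense(tf.layers.flatten(x), 2)
    y = tf.nn.softmax(logits, name="softmax_out")

    with tf.Session(graph=graph) as sess:
        sess.run(tf.global_variables_initializer())
        # Exports a servable SavedModel with named input/output signatures
        tf.saved_model.simple_save(
            sess, "exported_model",
            inputs={"image": x}, outputs={"mask": y},
        )
```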

This is an additional argument in favor of using only TensorFlow to create all the layers, which would give us more control for producing optimized prediction (i.e., application) models. It adds to the first limitation of TFLearn: its development has stopped and it depends on TensorFlow 1, which causes compatibility issues; we found only one combination of packages that works, and only in a Python 3.6 environment.

However, I still managed to reduce both model sizes from ~90 MB down to ~30 MB, which allows us to create packages publishable on PyPI 🥳

Testing on the first scan, the brain masks appear to be exactly the same. Let's see if it passes all tests!
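
The notebook linked above documents the actual procedure; one plausible sketch of this kind of reduction (assuming the extra size comes from optimizer slot variables stored in the checkpoint alongside the weights) is to rewrite the checkpoint keeping only the trainable variables. Paths here are illustrative:

```python
import tensorflow as tf  # TensorFlow 1.x

# Locate the latest checkpoint and restore its graph
ckpt = tf.train.get_checkpoint_state("model_dir").model_checkpoint_path
saver = tf.train.import_meta_graph(ckpt + ".meta", clear_devices=True)

with tf.Session() as sess:
    saver.restore(sess, ckpt)
    # Re-save only the model weights, dropping optimizer state variables
    slim_saver = tf.train.Saver(var_list=tf.trainable_variables())
    slim_saver.save(sess, "model_dir/slim_checkpoint")
```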

@sebastientourbier (Member Author) commented on Dec 5, 2020

From tflearn/tflearn#964 (comment):

Hi @fffupeng!
I use an additional script to do that.

```python
"""Tensorflow graph freezer.

Converts Tensorflow trained models in .pb

Code adapted from:
https://gist.github.com/morgangiraud/249505f540a5e53a48b0c1a869d370bf#file-medium-tffreeze-1-py
"""

import os, argparse
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
from tensorflow.python.framework import graph_util

def freeze_graph(model_folder, output_graph="frozen_model.pb"):
    # We retrieve our checkpoint fullpath
    try:
        checkpoint = tf.train.get_checkpoint_state(model_folder)
        input_checkpoint = checkpoint.model_checkpoint_path
        print("[INFO] input_checkpoint:", input_checkpoint)
    except AttributeError:
        # No checkpoint state found: assume model_folder is the checkpoint prefix
        input_checkpoint = model_folder
        print("[INFO] Model folder", model_folder)

    # Before exporting our graph, we need to specify our output node.
    # This is how TF decides which part of the graph it has to keep
    # and which part it can dump.
    output_node_names = "FullyConnected/Softmax"  # NOTE: Change here

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We import the meta graph and retrieve a Saver
    saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

    # We retrieve the protobuf graph definition
    graph = tf.get_default_graph()
    input_graph_def = graph.as_graph_def()

    # We start a session and restore the graph weights
    with tf.Session() as sess:
        saver.restore(sess, input_checkpoint)

        # We use a built-in TF helper to export variables to constants
        output_graph_def = graph_util.convert_variables_to_constants(
            sess,                         # The session is used to retrieve the weights
            input_graph_def,              # The graph_def is used to retrieve the nodes
            output_node_names.split(",")  # The output node names are used to select the useful nodes
        )

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))

        print("[INFO] output_graph:", output_graph)
        print("[INFO] all done")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Tensorflow graph freezer\nConverts trained models to .pb file",
                                     prefix_chars='-')
    parser.add_argument("--mfolder", type=str, help="model folder to export")
    parser.add_argument("--ograph", type=str, help="output graph name", default="frozen_model.pb")

    args = parser.parse_args()
    print(args, "\n")

    freeze_graph(args.mfolder, args.ograph)
```

However, before doing model.save(...) on TFLearn, I have to do:

```python
del tf.get_collection_ref(tf.GraphKeys.TRAIN_OPS)[:]
```

Then I call this command:

```sh
python tf_freeze.py --mfolder=<path_to_tflearn_model>
```

Note

* The <path_to_tflearn_model> must not include the ".data-00000-of-00001" suffix.

* The `output_node_names` variable may change depending on your architecture. The point is that you must reference the layer that has the softmax activation function.
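
As an aside (our addition, not part of the quoted post), a common way to locate the right output node name is to restore the meta graph and list its nodes:

```python
import tensorflow as tf  # TensorFlow 1.x

# "model.ckpt.meta" is an illustrative checkpoint path
tf.train.import_meta_graph("model.ckpt.meta", clear_devices=True)
for node in tf.get_default_graph().as_graph_def().node:
    if "softmax" in node.name.lower():
        print(node.name)  # e.g. "FullyConnected/Softmax"
```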

@sebastientourbier sebastientourbier linked an issue Dec 5, 2020 that may be closed by this pull request
@sebastientourbier (Member Author) commented

> However, I still managed to reduce both model sizes from ~90 MB down to ~30 MB, which allows us to create packages publishable on PyPI 🥳
>
> Testing on the first scan, the brain masks appear to be exactly the same. Let's see if it passes all tests!

It seems that even though the checkpoints are correctly used on my local installation on macOS (where I regenerated the new checkpoints), TensorFlow is not happy and raises a DataLoss error.

I do not think it is worth dedicating more time here, so I reverted the changes to use the original checkpoints. Merging this PR will still add the notebook with a full report of the study, for the sake of transparency; it could help future investigations if someone wishes to pursue this.

@sebastientourbier (Member Author) left a comment

Merging this PR adds the notebook that investigates how to reduce the size of the TFLEARN models.
It also adds names for the inputs of the graph in _extractBrain().
As it does not include any major changes and passes all tests, I will merge it.
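
For illustration only (the actual change lives in _extractBrain(), whose code is not shown here; the shape and name below are made up), naming a graph input makes it retrievable by name at prediction time:

```python
import tensorflow as tf  # TensorFlow 1.x

x = tf.placeholder(tf.float32, shape=[None, 128, 128, 1], name="input_image")
# ...later, e.g. after restoring the graph, the tensor can be fetched by name:
x_restored = tf.get_default_graph().get_tensor_by_name("input_image:0")
```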

@sebastientourbier sebastientourbier merged commit 67d0fb7 into v2.0.1-dev Dec 7, 2020
@sebastientourbier sebastientourbier deleted the 52_tflearn_model_size branch December 8, 2020 20:45

Successfully merging this pull request may close the following issue: Optimizing tensorflow model size (#52).