Slow inference #4

rubysolo · 2017-09-13T22:49:10Z

First, thanks so much for working on this library! I'm using it for serving a couple of models with good success.

I ran into something that surprised me today; it's not necessarily an issue with this library, but wanted to get your thoughts. I'm comparing execution of an SSD detector via a python script and a go executable. Just comparing the elapsed wall clock time with unix time command, the go executable is quite a bit faster -- around 4 seconds vs around 13 seconds for the python script. However, I wanted finer-grained timing data, so I measured just the model.Exec call in go vs the model.predict call in python, and I found that the go version is roughly twice as slow (3 sec vs 1.7 sec). My guess is that for a single run, parsing all the python code accounts for all of the extra time.

Both of these are using the same model, although the python version is defining the model in code and loading weights from an hdf5 file, while the go version is loading model + weights from the SavedModel file (.pb format) -- not sure if that would make any difference.

Do you have any ideas about why the graph execution would be slower under go, or how I could speed it up?

Thanks!

The text was updated successfully, but these errors were encountered:

galeone · 2017-09-14T07:23:17Z

Hi! I'm happy that someone else is using tfgo, I really appreciate it (also, if you have any feedback or you wanna contribute you're welcome)!

Could you please show your Python and Go code? In this way, we can compare both and maybe together find out from where the slow down comes from.

The first thing that comes to my mind is that the slowdown could come from the session instantiation, but it's only a feeling. Post your codes and let's dig through them

rubysolo · 2017-09-14T19:23:05Z

Awesome, thanks for digging into this!

Here's the python version:

from models import SSD300
from models.utils import BBoxUtility
import models.weights

import cv2
import time
import tensorflow as tf
import numpy as np
from keras.applications.imagenet_utils import preprocess_input

weights, classes = models.weights.ssd300()
bbox_util = BBoxUtility(len(classes))

model = SSD300((300, 300, 3), num_classes=len(classes))
model.load_weights(weights, by_name=True)
model._make_predict_function()
graph = tf.get_default_graph()

im = cv2.imread("image.jpg")
if len(im.shape) == 2:
    im = cv2.cvtColor(im, cv2.COLOR_GRAY2BGR)

start = time.time()
I = cv2.resize(im, (300, 300))
after_resize = time.time()
with graph.as_default():
    after_default_graph = time.time()
    inputs = preprocess_input(np.array([I.astype('float')]))
    after_preprocess = time.time()
    preds = model.predict(inputs, batch_size=1, verbose=0)
    after_pred = time.time()
    results = bbox_util.detection_out(preds)
    after_decode = time.time()
    print("resize: %f sec" % (after_resize - start))
    print("set graph: %f sec" % (after_default_graph - after_resize))
    print("preprocess: %f sec" % (after_preprocess - after_default_graph))
    print("inference: %f sec" % (after_pred - after_preprocess))
    print("box decode: %f sec" % (after_decode - after_pred))

The go version is in two parts -- I build a Detector which loads the model, and then I have a method on that object to do inference:

// build detector
func NewDetector(classes []string) *Detector {
	model := tg.LoadModel("saved_model_ssd300", []string{"serve"}, nil)
	d := &Detector{model: model, classes: classes}
	return d
}

// process an image
func (detector *Detector) Detect(img image.Image, verbose bool) Predictions {
	tick := time.Now()

	bounds := img.Bounds().Size()
	resized := resizeImage(img)

	if verbose {
		log.Printf("resize: %s", time.Since(tick))
		tick = time.Now()
	}

	input, _ := tf.NewTensor(imageToArray(resized))

	if verbose {
		log.Printf("preprocess: %s", time.Since(tick))
		tick = time.Now()
	}

	results := detector.model.Exec([]tf.Output{
		detector.model.Op("predictions/concat", 0),
	}, map[tf.Output]*tf.Tensor{
		detector.model.Op("input_1", 0): input,
	})

	if verbose {
		log.Printf("inference: %s", time.Since(tick))
		tick = time.Now()
	}

	raw := results[0].Value().([][][]float32)

	preds := map[string]PredList{}
	for idx, cls := range detector.classes {
		if idx > 0 { // skip background
			preds[cls] = PredList{}
		}
	}

	for _, pred := range decodeBoxes(raw[0], scale(bounds)) {
		cls := detector.classes[pred.Label]
		preds[cls] = append(preds[cls], pred)
	}

	for idx, cls := range detector.classes {
		if idx > 0 { // skip background
			sort.Sort(preds[cls])
		}
	}

	if verbose {
		log.Printf("box decode: %s", time.Since(tick))
		tick = time.Now()
	}

	return preds
}

There are several timing differences that I understand (image preprocessing is much faster in Python since it is vectorized; my go box decoder "cheats" by using heuristics instead of true NMS to collapse box predictions; etc), but I am ignoring all of those. If I just look at inference time, the python version averages 1.7 sec over 10 consecutive runs, while the go version averages 2.5 sec. (running on my CPU) Since that part of the code is executed by tensorflow in both cases, I would expect the timing to be similar. Hopefully I'm just doing something wrong!

galeone · 2017-09-14T20:02:23Z

The inference time should be similar, you're right.

The only thing that I see is that you're measuring the inference time actually before executing the real inference.

You're invoking *Model.Op to extract tensors by name, here:

	results := detector.model.Exec([]tf.Output{
		detector.model.Op("predictions/concat", 0),
	}, map[tf.Output]*tf.Tensor{
		detector.model.Op("input_1", 0): input,
	})

Try moving out the 2 .Op invocations and then measure the performance, just like that:

	input, _ := tf.NewTensor(imageToArray(resized))
        modelOutput := detector.model.Op("predictions/concat", 0)
        modelInput := detector.model.Op("input_1", 0)

	if verbose {
		log.Printf("preprocess: %s", time.Since(tick))
		tick = time.Now()
	}

	results := detector.model.Exec([]tf.Output{
		modelOutput,
	}, map[tf.Output]*tf.Tensor{
		modelInput: input,
	})

	if verbose {
		log.Printf("inference: %s", time.Since(tick))
		tick = time.Now()
	}

Let me know if this will make the inference time similar

rubysolo · 2017-09-14T20:22:54Z

I changed it to:

	modelOutput := detector.model.Op("predictions/concat", 0)
	modelInput := detector.model.Op("input_1", 0)

	if verbose {
		log.Printf("load ops: %s", time.Since(tick))
		tick = time.Now()
	}

	results := detector.model.Exec([]tf.Output{
		modelOutput,
	}, map[tf.Output]*tf.Tensor{
		modelInput: input,
	})

	if verbose {
		log.Printf("inference: %s", time.Since(tick))
		tick = time.Now()
	}

The inference time looks similar, and over 10 runs, my "load ops" times were all between 20 and 35 µs.

galeone · 2017-09-14T20:25:58Z

Nice! We do have already found out the problem? :D

rubysolo · 2017-09-14T20:32:37Z

Sorry, that was poor wording. The inference time did not change much from what it was before -- between 2.4 and 2.9 sec over 10 runs.

galeone · 2017-09-14T20:43:19Z

Ah! I guessed that the go inference time looked similar to the python inference time. My bad.

We have to dig more into the code. Can you please share the code of the predict method used in preds = model.predict(inputs, batch_size=1, verbose=0) ?

We have to spot the differences between that code and the code of model.Exec

rubysolo · 2017-09-14T20:49:05Z

Sure, that's just the standard keras predict: https://github.com/fchollet/keras/blob/2.0.5/keras/engine/training.py#L1553

galeone · 2017-09-15T07:07:18Z

Uhm, I'm trying to figure out what the problem is but I can't see it.
The python code is just a standard python code, nothing special, the Go code is just this model.saved.Session.Run(feedDict, tensors, nil) that only uses operations from the github.com/tensorflow/tensorflow/tensorflow/go package.

Maybe we should open an issue on https://github.com/tensorflow/tensorflow/issues referencing this one? I guess the inference time difference is due to something they do and not related to tfgo.

rubysolo · 2017-09-15T16:48:45Z

Ok, no worries -- thank you for being another set of eyes! I'll poke around a bit more and open an issue there if I can't find anything.

galeone · 2017-09-15T16:51:47Z

Keep me up to date. Thank you!

rubysolo closed this as completed Sep 15, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slow inference #4

Slow inference #4

rubysolo commented Sep 13, 2017

galeone commented Sep 14, 2017

rubysolo commented Sep 14, 2017

galeone commented Sep 14, 2017 •

edited

Loading

rubysolo commented Sep 14, 2017

galeone commented Sep 14, 2017

rubysolo commented Sep 14, 2017

galeone commented Sep 14, 2017

rubysolo commented Sep 14, 2017

galeone commented Sep 15, 2017

rubysolo commented Sep 15, 2017

galeone commented Sep 15, 2017

Slow inference #4

Slow inference #4

Comments

rubysolo commented Sep 13, 2017

galeone commented Sep 14, 2017

rubysolo commented Sep 14, 2017

galeone commented Sep 14, 2017 • edited Loading

rubysolo commented Sep 14, 2017

galeone commented Sep 14, 2017

rubysolo commented Sep 14, 2017

galeone commented Sep 14, 2017

rubysolo commented Sep 14, 2017

galeone commented Sep 15, 2017

rubysolo commented Sep 15, 2017

galeone commented Sep 15, 2017

galeone commented Sep 14, 2017 •

edited

Loading