
Different Result after converting to onnx from TF! #1494

Closed

dedoogong opened this issue May 3, 2021 · 9 comments

dedoogong commented May 3, 2021

Describe the bug
The original TF-ASLFeat running result is below:
[image: disp_orig]
The converted ONNX-ASLFeat running result is below:
[image: disp]

Urgency

Urgent!

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Tensorflow Version: 1.15.2 ~ 1.15.4
  • Python version: 3.6

To Reproduce

  1. clone the TF model repo
    git clone https://github.com/lzx551402/aslfeat.git &&
    cd ASLFeat/pretrained &&
    wget https://research.altizure.com/data/aslfeat_models/aslfeatv2.tar; tar -xvf aslfeatv2.tar

  2. add the following lines at the end of the __init__ function in models/base_model.py to get the frozen graph

        output_layer = ['kpts', 'scores', 'descs']
        frozen = tf.graph_util.convert_variables_to_constants(self.sess, self.sess.graph_def, output_layer)
        graph_io.write_graph(frozen, './', 'aslfeatv2.pb', as_text=False)

The code should then look like this:

def __init__(self, model_path, **config):
    self.model_path = model_path
    # Update config
    self.config = dict_update(getattr(self, 'default_config', {}), config)
    self._init_model()
    ext = os.path.splitext(model_path)[1]

    sess_config = tf.compat.v1.ConfigProto()
    sess_config.gpu_options.allow_growth = True

    if ext.find('.pb') == 0:
        graph = load_frozen_model(self.model_path, print_nodes=False)
        self.sess = tf.compat.v1.Session(graph=graph, config=sess_config)

    elif ext.find('.ckpt') == 0:
        self._construct_network()
        self.sess = tf.compat.v1.Session(config=sess_config)
        recoverer(self.sess, model_path)

    # Added lines: freeze the current graph and write it out as a .pb
    output_layer = ['kpts', 'scores', 'descs']
    frozen = tf.graph_util.convert_variables_to_constants(self.sess, self.sess.graph_def, output_layer)
    graph_io.write_graph(frozen, './', 'aslfeatv2.pb', as_text=False)
  3. change a line in the _construct_network() function of feat_model.py to fix the input shape

ph_imgs = tf.placeholder(dtype=tf.float32, shape=(1, 240, 320, 1), name='input')

and insert a line of code into load_imgs() in image_matching.py, right after cv.imread(img_path):

img = cv2.resize(img, dsize=(320, 240))
  4. get the .pb file and convert it to ONNX
    $python3 image_matching.py --config configs/matching_eval.yaml
    $python3 -m tf2onnx.convert --graphdef aslfeatv2.pb --output aslfeatv2.onnx --inputs input:0 --outputs kpts:0,descs:0,scores:0 --opset 11
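Optionally, you can sanity-check the frozen graph before conversion. A minimal sketch (assuming the node names from steps 2 and 3):

import tensorflow as tf

# The frozen graph should parse and contain the fixed input plus the
# three output nodes passed to tf2onnx.
graph_def = tf.compat.v1.GraphDef()
with open("aslfeatv2.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

names = {n.name for n in graph_def.node}
print("has input  :", "input" in names)
print("has outputs:", {"kpts", "scores", "descs"} <= names)
print("node count :", len(graph_def.node))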

  5. Run the ONNX inference code below to get the visualized result.

import numpy as np
import cv2
import onnxruntime as rt
from matplotlib import pyplot as plt

class MatcherWrapper(object):
    """OpenCV matcher wrapper."""

    def __init__(self):
        self.matcher = cv2.BFMatcher(cv2.NORM_L2)

    def get_matches(self, feat1, feat2, cv_kpts1, cv_kpts2, ratio=None, cross_check=True, err_thld=4, ransac=True, info=''):
        """Compute putative and inlier matches.
        Args:
            feat: (n_kpts, 128) Local features.
            cv_kpts: A list of keypoints represented as cv2.KeyPoint.
            ratio: The threshold to apply ratio test.
            cross_check: (True by default) Whether to apply cross check.
            err_thld: Epipolar error threshold.
            info: Info to print out.
        Returns:
            good_matches: Putative matches.
            mask: The mask to distinguish inliers/outliers on putative matches.
        """

        init_matches1 = self.matcher.knnMatch(feat1, feat2, k=2)
        init_matches2 = self.matcher.knnMatch(feat2, feat1, k=2)

        good_matches = []

        for i in range(len(init_matches1)):
            cond = True
            if cross_check:
                cond1 = cross_check and init_matches2[init_matches1[i][0].trainIdx][0].trainIdx == i
                cond *= cond1
            if ratio is not None and ratio < 1:
                cond2 = init_matches1[i][0].distance <= ratio * init_matches1[i][1].distance
                cond *= cond2
            if cond:
                good_matches.append(init_matches1[i][0])

        if type(cv_kpts1) is list and type(cv_kpts2) is list:
            good_kpts1 = np.array([cv_kpts1[m.queryIdx].pt for m in good_matches])
            good_kpts2 = np.array([cv_kpts2[m.trainIdx].pt for m in good_matches])
        elif type(cv_kpts1) is np.ndarray and type(cv_kpts2) is np.ndarray:
            good_kpts1 = np.array([cv_kpts1[m.queryIdx] for m in good_matches])
            good_kpts2 = np.array([cv_kpts2[m.trainIdx] for m in good_matches])
        else:
            raise Exception("Keypoint type error!")

        if ransac:
            _, mask = cv2.findFundamentalMat(
                good_kpts1, good_kpts2, cv2.RANSAC, err_thld, 0.999)
            n_inlier = np.count_nonzero(mask)
            print(info, 'n_putative', len(good_matches), 'n_inlier', n_inlier)
        else:
            mask = np.ones((len(good_matches), ))
            print(info, 'n_putative', len(good_matches))
        return good_matches, mask

    def draw_matches(self, img1, cv_kpts1, img2, cv_kpts2, good_matches, mask,
                     match_color=(0, 255, 0), pt_color=(0, 0, 255)):
        """Draw matches."""
        if type(cv_kpts1) is np.ndarray and type(cv_kpts2) is np.ndarray:
            cv_kpts1 = [cv2.KeyPoint(cv_kpts1[i][0], cv_kpts1[i][1], 1)
                        for i in range(cv_kpts1.shape[0])]
            cv_kpts2 = [cv2.KeyPoint(cv_kpts2[i][0], cv_kpts2[i][1], 1)
                        for i in range(cv_kpts2.shape[0])]
        display = cv2.drawMatches(img1, cv_kpts1, img2, cv_kpts2, good_matches,
                                  None,
                                  matchColor=match_color,
                                  singlePointColor=pt_color,
                                  matchesMask=mask.ravel().tolist(), flags=4)
        return display 

descs, kpts = [], []
rgb_list = []
imgs = ['imgs/test1.jpg', 'imgs/test2.jpg']
sess = rt.InferenceSession("aslfeatv2.onnx")  # the model converted in step 4
for img_name in imgs:
    img = cv2.imread(img_name)
    img = cv2.resize(img, dsize=(320, 240))
    img_c = img[..., ::-1]  # BGR -> RGB for display
    rgb_list.append(img_c)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)[..., np.newaxis]
    img = gray[np.newaxis, ...]  # (1, 240, 320, 1) to match the fixed input shape
    print('img.shape : ', img.shape)
    input_name = sess.get_inputs()[0].name
    print('input_name : ', input_name)
    pred_onx = sess.run(None, {input_name: img.astype(np.float32)})
    print('kpts : ', pred_onx[0].shape, 'descs : ', pred_onx[1].shape)
    kpts.append(pred_onx[0][0])
    descs.append(pred_onx[1][0])

matcher = MatcherWrapper()
match, mask = matcher.get_matches(
    descs[0], descs[1], kpts[0], kpts[1],
    ratio=0.8, cross_check=True,
    err_thld=3, ransac=True, info='ASLFeat') 
disp = matcher.draw_matches(rgb_list[0], kpts[0], rgb_list[1], kpts[1], match, mask)

output_name = 'disp.jpg'
print('image save to', output_name)
plt.imsave(output_name, disp)
TomWildenhain-Microsoft (Contributor) commented:

Hi @dedoogong, after following your instructions I got an error, but not the one you describe. For me the model failed to convert due to the unsupported op DivNoNan. We should be able to add support for that op easily and get the model to convert. If you are getting a different error, can you upload your .pb file and tell me the exact TensorFlow version you are using? tf2onnx uses TensorFlow's optimizer, so a different version may be substituting in different ops.

dedoogong (Author) commented:

Hi! Thank you for trying to solve this issue!
OK, can you modify the code as below?
https://github.com/lzx551402/ASLFeat/blob/d6bc1b49f61b5bf32a5a4e6f4ca25da46713bbf2/models/cnn_wrapper/aslfeat.py#L212

det += 1e-10 
inv_hess_00 = tf.math.divide(djj, det)
inv_hess_01 = tf.math.divide(-dij, det)
inv_hess_11 = tf.math.divide(dii, det)
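For context, DivNoNan (tf.math.divide_no_nan) returns 0 wherever the denominator is 0, so the epsilon workaround above is close but not bit-identical. A minimal numpy sketch of the difference:

import numpy as np

def div_no_nan(x, y):
    # Same semantics as tf.math.divide_no_nan: result is 0 where y == 0.
    out = np.zeros_like(x)
    np.divide(x, y, out=out, where=(y != 0))
    return out

det = np.array([0.0, 2.0, -4.0])
djj = np.array([1.0, 1.0, 1.0])
print(div_no_nan(djj, det))   # [ 0.    0.5  -0.25]
print(djj / (det + 1e-10))    # [ 1e10  0.5  -0.25] -- differs where det == 0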

dedoogong commented May 4, 2021

I found a couple of things that may make this happen.
First, tf2onnx fuses conv+bn, which is fine in general. But in this model the conv and bn layers in 'conv1' and 'conv3' are intentionally kept separate, and tf2onnx still seems to fold them into the conv weights while also keeping a bn layer after the conv layer (maybe a bug?).
I used the onnx-graphsurgeon tool to overwrite the conv weights with the original ones, and the results look much better (almost the same result).

But there was still an accuracy issue.
First, the converted model outputs too many detected feature keypoints (almost 2x as many).
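For anyone reproducing this, the fusion behavior can be inspected directly from the converted model. A minimal sketch (node names may differ):

import onnx
from onnx import numpy_helper

model = onnx.load("aslfeatv2.onnx")

# List the ops around conv1 to see whether the bn node survived fusion.
for node in model.graph.node:
    if "conv1" in node.name:
        print(node.op_type, node.name)

# Dump weight stats so the Conv kernel can be compared against the
# values stored in the frozen .pb.
for init in model.graph.initializer:
    if "conv1" in init.name:
        arr = numpy_helper.to_array(init)
        print(init.name, arr.shape, float(arr.mean()))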

TomWildenhain-Microsoft (Contributor) commented:

Are you able to upload the frozen pb file? That would be easiest. Just upload it to OneDrive/Dropbox/Google Drive and post a share link if possible.

dedoogong (Author) commented:

OK, here are the pb file and the converted onnx file. You can see that, for example, the converted onnx still contains both 'conv1/Conv2D' and 'conv1/bn/FusedBatchNormV3', which means those nodes were not fused, so 'conv1/Conv2D' should have the same parameter values as the original node in the pb. But it doesn't, even though 'conv1/bn/FusedBatchNormV3' has the same values.

pb_onnx.zip

dedoogong (Author) commented:

And here is another onnx file, modified so that conv1/conv3's weights are overwritten with the original ones from the pb file after transposing:

onnx_conv_node.inputs[1].values = pb_conv_w.transpose([3, 2, 0, 1])

After running the inference as in step 5 above, it shows much better results, though there is still a noticeable gap in accuracy.
asl.zip
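For reference, the overwrite can be done roughly as follows with onnx-graphsurgeon. The layer prefixes and the 'weights' const-name suffix are assumptions about this particular graph, so inspect it and adjust:

import onnx
import onnx_graphsurgeon as gs  # pip install onnx-graphsurgeon
import tensorflow as tf
from tensorflow.python.framework import tensor_util

# Collect every Const tensor from the frozen graph, keyed by node name.
graph_def = tf.compat.v1.GraphDef()
with open("aslfeatv2.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
pb_consts = {n.name: tensor_util.MakeNdarray(n.attr["value"].tensor)
             for n in graph_def.node if n.op == "Const"}

gs_graph = gs.import_onnx(onnx.load("aslfeatv2.onnx"))
for node in gs_graph.nodes:
    layer = node.name.split("/")[0]
    if node.op == "Conv" and layer in ("conv1", "conv3"):
        # Assumed naming; check pb_consts for the real kernel name.
        key = next(k for k in pb_consts
                   if k.startswith(layer) and k.endswith("weights"))
        # TF stores conv kernels as HWIO, ONNX as OIHW -> transpose.
        node.inputs[1].values = pb_consts[key].transpose([3, 2, 0, 1])

onnx.save(gs.export_onnx(gs_graph), "aslfeatv2_fixed.onnx")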

TomWildenhain-Microsoft (Contributor) commented:

Hi @dedoogong, sorry it took me a while to get to this. You are right that the converter was setting the weights incorrectly due to a bug in the transpose optimizer. #1528 should fix the issue. Can you confirm that the fix works for you?

You said "there was still an accuracy issue" but also that it gets "almost the same result". Were you able to get a model that works for you? Was the accuracy issue with only some of the outputs or all of them?

dedoogong (Author) commented:

I think the accuracy problem comes from the bilinear resize op, because it takes a feature map that contains quite big values (around 100~400). I think TensorFlow's bilinear resize and ORT's are a little bit different.
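As far as I understand, TF1's resize_bilinear with the default flags corresponds to ONNX Resize with coordinate_transformation_mode='asymmetric', while 'half_pixel' is the other common convention. A minimal probe (a sketch, assuming opset 11) showing the two modes disagree on values in this range:

import numpy as np
import onnx
from onnx import helper, TensorProto
import onnxruntime as rt

def run_resize(coord_mode, x):
    # One-node opset-11 Resize model: bilinear, 2x spatial upsampling.
    roi = helper.make_tensor("roi", TensorProto.FLOAT, [0], [])
    scales = helper.make_tensor("scales", TensorProto.FLOAT, [4],
                                [1.0, 1.0, 2.0, 2.0])
    node = helper.make_node("Resize", ["X", "roi", "scales"], ["Y"],
                            mode="linear",
                            coordinate_transformation_mode=coord_mode)
    graph = helper.make_graph(
        [node], "resize_probe",
        [helper.make_tensor_value_info("X", TensorProto.FLOAT, list(x.shape))],
        [helper.make_tensor_value_info("Y", TensorProto.FLOAT, None)],
        initializer=[roi, scales])
    model = helper.make_model(graph,
                              opset_imports=[helper.make_opsetid("", 11)])
    sess = rt.InferenceSession(model.SerializeToString())
    return sess.run(None, {"X": x})[0]

# Feature-map-like values in the 100~400 range mentioned above.
x = np.linspace(100, 400, 4, dtype=np.float32).reshape(1, 1, 2, 2)
asym = run_resize("asymmetric", x)
half = run_resize("half_pixel", x)
print(np.abs(asym - half).max())  # nonzero: the conventions disagree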

TomWildenhain-Microsoft (Contributor) commented:

Just to make sure, did you find that the latest tf2onnx on GitHub solves your issue? Your workaround of resetting the weights should no longer be necessary.
