
Different Result after converting to onnx from TF! #1494

Closed

dedoogong opened this issue May 3, 2021 · 9 comments

dedoogong commented May 3, 2021

Describe the bug
The original TF-ASLFeat running result is below:
[image: disp_orig]
The converted ONNX-ASLFeat running result is below:
[image: disp]

Urgency

Urgent!

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Tensorflow Version: 1.15.2 ~ 1.15.4
  • Python version: 3.6

To Reproduce

  1. clone the TF model repo
    git clone https://github.com/lzx551402/aslfeat.git &&
    cd ASLFeat/pretrained &&
    wget https://research.altizure.com/data/aslfeat_models/aslfeatv2.tar; tar -xvf aslfeatv2.tar

  2. add the following lines at the end of the __init__ function in models/base_model.py to get the frozen graph

        output_layer = ['kpts', 'scores', 'descs']
        frozen = tf.graph_util.convert_variables_to_constants(self.sess, self.sess.graph_def, output_layer)
        graph_io.write_graph(frozen, './', 'aslfeatv2.pb', as_text=False)

The code should then look like this:

def __init__(self, model_path, **config):
    self.model_path = model_path
    # Update config
    self.config = dict_update(getattr(self, 'default_config', {}), config)
    self._init_model()
    ext = os.path.splitext(model_path)[1]

    sess_config = tf.compat.v1.ConfigProto()
    sess_config.gpu_options.allow_growth = True

    if ext.find('.pb') == 0:
        graph = load_frozen_model(self.model_path, print_nodes=False)
        self.sess = tf.compat.v1.Session(graph=graph, config=sess_config)

    elif ext.find('.ckpt') == 0:
        self._construct_network()
        self.sess = tf.compat.v1.Session(config=sess_config)
        recoverer(self.sess, model_path)

    # Added lines: freeze the current graph and write it out as a .pb
    output_layer = ['kpts', 'scores', 'descs']
    frozen = tf.graph_util.convert_variables_to_constants(self.sess, self.sess.graph_def, output_layer)
    graph_io.write_graph(frozen, './', 'aslfeatv2.pb', as_text=False)
  3. change a line in the _construct_network() function of feat_model.py to fix the input shape

ph_imgs = tf.placeholder(dtype=tf.float32, shape=(1, 240, 320, 1), name='input')

and insert a line of code into load_imgs() in image_matching.py, right after cv.imread(img_path):

img = cv2.resize(img, dsize=(320, 240))
  4. get the .pb file and convert it to ONNX
    $python3 image_matching.py --config configs/matching_eval.yaml
    $python3 -m tf2onnx.convert --graphdef aslfeatv2.pb --output aslfeatv2.onnx --inputs input:0 --outputs kpts:0,descs:0,scores:0 --opset 11
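Optionally, you can sanity-check the frozen graph before conversion. A minimal sketch (assuming the node names from steps 2 and 3):

import tensorflow as tf

# The frozen graph should parse and contain the fixed input plus the
# three output nodes passed to tf2onnx.
graph_def = tf.compat.v1.GraphDef()
with open("aslfeatv2.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

names = {n.name for n in graph_def.node}
print("has input  :", "input" in names)
print("has outputs:", {"kpts", "scores", "descs"} <= names)
print("node count :", len(graph_def.node))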

  5. Run the ONNX inference code below to get the visualized result.

import numpy as np
import cv2
import onnxruntime as rt
from matplotlib import pyplot as plt

class MatcherWrapper(object):
    """OpenCV matcher wrapper."""

    def __init__(self):
        self.matcher = cv2.BFMatcher(cv2.NORM_L2)

    def get_matches(self, feat1, feat2, cv_kpts1, cv_kpts2, ratio=None, cross_check=True, err_thld=4, ransac=True, info=''):
        """Compute putative and inlier matches.
        Args:
            feat: (n_kpts, 128) Local features.
            cv_kpts: A list of keypoints represented as cv2.KeyPoint.
            ratio: The threshold to apply ratio test.
            cross_check: (True by default) Whether to apply cross check.
            err_thld: Epipolar error threshold.
            info: Info to print out.
        Returns:
            good_matches: Putative matches.
            mask: The mask to distinguish inliers/outliers on putative matches.
        """

        init_matches1 = self.matcher.knnMatch(feat1, feat2, k=2)
        init_matches2 = self.matcher.knnMatch(feat2, feat1, k=2)

        good_matches = []

        for i in range(len(init_matches1)):
            cond = True
            if cross_check:
                cond1 = cross_check and init_matches2[init_matches1[i][0].trainIdx][0].trainIdx == i
                cond *= cond1
            if ratio is not None and ratio < 1:
                cond2 = init_matches1[i][0].distance <= ratio * init_matches1[i][1].distance
                cond *= cond2
            if cond:
                good_matches.append(init_matches1[i][0])

        if type(cv_kpts1) is list and type(cv_kpts2) is list:
            good_kpts1 = np.array([cv_kpts1[m.queryIdx].pt for m in good_matches])
            good_kpts2 = np.array([cv_kpts2[m.trainIdx].pt for m in good_matches])
        elif type(cv_kpts1) is np.ndarray and type(cv_kpts2) is np.ndarray:
            good_kpts1 = np.array([cv_kpts1[m.queryIdx] for m in good_matches])
            good_kpts2 = np.array([cv_kpts2[m.trainIdx] for m in good_matches])
        else:
            raise Exception("Keypoint type error!")

        if ransac:
            _, mask = cv2.findFundamentalMat(
                good_kpts1, good_kpts2, cv2.RANSAC, err_thld, 0.999)
            n_inlier = np.count_nonzero(mask)
            print(info, 'n_putative', len(good_matches), 'n_inlier', n_inlier)
        else:
            mask = np.ones((len(good_matches), ))
            print(info, 'n_putative', len(good_matches))
        return good_matches, mask

    def draw_matches(self, img1, cv_kpts1, img2, cv_kpts2, good_matches, mask,
                     match_color=(0, 255, 0), pt_color=(0, 0, 255)):
        """Draw matches."""
        if type(cv_kpts1) is np.ndarray and type(cv_kpts2) is np.ndarray:
            cv_kpts1 = [cv2.KeyPoint(cv_kpts1[i][0], cv_kpts1[i][1], 1)
                        for i in range(cv_kpts1.shape[0])]
            cv_kpts2 = [cv2.KeyPoint(cv_kpts2[i][0], cv_kpts2[i][1], 1)
                        for i in range(cv_kpts2.shape[0])]
        display = cv2.drawMatches(img1, cv_kpts1, img2, cv_kpts2, good_matches,
                                  None,
                                  matchColor=match_color,
                                  singlePointColor=pt_color,
                                  matchesMask=mask.ravel().tolist(), flags=4)
        return display 

descs, kpts = [], []
rgb_list = []
imgs = ['imgs/test1.jpg', 'imgs/test2.jpg']
sess = rt.InferenceSession("aslfeatv2.onnx")  # the model converted in step 4
for img_name in imgs:
    img = cv2.imread(img_name)
    img = cv2.resize(img, dsize=(320, 240))
    img_c = img[..., ::-1]  # BGR -> RGB for display
    rgb_list.append(img_c)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)[..., np.newaxis]
    img = gray[np.newaxis, ...]  # (1, 240, 320, 1) to match the fixed input shape
    print('img.shape : ', img.shape)
    input_name = sess.get_inputs()[0].name
    print('input_name : ', input_name)
    pred_onx = sess.run(None, {input_name: img.astype(np.float32)})
    print('kpts : ', pred_onx[0].shape, 'descs : ', pred_onx[1].shape)
    kpts.append(pred_onx[0][0])
    descs.append(pred_onx[1][0])

matcher = MatcherWrapper()
match, mask = matcher.get_matches(
    descs[0], descs[1], kpts[0], kpts[1],
    ratio=0.8, cross_check=True,
    err_thld=3, ransac=True, info='ASLFeat') 
disp = matcher.draw_matches(rgb_list[0], kpts[0], rgb_list[1], kpts[1], match, mask)

output_name = 'disp.jpg'
print('image save to', output_name)
plt.imsave(output_name, disp)
TomWildenhain-Microsoft (Contributor) commented:

Hi @dedoogong, after following your instructions I got an error, but not the one you describe. For me the model failed to convert due to the unsupported op DivNoNan. We should be able to add support for that op easily and get the model to convert. If you are getting a different error, can you upload your .pb file and tell me the exact TensorFlow version you are using? tf2onnx uses TensorFlow's optimizer, so a different version may be substituting in different ops.

dedoogong (Author) commented:

Hi! Thank you for trying to solve this issue!
OK, can you modify the code as below?
https://github.com/lzx551402/ASLFeat/blob/d6bc1b49f61b5bf32a5a4e6f4ca25da46713bbf2/models/cnn_wrapper/aslfeat.py#L212

det += 1e-10 
inv_hess_00 = tf.math.divide(djj, det)
inv_hess_01 = tf.math.divide(-dij, det)
inv_hess_11 = tf.math.divide(dii, det)
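For context, DivNoNan (tf.math.divide_no_nan) returns 0 wherever the denominator is 0, so the epsilon workaround above is close but not bit-identical. A minimal numpy sketch of the difference:

import numpy as np

def div_no_nan(x, y):
    # Same semantics as tf.math.divide_no_nan: result is 0 where y == 0.
    out = np.zeros_like(x)
    np.divide(x, y, out=out, where=(y != 0))
    return out

det = np.array([0.0, 2.0, -4.0])
djj = np.array([1.0, 1.0, 1.0])
print(div_no_nan(djj, det))   # [ 0.    0.5  -0.25]
print(djj / (det + 1e-10))    # [ 1e10  0.5  -0.25] -- differs where det == 0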

dedoogong commented May 4, 2021

I found a couple of things that may make this happen.
First, tf2onnx fuses conv+bn, which is fine in general. But in this model the conv and bn layers in 'conv1' and 'conv3' are intentionally kept separate, and tf2onnx still seems to fold them into the conv weights while also keeping a bn layer after the conv layer (maybe a bug?).
I used the onnx-graphsurgeon tool to overwrite the conv weights with the original ones, and the results look much better (almost the same result).

But there was still an accuracy issue.
First, the converted model outputs too many detected feature keypoints (almost 2x as many).
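For anyone reproducing this, the fusion behavior can be inspected directly from the converted model. A minimal sketch (node names may differ):

import onnx
from onnx import numpy_helper

model = onnx.load("aslfeatv2.onnx")

# List the ops around conv1 to see whether the bn node survived fusion.
for node in model.graph.node:
    if "conv1" in node.name:
        print(node.op_type, node.name)

# Dump weight stats so the Conv kernel can be compared against the
# values stored in the frozen .pb.
for init in model.graph.initializer:
    if "conv1" in init.name:
        arr = numpy_helper.to_array(init)
        print(init.name, arr.shape, float(arr.mean()))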

TomWildenhain-Microsoft (Contributor) commented:

Are you able to upload the frozen pb file? That would be easiest. Just upload it to OneDrive/Dropbox/Google Drive and post a share link if possible.

dedoogong (Author) commented:

OK, here are the pb file and the converted onnx file. You can see that, for example, the converted onnx still contains both 'conv1/Conv2D' and 'conv1/bn/FusedBatchNormV3', which means those nodes were not fused, so 'conv1/Conv2D' should have the same parameter values as the original node in the pb. But it doesn't, even though 'conv1/bn/FusedBatchNormV3' has the same values.

pb_onnx.zip

dedoogong (Author) commented:

And here is another onnx file, modified so that conv1/conv3's weights are overwritten with the original ones from the pb file after transposing:

onnx_conv_node.inputs[1].values = pb_conv_w.transpose([3, 2, 0, 1])

After running the inference as in step 5 above, it shows much better results, though there is still a noticeable gap in accuracy.
asl.zip
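For reference, the overwrite can be done roughly as follows with onnx-graphsurgeon. The layer prefixes and the 'weights' const-name suffix are assumptions about this particular graph, so inspect it and adjust:

import onnx
import onnx_graphsurgeon as gs  # pip install onnx-graphsurgeon
import tensorflow as tf
from tensorflow.python.framework import tensor_util

# Collect every Const tensor from the frozen graph, keyed by node name.
graph_def = tf.compat.v1.GraphDef()
with open("aslfeatv2.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
pb_consts = {n.name: tensor_util.MakeNdarray(n.attr["value"].tensor)
             for n in graph_def.node if n.op == "Const"}

gs_graph = gs.import_onnx(onnx.load("aslfeatv2.onnx"))
for node in gs_graph.nodes:
    layer = node.name.split("/")[0]
    if node.op == "Conv" and layer in ("conv1", "conv3"):
        # Assumed naming; check pb_consts for the real kernel name.
        key = next(k for k in pb_consts
                   if k.startswith(layer) and k.endswith("weights"))
        # TF stores conv kernels as HWIO, ONNX as OIHW -> transpose.
        node.inputs[1].values = pb_consts[key].transpose([3, 2, 0, 1])

onnx.save(gs.export_onnx(gs_graph), "aslfeatv2_fixed.onnx")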

TomWildenhain-Microsoft (Contributor) commented:

Hi @dedoogong, sorry it took me a while to get to this. You are right that the converter was setting the weights incorrectly due to a bug in the transpose optimizer. #1528 should fix the issue. Can you confirm that the fix works for you?

You said "there was still an accuracy issue" but also that it gets "almost the same result". Were you able to get a model that works for you? Was the accuracy issue with only some of the outputs or all of them?

dedoogong (Author) commented:

I think the accuracy problem comes from the bilinear resize op, because it takes a feature map that contains quite big values (around 100~400). I think TensorFlow's bilinear resize and ORT's are a little bit different.
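As far as I understand, TF1's resize_bilinear with the default flags corresponds to ONNX Resize with coordinate_transformation_mode='asymmetric', while 'half_pixel' is the other common convention. A minimal probe (a sketch, assuming opset 11) showing the two modes disagree on values in this range:

import numpy as np
import onnx
from onnx import helper, TensorProto
import onnxruntime as rt

def run_resize(coord_mode, x):
    # One-node opset-11 Resize model: bilinear, 2x spatial upsampling.
    roi = helper.make_tensor("roi", TensorProto.FLOAT, [0], [])
    scales = helper.make_tensor("scales", TensorProto.FLOAT, [4],
                                [1.0, 1.0, 2.0, 2.0])
    node = helper.make_node("Resize", ["X", "roi", "scales"], ["Y"],
                            mode="linear",
                            coordinate_transformation_mode=coord_mode)
    graph = helper.make_graph(
        [node], "resize_probe",
        [helper.make_tensor_value_info("X", TensorProto.FLOAT, list(x.shape))],
        [helper.make_tensor_value_info("Y", TensorProto.FLOAT, None)],
        initializer=[roi, scales])
    model = helper.make_model(graph,
                              opset_imports=[helper.make_opsetid("", 11)])
    sess = rt.InferenceSession(model.SerializeToString())
    return sess.run(None, {"X": x})[0]

# Feature-map-like values in the 100~400 range mentioned above.
x = np.linspace(100, 400, 4, dtype=np.float32).reshape(1, 1, 2, 2)
asym = run_resize("asymmetric", x)
half = run_resize("half_pixel", x)
print(np.abs(asym - half).max())  # nonzero: the conventions disagree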

TomWildenhain-Microsoft (Contributor) commented:

Just to make sure, did you find that the latest tf2onnx on GitHub solves your issue? Your workaround of resetting the weights should no longer be necessary.
