SSD MobileNet train fails with Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>) #5715

Closed
anonym24 opened this issue Nov 7, 2018 · 7 comments

Comments


anonym24 commented Nov 7, 2018

Retraining with the faster_rcnn_inception_v2_coco_2018_01_28 model and faster_rcnn_inception_v2_pets.config works fine,

but

python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config

with the ssd_mobilenet_v1_coco_2018_01_28 pretrained model

fails with the following error:

ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/GatherV2:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
  File "train.py", line 184, in <module>
    tf.app.run()
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 306, in new_func
    return func(*args, **kwargs)
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "/media/user/DATA/tensorflow1/models/research/object_detection/legacy/trainer.py", line 415, in train
    saver=saver)
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 791, in train
    should_retry = True
  File "/home/user/.local/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 189, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
==================================

Full log:
full_logs.zip

ssd_mobilenet_v1_coco.config

# SSD with Mobilenet v1 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 30
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 24
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/media/user/DATA/tensorflow1/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/media/user/DATA/tensorflow1/models/research/object_detection/train.record"
  }
  label_map_path: "/media/user/DATA/tensorflow1/models/research/object_detection/training/labelmap.pbtxt"
}

eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/media/user/DATA/tensorflow1/models/research/object_detection/test.record"
  }
  label_map_path: "/media/user/DATA/tensorflow1/models/research/object_detection/training/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}
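
The note in train_config about the learning rate never decaying can be checked with a little arithmetic. This is a minimal sketch, assuming the schedule is a staircase exponential decay (the rate is multiplied by decay_factor once every decay_steps steps). The function name is made up for illustration and is not part of the Object Detection API:

```python
def staircase_exponential_decay(step, initial_learning_rate=0.004,
                                decay_steps=800720, decay_factor=0.95):
    """Learning rate at `step`, assuming staircase decay (an assumption)."""
    # Integer division: the exponent stays 0 until `decay_steps` is reached.
    return initial_learning_rate * decay_factor ** (step // decay_steps)


# num_steps is 200000, well below decay_steps (800720),
# so the exponent never leaves 0 during this run.
for step in (0, 100000, 199999):
    print(step, staircase_exponential_decay(step))
```

Under that assumption the rate stays at initial_learning_rate for the whole run, which matches the comment in the config.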

System information


anonym24 commented Nov 8, 2018

I tried model_main.py instead of train.py - #5719

It seems to work with model_main.py for the ssd_mobilenet_v1_coco.config + ssd_mobilenet_v1_coco_2018_01_28 combination,
though training is very slow and uses a lot of CPU.

But it fails for ssd_mobilenet_v1_quantized_300x300_coco14_sync.config + ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_18

anonym24 closed this as completed Nov 8, 2018

anonym24 commented Nov 9, 2018

It seems the real issues with train.py are related to OOM - #2034 (comment)

train.py seems to work only with a batch_size of 1.
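
For anyone who wants to try the same batch_size change without hand-editing the file, here is a rough sketch that rewrites the value in the config text. It assumes the config contains exactly one `batch_size: N` entry, as in the config posted above. `set_batch_size` is a hypothetical helper, not part of the Object Detection API:

```python
import re


def set_batch_size(config_text, batch_size):
    """Return config_text with its single `batch_size: N` entry replaced."""
    pattern = re.compile(r"^(\s*batch_size:\s*)\d+", flags=re.MULTILINE)
    # A function replacement avoids ambiguity between the group
    # reference and the digits of the new value.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(batch_size), config_text)
    if count != 1:
        raise ValueError(
            "expected exactly one batch_size entry, found %d" % count)
    return new_text


sample = "train_config: {\n  batch_size: 24\n  num_steps: 200000\n}\n"
print(set_batch_size(sample, 1))
```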


anonym24 commented Nov 9, 2018

Trying to go back to train.py again because model_main.py works poorly: it uses a lot of CPU and is very slow.

With train.py and a batch_size of 1, training finally started, but after some steps (~100-200) retraining SSD MobileNet fails:

python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

INFO:tensorflow:global step 171: loss = 14.8309 (0.360 sec/step)
I1109 12:00:57.197321  7740 tf_logging.py:115] global step 171: loss = 14.8309 (0.360 sec/step)
INFO:tensorflow:global step 172: loss = 11.7885 (0.351 sec/step)
I1109 12:00:57.549896  7740 tf_logging.py:115] global step 172: loss = 11.7885 (0.351 sec/step)
INFO:tensorflow:global step 173: loss = 12.5532 (0.369 sec/step)
I1109 12:00:57.919557  7740 tf_logging.py:115] global step 173: loss = 12.5532 (0.369 sec/step)
INFO:tensorflow:global step 174: loss = 13.3306 (0.328 sec/step)
I1109 12:00:58.248665  7740 tf_logging.py:115] global step 174: loss = 13.3306 (0.328 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Incompatible shapes: [2,1917] vs. [3,1]
         [[node Loss/Match/cond/mul_4 (defined at C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py:175)  = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]
         [[{{node gradients/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_1497}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2718_...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Loss/Match/cond/mul_4', defined at:
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 306, in new_func
    return func(*args, **kwargs)
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 290, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "C:\tensorflow1\models\research\slim\deployment\model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 205, in _create_losses
    losses_dict = detection_model.loss(prediction_dict, true_image_shapes)
  File "C:\tensorflow1\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 680, in loss
    keypoints, weights)
  File "C:\tensorflow1\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 853, in _assign_targets
    groundtruth_weights_list)
  File "C:\tensorflow1\models\research\object_detection\core\target_assigner.py", line 483, in batch_assign_targets
    anchors, gt_boxes, gt_class_targets, unmatched_class_label, gt_weights)
  File "C:\tensorflow1\models\research\object_detection\core\target_assigner.py", line 182, in assign
    valid_rows=tf.greater(groundtruth_weights, 0))
  File "C:\tensorflow1\models\research\object_detection\core\matcher.py", line 241, in match
    return Match(self._match(similarity_matrix, valid_rows),
  File "C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py", line 194, in _match
    _match_when_rows_are_non_empty, _match_when_rows_are_empty)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2086, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1930, in BuildCondBranch
    original_result = fn()
  File "C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py", line 175, in _match_when_rows_are_non_empty
    tf.cast(tf.expand_dims(valid_rows, axis=-1), dtype=tf.float32))
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\math_ops.py", line 866, in binary_op_wrapper
    return func(x, y, name=name)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\math_ops.py", line 1131, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_math_ops.py", line 5358, in mul
    "Mul", x=x, y=y, name=name)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [2,1917] vs. [3,1]
         [[node Loss/Match/cond/mul_4 (defined at C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py:175)  = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]
         [[{{node gradients/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_1497}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2718_...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Traceback (most recent call last):
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [2,1917] vs. [3,1]
         [[{{node Loss/Match/cond/mul_4}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]
         [[{{node gradients/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_1497}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2718_...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 306, in new_func
    return func(*args, **kwargs)
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 415, in train
    saver=saver)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 770, in train
    sess, train_op, global_step, train_step_kwargs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 487, in train_step
    run_metadata=run_metadata)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
    run_metadata)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [2,1917] vs. [3,1]
         [[node Loss/Match/cond/mul_4 (defined at C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py:175)  = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]
         [[{{node gradients/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_1497}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2718_...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Loss/Match/cond/mul_4', defined at:
  File "train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 306, in new_func
    return func(*args, **kwargs)
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 290, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "C:\tensorflow1\models\research\slim\deployment\model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 205, in _create_losses
    losses_dict = detection_model.loss(prediction_dict, true_image_shapes)
  File "C:\tensorflow1\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 680, in loss
    keypoints, weights)
  File "C:\tensorflow1\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 853, in _assign_targets
    groundtruth_weights_list)
  File "C:\tensorflow1\models\research\object_detection\core\target_assigner.py", line 483, in batch_assign_targets
    anchors, gt_boxes, gt_class_targets, unmatched_class_label, gt_weights)
  File "C:\tensorflow1\models\research\object_detection\core\target_assigner.py", line 182, in assign
    valid_rows=tf.greater(groundtruth_weights, 0))
  File "C:\tensorflow1\models\research\object_detection\core\matcher.py", line 241, in match
    return Match(self._match(similarity_matrix, valid_rows),
  File "C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py", line 194, in _match
    _match_when_rows_are_non_empty, _match_when_rows_are_empty)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2086, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1930, in BuildCondBranch
    original_result = fn()
  File "C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py", line 175, in _match_when_rows_are_non_empty
    tf.cast(tf.expand_dims(valid_rows, axis=-1), dtype=tf.float32))
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\math_ops.py", line 866, in binary_op_wrapper
    return func(x, y, name=name)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\math_ops.py", line 1131, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_math_ops.py", line 5358, in mul
    "Mul", x=x, y=y, name=name)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [2,1917] vs. [3,1]
         [[node Loss/Match/cond/mul_4 (defined at C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py:175)  = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]
         [[{{node gradients/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_1497}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2718_...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
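
To read the error message: an element-wise multiply needs the two shapes to be broadcast-compatible, meaning that, compared from the last dimension backwards, each pair of sizes is either equal or one of them is 1. [2,1917] vs. [3,1] fails on the first dimension (2 vs. 3). Here is a small pure-Python sketch of that rule, for illustration only (`broadcast_shape` is a made-up helper, not a TensorFlow function):

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result shape of a and b, or raise ValueError."""
    result = []
    # Compare dimensions right to left; missing leading dims count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(
                "Incompatible shapes: %s vs. %s" % (list(a), list(b)))
    return tuple(reversed(result))


print(broadcast_shape((2, 1917), (2, 1)))  # compatible: row counts agree
try:
    broadcast_shape((2, 1917), (3, 1))     # the shapes from the log above
except ValueError as err:
    print(err)
```

Going by the traceback, the [3,1] operand is the expanded `valid_rows` (built from `groundtruth_weights`) and the [2,1917] operand is the one-hot match matrix, so the two sides seem to disagree about how many groundtruth boxes the image has. That reading is an inference from the traceback, not something confirmed in this thread.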

anonym24 reopened this Nov 9, 2018

anonym24 commented Nov 9, 2018

This issue seems to be related to #5391


anonym24 commented Nov 9, 2018

I guess the legacy train.py isn't going to be updated, so the only solution is to use model_main.py

anonym24 closed this as completed Nov 9, 2018
@Madhukaran

You have broken the serialized dataset (i.e., the data is corrupt). Delete the model and train from scratch.

@EthiopianOne

Hey guys, I had the same issue, and after a long, annoying round of searching and debugging, it worked for me when I switched to model_main.py.

Get rid of that evil legacy\train.py.
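
For reference, here is a sketch of what the model_main.py invocation might look like, assembled in Python. The flag names are the ones model_main.py accepts as far as I know (`--pipeline_config_path`, `--model_dir`, `--num_train_steps`, `--alsologtostderr`), so check them against your checkout. The paths are taken from the commands earlier in this thread, and `build_model_main_command` is a hypothetical helper:

```python
import shlex


def build_model_main_command(pipeline_config_path, model_dir, num_train_steps):
    """Assemble the argument list for running model_main.py."""
    return [
        "python", "model_main.py",
        "--pipeline_config_path=" + pipeline_config_path,
        "--model_dir=" + model_dir,
        "--num_train_steps=" + str(num_train_steps),
        "--alsologtostderr",
    ]


cmd = build_model_main_command(
    "training/ssd_mobilenet_v1_coco.config", "training/", 200000)
# Print the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```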
