Getting RuntimeError: main thread is not in main loop when running model_main.py #4777
Comments
Thank you for the report. The TensorFlow Object Detection API should work with Python 2 and 3. We'll have a patch out to fix these issues soon.
Thank you. Do you mean the runtime error is also related to an incompatibility with Python 3? How will I know when the fix has been added? Should I check the release section often?
The following change did the trick for me:
|
Sorry, I don't get it! What do you mean by @@ and + - marks? |
It's a diff. It means remove the import and then move it to the top. |
OK, so what about the @@ -22,4 +22,5 @@ and @@ -29,5 +30,4 @@? |
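For anyone else puzzled by the @@ lines: in a unified diff, each one is a hunk header of the form @@ -old_start,old_count +new_start,new_count @@. So @@ -22,4 +22,5 @@ says that 4 lines starting at line 22 of the old file become 5 lines starting at line 22 of the new file (one line added), and @@ -29,5 +30,4 @@ says that 5 lines starting at old line 29 become 4 lines starting at new line 30 (one line removed). That is consistent with moving a single line upward. Here is a minimal sketch of reading such a header in Python; parse_hunk_header is a hypothetical helper name used only for illustration, not part of any tool:

```python
import re

# Unified-diff hunk header: @@ -old_start,old_count +new_start,new_count @@
# A count may be omitted in real diffs, in which case it means 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError("not a hunk header: %r" % line)
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


for header in ("@@ -22,4 +22,5 @@", "@@ -29,5 +30,4 @@"):
    old_start, old_count, new_start, new_count = parse_hunk_header(header)
    # A positive difference means the hunk adds lines, a negative one removes them.
    print(header, "net line change:", new_count - old_count)
```

Lines inside a hunk that start with + are added, lines that start with - are removed, and the rest are unchanged context.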
@derekjchow I see pylint directives in the object_detection code - I presume you're linting on Python 2. Would it be possible to lint on Python 3? Also - and my apologies if these are readily available (I couldn't find them) - are there docs that explain Google's release process wrt object_detection? It seems that you maintain a private repo and collect a large number of changes, which are then applied to the public repo in discrete sync operations. This makes it tough for the community to make even simple contributions (like fixing Python 3 breakage) as there's little visibility into what Google will be merging into the public repo. Waiting for Google to patch these issues with little-to-no visibility into that process or timeline I think makes this library challenging to work with. I personally would be delighted to fix these directly (and I certainly must do that in my own fork) but knowing that they're already fixed, but in an inaccessible repo, forces me to either suffer the breakage, revert to some earlier commit (which itself may have issues) or to work with my own fork knowing I'll have merge conflicts down the road. I understand there's often a need to maintain private repos and routinely sync, but I'm wondering if this is outlined somewhere as a policy? Is there any discussion of making the public repo the sole repo for object_detection? Again, my apologies if this is covering already-documented topics! |
On line 703 of utils/visualization_utils.py, change cdf_plot=tf.py_func(cdf_plot, [values], tf.uint8) to cdf_plot=tf.py_func(cdf_plot, [values], tf.float32). This will eliminate the errors, or at least it did for me.
Hello everyone,
Then I turned the line back to what it was and tried @marcbelmont 's solution (still don't have any idea what the @ means here I just moved
The full stack trace is huge, so I won't send all of it (tell me if I should). The point is that I just want to train the model; I don't want TF to perform any evaluation for me (I have separate code and a toolkit for that). So, is there any way to disable evaluation entirely? It seems to be the source of the problem, unless of course it's only because evaluation is done before training and the training process would give the same error as well! Another thing: it took ages for the command to reach the error I mentioned! I'm running the code on a Geforce GTX 1070 and usually it's fast enough. So, what's the problem?! Is it perhaps trying to run evaluation on my whole validation set from the beginning, or is it something else?!
Hey guys! I'm still hoping someone would tell me how to fix this problem! Any ideas?! |
Got the same error!!! |
Have you tried with the latest commits? |
OK, today I downloaded the whole models folder and tried to run the same code as the one I mentioned in the actual question, meaning: After fixing an error in multiscale_grid_anchor_generator.py file (NameError: name 'xrange' is not defined, line 61), I got the same run time error again: Error in atexit._run_exitfuncs:
Again the full Traceback is huge, I don't know if I should send it or not. |
I have the same issue, any fix would be appreciated |
@szm2015, I believe your error about the pickle dictionary object may be due to a Python version problem. I was having dictionary problems when I was running python3 rather than python2, but I am not sure that is your specific problem. As for the RuntimeError: main thread error, I believe it is really just a sign that an error happened somewhere else in the code. I think the only way to solve the problem is to go through the full traceback and comment out whichever line is causing you pain... My fix for py_func was for a plot in TensorBoard, so it is not really necessary for training, even though it was causing the program to crash.
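For context, the message "main thread is not in main loop" is raised by Tkinter when Tk objects are touched from a thread other than the one running the Tk loop. Assuming (this is not confirmed anywhere in this thread) that the plot in question is drawn with matplotlib and that matplotlib picked a Tk-based backend, one common way to keep Tk out of the picture is to select the non-interactive Agg backend before any plotting happens. A minimal sketch; select_headless_backend is a hypothetical helper name, not something from the object_detection code:

```python
def select_headless_backend():
    """Switch matplotlib to the non-interactive Agg backend, if it is installed.

    Returns the active backend name, or None when matplotlib is unavailable.
    """
    try:
        import matplotlib
    except ImportError:
        return None  # Nothing to configure without matplotlib.
    # Agg renders to in-memory buffers and never creates Tk windows, so it is
    # safe to use from worker threads and on machines without a display.
    matplotlib.use("Agg")
    return matplotlib.get_backend()


backend = select_headless_backend()
print("matplotlib backend:", backend)
```

On older matplotlib versions the backend has to be selected before matplotlib.pyplot is first imported, so a call like this belongs at the very top of the module that does the plotting.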
It is a python2 vs python3 error. I fixed it by changing file: -- for scale in xrange(scales_per_octave)] |
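A minimal sketch of that kind of change, assuming the goal is to keep the code working on both Python 2 and Python 3. The function name octave_scales and the expression inside the list comprehension are illustrative assumptions; only the "for scale in xrange(scales_per_octave)" part comes from the report above.

```python
# Python 3 removed xrange; its range is already lazy like Python 2's xrange.
try:
    xrange  # Defined on Python 2 only.
except NameError:
    xrange = range  # Fall back to range on Python 3.


def octave_scales(scales_per_octave):
    """Hypothetical example: evenly spaced scale multipliers within one octave."""
    return [2 ** (float(scale) / scales_per_octave)
            for scale in xrange(scales_per_octave)]


print(octave_scales(2))
```

Simply replacing xrange with range in the affected line works as well; the shim only matters if the same file still has to run under Python 2.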
@MichaelX99 and @xtianhb I think we already established the fact that it is a python2 vs. python3 problem and @xtianhb I also did the fix you suggest but that only fixes the initial error and not the pain-in-the-neck runtime one!!! :) The point is why should such errors exist in the first place?! Shouldn't the object detection API be compatible with python3?! |
Hello again. I did as @MichaelX99 suggested and went over the full Traceback. There I found the source of the problem, line 703 and 704 of file visualization_utils.py:
Which I commented out and so the runtime problem was resolved. The code went on (of course after an error However, I still don't want the code to perform any evaluation whatsoever. Now it seems to do nothing but evaluation! any suggestions?! I feel like this is something very simple and I'm missing some very obvious points! This is the output for running the code for a couple of minutes on a small dataset:
|
@szm2015, I am glad you are able to run the script without errors now! The system is massive and can error in very annoying places... I no longer can help since I run on the legacy code before estimators and am not knowledgeable enough about their latest code to help any more, good luck! |
Hi There, |
Hey tensorflowbutler, help please!
(tens-test) C:\tensorflow1\models\research\object_detection>python model_main.py --logtostderr --model_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config
WARNING:tensorflow:From model_main.py:109: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
WARNING:tensorflow:From C:\tensorflow1\models\research\object_detection\utils\config_util.py:102: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
W0215 03:49:36.531214 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\utils\config_util.py:102: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
WARNING:tensorflow:From C:\tensorflow1\models\research\object_detection\model_lib.py:628: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.
W0215 03:49:36.539234 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\model_lib.py:628: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0215 03:49:36.545213 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\utils\config_util.py:488: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
INFO:tensorflow:Maybe overwriting train_steps: None
W0215 03:49:36.632063 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\data_decoders\tf_example_decoder.py:182: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.
WARNING:tensorflow:From C:\tensorflow1\models\research\object_detection\data_decoders\tf_example_decoder.py:197: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.
W0215 03:49:36.637063 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\data_decoders\tf_example_decoder.py:197: The name tf.VarLenFeature is deprecated. Please use tf.io.VarLenFeature instead.
WARNING:tensorflow:From C:\tensorflow1\models\research\object_detection\builders\dataset_builder.py:64: The name tf.gfile.Glob is deprecated. Please use tf.io.gfile.glob instead.
W0215 03:49:36.670076 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\builders\dataset_builder.py:64: The name tf.gfile.Glob is deprecated. Please use tf.io.gfile.glob instead.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0215 03:49:39.115937 11232 module_wrapper.py:139] From C:\Users\aligk\Anaconda3\envs\tens-test\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.logging.warn is deprecated. Please use tf.compat.v1.logging.warn instead.
WARNING:tensorflow:From C:\Users\aligk\Anaconda3\envs\tens-test\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.is_nan is deprecated. Please use tf.math.is_nan instead.
W0215 03:49:50.746980 11232 module_wrapper.py:139] From C:\Users\aligk\Anaconda3\envs\tens-test\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.is_nan is deprecated. Please use tf.math.is_nan instead.
WARNING:tensorflow:From C:\tensorflow1\models\research\object_detection\utils\ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
W0215 03:49:54.265079 11232 module_wrapper.py:139] From C:\Users\aligk\Anaconda3\envs\tens-test\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
WARNING:tensorflow:From C:\Users\aligk\Anaconda3\envs\tens-test\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
W0215 03:49:57.583898 11232 module_wrapper.py:139] From C:\Users\aligk\Anaconda3\envs\tens-test\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
WARNING:tensorflow:From C:\tensorflow1\models\research\object_detection\inputs.py:166: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
W0215 03:50:02.155535 11232 module_wrapper.py:139] From C:\Users\aligk\Anaconda3\envs\tens-test\lib\site-packages\tensorflow_core\python\autograph\converters\directives.py:119: The name tf.string_to_hash_bucket_fast is deprecated. Please use tf.strings.to_hash_bucket_fast instead.
WARNING:tensorflow:From C:\tensorflow1\models\research\object_detection\builders\dataset_builder.py:158: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
W0215 03:50:03.181793 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:168: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From C:\Users\aligk\Anaconda3\envs\tens-test\lib\site-packages\tensorflow_core\contrib\layers\python\layers\layers.py:2784: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
W0215 03:50:06.293033 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\core\anchor_generator.py:171: The name tf.assert_equal is deprecated. Please use tf.compat.v1.assert_equal instead.
INFO:tensorflow:Scale of 0 disables regularizer.
W0215 03:50:06.318396 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:558: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.
INFO:tensorflow:Scale of 0 disables regularizer.
W0215 03:50:07.548388 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\box_coders\faster_rcnn_box_coder.py:82: The name tf.log is deprecated. Please use tf.math.log instead.
WARNING:tensorflow:From C:\tensorflow1\models\research\object_detection\core\minibatch_sampler.py:85: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.
W0215 03:50:07.637370 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\core\minibatch_sampler.py:85: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.
WARNING:tensorflow:From C:\tensorflow1\models\research\object_detection\utils\spatial_transform_ops.py:419: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
W0215 03:50:08.126353 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:191: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.
WARNING:tensorflow:From C:\Users\aligk\Anaconda3\envs\tens-test\lib\site-packages\tensorflow_core\contrib\layers\python\layers\layers.py:1634: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
W0215 03:50:09.268049 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\utils\variables_helper.py:179: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From C:\tensorflow1\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:2768: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
W0215 03:50:09.282026 11232 module_wrapper.py:139] From C:\tensorflow1\models\research\object_detection\utils\variables_helper.py:139: The name tf.train.NewCheckpointReader is deprecated. Please use tf.compat.v1.train.NewCheckpointReader instead.
Windows fatal exception: access violation
Current thread 0x00002be0 (most recent call first):
System information
Well, I'm trying to fine-tune the ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync model on my own custom dataset.
Have I written custom code: No
OS Platform and Distribution: Ubuntu 18.04
TensorFlow installed from: source
TensorFlow version: 1.9.0
Bazel version: 0.15.0
CUDA/cuDNN version: 9.2/7.1
GPU model and memory: Geforce GTX 850M 4GB
Exact command to reproduce: python3 object_detection/model_main.py --pipeline_config_path=/home/szm/Work/TensorFlow/Models/ObjectDetection/ssd_mobilenet_v1_fpn_DETRAC_516x292_4C_RA1/ssd_mobilenet_v1_fpn.config --model_dir=/home/szm/Work/TensorFlow/Models/ObjectDetection/ssd_mobilenet_v1_fpn_DETRAC_516x292_4C_RA1 --alsologtostderr
Describe the problem
OK! I've been using an older release of the Object Detection API (from the README it seems to be the February 9, 2018 release) and have trained multiple models with it. Recently, I decided to use the July 13, 2018 release so as to be able to train newly added models like ssd_mobilenet_v1_fpn.
First, I finished the steps mentioned in the installation with model_builder_test.py resulting in:
Then I tried to train my model using the command I mentioned above, which is what I found in Running locally. I'm using the provided config file ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync.config with minor changes in the following parts:
The first problem is that the code (or at least some of it) seems to have problems running with python3 (which is the Python I'm using; it's actually version 3.6.5). Here are the errors I got from running model_main.py and their solutions:
I really don't understand why such errors should exist. Is the API meant to be used with Python 2, or am I missing something?!
Anyway, the actual problem is the one mentioned in the title. That's the error I get after fixing the above problems (I'm not exactly sure about the fixes though!) and I don't know how to deal with it. It's worth noting that I also repeated the above steps on Ubuntu 16.04 with CUDA 9.0, tensorflow 1.7.0 and a Geforce GTX 1070 8GB and got the same errors.
Source code / logs
Here's the full Traceback: