Probably environment issues #6
Hi JB, I found the spine dataset at the following link, so you may be able to download it from there: https://imperialcollegelondon.app.box.com/s/erhcm28aablpy1725lt93xh6pk31ply1 For the network, I unzipped all the individual folders (training and testing images) and saved them to a folder called 'images', as instructed in the README file. If you would like me to share this already-prepared folder with you, please let me know. Regarding running on a CPU, why not sign up for Google Colab (colab.research.google.com)? It provides free access to a GPU for training and running the network, and you can link it directly to GitHub or your Google Drive to access files. That said, I am still encountering problems, as I get NaN when I run main.py... Thanks, Beth
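For anyone preparing the data locally, the unzip step could be scripted roughly as follows. This is a minimal sketch, assuming the downloaded archives are `.zip` files collected in one download directory; `extract_archives` is a hypothetical helper, not part of the framework, and it keeps each archive's internal folder layout, so check the README for the exact layout it expects.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir, images_dir="images"):
    """Extract every *.zip in download_dir into images_dir.

    Hypothetical helper: assumes the dataset archives are plain zip files.
    Returns the names of the extracted files (directories excluded).
    """
    download_dir = Path(download_dir)
    images_dir = Path(images_dir)
    images_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    # Sort so that the extraction order is deterministic.
    for archive in sorted(download_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(images_dir)
            extracted.extend(n for n in zf.namelist() if not n.endswith("/"))
    return extracted
```

Calling `extract_archives("downloads")` would place the contents of every archive under `./images`.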
Hey, thanks for that, I will definitely try it. About the NaNs: did you see the TODO comment in main.py, under if __name__ == '__main__':? Did you try reducing the learning rate? You have probably seen it already, but I mention it anyway, because when you dig really deep it is easy to miss things on the surface. Regards,
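The "reduce the learning rate" advice can be illustrated on a toy problem. The sketch below is not the framework's training loop: it assumes plain gradient descent on the loss f(x) = x², where any learning rate above 1.0 makes the iterates blow up until the loss overflows to inf/NaN. The hypothetical `train_with_lr_backoff` restarts training from scratch with a smaller learning rate whenever the loss stops being finite.

```python
import math


def train(lr, steps=1000, x0=1.0):
    """Toy gradient descent on f(x) = x**2 (hypothetical stand-in for training).

    Returns the final loss, or math.nan as soon as the loss is no longer finite.
    """
    x = x0
    loss = x * x
    for _ in range(steps):
        grad = 2.0 * x          # derivative of x**2
        x = x - lr * grad       # each step multiplies x by (1 - 2 * lr)
        loss = x * x
        if not math.isfinite(loss):
            return math.nan     # diverged: loss overflowed to inf or became NaN
    return loss


def train_with_lr_backoff(lr, factor=0.5, min_lr=1e-6):
    """Restart training with a smaller learning rate until the loss stays finite."""
    while lr >= min_lr:
        loss = train(lr)
        if math.isfinite(loss):
            return lr, loss
        lr *= factor  # shrink the learning rate and restart from scratch
    raise RuntimeError("loss diverged for every learning rate tried")
```

Starting from `lr=3.0`, the runs with 3.0 and 1.5 diverge, and `train_with_lr_backoff(3.0)` settles on a learning rate of 0.75.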
Thanks for your interest in our papers and the framework! Regarding the error you observed when running the code on the CPU: many parts of the framework are neither implemented nor tested for CPU usage. However, you could try setting the property 'data_format' to 'channels_last' instead of 'channels_first'. This may work on the CPU, but I have mostly not tested it. I hope you understand that we use this framework mainly for prototyping our current research, so much of the documentation is missing and many parts are untested. I plan to improve this in the future. Unfortunately, my schedule is currently quite full and I don't have much time to work on the framework... But if you observe bugs or have any suggestions, just write a message! Regarding the NaN values you observed, I intentionally set the learning rate that high so that training is faster. However, as you also observed, the loss sometimes becomes NaN. There are a few possible solutions: simply restart the program, reduce the learning rate, or change the optimizer and learning rate. I hope my notes help and answer your questions. Regards,
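The CPU error JB reported ("Default AvgPoolingOp only supports NHWC on device type CPU") is about axis order. 'channels_first' corresponds to NCHW (batch, channels, height, width), and 'channels_last' corresponds to NHWC, the layout TensorFlow's CPU pooling kernel accepts. Setting 'data_format' presumably makes the framework handle this internally. The sketch below only illustrates what the switch means for a 2D image batch, assuming batches are 4D NumPy arrays. Both helper names are hypothetical. For 3D volumes such as the spine data, the permutation would be (0, 2, 3, 4, 1) instead.

```python
import numpy as np


def channels_first_to_last(batch):
    """Convert an NCHW ('channels_first') batch to NHWC ('channels_last')."""
    if batch.ndim != 4:
        raise ValueError(f"expected a 4D batch, got shape {batch.shape}")
    # New axis order: batch, height, width, channels.
    return np.transpose(batch, (0, 2, 3, 1))


def channels_last_to_first(batch):
    """Convert an NHWC ('channels_last') batch back to NCHW ('channels_first')."""
    if batch.ndim != 4:
        raise ValueError(f"expected a 4D batch, got shape {batch.shape}")
    # New axis order: batch, channels, height, width.
    return np.transpose(batch, (0, 3, 1, 2))
```

For example, a batch of shape (2, 1, 256, 256) in 'channels_first' layout becomes (2, 256, 256, 1) in 'channels_last' layout.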
Hey Christian, thanks for the reply. I do understand how research works: documentation is the last thing you want to do when you see improvements on the horizon :) I just wanted to let you know that I created a new virtual environment with a Python 3.6 (p36) Anaconda distribution and tensorflow-mkl, and it works. Nevertheless, training the network on the hand X-ray example takes ages with my resources, so I will try Google Colab. @blane85, once (and if) I manage to make everything work on Google Colab, I will share the code. Regards,
Hello. This is a great repository, and the papers are great as well. I'm particularly interested in using spatial configuration for landmark recognition. I wanted to run the 'spine' example, but obviously no data is available right now. Therefore, I decided to run the hand_xray example first to understand how things work here. Unfortunately, there are some small issues. I managed to overcome some of them, but the one with the data format stopped me:
Data generator thread stop
Data generator thread stop
Data generator thread stop
Data generator thread stop
Traceback (most recent call last):
File "C:\Users\user_name\Conda\deps\usr\envs\p37_analiza_cefalo_1\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
return fn(*args)
File "C:\Users\user_name\Conda\deps\usr\envs\p37_analiza_cefalo_1\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Users\user_name\Conda\deps\usr\envs\p37_analiza_cefalo_1\lib\site-packages\tensorflow\python\client\session.py", line 1429, in call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default AvgPoolingOp only supports NHWC on device type CPU
[[{{node net_1/unet/contracting/downsample0/AvgPool}}]]
I'm trying to run it on the CPU. Is there a way to handle this? Maybe you could provide a reproducible environment for the repository (e.g., a Docker image)?
Kind Regards!
JB