-
Notifications
You must be signed in to change notification settings - Fork 751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Android support for TensorFlow presets #297
Conversation
|
Conflicts: pom.xml tensorflow/cppbuild.sh tensorflow/pom.xml tensorflow/src/main/java/org/bytedeco/javacpp/presets/tensorflow.java
@saudet: I updated to the new tensorflow version by merging in the master and added the android in the @platform list. However, now the android build doesn't work any more. I get these errors: Do you have an idea what I need to fix? I will try using sed for patching the file. |
A lot of things have changed between 0.9.0 and 0.10.0. Maybe the build for Android broke. Please confirm with someone upstream that build for Android isn't broken. Thanks! |
The cppbuild.sh build for android worked fine. The compilation of the jni classes is failing, not the tensorflow build. But I will test there Android example to be sure. |
The errors you are getting mean that |
Because the TensorFlow guys prefer questions like this on stackoverflow, I created a question there. https://stackoverflow.com/questions/39855672/tensorflow-how-to-compile-libtensorflow-cc-so-for-android |
From what I read in some tensorflow issues, it seams that they don't support the ops for Android. I tried to include them, but there are several dependencies that aren't working for Android or at least not without significant work. @saudet: Is there a way to exclude the ops when building for Android? |
Sure, we can provide a different list of header files to use with |
@saudet: I had to remove some headers and also some Infos from the map. For easier comparison which headers have been removed, I temporarily removed the macos and linux Platform. At the moment there are two problems:
Many thanks in advance! |
You shouldn't need to remove any Info to get this working. If you could point me to a simple example of one Info that is causing problems, it will be easier to start explaining what you need to do. |
I removed the dot.h from the presets, as it was indeed not included in the binary. However, I still get these errors: https://gist.github.com/andreas-eberle/90e987abcf2cdd3f91838339e675cec1 I checked and the op.cc file is compiled and included into the binary. I cannot figure out what could be missing. Furthermore, if I remove that header, I have to remove many others (I din't find a valid configuration yet). What Infos could help you to identify the issue? What would you do? |
Well, seeing the errors that you get would be a good start. |
Sorry that I didn't give you the full log output... https://gist.github.com/andreas-eberle/0ae0185f3dfab6ad26c78348621e6cba is the output of a complete clean install. I also did the bazel build with the |
Thanks!
|
This is caused by the built-in android build rules. The same thing happens when you build the TensorFlow Android example. Therefore I don't think that's making us problems. Furthermore, from what I read and understand, this only says it's not good style to include sources from another package and you'd better only include libraries from other packages.
I found out that most of the ops are actually included in the Android build (see tensorflow/tensorflow#4795). What I don't understand about the RegisterOps is that is declared in I will try to remove those headers in the meantime to see if I can get something building. |
Ok, so we can skip those with |
I skipped the RegisterOps. As the other errors were still there, I also skipped those definitions (see commit). Furthermore, I had to remove the definitions of the OpDefBuilderWrapper. However, now I get these errors: https://gist.github.com/andreas-eberle/4cfcbd1b26b4b65dcd5f2db709d21225 In the generated tensorflow.java, I saw that in line 10312 the class OpDefBuilderReceiver is generated. How can I skip that whole class? EDIT: Never mind, found it. |
The current version compiles and sometimes works fine. However, sometimes it crashes after loading the model with one of the following errors (Android does not give me more than this):
I debugged into my dummy application and found that this does not happen during any call to the tensorflow/JavaCPP library. It seems to happen in the background. @saudet: Do you have an idea what could be causing this? |
169f0e6
to
768f0e4
Compare
Ah, I see. Fixed in the commit above. Thanks for reporting! |
…hich aren't working on Android.
Thx, that worked. I now wanted to replace the last skip in these lines with the However, in the generated sources for
Then, the Java compiler complains that the @platform annotation is used for a parameter. I guess that this annotation must never be added to parameters. @saudet: What do you think? |
Maybe these don't even get built for Linux or Mac OS X either you know. Have you confirmed that they do? Android is sensitive to undefined symbols, so errors show up, but there is no reason to have those interfaces around if they are not defined anywhere anyway. |
The classes OpDefBuilderReceiver and OpDefBuilderWrapper actually exist and are defined in But regardless of that, isn't the problem that the code generator adds the Anyways, what is your preferred way? Skipping these classes on all platforms or only on Android? |
So, why not simply exclude the |
@saudet: Thx for your question, it helped me to find out that I got some things mixed up.
Therefore, there is only one problem left: The sporadic, asynchronous segfault when I run my test application:
Is there a special EDIT: I found the deallocator thread of JavaCPP, which is actually causing the problem. I verified this by pausing that thread. Now the application does not Segfault any more. Do you have an idea why that thread is causing this? |
Well, like I said, you're probably doing something wrong in your code. Try to figure it out, you can set the "org.bytedeco.javacpp.logger.debug" system property to "true" to get more info in the log about what JavaCPP itself is doing. |
Thanks. The debugging output helped me to find the pointer that was cleaned up to early. The solution is that you have to keep a reference to the SessionOptions object. This is because tensorflow's session internally keeps a reference to the SessionOptions. But if you do not keep a reference in Java, JavaCPP does deallocate the SessionOptions object which then leads to a SegFault when running the Session the next time. Would it be an option for you to keep the SessionOption object as a field in the AbstractSession class in the |
Conflicts: tensorflow/src/main/java/org/bytedeco/javacpp/presets/tensorflow.java tensorflow/src/main/java/org/bytedeco/javacpp/tensorflow.java
…n` to prevent premature deallocation (pull #297)
Yes, that would be the way to go. I made this change in the commit above. Thanks for looking into this! |
Thx, I merged in your commit. Is there anything left from your side to merge this PR? And if you merge this PR, when will you release the 1.2.5 version? |
Let's see, could you remove the tabs and put back the spaces instead? |
Done. |
Thanks! It doesn't build with Bazel 0.3.0. I'm guessing you're using 0.3.1 or 0.3.2? These don't work with TensorFlow 0.10.0 and CUDA... Anyway, looks like TensorFlow 0.11.0 will get released before we all get this working :) I'm fine to merge this, but it won't get released. |
I just switched back to Bazel 0.3.0 (I had 0.3.1) and tested the build. What build doesn't work for you? For me Android and MacOsX worked with Bazel 0.3.0. |
Hum, ok doesn't work on Linux then. Complains about gcc not found...
|
So are you building on a Linux or building for Linux? What is your used system and the target platform? Are you compiling with CUDA or without? |
It just looks like we need to set the following environment variables under Fedora, even when building for Android: export CC="/usr/bin/gcc"
export CXX="/usr/bin/g++" Then it builds fine, for some reason, you know Bazel, whatever. |
Cool, thanks! |
This PR is not complete yet. However, as I need some support I open this for the discussion. My problems are the following:
How can I fix these problems?