[WIP] Add core-api modules using JavaCPP #2
Conversation
Hey, thanks @saudet, overall this looks good. There are a lot of things to look at, so I just reviewed it quickly for a first pass and we'll see how it goes from there.
echo "48ddba718da76df56fd4c48b4bbf4f97f254ba269ec4be67f783684c75563ef8 tensorflow-$TENSORFLOW_VERSION.tar.gz" | sha256sum -c - | ||
|
||
echo "Decompressing archives" | ||
tar --totals -xzf tensorflow-$TENSORFLOW_VERSION.tar.gz |
In the previous implementation, I added Bazel WORKSPACE and BUILD files to configure the native build. TensorFlow Serving also uses a similar approach. It has a few advantages; for example, it does all the archive caching, fetching, and checksum verification for you.
I would be curious to see if we can replace that whole Bash script with a Bazel workspace and use Maven to move files around after building.
We can always hack something with Bazel I'm sure, but wasn't one of the goals to not use Bazel?
The goal is to use Maven instead of Bazel where we can, so I think it's totally OK to continue to build the native library using Bazel.
We don't have any choice for TF itself, but we don't need to add files like WORKSPACE and BUILD in this repository. Forcing users to build with Bazel prevents them from using preexisting binaries, if that's what they want to do. Other than having something a bit better than cURL, what other advantages do you see?
Forcing users to build with Bazel prevents them from using preexisting binaries

Who do you mean by "users"? Only developers building the tensorflow-java repo from scratch will need Bazel; anyone using binaries from Maven artifacts won't.
Having a WORKSPACE and BUILD is only to help us configure the build of those binaries on our side, instead of just invoking the bazel command from a shell script like you do now. Do you see something else?
Note that the bazel build, workspace or not, can still be controlled by the Maven workflow, like here.
Can we get the hash from some well known URL rather than having it pasted in the build script? For similar reasons can we have the tensorflow version being built passed into this script via Maven (assuming that's the top level build system)? Otherwise we have multiple places that need to have the version number modified every time it changes.
@karllessard There are going to be users that are not satisfied with whatever we are going to publish to Maven Central. I know that from experience, mainly with OpenCV and FFmpeg. TF has so many options and so many APIs that can be enabled/disabled, and that's not even counting all the open forks, plus all the proprietary extensions that companies are not releasing as open source. What I'm saying is that having a Bash script as the entry point, where we can bazel and/or curl whatever, is much more inviting than some Bazel script that doesn't even support precompiled binaries to start with! How do you plan on serving those users with Bazel?
@Craigacp I don't know where GitHub offers hashes for its downloads. Do you know? As for versions and whatnot, yes, we can make the build script dependent on Maven. Those are details at this point though.
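For illustration only: if that check ever moves from Bash into the Maven build, verifying a download against a pinned SHA-256 is a few lines of plain Java. This is a hypothetical helper sketch, not part of this PR:

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class ChecksumCheck {
    public static void main(String[] args) throws Exception {
        Path archive = Paths.get(args[0]);  // e.g. tensorflow-$TENSORFLOW_VERSION.tar.gz
        String expected = args[1];          // the pinned SHA-256, as lowercase hex
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        // Stream the archive through the digest without loading it all into memory
        try (InputStream in = new DigestInputStream(Files.newInputStream(archive), md)) {
            in.transferTo(OutputStream.nullOutputStream());
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        if (!hex.toString().equals(expected)) {
            throw new IllegalStateException("SHA-256 mismatch for " + archive);
        }
    }
}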
# Gather Java source files from everywhere
mkdir -p ../java
cp -r tensorflow/java/src/gen/java/* ../java
cp -r tensorflow/java/src/main/java/* ../java
I guess the plan is to move this source code directly to the core-api folder, or should we keep them separated?
Yes, that's just temporary, but if we're going to use Bazel anyway, is the plan to move the generator and everything into this repository?
Yes, we want to build with Bazel only the code that this SIG doesn't own/maintain. So the annotation processor should be on our side, and we should use Maven to run it.
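As a rough sketch of what running that on our side means: the op-generation logic is a standard javax.annotation.processing processor, which Maven runs automatically during compilation once it is on the compiler's processor path. The @Operator annotation and a processor named like this exist upstream in TF Java; the body below is illustrative only:

import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.annotation.processing.SupportedSourceVersion;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;

@SupportedAnnotationTypes("org.tensorflow.op.annotation.Operator")
@SupportedSourceVersion(SourceVersion.RELEASE_8)
public class OperatorProcessor extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        // Generate the Ops wrapper sources here (e.g. with JavaPoet),
        // writing them out through processingEnv.getFiler()
        return true;
    }
}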
Ok, but that's not everything. The plan is to continue maintaining the ops wrappers upstream until we have a Java equivalent in this repository, correct? In that case, we wouldn't need to use Bazel in this repository for that either.
Yep, that’s the plan! Don’t know when that’ll be ready though.
We can rush it if needed, but it might be safer to wait and see where Google is heading with ops registration in modular TF; see this RFC.
# cp -r tensorflow/contrib/android/java/* ../java
# cp -r tensorflow/lite/java/src/main/java/* ../java
cp -r bazel-genfiles/tensorflow/java/ops/src/main/java/* ../java
cp -r bazel-genfiles/tensorflow/java/_javac/tensorflow/libtensorflow_sourcegenfiles/* ../java
If you use the annotation-processor agent, you won't need to copy those files anymore (the ones under .../_javac/...).
<artifactId>javapoet</artifactId>
<version>1.11.1</version>
<optional>true</optional>
</dependency>
guava and javapoet are only required by the annotation processor, so if you use the artifact as discussed before, you can safely remove those from that pom.
<compilerOption>${tensorflow.path}/tensorflow/java/src/main/native/session_jni.cc</compilerOption>
<compilerOption>${tensorflow.path}/tensorflow/java/src/main/native/tensorflow_jni.cc</compilerOption>
<compilerOption>${tensorflow.path}/tensorflow/java/src/main/native/tensor_jni.cc</compilerOption>
<compilerOption>${tensorflow.path}/tensorflow/java/src/main/native/utils_jni.cc</compilerOption>
For my own knowledge, what are those "compilerOption" entries for? I guess it's temporary, because all those JNI files should go away, right?
Those are actually just compiler arguments to compile these files along with the JNI code generated by JavaCPP. So it sounds like you want to use Bazel to build the JNI library as well? That would mean creating a Bazel plugin for JavaCPP and using it inside the Bazel build instead of the Maven lifecycle. I'm not sure that's worth doing though. Java projects mainly use Maven, Gradle, and the like; almost none of them use Bazel, so in my opinion it makes more sense to maintain plugins for those and not for Bazel. What do you think?
I’m a bit confused: I thought that by using JavaCPP there wouldn't be a JNI library to maintain and build anymore, is that right? If the Java code can invoke the C API directly from Java, then all the logic currently found in the JNI library will be migrated to Java. At least, that's how I understood it at first.
Update: oh, I think you mean using Bazel to build the JNI code generated by JavaCPP? If so, I guess you can still generate the JNI code in the generate-sources Maven phase and then build it with Bazel alongside the rest of the TF library, using one of our custom rules in our new BUILD file? That way, all native code is built in the same way, using the same tools.
Yes, we might be able to do that for the C API, but if someone ever wants to map some generated headers, like the C++ ops ones, for some reason, then we can't do it. You're again preventing users from doing what they want. Besides, what's so great about creating JNI libraries with Bazel? The only thing it offers for JNI is this:
https://github.com/bazelbuild/bazel/blob/master/src/main/tools/jdk.BUILD
JavaCPP already does more than that! Sure, we could probably enhance Bazel with the WORKSPACE and BUILD files, but JavaCPP already does everything we need for JNI without any additional configuration.
I just thought it would be simpler to build the JNI library using the same tools that we use for building the core libs.
But you know better; if you see advantages to using something different for JNI, that's fine.
If Bazel did offer anything of value for JNI, I might say differently, but it doesn't. The thing with JavaCPP is that it works with any build system (Autoconf, Bazel, CMake, MSBuild, etc.), so whatever gets done for all the libraries using those systems (OpenCV, FFmpeg, etc.) gets implemented in JavaCPP, not in Bazel. That's the essence of reusability. Like I said, if you feel strongly enough about integrating JavaCPP into Bazel, it's not that hard, but someone would need to do it. On the other hand, if we keep doing it with JavaCPP, then whatever gets done as part of this project will conversely also be reusable with other build systems, which are still more popular than Bazel.
All right then, build script it is!
This thread wasn't about the build script, it was about building the JNI library with JavaCPP, which I think is important. The build script isn't that important though. We can just call Bazel for now and if/when people start hacking it some other way, we can easily reinsert Bash in there... I think doing that will simplify the initial steps here.
public abstract class AbstractTF_Buffer extends Pointer {
    protected static class DeleteDeallocator extends TF_Buffer implements Pointer.Deallocator {
        DeleteDeallocator(TF_Buffer s) { super(s); }
        @Override public void deallocate() { if (!isNull()) TF_DeleteBuffer(this); setNull(); }
Just dropping an idea here: I don't know what's the logic behind Pointer.Deallocator, but in eager execution mode, I'm running a thread in the background that listens for objects discarded by the garbage collector and automatically releases their native resources (so this thread could call TF_DeleteBuffer).
I'm wondering if we should apply this mechanism to all TF resources, such as Graph and Session...
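For reference, a minimal sketch of that mechanism in plain Java, using PhantomReference and a daemon cleaner thread; NativeRef, the handle field, and freeNative() are hypothetical stand-ins for the TF resources:

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class NativeCleaner {
    private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
    // Keep the references themselves reachable until their cleanup has run
    private static final Set<NativeRef> LIVE = ConcurrentHashMap.newKeySet();

    // One reference per tracked resource; the native handle lives outside the
    // referent so it is still available after the referent is collected
    static final class NativeRef extends PhantomReference<Object> {
        final long handle; // hypothetical native address, e.g. a TF_Buffer*
        NativeRef(Object owner, long handle) {
            super(owner, QUEUE);
            this.handle = handle;
            LIVE.add(this);
        }
    }

    static {
        Thread cleaner = new Thread(() -> {
            while (true) {
                try {
                    Reference<?> ref = QUEUE.remove(); // blocks until the GC enqueues one
                    freeNative(((NativeRef) ref).handle);
                    LIVE.remove(ref);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "native-resource-cleaner");
        cleaner.setDaemon(true);
        cleaner.start();
    }

    private static void freeNative(long handle) {
        // hypothetical: call TF_DeleteBuffer, TF_DeleteGraph, etc. on the handle
    }
}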
Yes, that's what that does as well, although I've found that GC isn't a good way to manage resources. We need to do it the C++ way:
http://bytedeco.org/news/2018/07/17/bytedeco-as-distribution/
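In code, "the C++ way" amounts to deterministic scoping rather than waiting on the collector. A sketch, assuming a newBuffer() factory that attaches the DeleteDeallocator quoted above (Pointer implements AutoCloseable since JavaCPP 1.4):

static void useBuffer() {
    // newBuffer() is assumed to register the DeleteDeallocator on creation
    try (TF_Buffer buf = newBuffer()) {
        // ... fill or read the buffer ...
    } // close() triggers deallocate(), i.e. TF_DeleteBuffer, right here,
      // deterministically, instead of whenever the GC gets around to it
}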
That’s a very important topic, I’ll take a look. I also know that @EronWright has something to propose for discarding resources using reference counters. I guess we should start a dedicated issue for this.
Briefly, @saudet I'm familiar with Netty reference counting and figured it might be useful in the TF Java API, as an alternative to try-with-resources. Will follow-up.
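For context, the Netty-style pattern in a nutshell; NativeBuffer, handle, and freeNative() below are hypothetical stand-ins, not proposed API:

import java.util.concurrent.atomic.AtomicInteger;

final class NativeBuffer {
    private final AtomicInteger refCnt = new AtomicInteger(1); // the creator holds one reference
    private final long handle; // hypothetical native address, e.g. a TF_Buffer*

    NativeBuffer(long handle) { this.handle = handle; }

    NativeBuffer retain() {
        if (refCnt.getAndIncrement() <= 0) {
            throw new IllegalStateException("resource already released");
        }
        return this;
    }

    void release() {
        if (refCnt.decrementAndGet() == 0) {
            freeNative(handle); // the last holder frees the native memory
        }
    }

    private static void freeNative(long handle) {
        // hypothetical: TF_DeleteBuffer(handle)
    }
}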
I have created a specific issue related to memory management improvements; I invite you to drop your ideas there to continue the discussion on this very specific/interesting topic.
Oh, reference counting. If it makes sense, I'd like to add it to JavaCPP. Let me take a look at that...
    public void delete() {
        deallocate();
    }
}
When you look at the actual JNI code, there is a lot more happening for allocating tensors, like here. But I guess that's just not implemented yet in your PR?
I'm not sure what you mean. That's just going to call TF_DeleteBuffer().
If you're talking about, for example, StringTensorWriter, the idea would be to port all that code to Java, eventually. It's much easier to write and maintain Java code than C/C++ code...
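To make that concrete, a tiny hedged example of what such ported code can look like: the byte shuffling a StringTensorWriter-style helper does in C++ becomes ordinary Java against JavaCPP's pointer classes. This is generic JavaCPP usage, not the actual port:

import java.nio.charset.StandardCharsets;
import org.bytedeco.javacpp.BytePointer;

public class EncodeExample {
    public static void main(String[] args) {
        byte[] src = "hello".getBytes(StandardCharsets.UTF_8);
        // Allocate native memory and copy into it, all from Java
        try (BytePointer dst = new BytePointer(src.length)) {
            dst.put(src);
            // dst.address() and dst.limit() could then be handed to the C API
            System.out.println("native bytes at 0x" + Long.toHexString(dst.address()));
        } // native memory released deterministically at the end of the block
    }
}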
Yeah, so that comment answered my previous question: there shouldn't be any JNI code to maintain and build on our side.
Yes, that's the whole point. JavaCPP abstracts JNI away completely, but it still uses it.
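A generic illustration of that point (this is the classic JavaCPP hello-world pattern, not the TF presets): you declare the native function in Java, and JavaCPP generates and compiles the JNI glue behind the scenes:

import org.bytedeco.javacpp.Loader;
import org.bytedeco.javacpp.annotation.Platform;

@Platform(include = "math.h")
public class MathExample {
    static {
        Loader.load(); // loads the JNI library that JavaCPP generated and built
    }

    // Maps directly to cos() from math.h; no hand-written JNI anywhere
    public static native double cos(double x);

    public static void main(String[] args) {
        System.out.println(cos(Math.PI)); // prints -1.0 through the generated JNI
    }
}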
<javacpp.platform.macosx-x86_64>${os.name}-${os.arch}${javacpp.platform.extension}</javacpp.platform.macosx-x86_64>
<javacpp.platform.windows-x86>${os.name}-${os.arch}${javacpp.platform.extension}</javacpp.platform.windows-x86>
<javacpp.platform.windows-x86_64>${os.name}-${os.arch}${javacpp.platform.extension}</javacpp.platform.windows-x86_64>
</properties>
I'm a bit lost in all those properties. When building using this profile, for example, does it mean that it will build the native library for all the platforms that have a non-empty value, or only for javacpp.platform?
Those are just properties that mainly get used in core-api-platform. They will have the values that we set there unless they are overwritten by the profiles.
javacpp.platform.host is a special one that sets them all to the detected host platform, to prevent any transitive dependencies on other platforms. It doesn't do anything at build time.
# Run the build for both the C and Java APIs
bash configure
bazel build --config opt //tensorflow:tensorflow //tensorflow/java:tensorflow
Again, by using a dedicated Bazel BUILD file for our needs, we can configure which targets we really need to build for Java instead of building the whole //tensorflow:tensorflow library, which might contain more code than we need (to be validated though) and unnecessarily increase the size of the binaries.
Also, after we move the Java code to our repo, if only the generated ops source jar remains of interest here, you can simply build this target: //tensorflow/java:java_op_gen_sources (see here).
For reference, here are the targets I was building in the JNI implementation.
The //tensorflow:tensorflow target does nothing more than what is needed for the C API:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/BUILD#L578-L609
I'm not sure what you're talking about here?
OK, I didn’t check for myself, but yeah, that looks like pretty much only what we need. But anyway, if we use a Bazel BUILD file to configure our native build, we’ll be able to tweak it if needed (if, for example, TF adds a new dependency on a huge module that has no use/binding in Java).
We can always patch stuff for that, but I don't think it's a good idea to depend on Bazel for the reasons I mention above, unless there is something we really wouldn't be able to do any other way. It could be an option or something though.
Why is the generated code checked in? Isn't this rebuilt on demand by JavaCPP when the build is executed?
It looks like this is targeting TensorFlow 2 rather than the 1.x releases we've been working on. Should we try building it on 1.15 so the Java API on top doesn't have to change too much? Once it's migrated to JavaCPP, we could then move the whole API over to something that behaves more like 2.0.
echo "48ddba718da76df56fd4c48b4bbf4f97f254ba269ec4be67f783684c75563ef8 tensorflow-$TENSORFLOW_VERSION.tar.gz" | sha256sum -c - | ||
|
||
echo "Decompressing archives" | ||
tar --totals -xzf tensorflow-$TENSORFLOW_VERSION.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get the hash from some well known URL rather than having it pasted in the build script? For similar reasons can we have the tensorflow version being built passed into this script via Maven (assuming that's the top level build system)? Otherwise we have multiple places that need to have the version number modified every time it changes.
To have it available for search engines and for reference. It's rarely possible to mechanically transfer all the comments from C/C++ header files into 100% correct Javadoc comments. Ideally, what I would like is something like https://www.javadoc.io/ that works well for the
I thought the plan was to start with 2.0 and leave the old APIs upstream with 1.15. @karllessard?
Yeah, it's never clear what to do with generated code; both options have their pros and cons. I think it depends on how often we need to regenerate it: if we only want to generate it once in a while, for example when we upgrade our version of the TF libraries, then I guess it's OK to check it in. I think that's the case for the JavaCPP bindings and the operation wrappers. If, on the other hand, we'd benefit from generating the classes at each compilation, then it might be better not to. My 2 cents.
That's what I had in mind, yes, but @tzolov also thinks it might bring confusion with expected features that should come with a "2.0" TF release. I guess another "Google Forms" vote is required for this :)
@saudet WRT generated code, I usually look at the decompiled output my IDE produces from a class file if I need to inspect something. Sometimes I'll look at the source itself, but less frequently. I think it's important to write down somewhere all the validation that the current JNI code is doing, so we know what we need to add back into the Java code that calls into the generated wrapper code. We can do this after the fact through inspection, but do we think there are enough unit tests to catch all the code we're removing here?
It's kind of both. It doesn't change often, but it gets regenerated on each build to make sure it doesn't change, or to make sure we commit the changes :) I'm using
👍
That doesn't give us the comments though. That's why Maven typically attaches by default a
Hum, I'm assuming there are unit tests for this already, @sjamesr? If we port everything to Java, including the unit tests, that should be enough, I thought. In any case, this is going to happen in a second phase, so we can think about it some more a bit later.
@saudet yeah, let’s give it a try like that and see how it looks.
Just to recap: the first phase is just to build and deploy the libraries with JavaCPP while keeping the original JNI code active? Will we use a few classes generated by JavaCPP as well?
Yes, that's how I see it, so we can try stuff and migrate gradually if it looks like it's working well. So, to summarize the changes to be done:
@saudet I agree with your plan. BTW, can you please now fork the official repository (https://github.com/tensorflow/java) and rebase your PR on this one? I just pushed the JNI-based
I would like to move all our current discussions to the new repository as well, but unfortunately GitHub does not allow cross-organization issue transfers.
Oh, forgot to reply to this one. Can't we use Maven? Like with the copy resource plugin?
Do I need to create a fork? I mean, would I need to fill out some paperwork to become a collaborator there?
We can go to great lengths to not call Bash if that's what you mean, yes. Is the idea to avoid Bash like the plague? :) Bazel itself needs Bash, so it's not like we'd gain anything here. Anyway, tell you what: I'll let you modify the Bazel build files and the pom.xml file in order to get rid of the script that I'll leave there "temporarily". Sounds good? BTW, what about the names of the modules? Is everyone OK with
Yup, I discussed with @tzolov and @sjamesr, and instead of managing a list of collaborators with their appropriate access rights, we'll all merge our changes by PRs from our personal accounts, which is pretty common with open-source projects hosted on GitHub. I don't foresee any problems that might prevent you from accomplishing your work here, do you?
I sincerely don't mind. I proposed Maven over Bash because 1) that's what Java developers like to work with, and 2) it is interesting to attach a given job to a specific phase (e.g.
I'm OK with those names.
Merged in the official repo
See discussion at karllessard/tensorflow-java#1 (comment)