[WIP] Add core-api modules using JavaCPP #2
Conversation
Hey, thanks @saudet, overall this looks good. There are a lot of things to look at, so I just reviewed it quickly for a first pass and we'll see how it goes from there.
echo "48ddba718da76df56fd4c48b4bbf4f97f254ba269ec4be67f783684c75563ef8 tensorflow-$TENSORFLOW_VERSION.tar.gz" | sha256sum -c - | ||
|
||
echo "Decompressing archives" | ||
tar --totals -xzf tensorflow-$TENSORFLOW_VERSION.tar.gz |
In the previous implementation, I added Bazel WORKSPACE and BUILD files to configure the native build. TensorFlow Serving also uses a similar approach. It has a few advantages; for example, it does all the archive caching, fetching, and checksum verification for you.
I would be curious to see if we can replace that whole Bash script with a Bazel workspace and use Maven to move files around after building.
We can always hack something with Bazel I'm sure, but wasn't one of the goals to not use Bazel?
The goal is to use Maven instead of Bazel where we can, so I think it's totally OK to continue to build the native library using Bazel.
We don't have any choice for TF itself, but we don't need to add files like WORKSPACE and BUILD in this repository. Forcing users to build with Bazel prevents them from using preexisting binaries, if that's what they want to do. Other than having something a bit better than cURL, what other advantages do you see?
Forcing users to build with Bazel prevents them from using preexisting binaries

Who do you mean by "users"? Only developers building the tensorflow-java repo from scratch will need Bazel; anyone using binaries from Maven artifacts won't.
Having a WORKSPACE and BUILD is only to help us configure the build of those binaries on our side, instead of just invoking the bazel command from a shell script like you do now. Do you see something else?
Note that the bazel build, workspace or not, can still be controlled by the Maven workflow, like here.
Can we get the hash from some well known URL rather than having it pasted in the build script? For similar reasons can we have the tensorflow version being built passed into this script via Maven (assuming that's the top level build system)? Otherwise we have multiple places that need to have the version number modified every time it changes.
@karllessard There are going to be users that are not satisfied with whatever we are going to publish to Maven Central. I know that from experience, mainly with OpenCV and FFmpeg. TF has so many options and so many APIs that can be enabled/disabled, and that's not even counting all the open forks, plus all the proprietary extensions that companies are not releasing as open source. What I'm saying is that having a Bash script as the entry point, where we can bazel and/or curl whatever, is much more inviting than some Bazel script that doesn't even support precompiled binaries to start with! How do you plan on serving those users with Bazel?
@Craigacp I don't know where GitHub offers hashes for its downloads. Do you know? As for versions and whatnot, yes, we can make the build script dependent on Maven. Those are details at this point though.
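For illustration only: if that check ever moves from Bash into the Maven build, verifying a download against a pinned SHA-256 is a few lines of plain Java. This is a hypothetical helper sketch, not part of this PR:

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class ChecksumCheck {
    public static void main(String[] args) throws Exception {
        Path archive = Paths.get(args[0]);  // e.g. tensorflow-$TENSORFLOW_VERSION.tar.gz
        String expected = args[1];          // the pinned SHA-256, as lowercase hex
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        // Stream the archive through the digest without loading it all into memory
        try (InputStream in = new DigestInputStream(Files.newInputStream(archive), md)) {
            in.transferTo(OutputStream.nullOutputStream());
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        if (!hex.toString().equals(expected)) {
            throw new IllegalStateException("SHA-256 mismatch for " + archive);
        }
    }
}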
# Gather Java source files from everywhere
mkdir -p ../java
cp -r tensorflow/java/src/gen/java/* ../java
cp -r tensorflow/java/src/main/java/* ../java
I guess the plan is to move this source code directly to the core-api folder, or should we keep them separated?
Yes, that's just temporary, but if we're going to use Bazel anyway, is the plan to move the generator and everything into this repository?
Yes, we want to build with Bazel only the code that this SIG doesn't own/maintain. So the annotation processor should be on our side, and we should use Maven to run it.
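As a rough sketch of what running that on our side means: the op-generation logic is a standard javax.annotation.processing processor, which Maven runs automatically during compilation once it is on the compiler's processor path. The @Operator annotation and a processor named like this exist upstream in TF Java; the body below is illustrative only:

import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.annotation.processing.SupportedSourceVersion;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;

@SupportedAnnotationTypes("org.tensorflow.op.annotation.Operator")
@SupportedSourceVersion(SourceVersion.RELEASE_8)
public class OperatorProcessor extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        // Generate the Ops wrapper sources here (e.g. with JavaPoet),
        // writing them out through processingEnv.getFiler()
        return true;
    }
}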
Ok, but that's not everything. The plan is to continue maintaining the ops wrappers upstream until we have a Java equivalent in this repository, correct? In that case, we wouldn't need to use Bazel in this repository for that either.
Yep, that’s the plan! Don’t know when that’ll be ready though.
We can rush it if needed, but it might be safer to wait and see where Google is heading with ops registration in modular TF; see this RFC.
# cp -r tensorflow/contrib/android/java/* ../java
# cp -r tensorflow/lite/java/src/main/java/* ../java
cp -r bazel-genfiles/tensorflow/java/ops/src/main/java/* ../java
cp -r bazel-genfiles/tensorflow/java/_javac/tensorflow/libtensorflow_sourcegenfiles/* ../java
If you use the annotation-processor agent, you won't need to copy those files anymore (the ones under .../_javac/...).
<artifactId>javapoet</artifactId>
<version>1.11.1</version>
<optional>true</optional>
</dependency>
guava and javapoet are only required by the annotation processor, so if you use the artifact as discussed before, you can safely remove those from that pom.
<compilerOption>${tensorflow.path}/tensorflow/java/src/main/native/session_jni.cc</compilerOption>
<compilerOption>${tensorflow.path}/tensorflow/java/src/main/native/tensorflow_jni.cc</compilerOption>
<compilerOption>${tensorflow.path}/tensorflow/java/src/main/native/tensor_jni.cc</compilerOption>
<compilerOption>${tensorflow.path}/tensorflow/java/src/main/native/utils_jni.cc</compilerOption>
For my own knowledge, what are those "compilerOption" entries for? I guess it's temporary, because all those JNI files should go away, right?
Those are actually just compiler arguments to compile these files along with the JNI code generated by JavaCPP. So it sounds like you want to use Bazel to build the JNI library as well? That would mean creating a Bazel plugin for JavaCPP and using it inside the Bazel build instead of the Maven lifecycle. I'm not sure that's worth doing though. Java projects mainly use Maven, Gradle, and the like; almost none of them use Bazel, so in my opinion it makes more sense to maintain plugins for those and not for Bazel. What do you think?
I’m a bit confused: I thought that by using JavaCPP there wouldn't be a JNI library to maintain and build anymore, is that right? If the Java code can invoke the C API directly from Java, then all the logic currently found in the JNI library will be migrated to Java. At least, that's how I understood it at first.
Update: oh, I think you mean using Bazel to build the JNI code generated by JavaCPP? If so, I guess you can still generate the JNI code in the generate-sources Maven phase and then build it with Bazel alongside the rest of the TF library, using one of our custom rules in our new BUILD file? That way, all native code is built in the same way, using the same tools.
Yes, we might be able to do that for the C API, but if someone ever wants to map some generated headers, like the C++ ops ones, for some reason, then we can't do it. You're again preventing users from doing what they want. Besides, what's so great about creating JNI libraries with Bazel? The only thing it offers for JNI is this:
https://github.com/bazelbuild/bazel/blob/master/src/main/tools/jdk.BUILD
JavaCPP already does more than that! Sure, we could probably enhance Bazel with the WORKSPACE and BUILD files, but JavaCPP already does everything we need for JNI without any additional configuration.
I just thought it would be simpler to build the JNI library using the same tools that we use for building the core libs.
But you know better; if you see advantages to using something different for JNI, that's fine.
If Bazel did offer anything of value for JNI, I might say differently, but it doesn't. The thing with JavaCPP is that it works with any build system (Autoconf, Bazel, CMake, MSBuild, etc.), so whatever gets done for all the libraries using those systems (OpenCV, FFmpeg, etc.) gets implemented in JavaCPP, not in Bazel. That's the essence of reusability. Like I said, if you feel strongly enough about integrating JavaCPP into Bazel, it's not that hard, but someone would need to do it. On the other hand, if we keep doing it with JavaCPP, then whatever gets done as part of this project will conversely also be reusable with other build systems, which are still more popular than Bazel.
All right then, build script it is!
This thread wasn't about the build script, it was about building the JNI library with JavaCPP, which I think is important. The build script isn't that important though. We can just call Bazel for now and if/when people start hacking it some other way, we can easily reinsert Bash in there... I think doing that will simplify the initial steps here.
public abstract class AbstractTF_Buffer extends Pointer {
    protected static class DeleteDeallocator extends TF_Buffer implements Pointer.Deallocator {
        DeleteDeallocator(TF_Buffer s) { super(s); }
        @Override public void deallocate() { if (!isNull()) TF_DeleteBuffer(this); setNull(); }
Just dropping an idea here: I don't know what's the logic behind Pointer.Deallocator, but in eager execution mode, I'm running a thread in the background that listens for objects discarded by the garbage collector and automatically releases their native resources (so this thread could call TF_DeleteBuffer).
I'm wondering if we should apply this mechanism to all TF resources, such as Graph and Session...
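For reference, a minimal sketch of that mechanism in plain Java, using PhantomReference and a daemon cleaner thread; NativeRef, the handle field, and freeNative() are hypothetical stand-ins for the TF resources:

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class NativeCleaner {
    private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
    // Keep the references themselves reachable until their cleanup has run
    private static final Set<NativeRef> LIVE = ConcurrentHashMap.newKeySet();

    // One reference per tracked resource; the native handle lives outside the
    // referent so it is still available after the referent is collected
    static final class NativeRef extends PhantomReference<Object> {
        final long handle; // hypothetical native address, e.g. a TF_Buffer*
        NativeRef(Object owner, long handle) {
            super(owner, QUEUE);
            this.handle = handle;
            LIVE.add(this);
        }
    }

    static {
        Thread cleaner = new Thread(() -> {
            while (true) {
                try {
                    Reference<?> ref = QUEUE.remove(); // blocks until the GC enqueues one
                    freeNative(((NativeRef) ref).handle);
                    LIVE.remove(ref);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "native-resource-cleaner");
        cleaner.setDaemon(true);
        cleaner.start();
    }

    private static void freeNative(long handle) {
        // hypothetical: call TF_DeleteBuffer, TF_DeleteGraph, etc. on the handle
    }
}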
Yes, that's what that does as well, although I've found that GC isn't a good way to manage resources. We need to do it the C++ way:
http://bytedeco.org/news/2018/07/17/bytedeco-as-distribution/
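In code, "the C++ way" amounts to deterministic scoping rather than waiting on the collector. A sketch, assuming a newBuffer() factory that attaches the DeleteDeallocator quoted above (Pointer implements AutoCloseable since JavaCPP 1.4):

static void useBuffer() {
    // newBuffer() is assumed to register the DeleteDeallocator on creation
    try (TF_Buffer buf = newBuffer()) {
        // ... fill or read the buffer ...
    } // close() triggers deallocate(), i.e. TF_DeleteBuffer, right here,
      // deterministically, instead of whenever the GC gets around to it
}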
That’s a very important topic, I’ll take a look. I also know that @EronWright has something to propose for discarding resources using reference counters. I guess we should start a dedicated issue for this.
Briefly, @saudet I'm familiar with Netty reference counting and figured it might be useful in the TF Java API, as an alternative to try-with-resources. Will follow-up.
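For context, the Netty-style pattern in a nutshell; NativeBuffer, handle, and freeNative() below are hypothetical stand-ins, not proposed API:

import java.util.concurrent.atomic.AtomicInteger;

final class NativeBuffer {
    private final AtomicInteger refCnt = new AtomicInteger(1); // the creator holds one reference
    private final long handle; // hypothetical native address, e.g. a TF_Buffer*

    NativeBuffer(long handle) { this.handle = handle; }

    NativeBuffer retain() {
        if (refCnt.getAndIncrement() <= 0) {
            throw new IllegalStateException("resource already released");
        }
        return this;
    }

    void release() {
        if (refCnt.decrementAndGet() == 0) {
            freeNative(handle); // the last holder frees the native memory
        }
    }

    private static void freeNative(long handle) {
        // hypothetical: TF_DeleteBuffer(handle)
    }
}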
I have created a specific issue related to memory management improvements; I invite you to drop your ideas there to continue the discussion on this very specific/interesting topic.
Oh, reference counting. If it makes sense, I'd like to add it to JavaCPP. Let me take a look at that...
    public void delete() {
        deallocate();
    }
}
When you look at the actual JNI code, there is a lot more happening for allocating tensors, like here. But I guess that's just not implemented yet in your PR?
I'm not sure what you mean. That's just going to call TF_DeleteBuffer().
If you're talking about, for example, StringTensorWriter, the idea would be to port all that code to Java, eventually. It's much easier to write and maintain Java code than C/C++ code...
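To make that concrete, a tiny hedged example of what such ported code can look like: the byte shuffling a StringTensorWriter-style helper does in C++ becomes ordinary Java against JavaCPP's pointer classes. This is generic JavaCPP usage, not the actual port:

import java.nio.charset.StandardCharsets;
import org.bytedeco.javacpp.BytePointer;

public class EncodeExample {
    public static void main(String[] args) {
        byte[] src = "hello".getBytes(StandardCharsets.UTF_8);
        // Allocate native memory and copy into it, all from Java
        try (BytePointer dst = new BytePointer(src.length)) {
            dst.put(src);
            // dst.address() and dst.limit() could then be handed to the C API
            System.out.println("native bytes at 0x" + Long.toHexString(dst.address()));
        } // native memory released deterministically at the end of the block
    }
}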
Yeah, so that comment answered my previous question: there shouldn't be any JNI code to maintain and build on our side.
Yes, that's the whole point. JavaCPP abstracts JNI away completely, but it still uses it.
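A generic illustration of that point (this is the classic JavaCPP hello-world pattern, not the TF presets): you declare the native function in Java, and JavaCPP generates and compiles the JNI glue behind the scenes:

import org.bytedeco.javacpp.Loader;
import org.bytedeco.javacpp.annotation.Platform;

@Platform(include = "math.h")
public class MathExample {
    static {
        Loader.load(); // loads the JNI library that JavaCPP generated and built
    }

    // Maps directly to cos() from math.h; no hand-written JNI anywhere
    public static native double cos(double x);

    public static void main(String[] args) {
        System.out.println(cos(Math.PI)); // prints -1.0 through the generated JNI
    }
}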
<javacpp.platform.macosx-x86_64>${os.name}-${os.arch}${javacpp.platform.extension}</javacpp.platform.macosx-x86_64>
<javacpp.platform.windows-x86>${os.name}-${os.arch}${javacpp.platform.extension}</javacpp.platform.windows-x86>
<javacpp.platform.windows-x86_64>${os.name}-${os.arch}${javacpp.platform.extension}</javacpp.platform.windows-x86_64>
</properties>
I'm a bit lost in all those properties. When building using this profile, for example, does it mean that it will build the native library for all the platforms that have a non-empty value, or only for javacpp.platform?
Those are just properties that mainly get used in core-api-platform. They will have the values that we set there unless they are overwritten by the profiles.
javacpp.platform.host is a special one that sets them all to the detected host platform, to prevent any transitive dependencies on other platforms. It doesn't do anything at build time.
# Run the build for both the C and Java APIs
bash configure
bazel build --config opt //tensorflow:tensorflow //tensorflow/java:tensorflow
Again, by using a dedicated Bazel BUILD file for our needs, we can configure which targets we really need to build for Java instead of building the whole //tensorflow:tensorflow library, which might contain more code than we need (to be validated though) and unnecessarily increase the size of the binaries.
Also, after we move the Java code to our repo, if only the generated ops source jar remains of interest here, you can simply build this target: //tensorflow/java:java_op_gen_sources (see here).
For reference, here are the targets I was building in the JNI implementation.
The //tensorflow:tensorflow target does nothing more than what is needed for the C API:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/BUILD#L578-L609
I'm not sure what you're talking about here?
OK, I didn’t check for myself, but yeah, that looks like pretty much only what we need. But anyway, if we use a Bazel BUILD file to configure our native build, we’ll be able to tweak it if needed (if, for example, TF adds a new dependency on a huge module that has no use/binding in Java).
We can always patch stuff for that, but I don't think it's a good idea to depend on Bazel for the reasons I mention above, unless there is something we really wouldn't be able to do any other way. It could be an option or something though.
Why is the generated code checked in? Isn't this rebuilt on demand by JavaCPP when the build is executed?
It looks like this is targeting TensorFlow 2 rather than the 1.x releases we've been working on. Should we try building it on 1.15 so the Java API on top doesn't have to change too much? Once it's migrated to JavaCPP, we could then move the whole API over to something that behaves more like 2.0.
echo "48ddba718da76df56fd4c48b4bbf4f97f254ba269ec4be67f783684c75563ef8 tensorflow-$TENSORFLOW_VERSION.tar.gz" | sha256sum -c - | ||
|
||
echo "Decompressing archives" | ||
tar --totals -xzf tensorflow-$TENSORFLOW_VERSION.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get the hash from some well known URL rather than having it pasted in the build script? For similar reasons can we have the tensorflow version being built passed into this script via Maven (assuming that's the top level build system)? Otherwise we have multiple places that need to have the version number modified every time it changes.
To have it available for search engines and for reference. It's rarely possible to mechanically transfer all the comments from C/C++ header files into 100% correct Javadoc comments. Ideally, what I would like is something like https://www.javadoc.io/ that works well for the
I thought the plan was to start with 2.0 and leave the old APIs upstream with 1.15. @karllessard?
Yeah, it's never clear what to do with generated code; both options have their pros and cons. I think it depends on how often we need to regenerate it: if we only want to generate it once in a while, for example when we upgrade our version of the TF libraries, then I guess it's OK to check it in. I think that's the case for the JavaCPP bindings and the operation wrappers. If, on the other hand, we'd benefit from generating the classes at each compilation, then it might be better not to. My 2 cents.
That's what I had in mind, yes, but @tzolov also thinks it might bring confusion with expected features that should come with a "2.0" TF release. I guess another "Google Forms" vote is required for this :)
@saudet WRT generated code, I usually look at the decompiled output my IDE produces from a class file if I need to inspect something. Sometimes I'll look at the source itself, but less frequently. I think it's important to write down somewhere all the validation that the current JNI code is doing, so we know what we need to add back into the Java code that calls into the generated wrapper code. We can do this after the fact through inspection, but do we think there are enough unit tests to catch all the code we're removing here?
It's kind of both. It doesn't change often, but it gets regenerated on each build to make sure it doesn't change, or to make sure we commit the changes :) I'm using
👍
That doesn't give us the comments though. That's why Maven typically attaches by default a
Hum, I'm assuming there are unit tests for this already, @sjamesr? If we port everything to Java, including the unit tests, that should be enough, I thought. In any case, this is going to happen in a second phase, so we can think about it some more a bit later.
@saudet yeah, let’s give it a try like that and see how it looks.
Just to recap: the first phase is just to build and deploy the libraries with JavaCPP while keeping the original JNI code active? Will we use a few classes generated by JavaCPP as well?
Yes, that's how I see it, so we can try stuff and migrate gradually if it looks like it's working well. So, to summarize the changes to be done:
@saudet I agree with your plan. BTW, can you please now fork the official repository (https://github.com/tensorflow/java) and rebase your PR on this one? I just pushed the JNI-based
I would like to move all our current discussions to the new repository as well, but unfortunately GitHub does not allow cross-organization issue transfers.
Oh, forgot to reply to this one. Can't we use Maven? Like with the copy resource plugin?
Do I need to create a fork? I mean, would I need to fill out some paperwork to become a collaborator there?
We can go to great lengths to not call Bash if that's what you mean, yes. Is the idea to avoid Bash like the plague? :) Bazel itself needs Bash, so it's not like we'd gain anything here. Anyway, tell you what: I'll let you modify the Bazel build files and the pom.xml file in order to get rid of the script that I'll leave there "temporarily". Sounds good? BTW, what about the names of the modules? Is everyone OK with
Yup, I discussed with @tzolov and @sjamesr, and instead of managing a list of collaborators with their appropriate access rights, we'll all merge our changes by PRs from our personal accounts, which is pretty common with open-source projects hosted on GitHub. I don't foresee any problems that might prevent you from accomplishing your work here, do you?
I sincerely don't mind. I proposed Maven over Bash because 1) that's what Java developers like to work with, and 2) it is interesting to attach a given job to a specific phase (e.g.
I'm OK with those names.
Merged in the official repo
See discussion at karllessard/tensorflow-java#1 (comment)