-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor JNI code in C++ into Java code with JavaCPP #18
Conversation
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/SavedModelBundle.java
Show resolved
Hide resolved
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/SavedModelBundle.java
Outdated
Show resolved
Hide resolved
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/SavedModelBundle.java
Outdated
Show resolved
Hide resolved
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/SavedModelBundle.java
Outdated
Show resolved
Hide resolved
throw new IndexOutOfBoundsException("MetaGraphDef is too large to serialize into a byte[] array"); | ||
} else { | ||
byte[] jmetagraph_def = new byte[(int)metagraph_def.length()]; | ||
new BytePointer(metagraph_def.data()).get(jmetagraph_def); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we move this array copy stunt in AbstractTF_Buffer
as a utility method like toBytes()
, since TF_Buffer
is meant to be generic.
Alternatively, we could retain a reference to metaGraphDef
and return its data as a ByteBuffer
to the user instead of an array, as it is only used for deserializing the proto message... But I'm wondering why we just don't import those proto classes in the core API and return typed objects to the user like MetaGraphDef
instead of leaving him the burden to serialize/deserialize the proto messages.
What do you think @sjamesr ? Is the original reason of leaving the protos outside the Java client was only to avoid an extra dependency to grpc?
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/SavedModelBundle.java
Outdated
Show resolved
Hide resolved
...core/tensorflow-core-api/src/main/java/org/tensorflow/internal/c_api/AbstractTF_Session.java
Show resolved
Hide resolved
@saudet , this PR is still marked as WIP, looks like it's ready to be merged now? |
I was thinking we could do all of it together before merging? It shouldn't take me more than a few days to do... |
As you wish, we can do it iteratively with smaller PRs as well |
Let's talk about it at the meeting tomorrow :) |
dd6373b
to
55b7b5b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @saudet , I just left a few questions and remarks here and there.
I didn't compared yet the old JNI code with the new Java implementations but I think we can also rely on the Junit coverage for this.
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/EagerOperation.java
Show resolved
Hide resolved
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/EagerOperation.java
Outdated
Show resolved
Hide resolved
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/EagerOperation.java
Show resolved
Hide resolved
int length = TFE_OpGetOutputLength(handle, name, status); | ||
status.throwExceptionIfNotOK(); | ||
return length; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of starting new scopes in all of these methods, could it be simpler and more efficient to just create the status in a try-with-resource
block when there are not other resource allocated?
try (TF_Status status = TF_Status.newStatus()) {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would be more efficient, but it would also make it more error-prone when we start creating other objects in there that may start doing temporary allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I would still prefer we go with the most efficient approach but it's up to you if you want to make the changes or not, we can merge it like this too.
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/Graph.java
Outdated
Show resolved
Hide resolved
outputTensorHandles[i] = outputValues.get(TF_Tensor.class, i); | ||
} | ||
|
||
return runMetadata != null ? runMetadata.get() : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: did you ever ran a benchmark showing that all these context switches between the JVM and the native code (i.e. at each JavaCPP generated method) ends up to be as performant as when only one call was made (run
in this case)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calls from the JVM to native code are very efficient, in the order of 30 ns. On the other hand, calls from native code to the JVM are typically very expensive, in the order of 300 ns, so it's almost certain that the new code here is going to be faster. I'll run a simple benchmark and post the results here just to confirm, but if you're worried about performance, we should think about writing a whole set of benchmarks to make sure there is never any regression in performance anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've started to write some of them as well (like this one), I think with high-performance projects like TF we should have a set of benchmarks, not only it is useful to detect regression but also is a very good indicator of where we should spend time for optimization.
If you can add some, that would be great, but we can merge without if you want to create these later.
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/Tensor.java
Show resolved
Hide resolved
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/Tensor.java
Outdated
Show resolved
Hide resolved
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/Tensor.java
Outdated
Show resolved
Hide resolved
tensorflow-core/tensorflow-core-api/src/test/java/org/tensorflow/TensorTest.java
Outdated
Show resolved
Hide resolved
I ran the
So it does look like this refactoring also increases performance! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it does look like this refactoring also increases performance!
Great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, I'm merging this. There is a few unresolved conversation left that are not mandatory to be closed right now and could be part of another PR if needs be.
This is a WIP to remove the manually written JNI code from tensorflow-core-api. In the first commit for SavedModelBundle.java, I replaced about 100 lines of JNI code in C++ with about 40 lines of code in Java, and the unit tests still pass. We need to do the same with the rest of the JNI code, which shouldn't take me that long to do by myself, but if anyone is interested in helping, please let me know and I will give you write access to my fork!
There is also a lot of code in the Java section of the wrappers that is redundant with JavaCPP and that we will be able to remove in a second phase.
/cc @tzolov