Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor JNI code in C++ into Java code with JavaCPP #18

Merged
merged 6 commits into from
Jan 29, 2020

Conversation

saudet
Copy link
Contributor

@saudet saudet commented Jan 20, 2020

This is a WIP to remove the manually written JNI code from tensorflow-core-api. In the first commit for SavedModelBundle.java, I replaced about 100 lines of JNI code in C++ with about 40 lines of code in Java, and the unit tests still pass. We need to do the same with the rest of the JNI code, which shouldn't take me that long to do by myself, but if anyone is interested in helping, please let me know and I will give you write access to my fork!

There is also a lot of code in the Java section of the wrappers that is redundant with JavaCPP and that we will be able to remove in a second phase.

/cc @tzolov

@saudet saudet requested a review from karllessard January 20, 2020 14:31
@saudet saudet changed the title Refactor saved_model_bundle_jni into SavedModelBundle with JavaCPP Refactor JNI code in C++ into Java code with JavaCPP Jan 20, 2020
throw new IndexOutOfBoundsException("MetaGraphDef is too large to serialize into a byte[] array");
} else {
byte[] jmetagraph_def = new byte[(int)metagraph_def.length()];
new BytePointer(metagraph_def.data()).get(jmetagraph_def);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we move this array copy stunt in AbstractTF_Buffer as a utility method like toBytes(), since TF_Buffer is meant to be generic.

Alternatively, we could retain a reference to metaGraphDef and return its data as a ByteBuffer to the user instead of an array, as it is only used for deserializing the proto message... But I'm wondering why we just don't import those proto classes in the core API and return typed objects to the user like MetaGraphDef instead of leaving him the burden to serialize/deserialize the proto messages.

What do you think @sjamesr ? Is the original reason of leaving the protos outside the Java client was only to avoid an extra dependency to grpc?

@karllessard
Copy link
Collaborator

@saudet , this PR is still marked as WIP, looks like it's ready to be merged now?

@saudet
Copy link
Contributor Author

saudet commented Jan 22, 2020

I was thinking we could do all of it together before merging? It shouldn't take me more than a few days to do...

@karllessard
Copy link
Collaborator

As you wish, we can do it iteratively with smaller PRs as well

@saudet
Copy link
Contributor Author

saudet commented Jan 23, 2020

Let's talk about it at the meeting tomorrow :)

@saudet saudet marked this pull request as ready for review January 27, 2020 22:27
Copy link
Collaborator

@karllessard karllessard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @saudet , I just left a few questions and remarks here and there.

I didn't compared yet the old JNI code with the new Java implementations but I think we can also rely on the Junit coverage for this.

int length = TFE_OpGetOutputLength(handle, name, status);
status.throwExceptionIfNotOK();
return length;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of starting new scopes in all of these methods, could it be simpler and more efficient to just create the status in a try-with-resource block when there are not other resource allocated?

try (TF_Status status = TF_Status.newStatus()) { 
    ... 
}

Copy link
Contributor Author

@saudet saudet Jan 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would be more efficient, but it would also make it more error-prone when we start creating other objects in there that may start doing temporary allocations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I would still prefer we go with the most efficient approach but it's up to you if you want to make the changes or not, we can merge it like this too.

outputTensorHandles[i] = outputValues.get(TF_Tensor.class, i);
}

return runMetadata != null ? runMetadata.get() : null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: did you ever ran a benchmark showing that all these context switches between the JVM and the native code (i.e. at each JavaCPP generated method) ends up to be as performant as when only one call was made (run in this case)?

Copy link
Contributor Author

@saudet saudet Jan 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calls from the JVM to native code are very efficient, in the order of 30 ns. On the other hand, calls from native code to the JVM are typically very expensive, in the order of 300 ns, so it's almost certain that the new code here is going to be faster. I'll run a simple benchmark and post the results here just to confirm, but if you're worried about performance, we should think about writing a whole set of benchmarks to make sure there is never any regression in performance anywhere.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started to write some of them as well (like this one), I think with high-performance projects like TF we should have a set of benchmarks, not only it is useful to detect regression but also is a very good indicator of where we should spend time for optimization.

If you can add some, that would be great, but we can merge without if you want to create these later.

@saudet
Copy link
Contributor Author

saudet commented Jan 29, 2020

I ran the initTensorByArrays benchmark from pull #20 (comment) with JMH's default settings and here is the kind of result I am getting on my machine:

  • Before (manually written JNI code):
    Benchmark                        Mode  Cnt   Score   Error  Units
    MyBenchmark.initTensorByArrays  thrpt   25  93.895 ± 1.580  ops/s
    
  • After (JNI refactored into Java with JavaCPP):
    Benchmark                        Mode  Cnt    Score   Error  Units
    MyBenchmark.initTensorByArrays  thrpt   25  110.598 ± 1.012  ops/s
    

So it does look like this refactoring also increases performance!

@saudet saudet requested a review from karllessard January 29, 2020 07:16
Copy link
Collaborator

@karllessard karllessard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it does look like this refactoring also increases performance!

Great!

Copy link
Collaborator

@karllessard karllessard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, I'm merging this. There is a few unresolved conversation left that are not mandatory to be closed right now and could be part of another PR if needs be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants