Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add core-api modules using JavaCPP #2

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
82 changes: 82 additions & 0 deletions core/core-api-platform/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>org.tensorflow</groupId>
<artifactId>parent-core</artifactId>
<version>2.0.0-SNAPSHOT</version>
</parent>
<artifactId>core-api-platform</artifactId>
<name>Core API Library Platform</name>

<properties>
<javacpp.moduleId>core-api</javacpp.moduleId>
</properties>

<dependencies>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>${javacpp.moduleId}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>${javacpp.moduleId}</artifactId>
<version>${project.version}</version>
<classifier>${javacpp.platform.linux-x86_64}</classifier>
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>${javacpp.moduleId}</artifactId>
<version>${project.version}</version>
<classifier>${javacpp.platform.macosx-x86_64}</classifier>
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>${javacpp.moduleId}</artifactId>
<version>${project.version}</version>
<classifier>${javacpp.platform.windows-x86_64}</classifier>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<executions>
<execution>
<id>default-jar</id>
<configuration>
<archive>
<manifestEntries>
<Class-Path>${javacpp.moduleId}.jar ${javacpp.moduleId}-linux-x86_64.jar ${javacpp.moduleId}-macosx-x86_64.jar ${javacpp.moduleId}-windows-x86_64.jar</Class-Path>
</manifestEntries>
</archive>
</configuration>
</execution>
<execution>
<id>empty-javadoc-jar</id>
<goals>
<goal>jar</goal>
</goals>
<configuration>
<classifier>javadoc</classifier>
</configuration>
</execution>
<execution>
<id>empty-sources-jar</id>
<goals>
<goal>jar</goal>
</goals>
<configuration>
<classifier>sources</classifier>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>

</project>
121 changes: 121 additions & 0 deletions core/core-api/build.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
#!/bin/bash
# Script to build native TensorFlow libraries
set -eu

KERNEL=(`uname -s | tr [A-Z] [a-z]`)
ARCH=(`uname -m | tr [A-Z] [a-z]`)
case $KERNEL in
darwin)
OS=macosx
;;
mingw32*)
OS=windows
KERNEL=windows
ARCH=x86
;;
mingw64*)
OS=windows
KERNEL=windows
ARCH=x86_64
;;
*)
OS=$KERNEL
;;
esac
case $ARCH in
arm*)
ARCH=arm
;;
aarch64*)
ARCH=arm64
;;
i386|i486|i586|i686)
ARCH=x86
;;
amd64|x86-64)
ARCH=x86_64
;;
esac
PLATFORM=$OS-$ARCH
EXTENSION=
echo "Detected platform \"$PLATFORM\""

while [[ $# > 0 ]]; do
case "$1" in
-platform=*)
PLATFORM="${1#-platform=}"
;;
-platform)
shift
PLATFORM="$1"
;;
-extension=*)
EXTENSION="${1#-extension=}"
;;
-extension)
shift
EXTENSION="$1"
;;
clean)
echo "Cleaning build"
rm -Rf build
;;
esac
shift
done

echo -n "Building for platform \"$PLATFORM\""
if [[ -n "$EXTENSION" ]]; then
echo -n " with extension \"$EXTENSION\""
fi
echo

export PYTHON_BIN_PATH=$(which python)
export USE_DEFAULT_PYTHON_LIB_PATH=1
export TF_ENABLE_XLA=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_ROCM=0
export TF_NEED_CUDA=0
export TF_DOWNLOAD_CLANG=0
export TF_NEED_MPI=0
export CC_OPT_FLAGS=-O3
export TF_SET_ANDROID_WORKSPACE=0

TENSORFLOW_VERSION=2.0.0-rc0

mkdir -p "build/$PLATFORM$EXTENSION"
cd "build/$PLATFORM$EXTENSION"

if [[ ! -e "tensorflow-$TENSORFLOW_VERSION.tar.gz" ]]; then
curl -L -o "tensorflow-$TENSORFLOW_VERSION.tar.gz" "https://github.com/tensorflow/tensorflow/archive/v$TENSORFLOW_VERSION.tar.gz"
fi
echo "48ddba718da76df56fd4c48b4bbf4f97f254ba269ec4be67f783684c75563ef8 tensorflow-$TENSORFLOW_VERSION.tar.gz" | sha256sum -c -

echo "Decompressing archives"
tar --totals -xzf tensorflow-$TENSORFLOW_VERSION.tar.gz
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous implementation, I've added Bazel WORKSPACE and BUILD files to configure the native build. TensorFlow Serving also use a similar approach. It has a few advantages, for example it does all the archive caching, fetching and sum check for you.

I would be curious to see if we can replace that whole bash script with a Bazel workspace and using Maven to move files around after building.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always hack something with Bazel I'm sure, but wasn't one of the goals to not use Bazel?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to Maven instead of Bazel where we can, so I think it’s totally ok to continue to build the native library using Bazel.

Copy link
Collaborator Author

@saudet saudet Sep 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have any choice for TF itself, but we don't need to add files like WORKSPACE and BUILD in this repository. Forcing users to build with Bazel prevents them from using preexisting binaries, if that's what they want to do. Other than having something a bit better than cURL, what other advantages do you see?

Copy link
Owner

@karllessard karllessard Sep 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forcing users to build with Bazel prevents them from using preexisting binaries

Who do you mean by “users”? Because only developers building from scratch the tensorflow-java repo will need bazel, anyone using binaries from Maven artifacts won’t.

Having a WORKSPACE and BUILD is only to help us configure the build of those binaries, on our side, instead of just invoking the bazel command from a shell script like you do now. Do you see something else?

Note that the bazel build, workspace or not, can still be controlled by Maven workflow, like here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get the hash from some well known URL rather than having it pasted in the build script? For similar reasons can we have the tensorflow version being built passed into this script via Maven (assuming that's the top level build system)? Otherwise we have multiple places that need to have the version number modified every time it changes.

Copy link
Collaborator Author

@saudet saudet Sep 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karllessard There are going to be users that are not satisfied with whatever we are going to publish to the central. I know that from experience mainly with OpenCV and FFmpeg. TF has so many options and so many APIs that can be enabled/disabled, and that's not even counting all the open forks, plus all proprietary extensions that companies are not releasing as open source. What I'm saying is that having a Bash script as entry point where we can bazel and/or curl whatever is much more inviting than some Bazel script that doesn't even support precompiled binaries to start with! How do you plan on serving those users with Bazel?

@Craigacp I don't know where GitHub offers hash for their downloads. Do you know? As for versions and what not, yes, we can make the build script dependent on Maven. Those are details at this point though.


# Assume Bazel is available in the path: https://www.tensorflow.org/install/source
cd tensorflow-$TENSORFLOW_VERSION

# Run the build for both the C and Java APIs
bash configure
bazel build --config opt //tensorflow:tensorflow //tensorflow/java:tensorflow
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again using a dedicated Bazel BUILD for our need, we can configure which targets we really need to build for Java instead of building the whole //tensorflow:tensorflowlibrary, which might contain more code than we need (to be validated though) and increase unnecessarily the size of the binaries.

Also, after we move the Java code to our repo, if only the generated ops source jar remains of interest here, you can simply build this target: //tensorflow/java:java_op_gen_sources(see here)

For reference, here those I was building in the JNI implementation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The //tensorflow:tensorflow target does nothing more than what is needed for the C API:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/BUILD#L578-L609
I'm not sure what you're talking about here?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I didn’t look by myself but yeah, looks pretty much only what we need. But anyway, if we use a Bazel BUILD file to configure our native build, we’ll be able to tweak it if needed (if for example TF adds a new dependency to a huge module that has no use/binding in Java).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always patch stuff for that, but I don't think it's a good idea to depend on Bazel for the reasons I mention above, unless there is something we really wouldn't be able to do any other way. It could be an option or something though.


# Normalize some paths with symbolic links
ln -sf tensorflow-$TENSORFLOW_VERSION ../tensorflow
ln -sf libtensorflow.so.${TENSORFLOW_VERSION%-*} bazel-bin/tensorflow/libtensorflow.so
ln -sf libtensorflow.so.${TENSORFLOW_VERSION%-*} bazel-bin/tensorflow/libtensorflow.so.2

# Gather Java source files from everywhere
mkdir -p ../java
cp -r tensorflow/java/src/gen/java/* ../java
cp -r tensorflow/java/src/main/java/* ../java
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the plan is to move this source code directly to the core-apifolder or should we keep them separated?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's just temporary, but if we're going to use Bazel anyway, is the plan to move the generator and everything into this repository?

Copy link
Owner

@karllessard karllessard Sep 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we want to build with Bazel only the code that this SIG don’t own/maintain. So the annotation processor should be own our side and we should use Maven to run it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but that's not everything. The plan is to continue maintaining the ops wrappers upstream until we have a Java equivalent in this repository, correct? In that case, we wouldn't need to use Bazel in this repository for that either.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that’s the plan! Don’t know when that’ll be ready though.

We can rush it if needed but might be safer to wait where Google is heading with ops registration in modular TF, see this RFC.

# cp -r tensorflow/contrib/android/java/* ../java
# cp -r tensorflow/lite/java/src/main/java/* ../java
cp -r bazel-genfiles/tensorflow/java/ops/src/main/java/* ../java
cp -r bazel-genfiles/tensorflow/java/_javac/tensorflow/libtensorflow_sourcegenfiles/* ../java
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use the annotation-processor agent, you won't need to copy those files anymore (under .../_javac/...)


# Work around loader bug in NativeLibrary.java
sed -i="" '/TensorFlow.version/d' ../java/org/tensorflow/NativeLibrary.java

cd ../..
Loading