Skip to content

Commit

Permalink
[Java] Refactor Java cookbooks to switch from JShell to Maven (#350)
Browse files Browse the repository at this point in the history
In #347 we found the way
we have been running cookbooks for Java (JShell) doesn't work well with
JPMS which was introduced in Arrow 16. This refactors `javadoctest.py`
to run examples directly with Maven using `exec:java` instead of with
JShell. This PR also bumps the Java source/target version to 11 to fix
some compiler errors and fixes a few compilation errors in cookbook
code.

I ran into one snag will require a follow-up commit to this PR: The way
the examples in
[substrait.rst](https://raw.githubusercontent.com/apache/arrow-cookbook/main/java/source/substrait.rst)
are written doesn't work with my approach. My approach splits each code
snippet into its `import` statements and non-`import` statements, puts
the imports outside the main class definition and puts the non-imports
inside the class's main method. This works fine for every example except
[substrait.rst](https://raw.githubusercontent.com/apache/arrow-cookbook/main/java/source/substrait.rst)
which needs some of its code to be defined in the main class, e.g.,

We probably generally want to support examples that need this so I think
we may need to rewrite all the Java cookbooks to have an explicit main
class. @lidavidm suspected this might be the case in
#347 (comment)
but I do wonder if there is still a way to avoid this. Any ideas
welcome.

Fixes #347
Related #348
  • Loading branch information
amoeba authored May 23, 2024
1 parent a43e757 commit 51e2880
Show file tree
Hide file tree
Showing 8 changed files with 366 additions and 235 deletions.
34 changes: 20 additions & 14 deletions java/CONTRIBUTING.rst
Original file line number Diff line number Diff line change
Expand Up @@ -30,22 +30,13 @@ install needed packages via pip, to build the Java cookbook. The
dependency packages managed via pip by build scripts are found at
`requirements.txt <requirements.txt>`_.

Java Shell
^^^^^^^^^^^^^^^^^^^^^^^^^
For Java cookbook we are running these with Java Shell tool -
`JShell <https://docs.oracle.com/en/java/javase/11/jshell/introduction-jshell.html>`_

.. code-block:: bash
> java --version
java 11.0.14 2022-01-18 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.14+8-LTS-263)
.. code-block:: bash
Java
^^^^

> jshell --version
jshell 11.0.14
The Java cookbooks require:

- Java JDK (11+)
- Maven

Build Process
-------------------------
Expand Down Expand Up @@ -81,6 +72,21 @@ additional ``.rst`` files in the ``source`` directory. You just
need to remember to add them to the ``index.rst`` file in the
``toctree`` for them to become visible.

When run, Java code snippets are wrapped in a simple main class

.. code-block:: java
// Any imports get put here
public class Example {
public static void main (String[] args) {
// Your code gets inserted here
}
}
If your code is more complicated, you can explicitly define ``public class Example``,
the above wrapping won't happen and the code will be run as-is.

Java Sphinx Directive
=====================

Expand Down
143 changes: 99 additions & 44 deletions java/ext/javadoctest.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,16 @@
import pathlib
import subprocess
from typing import Any, Dict
import tempfile
import shutil

from sphinx.ext.doctest import (DocTestBuilder, TestcodeDirective,
TestoutputDirective, doctest, sphinx)
from sphinx.ext.doctest import (
DocTestBuilder,
TestcodeDirective,
TestoutputDirective,
doctest,
sphinx,
)
from sphinx.locale import __


Expand All @@ -34,6 +41,10 @@ def run(self):
class JavaDocTestBuilder(DocTestBuilder):
"""
Runs java test snippets in the documentation.
The code in each testcode block is insert into a template Maven project,
run through exec:java, its output captured and post-processed, and finally
compared to whatever's in the corresponding testoutput.
"""

name = "javadoctest"
Expand All @@ -45,46 +56,49 @@ class JavaDocTestBuilder(DocTestBuilder):
def compile(
self, code: str, name: str, type: str, flags: Any, dont_inherit: bool
) -> Any:
# go to project that contains all your arrow maven dependencies
path_arrow_project = pathlib.Path(__file__).parent.parent / "source" / "demo"
# create list of all arrow jar dependencies
subprocess.check_call(
[
"mvn",
"-q",
"dependency:build-classpath",
"-DincludeTypes=jar",
"-Dmdep.outputFile=.cp.tmp",
f"-Darrow.version={self.env.config.version}",
],
cwd=path_arrow_project,
text=True,
)
if not (path_arrow_project / ".cp.tmp").exists():
raise RuntimeError(
__("invalid process to create jshell dependencies library")
source_dir = pathlib.Path(__file__).parent.parent / "source" / "demo"

with tempfile.TemporaryDirectory() as project_dir:
shutil.copytree(source_dir, project_dir, dirs_exist_ok=True)

template_file_path = (
pathlib.Path(project_dir)
/ "src"
/ "main"
/ "java"
/ "org"
/ "example"
/ "Example.java"
)

# get list of all arrow jar dependencies
with open(path_arrow_project / ".cp.tmp") as f:
stdout_dependency = f.read()
if not stdout_dependency:
raise RuntimeError(
__("invalid process to list jshell dependencies library")
with open(template_file_path, "r") as infile:
template = infile.read()

filled_template = self.fill_template(template, code)

with open(template_file_path, "w") as outfile:
outfile.write(filled_template)

# Support JPMS (since Arrow 16)
modified_env = os.environ.copy()
modified_env["_JAVA_OPTIONS"] = "--add-opens=java.base/java.nio=ALL-UNNAMED"

test_proc = subprocess.Popen(
[
"mvn",
"-f",
project_dir,
"compile",
"exec:java",
"-Dexec.mainClass=org.example.Example",
],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
text=True,
env=modified_env,
)

# execute java testing code thru jshell and read output
# JDK11 support '-' This allows the pipe to work as expected without requiring a shell
# Migrating to /dev/stdin to also support JDK9+
proc_jshell_process = subprocess.Popen(
["jshell", "-R--add-opens=java.base/java.nio=ALL-UNNAMED", "--class-path", stdout_dependency, "-s", "/dev/stdin"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
text=True,
)
out_java_arrow, err_java_arrow = proc_jshell_process.communicate(code)
if err_java_arrow:
raise RuntimeError(__("invalid process to run jshell"))
out_java_arrow, err_java_arrow = test_proc.communicate()

# continue with python logic code to do java output validation
output = f"print('''{self.clean_output(out_java_arrow)}''')"
Expand All @@ -93,12 +107,53 @@ def compile(
return compile(output, name, self.type, flags, dont_inherit)

def clean_output(self, output: str):
if output[-3:] == '-> ':
output = output[:-3]
if output[-1:] == '\n':
output = output[:-1]
output = (4*' ').join(output.split('\t'))
return output
lines = output.split("\n")

# Remove log lines from output
lines = [l for l in lines if not l.startswith("[INFO]")]
lines = [l for l in lines if not l.startswith("[WARNING]")]
lines = [l for l in lines if not l.startswith("Download")]
lines = [l for l in lines if not l.startswith("Progress")]

result = "\n".join(lines)

# Sometimes the testoutput content is smushed onto the same line as
# following log line, probably just when the testcode code doesn't print
# its own final newline. This example is the only case I found so I
# didn't pull out the re module (i.e., this works)
result = result.replace(
"[INFO] ------------------------------------------------------------------------",
"",
)

# Convert all tabs to 4 spaces, Sphinx seems to eat tabs even if we
# explicitly put them in the testoutput block so we instead modify
# the output
result = (4 * " ").join(result.split("\t"))

return result.strip()

def fill_template(self, template, code):
# Detect the special case where cookbook code is already wrapped in a
# class and just use the code as-is without wrapping it up
if code.find("public class Example") >= 0:
return template + code

# Split input code into imports and not-imports
lines = code.split("\n")
code_imports = [l for l in lines if l.startswith("import")]
code_rest = [l for l in lines if not l.startswith("import")]

pieces = [
template,
"\n".join(code_imports),
"\n\npublic class Example {\n public static void main(String[] args) {\n",
"\n".join(code_rest),
" }\n}",
]

return "\n".join(pieces)


def setup(app) -> Dict[str, Any]:
app.add_directive("testcode", JavaTestcodeDirective)
Expand Down
1 change: 1 addition & 0 deletions java/source/dataset.rst
Original file line number Diff line number Diff line change
Expand Up @@ -344,6 +344,7 @@ In case we need to project only certain columns we could configure ScanOptions w
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowReader;
import java.util.Optional;

String uri = "file:" + System.getProperty("user.dir") + "/thirdpartydeps/parquetfiles/data1.parquet";
String[] projection = new String[] {"name"};
Expand Down
16 changes: 14 additions & 2 deletions java/source/demo/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,13 @@
<version>1.6.1</version>
</extension>
</extensions>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>3.0.0</version>
</plugin>
</plugins>
</build>
<repositories>
<repository>
Expand All @@ -32,8 +39,8 @@
</repository>
</repositories>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>
<arrow.version>15.0.2</arrow.version>
</properties>
<dependencies>
Expand Down Expand Up @@ -107,5 +114,10 @@
<artifactId>core</artifactId>
<version>0.26.0</version>
</dependency>
<dependency>
<groupId>org.apache.calcite</groupId>
<artifactId>calcite-core</artifactId>
<version>1.37.0</version>
</dependency>
</dependencies>
</project>
17 changes: 17 additions & 0 deletions java/source/demo/src/main/java/org/example/Example.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package org.example;
1 change: 1 addition & 0 deletions java/source/flight.rst
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@ Flight Client and Server
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;

class Dataset implements AutoCloseable {
Expand Down
4 changes: 4 additions & 0 deletions java/source/schema.rst
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,8 @@ Fields are used to denote the particular columns of tabular data.
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import java.util.ArrayList;
import java.util.List;

FieldType intType = new FieldType(true, new ArrowType.Int(32, true), null);
FieldType listType = new FieldType(true, new ArrowType.List(), null);
Expand Down Expand Up @@ -118,6 +120,8 @@ In case we need to add metadata to our Field we could use:
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import java.util.HashMap;
import java.util.Map;

Map<String, String> metadata = new HashMap<>();
metadata.put("A", "Id card");
Expand Down
Loading

0 comments on commit 51e2880

Please sign in to comment.