Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow appender to accept byte[] #107

Merged
merged 3 commits into from
Dec 13, 2024

Conversation

mach-kernel
Copy link
Contributor

Greetings!

We're writing a large amount of binary data and wanted to (for performance) eliminate a copy/serialization step that took our Java byte[] and made them in to a hex code escaped string. To achieve this with the JDBC driver, we:

  • Added a DuckDBAppender#append method that accepts byte[]
  • Made the corresponding JNI call treat the value as BLOB_RAW
    • In Java: without getBytes and an encoding
    • In CXX: without going from std::string -> cstr
  • Added a test to ensure byte[] can be round-tripped losslessly.

To better illustrate the issue, a modified version of this test fails on trunk (using a deprecated String constructor that AFAICT does not apply any encoding) :

    public static void test_appender_roundtrip_blob() throws Exception {
        DuckDBConnection conn = DriverManager.getConnection(JDBC_URL).unwrap(DuckDBConnection.class);
        Statement stmt = conn.createStatement();

        stmt.execute("CREATE TABLE data (a BLOB)");

        DuckDBAppender appender = conn.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "data");
        SecureRandom random = SecureRandom.getInstanceStrong();
        byte[] data = new byte[512];
        random.nextBytes(data);

        appender.beginRow();
        // This does not apply a charset encoding
        appender.append(new String(data, 0));
        appender.endRow();
        appender.flush();
        appender.close();

        ResultSet results = stmt.executeQuery("SELECT * FROM data");
        assertTrue(results.next());

        byte[] resultBlob = results.getBlob(1).getBinaryStream().readAllBytes();
        assertTrue(Arrays.equals(resultBlob, data), "byte[] data is round tripped untouched");

        results.close();
        stmt.close();
        conn.close();
    }
TestDuckDBJDBC#test_appender_roundtrip_blob failed with java.sql.SQLException: Invalid Input Error: Failed to cast value: Invalid byte encountered in STRING örÇoÀùuãÅUïϾ1è
java.sql.SQLException: Invalid Input Error: Failed to cast value: Invalid byte encountered in STRING -> BLOB conversion of string "#íiæþb­ð}´éA¡a
 þ;·p°g ,Çüó[,@Ä9hMÑú×zÇ       ý2?)8Lâ¦PôÞIõ!»C\         Y". All non-ascii characters must be escaped with hex codes (e.g. \xAA)
        at org.duckdb.DuckDBNative.duckdb_jdbc_appender_append_string(Native Method)
        at org.duckdb.DuckDBAppender.append(DuckDBAppender.java:84)
        at org.duckdb.TestDuckDBJDBC.test_appender_roundtrip_blob(TestDuckDBJDBC.java:2775)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at org.duckdb.test.Runner.runTests(Runner.java:45)
        at org.duckdb.TestDuckDBJDBC.main(TestDuckDBJDBC.java:4637)

@Mause
Copy link
Member

Mause commented Dec 10, 2024

Heya, thanks for submitting this! Are you able to look into the build failure? Looks like you're using a method that isn't available in Java 8

@mach-kernel
Copy link
Contributor Author

Sure!! Just pushed up a fix, let me know if it doesn't work. 😸

@Mause Mause merged commit 998d0e4 into duckdb:main Dec 13, 2024
7 checks passed
@Mause
Copy link
Member

Mause commented Dec 13, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants