[Java] jdbc:arrow-flight-sql executeQuery behaves incorrectly #38785

Closed
xinyiZzz opened this issue Nov 19, 2023 · 11 comments

@xinyiZzz

xinyiZzz commented Nov 19, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Arrow Version 13.0.0
I use the code below to connect to a Flight SQL server and execute a query. Two unexpected behaviors occur:

  1. acceptPutPreparedStatementUpdate of the Flight server is called, and flightStream.getRoot().getRowCount() == 0 is true, but the query is not an update statement.
  • Should acceptPutPreparedStatementUpdate really be called here?

For comparison, with the Python DB-API 2.0 cursor.execute(sql) and cursor.fetchallarrow(), acceptPutPreparedStatementUpdate is not called after createPreparedStatement.

  2. The FlightEndpoint returned by getFlightInfoPreparedStatement points to another Flight server, but the result is not fetched from that server. Instead, getStreamStatement of the current Flight server is called.
  • Why is the result not fetched from the other Flight server?
  • Why is getStreamStatement called instead of getStreamPreparedStatement?

Actual calling sequence:

  1. createPreparedStatement
  2. acceptPutPreparedStatementUpdate
  3. getFlightInfoPreparedStatement
  4. getStreamStatement
  5. closePreparedStatement

I'm looking for help, thanks!

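import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

Connection conn = null;
Statement stmt = null;
// Register the Arrow Flight SQL JDBC driver.
Class.forName("org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver");
try {
    conn = DriverManager.getConnection("jdbc:arrow-flight-sql://xx:xx?useServerPrepStmts=false&cachePrepStmts=true&useSSL=false&useEncryption=false", "root", "");

    String sql = "select * from clickbench.hits limit 50";
    stmt = conn.createStatement();
    // A SELECT issued through executeQuery(); this is the call that produces the sequence above.
    ResultSet rs = stmt.executeQuery(sql);
    // getRow() reports the current cursor position (0 before the first next()), not the row count.
    int rowNum = rs.getRow();
    System.out.println(rowNum);
    rs.close();
    stmt.close();
    conn.close();
} catch (SQLException e) {
    throw new RuntimeException(e);
}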

Component(s)

FlightRPC

@xinyiZzz xinyiZzz changed the title [Java] jdbc:arrow-flight-sql behaves incorrectly [Java] jdbc:arrow-flight-sql executeQuery behaves incorrectly Nov 19, 2023
@xinyiZzz
Author

[Screenshot not preserved] The FlightInfo endpoint in ArrowFlightSqlClientHandler.getStreams is correct.

@xinyiZzz
Author

xinyiZzz commented Nov 20, 2023

I upgraded to the latest release, 14.0.1, and the problem still exists.

It seems that the Location is not used when initializing the FlightStream?

@jduo
Member

jduo commented Nov 20, 2023

Hi @xinyiZzz,
It'd be great to create a separate issue for each problem.

For the problem with the Location not being used, this was reported in #34532 and recently fixed in #38521. This fix hasn't been released into a new artifact yet, though.

For problem 1, I'm not sure if I understand the issue being raised. The acceptPutPreparedStatementUpdate RPC is used when the query supplied is an update-type statement (and if that's the case, you would use either the execute() or executeUpdate() JDBC functions on Statement/PreparedStatement instead of executeQuery()). Or are you saying you are running a SELECT statement with executeQuery() and the driver is issuing the acceptPutPreparedStatementUpdate message instead?
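
To make the distinction between the JDBC entry points concrete, here is a minimal sketch that uses only the standard java.sql API. Statement.execute() accepts either kind of statement and reports whether it produced a result set, while executeQuery() and executeUpdate() each assume one kind. The class and method names (ExecuteDispatchSketch, run) are made up for illustration and are not part of the Arrow driver.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ExecuteDispatchSketch {
    /**
     * Runs any SQL through Statement.execute() and reports either the number
     * of rows read (for a query) or the update count (for an update).
     */
    public static String run(Statement stmt, String sql) throws SQLException {
        // execute() returns true when the first result is a ResultSet.
        boolean hasResultSet = stmt.execute(sql);
        if (hasResultSet) {
            int rows = 0;
            try (ResultSet rs = stmt.getResultSet()) {
                // Count rows by iterating; getRow() alone does not give a total.
                while (rs.next()) {
                    rows++;
                }
            }
            return "rows=" + rows;
        }
        // No ResultSet: treat the statement as an update and read its count.
        return "updated=" + stmt.getUpdateCount();
    }
}
```

When the statement is known to be a SELECT, calling executeQuery() directly, as in the report, is the usual choice; the sketch only shows how a caller can handle both cases through execute().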

@aiguofer
Contributor

One thing I noticed while working on this code is that the way we're identifying UPDATE statements is REALLY naive. If the server returns a null schema for the createPreparedStatement call, the driver thinks this is an UPDATE call. This could be your problem. Is the server correctly setting the schema?
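
The heuristic described above can be modeled in a few lines of plain Java. This is a simplified illustration, not the driver's actual code: the class name, the enum, and the use of a list of column names to stand in for the dataset schema are all assumptions made for the sketch.

```java
import java.util.List;

public class StatementTypeSketch {
    public enum StatementType { SELECT, UPDATE }

    /**
     * Classifies a prepared statement from the dataset schema the server
     * returned for createPreparedStatement. The schema is modeled here as a
     * list of column names (hypothetical stand-in for an Arrow Schema).
     */
    public static StatementType classify(List<String> datasetSchemaFields) {
        // A missing or empty dataset schema is read as "no result columns",
        // so the statement is treated as an update.
        if (datasetSchemaFields == null || datasetSchemaFields.isEmpty()) {
            return StatementType.UPDATE;
        }
        return StatementType.SELECT;
    }

    public static void main(String[] args) {
        // Column names are illustrative only.
        System.out.println(classify(List.of("UserID", "URL")));
        System.out.println(classify(null));
    }
}
```

Under this model, a server that returns no dataset schema for a SELECT would have that statement classified as an update, which would be consistent with the acceptPutPreparedStatementUpdate call seen in the report.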

@xinyiZzz
Author

xinyiZzz commented Nov 21, 2023

> Hi @xinyiZzz, It'd be great to create two separate Issues for each problem.

Thanks! OK, I'll split them up later.

> For the problem with the Location not being used, this was reported in #34532 and recently fixed in #38521. This fix hasn't been released into a new artifact yet though.

That's great! When will this fix be released in a new artifact? Or is there a nightly build I can use? I'd like to use it as soon as possible.

> For problem 1, I'm not sure if I understand the issue being raised. The acceptPutPreparedStatementUpdate RPC is used when the query supplied is an update-type statement (and if that's the case, you would use either the execute() or executeUpdate() JDBC functions on Statement/PreparedStatement instead of executeQuery()). Or are you saying you are running a SELECT statement with executeQuery() and the driver is issuing the acceptPutPreparedStatementUpdate message instead?

Yes, I am running a SELECT statement with executeQuery() and the driver is issuing the acceptPutPreparedStatementUpdate message instead. I think the cause is what @aiguofer described.

@xinyiZzz
Author

xinyiZzz commented Nov 21, 2023

> One thing I noticed while working on this code is that the way we're identifying UPDATE statements is REALLY naive. If the server returns a null schema for the createPreparedStatement call, the driver thinks this is an UPDATE call. This could be your problem. Is the server correctly setting the schema?

Great, that was it! I had passed DatasetSchema and ParameterSchema in the wrong order, and my ParameterSchema was null.

Thanks for your attention, @aiguofer. One more question: it seems that the DatasetSchema returned by createPreparedStatement is not really used? When I return a null DatasetSchema, Python ADBC queries still work normally.

@xinyiZzz
Author

> is there a nightly build that can be used

I found nightly builds and will try it:
https://arrow.apache.org/docs/developers/java/building.html#installing-nightly-packages

@xinyiZzz
Author

> I found nightly builds and will try it: https://arrow.apache.org/docs/developers/java/building.html#installing-nightly-packages

Thanks, @jduo! I used the 15.0.0-SNAPSHOT nightly build and the problem disappeared: getStream now requests the other endpoints.
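
For readers following the Location part of this thread, the rule a Flight client is expected to apply can be sketched in plain Java. Per the Flight protocol, an endpoint with an empty location list means the ticket is redeemed on the server that produced the FlightInfo, while a non-empty list names servers that can serve the data. The class name, method name, and host names below are hypothetical, and this is not the code from #38521.

```java
import java.net.URI;
import java.util.List;

public class EndpointLocationSketch {
    /**
     * Picks the server to fetch one result partition from.
     * endpointLocations models the locations listed in a FlightEndpoint;
     * currentServer is the server that returned the FlightInfo.
     */
    public static URI chooseLocation(List<URI> endpointLocations, URI currentServer) {
        // No locations listed: the data is served by the current server.
        if (endpointLocations == null || endpointLocations.isEmpty()) {
            return currentServer;
        }
        // Any listed location may serve the data; this sketch takes the first.
        return endpointLocations.get(0);
    }

    public static void main(String[] args) {
        URI current = URI.create("grpc://frontend-host:9090");  // hypothetical
        URI other = URI.create("grpc://backend-host:9091");     // hypothetical
        System.out.println(chooseLocation(List.of(other), current));
        System.out.println(chooseLocation(List.of(), current));
    }
}
```

The behavior reported in this issue corresponds to always taking the currentServer branch, even when the endpoint lists another server.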

@aiguofer
Contributor

The DatasetSchema is exposed to clients to do whatever they wish with it. They might need to know the schema ahead of having the results, for example.

As you noticed, it's also used to determine the type of query, which might affect how the query is executed. For example, when execute is called, the JDBC API doesn't say whether the statement is a query or an update, so the driver itself must work that out.

This does mean that in order for your server to meet the spec, you need to include a DatasetSchema.
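
As one example of a client wanting the schema before any rows arrive, standard JDBC lets a caller ask a PreparedStatement for its result metadata without executing it. The sketch below uses only java.sql; the class and method names are made up, and whether a given driver fills this metadata from the DatasetSchema is an assumption rather than something confirmed in this thread.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class SchemaBeforeExecuteSketch {
    /** Returns the result column names of a prepared statement without executing it. */
    public static List<String> columnNames(PreparedStatement ps) throws SQLException {
        List<String> names = new ArrayList<>();
        // getMetaData() may return null when the driver cannot describe the result.
        ResultSetMetaData meta = ps.getMetaData();
        if (meta == null) {
            return names;
        }
        // JDBC column indexes start at 1.
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            names.add(meta.getColumnName(i));
        }
        return names;
    }
}
```

A caller would obtain the statement with conn.prepareStatement(sql) and pass it to columnNames before calling executeQuery().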

@xinyiZzz
Author

Thanks! That solved all my problems.

@jduo
Member

jduo commented Nov 22, 2023

Thanks for the update, @xinyiZzz, and thanks, @aiguofer.
Closing as all questions have been resolved.

@jduo jduo closed this as completed Nov 22, 2023
yiguolei pushed a commit to apache/doris that referenced this issue Dec 7, 2023
Previously, Arrow was temporarily upgraded to the dev version 15.0.0-SNAPSHOT because the latest release, Arrow 14.0.1, has a bug that prevents jdbc:arrow-flight-sql from being used normally; see apache/arrow#38785.

However, Arrow 15.0.0-SNAPSHOT is not published to the Maven Central repository, and the network could not always be reached, so this reverts to Arrow 14.0.1. jdbc:arrow-flight-sql will be supported after upgrading to the Arrow 15.0.0 release.
XuJianxu pushed a commit to XuJianxu/doris that referenced this issue Dec 14, 2023