-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid loading table properties during query analysis #6745
Avoid loading table properties during query analysis #6745
Conversation
This code is not tested yet. @kokosing @Praveen2112 could you please review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it.
Can you please verify INSERT, DELETE and UPDATE if they are doing the same and if we can apply the same?
Also can you please add a test (like io.trino.tests.TestInformationSchemaConnector#testMetadataCalls
) where we could verify that given method is not called for SELECT?
core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java
Show resolved
Hide resolved
Map<String, ColumnHandle> columnHandles = metadata.getColumnHandles(session, tableHandle); | ||
for (ColumnMetadata column : tableMetadata.getColumns()) { | ||
for (ColumnHandle columnHandle : columnHandles.values()) { | ||
ColumnMetadata column = metadata.getColumnMetadata(session, tableHandle, columnHandle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it usually call any metadata service to get column metadata? Please check HiveMetadata
and JdbcMetadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to maintain the order of the columns ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to maintain the order of the columns ?
Yes. And this requires that getColumnMetadata
returns ImmutableMap or LinkedHashMap. Some connectors even rely on this.
io.trino.plugin.mongodb.MongoMetadata#getTableMetadata(io.trino.spi.connector.ConnectorSession, io.trino.spi.connector.SchemaTableName)
I've one idea to modify returned type of io.trino.spi.connector.ConnectorMetadata#getColumnHandles from Map to ImmutableMap or at least update javadoc.
I'm investigating it further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are even test failures for some other connectors:
Error: blackHoleConnectorUsage(io.trino.plugin.blackhole.TestBlackHoleSmoke) Time elapsed: 6.063 s <<< FAILURE!
java.lang.RuntimeException: Insert query has mismatched column types: Table: [bigint, bigint, varchar(25), varchar(152)], Query: [bigint, varchar(25), bigint, varchar(152)]
Caused by: io.trino.spi.TrinoException: Insert query has mismatched column types: Table: [bigint, bigint, varchar(25), varchar(152)], Query: [bigint, varchar(25), bigint, varchar(152)]
at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:48)
at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:43)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitInsert(StatementAnalyzer.java:461)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've one idea to modify returned type of io.trino.spi.connector.ConnectorMetadata#getColumnHandles from Map to ImmutableMap or at least update javadoc.
The contract of this method is to return mapping, not to maintain the order, so modifying return type won't help unless you make sure that the map is always built accordingly to the expected order.
It seems that currently there is no other way to get ordered columns than via table metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contract of this method is to return mapping, not to maintain the order
The former does not forbid the latter.
unless you make sure that the map is always built accordingly to the expected order.
Exactly.
modify returned type of io.trino.spi.connector.ConnectorMetadata#getColumnHandles from Map to ImmutableMap
It will require adding
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</dependency>
to
trino/core/trino-spi/pom.xml
I'll copy javadoc for com.google.common.collect.ImmutableCollection
here:
"Interfaces", not implementations
These are classes instead of interfaces to prevent external subtyping, but should be thought of as interfaces in every important sense. Each public class such as ImmutableSet is a type offering meaningful behavioral guarantees. This is substantially different from the case of (say) HashSet, which is an implementation, with semantics that were largely defined by its supertype.
For field types and method return types, you should generally use the immutable type (such as ImmutableList) instead of the general collection interface type (such as List). This communicates to your callers all of the semantic guarantees listed above, which is almost always very useful information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't add Guava as a dependency to the SPI. Different plugins may depend on different versions of Guava that are incompatible with each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we could have an API that only fetches the List<ColumnMetadata>
-> default implementation would be based on ConnectorMetadata#getColumns
and each Connector could have it own implementation if needs to lazily load the properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it usually call any metadata service to get column metadata? Please check HiveMetadata and JdbcMetadata
No. More detailed answer is here: #6745 (comment)
@@ -1261,8 +1260,6 @@ protected Scope visitTable(Table table, Optional<Scope> scope) | |||
Optional.of(name), | |||
Optional.of(column.getName()), | |||
false); | |||
ColumnHandle columnHandle = columnHandles.get(column.getName()); | |||
checkArgument(columnHandle != null, "Unknown field %s", field); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kokosing @Praveen2112 @kasiafi I think this check is redundant. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is redundant. It should never happen as otherwise we should raise TrinoException
here. It is just defensive programming. You could replace it with requireNonNull
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I'll just remove these lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't remove it.
@@ -1250,9 +1249,9 @@ protected Scope visitTable(Table table, Optional<Scope> scope) | |||
ImmutableMap.Builder<ColumnHandle, Field> fields = ImmutableMap.builder(); | |||
|
|||
// TODO: discover columns lazily based on where they are needed (to support connectors that can't enumerate all tables) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kokosing @Praveen2112 @kasiafi Any idea what does this comment mean?
Maybe some examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be a case that connector is not able to enumerate columns because they are dynamic. But if you ask for a column then it could generate it for you.
I guess we don't have such connectors yet, and in case we would have them I guess there is more work than just addressing this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
b098cd8
to
058c98f
Compare
List<ColumnMetadata> columns = tableMetadata.getColumns().stream() | ||
List<ColumnMetadata> columns = metadata.getColumnHandles(session, targetTableHandle.get()) | ||
.values().stream() | ||
.map(columnHandle -> metadata.getColumnMetadata(session, targetTableHandle.get(), columnHandle)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's O(columns) calls into connector metadata now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we use CachingJdbcClient
to reduce amount of calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That is correct. My understanding behind that I liked it at first is that it do not actually reach the real metadata, as it could be possibly to take that information just of of table handle.
Hence my comment: #6745 (comment)
Anyway, I think we need a dedicated method in SPI. This is just hacking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we use CachingJdbcClient to reduce amount of calls?
@ssheikin JdbcMetadata#getColumnMetadata
is O(1) and does not even reach CachingJdbcClient
But this is generic code, and needs to work great for every connector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it usually call any metadata service to get column metadata? Please check HiveMetadata and JdbcMetadata
No I didn't find any implementation which calls some metadata service.
In most cases it has simple implementation:
@Override
public ColumnMetadata getColumnMetadata(ConnectorSession session, ConnectorTableHandle tableHandle, ColumnHandle columnHandle)
{
return ((JdbcColumnHandle) columnHandle).getColumnMetadata();
}
it do not actually reach the real metadata, as it could be possibly to take that information just of of table handle.
Seems it is the case.
058c98f
to
968fbc8
Compare
This PR fixes it for |
getColumnMetadata()
consistent with tableMetadata.getColumns()
in Elasticsearch connector
#6795
5cc509d
to
7d49e41
Compare
Fixes #6477