Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid loading table properties during query analysis #6745

Closed

Conversation

ssheikin
Copy link
Contributor

@ssheikin ssheikin commented Jan 28, 2021

Fixes #6477

@cla-bot cla-bot bot added the cla-signed label Jan 28, 2021
@ssheikin ssheikin marked this pull request as draft January 28, 2021 09:07
@ssheikin
Copy link
Contributor Author

This code is not tested yet. @kokosing @Praveen2112 could you please review?

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it.

Can you please verify INSERT, DELETE and UPDATE if they are doing the same and if we can apply the same?

Also can you please add a test (like io.trino.tests.TestInformationSchemaConnector#testMetadataCalls) where we could verify that given method is not called for SELECT?

Map<String, ColumnHandle> columnHandles = metadata.getColumnHandles(session, tableHandle);
for (ColumnMetadata column : tableMetadata.getColumns()) {
for (ColumnHandle columnHandle : columnHandles.values()) {
ColumnMetadata column = metadata.getColumnMetadata(session, tableHandle, columnHandle);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it usually call any metadata service to get column metadata? Please check HiveMetadata and JdbcMetadata

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to maintain the order of the columns ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to maintain the order of the columns ?

Yes. And this requires that getColumnMetadata returns ImmutableMap or LinkedHashMap. Some connectors even rely on this.
io.trino.plugin.mongodb.MongoMetadata#getTableMetadata(io.trino.spi.connector.ConnectorSession, io.trino.spi.connector.SchemaTableName)
I've one idea to modify returned type of io.trino.spi.connector.ConnectorMetadata#getColumnHandles from Map to ImmutableMap or at least update javadoc.
I'm investigating it further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are even test failures for some other connectors:

Error:  blackHoleConnectorUsage(io.trino.plugin.blackhole.TestBlackHoleSmoke)  Time elapsed: 6.063 s  <<< FAILURE!
java.lang.RuntimeException: Insert query has mismatched column types: Table: [bigint, bigint, varchar(25), varchar(152)], Query: [bigint, varchar(25), bigint, varchar(152)]

Caused by: io.trino.spi.TrinoException: Insert query has mismatched column types: Table: [bigint, bigint, varchar(25), varchar(152)], Query: [bigint, varchar(25), bigint, varchar(152)]
	at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:48)
	at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:43)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitInsert(StatementAnalyzer.java:461)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've one idea to modify returned type of io.trino.spi.connector.ConnectorMetadata#getColumnHandles from Map to ImmutableMap or at least update javadoc.

The contract of this method is to return mapping, not to maintain the order, so modifying return type won't help unless you make sure that the map is always built accordingly to the expected order.
It seems that currently there is no other way to get ordered columns than via table metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contract of this method is to return mapping, not to maintain the order

The former does not forbid the latter.

unless you make sure that the map is always built accordingly to the expected order.

Exactly.

modify returned type of io.trino.spi.connector.ConnectorMetadata#getColumnHandles from Map to ImmutableMap

It will require adding

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
        </dependency>

to
trino/core/trino-spi/pom.xml

I'll copy javadoc for com.google.common.collect.ImmutableCollection here:

"Interfaces", not implementations
These are classes instead of interfaces to prevent external subtyping, but should be thought of as interfaces in every important sense. Each public class such as ImmutableSet is a type offering meaningful behavioral guarantees. This is substantially different from the case of (say) HashSet, which is an implementation, with semantics that were largely defined by its supertype.
For field types and method return types, you should generally use the immutable type (such as ImmutableList) instead of the general collection interface type (such as List). This communicates to your callers all of the semantic guarantees listed above, which is almost always very useful information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't add Guava as a dependency to the SPI. Different plugins may depend on different versions of Guava that are incompatible with each other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we could have an API that only fetches the List<ColumnMetadata> -> default implementation would be based on ConnectorMetadata#getColumns and each Connector could have it own implementation if needs to lazily load the properties

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it usually call any metadata service to get column metadata? Please check HiveMetadata and JdbcMetadata

No. More detailed answer is here: #6745 (comment)

@@ -1261,8 +1260,6 @@ protected Scope visitTable(Table table, Optional<Scope> scope)
Optional.of(name),
Optional.of(column.getName()),
false);
ColumnHandle columnHandle = columnHandles.get(column.getName());
checkArgument(columnHandle != null, "Unknown field %s", field);
Copy link
Contributor Author

@ssheikin ssheikin Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kokosing @Praveen2112 @kasiafi I think this check is redundant. Wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is redundant. It should never happen as otherwise we should raise TrinoException here. It is just defensive programming. You could replace it with requireNonNull.

Copy link
Contributor Author

@ssheikin ssheikin Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I'll just remove these lines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't remove it.

@@ -1250,9 +1249,9 @@ protected Scope visitTable(Table table, Optional<Scope> scope)
ImmutableMap.Builder<ColumnHandle, Field> fields = ImmutableMap.builder();

// TODO: discover columns lazily based on where they are needed (to support connectors that can't enumerate all tables)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kokosing @Praveen2112 @kasiafi Any idea what does this comment mean?
Maybe some examples?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a case that connector is not able to enumerate columns because they are dynamic. But if you ask for a column then it could generate it for you.

I guess we don't have such connectors yet, and in case we would have them I guess there is more work than just addressing this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

@ssheikin ssheikin force-pushed the ssheikin/7/table-properties-loading branch from b098cd8 to 058c98f Compare January 28, 2021 13:00
List<ColumnMetadata> columns = tableMetadata.getColumns().stream()
List<ColumnMetadata> columns = metadata.getColumnHandles(session, targetTableHandle.get())
.values().stream()
.map(columnHandle -> metadata.getColumnMetadata(session, targetTableHandle.get(), columnHandle))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's O(columns) calls into connector metadata now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we use CachingJdbcClient to reduce amount of calls?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That is correct. My understanding behind that I liked it at first is that it do not actually reach the real metadata, as it could be possibly to take that information just of of table handle.

Hence my comment: #6745 (comment)

Anyway, I think we need a dedicated method in SPI. This is just hacking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we use CachingJdbcClient to reduce amount of calls?

@ssheikin JdbcMetadata#getColumnMetadata is O(1) and does not even reach CachingJdbcClient
But this is generic code, and needs to work great for every connector

Copy link
Contributor Author

@ssheikin ssheikin Jan 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it usually call any metadata service to get column metadata? Please check HiveMetadata and JdbcMetadata

No I didn't find any implementation which calls some metadata service.
In most cases it has simple implementation:

    @Override
    public ColumnMetadata getColumnMetadata(ConnectorSession session, ConnectorTableHandle tableHandle, ColumnHandle columnHandle)
    {
        return ((JdbcColumnHandle) columnHandle).getColumnMetadata();
    }

it do not actually reach the real metadata, as it could be possibly to take that information just of of table handle.

Seems it is the case.

@ssheikin ssheikin force-pushed the ssheikin/7/table-properties-loading branch from 058c98f to 968fbc8 Compare January 28, 2021 19:27
@martint martint self-requested a review January 28, 2021 19:32
@Praveen2112
Copy link
Member

This PR fixes it for SELECT type of queries but how do we handle them for INSERT ?

@ssheikin ssheikin force-pushed the ssheikin/7/table-properties-loading branch from 5cc509d to 7d49e41 Compare February 3, 2021 09:30
@ssheikin ssheikin closed this Apr 15, 2021
@ssheikin ssheikin deleted the ssheikin/7/table-properties-loading branch October 24, 2022 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Avoid loading table properties during query analysis
6 participants