Improve Redshift #15365
Changes from all commits: 404ef98, d3355ab, 0137647, 7a0025c, 6b1c3c6, d83ee1c
@@ -0,0 +1,20 @@
# Redshift Connector

To run the Redshift tests you will need to provision a Redshift cluster. The
tests are designed to run on the smallest possible Redshift cluster, which
is a single dc2.large instance. Additionally, you will need an S3 bucket
containing TPCH tiny data in Parquet format. The files should be named:

```
s3://<your_bucket>/tpch/tiny/<table_name>.parquet
```

To run the tests set the following system properties:

```
test.redshift.jdbc.endpoint=<your_endpoint>.<your_region>.redshift.amazonaws.com:5439/
test.redshift.jdbc.user=<username>
test.redshift.jdbc.password=<password>
test.redshift.s3.tpch.tables.root=<your_bucket>
test.redshift.iam.role=<your_iam_arn_to_access_bucket>
```
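For orientation, here is a minimal, hypothetical sketch of how test code can consume these JVM system properties. Only the property names come from the README above; the helper class, its name, and its messages are illustrative and not part of this PR.

```java
import static java.util.Objects.requireNonNull;

// Hypothetical helper, for illustration only: reads the properties listed above
// from JVM system properties (for example, passed with -D to the test JVM).
final class RedshiftTestProperties
{
    private RedshiftTestProperties() {}

    static String requiredProperty(String name)
    {
        // Fail fast with a descriptive message when a property is missing
        return requireNonNull(System.getProperty(name), () -> "System property not set: " + name);
    }

    public static void main(String[] args)
    {
        String endpoint = requiredProperty("test.redshift.jdbc.endpoint");
        String user = requiredProperty("test.redshift.jdbc.user");
        String bucket = requiredProperty("test.redshift.s3.tpch.tables.root");
        System.out.println("Would connect to " + endpoint + " as " + user + ", reading TPCH data from s3://" + bucket);
    }
}
```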
@@ -23,12 +23,32 @@
            <artifactId>trino-base-jdbc</artifactId>
        </dependency>

        <dependency>
            <groupId>io.trino</groupId>
            <artifactId>trino-matching</artifactId>
        </dependency>

        <dependency>
            <groupId>io.trino</groupId>
            <artifactId>trino-plugin-toolkit</artifactId>
        </dependency>

        <dependency>
            <groupId>io.airlift</groupId>
            <artifactId>configuration</artifactId>
        </dependency>

        <dependency>
            <groupId>com.amazon.redshift</groupId>
            <artifactId>redshift-jdbc42</artifactId>
            <version>2.1.0.9</version>
        </dependency>

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
        </dependency>

        <dependency>
            <groupId>com.google.inject</groupId>
            <artifactId>guice</artifactId>

@@ -39,10 +59,27 @@
            <artifactId>javax.inject</artifactId>
        </dependency>

        <dependency>
            <groupId>org.jdbi</groupId>
            <artifactId>jdbi3-core</artifactId>
        </dependency>

        <!-- used by tests but also needed transitively -->
        <dependency>
            <groupId>io.airlift</groupId>
            <artifactId>log</artifactId>
            <scope>runtime</scope>
        </dependency>

        <dependency>
            <groupId>io.airlift</groupId>
            <artifactId>log-manager</artifactId>
            <scope>runtime</scope>
        </dependency>

        <dependency>
            <groupId>net.jodah</groupId>
            <artifactId>failsafe</artifactId>
            <scope>runtime</scope>
        </dependency>

@@ -72,16 +109,91 @@
        </dependency>

        <!-- for testing -->
        <dependency>
            <groupId>io.trino</groupId>
            <artifactId>trino-base-jdbc</artifactId>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>io.trino</groupId>
            <artifactId>trino-main</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>io.trino</groupId>
            <artifactId>trino-main</artifactId>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>io.trino</groupId>
            <artifactId>trino-testing</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>io.trino</groupId>
            <artifactId>trino-testing-services</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>io.trino</groupId>
            <artifactId>trino-tpch</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>io.trino.tpch</groupId>
            <artifactId>tpch</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>io.airlift</groupId>
            <artifactId>testing</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.assertj</groupId>
            <artifactId>assertj-core</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.testng</groupId>
            <artifactId>testng</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <profiles>
        <profile>
            <id>default</id>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-surefire-plugin</artifactId>
                        <configuration>
                            <excludes>
                                <exclude>**/TestRedshiftAutomaticJoinPushdown.java</exclude>
                                <exclude>**/TestRedshiftConnectorTest.java</exclude>
                                <exclude>**/TestRedshiftTableStatisticsReader.java</exclude>
                                <exclude>**/TestRedshiftTypeMapping.java</exclude>
                            </excludes>
Review comment on lines +187 to +192: no profile which includes it? How can I run the tests manually (outside of IDE)?
                        </configuration>
                    </plugin>
                </plugins>
            </build>
        </profile>
    </profiles>
</project>
@@ -0,0 +1,26 @@
/*
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package io.trino.plugin.redshift;

import io.trino.plugin.jdbc.aggregation.BaseImplementAvgBigint;

public class ImplementRedshiftAvgBigint
        extends BaseImplementAvgBigint
{
    @Override
    protected String getRewriteFormatExpression()
    {
        return "avg(CAST(%s AS double precision))";
    }
}
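The rewrite above casts the bigint input to double precision before applying avg. As a quick standalone illustration (not connector code) of why the cast matters: exact integer division drops the fractional part of the mean, while Trino's avg over bigint produces a double.

```java
// Standalone illustration (not connector code): why the bigint input is cast
// to double precision before avg is pushed down to Redshift.
public class AvgBigintCastIllustration
{
    public static void main(String[] args)
    {
        long sum = 7;
        long count = 2;

        // Exact integer division truncates the mean: prints 3
        System.out.println(sum / count);

        // Averaging in a floating-point type keeps the fraction: prints 3.5,
        // in line with avg(bigint) producing a double in Trino
        System.out.println((double) sum / count);
    }
}
```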
@@ -0,0 +1,75 @@
/*
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package io.trino.plugin.redshift;

import io.trino.matching.Capture;
import io.trino.matching.Captures;
import io.trino.matching.Pattern;
import io.trino.plugin.base.aggregation.AggregateFunctionRule;
import io.trino.plugin.jdbc.JdbcColumnHandle;
import io.trino.plugin.jdbc.JdbcExpression;
import io.trino.spi.connector.AggregateFunction;
import io.trino.spi.expression.Variable;
import io.trino.spi.type.DecimalType;

import java.util.Optional;

import static com.google.common.base.Verify.verify;
import static io.trino.matching.Capture.newCapture;
import static io.trino.plugin.base.aggregation.AggregateFunctionPatterns.basicAggregation;
import static io.trino.plugin.base.aggregation.AggregateFunctionPatterns.functionName;
import static io.trino.plugin.base.aggregation.AggregateFunctionPatterns.singleArgument;
import static io.trino.plugin.base.expression.ConnectorExpressionPatterns.type;
import static io.trino.plugin.base.expression.ConnectorExpressionPatterns.variable;
import static io.trino.plugin.redshift.RedshiftClient.REDSHIFT_MAX_DECIMAL_PRECISION;
import static java.lang.String.format;

public class ImplementRedshiftAvgDecimal
        implements AggregateFunctionRule<JdbcExpression, String>
{
    private static final Capture<Variable> INPUT = newCapture();

    @Override
    public Pattern<AggregateFunction> getPattern()
    {
        return basicAggregation()
                .with(functionName().equalTo("avg"))
                .with(singleArgument().matching(
                        variable()
                                .with(type().matching(DecimalType.class::isInstance))
                                .capturedAs(INPUT)));
    }

    @Override
    public Optional<JdbcExpression> rewrite(AggregateFunction aggregateFunction, Captures captures, RewriteContext<String> context)
    {
        Variable input = captures.get(INPUT);
        JdbcColumnHandle columnHandle = (JdbcColumnHandle) context.getAssignment(input.getName());
        DecimalType type = (DecimalType) columnHandle.getColumnType();
        verify(aggregateFunction.getOutputType().equals(type));

        // When decimal type has maximum precision we can get result that is not matching Presto avg semantics.
Review comment: Presto -> Trino
        if (type.getPrecision() == REDSHIFT_MAX_DECIMAL_PRECISION) {
            return Optional.of(new JdbcExpression(
                    format("avg(CAST(%s AS decimal(%s, %s)))", context.rewriteExpression(input).orElseThrow(), type.getPrecision(), type.getScale()),
                    columnHandle.getJdbcTypeHandle()));
        }

        // Redshift avg function rounds down resulting decimal.
        // To match Presto avg semantics, we extend scale by 1 and round result to target scale.
Review comment: Presto -> Trino?
        return Optional.of(new JdbcExpression(
                format("round(avg(CAST(%s AS decimal(%s, %s))), %s)", context.rewriteExpression(input).orElseThrow(), type.getPrecision() + 1, type.getScale() + 1, type.getScale()),
                columnHandle.getJdbcTypeHandle()));
    }
}
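To make the scale-extension step above concrete, here is a small, self-contained sketch (not part of the PR) with made-up values of a hypothetical decimal(4, 2) column, assuming, as the code comment states, that Redshift's avg rounds the resulting decimal down.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Standalone illustration of the rounding logic in ImplementRedshiftAvgDecimal,
// using made-up values of a hypothetical decimal(4, 2) column: 0.05, 0.05, 0.07.
public class AvgDecimalRoundingIllustration
{
    public static void main(String[] args)
    {
        BigDecimal sum = new BigDecimal("0.17"); // 0.05 + 0.05 + 0.07
        BigDecimal count = BigDecimal.valueOf(3);

        // Rounding down at the original scale, as the comment says Redshift's avg does:
        // 0.17 / 3 = 0.0566... -> prints 0.05, while rounding half up would give 0.06
        System.out.println(sum.divide(count, 2, RoundingMode.DOWN));

        // The pushed-down expression instead averages at scale + 1 and rounds back to
        // the target scale: 0.056 -> prints 0.06, the expected half-up result
        BigDecimal extended = sum.divide(count, 3, RoundingMode.DOWN);
        System.out.println(extended.setScale(2, RoundingMode.HALF_UP));
    }
}
```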
Review comment: This change is backward incompatible. It changes the semantics of the Redshift connector; one could say that after this change it is a new, different Redshift connector. Most existing production usages of Redshift will now require a migration as part of the Trino upgrade. Such a migration is a binary step: either a user is adjusted to use the 403 Redshift connector or the 404 one. There is no middle ground, so rolling back such a migration is also difficult.

Review comment: Let's restore the old one as `redshift-legacy`?

Review comment: I am not sure that is enough. I would prefer to rename the existing connector to mark it as deprecated (legacy) and create a new one with a unique name (`redshift-v2`?). Then, after some time, once the legacy connector is removed, I would rename the new one back to `redshift`.