-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL MERGE Fixes #14650
SQL MERGE Fixes #14650
Conversation
fea5a9e
to
f6f7981
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add regression tests for these fixes?
core/trino-main/src/main/java/io/trino/operator/ChangeOnlyUpdatedColumnsMergeProcessor.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
f6f7981
to
eabdc67
Compare
We need to look at them one by one. With only one exception, we haven't seen these problems with SQL MERGE itself. Most of these problems showed up in tests in the PR that implements UPDATE and DELETE using MERGE plumbing, but migrating many of those tests would be difficult. I'll take a look, and consult @electrum, and see what new tests it makes sense to add. Thanks for looking @ksobolew! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! A few minor comments
testing/trino-testing/src/main/java/io/trino/testing/BaseConnectorTest.java
Outdated
Show resolved
Hide resolved
...-singlenode-hdp3-config-default-41fc9f4ea7a64eeba44f57fca29cf339/hadoop-master/container.log
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/MergeProcessorOperator.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/QueryPlanner.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
* - Values (0) | ||
* </pre> | ||
*/ | ||
public final class RemoveEmptyMergeWriterRuleSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for this? Maybe TestRemoveEmptyDeleteRuleSet
is a good example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This turned out to be somewhat challenging. I spent an hour on it and didn't get the test working. I'll add the test in a second pass.
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
The bug showed up as an exception in BeginTableWrite.rewriteModifyTableScan. However, that method doesn't need to be called for SQL MERGE. It is only needed if the connector operation requires an UpdatablePageSource, and MERGE does not, so the bug is fixed by this one-liner.
Preprocess pages passed as inputs to MergeProcessorOperator and MergeWriterOperator using the canonical pagePreprocessor.
This commit adds RemoveEmptyMergeWriterRuleSet, which turns MergeWriter over an empty TableScan into an empty values node, analogous to RemoveEmptyDeleteRuleSet. Rename Pattern.merge() to Pattern.mergeWriter(), since there are several kinds of merge nodes.
Null values aren't allowed in ImmutableLists, so use an ArrayList for partition values, and update the test to succeed when changing partition columns.
This commit keep track of the number of rows written in DeltaLakeMergeSink.rewriteParquetFile, and only commits the fileWriter if at least one row was written. Otherwise it calls fileWriter.rollback(). This fixes the Delte Lake tests that failed because of unexpected filex. I think this fix also applies to SQL MERGE.
In BeginTableWrite, when returning the MergeTarget, get the table handle from the TableScanNode.
This commit adds HiveMetadata.getUpdateLayout and classes HiveUpdateHandle and HiveUpdateBucketFunction.
KuduTableHandle had a boolean field isDeleteHandle. If the field was not set, rowIds would not be generated. This commit renames the field to more appropriate requiresRowId, and sets it in KuduMetadata.beginMerge. This fixes bugs in delete and update using merge components.
eabdc67
to
6f64339
Compare
Is there a release note that we should include for this? |
Description
This PR contains the accumulation of commits in PR #13926 that fix or improve the infrastructure developed to support SQL MERGE.
Non-technical explanation
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: