Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: move coercion of union from builder to TypeCoercion #11961

Merged
merged 16 commits into from
Aug 14, 2024

Conversation

jonahgao
Copy link
Member

Which issue does this PR close?

Closes #11742.

Rationale for this change

Currently, in the LogicalPlan builder, we coerce the inputs of a union plan to a common schema.
But this common schema is inaccurate because it lacks type coercion for expressions.

For example, in #11742, the return type of the expression nvl(v1, 0.5) is INT before type coercion and will become Float64 afterward.

/// The return type of `nvl` depends on the type of the first argument.
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
    Ok(arg_types[0].clone())
}

The fix is to place coercion of the union after the type coercion of expressions.

What changes are included in this PR?

  • Move coercion of union from builder to TypeCoercion.
  • Move some union-related SQL planner tests to slt, as these tests need to perform type coercion to achieve the same results as before.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

project_with_column_index has become private. It was introduced for UNION in #2108, and I think it should only be used internally.

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Aug 13, 2024
@@ -1881,23 +1799,6 @@ mod tests {
Ok(())
}

#[test]
fn plan_builder_union_different_num_columns_error() -> Result<()> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to slt.

@@ -127,7 +130,14 @@ mod tests {
}

fn assert_optimized_plan_equal(plan: LogicalPlan, expected: &str) -> Result<()> {
assert_optimized_plan_eq(Arc::new(EliminateNestedUnion::new()), plan, expected)
let options = ConfigOptions::default();
let analyzed_plan = Analyzer::with_rules(vec![Arc::new(TypeCoercion::new())])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add TypeCoercion to avoid breaking the tests.

05)--------Projection: Int64(1) AS a
06)----------EmptyRelation
07)--Projection: Float64(2.1) + x.a AS Float64(0) + x.a
08)----Aggregate: groupBy=[[Float64(2.1) + CAST(x.a AS Float64)]], aggr=[[]]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only difference I observed from the previous test results because type coercion was performed, and x.a had an additional cast.

@@ -2176,148 +2176,6 @@ fn union_all() {
quick_test(sql, expected);
}

#[test]
fn union_with_different_column_names() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting these union tests to SLT is easier than moving them to TypeCoercion.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jonahgao -- I agree it makes much more sense to have the type coercion for union happening as part of TypeCoercsion rather than the SQL planner / builder.

return plan_err!("Empty UNION");
}

// Temporarily use the schema from the left input and later rely on the analyzer to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

05)----EmptyRelation

# union_with_incompatible_data_type()
query error Error during planning: UNION Column 'Int64\(1\)' \(type: Int64\) is not compatible with other type: Interval\(MonthDayNano\)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the error refers to the types in reverse order than they appear in the query

The error message might be better if it were something more like

 Incompatible inputs for Union. Previous inputs were of type Interval\(MonthDayNano\), got incomaptible type 'Int64\(1\)' \(type: Int64\)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks @alamb

@@ -809,7 +809,7 @@ fn coerce_case_expression(case: Case, schema: &DFSchema) -> Result<Case> {
}

/// Get a common schema that is compatible with all inputs of UNION.
fn coerce_union_schema(inputs: Vec<Arc<LogicalPlan>>) -> Result<DFSchema> {
fn coerce_union_schema(inputs: &Vec<Arc<LogicalPlan>>) -> Result<DFSchema> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb
Copy link
Contributor

alamb commented Aug 14, 2024

Looks great @jonahgao -- thank you

@alamb alamb merged commit afa23ab into apache:main Aug 14, 2024
24 checks passed
@jonahgao
Copy link
Member Author

Thanks for the review @alamb

@jonahgao jonahgao deleted the union_coercion branch August 14, 2024 13:30
wiedld added a commit to influxdata/arrow-datafusion that referenced this pull request Aug 19, 2024
wiedld added a commit to influxdata/arrow-datafusion that referenced this pull request Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect result returned by UNION ALL (SQLancer-TLP)
2 participants