-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle downstream impacts to union's behavioral changes. #12105
Comments
Sorry for introducing this behavior change. There are two reasons we can't perform coercion when building unions.
create table t1(a int, b int);
create table t2(c int, d int);
select * from t1 union select c, d from t2; I think it's reasonable to make Another potential solution might be to manually run the necessary Analyzer rules, such as |
Thank you for the explanation I think for our use (which is a planner for the InfluxQL query language) we don't need to handle wildcards I do think it would be interesting to try calling analyze rather than the lower level type coercion I believe @wiedld said she tried this and the additional coercion performed by |
Thank you @jonahgao . Agreed that the change was needed. I was hoping for an update to the docs on how to handle the change (if the user needs to) when using the If the user needs the exact same logical plan as before the change, then this is roughly how we did it:
If the user is ok with added CAST nodes (including timezone cast), then they could use the analyzers:
@alamb -- we have passing InfluxQL e2es with the second (analyzer) approach, diffing a lot of added CAST nodes. I'm not sure about the use case for other users whom build their own logical plans (a.k.a. how the casts may impact them). |
I can make this change. Just wanted input (such as the wildcard expansion explanation) before doing so. Are we good for this updated doc change? Or should we do anything else? |
Thank you @wiedld 👍 . I think we can remind users about wildcard expansion in the document. Another thing is that I wonder whether it is necessary to expose coerce_union, allowing users to try using it first, as it is somewhat a high-level API. |
take |
Is your feature request related to a problem or challenge?
With this upstream change, an existing union() API had a behavioral change. This API is used to construct the UNION logical plan. Previously, it had been constructing the UNION plan node and performing type coercion. After the upstream change, it no longer perform the type coercion at logical plan construction (and instead relies upon a later logical plan optimizer pass).
This upstream change does not cause any failures in the upstream datafusion tests. However, for those constructing their own logical plans with UNION it does have significant downstream impacts. Explicitly:
union()
upstream APIWhen the upstream change removed the UNION type coercion from step 1 above, our logical plan construction code (step 2) started creating incorrect logical plans.
Describe the solution you'd like
Update the
union()
API docs with how to handle the changed behavior if the user still requires UNION type coercion at logical plan construction. Something like:The above code relies upon the already public coerce_plan_expr_for_schema. The coerce_union_schemaI is not yet public, therefore we recommend making it public.
Describe alternatives you've considered
The original change was put in by @jonahgao , whom may have some alternative ideas?
Additional context
We have made our own public (patched) version of the
coerce_union_schema
as a temporary measure to fix the issue, and can confirm it's sufficiently. Also open to other ideas.The text was updated successfully, but these errors were encountered: