Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make schema calculations for LogicalPlan::Aggregate and LogicalPlan::Distinct consistent with the HashExec #8766

Closed
wants to merge 2 commits into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 5, 2024

Draft

Which issue does this PR close?

Closes #8738

Rationale for this change

This is a second sketch of how to close #8738

#8291 / #7647 changed DataFusion's Grouping operator so that it never dictionary encoded the output grouping columns.

Previously, the types of the input grouping expressions were the same as the types of the output group by

The idea I think is that since the values in the group columns are unique, there is no reason to dictionary encode them (as each dictionary entry would have a single value). I actually am not sure about this for reasons I will explain shortly.

What changes are included in this PR?

This PR changes LogicalPlan::Aggregate and LogicalPlan::Distinct (which both use the HashAggregate ExecutionPlan) to report a schema that is dictionary unencoded.

Are these changes tested?

Yes, there is a regression test added in #8750

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression "RowConverter column schema mismatch, expected Utf8 got Dictionary(Int32, Utf8)" after upgrade
1 participant