Feature: full support of group by nullable column #4079
Comments
/assignme
If it overflows u64, it falls back to the String (binary) type.
ok
Why not combine the null bits for multiple columns to save space?
Yes, I also thought about this. But we should introduce native
This kind of type may not be cache-line friendly. That may be too low-level a concern for this question.
This method is very similar to ClickHouse's approach, which uses array<UInt8, getBitmapSize()> to represent the nulls.
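The idea of combining the null bits of several columns can be sketched as a shared bitmap with one bit per column. This is only an illustration of the suggestion in the thread, not ClickHouse's or this project's code; the names bitmap_size and null_bitmap, and the choice of UInt8 columns represented as Option<u8>, are assumptions made for the example.

```rust
/// Bytes needed for a bitmap holding one null bit per column
/// (hypothetical helper, rounds up to whole bytes).
fn bitmap_size(columns: usize) -> usize {
    (columns + 7) / 8
}

/// Build the null bitmap for one row: bit `i` is set when column `i` is null.
/// Columns are modelled as `Option<u8>` purely for illustration.
fn null_bitmap(row: &[Option<u8>]) -> Vec<u8> {
    let mut bitmap = vec![0u8; bitmap_size(row.len())];
    for (i, value) in row.iter().enumerate() {
        if value.is_none() {
            bitmap[i / 8] |= 1 << (i % 8);
        }
    }
    bitmap
}
```

With this layout, up to eight nullable columns share a single byte of null flags instead of spending one byte each.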
Summary
Currently, we ignore nulls when generating the hash key for the group-by columns, so the results are not correct in this query:
How:
a % 3 as c is Nullable(UInt8). If c is UInt8, we can hash column c into a UInt8 column using fixed_hash. But we should use extra bits to identify the null value, so UInt16 is used: [8 bits (value) -- 8 bits (only 1 bit is used to identify null)].
If there are multiple nullable columns, say N, N * 8 bits will be used. The final bit count must be rounded up to a primitive type: [Nullable(UInt8), UInt8] --> 24 bits --> rounded up to 32 bits, so a UInt32Column will be used to store the hash keys.
This task can be taken after #4074 is merged.
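The key layout described above can be sketched roughly as follows, assuming all group-by columns are UInt8 and a row is given as Option<u8> values. The function names (key_bits, round_key_bits, pack_row) are hypothetical and not part of the project's API; the None result for keys wider than 64 bits stands in for the String (binary) fallback mentioned in the comments.

```rust
/// Bits needed for a key over UInt8 columns: 8 value bits per column,
/// plus 8 null-flag bits for every nullable column.
fn key_bits(nullable: &[bool]) -> u32 {
    nullable.iter().map(|&n| if n { 16 } else { 8 }).sum()
}

/// Round a bit count up to the smallest primitive width.
/// Returns None when the key does not fit in 64 bits
/// (the case where a binary/String key would be used instead).
fn round_key_bits(bits: u32) -> Option<u32> {
    [8u32, 16, 32, 64].iter().copied().find(|&w| bits <= w)
}

/// Pack one row into an integer key. Each column contributes a value byte;
/// a nullable column also contributes a flag byte (1 = null).
/// Null values are stored as 0 so that all nulls produce the same key.
/// The caller is expected to check that `key_bits(nullable) <= 64`.
fn pack_row(row: &[Option<u8>], nullable: &[bool]) -> u64 {
    let mut key = 0u64;
    for (&value, &is_nullable) in row.iter().zip(nullable) {
        key = (key << 8) | u64::from(value.unwrap_or(0));
        if is_nullable {
            key = (key << 8) | u64::from(value.is_none());
        }
    }
    key
}
```

For [Nullable(UInt8), UInt8], key_bits gives 24 and round_key_bits gives 32, matching the example in the description. Because the flag byte is part of the key, a null in the first column and a literal 0 in the first column land in different groups.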
Related: