Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The `ListingFileProvider`, after #1860, uses `UInt16` for the indexes of the `DictionaryArray` type.
This means that the number of partitions is limited to `2^16` (~64K). It also means that when scanning files from a source that has fewer than 256 distinct values (which could have fit in `UInt8`), space and time are wasted on larger-than-needed dictionary columns (which will all have the same value).
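To make the trade-off concrete, here is a small plain-Rust sketch of the arithmetic: how many distinct values a key width can address, the narrowest width for a given number of distinct values, and the size of the key buffer per column. The helper names are hypothetical (this is not the arrow-rs or DataFusion API), and the 8192-row batch is an assumed example size.

```rust
/// Number of distinct dictionary values an unsigned key of `bits` width can address.
fn max_distinct(bits: u32) -> u64 {
    1u64 << bits
}

/// Hypothetical helper: narrowest unsigned key width (in bytes) that can address
/// `distinct` dictionary values, or None if even 32-bit keys are too small.
fn min_key_bytes(distinct: u64) -> Option<usize> {
    [1usize, 2, 4]
        .iter()
        .copied()
        .find(|&bytes| distinct <= max_distinct(8 * bytes as u32))
}

/// Bytes used by the key (index) buffer of one dictionary-encoded column.
fn key_buffer_bytes(num_rows: usize, key_bytes: usize) -> usize {
    num_rows * key_bytes
}

fn main() {
    println!("UInt8 keys:  up to {} partitions", max_distinct(8));
    println!("UInt16 keys: up to {} partitions", max_distinct(16));

    // Assumed batch size, for illustration only.
    let rows = 8192;
    // A source with, say, 3 distinct partition values fits in 1-byte keys.
    let narrow = min_key_bytes(3).expect("3 values always fit");
    println!("UInt16 key buffer: {} bytes", key_buffer_bytes(rows, 2));
    println!("UInt8 key buffer:  {} bytes", key_buffer_bytes(rows, narrow));
}
```

With `UInt16` keys every row pays 2 bytes of index even when 1 byte would do, which is the overhead described above.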
Describe the solution you'd like
Ideally, the partition column would be a constant (or a `DictionaryArray` with `UInt8` indexes), and the various upstream operations would create `DictionaryArray`s with larger index sizes as needed.
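A minimal sketch of the proposed behavior, written in plain Rust so it runs without arrow-rs: each file's partition column is a single-value dictionary with `u8` keys, and a hypothetical upstream `concat` step (standing in for, e.g., merging batches from many files) deduplicates values and widens the keys only when the number of distinct values no longer fits. `DictColumnU8`, `Keys`, `constant_partition_column`, and `concat` are illustrative names, not the arrow-rs `DictionaryArray` or DataFusion API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a dictionary-encoded string column
/// with `u8` keys. This is NOT the arrow-rs `DictionaryArray` API.
#[derive(Debug, Clone)]
struct DictColumnU8 {
    keys: Vec<u8>,
    values: Vec<String>,
}

/// Output keys, using the narrowest width that can address every value.
#[derive(Debug, PartialEq)]
enum Keys {
    U8(Vec<u8>),
    U16(Vec<u16>),
    U32(Vec<u32>),
}

/// Partition column for a single file: every row has the same value,
/// so one dictionary entry and all-zero `u8` keys are enough.
fn constant_partition_column(value: &str, num_rows: usize) -> DictColumnU8 {
    DictColumnU8 { keys: vec![0; num_rows], values: vec![value.to_string()] }
}

/// Concatenate per-file columns, deduplicating dictionary values and
/// widening the keys only as far as the distinct-value count requires.
fn concat(cols: &[DictColumnU8]) -> (Keys, Vec<String>) {
    let mut values: Vec<String> = Vec::new();
    let mut lookup: HashMap<&str, usize> = HashMap::new();
    let mut raw: Vec<usize> = Vec::new();
    for col in cols {
        for &k in &col.keys {
            let v = col.values[k as usize].as_str();
            let idx = *lookup.entry(v).or_insert_with(|| {
                values.push(v.to_string());
                values.len() - 1
            });
            raw.push(idx);
        }
    }
    // Every key is < values.len(), so each narrowing cast below is lossless.
    let n = values.len();
    let keys = if n <= 1 << 8 {
        Keys::U8(raw.iter().map(|&k| k as u8).collect())
    } else if n <= 1 << 16 {
        Keys::U16(raw.iter().map(|&k| k as u16).collect())
    } else {
        // Sketch assumption: fewer than 2^32 distinct values.
        Keys::U32(raw.iter().map(|&k| u32::try_from(k).expect("too many values")).collect())
    };
    (keys, values)
}

fn main() {
    // Two files from different partitions stay within u8 keys.
    let a = constant_partition_column("2021-01-01", 3);
    let b = constant_partition_column("2021-01-02", 3);
    let (keys, values) = concat(&[a, b]);
    println!("{:?} {:?}", keys, values);

    // 300 partitions exceed what u8 keys can address, so the output widens.
    let many: Vec<DictColumnU8> =
        (0..300).map(|i| constant_partition_column(&i.to_string(), 1)).collect();
    let (keys, values) = concat(&many);
    assert!(matches!(keys, Keys::U16(_)));
    println!("{} distinct values", values.len());
}
```

The design choice here is that the narrow key type is the default at the scan, and only the operation that actually combines many partitions pays for wider keys.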
Additional context
Suggested by @rdettai on #1860, at #1860 (comment) and #1860 (comment).