-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update matrix for parquet-cpp and parquet-java #100
base: production
Are you sure you want to change the base?
Conversation
| STRING | ✅ | ✅ | | | ✅ | | ||
| ENUM | ❌ | ✅ | | | ❌ | | ||
| UUID | ❌ | ✅ | | | ❌ | | ||
| 8, 16, 32, 64 bit signed and unsigned INT | ✅ | ✅ | | | ✅ | | ||
| DECIMAL (INT32) | ✅ | ✅ | | | ✅ | | ||
| DECIMAL (INT64) | ✅ | ✅ | | | ✅ | | ||
| DECIMAL (BYTE_ARRAY) | ✅ | ✅ | | | ✅ | | ||
| DECIMAL (FIXED_LEN_BYTE_ARRAY) | ✅ | ✅ | | | ✅ | | ||
| DATE | ✅ | ✅ | | | ✅ | | ||
| TIME (INT32) | ✅ | ✅ | | | ✅ | | ||
| TIME (INT64) | ✅ | ✅ | | | ✅ | | ||
| TIMESTAMP (INT64) | ✅ | ✅ | | | ✅ | | ||
| INTERVAL | ✅ | ✅ | | | ❌ | | ||
| JSON | ✅ | ✅ | | | ❌ | | ||
| BSON | ❌ | ✅ | | | ❌ | | ||
| LIST | ✅ | ✅ | | | ✅ | | ||
| MAP | ✅ | ✅ | | | ✅ | | ||
| UNKNOWN (always null) | ✅ | ✅ | | | ✅ | | ||
| FLOAT16 | ✅ | ✅ | | | ✅ | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we mean by logical types are supported? Is it like we won't throw an exception and return something at least, or we expect to actually represent the related value correctly?
Since parquet-java
does not have its own representation type system (except the one in the example package that mainly handles the primitive types only), the question is whether at least one binding supports the related type. I am not aware that any binding supports INTERVAL
or BSON
. JSON
is mostly handle as string without any syntax check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think won't throw should be considered as supported. Let me remove INTERVAL
and BSON
.
Just to confirm that ENUM
and UUID
are supported by the Avro binding, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, UUID
is mapped to the Avro UUID type, ENUM
is mapped to string.
But, JSON
and FLOAT16
do not seem to be supported either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think FLOAT16
should be considered as supported: apache/parquet-java#1142
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, that's why I asked what supporting a logical type mean in terms of parquet-java
where we do not have our own representation type system. For FLOAT16
, we return the containing 2 bytes as a Binary
in our example package. We do not map this type in any of our bindings.
If we consider FLOAT16
as supported, then all the others are supported as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we consider FLOAT16 as supported, then all the others are supported as well.
Yes, that's why I marked all logical types as supported in the initial commit. Developers are able to implement their own bindings to write and read such types by extending the parquet-java
library.
Is there value in adding another option here (P perhaps) that means the implementation at least supports the logical type enum value but will return the physical type?
I'm not sure. Maybe it is a minimum requirement for implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wgtmac, If we do not want to distinguish the support level of e.g. FLOAT16
and UUID
, I'm fine having all marked as supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's why I marked all logical types as supported in the initial commit. Developers are able to implement their own bindings to write and read such types by extending the
parquet-java
library.
You should then add a note or asterisk to point out that "supported" means the physical type is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Added asterisk to explain that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ part LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some small comments, but this looks great @wgtmac Thanks for updating this 👍
| DATE | ✅ | ✅ | | | ✅ | | ||
| TIME (INT32) | ✅ | ✅ | | | ✅ | | ||
| TIME (INT64) | ✅ | ✅ | | | ✅ | | ||
| TIMESTAMP (INT64) | ✅ | ✅ | | | ✅ | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we split this out on the unit as well?
| TIMESTAMP (INT64) | ✅ | ✅ | | | ✅ | | |
| TIMESTAMP (INT64, MICROS) | ✅ | ✅ | | | ✅ | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that? Any unsupported unit in parquet-java?
No description provided.