-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-5889: [C++][Parquet] Add property to indicate origin from converted type to TimestampLogicalType #4856
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Also tested it locally and can confirm this now properly reads older files in pyarrow.
Thanks @jorisvandenbossche for testing it out. When you say it works on older files, did you use the files attached to the Jira issue by the reporter? I've been considering adding a test using those as fixtures. Thanks again. |
Yes, I checked with those files (as well as something I wrote myself with an older version). I certainly think we should add such tests reading older files (I just opened https://issues.apache.org/jira/browse/ARROW-5915 for that). |
Great. Thanks for testing the reporter's files. Thanks also for opening that issue for compatibility testing -- I generally agree with you -- I'll comment further on that Jira issue when I get the chance. |
@tpboudreau this needs to be rebased now, let me know if you need help |
Rebased and RFR (I'll monitor the TravisCI build after it kicks off). |
Codecov Report
@@ Coverage Diff @@
## master #4856 +/- ##
===========================================
- Coverage 87.08% 75.68% -11.4%
===========================================
Files 883 58 -825
Lines 116941 3735 -113206
Branches 1418 0 -1418
===========================================
- Hits 101837 2827 -99010
+ Misses 14742 908 -13834
+ Partials 362 0 -362 Continue to review full report at Codecov.
|
bool force_set_converted_type = false); | ||
bool is_adjusted_to_utc() const; | ||
LogicalType::TimeUnit::unit time_unit() const; | ||
|
||
/// \brief If true, will not set LogicalType in Thrift metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't true anymore -- I'll remove this comment and the is_serialized
method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, GitHub was hiding files from me so this is indeed what is going on in this patch. Will take a closer look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. I see that the is_serialized
override just allows us to round-trip the metadata faithfully. If a file is read to Arrow, then written back, then the logical type will be written (I think)
…rted type to TimestampLogicalType This patch: - adds a new boolean member isFromConvertedType() to the TimestampLogicalType class that is set to "true" if the LogicalType was created from a converted type of TIMESTAMP_MILLIS or TIMESTAMP_MICROS - prevents writing the TimestampLogicalType in the Parquet schema if this new property is true - changes the Arrow reader to ignore the isAdjustedToUTC() property of the TimestampLogicalType if the type annotation came from a converted type Author: TP Boudreau <tpboudreau@gmail.com> Closes #4856 from tpboudreau/ARROW-5889 and squashes the following commits: 458f063 <TP Boudreau> Add property showing converted type origin to TimestampLogicalType
…rted type to TimestampLogicalType This patch: - adds a new boolean member isFromConvertedType() to the TimestampLogicalType class that is set to "true" if the LogicalType was created from a converted type of TIMESTAMP_MILLIS or TIMESTAMP_MICROS - prevents writing the TimestampLogicalType in the Parquet schema if this new property is true - changes the Arrow reader to ignore the isAdjustedToUTC() property of the TimestampLogicalType if the type annotation came from a converted type Author: TP Boudreau <tpboudreau@gmail.com> Closes #4856 from tpboudreau/ARROW-5889 and squashes the following commits: 458f063 <TP Boudreau> Add property showing converted type origin to TimestampLogicalType
…rted type to TimestampLogicalType This patch: - adds a new boolean member isFromConvertedType() to the TimestampLogicalType class that is set to "true" if the LogicalType was created from a converted type of TIMESTAMP_MILLIS or TIMESTAMP_MICROS - prevents writing the TimestampLogicalType in the Parquet schema if this new property is true - changes the Arrow reader to ignore the isAdjustedToUTC() property of the TimestampLogicalType if the type annotation came from a converted type Author: TP Boudreau <tpboudreau@gmail.com> Closes apache#4856 from tpboudreau/ARROW-5889 and squashes the following commits: 458f063 <TP Boudreau> Add property showing converted type origin to TimestampLogicalType
This patch: