-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ORC issue with incorrect timestamp nanosecond values #7581
Conversation
There's no significant perf impact on the writer, but the reader is up to 5% slower in some cases (timestamp columns only). I expected the writer to be more impacted, will look into why the reader is so much slower. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it weird that ORC does not require any change in metadata to signify that the stream has been changed from 32 bit to 64 bit.
Please confirm!?
Approved otherwise.
ORC uses varint so integer encoding does not depend on the size. |
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #7581 +/- ##
===============================================
+ Coverage 81.86% 82.38% +0.51%
===============================================
Files 101 101
Lines 16884 17340 +456
===============================================
+ Hits 13822 14285 +463
+ Misses 3062 3055 -7
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@gpucibot merge |
Closes rapidsai#7355 Use 64 bit variables/buffers to handle nanosecond values since nanosecond encode can overflow a 32bit value in some cases. Removed the overloaded `intrle_minmax` function, using templated `numeric_limits` functions instead (the alternative was to add another overload). Performance impact evaluation pending, but this fix seems unavoidable regardless of the impact. Authors: - Vukasin Milovanovic (@vuule) Approvers: - GALI PREM SAGAR (@galipremsagar) - Devavret Makkar (@devavret) - Kumar Aatish (@kaatish) URL: rapidsai#7581
Closes #7355
Use 64 bit variables/buffers to handle nanosecond values since nanosecond encode can overflow a 32bit value in some cases.
Removed the overloaded
intrle_minmax
function, using templatednumeric_limits
functions instead (the alternative was to add another overload).Performance impact evaluation pending, but this fix seems unavoidable regardless of the impact.