Incorrect or inconsistent ZonedDateTime handling in operations #5241

rcaudy · 2024-03-11T18:36:19Z

Firstly, the over-arching issues:

ZonedDateTime instances are not equal to other ZonedDateTime instances with different time zones, and so when we do convert to a long for hashing or sorting we are changing the definition of equals that applies.
Our "automatic" conversion code in ReinterpretUtils.maybeConvertToPrimitive rightlfully requires support for symmetric conversion from the ColumnSource to be reinterpreted (i.e. source instanceof ConvertibleTimeSource.Zoned) and thus a single time zone for the entire ColumnSource.

Secondly, the consistency issues:

naturalJoin and join are never reinterpreting ZonedDateTime key sources to long for hashing. Hence they are always applying ZonedDateTime.hashCode() and ZonedDateTime.equals(). This is internally consistent within the operations, at least, but inconsistent with ~~other operations~~ aj.
aj is using ReinterpretUtils.maybeConvertToPrimitive, and so we might reinterpret only one side, resulting in an error that seems inexplicable to users. We will also be getting equality "wrong" (by the Java definition) if the two sides have different time zones.
aggBy converts symmetrically, which is good. ~~However, it is effectively picking different definitions of equality depending on the provenance of the ZonedDateTime key source.~~ With current maybeConvertToPrimitive, we're always correct.
sort has basically the same issue as aggBy. ~~While we're always using a comparison that is consistent with equality, which definition of comparison and equality we use depends on the source.~~ With current maybeConvertToPrimitive, we're always correct.

Solutions:

Remove ZonedDateTime support from maybeConvertToPrimitive. This standardizes on Java's definition of comparison and equality, but eliminates opportunities for optimization.
Remove the restriction that we only convert to primitive when reversible. This requires extra work to figure out what our result time zone should be in many cases. If we have a join with mismatched ConvertibleTimeSource.Zoned.getZone() results, we need to error out or "pick a winner". Worse, if we have to convert back from an un-zoned long source, we need to pick a zone. DateTimeUtils.timeZone(), e.g. the system default?
Keep maybeConvertToPrimitive the same. For joins, we should only convert if both sides are reinterpretable and have the same fixed zone.

I think we should pick option (2), as that renders it less fraught to reinterpret between Instant and ZonedDateTime. Otherwise, this reinterpretation changes the meaning of equality, etc, for the data within the column.

I think we should pick option (3). It means zone matters for equality and comparison. It keeps aggBy and sort correct with current optimizations. We could eventually optimize naturalJoin and join, but they are correct as-is. We would have to fix a bug in aj that might result in error messages or incorrect equality/comparability.

The text was updated successfully, but these errors were encountered:

rcaudy · 2024-03-18T23:26:02Z

@lbooker42 We need to do two things:

Fix aj to ensure that we only reinterpret one side if we can reinterpret both sides.
Maybe reinterpret ZonedDataTime in naturalJoin and join IF we can reinterpret both sides.
Probably never reinterpret ZonedDateTime in whereIn, unless we can refactor to know the set table and the source table at the same time.

rcaudy · 2024-07-19T20:52:00Z

#5780 (comment)
We may need to revisit column stats for ZDTs.

rcaudy added bug Something isn't working query engine core Core development tasks labels Mar 11, 2024

rcaudy added this to the 4. Unscheduled milestone Mar 11, 2024

rcaudy self-assigned this Mar 11, 2024

rcaudy assigned lbooker42 and unassigned rcaudy Mar 18, 2024

rcaudy modified the milestones: 4. Unscheduled, 2. April 2024 Mar 18, 2024

pete-petey added the april2024 label Aug 26, 2024

pete-petey modified the milestones: 2. April 2024, Backlog Aug 26, 2024

rcaudy mentioned this issue Sep 16, 2024

feat: [WIP] Barrage Refactor Read/Write Chunk Factories and Default Type Mappings #6065

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect or inconsistent ZonedDateTime handling in operations #5241

Incorrect or inconsistent ZonedDateTime handling in operations #5241

rcaudy commented Mar 11, 2024 •

edited

Loading

rcaudy commented Mar 18, 2024 •

edited

Loading

rcaudy commented Jul 19, 2024 •

edited

Loading

Incorrect or inconsistent ZonedDateTime handling in operations #5241

Incorrect or inconsistent ZonedDateTime handling in operations #5241

Comments

rcaudy commented Mar 11, 2024 • edited Loading

rcaudy commented Mar 18, 2024 • edited Loading

rcaudy commented Jul 19, 2024 • edited Loading

rcaudy commented Mar 11, 2024 •

edited

Loading

rcaudy commented Mar 18, 2024 •

edited

Loading

rcaudy commented Jul 19, 2024 •

edited

Loading