-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Unordered Repeated Records #222
Conversation
Codecov Report
@@ Coverage Diff @@
## master #222 +/- ##
==========================================
- Coverage 72.18% 71.55% -0.64%
==========================================
Files 36 35 -1
Lines 1467 1445 -22
Branches 124 116 -8
==========================================
- Hits 1059 1034 -25
- Misses 408 411 +3
Continue to review full report at Codecov.
|
: Seq[Delta] = { | ||
val schemaFields = (x, y) match { | ||
case (Some(xVal), None) => xVal.getSchema.getFields.asScala.toList | ||
case (_, Some(yVal)) => yVal.getSchema.getFields.asScala.toList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) I think this would be more readable if you switched the order of lines 43 and 44, or left a comment explaining that the intended behavior is to usually fall back to the second case; my understanding is that you want to use y's schema fields if y exists, and only use x's if y doesn't exist and x does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment. It's because we assume LHS is backwards compatible with RHS, therefore RHS should have all necessary fields
private def diff(x: Option[GenericRecord], y: Option[GenericRecord], root: String) | ||
: Seq[Delta] = { | ||
val schemaFields = (x, y) match { | ||
case (Some(xVal), None) => xVal.getSchema.getFields.asScala.toList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these need to be cast to lists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asScala has laziness which causes weirdness and hard to parse errors. Easier to just avoid it imo
a.asScala.zip(b.asScala).flatMap{case (l, r) => | ||
diff(l.asInstanceOf[GenericRecord], r.asInstanceOf[GenericRecord], fullName) | ||
}.toList | ||
&& unordered.contains(fullName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're checking this in the case statement already, won't it always be true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, nice catch
&& unordered.contains(fullName) | ||
&& unorderedFieldKeys.contains(fullName)) { | ||
val l = x.flatMap(r => | ||
Option(r.get(name).asInstanceOf[java.util.List[GenericRecord]].asScala.toList)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these need to be converted to list or can they be some other seq?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think it would make more sense to wrap in option immediately after the asInstanceOf
call (which i don't think throws NPEs) and do the .asScala.toList
(or just .asScala
if you take my other comment too) in subsequent map steps (since asScala and toList can NPE I think?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think the asScala won't immediately NPE but the toList will. I'll adjust
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we will already have a properly wrapped option at this point so it's not a concern
}.toList | ||
&& unordered.contains(fullName) | ||
&& unorderedFieldKeys.contains(fullName)) { | ||
val l = x.flatMap(r => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional: I know the r here stands for "record" but the L here stands for "left" so I don't think we should call the record 'r'
@@ -93,8 +95,8 @@ abstract class Diffy[T](val ignore: Set[String], | |||
StringDelta(stringDelta(x.toString, y.toString)) | |||
} else { | |||
val tryVector = Try { | |||
val vx = x.asInstanceOf[java.util.List[_]].asScala.map(_.toString.toDouble) | |||
val vy = y.asInstanceOf[java.util.List[_]].asScala.map(_.toString.toDouble) | |||
val vx = x.asInstanceOf[java.util.List[_]].asScala.map(_.toString.toDouble).toList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why List and not Seq?
b.asInstanceOf[java.util.List[AbstractMessage]].asScala).flatMap { | ||
case (l, r) => diff(l, r, f.getMessageType.getFields.asScala, fullName)} | ||
if (f.getJavaType == JavaType.MESSAGE | ||
&& unordered.contains(fullName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this contains check is already checked above
if (a == null && b == null) { | ||
val a = x.flatMap(m => getField(f)(m).asInstanceOf[Option[AbstractMessage]]) | ||
val b = y.flatMap(m => getField(f)(m).asInstanceOf[Option[AbstractMessage]]) | ||
if (a.isEmpty && b.isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked the case statement you changed the other file to use in the same part of that logic a bit better (but this is still ok)
if (f.getJavaType == JavaType.MESSAGE | ||
&& unordered.contains(fullName) | ||
&& unorderedFieldKeys.contains(fullName)) { | ||
val l = x.flatMap(r => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar comments in this as in above
if (a == b) Nil else Seq(Delta(fullName, Option(a), Option(b), delta(a, b))) | ||
if (f.getType == "RECORD" | ||
&& unorderedFieldKeys.contains(fullName) | ||
&& unordered.contains(fullName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is also checked above already; similar comments in this file to other files
.getOrElse(List()) | ||
.flatMap(r => Try(r.get(unorderedFieldKeys(fullName))).toOption.map(k => (k, r))) | ||
.toMap | ||
(l.keySet ++ r.keySet).flatMap(k => diff(l.get(k), r.get(k), fullName)).toList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we're saying the diff of two records x
and y
that are like Array[NestedRecord]
is computed by taking any exact matches of some subfield x.a
and y.a
of NestedRecord
and comparing?
I worry this could not be what some users would expect - e.g. if the arrays are equal size and all subfields are numeric but all slightly different, there would be no way to compute the diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synced offline.
We agreed this does change the behavior since we no longer sort on unorderedFieldKeys
in a way that'll mark all records as "unkonwn delta" if we want to compare on numeric fields that could be slightly different. But all use cases we know of are for non-numeric from what I was told.
Docs should be updated though, esp to clarify that unorderedFieldKeys
is no longer a sorting key as the docs currently say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might also be good to write a test w/ this new behavior.
The test would also aid as documentation for expected behavior.
LGTM after modifying docs clarifying |
#219 and #217