-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[compiler] Iterative DistinctlyKeyed
Analysis
#12696
Conversation
Use iterative tree traversals to prevent exceeding stack size for large IRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work -- couple tiny nits (mostly from looking at the existing analysis for the first time in a while :) )
@@ -31,6 +31,9 @@ class Memo[T] private(val m: mutable.HashMap[RefEquality[BaseIR], T]) { | |||
this | |||
} | |||
|
|||
def bindIf(test: Boolean, ir: BaseIR, t: T): Memo[T] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is certainly fine in the unit case, but we might not want to use this naively for every analysis if constructing t
is not cheap (as it is for Unit), since we'll need to construct t
even if we don't end up binding it.
An alternative is changing the signature to def bindIf(test: Boolean, ir: BaseIR, t: => T)
, but this has other performance consequences -- Scala desugars this to a () => T
and now a closure is allocated for every call to the function.
I think this is fine, we just won't be able to use it everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, call by name makes much more sense here. Use it where appropriate I guess. Do you know if visualvm works with scala code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, really appreciate the explanations in comments. Thank you for taking the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we typically use either YourKit or async-profiler. async-profiler is free and lighter-weight, but the last time we looked a year or two ago it didn't provide timings at the level of line symbols within functions, and as such wasn't quite as useful for profiling our enormous functions inside generated code.
| _: TableExplode | ||
| _: TableFilterIntervals | ||
| _: TableHead | ||
| _: TableIntervalJoin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you didn't change this behavior, but TableIntervalJoin is actually inherits distinctness just from the left -- it joins intervals from the right onto the left without a cartesian product
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'll update it
| _: TableIntervalJoin | ||
| _: TableJoin | ||
| _: TableKeyBy | ||
| _: TableLeftJoinRightDistinct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TableLeftJoinRightDistinct is also not strict enough here. Should just propagate distinctness from the left child.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the previous code was not correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it wasn't wrong, but it was unnecessarily loose. The semantic distinctness of the TableLeftJoinRightDistinct node doesn't care if the right side is distinct or not, but the previous implementation did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now wondering if visiting a subset of the nodes in the tree is the right thing to do. What if there's a subtree in one of these unvisited nodes that might be distinctly keyed and we're not discovering it?
Is this analysis used top-down in a way that making that unimportant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're totally right -- in the general case, we need to visit every TableIR. This isn't just used top-down, it's used in lowering joins to avoid grouping the right side if we know it's already distinct.
We want to visit all TableIRs, but the distinctness calculation for any given IR might not care about some inputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
// Java (and Scala) iterators mutate on `next()` so it's convenient | ||
// to hold on to a node and its children as we visit the node after | ||
// its children. | ||
private var stack = List((root, adj(root))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work!
Use iterative tree traversals to prevent exceeding stack size for large IRs.