[compiler] Iterative `DistinctlyKeyed` Analysis #12696

ehigham · 2023-02-13T21:04:44Z

Use iterative tree traversals to prevent exceeding stack size for large IRs.

tpoterba

great work -- couple tiny nits (mostly from looking at the existing analysis for the first time in a while :) )

tpoterba · 2023-02-14T11:05:41Z

hail/src/main/scala/is/hail/expr/ir/RefEquality.scala

@@ -31,6 +31,9 @@ class Memo[T] private(val m: mutable.HashMap[RefEquality[BaseIR], T]) {
    this
  }

+  def bindIf(test: Boolean, ir: BaseIR, t: T): Memo[T] =


This is certainly fine in the unit case, but we might not want to use this naively for every analysis if constructing t is not cheap (as it is for Unit), since we'll need to construct t even if we don't end up binding it.

An alternative is changing the signature to `def bindIf(test: Boolean, ir: BaseIR, t: => T)`, but this has other performance consequences: Scala desugars the by-name parameter to a `() => T`, so a closure is allocated for every call to the function.

I think this is fine, we just won't be able to use it everywhere.
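To make the trade-off concrete, here is a minimal sketch contrasting the two parameter styles. `ToyMemo`, `bindIfLazy`, and `BindIfDemo` are hypothetical names invented for illustration (keys are plain strings rather than `RefEquality[BaseIR]`); this is not the code in the PR.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Memo, keyed by String to stay self-contained.
final class ToyMemo[T] {
  private val m = mutable.HashMap.empty[String, T]

  // By-value: `t` is evaluated at the call site even when `test` is false.
  def bindIf(test: Boolean, key: String, t: T): ToyMemo[T] = {
    if (test) m.update(key, t)
    this
  }

  // By-name: `t` is evaluated only when `test` is true, but every call
  // has to pass a thunk (a Function0) instead of a plain value.
  def bindIfLazy(test: Boolean, key: String, t: => T): ToyMemo[T] = {
    if (test) m.update(key, t)
    this
  }

  def contains(key: String): Boolean = m.contains(key)
}

object BindIfDemo {
  var evaluations = 0

  // Stands in for a value that is costly to construct.
  def expensive(): Int = { evaluations += 1; 42 }

  // Returns the evaluation count after a by-value call and after a by-name call.
  def run(): (Int, Int) = {
    evaluations = 0
    val memo = new ToyMemo[Int]
    memo.bindIf(false, "a", expensive())     // argument is evaluated anyway
    val afterByValue = evaluations
    memo.bindIfLazy(false, "b", expensive()) // argument is never evaluated
    val afterByName = evaluations
    (afterByValue, afterByName)
  }
}
```

In this sketch the by-value call pays for `expensive()` even though nothing is bound, while the by-name call skips it. Which one is cheaper overall depends on how costly `t` is compared with the thunk, which is the judgment call described above.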

You're right, call by name makes much more sense here. Use it where appropriate I guess. Do you know if VisualVM works with Scala code?

BTW, really appreciate the explanations in comments. Thank you for taking the time.

we typically use either YourKit or async-profiler. async-profiler is free and lighter-weight, but the last time we looked a year or two ago it didn't provide timings at the level of line symbols within functions, and as such wasn't quite as useful for profiling our enormous functions inside generated code.

tpoterba · 2023-02-14T11:09:51Z

hail/src/main/scala/is/hail/expr/ir/DistinctlyKeyed.scala

+              | _: TableExplode
+              | _: TableFilterIntervals
+              | _: TableHead
+              | _: TableIntervalJoin


you didn't change this behavior, but TableIntervalJoin actually inherits distinctness just from the left -- it joins intervals from the right onto the left without a cartesian product

Thanks! I'll update it

tpoterba · 2023-02-14T11:10:46Z

hail/src/main/scala/is/hail/expr/ir/DistinctlyKeyed.scala

+              | _: TableIntervalJoin
+              | _: TableJoin
+              | _: TableKeyBy
+              | _: TableLeftJoinRightDistinct


TableLeftJoinRightDistinct is also not strict enough here. Should just propagate distinctness from the left child.

So the previous code was not correct?

it wasn't wrong, but it was unnecessarily loose. The semantic distinctness of the TableLeftJoinRightDistinct node doesn't care if the right side is distinct or not, but the previous implementation did.

I'm now wondering if visiting a subset of the nodes in the tree is the right thing to do. What if there's a subtree in one of these unvisited nodes that might be distinctly keyed and we're not discovering it?
Is this analysis used top-down in a way that makes that unimportant?

Ah, you're totally right -- in the general case, we need to visit every TableIR. This isn't just used top-down, it's used in lowering joins to avoid grouping the right side if we know it's already distinct.

We want to visit all TableIRs, but the distinctness calculation for any given IR might not care about some inputs

Sounds good!
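The conclusion of this thread (record a result for every table node, but let some nodes consult only their left child) could be sketched roughly as follows. The IR classes here are a hypothetical toy model, not Hail's real `TableIR` hierarchy, and the rule that `Filter` preserves distinctness is an assumption made only for this example.

```scala
import java.util.IdentityHashMap

// Toy IR for illustration only.
sealed trait ToyTable { def children: List[ToyTable] }
final case class Source(distinct: Boolean) extends ToyTable {
  def children: List[ToyTable] = Nil
}
final case class Filter(child: ToyTable) extends ToyTable {
  def children: List[ToyTable] = List(child)
}
final case class IntervalJoin(left: ToyTable, right: ToyTable) extends ToyTable {
  def children: List[ToyTable] = List(left, right)
}
final case class LeftJoinRightDistinct(left: ToyTable, right: ToyTable) extends ToyTable {
  def children: List[ToyTable] = List(left, right)
}

object ToyDistinctlyKeyed {
  def analyze(root: ToyTable): IdentityHashMap[ToyTable, Boolean] = {
    // Iterative pre-order walk; prepending to `order` reverses it, so every
    // node ends up after all of its descendants.
    var stack: List[ToyTable] = List(root)
    var order: List[ToyTable] = Nil
    while (stack.nonEmpty) {
      val n = stack.head
      stack = n.children ::: stack.tail
      order = n :: order
    }

    // Keyed by reference, so structurally equal nodes stay separate entries.
    val memo = new IdentityHashMap[ToyTable, Boolean]()
    order.foreach { n =>
      val distinct = n match {
        case Source(d)                      => d
        case Filter(child)                  => memo.get(child) // assumed rule
        case IntervalJoin(left, _)          => memo.get(left)  // right side ignored
        case LeftJoinRightDistinct(left, _) => memo.get(left)  // right side ignored
      }
      memo.put(n, distinct)
    }
    memo
  }
}
```

The right-hand subtrees are still traversed and get their own entries, so a later pass (such as join lowering) can look them up, even though the join nodes themselves never read those entries.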

tpoterba · 2023-02-14T11:11:50Z

hail/src/main/scala/is/hail/utils/TreeTraversal.scala

+      // Java (and Scala) iterators mutate on `next()` so it's convenient
+      // to hold on to a node and its children as we visit the node after
+      // its children.
+      private var stack = List((root, adj(root)))


Looks good.
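For readers unfamiliar with the pattern in the snippet above, the following is a minimal sketch of an iterative post-order traversal whose stack frames pair a node with the iterator over its remaining children. `ToyTraversal` and the `postOrder` signature are assumptions for illustration; the actual contents of `TreeTraversal.scala` may differ.

```scala
object ToyTraversal {
  // Yields each node after all of its children, without recursion, so the
  // depth of the tree is bounded by heap size rather than JVM stack size.
  def postOrder[A](adj: A => Iterator[A])(root: A): Iterator[A] =
    new Iterator[A] {
      // Each frame holds a node and the iterator over its unvisited children.
      private var stack: List[(A, Iterator[A])] = List((root, adj(root)))

      override def hasNext: Boolean = stack.nonEmpty

      override def next(): A = {
        // Descend while the node on top still has an unvisited child.
        while (stack.head._2.hasNext) {
          val child = stack.head._2.next()
          stack = (child, adj(child)) :: stack
        }
        // The top node has no children left, so it can be emitted.
        val (node, _) = stack.head
        stack = stack.tail
        node
      }
    }
}
```

A small usage example under the same assumptions:

```scala
val tree = Map(1 -> List(2, 3), 2 -> List(4))
val adj = (n: Int) => tree.getOrElse(n, Nil).iterator
ToyTraversal.postOrder(adj)(1).toList // List(4, 2, 3, 1)
```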

Applied changes

tpoterba

great work!

[compiler] Iterative DistinctlyKeyed Analysis

f404e8e

Use iterative tree traversals to prevent exceeding stack size for large IRs.

ehigham requested a review from tpoterba February 13, 2023 21:05

ehigham assigned tpoterba Feb 13, 2023

tpoterba previously requested changes Feb 14, 2023


traverse all ir nodes

e752386

ehigham requested a review from tpoterba February 14, 2023 18:19

tpoterba approved these changes Feb 15, 2023


ehigham assigned ehigham and unassigned tpoterba Feb 15, 2023

Merge remote-tracking branch 'upstream/main' into ir_tree_traversers

0733c8a

danking merged commit 1fe6f2a into hail-is:main Feb 15, 2023

ehigham deleted the ir_tree_traversers branch February 15, 2023 23:08
