Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler] Iterative DistinctlyKeyed Analysis #12696

Merged
merged 3 commits into from
Feb 15, 2023

Conversation

ehigham
Copy link
Member

@ehigham ehigham commented Feb 13, 2023

Use iterative tree traversals to prevent exceeding stack size for large IRs.

Use iterative tree traversals to prevent exceeding stack size for large IRs.
Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work -- couple tiny nits (mostly from looking at the existing analysis for the first time in a while :) )

@@ -31,6 +31,9 @@ class Memo[T] private(val m: mutable.HashMap[RefEquality[BaseIR], T]) {
this
}

def bindIf(test: Boolean, ir: BaseIR, t: T): Memo[T] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly fine in the unit case, but we might not want to use this naively for every analysis if constructing t is not cheap (as it is for Unit), since we'll need to construct t even if we don't end up binding it.

An alternative is changing the signature to def bindIf(test: Boolean, ir: BaseIR, t: => T), but this has other performance consequences -- Scala desugars this to a () => T and now a closure is allocated for every call to the function.

I think this is fine, we just won't be able to use it everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, call by name makes much more sense here. Use it where appropriate I guess. Do you know if visualvm works with scala code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, really appreciate the explanations in comments. Thank you for taking the time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we typically use either YourKit or async-profiler. async-profiler is free and lighter-weight, but the last time we looked a year or two ago it didn't provide timings at the level of line symbols within functions, and as such wasn't quite as useful for profiling our enormous functions inside generated code.

| _: TableExplode
| _: TableFilterIntervals
| _: TableHead
| _: TableIntervalJoin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you didn't change this behavior, but TableIntervalJoin is actually inherits distinctness just from the left -- it joins intervals from the right onto the left without a cartesian product

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll update it

| _: TableIntervalJoin
| _: TableJoin
| _: TableKeyBy
| _: TableLeftJoinRightDistinct
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TableLeftJoinRightDistinct is also not strict enough here. Should just propagate distinctness from the left child.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the previous code was not correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it wasn't wrong, but it was unnecessarily loose. The semantic distinctness of the TableLeftJoinRightDistinct node doesn't care if the right side is distinct or not, but the previous implementation did.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now wondering if visiting a subset of the nodes in the tree is the right thing to do. What if there's a subtree in one of these unvisited nodes that might be distinctly keyed and we're not discovering it?
Is this analysis used top-down in a way that making that unimportant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're totally right -- in the general case, we need to visit every TableIR. This isn't just used top-down, it's used in lowering joins to avoid grouping the right side if we know it's already distinct.

We want to visit all TableIRs, but the distinctness calculation for any given IR might not care about some inputs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

// Java (and Scala) iterators mutate on `next()` so it's convenient
// to hold on to a node and its children as we visit the node after
// its children.
private var stack = List((root, adj(root)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work!

@ehigham ehigham assigned ehigham and unassigned tpoterba Feb 15, 2023
@danking danking merged commit 1fe6f2a into hail-is:main Feb 15, 2023
@ehigham ehigham deleted the ir_tree_traversers branch February 15, 2023 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants