[SPARK-8117] [SQL] Move codegen implementation into Expression #6660
Test build #34224 has finished for PR 6660 at commit
Test build #34227 has finished for PR 6660 at commit
```scala
protected val genericMutableRowType = classOf[GenericMutableRow].getName

private val curId = new java.util.concurrent.atomic.AtomicInteger()
case class EvaluatedExpression(var code: Code,
```
Rename this to `GeneratedExpressionCode`?
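For context, the snippet under review pairs an `AtomicInteger` counter (`curId`) with the expression-code case class. Below is a minimal sketch of how such a counter could hand out unique Java variable names for each expression's terms. `TermNames`, `freshName`, `newExpressionCode` and the `isNull`/`primitive`/`obj` prefixes are hypothetical names for illustration, not the code in this PR.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Type aliases as described in the PR: both are plain Java source strings.
type Term = String
type Code = String

// Same shape as the GeneratedExpressionCode case class in the PR description.
case class GeneratedExpressionCode(
    var code: Code,
    nullTerm: Term,
    primitiveTerm: Term,
    objectTerm: Term)

// Hypothetical helper: appends a monotonically increasing counter to a prefix,
// so nested expressions never reuse a Java variable name.
object TermNames {
  private val curId = new AtomicInteger()
  def freshName(prefix: String): Term = s"$prefix${curId.getAndIncrement()}"
}

// Hypothetical factory: allocates fresh terms for one expression; `code` is
// filled in later by whatever generates that expression's Java source.
def newExpressionCode(): GeneratedExpressionCode =
  GeneratedExpressionCode(
    code = "",
    nullTerm = TermNames.freshName("isNull"),
    primitiveTerm = TermNames.freshName("primitive"),
    objectTerm = TermNames.freshName("obj"))
```

Using an atomic counter keeps names unique even if several code generators run concurrently in the same JVM.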
Test build #34229 has finished for PR 6660 at commit
Test build #34231 has finished for PR 6660 at commit
Test build #34232 has finished for PR 6660 at commit
```scala
 */
def equalFunc(dataType: DataType): ((Term, Term) => Code) = dataType match {
  case BinaryType => { case (eval1, eval2) => s"java.util.Arrays.equals($eval1, $eval2)" }
  case dt if isNativeType(dt) => { case (eval1, eval2) => s"$eval1 == $eval2" }
```
Does this work? Being a native type doesn't necessarily mean `==` works in Java, does it?
Only primitive types are native types here: boolean through long, and double.
Yeah, but the definition of native type (4 lines down) is "List of data types that have special accessors and setters in [[Row]]," which means we might break this in the future.
I'd make it explicit and match on primitive types here.
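A minimal sketch of the reviewer's suggestion: match explicitly on primitive types instead of relying on `isNativeType`. The `DataType` objects below are simplified stand-ins for Spark SQL's types, `primitiveTypes` is a hypothetical name, and the `equals()` fallback for non-primitive types is an assumption rather than part of the diff.

```scala
// Hypothetical, simplified stand-ins for Spark SQL's data types.
sealed trait DataType
case object BooleanType extends DataType
case object ByteType extends DataType
case object ShortType extends DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object FloatType extends DataType
case object DoubleType extends DataType
case object BinaryType extends DataType
case object StringType extends DataType

type Term = String
type Code = String

// Types whose Java representation is a primitive, so `==` compares values.
val primitiveTypes: Set[DataType] =
  Set(BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType)

def equalFunc(dataType: DataType): (Term, Term) => Code = dataType match {
  // byte[] needs a content comparison, not reference equality.
  case BinaryType => (eval1, eval2) => s"java.util.Arrays.equals($eval1, $eval2)"
  case dt if primitiveTypes.contains(dt) => (eval1, eval2) => s"$eval1 == $eval2"
  // Assumed fallback: any other type is a Java object, so use equals().
  case _ => (eval1, eval2) => s"$eval1.equals($eval2)"
}
```

Because the primitive set is spelled out, a later change to what counts as a "native" type cannot silently route an object type to `==`.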
Test build #34247 has finished for PR 6660 at commit
`@@ -86,6 +87,8 @@ case class Abs(child: Expression) extends UnaryArithmetic {`

```scala
abstract class BinaryArithmetic extends BinaryExpression {
  self: Product =>

  def decimalMethod: String = ""
```
Can you add an inline comment explaining what this is for?
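One plausible reading of `decimalMethod`, assumed here rather than confirmed by the diff: generated Java cannot apply `+` to a decimal object, so each arithmetic expression names the method to call instead, and a method that Scala declares as `+` is visible to Java under its encoded name `$plus`. The sketch below is a hypothetical, simplified `BinaryArithmetic`; `symbol`, `genArithmetic` and `isDecimal` are illustrative names.

```scala
type Term = String
type Code = String

// Hypothetical, simplified BinaryArithmetic: `symbol` is the Java operator for
// primitive operands, `decimalMethod` the method to call on decimal operands.
abstract class BinaryArithmetic(val symbol: String) {
  // Subclasses that support decimal arithmetic override this.
  def decimalMethod: String = ""

  def genArithmetic(isDecimal: Boolean, eval1: Term, eval2: Term): Code =
    if (isDecimal) s"$eval1.$decimalMethod($eval2)"
    else s"($eval1 $symbol $eval2)"
}

// Scala encodes a method named `+` as `$plus` on the JVM, which is the name
// Java source has to use when calling it.
class Add extends BinaryArithmetic("+") {
  override def decimalMethod: String = "$plus"
}
```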
One idea I have is to use a numeric type for the null term. Then many expressions no longer need conditional branches: to compute the null term, && becomes &, || becomes |, and xor becomes ^.
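A minimal sketch of this idea, assuming null flags are emitted as Java `int` values (0 for not null, 1 for null); the helper names are hypothetical. With `boolean` flags the combined null term uses `||`, which javac compiles to conditional jumps; with `int` flags it is a single `|`. The last function checks that `&`, `|` and `^` on 0/1 values agree with `&&`, `||` and xor.

```scala
type Term = String
type Code = String

// Boolean null flags: short-circuit OR in the generated Java source.
def booleanNullTerm(isNull1: Term, isNull2: Term, result: Term): Code =
  s"boolean $result = $isNull1 || $isNull2;"

// Numeric null flags (0 = not null, 1 = null): a branch-free bitwise OR.
def numericNullTerm(isNull1: Term, isNull2: Term, result: Term): Code =
  s"int $result = $isNull1 | $isNull2;"

def toInt(b: Boolean): Int = if (b) 1 else 0

// Exhaustively checks that the bitwise operators on 0/1 ints match the
// logical operators on booleans for every input combination.
def bitwiseMatchesLogical: Boolean = {
  val bools = Seq(false, true)
  val checks = for (a <- bools; b <- bools) yield {
    val (x, y) = (toInt(a), toInt(b))
    (x & y) == toInt(a && b) && (x | y) == toInt(a || b) && (x ^ y) == toInt(a != b)
  }
  checks.forall(ok => ok)
}
```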
BTW, I sent a pull request against your branch with some review comments.
Code gen code review.
Test build #34294 has finished for PR 6660 at commit
Test build #34306 has finished for PR 6660 at commit
Test build #34308 has finished for PR 6660 at commit
```scala
 * @param ctx a [[CodeGenContext]]
 * @return [[GeneratedExpressionCode]]
 */
def gen(ctx: CodeGenContext): GeneratedExpressionCode = {
```
I think this can be `final`.
There are some cases where we need to override it.
High-level comment: we should have a checklist / guide at the top of
```scala
 * by codegen, then they are evaluated directly. The unsupported expression is appended at the
 * end of `references`, the position of it is kept in the code, used to access and evaluate it.
 */
class CodeGenContext {
```
Minor nit, but can this class go into its own file?
We can do this later; keeping it here reduces the number of changed lines, which is better for review.
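The bookkeeping described in the `CodeGenContext` doc comment can be sketched as follows, under stated assumptions: `Literal`, `OpaqueUdf` and `genOrFallback` are hypothetical, and the emitted Java (`references[idx]`, the input row variable `i`) only illustrates looking an expression up by its position. It is not the exact source this PR generates.

```scala
import scala.collection.mutable

// Hypothetical minimal expression hierarchy for illustration.
trait Expression
case class Literal(value: Int) extends Expression
case class OpaqueUdf(name: String) extends Expression

class CodeGenContext {
  // Expressions codegen cannot handle; generated code indexes into this buffer.
  val references: mutable.ArrayBuffer[Expression] = new mutable.ArrayBuffer[Expression]()
}

// Supported expressions are inlined; anything else is appended to
// `references` and the generated Java fetches it by position and evaluates it.
def genOrFallback(ctx: CodeGenContext, e: Expression): String = e match {
  case Literal(v) => v.toString
  case other =>
    val idx = ctx.references.length
    ctx.references += other
    s"((Expression) references[$idx]).eval(i)"
}
```

The position recorded at generation time is what lets the compiled class find the right expression object at runtime without any lookup by name.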
Test build #34313 has finished for PR 6660 at commit
Test build #34320 has finished for PR 6660 at commit
Test build #34357 has finished for PR 6660 at commit
Test build #34358 has finished for PR 6660 at commit
This PR moves the codegen implementation of expressions into the Expression class itself, making it easier to manage. It introduces two APIs in Expression:

```scala
def gen(ctx: CodeGenContext): GeneratedExpressionCode
def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code
```

gen(ctx) calls genCode(ctx, ev) to generate Java source code for the current expression. An expression needs to override genCode().

Here are the types:

```scala
import scala.collection.mutable

type Term = String
type Code = String

/**
 * Java source for evaluating an [[Expression]] given a [[Row]] of input.
 */
case class GeneratedExpressionCode(var code: Code, nullTerm: Term, primitiveTerm: Term, objectTerm: Term)

/**
 * A context for codegen, used to keep track of the expressions that are not supported
 * by codegen so that they can be evaluated directly. Each unsupported expression is appended
 * to the end of `references`, and its position is kept in the generated code, which uses it
 * to access and evaluate the expression.
 */
class CodeGenContext {
  /**
   * Holds all the expressions that do not support codegen; they will be evaluated directly.
   */
  val references: mutable.ArrayBuffer[Expression] = new mutable.ArrayBuffer[Expression]()
}
```

This is basically apache#6660, with the style violations and compilation failure fixed.

Author: Davies Liu <davies@databricks.com>
Author: Reynold Xin <rxin@databricks.com>

Closes apache#6690 from rxin/codegen and squashes the following commits:

e1368c2 [Reynold Xin] Fixed tests.
73db80e [Reynold Xin] Fixed compilation failure.
19d6435 [Reynold Xin] Fixed style violation.
9adaeaf [Davies Liu] address comments
f42c732 [Davies Liu] improve coverage and tests
bad6828 [Davies Liu] address comments
e03edaa [Davies Liu] consts fold
86fac2c [Davies Liu] fix style
02262c9 [Davies Liu] address comments
b5d3617 [Davies Liu] Merge pull request #5 from rxin/codegen
48c454f [Reynold Xin] Some code gen update.
2344bc0 [Davies Liu] fix test
12ff88a [Davies Liu] fix build
c5fb514 [Davies Liu] rename
8c6d82d [Davies Liu] update docs
b145047 [Davies Liu] fix style
e57959d [Davies Liu] add type alias
3ff25f8 [Davies Liu] refactor
593d617 [Davies Liu] pushing codegen into Expression
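A minimal, self-contained sketch of how `gen` and `genCode` divide the work, assuming simplified int-only expressions: `gen` allocates fresh terms and delegates to `genCode`, and a parent expression calls `gen` on its children and splices their code into its own. `freshName`, `IntLiteral` and `IntAdd` are hypothetical and not part of this PR.

```scala
import scala.collection.mutable

// Type aliases as in the PR description: both are Java source strings.
type Term = String
type Code = String

case class GeneratedExpressionCode(var code: Code, nullTerm: Term, primitiveTerm: Term, objectTerm: Term)

class CodeGenContext {
  // Expressions that cannot be code-generated (unused in this small sketch).
  val references: mutable.ArrayBuffer[Expression] = new mutable.ArrayBuffer[Expression]()

  // Hypothetical counter that keeps generated Java variable names unique.
  private var curId = 0
  def freshName(prefix: String): Term = {
    curId += 1
    s"$prefix$curId"
  }
}

abstract class Expression {
  // Allocates fresh terms for this expression, then asks the subclass
  // to fill in the Java code that computes them.
  def gen(ctx: CodeGenContext): GeneratedExpressionCode = {
    val ev = GeneratedExpressionCode(
      "", ctx.freshName("isNull"), ctx.freshName("primitive"), ctx.freshName("obj"))
    ev.code = genCode(ctx, ev)
    ev
  }

  // Each concrete expression emits Java source that assigns ev's terms.
  protected def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code
}

// Hypothetical int-only literal: never null.
case class IntLiteral(value: Int) extends Expression {
  protected def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code =
    s"""boolean ${ev.nullTerm} = false;
       |int ${ev.primitiveTerm} = $value;""".stripMargin
}

// Hypothetical int-only addition: generates both children first, then
// combines their terms; the result is null if either input is null.
case class IntAdd(left: Expression, right: Expression) extends Expression {
  protected def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code = {
    val l = left.gen(ctx)
    val r = right.gen(ctx)
    s"""${l.code}
       |${r.code}
       |boolean ${ev.nullTerm} = ${l.nullTerm} || ${r.nullTerm};
       |int ${ev.primitiveTerm} = ${ev.nullTerm} ? 0 : ${l.primitiveTerm} + ${r.primitiveTerm};""".stripMargin
  }
}
```

Keeping term allocation in `gen` lets each `genCode` implementation focus only on its own computation.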
Now that 5e7b6b6 has merged, can this be closed?