[SPARK-8117] [SQL] Push codegen implementation into each Expression #6690

Closed
wants to merge 19 commits into from

rxin
Contributor

@rxin rxin commented Jun 7, 2015

This PR moves the codegen implementation of each expression into the Expression class itself, which makes it easier to manage.

It introduces two APIs in Expression:

def gen(ctx: CodeGenContext): GeneratedExpressionCode
def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code

gen(ctx) calls genCode(ctx, ev) to generate Java source code for the current expression. An expression needs to override genCode().
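A minimal sketch of how the two methods could fit together, written as a standalone toy rather than the actual Spark classes. Everything here is an assumption for illustration: the `freshName` helper, the `IntLiteral` and `Add` expressions, and the two-field result type (the `objectTerm` field from the description is left out to keep it short).

```scala
object CodegenSketch {
  type Term = String
  type Code = String

  // Simplified stand-in for GeneratedExpressionCode (objectTerm omitted).
  case class GeneratedExpressionCode(var code: Code, nullTerm: Term, primitiveTerm: Term)

  // Hypothetical context that only hands out unique Java variable names.
  class CodeGenContext {
    private var counter = 0
    def freshName(prefix: String): Term = { counter += 1; s"$prefix$counter" }
  }

  trait Expression {
    // Allocates the output terms, then delegates to genCode for the Java source.
    def gen(ctx: CodeGenContext): GeneratedExpressionCode = {
      val ev = GeneratedExpressionCode("", ctx.freshName("nullTerm"), ctx.freshName("primitiveTerm"))
      ev.code = genCode(ctx, ev)
      ev
    }

    // Each expression overrides this to emit Java that assigns ev.nullTerm and ev.primitiveTerm.
    def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code
  }

  // Hypothetical leaf expression: a non-null int constant.
  case class IntLiteral(value: Int) extends Expression {
    def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code =
      s"boolean ${ev.nullTerm} = false; int ${ev.primitiveTerm} = $value;"
  }

  // Hypothetical binary expression: generates both children first, then combines them.
  case class Add(left: Expression, right: Expression) extends Expression {
    def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code = {
      val l = left.gen(ctx)
      val r = right.gen(ctx)
      Seq(
        l.code,
        r.code,
        s"boolean ${ev.nullTerm} = ${l.nullTerm} || ${r.nullTerm};",
        s"int ${ev.primitiveTerm} = ${ev.nullTerm} ? 0 : ${l.primitiveTerm} + ${r.primitiveTerm};"
      ).mkString("\n")
    }
  }
}
```

Calling `CodegenSketch.Add(CodegenSketch.IntLiteral(1), CodegenSketch.IntLiteral(2)).gen(new CodegenSketch.CodeGenContext)` returns the Java source for the whole tree in `code`, with each child's snippet placed before the line that consumes its terms.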

Here are the types:

type Term = String
type Code = String

/**
 * Java source for evaluating an [[Expression]] given a [[Row]] of input.
 */
case class GeneratedExpressionCode(var code: Code,
                                   nullTerm: Term,
                                   primitiveTerm: Term,
                                   objectTerm: Term)

/**
 * A context for codegen, used to keep track of the expressions that are not supported by
 * codegen and are therefore evaluated directly. Each unsupported expression is appended to the
 * end of `references`; its position is kept in the generated code and used to access and evaluate it.
 */
class CodeGenContext {
  /**
   * Holds all the expressions that do not support codegen and will be evaluated directly.
   */
  val references: mutable.ArrayBuffer[Expression] = new mutable.ArrayBuffer[Expression]()
}
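
The fallback described in the CodeGenContext comment might look roughly like the sketch below. It is a standalone toy, not the PR's code: the `fallbackCode` helper, the one-method `Expression` trait, and the names `expressions` and `i` inside the generated Java are all assumptions made for illustration.

```scala
import scala.collection.mutable

object ReferencesSketch {
  // Hypothetical minimal expression that can only be evaluated directly.
  trait Expression { def eval(input: Any): Any }

  class CodeGenContext {
    // Expressions without codegen support, in the order they were registered.
    val references: mutable.ArrayBuffer[Expression] = new mutable.ArrayBuffer[Expression]()
  }

  // Registers expr in the context and emits Java that looks it up by its
  // position in the references array, then evaluates it on the input row.
  def fallbackCode(ctx: CodeGenContext, expr: Expression, resultTerm: String): String = {
    ctx.references += expr
    val idx = ctx.references.size - 1
    s"Object $resultTerm = expressions[$idx].eval(i);"
  }
}
```

Because the index is baked into the generated source, the sketch only ever appends to `references`; removing or reordering entries would invalidate code that was already generated.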

This is basically #6660, with the style violations and compilation failure fixed.

@rxin
Contributor Author

rxin commented Jun 7, 2015

cc @davies I have a big followup pr to move a lot of expression test code for better grouping.

@SparkQA

SparkQA commented Jun 7, 2015

Test build #34388 has finished for PR 6690 at commit 73db80e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code, var isNull: Term, var primitive: Term)
    • class CodeGenContext
    • case class Pow(left: Expression, right: Expression)
    • case class Rint(child: Expression) extends UnaryMathExpression(math.rint, "ROUND")
    • case class ToDegrees(child: Expression) extends UnaryMathExpression(math.toDegrees, "DEGREES")
    • case class ToRadians(child: Expression) extends UnaryMathExpression(math.toRadians, "RADIANS")

@davies
Contributor

davies commented Jun 7, 2015

@rxin Should we merge this one first?

@rxin
Contributor Author

rxin commented Jun 7, 2015

Yes -- except it is failing tests :(

@rxin
Contributor Author

rxin commented Jun 7, 2015

cc @liancheng why are the partitioning suites failing?

@rxin
Contributor Author

rxin commented Jun 7, 2015

Jenkins, retest this please.

@SparkQA

SparkQA commented Jun 7, 2015

Test build #34394 has finished for PR 6690 at commit 73db80e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.

@rxin
Contributor Author

rxin commented Jun 7, 2015

@liancheng The failure is due to double type vs float type under the hood. What I don't get is how this code was passing before?!

Should we parse 1.5 as double type, or float type?

@rxin
Contributor Author

rxin commented Jun 7, 2015

I filed #6692 to always use DoubleType. It's much simpler to have only one type for floating point numbers, rather than having to reason about float vs double.

@SparkQA

SparkQA commented Jun 7, 2015

Test build #34395 has finished for PR 6690 at commit e1368c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code, var isNull: Term, var primitive: Term)
    • class CodeGenContext
    • case class Pow(left: Expression, right: Expression)
    • case class Rint(child: Expression) extends UnaryMathExpression(math.rint, "ROUND")
    • case class ToDegrees(child: Expression) extends UnaryMathExpression(math.toDegrees, "DEGREES")
    • case class ToRadians(child: Expression) extends UnaryMathExpression(math.toRadians, "RADIANS")

@asfgit asfgit closed this in 5e7b6b6 Jun 8, 2015
@liancheng
Contributor

@rxin The tests failed because this PR overrides Literal.equals, which uses value.equals(o.value). A quick Scala REPL check:

scala> 1.5 == 1.5f
res2: Boolean = true

scala> 1.5.equals(1.5f)
res3: Boolean = false
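
To make the difference concrete, here is a small sketch with two hypothetical literal classes (`StrictLit` and `LooseLit` are illustration names, not Spark classes). It assumes only standard Scala semantics: `==` on values typed as `Any` compares boxed numbers by value, while `equals` follows Java's boxed semantics, where a `Double` never equals a `Float`.

```scala
object LiteralEqualsSketch {
  // Hypothetical literal whose equals delegates to value.equals,
  // so a boxed Double and a boxed Float are never equal.
  class StrictLit(val value: Any) {
    override def equals(other: Any): Boolean = other match {
      case o: StrictLit => value.equals(o.value)
      case _            => false
    }
    override def hashCode(): Int = value.hashCode()
  }

  // Same shape, but compares with Scala's ==, which treats boxed
  // numbers by numeric value; ## is the hash that agrees with ==.
  class LooseLit(val value: Any) {
    override def equals(other: Any): Boolean = other match {
      case o: LooseLit => value == o.value
      case _           => false
    }
    override def hashCode(): Int = value.##
  }
}
```

With these definitions, `new LiteralEqualsSketch.StrictLit(1.5)` and `new LiteralEqualsSketch.StrictLit(1.5f)` compare as different, while the `LooseLit` pair compares as equal, which mirrors the two REPL results above.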

@rxin
Contributor Author

rxin commented Jun 8, 2015

It's a problem in the test case that this change exposes.

We shouldn't use a double value where we intend to have a float value. Let's be more careful with that in the future.

nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Author: Davies Liu <davies@databricks.com>
Author: Reynold Xin <rxin@databricks.com>

Closes apache#6690 from rxin/codegen and squashes the following commits:

e1368c2 [Reynold Xin] Fixed tests.
73db80e [Reynold Xin] Fixed compilation failure.
19d6435 [Reynold Xin] Fixed style violation.
9adaeaf [Davies Liu] address comments
f42c732 [Davies Liu] improve coverage and tests
bad6828 [Davies Liu] address comments
e03edaa [Davies Liu] consts fold
86fac2c [Davies Liu] fix style
02262c9 [Davies Liu] address comments
b5d3617 [Davies Liu] Merge pull request apache#5 from rxin/codegen
48c454f [Reynold Xin] Some code gen update.
2344bc0 [Davies Liu] fix test
12ff88a [Davies Liu] fix build
c5fb514 [Davies Liu] rename
8c6d82d [Davies Liu] update docs
b145047 [Davies Liu] fix style
e57959d [Davies Liu] add type alias
3ff25f8 [Davies Liu] refactor
593d617 [Davies Liu] pushing codegen into Expression