-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[compiler] disentangle IR parser from bindings #13990
Conversation
6342978
to
69d25d8
Compare
8f644a4
to
4112879
Compare
4112879
to
c8cfa80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has significantly improved the simplicity of the parser, so much so that much of the logic therein could be simplified further, though I think that's beyond the scope of this change.
I like the separation of type-checking and parsing, however I'd prefer in your implementations of typecheck
that you assert one thing at a time. That way when things fail, it'll be clear which assertion was fired (ie if (a && b && c)
fails, you don't know if it's a
or b
or c
, whereas
assert(a)
assert(b)
assert(c)
would give you that information.
case class NEQ(t1: Type, t2: Type) extends ComparisonOp[Boolean] { val op: CodeOrdering.Op = CodeOrdering.Neq() } | ||
case class NEQ(t1: Type, t2: Type) extends ComparisonOp[Boolean] { | ||
val op: CodeOrdering.Op = CodeOrdering.Neq() | ||
override def copy(t1: Type = t1, t2: Type = t2): NEQ = NEQ(t1, t2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these copy
methods not automatically defined by nature of being a case class
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that I want a virtual copy method on ComparisonOp
. I tried, but the compiler generated copy methods won't override the virtual one.
case "==" | "EQ" => | ||
EQ(null, null) | ||
case "!=" | "NEQ" => | ||
NEQ(null, null) | ||
case ">=" | "GTEQ" => | ||
GTEQ(null, null) | ||
case "<=" | "LTEQ" => | ||
LTEQ(null, null) | ||
case ">" | "GT" => | ||
GT(null, null) | ||
case "<" | "LT" => | ||
LT(null, null) | ||
case "Compare" => | ||
Compare(null, null) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These ComparisonOp
s have always seemed a little funny to me. I'm not sure what value they add over lifting the operand types into ApplyComparisonOp
and just using the CodeOrdering.Op
, ie:
case class ApplyComparisonOp(op: CodeOrdering.Op, aTy: Type, bTy: Type, a: IR, b: IT)
I also wonder what's the point of ApplyComparisonOp
over ApplyBinaryOp
(or indeed, Apply
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair questions, but I think outside the scope of this PR. For the first question, I think it's mostly about ease of pattern matching; if an optimization wants to apply only for <
, you don't want to muck around at the CodeOrdering
level.
Similar for the second question, it's just an issue of what information is encoded in the IR, and how easy it is for optimizations to extract that information.
@@ -319,8 +324,7 @@ final case class StreamLen(a: IR) extends IR | |||
final case class StreamGrouped(a: IR, groupSize: IR) extends IR | |||
final case class StreamGroupByKey(a: IR, key: IndexedSeq[String], missingEqual: Boolean) extends IR | |||
|
|||
final case class StreamMap(a: IR, name: String, body: IR) extends IR { | |||
override def typ: TStream = tcoerce[TStream](super.typ) | |||
final case class StreamMap(a: IR, name: String, body: IR) extends TypedIR[TStream] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really like the use of TypedIR
!
refMap: BindingEnv[Type] = BindingEnv.empty[Type], | ||
irMap: Map[Int, BaseIR] = Map.empty, | ||
) { | ||
|
||
def promoteAgg: IRParserEnvironment = copy(refMap = refMap.promoteAgg) | ||
|
||
def promoteScan: IRParserEnvironment = copy(refMap = refMap.promoteScan) | ||
|
||
def promoteAggScan(isScan: Boolean): IRParserEnvironment = | ||
if (isScan) promoteScan else promoteAgg | ||
|
||
def noAgg: IRParserEnvironment = copy(refMap = refMap.noAgg) | ||
|
||
def noScan: IRParserEnvironment = copy(refMap = refMap.noScan) | ||
|
||
def noAggScan(isScan: Boolean): IRParserEnvironment = | ||
if (isScan) noScan else noAgg | ||
|
||
def createAgg: IRParserEnvironment = copy(refMap = refMap.createAgg) | ||
|
||
def createScan: IRParserEnvironment = copy(refMap = refMap.createScan) | ||
|
||
def onlyRelational: IRParserEnvironment = { | ||
if (refMap.eval.isEmpty && refMap.agg.isEmpty && refMap.scan.isEmpty) | ||
this | ||
else | ||
copy(refMap = refMap.onlyRelational) | ||
} | ||
|
||
def empty: IRParserEnvironment = copy(refMap = BindingEnv.empty) | ||
|
||
def bindEval(name: String, t: Type): IRParserEnvironment = | ||
copy(refMap = refMap.bindEval(name, t)) | ||
|
||
def bindEval(bindings: (String, Type)*): IRParserEnvironment = | ||
copy(refMap = refMap.bindEval(bindings: _*)) | ||
|
||
def bindEval(bindings: Env[Type]): IRParserEnvironment = | ||
copy(refMap = refMap.bindEval(bindings.m.toSeq: _*)) | ||
|
||
def bindAggScan(isScan: Boolean, bindings: (String, Type)*): IRParserEnvironment = | ||
copy(refMap = if (isScan) refMap.bindScan(bindings: _*) else refMap.bindAgg(bindings: _*)) | ||
|
||
def bindAgg(name: String, t: Type): IRParserEnvironment = | ||
copy(refMap = refMap.bindAgg(name, t)) | ||
|
||
def bindAgg(bindings: (String, Type)*): IRParserEnvironment = | ||
copy(refMap = refMap.bindAgg(bindings: _*)) | ||
|
||
def bindAgg(bindings: Env[Type]): IRParserEnvironment = | ||
copy(refMap = refMap.bindAgg(bindings.m.toSeq: _*)) | ||
|
||
def bindScan(name: String, t: Type): IRParserEnvironment = | ||
copy(refMap = refMap.bindScan(name, t)) | ||
|
||
def bindScan(bindings: (String, Type)*): IRParserEnvironment = | ||
copy(refMap = refMap.bindScan(bindings: _*)) | ||
|
||
def bindScan(bindings: Env[Type]): IRParserEnvironment = | ||
copy(refMap = refMap.bindScan(bindings.m.toSeq: _*)) | ||
|
||
def bindRelational(name: String, t: Type): IRParserEnvironment = | ||
copy(refMap = refMap.bindRelational(name, t)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice.
val errorID = int32_literal(it) | ||
val function = identifier(it) | ||
val typeArgs = type_exprs(it) | ||
val rt = type_expr(it) | ||
ir_value_children(env)(it).map { args => | ||
invoke(function, rt, typeArgs, errorID, args: _*) | ||
ApplyIR(function, rt, typeArgs, args, errorID) | ||
} | ||
case "ApplySpecial" => | ||
val errorID = int32_literal(it) | ||
val function = identifier(it) | ||
val typeArgs = type_exprs(it) | ||
val rt = type_expr(it) | ||
ir_value_children(env)(it).map { args => | ||
ApplySpecial(function, typeArgs, args, rt, errorID) | ||
} | ||
case "Apply" => | ||
val errorID = int32_literal(it) | ||
val function = identifier(it) | ||
val typeArgs = type_exprs(it) | ||
val rt = type_expr(it) | ||
ir_value_children(env)(it).map { args => | ||
Apply(function, typeArgs, args, rt, errorID) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little sad. Can you define a function that takes the data constructor to clean all this up? ie
def applyLikeParser[Ap <: IR](cons: (String, Type, IndexedSeq[Type], IndexedSeq[IR], Int) => Ap): Parser[Ap] =
/* The de-duplicated implementation */
val parseApply: Parser[Apply] =
applyLikeParser(Apply)
val parseApplySpecial: Parser[ApplySpecial] =
applyLikeParser(ApplySpecial)
...
case "Apply" =>
parseApply(env)(it)
case "ApplySpecial" =>
parseApplySpecial(env)(it)
...
Agreed. But as my follow up will be a complete rewrite of the parser, I definitely don't want to do more to simplify the current one.
Good suggestion. I only moved assertions here, didn't add any new ones, but I don't mind splitting up some that I moved. |
Currently the binding structure is redundantly specified in two places: Binds.scala, and the parser. We need the binding structure in the parser to propagate the environment, so we can annotate
Ref
nodes (and a few other things) with their types. But we can't use Binds.scala because we don't yet have an IR.This PR removes environment maintenance from the parser by deferring type annotation to a separate pass (which is simple, because it can use the Binds.scala infrastructure). One consequence is that we can't assign types to nodes like
Ref
during parsing, which means we can't ask for the type of any node during parsing, and by extension we can't ask for types of children in IR node constructors. Instead, all typechecking logic is moved to theTypeCheck
pass.Some benefits of this change:
TypeCheck
pass, all typechecking and type error reporting logic is in one place.This change was motivated by my work in progress to convert the parser to use the SSA grammar, which this should greatly simplify.
I chose to make the type annotation pass after parsing mutate the IR in place (with the unfortunate exception of
Apply
, which can change into anApplyIR
orApplySpecial
. Do these really need to be separate nodes?). The type of aRef
node was already mutable to allow this sort of deferred annotation, and I've had to make a few other things mutable as well. Alternatively we could rebuild the entire IR to include type annotations, but I think the mutation is sufficiently well localized, and this is a common compiler pattern.This touches a lot of lines, but the high level changes are:
refMap
from theIRParserEnvironment
, and remove all code that modifies the typing environment from the parser. Nodes likeRef
that need a type from the environment get types set tonull
, to be filled in after parsing.annotateTypes
pass to fill in type annotations from the environment. This is currently written in the parser, and always called after parsing. This means for the moment we can only parse type correct IR. But in the future we could move this to a separate stage of the compilation pipeline.typecheck
method, which is called from theTypeCheck
pass.typ
field on IR nodes consistently lazy (or a def when the type is a child's type without modification). Before we sometimes did this for performance, but it's now required to avoid querying children's types during construction.AggSignature
andComparisonOp
, so they can be filled in after parsing.Binds.scala
satisfies the following invariant: to construct the typing environment of childi
, we only need the types of children with index less thani
. This was almost always satisfied already, and allows us to use the generic binds infrastucture in the pass to annotate types (where when visiting childi
, we can only query types of already visited children).TailLoop
/Recur
to move the explicit type annotation fromRecur
toTailLoop
. This is necessary to comply with the previous point. It's also consistent withRef
, where types of references are inferred from the environment.TableFilterIntervals
andMatrixFilterIntervals
, where the types were needed during parsing and we can no longer get them from child types.Apply
node. Before the parser parsed anApply
node to one ofApply/ApplySpecial/ApplyIR
; now it always parses toApply
, and the type annotation pass converts to the appropriate specialization, which needs the parsed return type.