[compiler] disentangle IR parser from bindings #13990

patrick-schultz · 2023-11-08T19:30:03Z

Currently the binding structure is redundantly specified in two places: Binds.scala, and the parser. We need the binding structure in the parser to propagate the environment, so we can annotate Ref nodes (and a few other things) with their types. But we can't use Binds.scala because we don't yet have an IR.

This PR removes environment maintenance from the parser by deferring type annotation to a separate pass (which is simple, because it can use the Binds.scala infrastructure). One consequence is that we can't assign types to nodes like Ref during parsing, which means we can't ask for the type of any node during parsing, and by extension we can't ask for types of children in IR node constructors. Instead, all typechecking logic is moved to the TypeCheck pass.

Some benefits of this change:

The parser is simpler, as it doesn't have to maintain a typing environment.
Binds.scala is now the single source of truth on the binding structure of the IR.
Instead of typechecking being split in an ad-hoc way between IR constructors and the TypeCheck pass, all typechecking and type error reporting logic is in one place.
The parser parses a context-free grammar, no more and no less. If the input is gramatically correct, the parser succeeds.
We can round trip IR with type errors through the text representation. For instance, if we log an IR that fails TypeCheck, we can copy the IR from the log, parse it, then debug.

This change was motivated by my work in progress to convert the parser to use the SSA grammar, which this should greatly simplify.

I chose to make the type annotation pass after parsing mutate the IR in place (with the unfortunate exception of Apply, which can change into an ApplyIR or ApplySpecial. Do these really need to be separate nodes?). The type of a Ref node was already mutable to allow this sort of deferred annotation, and I've had to make a few other things mutable as well. Alternatively we could rebuild the entire IR to include type annotations, but I think the mutation is sufficiently well localized, and this is a common compiler pattern.

This touches a lot of lines, but the high level changes are:

Remove the refMap from the IRParserEnvironment, and remove all code that modifies the typing environment from the parser. Nodes like Ref that need a type from the environment get types set to null, to be filled in after parsing.
Add annotateTypes pass to fill in type annotations from the environment. This is currently written in the parser, and always called after parsing. This means for the moment we can only parse type correct IR. But in the future we could move this to a separate stage of the compilation pipeline.
Move typechecking logic on relational nodes from the constructors to a typecheck method, which is called from the TypeCheck pass.
Make the typ field on IR nodes consistently lazy (or a def when the type is a child's type without modification). Before we sometimes did this for performance, but it's now required to avoid querying children's types during construction.
Make types mutable on AggSignature and ComparisonOp, so they can be filled in after parsing.
Ensure that the structure in Binds.scala satisfies the following invariant: to construct the typing environment of child i, we only need the types of children with index less than i. This was almost always satisfied already, and allows us to use the generic binds infrastucture in the pass to annotate types (where when visiting child i, we can only query types of already visited children).
Change the text representation of TailLoop/Recur to move the explicit type annotation from Recur to TailLoop. This is necessary to comply with the previous point. It's also consistent with Ref, where types of references are inferred from the environment.
Add explicit types in the text representation of TableFilterIntervals and MatrixFilterIntervals, where the types were needed during parsing and we can no longer get them from child types.
Fix IR examples used in parser tests to be type correct.
Add an explicit return type to the Apply node. Before the parser parsed an Apply node to one of Apply/ApplySpecial/ApplyIR; now it always parses to Apply, and the type annotation pass converts to the appropriate specialization, which needs the parsed return type.

ehigham

This has significantly improved the simplicity of the parser, so much so that much of the logic therein could be simplified further, though I think that's beyond the scope of this change.

I like the separation of type-checking and parsing, however I'd prefer in your implementations of typecheck that you assert one thing at a time. That way when things fail, it'll be clear which assertion was fired (ie if (a && b && c) fails, you don't know if it's a or b or c, whereas

assert(a)
assert(b)
assert(c)

would give you that information.

ehigham · 2023-11-13T20:17:10Z

hail/src/main/scala/is/hail/expr/ir/ComparisonOp.scala

-case class NEQ(t1: Type, t2: Type) extends ComparisonOp[Boolean] { val op: CodeOrdering.Op = CodeOrdering.Neq() }
+case class NEQ(t1: Type, t2: Type) extends ComparisonOp[Boolean] {
+  val op: CodeOrdering.Op = CodeOrdering.Neq()
+  override def copy(t1: Type = t1, t2: Type = t2): NEQ = NEQ(t1, t2)


Are these copy methods not automatically defined by nature of being a case class?

The issue is that I want a virtual copy method on ComparisonOp. I tried, but the compiler generated copy methods won't override the virtual one.

ehigham · 2023-11-13T21:07:31Z

hail/src/main/scala/is/hail/expr/ir/ComparisonOp.scala

+    case "==" | "EQ" =>
+      EQ(null, null)
+    case "!=" | "NEQ" =>
+      NEQ(null, null)
+    case ">=" | "GTEQ" =>
+      GTEQ(null, null)
+    case "<=" | "LTEQ" =>
+      LTEQ(null, null)
+    case ">" | "GT" =>
+      GT(null, null)
+    case "<" | "LT" =>
+      LT(null, null)
+    case "Compare" =>
+      Compare(null, null)
  }


These ComparisonOps have always seemed a little funny to me. I'm not sure what value they add over lifting the operand types into ApplyComparisonOp and just using the CodeOrdering.Op, ie:

case class ApplyComparisonOp(op: CodeOrdering.Op, aTy: Type, bTy: Type, a: IR, b: IT)

I also wonder what's the point of ApplyComparisonOp over ApplyBinaryOp (or indeed, Apply).

Fair questions, but I think outside the scope of this PR. For the first question, I think it's mostly about ease of pattern matching; if an optimization wants to apply only for <, you don't want to muck around at the CodeOrdering level.

Similar for the second question, it's just an issue of what information is encoded in the IR, and how easy it is for optimizations to extract that information.

ehigham · 2023-11-13T21:09:41Z

hail/src/main/scala/is/hail/expr/ir/IR.scala

@@ -319,8 +324,7 @@ final case class StreamLen(a: IR) extends IR
 final case class StreamGrouped(a: IR, groupSize: IR) extends IR
 final case class StreamGroupByKey(a: IR, key: IndexedSeq[String], missingEqual: Boolean) extends IR

-final case class StreamMap(a: IR, name: String, body: IR) extends IR {
-  override def typ: TStream = tcoerce[TStream](super.typ)
+final case class StreamMap(a: IR, name: String, body: IR) extends TypedIR[TStream] {


Really like the use of TypedIR!

ehigham · 2023-11-13T21:21:42Z

hail/src/main/scala/is/hail/expr/ir/Parser.scala

-  refMap: BindingEnv[Type] = BindingEnv.empty[Type],
-  irMap: Map[Int, BaseIR] = Map.empty,
-) {
-
-  def promoteAgg: IRParserEnvironment = copy(refMap = refMap.promoteAgg)
-
-  def promoteScan: IRParserEnvironment = copy(refMap = refMap.promoteScan)
-
-  def promoteAggScan(isScan: Boolean): IRParserEnvironment =
-    if (isScan) promoteScan else promoteAgg
-
-  def noAgg: IRParserEnvironment = copy(refMap = refMap.noAgg)
-
-  def noScan: IRParserEnvironment = copy(refMap = refMap.noScan)
-
-  def noAggScan(isScan: Boolean): IRParserEnvironment =
-    if (isScan) noScan else noAgg
-
-  def createAgg: IRParserEnvironment = copy(refMap = refMap.createAgg)
-
-  def createScan: IRParserEnvironment = copy(refMap = refMap.createScan)
-
-  def onlyRelational: IRParserEnvironment = {
-    if (refMap.eval.isEmpty && refMap.agg.isEmpty && refMap.scan.isEmpty)
-      this
-    else
-      copy(refMap = refMap.onlyRelational)
-  }
-
-  def empty: IRParserEnvironment = copy(refMap = BindingEnv.empty)
-
-  def bindEval(name: String, t: Type): IRParserEnvironment =
-    copy(refMap = refMap.bindEval(name, t))
-
-  def bindEval(bindings: (String, Type)*): IRParserEnvironment =
-    copy(refMap = refMap.bindEval(bindings: _*))
-
-  def bindEval(bindings: Env[Type]): IRParserEnvironment =
-    copy(refMap = refMap.bindEval(bindings.m.toSeq: _*))
-
-  def bindAggScan(isScan: Boolean, bindings: (String, Type)*): IRParserEnvironment =
-    copy(refMap = if (isScan) refMap.bindScan(bindings: _*) else refMap.bindAgg(bindings: _*))
-
-  def bindAgg(name: String, t: Type): IRParserEnvironment =
-    copy(refMap = refMap.bindAgg(name, t))
-
-  def bindAgg(bindings: (String, Type)*): IRParserEnvironment =
-    copy(refMap = refMap.bindAgg(bindings: _*))
-
-  def bindAgg(bindings: Env[Type]): IRParserEnvironment =
-    copy(refMap = refMap.bindAgg(bindings.m.toSeq: _*))
-
-  def bindScan(name: String, t: Type): IRParserEnvironment =
-    copy(refMap = refMap.bindScan(name, t))
-
-  def bindScan(bindings: (String, Type)*): IRParserEnvironment =
-    copy(refMap = refMap.bindScan(bindings: _*))
-
-  def bindScan(bindings: Env[Type]): IRParserEnvironment =
-    copy(refMap = refMap.bindScan(bindings.m.toSeq: _*))
-
-  def bindRelational(name: String, t: Type): IRParserEnvironment =
-    copy(refMap = refMap.bindRelational(name, t))
-}


ehigham · 2023-11-13T21:27:31Z

hail/src/main/scala/is/hail/expr/ir/Parser.scala

        val errorID = int32_literal(it)
        val function = identifier(it)
        val typeArgs = type_exprs(it)
        val rt = type_expr(it)
        ir_value_children(env)(it).map { args =>
-          invoke(function, rt, typeArgs, errorID, args: _*)
+          ApplyIR(function, rt, typeArgs, args, errorID)
+        }
+      case "ApplySpecial" =>
+        val errorID = int32_literal(it)
+        val function = identifier(it)
+        val typeArgs = type_exprs(it)
+        val rt = type_expr(it)
+        ir_value_children(env)(it).map { args =>
+          ApplySpecial(function, typeArgs, args, rt, errorID)
+        }
+      case "Apply" =>
+        val errorID = int32_literal(it)
+        val function = identifier(it)
+        val typeArgs = type_exprs(it)
+        val rt = type_expr(it)
+        ir_value_children(env)(it).map { args =>
+          Apply(function, typeArgs, args, rt, errorID)
        }


This is a little sad. Can you define a function that takes the data constructor to clean all this up? ie

def applyLikeParser[Ap <: IR](cons: (String, Type, IndexedSeq[Type], IndexedSeq[IR], Int) => Ap): Parser[Ap] = /* The de-duplicated implementation */ val parseApply: Parser[Apply] = applyLikeParser(Apply) val parseApplySpecial: Parser[ApplySpecial] = applyLikeParser(ApplySpecial) ... case "Apply" => parseApply(env)(it) case "ApplySpecial" => parseApplySpecial(env)(it) ...

patrick-schultz · 2023-11-13T22:00:54Z

This has significantly improved the simplicity of the parser, so much so that much of the logic therein could be simplified further, though I think that's beyond the scope of this change.

Agreed. But as my follow up will be a complete rewrite of the parser, I definitely don't want to do more to simplify the current one.

I like the separation of type-checking and parsing, however I'd prefer in your implementations of typecheck that you assert one thing at a time. That way when things fail, it'll be clear which assertion was fired (ie if (a && b && c) fails, you don't know if it's a or b or c, whereas
assert(a)
assert(b)
assert(c)
would give you that information.

Good suggestion. I only moved assertions here, didn't add any new ones, but I don't mind splitting up some that I moved.

done

patrick-schultz force-pushed the parser-no-env branch 4 times, most recently from 6342978 to 69d25d8 Compare November 9, 2023 21:43

[compiler] disentangle IR parser from bindings

db78e4c

patrick-schultz force-pushed the parser-no-env branch 2 times, most recently from 8f644a4 to 4112879 Compare November 13, 2023 14:46

patrick-schultz added 4 commits November 13, 2023 10:39

never access types in ir constructors

464d2b6

add return type to TailLoop

7a3e512

remove type from Recur text format

8e0c7fe

misc fixes

c8cfa80

patrick-schultz force-pushed the parser-no-env branch from 4112879 to c8cfa80 Compare November 13, 2023 15:39

patrick-schultz assigned ehigham Nov 13, 2023

patrick-schultz requested a review from ehigham November 13, 2023 15:40

patrick-schultz marked this pull request as ready for review November 13, 2023 16:12

ehigham previously requested changes Nov 13, 2023

View reviewed changes

address comments

739c143

patrick-schultz requested a review from ehigham November 14, 2023 16:44

ehigham approved these changes Nov 14, 2023

View reviewed changes

danking merged commit 9dc3495 into hail-is:main Nov 15, 2023
8 checks passed

patrick-schultz deleted the parser-no-env branch November 15, 2023 15:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[compiler] disentangle IR parser from bindings #13990

[compiler] disentangle IR parser from bindings #13990

patrick-schultz commented Nov 8, 2023 •

edited

Loading

ehigham left a comment

ehigham Nov 13, 2023

patrick-schultz Nov 13, 2023

ehigham Nov 13, 2023

patrick-schultz Nov 13, 2023

ehigham Nov 13, 2023

ehigham Nov 13, 2023

ehigham Nov 13, 2023

patrick-schultz commented Nov 13, 2023

[compiler] disentangle IR parser from bindings #13990

[compiler] disentangle IR parser from bindings #13990

Conversation

patrick-schultz commented Nov 8, 2023 • edited Loading

ehigham left a comment

Choose a reason for hiding this comment

ehigham Nov 13, 2023

Choose a reason for hiding this comment

patrick-schultz Nov 13, 2023

Choose a reason for hiding this comment

ehigham Nov 13, 2023

Choose a reason for hiding this comment

patrick-schultz Nov 13, 2023

Choose a reason for hiding this comment

ehigham Nov 13, 2023

Choose a reason for hiding this comment

ehigham Nov 13, 2023

Choose a reason for hiding this comment

ehigham Nov 13, 2023

Choose a reason for hiding this comment

patrick-schultz commented Nov 13, 2023

patrick-schultz commented Nov 8, 2023 •

edited

Loading