diff --git a/src/hir-lowering.md b/src/hir-lowering.md
index b2b6a7cdf..e28bb4cd4 100644
--- a/src/hir-lowering.md
+++ b/src/hir-lowering.md
@@ -1 +1,119 @@
 # HIR lowering
+
+The HIR -- "High-level IR" -- is the primary IR used in most of
+rustc. It is a desugared version of the "abstract syntax tree" (AST)
+that is generated after parsing, macro expansion, and name resolution
+have completed. Many parts of HIR resemble Rust surface syntax quite
+closely, with the exception that some of Rust's expression forms have
+been desugared away (as an example, `for` loops are converted into a
+`loop` and do not appear in the HIR).
+
+This chapter covers the main concepts of the HIR.
+
+### Out-of-band storage and the `Crate` type
+
+The top-level data structure in the HIR is the `Crate`, which stores
+the contents of the crate currently being compiled (we only ever
+construct HIR for the current crate). Whereas in the AST the crate
+data structure basically just contains the root module, the HIR
+`Crate` structure contains a number of maps and other things that
+serve to organize the content of the crate for easier access.
+
+For example, the contents of individual items (e.g., modules,
+functions, traits, impls, etc.) in the HIR are not immediately
+accessible in the parents. So, for example, if we had a module item `foo`
+containing a function `bar()`:
+
+```
+mod foo {
+    fn bar() { }
+}
+```
+
+Then in the HIR the representation of module `foo` (the `Mod`
+struct) would have only the **`ItemId`** `I` of `bar()`. To get the
+details of the function `bar()`, we would look up `I` in the
+`items` map.
+
+One nice result from this representation is that one can iterate
+over all items in the crate by iterating over the key-value pairs
+in these maps (without having to trawl through the whole IR).
+There are similar maps for things like trait items and impl items,
+as well as "bodies" (explained below).
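The out-of-band layout described above can be sketched with a toy model. This is not rustc's actual code: only the names `Crate`, `Mod`, `ItemId`, and `items` come from the text, while `ItemKind`, `item_ids`, `name`, and the helper functions are hypothetical and exist purely for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the HIR's `ItemId`: a newtype'd integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ItemId(u32);

/// A module stores only the ids of its children, never the children themselves.
#[derive(Debug)]
struct Mod {
    item_ids: Vec<ItemId>,
}

#[derive(Debug)]
enum ItemKind {
    Mod(Mod),
    Fn,
}

#[derive(Debug)]
struct Item {
    name: &'static str,
    kind: ItemKind,
}

/// Toy `Crate`: every item lives in one flat map keyed by its id.
struct Crate {
    items: BTreeMap<ItemId, Item>,
}

impl Crate {
    /// The single access point for item contents. In a real compiler this is
    /// where a dependency on the looked-up item could be recorded.
    fn item(&self, id: ItemId) -> &Item {
        &self.items[&id]
    }
}

/// Models `mod foo { fn bar() { } }`.
fn build_example() -> Crate {
    let mut items = BTreeMap::new();
    let foo = Mod { item_ids: vec![ItemId(1)] };
    items.insert(ItemId(0), Item { name: "foo", kind: ItemKind::Mod(foo) });
    items.insert(ItemId(1), Item { name: "bar", kind: ItemKind::Fn });
    Crate { items }
}

/// Resolves a module's child ids to names by going back through the map.
fn child_names(krate: &Crate, id: ItemId) -> Vec<&'static str> {
    match &krate.item(id).kind {
        ItemKind::Mod(m) => m.item_ids.iter().map(|&i| krate.item(i).name).collect(),
        ItemKind::Fn => Vec::new(),
    }
}

fn main() {
    let krate = build_example();
    println!("children of foo: {:?}", child_names(&krate, ItemId(0)));
    // Visiting every item needs no tree traversal, just a walk over the map.
    for (id, item) in &krate.items {
        println!("{:?} => {}", id, item.name);
    }
}
```

Because `foo` holds only `ItemId(1)`, reaching `bar` always goes through `Crate::item`, which is the kind of choke point the text relies on.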
+
+The other reason to set up the representation this way is for better
+integration with incremental compilation. This way, if you gain access
+to a `&hir::Item` (e.g. for the mod `foo`), you do not immediately
+gain access to the contents of the function `bar()`. Instead, you only
+gain access to the **id** for `bar()`, and you must invoke some
+function to look up the contents of `bar()` given its id; this gives us
+a chance to observe that you accessed the data for `bar()` and record
+the dependency.
+
+### Identifiers in the HIR
+
+Most of the code that has to deal with things in HIR tends not to
+carry around references into the HIR, but rather to carry around
+*identifier numbers* (or just "ids"). Right now, you will find four
+sorts of identifiers in active use:
+
+- `DefId`, which primarily names "definitions" or top-level items.
+  - You can think of a `DefId` as being shorthand for a very explicit
+    and complete path, like `std::collections::HashMap`. However,
+    these paths are able to name things that are not nameable in
+    normal Rust (e.g., impls), and they also include extra information
+    about the crate (such as its version number, as two versions of
+    the same crate can co-exist).
+  - A `DefId` really consists of two parts, a `CrateNum` (which
+    identifies the crate) and a `DefIndex` (which indexes into a list
+    of items that is maintained per crate).
+- `HirId`, which combines the index of a particular item with an
+  offset within that item.
+  - The key point of a `HirId` is that it is *relative* to some item (which is named
+    via a `DefId`).
+- `BodyId`, which is an absolute identifier that refers to a specific
+  body (definition of a function or constant) in the crate. It is currently
+  effectively a "newtype'd" `NodeId`.
+- `NodeId`, which is an absolute id that identifies a single node in the HIR tree.
+  - While these are still in common use, **they are being slowly phased out**.
+  - Since they are absolute within the crate, adding a new node
+    anywhere in the tree causes the node-ids of all subsequent code in
+    the crate to change. This is terrible for incremental compilation,
+    as you can perhaps imagine.
+
+### HIR Map
+
+Most of the time when you are working with the HIR, you will do so via
+the **HIR Map**, accessible in the tcx via `tcx.hir` (and defined in
+the `hir::map` module). The HIR map contains a number of methods to
+convert between ids of various kinds and to look up data associated
+with a HIR node.
+
+For example, if you have a `DefId`, and you would like to convert it
+to a `NodeId`, you can use `tcx.hir.as_local_node_id(def_id)`. This
+returns an `Option<NodeId>` -- this will be `None` if the def-id
+refers to something outside of the current crate (since then it has no
+HIR node), but otherwise returns `Some(n)` where `n` is the node-id of
+the definition.
+
+Similarly, you can use `tcx.hir.find(n)` to look up the node for a
+`NodeId`. This returns an `Option<Node<'tcx>>`, where `Node` is an enum
+defined in the map; by matching on this you can find out what sort of
+node the node-id referred to and also get a pointer to the data
+itself. Often, you know what sort of node `n` is -- e.g., if you know
+that `n` must be some HIR expression, you can do
+`tcx.hir.expect_expr(n)`, which will extract and return the
+`&hir::Expr`, panicking if `n` is not in fact an expression.
+
+Finally, you can use the HIR map to find the parents of nodes, via
+calls like `tcx.hir.get_parent_node(n)`.
+
+### HIR Bodies
+
+A **body** represents some kind of executable code, such as the body
+of a function/closure or the definition of a constant. Bodies are
+associated with an **owner**, which is typically some kind of item
+(e.g., a `fn()` or `const`), but could also be a closure expression
+(e.g., `|x, y| x + y`).
+You can use the HIR map to find the body
+associated with a given def-id (`maybe_body_owned_by()`) or to find
+the owner of a body (`body_owner_def_id()`).
\ No newline at end of file
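The lookup pattern from the "HIR Map" section can be illustrated with a small standalone model. It only mirrors the shape of the calls described in the text (`as_local_node_id`, `find`, `expect_expr`) and is not the real `hir::map` API: `ToyMap`, `LOCAL_CRATE`, `build_toy_map`, the string payloads, and every field name are assumptions made for the sketch.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct CrateNum(u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct DefIndex(u32);

/// A `DefId` is a crate number plus a per-crate index, as in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct DefId {
    krate: CrateNum,
    index: DefIndex,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct NodeId(u32);

/// Assumption for this sketch: crate number 0 is the crate being compiled.
const LOCAL_CRATE: CrateNum = CrateNum(0);

/// Toy `Node` enum; the payloads are plain strings rather than HIR data.
#[derive(Debug, PartialEq)]
enum Node {
    Item(&'static str),
    Expr(&'static str),
}

struct ToyMap {
    def_to_node: HashMap<DefIndex, NodeId>,
    nodes: HashMap<NodeId, Node>,
}

impl ToyMap {
    /// `None` for definitions from other crates, since they have no local node.
    fn as_local_node_id(&self, def_id: DefId) -> Option<NodeId> {
        if def_id.krate != LOCAL_CRATE {
            return None;
        }
        self.def_to_node.get(&def_id.index).copied()
    }

    /// Looks up whatever node the id refers to, if any.
    fn find(&self, id: NodeId) -> Option<&Node> {
        self.nodes.get(&id)
    }

    /// Returns the expression payload, panicking if `id` is not an expression.
    fn expect_expr(&self, id: NodeId) -> &'static str {
        match self.find(id) {
            Some(Node::Expr(text)) => *text,
            other => panic!("expected an expression, found {:?}", other),
        }
    }
}

fn build_toy_map() -> ToyMap {
    let mut def_to_node = HashMap::new();
    def_to_node.insert(DefIndex(7), NodeId(42));
    let mut nodes = HashMap::new();
    nodes.insert(NodeId(42), Node::Item("fn bar"));
    nodes.insert(NodeId(43), Node::Expr("x + y"));
    ToyMap { def_to_node, nodes }
}

fn main() {
    let map = build_toy_map();
    let local = DefId { krate: LOCAL_CRATE, index: DefIndex(7) };
    let external = DefId { krate: CrateNum(1), index: DefIndex(7) };
    println!("{:?}", map.as_local_node_id(local));
    println!("{:?}", map.as_local_node_id(external));
    println!("{:?}", map.find(NodeId(42)));
    println!("{}", map.expect_expr(NodeId(43)));
}
```

The `Option` return on the id conversion makes the "not in this crate" case explicit at the call site, while the `expect_*` style trades that safety for brevity when the caller already knows the node kind.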