From 5f5e0b5a732960f629721a84f8c9aa256aa29cab Mon Sep 17 00:00:00 2001 From: Boxy Date: Thu, 21 Nov 2024 12:28:23 +0000 Subject: [PATCH] Reorganize the "Source Code Representation" chapters (#2142) --- src/SUMMARY.md | 28 +++--- src/asm.md | 236 +++------------------------------------------ src/closure.md | 2 +- src/hir.md | 36 +++++-- src/identifiers.md | 107 -------------------- src/the-parser.md | 25 ++++- 6 files changed, 80 insertions(+), 354 deletions(-) delete mode 100644 src/identifiers.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 570256412..772ae3ec9 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -96,10 +96,6 @@ # Source Code Representation - [Prologue](./part-3-intro.md) -- [Command-line arguments](./cli.md) -- [rustc_driver and rustc_interface](./rustc-driver/intro.md) - - [Example: Type checking](./rustc-driver/interacting-with-the-ast.md) - - [Example: Getting diagnostics](./rustc-driver/getting-diagnostics.md) - [Syntax and the AST](./syntax-intro.md) - [Lexing and Parsing](./the-parser.md) - [Macro expansion](./macro-expansion.md) @@ -118,10 +114,22 @@ - [MIR construction](./mir/construction.md) - [MIR visitor and traversal](./mir/visitor.md) - [MIR queries and passes: getting the MIR](./mir/passes.md) -- [Identifiers in the Compiler](./identifiers.md) -- [Closure expansion](./closure.md) - [Inline assembly](./asm.md) +# Supporting Infrastructure + +- [Command-line arguments](./cli.md) +- [rustc_driver and rustc_interface](./rustc-driver/intro.md) + - [Example: Type checking](./rustc-driver/interacting-with-the-ast.md) + - [Example: Getting diagnostics](./rustc-driver/getting-diagnostics.md) +- [Errors and Lints](diagnostics.md) + - [Diagnostic and subdiagnostic structs](./diagnostics/diagnostic-structs.md) + - [Translation](./diagnostics/translation.md) + - [`LintStore`](./diagnostics/lintstore.md) + - [Error codes](./diagnostics/error-codes.md) + - [Diagnostic items](./diagnostics/diagnostic-items.md) + - [`ErrorGuaranteed`](./diagnostics/error-guaranteed.md) + # Analysis - [Prologue](./part-4-intro.md) @@ -190,13 +198,7 @@ - [Closure constraints](./borrow_check/region_inference/closure_constraints.md) - [Error reporting](./borrow_check/region_inference/error_reporting.md) - [Two-phase-borrows](./borrow_check/two_phase_borrows.md) -- [Errors and Lints](diagnostics.md) - - [Diagnostic and subdiagnostic structs](./diagnostics/diagnostic-structs.md) - - [Translation](./diagnostics/translation.md) - - [`LintStore`](./diagnostics/lintstore.md) - - [Error codes](./diagnostics/error-codes.md) - - [Diagnostic items](./diagnostics/diagnostic-items.md) - - [`ErrorGuaranteed`](./diagnostics/error-guaranteed.md) +- [Closure capture inference](./closure.md) - [Async closures/"coroutine-closures"](coroutine-closures.md) # MIR to Binaries diff --git a/src/asm.md b/src/asm.md index b19f2ad46..677a08910 100644 --- a/src/asm.md +++ b/src/asm.md @@ -54,111 +54,26 @@ string parsing. The remaining options are mostly passed through to LLVM with lit ## AST -`InlineAsm` is represented as an expression in the AST: - -```rust -pub struct InlineAsm { - pub template: Vec, - pub template_strs: Box<[(Symbol, Option, Span)]>, - pub operands: Vec<(InlineAsmOperand, Span)>, - pub clobber_abi: Option<(Symbol, Span)>, - pub options: InlineAsmOptions, - pub line_spans: Vec, -} - -pub enum InlineAsmRegOrRegClass { - Reg(Symbol), - RegClass(Symbol), -} - -pub enum InlineAsmOperand { - In { - reg: InlineAsmRegOrRegClass, - expr: P, - }, - Out { - reg: InlineAsmRegOrRegClass, - late: bool, - expr: Option>, - }, - InOut { - reg: InlineAsmRegOrRegClass, - late: bool, - expr: P, - }, - SplitInOut { - reg: InlineAsmRegOrRegClass, - late: bool, - in_expr: P, - out_expr: Option>, - }, - Const { - anon_const: AnonConst, - }, - Sym { - expr: P, - }, -} -``` +`InlineAsm` is represented as an expression in the AST with the [`ast::InlineAsm` type][inline_asm_ast]. The `asm!` macro is implemented in `rustc_builtin_macros` and outputs an `InlineAsm` AST node. The template string is parsed using `fmt_macros`, positional and named operands are resolved to explicit operand indices. Since target information is not available to macro invocations, validation of the registers and register classes is deferred to AST lowering. +[inline_asm_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/struct.InlineAsm.html + ## HIR -`InlineAsm` is represented as an expression in the HIR: - -```rust -pub struct InlineAsm<'hir> { - pub template: &'hir [InlineAsmTemplatePiece], - pub template_strs: &'hir [(Symbol, Option, Span)], - pub operands: &'hir [(InlineAsmOperand<'hir>, Span)], - pub options: InlineAsmOptions, - pub line_spans: &'hir [Span], -} - -pub enum InlineAsmRegOrRegClass { - Reg(InlineAsmReg), - RegClass(InlineAsmRegClass), -} - -pub enum InlineAsmOperand<'hir> { - In { - reg: InlineAsmRegOrRegClass, - expr: Expr<'hir>, - }, - Out { - reg: InlineAsmRegOrRegClass, - late: bool, - expr: Option>, - }, - InOut { - reg: InlineAsmRegOrRegClass, - late: bool, - expr: Expr<'hir>, - }, - SplitInOut { - reg: InlineAsmRegOrRegClass, - late: bool, - in_expr: Expr<'hir>, - out_expr: Option>, - }, - Const { - anon_const: AnonConst, - }, - Sym { - expr: Expr<'hir>, - }, -} -``` +`InlineAsm` is represented as an expression in the HIR with the [`hir::InlineAsm` type][inline_asm_hir]. AST lowering is where `InlineAsmRegOrRegClass` is converted from `Symbol`s to an actual register or register class. If any modifiers are specified for a template string placeholder, these are validated against the set allowed for that operand type. Finally, explicit registers for inputs and outputs are checked for conflicts (same register used for different operands). +[inline_asm_hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.InlineAsm.html + ## Type checking Each register class has a whitelist of types that it may be used with. After the types of all @@ -169,113 +84,17 @@ be used for an operand based on the type that was passed into it. ## THIR -`InlineAsm` is represented as an expression in the THIR: - -```rust -crate enum ExprKind<'tcx> { - // [..] - InlineAsm { - template: &'tcx [InlineAsmTemplatePiece], - operands: Box<[InlineAsmOperand<'tcx>]>, - options: InlineAsmOptions, - line_spans: &'tcx [Span], - }, -} -crate enum InlineAsmOperand<'tcx> { - In { - reg: InlineAsmRegOrRegClass, - expr: ExprId, - }, - Out { - reg: InlineAsmRegOrRegClass, - late: bool, - expr: Option, - }, - InOut { - reg: InlineAsmRegOrRegClass, - late: bool, - expr: ExprId, - }, - SplitInOut { - reg: InlineAsmRegOrRegClass, - late: bool, - in_expr: ExprId, - out_expr: Option, - }, - Const { - value: &'tcx Const<'tcx>, - span: Span, - }, - SymFn { - expr: ExprId, - }, - SymStatic { - def_id: DefId, - }, -} -``` +`InlineAsm` is represented as an expression in the THIR with the [`InlineAsmExpr` type][inline_asm_thir]. The only significant change compared to HIR is that `Sym` has been lowered to either a `SymFn` whose `expr` is a `Literal` ZST of the `fn`, or a `SymStatic` which points to the `DefId` of a `static`. +[inline_asm_thir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/thir/struct.InlineAsmExpr.html + ## MIR -`InlineAsm` is represented as a `Terminator` in the MIR: - -```rust -pub enum TerminatorKind<'tcx> { - // [..] - - /// Block ends with an inline assembly block. This is a terminator since - /// inline assembly is allowed to diverge. - InlineAsm { - /// The template for the inline assembly, with placeholders. - template: &'tcx [InlineAsmTemplatePiece], - - /// The operands for the inline assembly, as `Operand`s or `Place`s. - operands: Vec>, - - /// Miscellaneous options for the inline assembly. - options: InlineAsmOptions, - - /// Source spans for each line of the inline assembly code. These are - /// used to map assembler errors back to the line in the source code. - line_spans: &'tcx [Span], - - /// Destination block after the inline assembly returns, unless it is - /// diverging (InlineAsmOptions::NORETURN). - destination: Option, - }, -} - -pub enum InlineAsmOperand<'tcx> { - In { - reg: InlineAsmRegOrRegClass, - value: Operand<'tcx>, - }, - Out { - reg: InlineAsmRegOrRegClass, - late: bool, - place: Option>, - }, - InOut { - reg: InlineAsmRegOrRegClass, - late: bool, - in_value: Operand<'tcx>, - out_place: Option>, - }, - Const { - value: Box>, - }, - SymFn { - value: Box>, - }, - SymStatic { - def_id: DefId, - }, -} -``` +`InlineAsm` is represented as a `Terminator` in the MIR with the [`TerminatorKind::InlineAsm` variant][inline_asm_mir] As part of THIR lowering, `InOut` and `SplitInOut` operands are lowered to a split form with a separate `in_value` and `out_place`. @@ -283,38 +102,11 @@ separate `in_value` and `out_place`. Semantically, the `InlineAsm` terminator is similar to the `Call` terminator except that it has multiple output places where a `Call` only has a single return place output. +[inline_asm_mir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.TerminatorKind.html#variant.InlineAsm + ## Codegen -Operands are lowered one more time before being passed to LLVM codegen: - -```rust -pub enum InlineAsmOperandRef<'tcx, B: BackendTypes + ?Sized> { - In { - reg: InlineAsmRegOrRegClass, - value: OperandRef<'tcx, B::Value>, - }, - Out { - reg: InlineAsmRegOrRegClass, - late: bool, - place: Option>, - }, - InOut { - reg: InlineAsmRegOrRegClass, - late: bool, - in_value: OperandRef<'tcx, B::Value>, - out_place: Option>, - }, - Const { - string: String, - }, - SymFn { - instance: Instance<'tcx>, - }, - SymStatic { - def_id: DefId, - }, -} -``` +Operands are lowered one more time before being passed to LLVM codegen, this is represented by the [`InlineAsmOperandRef` type][inline_asm_codegen] from `rustc_codegen_ssa`. The operands are lowered to LLVM operands and constraint codes as follow: - `out` and the output part of `inout` operands are added first, as required by LLVM. Late output @@ -339,6 +131,8 @@ Note that LLVM is sometimes rather picky about what types it accepts for certain so we sometimes need to insert conversions to/from a supported type. See the target-specific ISelLowering.cpp files in LLVM for details of what types are supported for each register class. +[inline_asm_codegen]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/enum.InlineAsmOperandRef.html + ## Adding support for new architectures Adding inline assembly support to an architecture is mostly a matter of defining the registers and diff --git a/src/closure.md b/src/closure.md index dc3ef23ed..718a0e5d7 100644 --- a/src/closure.md +++ b/src/closure.md @@ -1,4 +1,4 @@ -# Closure Expansion in rustc +# Closure Capture Inference This section describes how rustc handles closures. Closures in Rust are effectively "desugared" into structs that contain the values they use (or diff --git a/src/hir.md b/src/hir.md index 7d338cde0..db26e7f96 100644 --- a/src/hir.md +++ b/src/hir.md @@ -78,18 +78,40 @@ the compiler a chance to observe that you accessed the data for ## Identifiers in the HIR -There are a bunch of different identifiers to refer to other nodes or definitions -in the HIR. In short: -- A [`DefId`] refers to a *definition* in any crate. -- A [`LocalDefId`] refers to a *definition* in the currently compiled crate. -- A [`HirId`] refers to *any node* in the HIR. +The HIR uses a bunch of different identifiers that coexist and serve different purposes. -For more detailed information, check out the [chapter on identifiers][ids]. +- A [`DefId`], as the name suggests, identifies a particular definition, or top-level + item, in a given crate. It is composed of two parts: a [`CrateNum`] which identifies + the crate the definition comes from, and a [`DefIndex`] which identifies the definition + within the crate. Unlike [`HirId`]s, there isn't a [`DefId`] for every expression, which + makes them more stable across compilations. + +- A [`LocalDefId`] is basically a [`DefId`] that is known to come from the current crate. + This allows us to drop the [`CrateNum`] part, and use the type system to ensure that + only local definitions are passed to functions that expect a local definition. + +- A [`HirId`] uniquely identifies a node in the HIR of the current crate. It is composed + of two parts: an `owner` and a `local_id` that is unique within the `owner`. This + combination makes for more stable values which are helpful for incremental compilation. + Unlike [`DefId`]s, a [`HirId`] can refer to [fine-grained entities][Node] like expressions, + but stays local to the current crate. + +- A [`BodyId`] identifies a HIR [`Body`] in the current crate. It is currently only + a wrapper around a [`HirId`]. For more info about HIR bodies, please refer to the + [HIR chapter][hir-bodies]. + +These identifiers can be converted into one another through the [HIR map][map]. [`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html [`LocalDefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.LocalDefId.html [`HirId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir_id/struct.HirId.html -[ids]: ./identifiers.md#in-the-hir +[`BodyId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.BodyId.html +[`CrateNum`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.CrateNum.html +[`DefIndex`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefIndex.html +[`Body`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Body.html +[hir-map]: ./hir.md#the-hir-map +[hir-bodies]: ./hir.md#hir-bodies +[map]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/map/struct.Map.html ## The HIR Map diff --git a/src/identifiers.md b/src/identifiers.md deleted file mode 100644 index 31bc355da..000000000 --- a/src/identifiers.md +++ /dev/null @@ -1,107 +0,0 @@ -# Identifiers in the compiler - -If you have read the few previous chapters, you now know that `rustc` uses -many different intermediate representations to perform different kinds of analyses. -However, like in every data structure, you need a way to traverse the structure -and refer to other elements. In this chapter, you will find information on the -different identifiers `rustc` uses for each intermediate representation. - -## In the AST - -A [`NodeId`] is an identifier number that uniquely identifies an AST node within -a crate. Every node in the AST has its own [`NodeId`], including top-level items -such as structs, but also individual statements and expressions. - -However, because they are absolute within a crate, adding or removing a single -node in the AST causes all the subsequent [`NodeId`]s to change. This renders -[`NodeId`]s pretty much useless for incremental compilation, where you want as -few things as possible to change. - -[`NodeId`]s are used in all the `rustc` bits that operate directly on the AST, -like macro expansion and name resolution. - -[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html - -## In the HIR - -The HIR uses a bunch of different identifiers that coexist and serve different purposes. - -- A [`DefId`], as the name suggests, identifies a particular definition, or top-level - item, in a given crate. It is composed of two parts: a [`CrateNum`] which identifies - the crate the definition comes from, and a [`DefIndex`] which identifies the definition - within the crate. Unlike [`HirId`]s, there isn't a [`DefId`] for every expression, which - makes them more stable across compilations. - -- A [`LocalDefId`] is basically a [`DefId`] that is known to come from the current crate. - This allows us to drop the [`CrateNum`] part, and use the type system to ensure that - only local definitions are passed to functions that expect a local definition. - -- A [`HirId`] uniquely identifies a node in the HIR of the current crate. It is composed - of two parts: an `owner` and a `local_id` that is unique within the `owner`. This - combination makes for more stable values which are helpful for incremental compilation. - Unlike [`DefId`]s, a [`HirId`] can refer to [fine-grained entities][Node] like expressions, - but stays local to the current crate. - -- A [`BodyId`] identifies a HIR [`Body`] in the current crate. It is currently only - a wrapper around a [`HirId`]. For more info about HIR bodies, please refer to the - [HIR chapter][hir-bodies]. - -These identifiers can be converted into one another through the [HIR map][map]. -See the [HIR chapter][hir-map] for more detailed information. - -[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html -[`LocalDefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.LocalDefId.html -[`HirId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir_id/struct.HirId.html -[`BodyId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.BodyId.html -[`CrateNum`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.CrateNum.html -[`DefIndex`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefIndex.html -[`Body`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Body.html -[Node]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/enum.Node.html -[hir-map]: ./hir.md#the-hir-map -[hir-bodies]: ./hir.md#hir-bodies -[map]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/map/struct.Map.html - -## In the MIR - -- [`BasicBlock`] identifies a *basic block*. It points to an instance of - [`BasicBlockData`], which can be retrieved by indexing into - [`Body.basic_blocks`]. - -- [`Local`] identifies a local variable in a function. Its associated data is in - [`LocalDecl`], which can be retrieved by indexing into [`Body.local_decls`]. - -- [`FieldIdx`] identifies a struct's, union's, or enum variant's field. It is used - as a "projection" in [`Place`]. - -- [`SourceScope`] identifies a name scope in the original source code. Used for - diagnostics and for debuginfo in debuggers. It points to an instance of - [`SourceScopeData`], which can be retrieved by indexing into - [`Body.source_scopes`]. - -- [`Promoted`] identifies a promoted constant within another item (related to - const evaluation). Note: it is unique only locally within the item, so it - should be associated with a `DefId`. - [`GlobalId`] will give you a more specific identifier. - -- [`GlobalId`] identifies a global variable: a `const`, a `static`, a `const fn` - where all arguments are [zero-sized types], or a promoted constant. - -- [`Location`] represents the location in the MIR of a statement or terminator. - It identifies the block (using [`BasicBlock`]) and the index of the statement - or terminator in the block. - -[`BasicBlock`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.BasicBlock.html -[`BasicBlockData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.BasicBlockData.html -[`Body.basic_blocks`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html#structfield.basic_blocks -[`Local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Local.html -[`LocalDecl`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.LocalDecl.html -[`Body.local_decls`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html#structfield.local_decls -[`FieldIdx`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_target/abi/struct.FieldIdx.html -[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Place.html -[`SourceScope`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.SourceScope.html -[`SourceScopeData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.SourceScopeData.html -[`Body.source_scopes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Body.html#structfield.source_scopes -[`Promoted`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Promoted.html -[`GlobalId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/interpret/struct.GlobalId.html -[`Location`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/struct.Location.html -[zero-sized types]: https://doc.rust-lang.org/nomicon/exotic-sizes.html#zero-sized-types-zsts diff --git a/src/the-parser.md b/src/the-parser.md index ad66bdbab..703ef2794 100644 --- a/src/the-parser.md +++ b/src/the-parser.md @@ -6,7 +6,7 @@ This happens in two stages: Lexing and Parsing. 1. _Lexing_ takes strings and turns them into streams of [tokens]. For example, `foo.bar + buz` would be turned into the tokens `foo`, `.`, `bar`, - `+`, and `buz`. + `+`, and `buz`. This is implemented in [`rustc_lexer`][lexer]. [tokens]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/token/index.html [lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html @@ -14,16 +14,31 @@ This happens in two stages: Lexing and Parsing. 2. _Parsing_ takes streams of tokens and turns them into a structured form which is easier for the compiler to work with, usually called an [*Abstract Syntax Tree* (AST)][ast] . - - -An AST mirrors the structure of a Rust program in memory, using a `Span` to + +## The AST + +The AST mirrors the structure of a Rust program in memory, using a `Span` to link a particular AST node back to its source text. The AST is defined in [`rustc_ast`][rustc_ast], along with some definitions for tokens and token streams, data structures/traits for mutating ASTs, and shared definitions for other AST-related parts of the compiler (like the lexer and macro-expansion). -The lexer is developed in [`rustc_lexer`][lexer]. +Every node in the AST has its own [`NodeId`], including top-level items +such as structs, but also individual statements and expressions. A [`NodeId`] +is an identifier number that uniquely identifies an AST node within a crate. + +However, because they are absolute within a crate, adding or removing a single +node in the AST causes all the subsequent [`NodeId`]s to change. This renders +[`NodeId`]s pretty much useless for incremental compilation, where you want as +few things as possible to change. + +[`NodeId`]s are used in all the `rustc` bits that operate directly on the AST, +like macro expansion and name resolution (more on these over the next couple chapters). + +[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html + +## Parsing The parser is defined in [`rustc_parse`][rustc_parse], along with a high-level interface to the lexer and some validation routines that run after