From 92588e478b730e458dbe36a595e77a20b8b39442 Mon Sep 17 00:00:00 2001
From: Matthew Jasper
Date: Sat, 4 Jul 2020 17:00:34 +0100
Subject: [PATCH 1/2] Document serialization in rustc

---
 src/SUMMARY.md       |   1 +
 src/serialization.md | 164 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 165 insertions(+)
 create mode 100644 src/serialization.md

diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 1fe7dcf11..70c34f2a6 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -58,6 +58,7 @@
 - [Profiling Queries](./queries/profiling.md)
 - [Salsa](./salsa.md)
 - [Memory Management in Rustc](./memory.md)
+- [Serialization in Rustc](./serialization.md)
 - [Parallel Compilation](./parallel-rustc.md)
 - [Rustdoc](./rustdoc-internals.md)

diff --git a/src/serialization.md b/src/serialization.md
new file mode 100644
index 000000000..252dcf8ac
--- /dev/null
+++ b/src/serialization.md
@@ -0,0 +1,164 @@
+# Serialization in Rustc
+
+Rustc has to [serialize] and deserialize various data during compilation.
+Specifially:
+
+- "Crate metadata", mainly query outputs, are serialized in a binary
+  format into `rlib` and `rmeta` files that are output when compiling a library
+  crate, these are then deserialized by crates that depend on that library.
+- Certain query outputs are serialized in a binary format to
+  [persist incremental compilation results].
+- The `-Z ast-json` and `-Z ast-json-noexpand` flags serialize the [AST] to json
+  and output the result to stdout.
+- [`CrateInfo`] is serialized to json when the `-Z no-link` flag is used, and
+  deserialized from json when the `-Z link-only` flag is used.
+
+## The `Encodable` and `Decodable` traits
+
+The [`rustc_serialize`] crate defines two traits for types which can be serialized:
+
+```rust
+pub trait Encodable<S: Encoder> {
+    fn encode(&self, s: &mut S) -> Result<(), S::Error>;
+}
+
+pub trait Decodable<D: Decoder>: Sized {
+    fn decode(d: &mut D) -> Result<Self, D::Error>;
+}
+```
+
+It also defines implementations of these for integer types, floating point
+types, `bool`, `char`, `str` and various common standard library types.
+
+For types that are constructed from those types, `Encodable` and `Decodable` are
+usually implemented by [derives]. These generate implementations that forward
+deserialization to the fields of the struct or enum. For a struct those impls
+look something like this:
+
+```rust
+# #![feature(rustc_private)]
+# extern crate rustc_serialize;
+# use rustc_serialize::{Decodable, Decoder, Encodable, Encoder};
+
+struct MyStruct {
+    int: u32,
+    float: f32,
+}
+
+impl<E: Encoder> Encodable<E> for MyStruct {
+    fn encode(&self, s: &mut E) -> Result<(), E::Error> {
+        s.emit_struct("MyStruct", 2, |s| {
+            s.emit_struct_field("int", 0, |s| self.int.encode(s))?;
+            s.emit_struct_field("float", 1, |s| self.float.encode(s))
+        })
+    }
+}
+impl<D: Decoder> Decodable<D> for MyStruct {
+    fn decode(s: &mut D) -> Result<MyStruct, D::Error> {
+        s.read_struct("MyStruct", 2, |d| {
+            let int = d.read_struct_field("int", 0, Decodable::decode)?;
+            let float = d.read_struct_field("float", 1, Decodable::decode)?;
+
+            Ok(MyStruct::new(int, float, SyntaxContext::root()))
+        })
+    }
+}
+```
+
+## Encoding and Decoding arena allocated types
+
+Rustc has a lot of [arena allocated types]. Deserializing these types isn't
+possible without access to the arena that they need to be allocated on. The
+[`TyDecoder`] and [`TyEncoder`] traits are subtraits of `Decoder` and
+`Encoder` that allow access to a `TyCtxt`.
+
+Types which contain arena allocated types can then bound the type parameter of
+their `Encodable` and `Decodable` implementations with these traits.
+For example:
+
+```rust,ignore
+impl<'tcx, D: TyDecoder<'tcx>> Decodable<D> for MyStruct<'tcx> {
+    /* ... */
+}
+```
+
+The `TyEncodable` and `TyDecodable` [derive macros](derives) will expand to such
+an implementation.
+
+Decoding the actual arena allocated type is harder, because some of the
+implementations can't be written due to the orphan rules. To work around this,
+the [`RefDecodable`] trait is defined in `rustc_middle`. This can then be
+implemented for any type. The `TyDecodable` macro will call `RefDecodable` to
+decode references, but various generic code needs types to actually be
+`Decodable` with a specific decoder.
+
+For interned types, instead of manually implementing `RefDecodable`, using a
+newtype wrapper, like `ty::Predicate`, and manually implementing `Encodable` and
+`Decodable` may be simpler.
+
+## Derive macros
+
+The `rustc_macros` crate defines various derives to help implement `Decodable`
+and `Encodable`.
+
+- The `Encodable` and `Decodable` macros generate implementations that apply to
+  all `Encoders` and `Decoders`. These should be used in crates that don't
+  depend on `rustc_middle`, or that have to be serialized by a type that does
+  not implement `TyEncoder`.
+- `MetadataEncodable` and `MetadataDecodable` generate implementations that
+  only allow encoding by [`rustc_metadata::rmeta::encoder::EncodeContext`] and
+  decoding by [`rustc_metadata::rmeta::decoder::DecodeContext`]. These are used
+  for types that contain `rustc_metadata::rmeta::Lazy<T>`.
+- `TyEncodable` and `TyDecodable` generate implementations that apply to any
+  `TyEncoder` or `TyDecoder`. These should be used for types that are only
+  serialized in crate metadata and/or the incremental cache, which is most
+  serializable types in `rustc_middle`.
+
+## Shorthands
+
+`Ty` can be deeply recursive, so if each `Ty` was encoded naively then crate
+metadata would be very large. To handle this, each `TyEncoder` has a cache of
+locations in its output where it has serialized types.
+If a type being encoded
+is in the cache, then the type is not serialized as usual; the byte offset of
+its earlier encoding within the file being written is encoded instead. A
+similar scheme is used for `ty::Predicate`.
+
+## `Lazy<T>`
+
+Crate metadata is initially loaded before the `TyCtxt<'tcx>` is created, so
+some deserialization needs to be deferred from the initial loading of metadata.
+The [`Lazy<T>`][`Lazy`] type wraps the (relative) offset in the crate metadata
+where a `T` has been serialized.
+
+The `Lazy<[T]>` and `Lazy<Table<I, T>>` types provide some functionality over
+`Lazy<Vec<T>>` and `Lazy<HashMap<I, T>>`:
+
+- It's possible to encode a `Lazy<[T]>` directly from an iterator, without
+  first collecting into a `Vec<T>`.
+- Indexing into a `Lazy<Table<I, T>>` does not require decoding entries other
+  than the one being read.
+
+**Note**: `Lazy<T>` does not cache its value after being deserialized the first
+time. Instead the query system is the main way of caching these results.
+
+## Specialization
+
+A few types, most notably `DefId`, need to have different implementations for
+different `Encoder`s. This is currently handled by ad-hoc specializations:
+`DefId` has a `default` implementation of `Encodable<E>` and a specialized one
+for `Encodable<CacheEncoder>`.
+
+[arena allocated types]: memory.md
+[AST]: the-parser.md
+[derives]: #derive-macros
+[persist incremental compilation results]: queries/incremental-compilation-in-detail.md#the-real-world-how-persistence-makes-everything-complicated
+[serialize]: https://en.wikipedia.org/wiki/Serialization
+
+[`CrateInfo`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/struct.CrateInfo.html
+[`Lazy`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.Lazy.html
+[`RefDecodable`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/codec/trait.RefDecodable.html
+[`rustc_metadata::rmeta::decoder::DecodeContext`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/decoder/struct.DecodeContext.html
+[`rustc_metadata::rmeta::encoder::EncodeContext`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/encoder/struct.EncodeContext.html
+[`rustc_serialize`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_serialize/index.html
+[`TyDecoder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/codec/trait.TyDecoder.html
+[`TyEncoder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/codec/trait.TyEncoder.html

From 0e0b79047ebbc20b0be8758a7ebfd43a331f3791 Mon Sep 17 00:00:00 2001
From: Matthew Jasper
Date: Sun, 16 Aug 2020 11:37:48 +0100
Subject: [PATCH 2/2] Address review comments

---
 src/serialization.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/serialization.md b/src/serialization.md
index 252dcf8ac..292ffc546 100644
--- a/src/serialization.md
+++ b/src/serialization.md
@@ -1,7 +1,7 @@
 # Serialization in Rustc
 
 Rustc has to [serialize] and deserialize various data during compilation.
-Specifially:
+Specifically:
 
 - "Crate metadata", mainly query outputs, are serialized in a binary
   format into `rlib` and `rmeta` files that are output when compiling a library
@@ -17,7 +17,7 @@ Specifially:
 
 The [`rustc_serialize`] crate defines two traits for types which can be serialized:
 
-```rust
+```rust,ignore
 pub trait Encodable<S: Encoder> {
     fn encode(&self, s: &mut S) -> Result<(), S::Error>;
 }
@@ -35,7 +35,7 @@ usually implemented by [derives]. These generate implementations that forward
 deserialization to the fields of the struct or enum. For a struct those impls
 look something like this:
 
-```rust
+```rust,ignore
 # #![feature(rustc_private)]
 # extern crate rustc_serialize;
 # use rustc_serialize::{Decodable, Decoder, Encodable, Encoder};
@@ -59,7 +59,7 @@ impl<D: Decoder> Decodable<D> for MyStruct {
             let int = d.read_struct_field("int", 0, Decodable::decode)?;
             let float = d.read_struct_field("float", 1, Decodable::decode)?;
 
-            Ok(MyStruct::new(int, float, SyntaxContext::root()))
+            Ok(MyStruct { int, float })
         })
     }
 }
@@ -82,7 +82,7 @@ impl<'tcx, D: TyDecoder<'tcx>> Decodable<D> for MyStruct<'tcx> {
 }
 ```
 
-The `TyEncodable` and `TyDecodable` [derive macros](derives) will expand to such
+The `TyEncodable` and `TyDecodable` [derive macros][derives] will expand to such
 an implementation.
 
 Decoding the actual arena allocated type is harder, because some of the
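
The shorthand scheme described in the "Shorthands" section of the new `serialization.md` may be easier to follow with a small standalone sketch. This is not rustc's implementation and does not use `TyEncoder`: the `Type` enum, `ShorthandEncoder`, the tag bytes, the 4-byte offset layout, and the rule of only caching encodings longer than a shorthand are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a recursive type such as `Ty`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Type {
    Int,
    Ref(Box<Type>),
    Tuple(Vec<Type>),
}

// Illustrative tag bytes; these are assumptions, not rustc's format.
const TAG_INT: u8 = 0;
const TAG_REF: u8 = 1;
const TAG_TUPLE: u8 = 2;
const TAG_SHORTHAND: u8 = 0xff;
/// A shorthand is one tag byte plus a 4-byte little-endian offset.
const SHORTHAND_LEN: usize = 5;

pub struct ShorthandEncoder {
    pub out: Vec<u8>,
    /// Maps an already-encoded type to the offset of its encoding in `out`.
    cache: HashMap<Type, u32>,
}

impl ShorthandEncoder {
    pub fn new() -> Self {
        ShorthandEncoder { out: Vec::new(), cache: HashMap::new() }
    }

    pub fn encode(&mut self, ty: &Type) {
        // Cache hit: write the offset of the earlier encoding instead.
        if let Some(&offset) = self.cache.get(ty) {
            self.out.push(TAG_SHORTHAND);
            self.out.extend_from_slice(&offset.to_le_bytes());
            return;
        }
        let start = self.out.len();
        match ty {
            Type::Int => self.out.push(TAG_INT),
            Type::Ref(inner) => {
                self.out.push(TAG_REF);
                self.encode(inner);
            }
            Type::Tuple(elems) => {
                self.out.push(TAG_TUPLE);
                // Sketch only: the length is stored in a single byte.
                self.out.push(elems.len() as u8);
                for elem in elems {
                    self.encode(elem);
                }
            }
        }
        // Only remember encodings that a shorthand would actually shrink.
        if self.out.len() - start > SHORTHAND_LEN {
            self.cache.insert(ty.clone(), start as u32);
        }
    }
}

/// Decodes one `Type` starting at `*pos`, following shorthands back to
/// the offset where the full encoding lives.
pub fn decode(buf: &[u8], pos: &mut usize) -> Type {
    let tag = buf[*pos];
    *pos += 1;
    match tag {
        TAG_INT => Type::Int,
        TAG_REF => Type::Ref(Box::new(decode(buf, pos))),
        TAG_TUPLE => {
            let len = buf[*pos] as usize;
            *pos += 1;
            let mut elems = Vec::with_capacity(len);
            for _ in 0..len {
                elems.push(decode(buf, pos));
            }
            Type::Tuple(elems)
        }
        TAG_SHORTHAND => {
            let mut raw = [0u8; 4];
            raw.copy_from_slice(&buf[*pos..*pos + 4]);
            *pos += 4;
            let mut target = u32::from_le_bytes(raw) as usize;
            decode(buf, &mut target)
        }
        other => panic!("unknown tag {}", other),
    }
}
```

Encoding the same large `Type` twice with one `ShorthandEncoder` writes the full encoding once and a five-byte shorthand the second time, and `decode` resolves both back to equal values.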