This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Add a high-level overview of the RLS #1257

Merged
merged 7 commits into from
Jan 25, 2019
2 changes: 2 additions & 0 deletions README.md
@@ -11,6 +11,8 @@ editors, and other tools with information about Rust programs. It supports
functionality such as 'goto definition', symbol search, reformatting, and code
completion, and enables renaming and refactorings.

A high-level overview of the architecture can be found [here](architecture.md).

The RLS gets its source data from the compiler and from
[Racer](https://github.com/racer-rust/racer). Where possible it uses data from
the compiler which is precise and complete. Where it is not possible, (for example
182 changes: 182 additions & 0 deletions architecture.md
@@ -0,0 +1,182 @@
# Rust Language Server (RLS) - Architecture

Review comment (Member): Can you link this file from README.md and contributing.md?


## Preface
In addition to the document below, an architecture overview can be found in @nrc's blog post [How the RLS works](https://www.ncameron.org/blog/how-the-rls-works/) (2017). While some bits have changed, the gist remains the same.

Here we aim to explain in depth how the RLS obtains the underlying data that drives its indexing features, as context for the upcoming IDE planning and discussion at the 2019 Rust All-Hands.

The [rust-analyzer](https://github.com/rust-analyzer/rust-analyzer/blob/e0d8c86563b72e5414cf10fe16da5e88201447e2/guide.md) guide is also a great resource, as it covers a lot of common ground.

## High-level overview

At the time of writing, at the highest level the RLS compiles your package/workspace (similar to `cargo check`) and reuses the resulting internal data structures of the Rust compiler to power its indexing features.

When initialized (unless overridden by a custom build command), the RLS `cargo check`s the current project and collects the inter-crate [1] dependency graph along with the exact crate compilation invocations, which it later uses to run the compiler again itself (but in-process).

In-process compilation runs return populated internal data structures (`rls_data::Analysis`), which are further lowered and cross-referenced to expose a low-level indexing API (`rls_analysis::Analysis`). That API is finally consumed by the Rust Language Server in order to answer relevant LSP [2] queries.

Review comment (Member): Would be good to say why we do in-process and also when (and why) we do standard out of process build.


The main reason we execute the compilation in-process is to optimize
latency: the resulting data structures are passed in memory. Dependencies that
don't change often (everything outside the primary and path-based packages) are compiled out of process
once; the resulting data is dumped and cached in a JSON file, which only needs to be read once at the start of indexing.
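
The split described above can be summarized in a small sketch. The `PackageSource` and `AnalysisDelivery` types and the `delivery_for` function are hypothetical names made up for illustration; they are not part of the RLS code base, and the real decision is made while inspecting the Cargo build plan.

```rust
/// Where a package comes from (hypothetical, simplified classification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PackageSource {
    /// A member of the workspace being edited (a "primary" package).
    Workspace,
    /// A dependency referenced by a local path.
    Path,
    /// Any other dependency (registry, git, ...), which rarely changes.
    External,
}

/// How save-analysis data for a crate reaches the index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AnalysisDelivery {
    /// Compile in-process and hand the data over in memory (low latency).
    InMemory,
    /// Compile out of process once and read the cached JSON dump.
    CachedJson,
}

/// Picks the delivery mechanism for a crate based on its package source.
fn delivery_for(source: PackageSource) -> AnalysisDelivery {
    match source {
        PackageSource::Workspace | PackageSource::Path => AnalysisDelivery::InMemory,
        PackageSource::External => AnalysisDelivery::CachedJson,
    }
}
```

Under these assumptions, `delivery_for(PackageSource::External)` yields `AnalysisDelivery::CachedJson`, while workspace and path packages are recompiled in-process on every relevant change.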

[1] A *crate* is a single unit of compilation as compiled by `rustc`. For example, a Cargo package with bin+lib has *two* crates (sometimes called *targets* by Cargo).

[2] The [*Language Server Protocol*](https://microsoft.github.io/language-server-protocol/specification) is a language-agnostic JSON-RPC protocol that exposes common language "smartness" operations via a standardized interface, regardless of the IDE/editor used.

## Information flow (in-depth)
The current flow is as follows:
```
rustc -> librustc_save_analysis -> rls_data -> rls_analysis -> rls
```

### [librustc_save_analysis](https://github.com/rust-lang/rust/tree/master/src/librustc_save_analysis)

The Rust compiler includes the [`librustc_save_analysis`](https://github.com/rust-lang/rust/tree/master/src/librustc_save_analysis) crate, which allows dumping knowledge about the currently compiled crate. The main entry point is [`process_crate`](https://github.com/rust-lang/rust/blob/7164a9f151a56316a382d8bc2b15ccf373e129ca/src/librustc_save_analysis/lib.rs#L1119), which walks the post-macro-expansion AST and [saves](https://github.com/rust-lang/rust/blob/7164a9f151a56316a382d8bc2b15ccf373e129ca/src/librustc_save_analysis/lib.rs#L1146) the collected knowledge either by [dumping to a JSON file](https://github.com/rust-lang/rust/blob/7164a9f151a56316a382d8bc2b15ccf373e129ca/src/librustc_save_analysis/lib.rs#L1074-L1090) or by [calling back with the resulting data structure](https://github.com/rust-lang/rust/blob/7164a9f151a56316a382d8bc2b15ccf373e129ca/src/librustc_save_analysis/lib.rs#L1092-L1117).

### [rls_data](https://github.com/rust-dev-tools/rls-data)

As mentioned previously, the returned data structure is [`rls_data::Analysis`](https://github.com/rust-dev-tools/rls-data/blob/9edbe8b4947c10ef670c4723be375c6944cab640/src/lib.rs#L30-L48) inside the [`rls_data`](https://github.com/rust-dev-tools/rls-data) crate:
```rust
/// Basically a type alias, we refer to nodes with HIR ids.
/// All of the below nodes are either identified by or refer to those IDs.
type Id = rustc::hir::def_id::DefId;

pub struct Analysis {
...
/// Contains the rustc invocation producing this save-analysis data. Currently
/// used to support RLS in a custom build system context (namely Buck).
pub compilation: Option<CompilationOptions>,
/// Path to current crate root file and current crate `GlobalCrateId`. Also
/// includes externally referred crates' `GlobalCrateId` and their local
/// `CrateNum` index as stored by rustc from this crate compilation PoV.
pub prelude: Option<CratePreludeData>,
/// Nodes of use tree forests, incl. relation, span, optional alias value
/// and kind (extern crate, simple use or glob).
pub imports: Vec<Import>,
/// Main data nodes. Roughly correspond to post-expansion AST nodes, incl.
/// span, qualified name, relations, optional signature (if a function),
/// docs and attributes.
pub defs: Vec<Def>,
/// Nodes for `impl` items, incl. kind (inherent, trait impl, ...), span,
/// children ids, docs, attributes and signature.
pub impls: Vec<Impl>,
/// Span which refers to an `Id` of function/module/type/variable kind.
pub refs: Vec<Ref>,
/// Contains callsite and callee span along with macro qualified name.
pub macro_refs: Vec<MacroRef>,
/// Impl/SuperTrait relation between `Id`s with an associated span.
pub relations: Vec<Relation>,
}
```

### [rls_analysis](https://github.com/rust-dev-tools/rls-analysis)

This [crate](https://github.com/rust-dev-tools/rls-analysis) is responsible for loading multiple
`rls_data::Analysis` data structures and stitching them into a single, coherent interface.

Whereas the `rls_data` format can be considered an implementation detail that might
change, this crate aims to provide a 'stable' API.

Another reason is that each of those structures contains data centered on
the crate that was being compiled. This [lowering](https://github.com/rust-dev-tools/rls-analysis/blob/bd82c9b38b56e53bbfb199569a32b392056964fd/src/lowering.rs#L167)
cross-references the data
and indexes it, resulting in a database spanning multiple crates that can
answer queries such as 'across known crates, find all references to a
definition at a given span'.
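
To make the kind of query concrete, here is a minimal sketch of a cross-crate index. All names (`DefId`, `Span`, `Index`, `find_all_refs`) are hypothetical and heavily simplified; the real `rls_analysis` data structures and API differ.

```rust
use std::collections::HashMap;

/// Database-wide identifier of a definition (hypothetical, simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct DefId {
    krate: u32,
    index: u32,
}

/// A source location, reduced to a file name and a line for brevity.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Span {
    file: String,
    line: u32,
}

/// Index built by lowering per-crate data into one shared database.
#[derive(Default)]
struct Index {
    /// Span of each definition -> its id.
    def_at: HashMap<Span, DefId>,
    /// Definition id -> every span that refers to it, across all crates.
    refs_to: HashMap<DefId, Vec<Span>>,
}

impl Index {
    /// Records a definition found at `span`.
    fn add_def(&mut self, span: Span, id: DefId) {
        self.def_at.insert(span, id);
    }

    /// Records that `span` refers to the definition `target`.
    fn add_ref(&mut self, span: Span, target: DefId) {
        self.refs_to.entry(target).or_default().push(span);
    }

    /// Across known crates, finds all references to the definition at `span`.
    fn find_all_refs(&self, span: &Span) -> Vec<Span> {
        self.def_at
            .get(span)
            .and_then(|id| self.refs_to.get(id))
            .cloned()
            .unwrap_or_default()
    }
}
```

Because references are keyed by a database-wide id rather than a per-crate one, a reference recorded while lowering one crate is found when querying a definition that lives in another.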

We are capable of updating the index with new crate data. Whenever we encounter
a new crate, we [record and translate](https://github.com/rust-dev-tools/rls-analysis/blob/bd82c9b38b56e53bbfb199569a32b392056964fd/src/lowering.rs#L131-L154)
the crate id into our database-wide crate id mapping.

However, if data for an already lowered crate is loaded again, we simply
replace the definitions for a given crate and re-index.
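
The two behaviours above (assigning a database-wide id to a newly seen crate, and replacing data when a crate is loaded again) might look roughly like the following sketch. `CrateTable`, `intern` and `load` are hypothetical names, the `GlobalCrateId` here is a simplified stand-in for the `rls_data` type of the same name, and definitions are reduced to plain strings.

```rust
use std::collections::HashMap;

/// Globally unique crate identity (simplified stand-in, not the real type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct GlobalCrateId {
    name: String,
    disambiguator: u64,
}

#[derive(Default)]
struct CrateTable {
    /// Database-wide crate ids, handed out in order of first appearance.
    ids: HashMap<GlobalCrateId, u32>,
    /// Definitions (by name only, for brevity) per database-wide crate id.
    defs: HashMap<u32, Vec<String>>,
}

impl CrateTable {
    /// Returns the database-wide id, recording the crate on first sight.
    fn intern(&mut self, krate: &GlobalCrateId) -> u32 {
        let next = self.ids.len() as u32;
        *self.ids.entry(krate.clone()).or_insert(next)
    }

    /// Loads data for a crate; loading it again replaces the previous
    /// definitions instead of appending to them.
    fn load(&mut self, krate: &GlobalCrateId, defs: Vec<String>) -> u32 {
        let id = self.intern(krate);
        self.defs.insert(id, defs);
        id
    }
}
```

Loading crate `a`, then `b`, then `a` again in this sketch keeps two entries in `ids` and leaves only the most recent definitions stored for `a`.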

One interesting edge case is when we lower data for crates having the same name, such
as a binary and the `#[cfg(test)]`-compiled version of it. We need to ensure we lower a given definition
[only once](https://github.com/rust-dev-tools/rls-analysis/blob/bd82c9b38b56e53bbfb199569a32b392056964fd/src/lowering.rs#L258-L263),
even if it is technically repeated across multiple crates.
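
One way to picture the "only once" rule is a set of already seen definitions. This is a sketch under the assumption that a definition can be identified by crate name, location and name; `DefKey` and `Lowered` are hypothetical, and the linked `rls-analysis` code uses its own criteria.

```rust
use std::collections::HashSet;

/// Identity of a definition that is stable across same-named crates
/// (hypothetical: the fields chosen here are an assumption).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct DefKey {
    crate_name: String,
    file: String,
    line: u32,
    name: String,
}

#[derive(Default)]
struct Lowered {
    /// Keys of definitions that were already lowered.
    seen: HashSet<DefKey>,
    /// Lowered definitions, in insertion order.
    defs: Vec<DefKey>,
}

impl Lowered {
    /// Lowers `def` unless an identical definition was lowered before.
    /// Returns `true` if the definition was new.
    fn lower(&mut self, def: DefKey) -> bool {
        if !self.seen.insert(def.clone()) {
            return false;
        }
        self.defs.push(def);
        true
    }
}
```

Lowering the same function first from the binary crate and then from its test build stores it a single time; the second call simply returns `false`.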

### rls

With all the data lowering logic in place, all we have to do is actually fetch
the data - that happens inside the RLS.

In general, apart from being an LSP server, the RLS is also concerned with
build orchestration and the coordination of other components, such as:
* Racer for autocompletion
* Cargo for project layout detection and initial build coordination
* an internal virtual file system (VFS) for handling in-memory text buffers
* rls-analysis serving as our knowledge database
* Clippy performing additional lints
* Rustfmt driving our formatting capabilities

After doing initial compilation with Cargo, we cache a subgraph of the inter-crate
dependency graph along with compilation invocations and input files for the
crates we're interested in (inside the [primary or path-based packages](https://github.com/rust-lang/rls/blob/d7c2eb8b641ae7e6d7145c268249f28efcf5467c/src/build/cargo.rs#L376-L381))
which we later rerun manually.

In our case Cargo is configured to use a separate target directory
(`$target-dir/rls`) so the analysis does not interfere with regular builds.

We hijack the Cargo build process not only to record the build orchestration
data mentioned above but also to inject additional compiler flags forcing the compiler
to dump the JSON save-analysis files for each dependency. This serves as an
optimization so that we don't have to re-check dependencies in a fresh run.

Because the RLS aims to provide a truthful view of the compilation and to maintain
parity with the regular `cargo check` flow, our Cargo runs also run `build.rs` and
proc macros, both initially and when needed (e.g. when a modified file causes
`build.rs` to be rerun).

## Build scheduling

On every relevant file change we [mark files as dirty](https://github.com/rust-lang/rls/blob/d7c2eb8b641ae7e6d7145c268249f28efcf5467c/src/actions/notifications.rs#L131) and schedule a normal build.

We currently discern two [build priorities](https://github.com/rust-lang/rls/blob/67bce0bdcf2db1d3c05bb1a3d87df9e66eaec7db/src/build/mod.rs#L124-L133):
* Normal
* Cargo

The latter is scheduled whenever a change happens that can impact the entire
project. This includes:
* [initial build](https://github.com/rust-lang/rls/blob/67f2a86c13a34dcb231436c2f1db8900fece3c09/src/actions/mod.rs#L331)
* [configuration change](https://github.com/rust-lang/rls/blob/67f2a86c13a34dcb231436c2f1db8900fece3c09/src/actions/notifications.rs#L207)
(can potentially build a different set of packages)
* [Cargo.toml change](https://github.com/rust-lang/rls/blob/67f2a86c13a34dcb231436c2f1db8900fece3c09/src/actions/notifications.rs#L273)
(ditto)
* [build directory](https://github.com/rust-lang/rls/blob/67bce0bdcf2db1d3c05bb1a3d87df9e66eaec7db/src/build/mod.rs#L479-L486) changed
* a [modified file](https://github.com/rust-lang/rls/blob/d6570bc62575e03412340e55620cbf24fe59f772/src/build/cargo_plan.rs#L379-L383) in a package we didn't build
* [build.rs](https://github.com/rust-lang/rls/blob/d6570bc62575e03412340e55620cbf24fe59f772/src/build/cargo_plan.rs#L392-L396) modification
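
The list above amounts to a classification of changes into the two priorities. The sketch below illustrates it; the `Change` enum and `priority_for` function are hypothetical, and detecting `Cargo.toml` and `build.rs` by file name is an assumption made for brevity (build scripts can be renamed in a manifest).

```rust
/// The two build priorities named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildPriority {
    Normal,
    Cargo,
}

/// Events that can trigger a build (hypothetical, simplified).
enum Change<'a> {
    InitialBuild,
    ConfigChanged,
    BuildDirChanged,
    FileModified { path: &'a str, in_built_package: bool },
}

/// Decides whether a change needs a full Cargo run or a normal build.
fn priority_for(change: &Change<'_>) -> BuildPriority {
    match change {
        Change::InitialBuild | Change::ConfigChanged | Change::BuildDirChanged => {
            BuildPriority::Cargo
        }
        Change::FileModified { path, in_built_package } => {
            // Assumption: manifests and build scripts use their default names.
            let file_name = path.rsplit('/').next().unwrap_or(*path);
            if !*in_built_package || file_name == "Cargo.toml" || file_name == "build.rs" {
                BuildPriority::Cargo
            } else {
                BuildPriority::Normal
            }
        }
    }
}
```

An edit to `src/lib.rs` in a package that was already built maps to `BuildPriority::Normal`; everything else in this sketch falls back to the more expensive Cargo build.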

On a normal build, we map from dirty files to dirty crates, sort those
topologically and run rustc in-process for each crate ourselves.
With each in-process compilation we directly
[receive `rls_data::Analysis`](https://github.com/rust-lang/rls/blob/41bc0bf70bbbc8661f0c7f9cef700be5e105a926/src/build/rustc.rs#L334-L347)
in a callback,
[mark corresponding files as built](https://github.com/rust-lang/rls/blob/67bce0bdcf2db1d3c05bb1a3d87df9e66eaec7db/src/build/mod.rs#L493-L505)
and finally
[update our analysis database](https://github.com/rust-lang/rls/blob/d7c2eb8b641ae7e6d7145c268249f28efcf5467c/src/actions/post_build.rs#L208-L223)
with currently built data for each rebuilt crate.
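
The first two steps of a normal build (dirty files to dirty crates, then a topological sort) can be sketched as follows. `CrateNode` and `rebuild_order` are hypothetical names, and marking dependents of a dirty crate as dirty too is an assumption of this sketch rather than something stated above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A crate in the cached build plan (hypothetical, simplified).
struct CrateNode {
    /// Input files that belong to this crate.
    files: Vec<&'static str>,
    /// Names of the crates this crate depends on.
    deps: Vec<&'static str>,
}

/// Depth-first visit that pushes a crate after all of its dirty dependencies.
fn visit(
    name: &'static str,
    plan: &BTreeMap<&'static str, CrateNode>,
    dirty: &BTreeSet<&'static str>,
    done: &mut BTreeSet<&'static str>,
    order: &mut Vec<&'static str>,
) {
    // Skip clean crates and crates that were already visited.
    if !dirty.contains(name) || !done.insert(name) {
        return;
    }
    for dep in &plan[name].deps {
        visit(*dep, plan, dirty, done, order);
    }
    order.push(name);
}

/// Returns the crates that need rebuilding, dependencies first.
fn rebuild_order(
    plan: &BTreeMap<&'static str, CrateNode>,
    dirty_files: &[&str],
) -> Vec<&'static str> {
    // 1. Crates that directly own a dirty file.
    let mut dirty: BTreeSet<&'static str> = plan
        .iter()
        .filter(|(_, node)| node.files.iter().any(|f| dirty_files.contains(f)))
        .map(|(name, _)| *name)
        .collect();

    // 2. Assumption: dependents of a dirty crate are dirty as well.
    loop {
        let before = dirty.len();
        for (name, node) in plan {
            if node.deps.iter().any(|d| dirty.contains(d)) {
                dirty.insert(*name);
            }
        }
        if dirty.len() == before {
            break;
        }
    }

    // 3. Post-order traversal of the dirty set gives a topological order.
    let mut done = BTreeSet::new();
    let mut order = Vec::new();
    for name in &dirty {
        visit(*name, plan, &dirty, &mut done, &mut order);
    }
    order
}
```

The resulting list is the order in which the in-process rustc runs would be issued: a library is always rebuilt before the binaries that depend on it.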

If there are still files that were modified after we scheduled a build (the user kept
typing), we don't mark the build as done yet and schedule a regular build again.
It's worth noting that we squash builds whenever the user types before
a build kicks off (we buffer build requests so as not to waste work on something we
might invalidate).
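
A minimal sketch of this squashing behaviour, assuming a single build runs at a time. The `BuildQueue` type and its methods are hypothetical and leave out threading, timers and priorities, all of which the real scheduler has to deal with.

```rust
use std::collections::BTreeSet;

#[derive(Default)]
struct BuildQueue {
    /// Files changed since the last build was started.
    pending: BTreeSet<String>,
    /// Whether a build is currently running.
    in_progress: bool,
}

impl BuildQueue {
    /// Records a change. Any number of changes made before a build
    /// starts collapse into a single build request.
    fn file_changed(&mut self, path: &str) {
        self.pending.insert(path.to_string());
    }

    /// Starts a build over everything pending, if idle and there is work.
    fn start_build(&mut self) -> Option<Vec<String>> {
        if self.in_progress || self.pending.is_empty() {
            return None;
        }
        self.in_progress = true;
        Some(std::mem::take(&mut self.pending).into_iter().collect())
    }

    /// Finishes the running build. Returns `true` if files changed in the
    /// meantime, meaning another regular build has to be scheduled.
    fn finish_build(&mut self) -> bool {
        self.in_progress = false;
        !self.pending.is_empty()
    }
}
```

Typing three times before the build kicks off results in one `start_build` call covering all changed files, and a change made while that build is running makes `finish_build` report that a follow-up build is needed.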

## I/O

As mentioned previously, we run Cargo using a separate target directory, so we
do the same kind of I/O work that a regular Cargo build does, in addition to saving
JSON save-analysis files for our non-path dependencies.

### VFS

To allow running analysis on unsaved in-memory text buffers, we use the
[`rls-vfs`](https://github.com/rust-dev-tools/rls-vfs)
crate to act as our virtual file system.

The Rust compiler supports using custom file providers via the [`FileLoader`](https://github.com/rust-lang/rust/blob/79d8a0fcefa5134db2a94739b1d18daa01fc6e9f/src/libsyntax/source_map.rs#L58-L68) trait, which [we use](https://github.com/rust-lang/rls/blob/67bce0bdcf2db1d3c05bb1a3d87df9e66eaec7db/src/build/rustc.rs#L385-L402).

It delegates to the real file system
whenever there are no buffered changes to a file but serves the unsaved buffers [from the VFS](https://github.com/rust-lang/rls/blob/67bce0bdcf2db1d3c05bb1a3d87df9e66eaec7db/src/build/rustc.rs#L79) otherwise.
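
The delegation can be illustrated with a small stand-alone sketch. The `ReadSource` trait below is a hypothetical stand-in for the compiler's file-loading hook, reduced to a single method, and `Vfs` is a toy map of unsaved buffers rather than the `rls-vfs` API.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Stand-in for the compiler's file-loading hook (hypothetical).
trait ReadSource {
    fn read_file(&self, path: &Path) -> io::Result<String>;
}

/// Unsaved editor buffers, keyed by path (toy model of a VFS).
#[derive(Default)]
struct Vfs {
    buffers: HashMap<PathBuf, String>,
}

/// Loader that prefers unsaved buffers over the on-disk contents.
struct VfsLoader<'a> {
    vfs: &'a Vfs,
}

impl ReadSource for VfsLoader<'_> {
    fn read_file(&self, path: &Path) -> io::Result<String> {
        match self.vfs.buffers.get(path) {
            // Buffered changes exist: serve the in-memory text.
            Some(text) => Ok(text.clone()),
            // No buffered changes: fall back to the real file system.
            None => fs::read_to_string(path),
        }
    }
}
```

With a loader like this handed to the compiler, a file that is open and modified in the editor is compiled from its buffer, while every other file is read from disk as usual.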
2 changes: 2 additions & 0 deletions contributing.md
@@ -10,6 +10,8 @@ implementing clients are all extremely valuable.
Here is the list of known [issues](https://github.com/rust-lang/rls/issues).
These are [good issues to start on](https://github.com/rust-lang/rls/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22).

A good resource on how RLS works can be found [here](architecture.md).

We're happy to help however we can. The best way to get help is either to
leave a comment on an issue in this repo, or to ping us (nrc) in #rust-dev-tools
on IRC.