Parallel docs #1203

Merged: 6 commits, Sep 7, 2021
7 changes: 1 addition & 6 deletions src/overview.md
@@ -291,12 +291,7 @@ Compiler performance is a problem that we would like to improve on
(and are always working on). One aspect of that is parallelizing
`rustc` itself.

Currently, there is only one part of rustc that is already parallel: codegen.
During monomorphization, the compiler will split up all the code to be
generated into smaller chunks called _codegen units_. These are then generated
by independent instances of LLVM. Since they are independent, we can run them
in parallel. At the end, the linker is run to combine all the codegen units
together into one binary.
Currently, there is only one part of rustc that is parallel by default: codegen.

However, the rest of the compiler is not yet parallel. A lot of effort has
been spent on this, but it is generally a hard problem. The current
61 changes: 54 additions & 7 deletions src/parallel-rustc.md
@@ -1,16 +1,61 @@
# Parallel Compilation

Most of the compiler is not parallel. This represents an opportunity for
improving compiler performance.
As of <!-- date: 2021-09 --> September 2021, the only stage of the compiler
that is already parallel is codegen. The nightly compiler implements parallel
query evaluation, but there is still a lot of work to be done. The lack of
parallelism at other stages also represents an opportunity for improving
compiler performance. One can try out the current parallel compiler work by
enabling it in the `config.toml`.
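
As a rough sketch, the relevant `config.toml` setting looks something like the
following (check `config.toml.example` in the `rust` repository for the exact
option name and its current default):

```toml
[rust]
# Build a rustc whose front end can run queries on multiple threads.
parallel-compiler = true
```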

These next few sections describe where and how parallelism is currently used,
and the current status of making parallel compilation the default in `rustc`.

The underlying thread-safe data structures used in the parallel compiler
can be found in `rustc_data_structures/sync.rs`. Some of these data structures
use the `parking_lot` API.
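
To give a flavor of how this works, here is a hypothetical, much-simplified
sketch of the pattern used there; the real definitions in `sync.rs` are more
involved and cover many more types, so treat this as an illustration only:

```rust
// The same `Lock<T>` name resolves either to a cheap single-threaded cell or
// to a real mutex, depending on whether rustc is built as a parallel compiler.

#[cfg(not(parallel_compiler))]
mod sync {
    use std::cell::{RefCell, RefMut};

    /// Single-threaded build: no synchronization is needed, so "locking" is
    /// just a dynamically checked mutable borrow.
    pub struct Lock<T>(RefCell<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(RefCell::new(value))
        }

        pub fn lock(&self) -> RefMut<'_, T> {
            self.0.borrow_mut()
        }
    }
}

#[cfg(parallel_compiler)]
mod sync {
    use parking_lot::{Mutex, MutexGuard};

    /// Parallel build: `Lock` is a real mutex (rustc uses `parking_lot`).
    pub struct Lock<T>(Mutex<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(Mutex::new(value))
        }

        pub fn lock(&self) -> MutexGuard<'_, T> {
            self.0.lock()
        }
    }
}
```

Code elsewhere in the compiler can then use `sync::Lock<T>` without caring
which of the two implementations it gets.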

## Codegen

During [monomorphization][monomorphization], the compiler splits up all the code to
be generated into smaller chunks called _codegen units_. These are then generated by
independent instances of LLVM running in parallel. At the end, the linker
is run to combine all the codegen units together into one binary.
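
The amount of parallelism available here can be tuned from the command line:
the `-C codegen-units` flag controls how many chunks the crate is split into
(more units allow more parallel LLVM work, at the cost of some cross-unit
optimization). For example:

```console
$ rustc -C codegen-units=16 main.rs
```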

## Query System

The query model has some properties that make it actually feasible to evaluate
multiple queries in parallel without too much effort:

- All data a query provider can access is accessed via the query context, so
the query context can take care of synchronizing access.
- Query results are required to be immutable so they can safely be used by
different threads concurrently.

When a query `foo` is evaluated, the cache table for `foo` is locked; a
simplified code sketch of this scheme follows the list below.

- If there already is a result, we can clone it, release the lock and
we are done.
- If there is no cache entry and no other active query invocation computing the
same result, we mark the key as being "in progress", release the lock and
start evaluating.
- If there *is* another query invocation for the same key in progress, we
release the lock, and just block the thread until the other invocation has
computed the result we are waiting for. This cannot deadlock because, as
mentioned before, query invocations form a DAG. Some thread will always make
progress.
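
Putting these steps together, a much-simplified sketch of such a cache in
plain Rust might look like the following. All names and types here are
invented for illustration; the real implementation lives in the query system
crates and also handles sharding, cycle detection, and incremental
compilation:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

#[derive(Clone)]
enum Entry {
    /// Some other thread is currently computing this result.
    InProgress,
    /// The (immutable) result is ready; cloning the `Arc` is cheap and safe.
    Done(Arc<String>),
}

struct QueryCache {
    table: Mutex<HashMap<u64, Entry>>,
    /// Wakes up threads that are blocked on an `InProgress` entry.
    finished: Condvar,
}

impl QueryCache {
    fn new() -> Self {
        QueryCache { table: Mutex::new(HashMap::new()), finished: Condvar::new() }
    }

    fn get_or_compute(&self, key: u64, compute: impl FnOnce() -> String) -> Arc<String> {
        let mut table = self.table.lock().unwrap();
        loop {
            match table.get(&key).cloned() {
                // Already cached: clone the result, release the lock, done.
                Some(Entry::Done(result)) => return result,
                // Another thread is working on it: block until it finishes,
                // then re-check the table. This cannot deadlock because query
                // invocations form a DAG.
                Some(Entry::InProgress) => {
                    table = self.finished.wait(table).unwrap();
                }
                // Nobody has started it: mark it in progress, release the
                // lock, and do the (possibly expensive) computation.
                None => {
                    table.insert(key, Entry::InProgress);
                    drop(table);
                    let result = Arc::new(compute());
                    let mut table = self.table.lock().unwrap();
                    table.insert(key, Entry::Done(Arc::clone(&result)));
                    self.finished.notify_all();
                    return result;
                }
            }
        }
    }
}
```

With a cache like this, many threads can call
`cache.get_or_compute(key, || expensive_query(key))` concurrently, and only
one of them actually runs the computation.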

## Rustdoc

As of <!-- date: 2021-09 --> September 2021, there are still a number of steps
to complete before rustdoc rendering can be made parallel. More details on
this issue can be found [here][parallel-rustdoc].

## Current Status

As of <!-- date: 2021-07 --> July 2021, work on explicitly parallelizing the
compiler has stalled. There is a lot of design and correctness work that needs
to be done.

These are the basic ideas in the effort to make `rustc` parallel:

- There are a lot of loops in the compiler that just iterate over all items in
a crate. These can possibly be parallelized.
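
  As an illustration of the kind of transformation involved (using the
  general-purpose `rayon` crate here rather than the compiler's actual
  internal wrappers):

  ```rust
  // Illustrative only: the same per-item work expressed as a sequential loop
  // and as a data-parallel loop using rayon (add `rayon = "1"` to Cargo.toml).
  use rayon::prelude::*;

  fn check_item(name: &str) {
      // Stand-in for whatever per-item analysis the compiler would do.
      println!("checking {}", name);
  }

  fn check_all_items(items: &[String]) {
      // Sequential: one item at a time.
      for item in items {
          check_item(item);
      }

      // Parallel: rayon spreads the iterations over a thread pool. This is
      // only sound if the per-item work does not mutate shared state without
      // synchronization, which is exactly what makes parallelizing the
      // compiler's loops hard.
      items.par_iter().for_each(|item| check_item(item));
  }
  ```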
@@ -45,3 +90,5 @@ are a bit out of date):
[imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
[irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
[tracking]: https://github.com/rust-lang/rust/issues/48685
[monomorphization]: https://rustc-dev-guide.rust-lang.org/backend/monomorph.html
[parallel-rustdoc]: https://github.com/rust-lang/rust/issues/82741
26 changes: 0 additions & 26 deletions src/queries/query-evaluation-model-in-detail.md
@@ -211,29 +211,3 @@ much of a maintenance burden.

To summarize: "Steal queries" break some of the rules in a controlled way.
There are checks in place that make sure that nothing can go silently wrong.


## Parallel Query Execution

The query model has some properties that make it actually feasible to evaluate
multiple queries in parallel without too much of an effort:

- All data a query provider can access is accessed via the query context, so
the query context can take care of synchronizing access.
- Query results are required to be immutable so they can safely be used by
different threads concurrently.

The nightly compiler already implements parallel query evaluation as follows:

When a query `foo` is evaluated, the cache table for `foo` is locked.

- If there already is a result, we can clone it, release the lock and
we are done.
- If there is no cache entry and no other active query invocation computing the
same result, we mark the key as being "in progress", release the lock and
start evaluating.
- If there *is* another query invocation for the same key in progress, we
release the lock, and just block the thread until the other invocation has
computed the result we are waiting for. This cannot deadlock because, as
mentioned before, query invocations form a DAG. Some thread will always make
progress.