
rustdoc: synthetic blanket impls: hang / very long compiletimes with complex mutually recursive structs involving many projections #114891

Open
jimblandy opened this issue Aug 16, 2023 · 20 comments · May be fixed by #125907
Assignees
Labels
A-synthetic-impls Area: Synthetic impls, used by rustdoc to document auto traits and traits with blanket impls A-trait-system Area: Trait system C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@jimblandy
Contributor

jimblandy commented Aug 16, 2023

Problem

When I run cargo doc, it never terminates. Instead, it uses 100% of a CPU indefinitely.

Steps

  1. git clone https://github.com/gents83/wgpu.git
  2. cd wgpu
  3. git checkout 91d872e1
  4. cargo +nightly doc -p wgpu-core

That final command never seems to exit.

Notes

I'm running Fedora 38 and cargo 1.73.0-nightly (7c3904d6c 2023-08-14).

@jimblandy jimblandy added the C-bug Category: This is a bug. label Aug 16, 2023
@weihanglo
Member

The change described in gfx-rs/wgpu#3626 (comment) that contributes to this freeze is a bit too big for me. Is it possible for you to minimize the scope to the relevant parts, so that people can focus on debugging without reading through all the changes?

BTW, this looks like a rustdoc issue, not a Cargo issue.

@weihanglo
Member

For more context, this fails on everything from at least 1.64 up to today's nightly.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 16, 2023
@ehuss ehuss transferred this issue from rust-lang/cargo Aug 16, 2023
@ehuss
Contributor

ehuss commented Aug 16, 2023

Transferred to rust-lang/rust since this appears to be a rustc/rustdoc issue.

In the logs, I see a stream of rustc_trait_selection::traits::project::normalize_with_depth_to calls with an ever increasing depth.

@ehuss ehuss added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Aug 16, 2023
@ehuss
Contributor

ehuss commented Aug 16, 2023

Note for reproduction, you may need to git fetch origin pull/3626/head, since the given commit is on a PR.

@jimblandy
Contributor Author

The change described in gfx-rs/wgpu#3626 (comment) that contributes to this freeze is a bit too big for me. Is it possible for you to minimize the scope to the relevant parts, so that people can focus on debugging without reading through all the changes?

I can see how that would be helpful. Yes, it is a big change. Unfortunately, I won't be able to spend time reducing it further than I have.

Just for others: in the comment @weihanglo linked to, I was able to narrow down which commit introduces the problem:

It seems like trunk merge gfx-rs/wgpu@5b34df5 is fine, but trunk merge gfx-rs/wgpu@91d872e hangs. It's specifically the merge of gfx-rs/wgpu@88f18ed (gfx-rs/wgpu#3691, "Don't implement Send or Sync on Wasm") that makes cargo doc hang.
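
For context, that Send/Sync change boils down to alias traits with blanket impls. Below is a minimal sketch of the non-Wasm side, assuming the shape shown in the MCVE later in this thread (on Wasm these traits are declared without the Send/Sync supertraits, per the PR title):

// Alias traits as used on native targets; rustdoc later has to synthesize
// these blanket impls for every public type in the crate.
pub trait WasmNotSend: Send {}
impl<T: Send> WasmNotSend for T {}

pub trait WasmNotSync: Sync {}
impl<T: Sync> WasmNotSync for T {}

pub trait WasmNotSendSync: WasmNotSend + WasmNotSync {}
impl<T: WasmNotSend + WasmNotSync> WasmNotSendSync for T {}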

@jimblandy
Contributor Author

Note for reproduction, you may need to git fetch origin pull/3626/head, since the given commit is on a PR.

Thank you for catching this. I've updated the "Steps" so that the given commit is actually there.

@Noratrieb Noratrieb added T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. I-hang Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 22, 2023
@chenyukang
Member

Yes, I can reproduce it locally; it seems evaluate_trait_predicate_recursively and evaluate_predicate_recursively are calling each other infinitely.

@GuillaumeGomez
Member

That seems to be a compiler bug rather than a rustdoc one.

@GuillaumeGomez GuillaumeGomez added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 30, 2023
@cwfitzgerald

cwfitzgerald commented Jan 17, 2024

This might have been fixed magically since it was reported: 1.76 beta and nightly don't hang and complete in a reasonable timeframe (see gfx-rs/wgpu#4905).

@cwfitzgerald

cwfitzgerald commented Jan 27, 2024

Forwarding these notes from @Imberflur from the downstream issue, with new reproduction steps.

For me, the time jumps from ~36 seconds before gfx-rs/wgpu@192a2fe to ~450 seconds after, which roughly matches the multiplier of going from 1 to 12 FastHashMap fields (each with unique type parameters) in ResourceMaps (36 * 12 = 432).

On nightly the effect of gfx-rs/wgpu@192a2fe seems to be resolved, as mentioned above. However, cargo doc -p wgpu-core --no-deps times go from 10-15 seconds to 80 seconds at gfx-rs/wgpu@efb35d4. Not quite as extreme, but still a significant increase.

Nightly version tested is 1.77.0-nightly (5bd5d214e 2024-01-25)

git clone https://github.com/gfx-rs/wgpu.git
cd wgpu
git checkout efb35d4fa1059a514942d375fa5239c6d27cf35a
cargo +nightly doc -p wgpu-core

The notable change in efb35d4 is the addition of weak-ref back-handles, which causes more types to be aware of each other circularly. In this case, BindGroup previously contained a Texture (via BindGroupStates and TextureInitTrackerAction), but now Texture also holds a BindGroup (via a Weak<BindGroup>). The same is true for Buffers.
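
To make the shape of that circularity concrete, here is a hypothetical, stripped-down sketch (the real types live in wgpu-core and carry many more fields):

use std::sync::{Arc, Mutex, Weak};

trait HalApi: 'static {}

// Pre-existing edge: a bind group strongly reaches the textures it uses.
struct BindGroup<A: HalApi> {
    used_textures: Vec<Arc<Texture<A>>>,
}

// New edge added by the back-handles: a texture weakly reaches the bind
// groups that use it, closing the cycle BindGroup -> Texture -> BindGroup.
struct Texture<A: HalApi> {
    bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}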

I wouldn't be surprised if we're hitting some nasty edge case, given just how many generics we have in the codebase and how mutually referential everything is. We do want to reduce the amount of generics in the codebase to speed up compilation, though some guidance on what might be most effective would be nice, if such advice exists.

@SludgePhD

This bug affects every project using wgpu, which includes every single Bevy project, for which generating local docs is a fairly common thing to do. A fix would be greatly appreciated.

@fmease fmease added the A-trait-system Area: Trait system label Apr 5, 2024
@jimblandy
Contributor Author

@SludgePhD I think what the Rust team wants most is a reduced test case.

@jimblandy
Contributor Author

It seems like this diff makes cargo doc fast again:

modified   wgpu-core/src/binding_model.rs
@@ -250,17 +250,7 @@ pub struct BindGroupDynamicBindingData {
 
 #[derive(Debug)]
 pub struct BindGroup<A: HalApi> {
-    pub(crate) raw: Snatchable<A::BindGroup>,
-    pub(crate) device: Arc<Device<A>>,
-    pub(crate) layout: Arc<BindGroupLayout<A>>,
-    pub(crate) info: ResourceInfo<BindGroup<A>>,
-    pub(crate) used: BindGroupStates<A>,
-    pub(crate) used_buffer_ranges: Vec<BufferInitTrackerAction<A>>,
-    pub(crate) used_texture_ranges: Vec<TextureInitTrackerAction<A>>,
-    pub(crate) dynamic_binding_info: Vec<BindGroupDynamicBindingData>,
-    /// Actual binding sizes for buffers that don't have `min_binding_size`
-    /// specified in BGL. Listed in the order of iteration of `BGL.entries`.
-    pub(crate) late_buffer_binding_sizes: Vec<wgt::BufferSize>,
+    marker: std::marker::PhantomData<A>,
 }
 
 impl<A: HalApi> Drop for BindGroup<A> {

@jimblandy
Contributor Author

(I deleted all the function bodies from wgpu-core, so that this still builds.)

@jimblandy
Contributor Author

jimblandy commented Apr 17, 2024

@weihanglo I think I have a reduced test case.

$ git clone --branch debug-cargo-doc-slow --shallow-exclude=trunk https://github.com/jimblandy/wgpu.git
Cloning into 'wgpu'...
$ cd wgpu/wgpu-core
$ git log -2
commit ad259b3e1e3d6303dfa4d7c483e94b3efcee24dc (HEAD -> debug-cargo-doc-slow, origin/debug-cargo-doc-slow)
Author: Jim Blandy <jimb@red-bean.com>
Date:   Tue Apr 16 20:47:25 2024 -0700

    Speed up `cargo doc` from 47s to 0.7s.

commit 18a8a07f68479243a549c0ed89c914719f536eb2
Author: Jim Blandy <jimb@red-bean.com>
Date:   Tue Apr 16 18:23:10 2024 -0700

    more deletions
$ cargo doc --quiet
$ cargo clean -p wgpu-core
     Removed 15 files, 7.4MiB total
$ cargo doc
   Compiling wgpu-core v0.19.3 (/home/jimb/snoo/wgpu/wgpu-core)
 Documenting wgpu-core v0.19.3 (/home/jimb/snoo/wgpu/wgpu-core)
    Finished dev [unoptimized + debuginfo] target(s) in 0.88s
   Generated /home/jimb/snoo/wgpu/target/doc/wgpu_core/index.html
$ git checkout HEAD~
HEAD is now at 18a8a07f6 more deletions
$ cargo doc
   Compiling wgpu-core v0.19.3 (/home/jimb/snoo/wgpu/wgpu-core)
 Documenting wgpu-core v0.19.3 (/home/jimb/snoo/wgpu/wgpu-core)
    Finished dev [unoptimized + debuginfo] target(s) in 55.58s
   Generated /home/jimb/snoo/wgpu/target/doc/wgpu_core/index.html
$ 

So on that branch, cargo doc takes 0.88s on ad259b3e, whereas it takes 55s on its parent, 18a8a07f6. The only difference between the two trees is replacing the members of one struct with a PhantomData:

diff --git a/wgpu-core/src/device/resource.rs b/wgpu-core/src/device/resource.rs
index ebebe1fba..dad2e853c 100644
--- a/wgpu-core/src/device/resource.rs
+++ b/wgpu-core/src/device/resource.rs
@@ -87,59 +87,9 @@ use super::{
 /// Important:
 /// When locking pending_writes please check that trackers is not locked
 /// trackers should be locked only when needed for the shortest time possible
-#[allow(dead_code)] // JIMB
+#[allow(dead_code)] // JIMB 0.76s
 pub struct Device<A: HalApi> {
-    raw: Option<A::Device>,
-    pub(crate) adapter: Arc<Adapter<A>>,
-    pub(crate) queue: OnceCell<Weak<Queue<A>>>,
-    queue_to_drop: OnceCell<A::Queue>,
-    pub(crate) zero_buffer: Option<A::Buffer>,
-    pub(crate) info: ResourceInfo<Device<A>>,
-
-    pub(crate) command_allocator: command::CommandAllocator<A>,
-    //Note: The submission index here corresponds to the last submission that is done.
-    pub(crate) active_submission_index: AtomicU64, //SubmissionIndex,
-    // NOTE: if both are needed, the `snatchable_lock` must be consistently acquired before the
-    // `fence` lock to avoid deadlocks.
-    pub(crate) fence: RwLock<Option<A::Fence>>,
-    pub(crate) snatchable_lock: SnatchLock,
-
-    /// Is this device valid? Valid is closely associated with "lose the device",
-    /// which can be triggered by various methods, including at the end of device
-    /// destroy, and by any GPU errors that cause us to no longer trust the state
-    /// of the device. Ideally we would like to fold valid into the storage of
-    /// the device itself (for example as an Error enum), but unfortunately we
-    /// need to continue to be able to retrieve the device in poll_devices to
-    /// determine if it can be dropped. If our internal accesses of devices were
-    /// done through ref-counted references and external accesses checked for
-    /// Error enums, we wouldn't need this. For now, we need it. All the call
-    /// sites where we check it are areas that should be revisited if we start
-    /// using ref-counted references for internal access.
-    pub(crate) valid: AtomicBool,
-
-    /// All live resources allocated with this [`Device`].
-    ///
-    /// Has to be locked temporarily only (locked last)
-    /// and never before pending_writes
-    pub(crate) trackers: Mutex<Tracker<A>>,
-    pub(crate) tracker_indices: TrackerIndexAllocators,
-    // Life tracker should be locked right after the device and before anything else.
-    life_tracker: Mutex<LifetimeTracker<A>>,
-    /// Temporary storage for resource management functions. Cleared at the end
-    /// of every call (unless an error occurs).
-    pub(crate) temp_suspected: Mutex<Option<ResourceMaps<A>>>,
-    /// Pool of bind group layouts, allowing deduplication.
-    pub(crate) bgl_pool: ResourcePool<bgl::EntryMap, BindGroupLayout<A>>,
-    pub(crate) alignments: hal::Alignments,
-    pub(crate) limits: wgt::Limits,
-    pub(crate) features: wgt::Features,
-    pub(crate) downlevel: wgt::DownlevelCapabilities,
-    pub(crate) instance_flags: wgt::InstanceFlags,
-    pub(crate) pending_writes: Mutex<Option<PendingWrites<A>>>,
-    pub(crate) deferred_destroy: Mutex<Vec<DeferredDestroy<A>>>,
-    #[cfg(feature = "trace")]
-    pub(crate) trace: Mutex<Option<trace::Trace>>,
-    pub(crate) usage_scopes: UsageScopePool<A>,
+    marker: std::marker::PhantomData<A>,
 }
 
 #[allow(dead_code)] // JIMB

Does this help make this bug more actionable?

@jimblandy
Contributor Author

More detail, in case it's helpful:

That source tree has a few structs marked with JIMB. Replacing the members of any one of those structs with a single PhantomData, as I did above for Device, causes a significant reduction in cargo doc runtime. So I think the problem must be some sort of interaction between all those types.

I've gone through the entire wgpu-core tree and replaced all function bodies with todo!() or something similarly trivial, which is why you can delete all the structs' members without cargo check complaining.
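
For illustration, here is a hypothetical, self-contained sketch of that reduction technique: keep the type structure but stub every body, so the crate still type-checks while carrying as little code as possible:

trait HalApi: 'static {}

struct Device<A: HalApi>(std::marker::PhantomData<A>);
struct Buffer<A: HalApi>(std::marker::PhantomData<A>);

impl<A: HalApi> Device<A> {
    // The original body is gone; todo!() keeps the signature compiling,
    // so rustdoc still sees the same types, impls, and bounds.
    fn create_buffer(&self) -> Buffer<A> {
        todo!()
    }
}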

@jimblandy
Contributor Author

jimblandy commented Apr 17, 2024

What the structs marked JIMB seem to have in common is that they take an <A: HalApi> generic parameter and have a bunch of members that are in turn parameterized by A. Deleting members from other kinds of generic structs doesn't seem to matter much.

It may be relevant that there are reference cycles among these structs:

  • Device has a member life_tracker: Mutex<LifetimeTracker<A>>
  • LifetimeTracker has a member mapped: Vec<Buffer<A>>
  • Buffer has a member device: Arc<Device<A>>

If you pick a struct that participates in one of these cycles and delete only the members that create that participation, it has as much effect on cargo doc runtime as removing all of the struct's members. For example, deleting the device and bind_groups members from Buffer in wgpu-core/src/resource.rs lowers the cargo doc run time to 15s, the same effect as replacing all of Buffer's members with a single PhantomData<A>.

So it seems likely that the cycles are relevant.
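
For concreteness, a self-contained sketch of that Device/LifetimeTracker/Buffer cycle, reduced to just the members listed above (a hypothetical simplification; the real structs carry many more fields):

use std::sync::{Arc, Mutex};

trait HalApi: 'static {}

// Device -> LifetimeTracker -> Buffer -> Device
struct Device<A: HalApi> {
    life_tracker: Mutex<LifetimeTracker<A>>,
}

struct LifetimeTracker<A: HalApi> {
    mapped: Vec<Buffer<A>>,
}

struct Buffer<A: HalApi> {
    device: Arc<Device<A>>,
}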

@fmease fmease self-assigned this May 6, 2024
@fmease
Member

fmease commented May 26, 2024

I decided to just go for a top-down minimization starting at https://github.com/jimblandy/wgpu.git rev=18a8a07f6. Instead of using a pre-built artifact of rustc, I used a local stage1 build of master rustc (with debug info) out of habit, which I've come to regret: the MCVE I constructed only reproduces with that setup, and I found that out way too late in the process. I don't know the exact parameters yet, or why there's a discrepancy in the first place.

Anyway, below is a relatively small reproducer that consistently takes 33-34s to document with rustdoc +stage1 on my machine (the reproducer doesn't depend on any non-sysroot deps). The "expected"/"target" time is definitely well below 1s (the nightly and the pre-built master compiler take ~4s on my machine). The reproducer is extremely fragile in the sense that any slight modification has a noticeable impact on the compile time.

I'll profile this later today or in the coming days and will continue my investigations.

use std::sync::{Mutex, RwLock};
use std::{
    collections::HashMap,
    sync::{Arc, Weak},
};

pub trait Api: Clone {
    type Instance: Instance<A = Self>;
    type Surface: Surface<A = Self>;
    type Adapter: Adapter<A = Self>;
    type Device: DeviceTr<A = Self>;

    type Queue: QueueTr<A = Self>;
    type CommandEncoder: CommandEncoder<A = Self>;

    type CommandBuffer: WasmNotSendSync;

    type Buffer: WasmNotSendSync + 'static;
    type Texture: WasmNotSendSync + 'static;
    type SurfaceTexture: WasmNotSendSync + std::borrow::Borrow<Self::Texture>;
    type TextureView: WasmNotSendSync;
    type Sampler: WasmNotSendSync;
    type QuerySet: WasmNotSendSync;
    type Fence: WasmNotSendSync;

    type BindGroupLayout: WasmNotSendSync;
    type BindGroup: WasmNotSendSync;
    type PipelineLayout: WasmNotSendSync;
    type ShaderModule: WasmNotSendSync;
    type RenderPipeline: WasmNotSendSync;
    type ComputePipeline: WasmNotSendSync;

    type AccelerationStructure: WasmNotSendSync + 'static;
}

pub trait Instance: Sized + WasmNotSendSync {
    type A: Api;
}

pub trait Surface: WasmNotSendSync {
    type A: Api;
}

pub trait Adapter: WasmNotSendSync {
    type A: Api;
}

pub trait DeviceTr: WasmNotSendSync {
    type A: Api;
}

pub trait QueueTr: WasmNotSendSync {
    type A: Api;
}

pub trait CommandEncoder: WasmNotSendSync {
    type A: Api;
}

pub trait WasmNotSendSync: WasmNotSend + WasmNotSync {}
impl<T: WasmNotSend + WasmNotSync> WasmNotSendSync for T {}

pub trait WasmNotSend: Send {}
impl<T: Send> WasmNotSend for T {}

pub trait WasmNotSync: Sync {}
impl<T: Sync> WasmNotSync for T {}

trait HalApi: Api + 'static {}

struct BindGroup<A: HalApi> {
    raw: A::BindGroup,
    device: Arc<Device<A>>,
    layout: Arc<A>,
    info: ResourceInfo<BindGroup<A>>,
    used: BindGroupStates<A>,
    used_buffer_ranges: Vec<A>,
    used_texture_ranges: Vec<A>,
}

struct BindGroupStates<A: HalApi> {
    buffers: BufferBindGroupState<A>,
    textures: TextureBindGroupState<A>,
    views: TextureView<A>,
    samplers: Sampler<A>,
}

type UsageScopePool<A> = Mutex<Vec<(BufferUsageScope<A>, TextureUsageScope<A>)>>;

struct Tracker<A: HalApi> {
    buffers: BufferTracker<A>,
    textures: TextureTracker<A>,
    views: TextureView<A>,
    samplers: Sampler<A>,
    bind_groups: crate::BindGroup<A>,
    compute_pipelines: A,
    render_pipelines: A,
    bundles: A,
    query_sets: QuerySet<A>,
}

struct BufferBindGroupState<A: HalApi> {
    buffers: Mutex<Vec<Arc<Buffer<A>>>>,
}
struct BufferUsageScope<A: HalApi> {
    metadata: Buffer<A>,
}

struct BufferTracker<A: HalApi> {
    metadata: Buffer<A>,
}

struct TextureBindGroupState<A: HalApi> {
    textures: Mutex<Vec<A>>,
}
struct TextureUsageScope<A: HalApi>(A);

struct TextureTracker<A: HalApi> {
    _phantom: std::marker::PhantomData<A>,
}

struct ResourceInfo<T> {
    marker: std::marker::PhantomData<T>,
}

struct Buffer<A: HalApi> {
    raw: A::Buffer,
    device: Arc<Device<A>>,
    info: ResourceInfo<Buffer<A>>,
    bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}

struct DestroyedBuffer<A: HalApi> {
    raw: Option<A::Buffer>,
    device: Arc<Device<A>>,
    bind_groups: Vec<Weak<BindGroup<A>>>,
}

struct StagingBuffer<A: HalApi> {
    raw: Mutex<Option<A::Buffer>>,
    device: Arc<Device<A>>,
    info: ResourceInfo<StagingBuffer<A>>,
}

enum TextureInner<A: HalApi> {
    Native(A::Texture),
    Surface(Option<A::SurfaceTexture>),
}

enum TextureClearMode<A: HalApi> {
    RenderPass(Vec<Option<A::TextureView>>),
    Surface(Option<A::TextureView>),
}

struct Texture<A: HalApi> {
    inner: TextureInner<A>,
    device: Arc<Device<A>>,
    info: ResourceInfo<Texture<A>>,
    clear_mode: RwLock<TextureClearMode<A>>,
    views: Mutex<Vec<Weak<TextureView<A>>>>,
    bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}

struct DestroyedTexture<A: HalApi> {
    raw: Option<A::Texture>,
    views: Vec<Weak<TextureView<A>>>,
    bind_groups: Vec<Weak<BindGroup<A>>>,
    device: Arc<Device<A>>,
}

struct TextureView<A: HalApi> {
    raw: A::TextureView,
    parent: Arc<Texture<A>>,
    device: Arc<Device<A>>,
    info: ResourceInfo<TextureView<A>>,
}

struct Sampler<A: HalApi> {
    raw: Option<A::Sampler>,
    device: Arc<Device<A>>,
    info: ResourceInfo<Self>,
}

struct QuerySet<A: HalApi> {
    raw: Option<A::QuerySet>,
    device: Arc<Device<A>>,
    info: ResourceInfo<Self>,
}

struct Device<A: HalApi> {
    raw: Option<A::Device>,
    adapter: Arc<A>,
    queue: Weak<Queue<A>>,
    queue_to_drop: A::Queue,
    zero_buffer: Option<A::Buffer>,
    info: ResourceInfo<Device<A>>,
    command_allocator: A,
    fence: RwLock<Option<A::Fence>>,
    trackers: Mutex<Tracker<A>>,
    life_tracker: Mutex<LifetimeTracker<A>>,
    temp_suspected: Mutex<Option<ResourceMaps<A>>>,
    pending_writes: Mutex<Option<PendingWrites<A>>>,
    usage_scopes: UsageScopePool<A>,
}

struct Queue<A: HalApi> {
    device: Option<Arc<Device<A>>>,
    raw: Option<A::Queue>,
    info: ResourceInfo<Queue<A>>,
}

struct EncoderInFlight<A: HalApi> {
    marker: std::marker::PhantomData<A>,
}
struct PendingWrites<A: HalApi> {
    command_encoder: A::CommandEncoder,
    dst_buffers: HashMap<i32, Arc<Buffer<A>>>,
    dst_textures: HashMap<i32, Arc<Texture<A>>>,
    executing_command_buffers: Vec<A::CommandBuffer>,
}

struct ResourceMaps<A: HalApi> {
    buffers: HashMap<i32, Arc<Buffer<A>>>,
    staging_buffers: HashMap<i32, Arc<StagingBuffer<A>>>,
    textures: HashMap<i32, Arc<Texture<A>>>,
    texture_views: HashMap<i32, Arc<TextureView<A>>>,
    samplers: HashMap<i32, Arc<Sampler<A>>>,
    bind_groups: HashMap<i32, Arc<BindGroup<A>>>,
    bind_group_layouts: HashMap<i32, Arc<A>>,
    render_pipelines: HashMap<i32, Arc<A>>,
    compute_pipelines: HashMap<i32, Arc<A>>,
    pipeline_layouts: HashMap<i32, Arc<A>>,
    render_bundles: HashMap<i32, Arc<A>>,
    query_sets: HashMap<i32, Arc<QuerySet<A>>>,
    destroyed_buffers: HashMap<i32, Arc<DestroyedBuffer<A>>>,
    destroyed_textures: HashMap<i32, Arc<DestroyedTexture<A>>>,
}
struct ActiveSubmission<A: HalApi> {
    last_resources: ResourceMaps<A>,
    mapped: Vec<Arc<Buffer<A>>>,
    encoders: Vec<EncoderInFlight<A>>,
}

struct LifetimeTracker<A: HalApi> {
    mapped: Vec<Arc<Buffer<A>>>,
    future_suspected_buffers: Vec<Arc<Buffer<A>>>,
    future_suspected_textures: Vec<Arc<Texture<A>>>,
    suspected_resources: ResourceMaps<A>,
    active: Vec<ActiveSubmission<A>>,
    ready_to_map: Vec<Arc<Buffer<A>>>,
}

@fmease fmease added I-compiletime Issue: Problems and improvements with respect to compile times. A-synthetic-impls Area: Synthetic impls, used by rustdoc to document auto traits and traits with blanket impls and removed I-hang Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc. labels May 26, 2024
@fmease
Member

fmease commented May 26, 2024

Ah, it happens during blanket impl synthesis!

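For readers unfamiliar with the term: rustdoc synthesizes blanket impls by checking, for every documented type, whether blanket impls such as the ones in the MCVE apply to it. A minimal, hypothetical illustration of why that check involves projections:

// rustdoc documents SomeStruct<A> and must decide whether the blanket impl
// `impl<T: Send> Marker for T` applies to it. That means proving
// `SomeStruct<A>: Send`, which requires normalizing the projections
// `A::First` and `A::Second` and checking each of them for `Send`. In the
// MCVE this happens for many structs, several blanket impls, and long
// chains of associated types at once.
pub trait Api {
    type First: Send;
    type Second: Send;
}

pub trait Marker {}
impl<T: Send> Marker for T {}

pub struct SomeStruct<A: Api> {
    first: A::First,
    second: A::Second,
}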

@fmease fmease added S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue and removed E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example labels May 26, 2024
@fmease fmease changed the title cargo doc hangs using a full CPU rustdoc: synthetic blanket impls: hang / very long compiletimes with complex mutually recursive structs involving many projections May 26, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 3, 2024
…t, r=<try>

rustdoc: use the next solver for blanket impl synthesis

Furthermore use an `ObligationCtxt` instead of operating on an `InferCtxt` directly and don't drop the obligations generated by the type relating (fixes a FIXME).

Originally I just wanted to submit the infcx→ocx change. However, that regressed `tests/rustdoc-ui/ice-blanket-impl-52873.rs` (pass→overflow error) because with `ocx.select_where_possible` we would no longer suppress / defatalize (canonical) overflow errors on the last obligation we register. CC rust-lang#54199 which introduced that special case. Obviously in the next solver overflows are non-fatal incl. `ice-blanket-impl-52873.rs`. Hence the switch now.

Note that I wanted to hold off on switching to the new solver since it makes rust-lang#114891 go from long I-compiletime to I-compilemem + I-hang + eventual death by the OOM killer. So I don't know maybe we should block this PR on me finding a rustc reproducer for the rustdoc issue rust-lang#114891 (this is on my agenda) to be able to properly investigate the underlying next-solver issue.
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 15, 2024
…t, r=<try>

rustdoc: use the next solver for blanket impl synthesis

Furthermore use an `ObligationCtxt` instead of operating on an `InferCtxt` directly and don't drop the obligations generated by the type relating.

Originally I just wanted to submit the infcx→ocx change. However, that regressed `tests/rustdoc-ui/ice-blanket-impl-52873.rs` (pass→overflow error) because with `ocx.select_where_possible` we would no longer suppress / defatalize (canonical) overflow errors. CC rust-lang#54199 which introduced that special case. Obviously in the next solver overflows are non-fatal incl. `ice-blanket-impl-52873.rs`. Hence the switch now.

~~Note that I wanted to hold off on switching to the new solver since it makes rust-lang#114891 go from long I-compiletime to I-compilemem + I-hang + eventual death by the OOM killer. So I don't know maybe we should block this PR on me finding a rustc reproducer for the rustdoc issue rust-lang#114891 (this is on my agenda) to be able to properly investigate the underlying next-solver issue.~~

Fixes rust-lang#114891.
@fmease
Member

fmease commented Aug 15, 2024

With #128828 merged, #125907 should now be effective. Indeed, locally documenting wgpu (18a8a07f6) under #125907 seems to work flawlessly now (wrt. compile times). Still waiting for the proper perf results though (#125907 (comment)). I invite you to try to locally verify my claim.

bors added a commit to rust-lang-ci/rust that referenced this issue Oct 16, 2024
…t, r=<try>

rustdoc: use the next solver for blanket impl synthesis

Presumably due to better caching behavior, switching from the old to the next solver *drastically* improves the compile time on certain inputs. See rust-lang#114891.
Fixes rust-lang#114891.

Furthermore use an `ObligationCtxt` instead of operating on an `InferCtxt` directly and don't drop the obligations generated by the type relating.

For context, originally I just wanted to submit the infcx→ocx change. However, that regressed `tests/rustdoc-ui/ice-blanket-impl-52873.rs` (pass→overflow error) because with `ocx.select_where_possible` we would no longer suppress / defatalize (canonical) overflow errors. CC rust-lang#54199 which introduced that special case. Obviously in the next solver overflows are non-fatal incl. `ice-blanket-impl-52873.rs`. Hence the switch now.

https://github.com/rust-lang/rust/labels/S-blocked on perf improvements for the next solver.
@fmease fmease added A-trait-system Area: Trait system and removed A-trait-system Area: Trait system labels Dec 21, 2024