-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustdoc: synthetic blanket impls: hang / very long compiletimes with complex mutually recursive structs involving many projections #114891
Comments
The change described in gfx-rs/wgpu#3626 (comment) contributing to this freeze is a bit too big to me. Is it possible for you to minimize the scope and the relevant part, so that people can focus on debugging withoug reading through all the changes? BTW, this looks like a |
For more contexts, this failed from at least 1.64 to today's nightly. |
Transferred to rust-lang/rust since this appears to be a rustc/rustdoc issue. In the logs, I see a stream of |
Note for reproduction, you may need to |
I can see how that would be helpful. Yes, it is a big change. Unfortunately, I won't be able to spend time reducing it further than I have. Just for others - in the comment @weihanglo linked to, I was able narrow down which commit introduces the problem:
|
Thank you for catching this. I've updated the "Steps" so that the given commit is actually there. |
yes, I can reproduce it in local, seems |
That seems to be a compiler bug rather than a rustdoc one. |
This might have been fixed magically since it was reported, 1.76 beta and nightly don't hang and complete in a reasonable timeframe (see gfx-rs/wgpu#4905) |
Forwarding these from @Imberflur from the downstream issue - with new reproduction steps.
The notable changes in efb35d4is added weak-ref backhandles which causes more types to be aware of each other circularly. In this case, I wouldn't be surprised if we're hitting some nasty edge case with just how many generics we have in the codebase and how mutually referential everything is. We do want to reduce the amount of generics in the codebase to speed up compilation, though some guidance on what might be most effective would be nice, if such advise exists. |
This bug affects every project using wgpu, which includes every single bevy project, for which generating local docs is a fairly common thing to do. A fix would be greatly appreciated. |
@SludgePhD I think what the Rust team wants most is a reduced test case. |
It seems like this diff makes modified wgpu-core/src/binding_model.rs
@@ -250,17 +250,7 @@ pub struct BindGroupDynamicBindingData {
#[derive(Debug)]
pub struct BindGroup<A: HalApi> {
- pub(crate) raw: Snatchable<A::BindGroup>,
- pub(crate) device: Arc<Device<A>>,
- pub(crate) layout: Arc<BindGroupLayout<A>>,
- pub(crate) info: ResourceInfo<BindGroup<A>>,
- pub(crate) used: BindGroupStates<A>,
- pub(crate) used_buffer_ranges: Vec<BufferInitTrackerAction<A>>,
- pub(crate) used_texture_ranges: Vec<TextureInitTrackerAction<A>>,
- pub(crate) dynamic_binding_info: Vec<BindGroupDynamicBindingData>,
- /// Actual binding sizes for buffers that don't have `min_binding_size`
- /// specified in BGL. Listed in the order of iteration of `BGL.entries`.
- pub(crate) late_buffer_binding_sizes: Vec<wgt::BufferSize>,
+ marker: std::marker::PhantomData<A>,
}
impl<A: HalApi> Drop for BindGroup<A> { |
(I deleted all the function bodies from |
@weihanglo I think I have a reduced test case.
So on that branch, ad259b3e takes diff --git a/wgpu-core/src/device/resource.rs b/wgpu-core/src/device/resource.rs
index ebebe1fba..dad2e853c 100644
--- a/wgpu-core/src/device/resource.rs
+++ b/wgpu-core/src/device/resource.rs
@@ -87,59 +87,9 @@ use super::{
/// Important:
/// When locking pending_writes please check that trackers is not locked
/// trackers should be locked only when needed for the shortest time possible
-#[allow(dead_code)] // JIMB
+#[allow(dead_code)] // JIMB 0.76s
pub struct Device<A: HalApi> {
- raw: Option<A::Device>,
- pub(crate) adapter: Arc<Adapter<A>>,
- pub(crate) queue: OnceCell<Weak<Queue<A>>>,
- queue_to_drop: OnceCell<A::Queue>,
- pub(crate) zero_buffer: Option<A::Buffer>,
- pub(crate) info: ResourceInfo<Device<A>>,
-
- pub(crate) command_allocator: command::CommandAllocator<A>,
- //Note: The submission index here corresponds to the last submission that is done.
- pub(crate) active_submission_index: AtomicU64, //SubmissionIndex,
- // NOTE: if both are needed, the `snatchable_lock` must be consistently acquired before the
- // `fence` lock to avoid deadlocks.
- pub(crate) fence: RwLock<Option<A::Fence>>,
- pub(crate) snatchable_lock: SnatchLock,
-
- /// Is this device valid? Valid is closely associated with "lose the device",
- /// which can be triggered by various methods, including at the end of device
- /// destroy, and by any GPU errors that cause us to no longer trust the state
- /// of the device. Ideally we would like to fold valid into the storage of
- /// the device itself (for example as an Error enum), but unfortunately we
- /// need to continue to be able to retrieve the device in poll_devices to
- /// determine if it can be dropped. If our internal accesses of devices were
- /// done through ref-counted references and external accesses checked for
- /// Error enums, we wouldn't need this. For now, we need it. All the call
- /// sites where we check it are areas that should be revisited if we start
- /// using ref-counted references for internal access.
- pub(crate) valid: AtomicBool,
-
- /// All live resources allocated with this [`Device`].
- ///
- /// Has to be locked temporarily only (locked last)
- /// and never before pending_writes
- pub(crate) trackers: Mutex<Tracker<A>>,
- pub(crate) tracker_indices: TrackerIndexAllocators,
- // Life tracker should be locked right after the device and before anything else.
- life_tracker: Mutex<LifetimeTracker<A>>,
- /// Temporary storage for resource management functions. Cleared at the end
- /// of every call (unless an error occurs).
- pub(crate) temp_suspected: Mutex<Option<ResourceMaps<A>>>,
- /// Pool of bind group layouts, allowing deduplication.
- pub(crate) bgl_pool: ResourcePool<bgl::EntryMap, BindGroupLayout<A>>,
- pub(crate) alignments: hal::Alignments,
- pub(crate) limits: wgt::Limits,
- pub(crate) features: wgt::Features,
- pub(crate) downlevel: wgt::DownlevelCapabilities,
- pub(crate) instance_flags: wgt::InstanceFlags,
- pub(crate) pending_writes: Mutex<Option<PendingWrites<A>>>,
- pub(crate) deferred_destroy: Mutex<Vec<DeferredDestroy<A>>>,
- #[cfg(feature = "trace")]
- pub(crate) trace: Mutex<Option<trace::Trace>>,
- pub(crate) usage_scopes: UsageScopePool<A>,
+ marker: std::marker::PhantomData<A>,
}
#[allow(dead_code)] // JIMB Does this help make this bug more actionable? |
More detail, in case it's helpful: That source tree has a few structs marked with I've gone through the entire |
What the structs marked It may be relevant that there are reference cycles among these structs:
If you pick a struct that participates in one of these cycles, and delete only those members that cause that participation, that has as much effect on So it seems likely that the cycles are relevant. |
I decided to just go for a top-down minimization starting at https://github.com/jimblandy/wgpu.git rev=18a8a07f6. Instead of using a pre-built artifact of rustc, I used a local stage1 build of master rustc (with debug info) out of habit which I've come to regret as the MCVE I've constructed only works with the aforementioned setup which I found out way too late in the process. I don't know the exact parameters yet or why there's a discrepancy in the first place. Anyway, below is a relatively small reproducer that consistently takes 33~34s to get documented with I'll profile this later today or in the coming days and will continue my investigations. use std::sync::{Mutex, RwLock};
use std::{
collections::HashMap,
sync::{Arc, Weak},
};
pub trait Api: Clone {
type Instance: Instance<A = Self>;
type Surface: Surface<A = Self>;
type Adapter: Adapter<A = Self>;
type Device: DeviceTr<A = Self>;
type Queue: QueueTr<A = Self>;
type CommandEncoder: CommandEncoder<A = Self>;
type CommandBuffer: WasmNotSendSync;
type Buffer: WasmNotSendSync + 'static;
type Texture: WasmNotSendSync + 'static;
type SurfaceTexture: WasmNotSendSync + std::borrow::Borrow<Self::Texture>;
type TextureView: WasmNotSendSync;
type Sampler: WasmNotSendSync;
type QuerySet: WasmNotSendSync;
type Fence: WasmNotSendSync;
type BindGroupLayout: WasmNotSendSync;
type BindGroup: WasmNotSendSync;
type PipelineLayout: WasmNotSendSync;
type ShaderModule: WasmNotSendSync;
type RenderPipeline: WasmNotSendSync;
type ComputePipeline: WasmNotSendSync;
type AccelerationStructure: WasmNotSendSync + 'static;
}
pub trait Instance: Sized + WasmNotSendSync {
type A: Api;
}
pub trait Surface: WasmNotSendSync {
type A: Api;
}
pub trait Adapter: WasmNotSendSync {
type A: Api;
}
pub trait DeviceTr: WasmNotSendSync {
type A: Api;
}
pub trait QueueTr: WasmNotSendSync {
type A: Api;
}
pub trait CommandEncoder: WasmNotSendSync {
type A: Api;
}
pub trait WasmNotSendSync: WasmNotSend + WasmNotSync {}
impl<T: WasmNotSend + WasmNotSync> WasmNotSendSync for T {}
pub trait WasmNotSend: Send {}
impl<T: Send> WasmNotSend for T {}
pub trait WasmNotSync: Sync {}
impl<T: Sync> WasmNotSync for T {}
trait HalApi: Api + 'static {}
struct BindGroup<A: HalApi> {
raw: A::BindGroup,
device: Arc<Device<A>>,
layout: Arc<A>,
info: ResourceInfo<BindGroup<A>>,
used: BindGroupStates<A>,
used_buffer_ranges: Vec<A>,
used_texture_ranges: Vec<A>,
}
struct BindGroupStates<A: HalApi> {
buffers: BufferBindGroupState<A>,
textures: TextureBindGroupState<A>,
views: TextureView<A>,
samplers: Sampler<A>,
}
type UsageScopePool<A> = Mutex<Vec<(BufferUsageScope<A>, TextureUsageScope<A>)>>;
struct Tracker<A: HalApi> {
buffers: BufferTracker<A>,
textures: TextureTracker<A>,
views: TextureView<A>,
samplers: Sampler<A>,
bind_groups: crate::BindGroup<A>,
compute_pipelines: A,
render_pipelines: A,
bundles: A,
query_sets: QuerySet<A>,
}
struct BufferBindGroupState<A: HalApi> {
buffers: Mutex<Vec<Arc<Buffer<A>>>>,
}
struct BufferUsageScope<A: HalApi> {
metadata: Buffer<A>,
}
struct BufferTracker<A: HalApi> {
metadata: Buffer<A>,
}
struct TextureBindGroupState<A: HalApi> {
textures: Mutex<Vec<A>>,
}
struct TextureUsageScope<A: HalApi>(A);
struct TextureTracker<A: HalApi> {
_phantom: std::marker::PhantomData<A>,
}
struct ResourceInfo<T> {
marker: std::marker::PhantomData<T>,
}
struct Buffer<A: HalApi> {
raw: A::Buffer,
device: Arc<Device<A>>,
info: ResourceInfo<Buffer<A>>,
bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}
struct DestroyedBuffer<A: HalApi> {
raw: Option<A::Buffer>,
device: Arc<Device<A>>,
bind_groups: Vec<Weak<BindGroup<A>>>,
}
struct StagingBuffer<A: HalApi> {
raw: Mutex<Option<A::Buffer>>,
device: Arc<Device<A>>,
info: ResourceInfo<StagingBuffer<A>>,
}
enum TextureInner<A: HalApi> {
Native(A::Texture),
Surface(Option<A::SurfaceTexture>),
}
enum TextureClearMode<A: HalApi> {
RenderPass(Vec<Option<A::TextureView>>),
Surface(Option<A::TextureView>),
}
struct Texture<A: HalApi> {
inner: TextureInner<A>,
device: Arc<Device<A>>,
info: ResourceInfo<Texture<A>>,
clear_mode: RwLock<TextureClearMode<A>>,
views: Mutex<Vec<Weak<TextureView<A>>>>,
bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
}
struct DestroyedTexture<A: HalApi> {
raw: Option<A::Texture>,
views: Vec<Weak<TextureView<A>>>,
bind_groups: Vec<Weak<BindGroup<A>>>,
device: Arc<Device<A>>,
}
struct TextureView<A: HalApi> {
raw: A::TextureView,
parent: Arc<Texture<A>>,
device: Arc<Device<A>>,
info: ResourceInfo<TextureView<A>>,
}
struct Sampler<A: HalApi> {
raw: Option<A::Sampler>,
device: Arc<Device<A>>,
info: ResourceInfo<Self>,
}
struct QuerySet<A: HalApi> {
raw: Option<A::QuerySet>,
device: Arc<Device<A>>,
info: ResourceInfo<Self>,
}
struct Device<A: HalApi> {
raw: Option<A::Device>,
adapter: Arc<A>,
queue: Weak<Queue<A>>,
queue_to_drop: A::Queue,
zero_buffer: Option<A::Buffer>,
info: ResourceInfo<Device<A>>,
command_allocator: A,
fence: RwLock<Option<A::Fence>>,
trackers: Mutex<Tracker<A>>,
life_tracker: Mutex<LifetimeTracker<A>>,
temp_suspected: Mutex<Option<ResourceMaps<A>>>,
pending_writes: Mutex<Option<PendingWrites<A>>>,
usage_scopes: UsageScopePool<A>,
}
struct Queue<A: HalApi> {
device: Option<Arc<Device<A>>>,
raw: Option<A::Queue>,
info: ResourceInfo<Queue<A>>,
}
struct EncoderInFlight<A: HalApi> {
marker: std::marker::PhantomData<A>,
}
struct PendingWrites<A: HalApi> {
command_encoder: A::CommandEncoder,
dst_buffers: HashMap<i32, Arc<Buffer<A>>>,
dst_textures: HashMap<i32, Arc<Texture<A>>>,
executing_command_buffers: Vec<A::CommandBuffer>,
}
struct ResourceMaps<A: HalApi> {
buffers: HashMap<i32, Arc<Buffer<A>>>,
staging_buffers: HashMap<i32, Arc<StagingBuffer<A>>>,
textures: HashMap<i32, Arc<Texture<A>>>,
texture_views: HashMap<i32, Arc<TextureView<A>>>,
samplers: HashMap<i32, Arc<Sampler<A>>>,
bind_groups: HashMap<i32, Arc<BindGroup<A>>>,
bind_group_layouts: HashMap<i32, Arc<A>>,
render_pipelines: HashMap<i32, Arc<A>>,
compute_pipelines: HashMap<i32, Arc<A>>,
pipeline_layouts: HashMap<i32, Arc<A>>,
render_bundles: HashMap<i32, Arc<A>>,
query_sets: HashMap<i32, Arc<QuerySet<A>>>,
destroyed_buffers: HashMap<i32, Arc<DestroyedBuffer<A>>>,
destroyed_textures: HashMap<i32, Arc<DestroyedTexture<A>>>,
}
struct ActiveSubmission<A: HalApi> {
last_resources: ResourceMaps<A>,
mapped: Vec<Arc<Buffer<A>>>,
encoders: Vec<EncoderInFlight<A>>,
}
struct LifetimeTracker<A: HalApi> {
mapped: Vec<Arc<Buffer<A>>>,
future_suspected_buffers: Vec<Arc<Buffer<A>>>,
future_suspected_textures: Vec<Arc<Texture<A>>>,
suspected_resources: ResourceMaps<A>,
active: Vec<ActiveSubmission<A>>,
ready_to_map: Vec<Arc<Buffer<A>>>,
} |
…t, r=<try> rustdoc: use the next solver for blanket impl synthesis Furthermore use an `ObligationCtxt` instead of operating on an `InferCtxt` directly and don't drop the obligations generated by the type relating (fixes a FIXME). Originally I just wanted to submit the infcx→ocx change. However, that regressed `tests/rustdoc-ui/ice-blanket-impl-52873.rs` (pass→overflow error) because with `ocx.select_where_possible` we would no longer suppress / defatalize (canonical) overflow errors on the last obligation we register. CC rust-lang#54199 which introduced that special case. Obviously in the next solver overflows are non-fatal incl. `ice-blanket-impl-52873.rs`. Hence the switch now. Note that I wanted to hold off on switching to the new solver since it makes rust-lang#114891 go from long I-compiletime to I-compilemem + I-hang + eventual death by the OOM killer. So I don't know maybe we should block this PR on me finding a rustc reproducer for the rustdoc issue rust-lang#114891 (this is on my agenda) to be able to properly investigate the underlying next-solver issue.
…t, r=<try> rustdoc: use the next solver for blanket impl synthesis Furthermore use an `ObligationCtxt` instead of operating on an `InferCtxt` directly and don't drop the obligations generated by the type relating. Originally I just wanted to submit the infcx→ocx change. However, that regressed `tests/rustdoc-ui/ice-blanket-impl-52873.rs` (pass→overflow error) because with `ocx.select_where_possible` we would no longer suppress / defatalize (canonical) overflow errors. CC rust-lang#54199 which introduced that special case. Obviously in the next solver overflows are non-fatal incl. `ice-blanket-impl-52873.rs`. Hence the switch now. ~~Note that I wanted to hold off on switching to the new solver since it makes rust-lang#114891 go from long I-compiletime to I-compilemem + I-hang + eventual death by the OOM killer. So I don't know maybe we should block this PR on me finding a rustc reproducer for the rustdoc issue rust-lang#114891 (this is on my agenda) to be able to properly investigate the underlying next-solver issue.~~ Fixes rust-lang#114891.
With #128828 merged, #125907 should now be effective. Indeed, locally documenting |
…t, r=<try> rustdoc: use the next solver for blanket impl synthesis Presumably due to better caching behavior, switching from the old to the next solver *drastically* improves the compile time on certain inputs. See rust-lang#114891. Fixes rust-lang#114891. Furthermore use an `ObligationCtxt` instead of operating on an `InferCtxt` directly and don't drop the obligations generated by the type relating. For context, originally I just wanted to submit the infcx→ocx change. However, that regressed `tests/rustdoc-ui/ice-blanket-impl-52873.rs` (pass→overflow error) because with `ocx.select_where_possible` we would no longer suppress / defatalize (canonical) overflow errors. CC rust-lang#54199 which introduced that special case. Obviously in the next solver overflows are non-fatal incl. `ice-blanket-impl-52873.rs`. Hence the switch now. https://github.com/rust-lang/rust/labels/S-blocked on perf improvements for the next solver.
Problem
When I run
cargo doc
, it never terminates. Instead, it uses 100% of a CPU indefinitely.Steps
That final command never seems to exit.
Notes
I'm running Fedora 38 and cargo 1.73.0-nightly (7c3904d6c 2023-08-14).
The text was updated successfully, but these errors were encountered: