
DirectX12 backend exhibits different behaviour than other backends; scene will not render on some machines #1002

Closed
Masterchef365 opened this issue Oct 22, 2020 · 14 comments
Labels
external: upstream Issues happening in lower level APIs or platforms

Comments

@Masterchef365

Description
My application simply draws a grid of lines on the screen. On some Windows 10 machines using the DirectX12 backend, the grid disappears when the iced-wgpu backend is initialized before my custom rendering. This does not happen with DirectX11, or with Vulkan on Linux. On some machines the DirectX12/iced-wgpu combination does work and the grid displays; luckily, the issue seems to be reproducible in a Windows 10 VM.

Repro steps
See my repository here

Platform
Windows 10 build 19042.572
DirectX12

@kvark kvark added the type: bug Something isn't working label Oct 22, 2020
@kvark
Member

kvark commented Oct 22, 2020

Thank you for a great issue! Which real vendors/models suffer from this issue? I tested on an AMD Ryzen 3500U so far and was unable to reproduce.

@kvark
Member

kvark commented Oct 22, 2020

Same for Intel Iris 550. Going to run in a VM now...

@Masterchef365
Author

Masterchef365 commented Oct 22, 2020

My GTX 1080 doesn't seem to suffer from the issue; I'll check with the other people I had test my app.

Edit: Looks like it exhibited the issue on someone else's 1080, and they were also on Win10.

@kvark
Member

kvark commented Oct 23, 2020

So far, it looks like the problem occurs with the WARP device only (a software-ish implementation of D3D12 that's part of the SDK). I found gfx-rs/gfx#3432 while looking at memory in RenderDoc that is all zeroes, but fixing that doesn't address the bug.

@kvark
Member

kvark commented Oct 23, 2020

I narrowed it down a bit with this code:

if use_iced {
    //let _renderer = Renderer::new(Backend::new(&mut device, Settings::default()));
    let constant_layout =
        device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor {
            label: None,
            entries: &[wgpu::BindGroupLayoutEntry {
                binding: 0,
                visibility: wgpu::ShaderStage::VERTEX,
                ty: wgpu::BindingType::UniformBuffer {
                    dynamic: false,
                    min_binding_size: None,
                },
                count: None,
            }],
        });

    let constants_buffer = device.create_buffer(&wgpu::BufferDescriptor {
        label: None,
        size: 64,
        usage: wgpu::BufferUsage::UNIFORM,
        mapped_at_creation: false,
    });

    let _constants = device.create_bind_group(&wgpu::BindGroupDescriptor {
        label: None,
        layout: &constant_layout,
        entries: &[wgpu::BindGroupEntry {
            binding: 0,
            resource: wgpu::BindingResource::Buffer(constants_buffer.slice(..)),
        }],
    });
}

It would be great to see if the situation is any different on master (since this test no longer relies on iced).

@kvark
Member

kvark commented Oct 23, 2020

Looking at this case leads me nowhere. The stuff created in this snippet is getting successfully freed, and doesn't affect anything that happens per frame...
It would help if you could ask a person who can reproduce this on real hardware to do the following:

  1. print the adapter.get_info() that they get (sketched below)
  2. run from Visual Studio and get us the debug warnings/errors from the runtime. On the real adapters I tested, it doesn't complain about anything, and on WARP it doesn't either, for some reason.
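
A minimal sketch of item 1, assuming the wgpu 0.6-era API used in the snippet above (the blocking executor and setup here are my own choices, not necessarily the reproducer's):

use futures::executor::block_on;

fn main() {
    // Request a DX12 adapter the same way the repro app would.
    let instance = wgpu::Instance::new(wgpu::BackendBit::DX12);
    let adapter = block_on(instance.request_adapter(&wgpu::RequestAdapterOptions {
        power_preference: wgpu::PowerPreference::Default,
        compatible_surface: None,
    }))
    .expect("no DX12 adapter found");
    // AdapterInfo implements Debug: it prints the adapter name, vendor and
    // device IDs, device type, and backend.
    println!("{:?}", adapter.get_info());
}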

@kvark
Member

kvark commented Oct 23, 2020

@msiglreith I've been looking at this for longer than I hoped, trying to narrow down the bug. I believe I got to the bottom of it, but I'm not sure how to explain or fix it yet, so I'm asking for your opinion.

Here is what happens in a test app (a code sketch follows this list):

  1. A uniform buffer A is created, a descriptor set for it is created and filled with data.
  2. A uniform buffer B is created, similar to A, then a similar descriptor set is created and filled (with write_descriptor_sets).
  3. Rendering starts using A, every frame copying something into it and making a draw call. Nothing shows up on screen when on WARP, but it should and does on non-WARP, and no validation warnings are reported.
  4. After the first frame the descriptor set of B is freed, then B itself is freed. This probably doesn't matter.
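
A hedged sketch of that sequence in wgpu terms (the helper and names below are hypothetical, not the actual test app):

// Hypothetical helper: a 64-byte uniform buffer plus a bind group for it.
// Creating the bind group is what triggers write_descriptor_sets in
// gfx-backend-dx12.
fn make_uniform(
    device: &wgpu::Device,
    layout: &wgpu::BindGroupLayout,
) -> (wgpu::Buffer, wgpu::BindGroup) {
    let buffer = device.create_buffer(&wgpu::BufferDescriptor {
        label: None,
        size: 64,
        usage: wgpu::BufferUsage::UNIFORM | wgpu::BufferUsage::COPY_DST,
        mapped_at_creation: false,
    });
    let bind_group = device.create_bind_group(&wgpu::BindGroupDescriptor {
        label: None,
        layout,
        entries: &[wgpu::BindGroupEntry {
            binding: 0,
            resource: wgpu::BindingResource::Buffer(buffer.slice(..)),
        }],
    });
    (buffer, bind_group)
}

fn repro(device: &wgpu::Device, layout: &wgpu::BindGroupLayout) {
    // Steps 1 and 2: descriptors for both A and B get written.
    let (_buffer_a, _group_a) = make_uniform(device, layout);
    let (buffer_b, group_b) = make_uniform(device, layout);
    // Step 3 would happen here: copy into A and draw with its bind group
    // every frame.
    // Step 4: after the first frame, B's descriptor set is freed, then B.
    drop(group_b);
    drop(buffer_b);
}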

I narrowed the problem down to the descriptor update pool in our DX12 backend. It's a CPU-only descriptor heap for views (UAV/SRV/CBV). What write_descriptor_sets does is:

  1. it creates a view for each written descriptor (that needs a view) in this CPU-only heap
  2. it issues a CopyDescriptors from this heap into the descriptor set's shader-visible heap
  3. it instantly clears the CPU heap

What happens is that for both writes (of buffer A's descriptor and buffer B's descriptor), even though the target shader-visible descriptors are different and the source buffers are different, the intermediate CPU heap we use is the same, and we use the same index 0 of this heap for temporarily storing the CBV.

If I disable step (3), the clearing of buffer_desc_pool, then the problem goes away (!). It forces the temporary CBV to use a different handle for B's copy than for A's.

I was wondering if CopyDescriptors is asynchronous in any way, but no: nothing in the docs suggests that it is, if I'm reading them correctly, and adding a wait-for-idle after the copy doesn't help either.
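
To make the failure mode concrete, here is a hypothetical Rust model of that staging pool (a sketch, not the real gfx-backend-dx12 code; CpuHandle stands in for D3D12_CPU_DESCRIPTOR_HANDLE):

// Stand-in for a D3D12 CPU descriptor handle (an address into a heap).
#[derive(Clone, Copy)]
struct CpuHandle(usize);

// Simplified model of the CPU-only staging heap for views (UAV/SRV/CBV).
struct UpdatePool {
    slots: Vec<CpuHandle>, // slots in a non-shader-visible heap
    next: usize,           // allocation cursor
}

impl UpdatePool {
    // Step 1: the temporary view is created in the next free slot.
    fn alloc(&mut self) -> CpuHandle {
        let handle = self.slots[self.next];
        self.next += 1;
        handle
    }

    // Step 3: resetting the cursor means the very next
    // write_descriptor_sets reuses slot 0. If WARP's CopyDescriptors
    // (step 2) keeps a reference to the slot rather than copying its
    // contents, that immediate reuse clobbers the earlier write.
    fn clear(&mut self) {
        self.next = 0;
    }
}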

@kvark
Member

kvark commented Oct 23, 2020

A few more things I tried:

  • recreating the buffer A - works
  • sleep_ms(1000) after CopyDescriptors - doesn't work

@kvark kvark added the external: upstream Issues happening in lower level APIs or platforms label Oct 23, 2020
@msiglreith
Contributor

Hmm, I forced the WARP adapter and it seems to show correct results - do I need to set anything specific?

@kvark
Member

kvark commented Oct 23, 2020

All I needed to reproduce was specifying the "dx12 iced" parameters. I see that you are on dx12, but are you sure you had iced enabled as well? If so, that would be even weirder :/

@msiglreith
Contributor

Ah, no, I missed that. Now it's also blank..

@msiglreith
Contributor

Agreed, this looks like a bug in the WARP implementation to me. The memory regions of the non-shader-visible descriptors should be immediately reusable after CopyDescriptors. The implementation seems to keep only references.

@kvark
Member

kvark commented Oct 23, 2020

Thanks for looking into this, @msiglreith! Let's figure out a workaround; otherwise WARP becomes totally unusable (and we'd need to block it).

@kvark kvark removed the type: bug Something isn't working label Oct 23, 2020
bors bot added a commit to gfx-rs/gfx that referenced this issue Oct 24, 2020
3434: [0.6/dx12] Improve descriptor handling r=msiglreith a=kvark

Can be reviewed commit-by-commit. Actually gets us to release the descriptors.
It also fixes a mistake I made in one of the previous PRs that would panic in view creation, unnecessarily.

Also works around gfx-rs/wgpu#1002 in the following way:
  - we rotate the temporary descriptor heaps instead of resetting them every time
  - once we reach the head during rotation, we insert a new heap

That is not a guarantee that we'll never hit this WARP issue again, but there is a high chance most apps will not notice. If there is a better workaround, I'll be happy to scrap this one.

I think the old logic was plain wrong, too: the `update_pool_index` wasn't actually being used...

Co-authored-by: Dzmitry Malyshau <kvarkus@gmail.com>
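
The rotation strategy described in the PR above can be pictured with a hedged sketch (hypothetical types and names; the actual change lives in gfx-rs/gfx#3434):

// Instead of clearing one staging heap, keep a ring of heaps. On
// wrap-around, insert a fresh heap at the head rather than immediately
// reusing the oldest one; old heaps are still recycled eventually, which
// is why this mitigates the WARP issue rather than fixing it outright.
struct HeapRing<H> {
    heaps: Vec<H>,
    index: usize,
}

impl<H> HeapRing<H> {
    fn new(first: H) -> Self {
        HeapRing { heaps: vec![first], index: 0 }
    }

    fn rotate(&mut self, mut make_heap: impl FnMut() -> H) -> &mut H {
        self.index = (self.index + 1) % self.heaps.len();
        if self.index == 0 {
            // Reached the head: insert a new heap instead of reusing it.
            self.heaps.insert(0, make_heap());
        }
        &mut self.heaps[self.index]
    }
}
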
@kvark
Member

kvark commented Oct 24, 2020

Workaround landed in gfx-backend-dx12-0.6.10
