Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Queue write_buffer issue with Vulkan backend on NVidia #1323

Open
simast opened this issue Apr 6, 2021 · 15 comments
Open

Queue write_buffer issue with Vulkan backend on NVidia #1323

simast opened this issue Apr 6, 2021 · 15 comments
Labels
api: vulkan Issues with Vulkan external: driver-bug A driver is causing the bug, though we may still want to work around it

Comments

@simast
Copy link

simast commented Apr 6, 2021

After upgrading wgpu from 0.7 to latest master I am seeing really strange issue with Queue::write_buffer() calls. This issue happens only when Vulkan backend is used - cannot reproduce with DX12 and Metal backends.

Some background how I render a scene.

  1. When renderer starts I pre-allocate a large buffer using Device::create_buffer(). I do this to not allocate memory during subsequent render calls.
  2. When changes are detected I update vertex instance data in this buffer using Queue::write_buffer() calls.
  3. I then render vertices using instance drawing with RenderPass::draw_indexed().

The strange issue is that when I write, for example, 5 items with write_buffer() - only 4 are actually written to the buffer. So one item is lost somehow.. Here are some screenshots:

Debugger active just before write_buffer() call showing 5 items:

debugger-output

And afterwards RenderDoc view for the same buffer showing only 4 items.

renderdoc-output

Since this started only after upgrading wgpu and only while Vulkan backend is used - I don't think the bug is in my code. There must be some issue in wgpu or any dependencies?

@simast simast changed the title Queue write_buffer issue with Vulcan backend Queue write_buffer issue with Vulkan backend Apr 6, 2021
@kvark
Copy link
Member

kvark commented Apr 6, 2021

Thank you for filing!
Please provide an API trace, and I'll be happy to investigate quickly.

@simast
Copy link
Author

simast commented Apr 6, 2021

trace.zip

Attached! In the trace above the "Scene Underline Instance Buffer" is missing one item just as described in the original issue.

@kvark
Copy link
Member

kvark commented Apr 7, 2021

I ❤️ that menu :)

@kvark
Copy link
Member

kvark commented Apr 7, 2021

Replaying this trace, I'm seeing 40 bytes updated, all the 5 instances are there:
wgpu-vulkan-update
@simast could you try to replay it yourself, using wgpu's player? I wonder if it's also working for your player.

@simast
Copy link
Author

simast commented Apr 7, 2021

Yep, I am seeing the same issue in the wgpu player: https://i.imgur.com/c8YAJqH.png

Note the missing underline and another issue with missing blue cell in the logo.

So I wonder, could this be a driver issue if you can't reproduce? I am on Windows 10 with latest NVIDIA Geforce drivers on a GTX 1070 hardware.

@kvark
Copy link
Member

kvark commented Apr 7, 2021

I'll check a few more machines. Looks like a driver issue, unless we are missing a validation error there.
Please check if you have Vulkan validation layers installed.

@simast
Copy link
Author

simast commented Apr 7, 2021

I do not see anything suspicious with Vulkan validation layers enabled. Will try later today to revert NVIDIA drivers to previous version and see if that helps.

@kvark
Copy link
Member

kvark commented Apr 7, 2021

RenderDoc 1.3 refuses to run this for some reason :/.

@simast
Copy link
Author

simast commented Apr 12, 2021

Investigated this further and found a 100% workaround, but it's really really strange.

The issue was related to the number of instances I was allocating the buffer for (my single instance buffer is size_of = 8 bytes). If I allocated a wgpu buffer that was not divisible by 16 - the issue was present and the last element was always dropped in write_queue call. If, however, I allocated the buffer to be divisible by 16 - the issue is fixed.

I am not sure where the requirement comes from and maybe this should be handled internally by wgpu? As mentioned it happens only with Vulkan backend.

RenderDoc 1.3 refuses to run this for some reason :/.

I use latest RenderDoc 1.13. But I am slightly confused - since you posted RenderDoc screenshot before. Would it help if I provided a new trace?

@kvark
Copy link
Member

kvark commented Apr 12, 2021

I use latest RenderDoc 1.13. But I am slightly confused - since you posted RenderDoc screenshot before. Would it help if I provided a new trace?

It didn't work on a different machine for me.

Your discovery is very interesting! I wonder if it's a driver bug.
Can't see any size alignment requirements in Vulkan. Maybe I'm missing something. @Ralith - does anything come to mind?
TL;DR: the driver misbehaves if buffer size is not aligned to 16 bytes.

I think it would be no problem for us to round up all the buffer sizes in wgpu to 16 bytes. Just wondering if it's because we are missing something in the spec, or this is a genuine driver issue.

@kvark
Copy link
Member

kvark commented Apr 12, 2021

Also, I tested on GTX 1050 / Windows, which is very close to your configuration, and wasn't able to see it. Driver version 27.21.14.5256 :(
Edit: turns out on this platform I get Intel GPU selected even with HighPerformance criteria, so I'm not actually testing NVidia there just yet.
I wonder if NV has channels to report driver issues?

@simast
Copy link
Author

simast commented Apr 12, 2021

Just FYI my driver version is reported as: 27.21.14.6589 (NVIDIA 465.89) / Win10 64. This should be the latest driver available as of this comment.
I am hesitant to revert it since that does not look like an easy process from what I have seen. Will wait for a couple of days maybe NVIDIA will release a new version this week... If not will try to report the issue.

@kvark
Copy link
Member

kvark commented Apr 12, 2021

I see it now on my GTX 1050, looking!

@kvark
Copy link
Member

kvark commented Apr 12, 2021

The data is uploaded correctly, but vkCmdFillBuffer, called with destOffset=40, appears to overwrite bytes 32..40 as well.

@kvark kvark transferred this issue from gfx-rs/wgpu-rs Apr 13, 2021
@kvark kvark changed the title Queue write_buffer issue with Vulkan backend Queue write_buffer issue with Vulkan backend on NVidia Apr 13, 2021
@kvark kvark added the external: driver-bug A driver is causing the bug, though we may still want to work around it label Apr 13, 2021
@kvark
Copy link
Member

kvark commented Apr 13, 2021

Reported to NVidia level 2 support team, ticket 210412-000661
Also added to our driver issue list

@kvark kvark added this to the WebGPU MVP milestone May 21, 2021
@cwfitzgerald cwfitzgerald added the api: vulkan Issues with Vulkan label Jun 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: vulkan Issues with Vulkan external: driver-bug A driver is causing the bug, though we may still want to work around it
Projects
None yet
Development

No branches or pull requests

3 participants