Raytracing Tests Failing on AMD GPU #6727
Comments
One stack:
|
Was this one crashing with an OOM or an access violation? |
Access violation - it crashed in amdvlk64.dll, but I missed those in the stack |
Never mind, I don't think that would help. |
I wonder if it's trying to read from the vertex buffer (we don't set it and aren't required to). Can you attach a debugger to a program that does this and see which address it's trying to dereference? It would likely be 0. |
Since AMD's drivers are open source, is it possible to get debug symbols out of them? Knowing where the problem is occurring helps figure out what could be occurring. |
Not the Windows ones :) |
Oh, that's annoying. Is it possible to (using a debugger) see at what address the access violation is occurring? |
Yeah, I'll look when I've a minute |
Ah alright, it's first doing
r15 is |
At |
We should probably not expose the ray tracing feature on AMD until this stuff is figured out. |
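For illustration, hiding a feature on one vendor could look roughly like the sketch below. It is self-contained and does not use wgpu's real types: the feature bits and the `exposed_features` helper are hypothetical, and only the PCI vendor IDs (0x1002 for AMD, 0x10DE for NVIDIA) are real values.

```rust
/// PCI vendor ID assigned to AMD.
const AMD_VENDOR_ID: u32 = 0x1002;
/// PCI vendor ID assigned to NVIDIA (only used for comparison here).
const NVIDIA_VENDOR_ID: u32 = 0x10DE;

// Hypothetical feature bits for illustration; these are NOT wgpu's `Features`.
const FEATURE_RAY_TRACING: u64 = 1 << 0;
const FEATURE_TIMESTAMP_QUERY: u64 = 1 << 1;

/// Returns the feature set to advertise, masking out ray tracing on AMD
/// adapters while leaving every other vendor untouched.
fn exposed_features(vendor_id: u32, supported: u64) -> u64 {
    if vendor_id == AMD_VENDOR_ID {
        // Clear only the ray tracing bit; keep everything else.
        supported & !FEATURE_RAY_TRACING
    } else {
        supported
    }
}
```

A real change would more likely key off the driver as well as the vendor, since the crash here was seen in the Windows driver (amdvlk64.dll).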
Just want to check, does this also occur on the examples, if so which ones? That might be able to narrow things down (e.g. most of the examples use an index buffer). |
Noticed that although we set index_type to none in Edit: I can't really see why this would be causing it, but it's the only bug I can currently find... |
Nope did nothing |
OK. I wasn't expecting it to work, but since there was a bug in the same function I was interested. |
Do the examples work? |
Weirdly, yes - all 5 do. |
Then we need to find what the examples are doing differently. |
There are two things I can find (quickly)
The geo flags seem the most likely (because we have already found 3 index bugs). Could you try changing them? |
ACCELERATION_STRUCTURE_BUILD_WITH_INDEX was marked as failing, but did that one fail in the same place? |
Yup, failed in the same place:
|
Ok... that means that my first ideas don't work. The only other thing I have been able to find is that we use |
I can try it, but those are only for internal validation; they don't make it down to Vulkan (I don't think). |
No, they don't seem to. Don't bother. |
Do the examples work when run as part of the test? I'm really unsure what other differences there are. |
It seems so |
My only other idea is that because only one downlevel capability is being set it's triggering some odd code path that AMD doesn't like (probably unlikely). |
I am now wondering whether this call is even the root cause - using Vulkan's API dump it seems that the actual call to this function is given the same parameters (excluding pointer differences) with |
In ray_cube_compute, could you try changing wgpu::DownlevelCapabilities {
flags: wgpu::DownlevelFlags::COMPUTE_SHADERS,
.. wgpu::DownlevelCapabilities::default()
} and tell me if it still works? If it doesn't, could you add downlevel flags until it does work and tell me which flag made it work? |
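The "add flags until it works" procedure can be sketched generically as below. This is a minimal, self-contained illustration, not wgpu code: `Outcome`, `first_flag_that_fixes`, the plain `u32` bitmask, and the `works` callback (standing in for "run the example with these flags and see whether it crashes") are all hypothetical.

```rust
/// Result of the search.
#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    /// The base flag set already works, so no extra flag is needed.
    AlreadyWorks,
    /// Adding this flag (on top of all earlier candidates) made it work.
    FixedBy(&'a str),
    /// It still fails with every candidate flag added.
    NeverWorks,
}

/// Starts from `base` and adds candidate flags one at a time, cumulatively,
/// returning the first flag whose addition makes `works` return true.
fn first_flag_that_fixes<'a>(
    base: u32,
    candidates: &[(&'a str, u32)],
    works: impl Fn(u32) -> bool,
) -> Outcome<'a> {
    let mut flags = base;
    if works(flags) {
        return Outcome::AlreadyWorks;
    }
    for &(name, bit) in candidates {
        flags |= bit;
        if works(flags) {
            return Outcome::FixedBy(name);
        }
    }
    Outcome::NeverWorks
}
```

Because flags are added cumulatively, the reported flag is the one that tipped the result; it may still depend on flags added before it.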
Made no difference |
Could you try swapping out the original ray_cube_compute's init function with this (I'm assuming that new
and see if that reaches the panic? (This is just the test I mentioned earlier turned into an example.) |
So I swapped in the init, then ran the test associated with the example. AMD successfully hit
|
How odd... I shall have to look into that. Did the other test I derived this from fail? If so, then that narrows our search. |
Yes,
|
Does it fail on AMD? |
It segfaults. |
OK, that narrows it down: it means that the error is not in the test itself. |
The NVIDIA issue is unrelated. I think it's because there is no way to tell if there is a transform buffer, and for some reason NVIDIA has a larger scratch size for that. You should open an issue for it, as you know more about it. |
The following tests are all failing with either a device OOM on bind group creation, or a segfault on both my AMD GPUs