-
Notifications
You must be signed in to change notification settings - Fork 13.2k
tests : add -INF blocks to the KQ mask in the FA tests #16380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I forgot to mention this, but it is possible that cuda fails this test. |
@Green-Sky Do you mean you tried it locally and it failed? |
I just tried it locally and it works. I will have to go and look at the sd.cpp+chroma+fa bug again, so I actually know what fails there. |
Ok, did some more debugging. I am not sure if the issue really is with ggml, but flash attention just seems to die when we use the mask in the DiT for chroma. The mask is:
|
Does it show any error? What do you mean by "die"? |
Ah, sorry for that. |
I'm seeing some failures in the vulkan path with the current change. I'll look into it. |
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
I'm seeing two classes of failures with vulkan. In the coopmat2 path, the backend will generate some negative infinity results, e.g.:
I can workaround this by clamping in the shader. But more concerning is cases where the CPU backend generates NaN:
It's easy to trigger this - just increase the frequency of the -inf blocks by 100x:
I imagine this may be happening when a whole row has -inf. I assume this will reprodue with other backends too, but I haven't verified it. It's not clear to me what the correct result is in this case. |
Yes, I have this fixed for CPU and Metal in: fdedda0 I think the correct thing to do is return 0 in this case. |
With ad83f06, the CPU backend should no longer generate NaNs. |
After #16447, the vulkan backend should pass the updated tests. |
ref #16372 (comment)
The mask that we use in the FA tests is always filled with non-INF values, while in practice it often contains -INF blocks due to causal masking or due to padding. Extend the tests to add -INF values so we can exercise any logic that relies on detecting such blocks in the mask.