
Conversation

@ggerganov (Member)

ref #16372 (comment)

The mask that we use in the FA tests is always filled with finite (non-INF) values, while in practice it often contains -INF blocks due to causal masking or padding. Extend the tests to add -INF values so we exercise any logic that relies on detecting such blocks in the mask.
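The kind of extension described above could be sketched as follows. This is an illustrative stand-in, not the actual test-backend-ops code: the function name `add_inf_blocks`, the block geometry, and the ~10% coverage factor are assumptions chosen to mirror the `n_inf_blocks` expression quoted later in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <random>
#include <vector>

// Hypothetical sketch: scatter -INF blocks into an otherwise finite mask,
// mimicking what causal masking or padding produces in practice.
// The block size (blck0 x blck1) and 10% coverage factor are illustrative.
static void add_inf_blocks(std::vector<float> & mask, int ne0, int ne1,
                           int blck0, int blck1, unsigned seed) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    std::mt19937 rng(seed);
    const int n_inf_blocks = 0.1*(ne0*ne1)/(blck0*blck1); // ~10% of the mask
    for (int b = 0; b < n_inf_blocks; ++b) {
        const int i0 = (int)(rng() % ne0); // block origin, clipped at edges
        const int i1 = (int)(rng() % ne1);
        for (int j = i1; j < std::min(i1 + blck1, ne1); ++j) {
            for (int i = i0; i < std::min(i0 + blck0, ne0); ++i) {
                mask[j*ne0 + i] = neg_inf;
            }
        }
    }
}
```

With this scheme most of the mask stays finite, but backends that special-case -INF regions now get exercised on every test run.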

@ggerganov ggerganov requested a review from slaren as a code owner October 2, 2025 05:44
@github-actions github-actions bot added the testing Everything test related label Oct 2, 2025
@Green-Sky (Collaborator)

I forgot to mention this, but it is possible that CUDA fails this test.

@ggerganov (Member, Author)

@Green-Sky Do you mean you tried it locally and it failed?

@Green-Sky (Collaborator)

I just tried it locally and it works. I will have to go and look at the sd.cpp+chroma+fa bug again, so I actually know what fails there.

@Green-Sky (Collaborator) commented Oct 2, 2025

Ok, did some more debugging. I am not sure if the issue really is with ggml, but flash attention just seems to die when we use the mask in the DiT for chroma.

The mask is:

  • 0 in [0;19] (in my example)
  • -inf in [20;511]
  • and then padded with 0 again for [512;2815]

attention_ext L_q:2816 L_k:2816 n_head:24 C:3072 d_head:128 N:1
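The mask layout described above can be written out directly. This is a sketch of one mask row only, using the boundaries from the example ([0;19] attended, [20;511] masked, [512;2815] zero padding); the function name is hypothetical and not part of sd.cpp or ggml.

```cpp
#include <limits>
#include <vector>

// Sketch of one row of the chroma DiT mask described above: a short
// attended prefix, an -INF region, and zero padding up to L_k = 2816.
static std::vector<float> make_chroma_row_mask() {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    std::vector<float> mask(2816, 0.0f); // [0;19] and [512;2815] stay 0
    for (int i = 20; i <= 511; ++i) {
        mask[i] = neg_inf;               // [20;511] masked out
    }
    return mask;
}
```

Note that this mask is finite at both ends with a contiguous -INF block in the middle, which is exactly the shape the extended tests now cover.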

@ggerganov (Member, Author)

Does it show any error? What do you mean by "die"?

@Green-Sky (Collaborator)

> Does it show any error? What do you mean by "die"?

Ah, sorry for that.
By "die", I mean silently fail. Pulling data from a backend is kind of a pain, but the resulting image is always black, so I assume it is pulled to ±INF somewhere, or maybe zero.

@ggerganov ggerganov requested a review from jeffbolznv October 6, 2025 13:04
@jeffbolznv (Collaborator)

I'm seeing some failures in the vulkan path with the current change. I'll look into it.

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
@jeffbolznv (Collaborator)

I'm seeing two classes of failures with vulkan. In the coopmat2 path, the backend will generate some negative infinity results, e.g.:

[FLASH_ATTN_EXT] inf mismatch: Vulkan0=-340282346638528859811704183484516925440.000000 CPU=0.019682   FLASH_ATTN_EXT(hsk=128,hsv=128,nh=4,nr23=[16,1],kv=512,nb=1,mask=1,sinks=0,max_bias=8.000000,logit_softcap=10.000000,prec=def,type_KV=q8_0,permute=[0,1,2,3]): FAIL

I can work around this by clamping in the shader. But more concerning are the cases where the CPU backend generates NaN:

[FLASH_ATTN_EXT] NaN at index 18432 (Vulkan0=nan CPU=-nan(ind))   FLASH_ATTN_EXT(hsk=80,hsv=80,nh=4,nr23=[4,1],kv=512,nb=35,mask=1,sinks=0,max_bias=0.000000,logit_softcap=0.000000,prec=f32,type_KV=q8_0,permute=[0,2,1,3]): FAIL

It's easy to trigger this - just increase the frequency of the -inf blocks by 100x:

-    const int n_inf_blocks = 0.1*(ne0*ne1*ne2*ne3)/(blck0*blck1);
+    const int n_inf_blocks = 10*(ne0*ne1*ne2*ne3)/(blck0*blck1);

I imagine this may be happening when a whole row has -inf. I assume this will reproduce with other backends too, but I haven't verified it. It's not clear to me what the correct result is in this case.

@ggerganov (Member, Author) commented Oct 6, 2025

> I imagine this may be happening when a whole row has -inf. I assume this will reproduce with other backends too, but I haven't verified it. It's not clear to me what the correct result is in this case.

Yes, I have this fixed for CPU and Metal in: fdedda0

I think the correct thing to do is return 0 in this case.
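The failure mode and the fix can be illustrated with a scalar softmax over a single attention row. This is a minimal sketch, not the actual ggml kernel: when every score in the row is -INF, the running max stays -INF and the sum of exponentials is 0, so a naive implementation computes 0/0 and emits NaN; returning 0 for such rows is the behavior chosen here for the CPU and Metal backends.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Sketch: softmax over one attention row. If the whole row is masked
// (-INF), the max stays -INF and the denominator is 0, so a naive 0/0
// division produces NaN. Returning all zeros instead avoids the NaN.
static std::vector<float> softmax_row(const std::vector<float> & s) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    float smax = neg_inf;
    for (float v : s) smax = std::max(smax, v);
    std::vector<float> p(s.size(), 0.0f);
    if (smax == neg_inf) {
        return p; // fully masked row -> output 0, not NaN
    }
    float sum = 0.0f;
    for (size_t i = 0; i < s.size(); ++i) {
        p[i] = std::exp(s[i] - smax); // exp(-INF - smax) = 0 for masked slots
        sum  += p[i];
    }
    for (float & v : p) v /= sum;
    return p;
}
```

Partially masked rows are unaffected by the guard, since their max is finite and the masked slots simply contribute 0 to the sum.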

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Oct 6, 2025
@ggerganov (Member, Author)

With ad83f06, the CPU backend should no longer generate NaNs.

@jeffbolznv (Collaborator)

After #16447, the vulkan backend should pass the updated tests.
