
[Pallas] Support Flash Attention backward kernels #6870

Merged: 4 commits into master on Apr 2, 2024

Conversation

alanwaketan (Collaborator):

Summary:
This change refactors custom_kernel.py to support all three new Pallas kernels involved in the Flash Attention backward calculation.

The refactoring includes:

  1. Adds support for static_argnums, which lets JAX tracing ignore certain positional arguments (see the sketch after this list).
  2. Separates the JAX tracing logic out so that the tracing can be done on its own.
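
For background, here is a minimal, purely illustrative sketch of what static_argnums does for JAX tracing (the attention_like function and its causal flag are invented for this example and are not code from this PR): positional arguments marked as static are baked into the trace as compile-time constants instead of being traced as tensor inputs.

import jax
import jax.numpy as jnp

# Illustrative only: `causal` selects a code path, so it should not be traced
# as a tensor input.
def attention_like(q, k, v, causal):
    scores = jnp.einsum("...qd,...kd->...qk", q, k)
    if causal:
        mask = jnp.tril(jnp.ones(scores.shape[-2:], dtype=bool))
        scores = jnp.where(mask, scores, -jnp.inf)
    return jax.nn.softmax(scores, axis=-1) @ v

q = k = v = jnp.ones((3, 2, 128, 4), dtype=jnp.float32)

# static_argnums=3 treats `causal` as a compile-time constant, so the lowered
# computation only has q, k, v as runtime inputs.
lowered = jax.jit(attention_like, static_argnums=3).lower(q, k, v, True)
print(lowered.as_text()[:200])

This mirrors item 1 above: arguments that only configure the kernel are excluded from tracing instead of being passed in as tensors.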

Test Plan:
PJRT_DEVICE=TPU python test/test_pallas.py

alanwaketan requested review from lsy323 and JackCaoG on Apr 2, 2024 at 05:50
alanwaketan self-assigned this on Apr 2, 2024
xm.mark_step()

# TODO: I don't really know how to test the value. Let's do the shape check for now.
self.assertEqual(grad_q.shape, (3, 2, 128, 4))
Collaborator:

If we do the fwd and then res.backward(), and check the grad on q, they should match?

Collaborator Author:

The softmax is done differently. I don't think there are any guarantees.

Collaborator:

The result should still be somewhat close, right? We can tune down the precision. If the result returned by this is dramatically different from the one computed using dot attention, that seems wrong.

Collaborator Author:

https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html

Softmax needs all of the elements to produce its result, but flash attention chunks the data into blocks and uses a technique called tiling to keep the softmax numerically stable. Since there is no full aggregation, I don't know how the tiled softmax could produce the same results as the regular one.

In JAX, I have to use atol=1e-01, rtol=1e-01 to do the comparisons...
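
For illustration of the comparison being discussed, here is a sketch (not the test that landed in this PR); it assumes the torch_xla.experimental.custom_kernel.flash_attention entry point is differentiable end-to-end and uses the relaxed tolerances mentioned above.

import torch
import torch_xla.core.xla_model as xm
from torch_xla.experimental.custom_kernel import flash_attention

def reference_attention(q, k, v):
    # Plain dot-product attention used as the numerical reference.
    scores = q @ k.transpose(-2, -1)
    return torch.softmax(scores, dim=-1) @ v

device = xm.xla_device()
q = torch.randn(3, 2, 128, 4, requires_grad=True, device=device)
k = torch.randn(3, 2, 128, 4, requires_grad=True, device=device)
v = torch.randn(3, 2, 128, 4, requires_grad=True, device=device)

# Reference path: ordinary attention, ordinary autograd.
reference_attention(q, k, v).sum().backward()
ref_grad_q = q.grad.detach().clone()
q.grad = None

# Pallas path: flash attention forward plus the new backward kernels
# (assumed here to be wired into autograd).
flash_attention(q, k, v).sum().backward()
xm.mark_step()

# Loose tolerances: the tiled softmax is not expected to be bit-identical
# to the reference implementation.
assert torch.allclose(q.grad.cpu(), ref_grad_q.cpu(), atol=1e-1, rtol=1e-1)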

@alanwaketan (Collaborator Author):

Thanks, Jack!

alanwaketan merged commit c54367c into master on Apr 2, 2024
18 checks passed