
WIP: Flash Attention implementation (forward + backward) #1

Closed. Wants to merge 72 commits.

Commits on Jan 17, 2024

  1. f7bcfb0

Commits on Jan 18, 2024

  1. fix compilation (FSSRepo, e53de28)
  2. a1c004e

Commits on Jan 19, 2024

  1. fa7ebcc
  2. Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda (FSSRepo, 09db1a7)

Commits on Jan 20, 2024

  1. apply suggestions (FSSRepo, fded2e6)
  2. c3cdfff
  3. a9681fe

Commits on Jan 21, 2024

  1. 1173f49
  2. metal : f16 precision (ggerganov, 528da75)
  3. metal : reduce branches (ggerganov, 52ae085)
  4. b973258
  5. wip : 8 rows per simd group (ggerganov, 8cde449)
  6. wip : 4 rows per simd group (ggerganov, f31955f)
  7. a4b6341
  8. 77d08f3
  9. 17720fa

Commits on Jan 23, 2024

  1. Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda (FSSRepo, a689b02)
  2. 6374bc5

Commits on Jan 24, 2024

  1. 6416821
  2. use half2 instead half4 (FSSRepo, 972c2ad; see the sketch after this list)
  3. match to metal impl (FSSRepo, 0fc36d8)
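The "use half2 instead half4" change above concerns the CUDA kernel's fp16 vector width. The actual diff is not shown in this listing; the fragment below is only a minimal sketch of the general technique, packing fp16 values pairwise into CUDA's built-in __half2 type so that each arithmetic instruction works on two values at once. The kernel and variable names are illustrative, not taken from the patch.

```cuda
// Sketch only (assumed names, not the llama.cpp kernel): fp16 dot product
// using __half2, so every multiply handles a pair of values.
#include <cuda_fp16.h>

__global__ void dot_f16x2(const __half2 *x, const __half2 *y, float *out, int n2) {
    float acc = 0.0f;
    // grid-stride loop over the packed pairs
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n2; i += gridDim.x * blockDim.x) {
        __half2 p = __hmul2(x[i], y[i]);          // multiply both halves in one instruction
        acc += __low2float(p) + __high2float(p);  // unpack and accumulate in fp32
    }
    atomicAdd(out, acc);  // simple (not optimal) cross-thread reduction
}
```

The appeal of __half2 over a hand-rolled 4-wide struct is that, on GPUs with native fp16 arithmetic, the paired operations map onto hardware fp16x2 instructions and keep loads naturally aligned.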

Commits on Jan 25, 2024

  1. 1446a12
  2. d917746
  3. 432ad04
  4. metal : fix comment (ggerganov, 40ea8cd)
  5. Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda (FSSRepo, 78da338)
  6. f9ca5dc
  7. update implementation (FSSRepo, 6e7cb0e)
  8. 6fea843

Commits on Jan 27, 2024

  1. integrate tensor cores (FSSRepo, 0a481fe)
  2. Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda (FSSRepo, 7cea973)
  3. update impl (FSSRepo, 2455a8d)

Commits on Jan 28, 2024

  1. b3dd7d9
  2. metal : move output into local memory + optimize (ggerganov, 77f6976; see the sketch after this list)

     - the result from each simdgroup now stays in the registers
     - significantly reduced SRAM usage
     - more efficient skipping of -INF blocks
     - avoid simdgroup barrier in hot loop
     - add comments

  3. ecc466a
  4. metal : improve precision (ggerganov, 3a428a1)
  5. ggml : fix f16 mad (ggerganov, 8612864)
  6. 0ad44ba
  7. metal : minor (ggerganov, 134c81c)
  8. metal : support Q > 8 (ggerganov, 1db22d7)
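Commit 77f6976 above is a Metal-side optimization: each simdgroup keeps its partial output in registers instead of threadgroup (SRAM) memory, fully masked (-INF) blocks are skipped more cheaply, and a barrier is removed from the hot loop. The fragment below is a loose CUDA-flavored analogue of that idea, not the actual Metal kernel: the per-thread accumulator lives in a small local array (kept in registers by the compiler), masked-out rows are skipped, and global memory is written exactly once at the end. All names and the D_PER_THREAD split are assumptions for illustration.

```cuda
// Hypothetical analogue of "keep the output in registers" (commit 77f6976 is
// a Metal kernel; this only illustrates the same accumulation idea in CUDA).
#define D_PER_THREAD 8  // assumed: slice of the head dimension owned by one thread

__global__ void attn_accum_sketch(const float *v, const float *p,
                                  float *out, int kv_len, int head_dim) {
    float acc[D_PER_THREAD] = {0.0f};           // stays in registers, not shared memory
    const int d0 = threadIdx.x * D_PER_THREAD;  // this thread's slice of the head dim

    for (int j = 0; j < kv_len; ++j) {
        float w = p[j];                          // softmax weight for key/value row j
        if (w == 0.0f) continue;                 // cheap skip of fully-masked rows
        for (int d = 0; d < D_PER_THREAD; ++d) {
            acc[d] += w * v[(size_t)j * head_dim + d0 + d];
        }
    }

    // the output slice is written once, so no barrier is needed inside the loop
    for (int d = 0; d < D_PER_THREAD; ++d) {
        out[d0 + d] = acc[d];
    }
}
```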

Commits on Jan 29, 2024

  1. tests : add ATTN tests (ggerganov, 4794821)
  2. abeaf0d
  3. tests : more (ggerganov, c6c1132)
  4. 5fcb9c1
  5. fix compiler error (FSSRepo, a1d5a12)
  6. Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda (FSSRepo, 7980178)

Commits on Jan 30, 2024

  1. d073e4f
  2. tests : ifdef (ggerganov, 78df552)
  3. 3d03bcb
  4. 3b0f74b

Commits on Jan 31, 2024

  1. 2ddc9bb
  2. fix kernel (FSSRepo, b1479df)
  3. 8ad92dc
  4. fix naive implementation (FSSRepo, 0afe47f)
  5. Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda (FSSRepo, 3df0b8d)
  6. cuda: mask as fp16 (FSSRepo, fd878f7)

Commits on Feb 1, 2024

  1. 71b69aa
  2. 2c04bee
  3. 9a5c2a1
  4. ac26f27
  5. Merge pull request #3 from ggerganov/flash-attn-cuda (FSSRepo, 43f7156)

     cuda : fix flash_attn kernel to produce same results as CPU

  6. fix mask nullptr (FSSRepo, 9240a84)
  7. 8d7a606
  8. 19e0b8e
  9. cmake: remove unused changes (FSSRepo, cae985c)
  10. 53621e3

Commits on Feb 3, 2024

  1. 674d5ac
  2. Merge pull request #4 from Pints-App/jg/flash-attn-cuda (FSSRepo, 8b51ab4; see the sketch after this list)

     unroll 2 loops, int64_t -> int, 309 µs

  3. a1f9ffe
  4. ba7699d
  5. fix merge conflicts (FSSRepo, f659f57)
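The merged PR #4 above ("unroll 2 loops, int64_t -> int, 309 µs") is a micro-optimization pass: two inner loops are unrolled and 64-bit index arithmetic is narrowed to 32-bit, with 309 µs being the reported kernel time afterwards. The fragment below only illustrates that pattern with made-up names; it is not the actual diff.

```cuda
#include <cuda_fp16.h>

// Illustration of the two PR #4 changes applied to a generic reduction:
// an explicit unroll hint and int (instead of int64_t) loop indices.
__device__ float accum_scores(const half *kq, const half *mask, int ne /* was int64_t */) {
    float sum = 0.0f;
    #pragma unroll 2                 // unroll by 2, matching the commit description
    for (int i = 0; i < ne; ++i) {   // 32-bit index: cheaper address arithmetic
        sum += __half2float(kq[i]) + __half2float(mask[i]);
    }
    return sum;
}
```

Narrowing indices to int is only safe where the extent is known to fit in 32 bits, which is why this kind of change is usually confined to per-block loops rather than whole-tensor offsets.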