
llama : per-layer KV cache #4309

Merged: 15 commits merged into master from gg/per-layer-kv on Dec 7, 2023

Commits on Oct 3, 2023

  1. per-layer KV (e9bcf66)

    slaren committed Oct 3, 2023

  2. remove unnecessary copies (55f2f2f)

    slaren committed Oct 3, 2023

Commits on Oct 6, 2023

  1. Commit f4f9367

Commits on Dec 3, 2023

  1. Commit c294c78
  2. Commit 986b3da
  3. Commit f3dbfb9
  4. Commit 3d3e6bd
  5. Commit 1fa91a4
  6. Commit c44bc1e
  7. Commit c80b8a2
  8. Commit e262947
  9. Commit 66aaac9

Commits on Dec 6, 2023

  1. llama : support quantum K cache (#4312) (1a1a1c3)

    * llama : support quantum K cache (wip)
    
    * metal : add F32 -> Q8_0 copy kernel
    
    * cuda : add F32 -> Q8_0 copy kernel
    
    ggml-ci
    
    * cuda : use mmv kernel for quantum cache ops
    
    * llama : pass KV cache type through API
    
    * llama : fix build
    
    ggml-ci
    
    * metal : add F32 -> Q4_0 copy kernel
    
    * metal : add F32 -> Q4_1 copy kernel
    
    * cuda : wip
    
    * cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels
    
    * llama-bench : support type_k/type_v
    
    * metal : use mm kernel only for quantum KV cache
    
    * cuda : add comment
    
    * llama : remove memory_f16 and kv_f16 flags
    
    ---------
    
    Co-authored-by: slaren <slarengh@gmail.com>
    ggerganov and slaren authored Dec 6, 2023
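
    The commit above wires the cache type through the public context parameters ("llama : pass KV cache type through API"). As a rough illustration only, the minimal C sketch below shows how a caller might request a quantized K cache: the type_k/type_v field names are taken from the commit message, the model path is a placeholder, and exact names and defaults should be verified against llama.h at this revision.

        #include "llama.h"

        int main(void) {
            struct llama_model_params   mparams = llama_model_default_params();
            struct llama_context_params cparams = llama_context_default_params();

            // Ask for a Q8_0 K cache instead of the default F16; keep V as F16.
            // (Field names assumed from "llama-bench : support type_k/type_v".)
            cparams.type_k = GGML_TYPE_Q8_0;
            cparams.type_v = GGML_TYPE_F16;

            struct llama_model *model =
                llama_load_model_from_file("model.gguf", mparams); // placeholder path
            if (model == NULL) {
                return 1;
            }

            struct llama_context *ctx = llama_new_context_with_model(model, cparams);

            /* ... evaluate tokens as usual; K values are quantized on write,
               which is what the F32 -> Q8_0 / Q4_0 / Q4_1 copy kernels listed
               above enable on the Metal and CUDA backends ... */

            llama_free(ctx);
            llama_free_model(model);
            return 0;
        }

    The same pair of types is what the new llama-bench type_k/type_v support exercises, so different cache types can be compared without code changes.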

Commits on Dec 7, 2023

  1. Merge branch 'master' into gg/per-layer-kv (680a99e)

    ggml-ci

    ggerganov committed Dec 7, 2023

  2. Commit fc5f334