Support int8 KVCache Quant in Vllm #1507

Closed
wants to merge 57 commits into from

Commits on Oct 30, 2023

  1. support kv cache quantization

    Lin Pengyun authored and aniz1905@gmail.com committed Oct 30, 2023
    ce271bc
  2. fix python code

    Lin Pengyun authored and aniz1905@gmail.com committed Oct 30, 2023
    f8b0b05
  3. merge and reformat

    Lin Pengyun authored and aniz1905@gmail.com committed Oct 30, 2023
    b1560db
  4. support generating kv quant parameters and evaluating kv quant models

    Lin Pengyun authored and aniz1905@gmail.com committed Oct 30, 2023
    5c672ec
  5. modify test functions

    Lin Pengyun authored and aniz1905@gmail.com committed Oct 30, 2023
    f8d6b99
  6. fix test code

    Lin Pengyun authored and aniz1905@gmail.com committed Oct 30, 2023
    f8427e3
  7. fix test attention

    aniz1905@gmail.com committed Oct 30, 2023
    df286fe
  8. modify attention kernel test using pytest

    Lin Pengyun authored and aniz1905@gmail.com committed Oct 30, 2023
    b2d9b8c
  9. fix quant parameter passing

    Lin Pengyun authored and aniz1905@gmail.com committed Oct 30, 2023
    c5a1a73
  10. code clean

    aniz1905@gmail.com committed Oct 30, 2023
    fbed95c
  11. code clean

    aniz1905@gmail.com committed Oct 30, 2023
    f396ed3

Commits on Nov 2, 2023

  1. ad8f950

Commits on Nov 3, 2023

  1. code format

    zhangpeng156 committed Nov 3, 2023
    2543722
  2. code format

    zhangpeng156 committed Nov 3, 2023
    4226683

Commits on Nov 15, 2023

  1. fix merge

    aniz1905@gmail.com committed Nov 15, 2023
    df15d44

Commits on Nov 20, 2023

  1. fix reshape_and_cache_quantized

    aniz1905@gmail.com committed Nov 20, 2023
    872d156

Commits on Nov 22, 2023

  1. tmp fix

    aniz1905@gmail.com committed Nov 22, 2023
    8c29013
  2. tmp fix2

    aniz1905@gmail.com committed Nov 22, 2023
    8b5278d

Commits on Nov 23, 2023

  1. update kv-quant kernels

    zhangying169 committed Nov 23, 2023
    d8a9d4a
  2. add kv-quant kernel tests

    zhangying169 committed Nov 23, 2023
    0b06f96
  3. support kv-quant

    zhangying169 committed Nov 23, 2023
    734dcc6

Commits on Nov 24, 2023

  1. code format

    zhangpeng156 committed Nov 24, 2023
    31c4083
  2. fix work bugs

    zhangying169 committed Nov 24, 2023
    16bccc4

Commits on Nov 27, 2023

  1. fix unit test

    zhangpeng156 committed Nov 27, 2023
    dd527fc

Commits on Nov 29, 2023

  1. fix unit test

    aniz1905@gmail.com committed Nov 29, 2023
    104fb9b

Commits on Dec 5, 2023

  1. fix kv-quant args

    zhangying169 committed Dec 5, 2023
    580566c

Commits on Dec 18, 2023

  1. fix attention params

    zhangying169 committed Dec 18, 2023
    88ba3c0

Commits on Jan 16, 2024

  1. Merge tag 'v0.2.7' into kv_quant_v0.2.7

    zhangying169 committed Jan 16, 2024
    e2ff5a6
  2. format code

    zhangying169 committed Jan 16, 2024
    3065a32
  3. add .buildkite

    zhangying169 committed Jan 16, 2024
    a896eb3

Commits on Feb 4, 2024

  1. merge with remote branch 'vllm/main'

    zhangying169 committed Feb 4, 2024
    4072871

Commits on Feb 5, 2024

  1. Merge branch 'kv_quant_merge' into kv_quant

    zhangying169 committed Feb 5, 2024
    c0d3895
  2. Merge pull request vllm-project#13 in wm_ai/project_v from tmp to kv_quant - <merge-MERGE #PR-13 ~merge with remote branch 'vllm/main'>

    zhangpeng156 committed Feb 5, 2024
    f670d3c
  3. 666549d
  4. fix compile issue

    zhangying169 committed Feb 5, 2024
    16bb483
  5. fix unit test issue

    zhangying169 committed Feb 5, 2024
    ca1fcb3

Commits on Feb 7, 2024

  1. fix issues

    zhangying169 committed Feb 7, 2024
    33f9d53
  2. 594ec3f
  3. fix benchmarks for kv cache int8

    zhangying169 committed Feb 7, 2024
    c37770b
  4. 815eda7
  5. fix supporting kv cache int8 for specified models

    zhangying169 committed Feb 7, 2024
    14ec0ca
  6. add int8_kv_cache.rst

    zhangying169 committed Feb 7, 2024
    2ff0e20

Commits on Feb 8, 2024

  1. code format

    zhangpeng156 committed Feb 8, 2024
    5744c38
  2. code format

    zhangpeng156 committed Feb 8, 2024
    cf7d939

Commits on Feb 19, 2024

  1. code format

    zhangpeng156 committed Feb 19, 2024
    d79a96e
  2. code format

    zhangpeng156 committed Feb 19, 2024
    9a2c2c6
  3. modify int8 kv cache doc

    zhangying169 committed Feb 19, 2024
    b1d4ce3

Commits on Mar 25, 2024

  1. fix conflicts

    zhangpeng156 committed Mar 25, 2024
    74013b7
  2. fix conflicts

    zhangpeng156 committed Mar 25, 2024
    128cbae

Commits on Mar 26, 2024

  1. fix conflicts

    zhangpeng156 committed Mar 26, 2024
    e24d431
  2. fix rocm compile

    zhangpeng156 committed Mar 26, 2024
    2f38a1c
  3. code format

    zhangpeng156 committed Mar 26, 2024
    74d706e
  4. fix rocm compile

    zhangpeng156 committed Mar 26, 2024
    a999930
  5. fix param passing

    zhangpeng156 committed Mar 26, 2024
    98ef941
  6. fix param passing

    zhangpeng156 committed Mar 26, 2024
    95f8cc7
  7. add int8_kv_cache.rst to toctree

    zhangpeng156 committed Mar 26, 2024
    02c949a
  8. relax int8 kv quant tolerance

    zhangying169 committed Mar 26, 2024
    f9fed66