
habana_main rebase #71

Merged on Jul 2, 2024 (537 commits)
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Jun 13, 2024

  1. [Hardware][Intel] Optimize CPU backend and add more performance tips (vllm-project#4971)

    Co-authored-by: Jianan Gu <jianan.gu@intel.com>
    bigPYJ1151 and jianan-gu authored Jun 13, 2024
    80aa7e9
  2. a65634d
  3. 03dccc8
  4. 3987347
  5. [Doc] Update LLaVA docs (vllm-project#5437)

    Co-authored-by: Roger Wang <ywang@roblox.com>
    DarkLight1337 and ywang96 authored Jun 13, 2024
    0ce7b95
  6. [Kernel] Factor out epilogues from cutlass kernels (vllm-project#5391)

    Co-authored-by: Michael Goin <michael@neuralmagic.com>
    Co-authored-by: youkaichao <youkaichao@gmail.com>
    Co-authored-by: zifeitong <zifei.tong@parasail.io>
    Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
    5 people authored Jun 13, 2024
    85657b5
  7. [MISC] Remove FP8 warning (vllm-project#5472)

    Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
    comaniac and pcmoritz authored Jun 13, 2024
    30299a4
  8. a8fda4f
  9. 6b0511a
  10. 1696efe
  11. 33e3b37
  12. e38042d
  13. 50eed24
  14. cd9c0d6

Commits on Jun 14, 2024

  1. 55d6361
  2. 0f0d8bc
  3. [CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label (vllm-project#5073)

    Co-authored-by: simon-mo <simon.mo@hey.com>
    KuntaiDu and simon-mo authored Jun 14, 2024
    319ad7f
  4. d47af2b
  5. 703475f
  6. d74674b
  7. [ Misc ] Rs/compressed tensors cleanup (vllm-project#5432)

    Co-authored-by: mgoin <michael@neuralmagic.com>
    Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
    3 people authored Jun 14, 2024
    1598568
  8. 348616a
  9. 48f589e
  10. 77490c6
  11. d1c3d7d
  12. cdab68d
  13. 6e2527a
  14. [Bugfix] Enable loading FP8 checkpoints for gpt_bigcode models (vllm-project#5460)

    Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
    tdoublep authored Jun 14, 2024
    e2afb03
  15. 28c145e
  16. f5bb85b

Commits on Jun 15, 2024

  1. bd7efe9
  2. [Core][Bugfix]: fix prefix caching for blockv2 (vllm-project#5364)

    Signed-off-by: Lei Wen <wenlei03@qiyi.com>
    Co-authored-by: Lei Wen <wenlei03@qiyi.com>
    leiwen83 and wenlei03 authored Jun 15, 2024
    1b8a0d7
  3. 0e9164b
  4. 81fbb36
  5. [misc] Do not allow to use lora with chunked prefill. (vllm-project#5538)

    Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
    rkooo567 and DarkLight1337 authored Jun 15, 2024
    e691918
  6. d919ecc
  7. 1c0afa1
  8. 3ce2c05

Commits on Jun 16, 2024

  1. f31c1f9
  2. 4a67690
  3. f07d513

Commits on Jun 17, 2024

  1. 845a3f2
  2. e2b85cf
  3. 9333fb8
  4. Correct alignment in the seq_len diagram. (vllm-project#5592)

    Co-authored-by: Liqian Chen <liqian.chen@deeplang.ai>
    CharlesRiggins and Liqian Chen authored Jun 17, 2024
    9e74d9d
  5. 890d8d9
  6. 1f12122
  7. [Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (vllm-project#3814)

    Co-authored-by: Jiang Li <jiang1.li@intel.com>
    Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
    Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
    4 people authored Jun 17, 2024
    728c4c8
  8. ab66536
  9. [CI] the readability of benchmarking and prepare for dashboard (vllm-project#5571)

    [CI] Improve the readability of performance benchmarking results and prepare for upcoming performance dashboard (vllm-project#5571)
    KuntaiDu authored Jun 17, 2024
    9e4e6fe
  10. 1b44aaf
  11. e441bad
  12. a3e8a05
  13. 26e1188
    Browse the repository at this point in the history

Commits on Jun 18, 2024

  1. [Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (vllm-project#5131)

    sroy745 authored Jun 18, 2024
    fa9e385
  2. daef218
  3. [Kernel] Add punica dimensions for Granite 13b (vllm-project#5559)

    Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
    joerunde authored Jun 18, 2024
    5002175
  4. 8eadcf0
  5. 32c86e4
  6. 114d727
  7. [bugfix][distributed] improve p2p capability test (vllm-project#5612)

    [bugfix][distributed] do not error if two processes do not agree on p2p capability (vllm-project#5612)
    youkaichao authored Jun 18, 2024
    db5ec52
  8. f0cc0e6
  9. 4ad7b53
  10. [ci] Deprecate original CI template (vllm-project#5624)

    Signed-off-by: kevin <kevin@anyscale.com>
    khluu authored Jun 18, 2024
    13db436
  11. [Misc] Add OpenTelemetry support (vllm-project#4687)

    This PR adds basic support for OpenTelemetry distributed tracing.
    It includes changes to enable tracing functionality and improve monitoring capabilities.

    I've also added a markdown with print-screens to guide users how to use this feature. You can find it here
    ronensc authored Jun 18, 2024
    7879f24
  12. 95db455
  13. [ci] Setup Release pipeline and build release wheels with cache (vllm-project#5610)

    Signed-off-by: kevin <kevin@anyscale.com>
    khluu authored Jun 18, 2024
    19091ef
  14. 07feecd
  15. [Bugfix] Fix for inconsistent behaviour related to sampling and repetition penalties (vllm-project#5639)

    Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
    tdoublep authored Jun 18, 2024
    8a17338
  16. 2bd231a
  17. b23ce92

Commits on Jun 19, 2024

  1. 6820724
  2. 59a1eb5
  3. [Bugfix] Added test for sampling repetition penalty bug. (vllm-project#5659)

    Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
    tdoublep authored Jun 19, 2024
    e5150f2
  4. f758aed
  5. 3eea748
  6. da971ec
  7. 7d46c8d
  8. d871453
  9. e9c2732
  10. [ci] Add A100 queue into AWS CI template (vllm-project#5648)

    Signed-off-by: kevin <kevin@anyscale.com>
    khluu authored Jun 19, 2024
    3ee5c4b
  11. afed90a
  12. d571ca0
  13. 7868750
  14. [Doc] Update docker references (vllm-project#5614)

    Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
    rafvasq authored Jun 19, 2024
    e83db9e
  15. [Misc] Add per channel support for static activation quantization; update w8a8 schemes to share base classes (vllm-project#5650)

    dsikka authored Jun 19, 2024
    4a30d7e
  16. [ci] Limit num gpus if specified for A100 (vllm-project#5694)

    Signed-off-by: kevin <kevin@anyscale.com>
    khluu authored Jun 19, 2024
    949e49a
    Browse the repository at this point in the history

Commits on Jun 20, 2024

  1. 3730a1c
  2. 1b2eaac
  3. [Kernel] Update Cutlass int8 kernel configs for SM90 (vllm-project#5514)

    Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
    varun-sundar-rabindranath and Varun Sundar Rabindranath authored Jun 20, 2024
    111af1f
  4. ad137cd
  5. [Kernel] Update Cutlass int8 kernel configs for SM80 (vllm-project#5275)

    Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
    varun-sundar-rabindranath and Varun Sundar Rabindranath authored Jun 20, 2024
    a7dcc62
  6. 3f3b6b2
  7. 8065a7e
    Browse the repository at this point in the history

Commits on Jun 21, 2024

  1. 6c5b7af
  2. [Model] MLPSpeculator speculative decoding support (vllm-project#4947)

    Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

    Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
    Co-authored-by: Nick Hill <nickhill@us.ibm.com>
    Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com>
    4 people authored Jun 21, 2024
    b12518d
  3. 1f56742
  4. c35e4a3
  5. [Bugfix] Add fully sharded layer for QKVParallelLinearWithLora (vllm-project#5665)

    Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
    jeejeelee and Yard1 authored Jun 21, 2024
    67005a0
  6. [Core][Distributed] add shm broadcast (vllm-project#5399)

    Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
    youkaichao and comaniac authored Jun 21, 2024
    d9a252b
  7. bd620b0
  8. 5b15bde
  9. f1e72cc
  10. 7187507
  11. f5dda63
    Browse the repository at this point in the history

Commits on Jun 22, 2024

  1. cf90ae0
  2. [Model] Support Qwen-VL and Qwen-VL-Chat models with text-only inputs (vllm-project#5710)

    Co-authored-by: Roger Wang <ywang@roblox.com>
    DamonFool and ywang96 authored Jun 22, 2024
    9c62db0
  3. ff9ddbc
  4. 0cbc1d2
  5. 8c00f9c
  6. 832ea88
    Browse the repository at this point in the history

Commits on Jun 23, 2024

  1. [BugFix] [Kernel] Add Cutlass2x fallback kernels (vllm-project#5744)

    Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
    varun-sundar-rabindranath and Varun Sundar Rabindranath authored Jun 23, 2024
    6c916ac
  2. 5d4d905
    Browse the repository at this point in the history

Commits on Jun 24, 2024

  1. edd5fe5
  2. c246212
  3. a2899d5
  4. fc6d4b4
  5. 126c607
  6. e72dc6c
  7. 1744cc9
  8. ba991d5
    Browse the repository at this point in the history

Commits on Jun 25, 2024

  1. [ci] Remove aws template (vllm-project#5757)

    Signed-off-by: kevin <kevin@anyscale.com>
    khluu authored Jun 25, 2024
    e9de9dd
  2. f23871e
  3. 2ce5d66
  4. d12bff7
  5. 43ff60b
  6. add WA for model loader

    kzawora-intel committed Jun 25, 2024
    efce3c4
  7. c1e7589
  8. tensor parallel fixes

    kzawora-intel committed Jun 25, 2024
    58bd037
  9. 1d6409b
  10. 7b99314
  11. 952b7c4
  12. 67882db
  13. cf04c81
  14. worker_use_ray fix

    kzawora-intel committed Jun 25, 2024
    2b850fe
  15. c18ebfd
  16. d9b34ba
  17. dd248f7
  18. bc34937
  19. dd793d1
  20. f178e56
    Browse the repository at this point in the history

Commits on Jun 26, 2024

  1. [CI/Build] Add E2E tests for MLPSpeculator (vllm-project#5791)

    Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
    tdoublep authored Jun 26, 2024
    c2a8ac7
  2. 8207972
  3. [Core] Refactor Worker and ModelRunner to consolidate control plane communication (vllm-project#5408)

    Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
    Signed-off-by: Stephanie <swang@anyscale.com>
    Co-authored-by: Stephanie <swang@anyscale.com>
    stephanie-wang and Stephanie authored Jun 26, 2024
    dda4811
  4. 3aa7b6c
  5. 515080a
  6. 6806998
  7. 3439c5a
  8. 6984c02
  9. [Kernel] Adding bias epilogue support for cutlass_scaled_mm (vllm-project#5560)

    Co-authored-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
    Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
    3 people authored Jun 26, 2024
    5bfd1bb
  10. c54269d
  11. cbc53b6
  12. f5c8628
  13. 38a1674
  14. 294104c
    Browse the repository at this point in the history

Commits on Jun 27, 2024

  1. Configuration menu
    Copy the full SHA
    b9e8425 View commit details
    Browse the repository at this point in the history
  2. [BugFix] Fix cuda graph for MLPSpeculator (vllm-project#5875)

    Co-authored-by: Abhinav Goyal <abhinav.goyal@flipkart.com>
    njhill and abhigoyal1997 authored Jun 27, 2024
    Configuration menu
    Copy the full SHA
    2110557 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    6eabc6c View commit details
    Browse the repository at this point in the history
  4. [VLM][Bugfix] Make sure that multi_modal_kwargs is broadcasted prop…

    …erly (vllm-project#5880)
    
    Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
    xwjiang2010 authored Jun 27, 2024
    Configuration menu
    Copy the full SHA
    d12af20 View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    96354d6 View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    2061f0b View commit details
    Browse the repository at this point in the history
  7. Configuration menu
    Copy the full SHA
    e36df83 View commit details
    Browse the repository at this point in the history
  8. Configuration menu
    Copy the full SHA
    e9d32d0 View commit details
    Browse the repository at this point in the history
  9. add collective crash WA

    kzawora-intel committed Jun 27, 2024
    Configuration menu
    Copy the full SHA
    1fd06cc View commit details
    Browse the repository at this point in the history
  10. Configuration menu
    Copy the full SHA
    940f525 View commit details
    Browse the repository at this point in the history
  11. Configuration menu
    Copy the full SHA
    98cf2ed View commit details
    Browse the repository at this point in the history
  12. Configuration menu
    Copy the full SHA
    3fd02bd View commit details
    Browse the repository at this point in the history
  13. Configuration menu
    Copy the full SHA
    691e29e View commit details
    Browse the repository at this point in the history
  14. Configuration menu
    Copy the full SHA
    365791f View commit details
    Browse the repository at this point in the history
  15. Configuration menu
    Copy the full SHA
    736ed38 View commit details
    Browse the repository at this point in the history
  16. Configuration menu
    Copy the full SHA
    79c92c7 View commit details
    Browse the repository at this point in the history
  17. Configuration menu
    Copy the full SHA
    64e8d2a View commit details
    Browse the repository at this point in the history
  18. c3dde36

Commits on Jun 28, 2024

  1. f136da1
  2. [VLM][BugFix] Make sure that multi_modal_kwargs can broadcast properly with ring buffer. (vllm-project#5905)
    
    Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
    Co-authored-by: Roger Wang <ywang@roblox.com>
    xwjiang2010 and ywang96 authored Jun 28, 2024
    74d55c0
  3. 0d0e3a4
  4. [Core] Registry for processing model inputs (vllm-project#5214)

    Co-authored-by: ywang96 <ywang@roblox.com>
    DarkLight1337 and ywang96 authored Jun 28, 2024
    5cbe8d1
  5. 5932634
  6. 57f09a4
  7. [Bugfix] Better error message for MLPSpeculator when `num_speculative_tokens` is set too high (vllm-project#5894)
    
    Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
    tdoublep authored Jun 28, 2024
    ec1ad00
  8. 3b752a6
  9. [Distributed] Make it clear that % should not be in tensor dict keys. (vllm-project#5927)
    
    Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
    xwjiang2010 authored Jun 28, 2024
    b90d8cd
  10. b2c6202
  11. 6a2d659
  12. [ Misc ] Remove fp8_shard_indexer from Col/Row Parallel Linear (Simplify Weight Loading) (vllm-project#5928)
    
    Co-authored-by: Robert Shaw <rshaw@neuralmagic>
    robertgshaw2-neuralmagic and Robert Shaw authored Jun 28, 2024
    b185230
  13. [ Bugfix ] Enabling Loading Models With Fused QKV/MLP on Disk with FP8 (vllm-project#5921)
    
    Co-authored-by: Robert Shaw <rshaw@neuralmagic>
    robertgshaw2-neuralmagic and Robert Shaw authored Jun 28, 2024
    2cd402e
  14. Support Deepseek-V2 (vllm-project#4650)

    Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
    zwd003 and pcmoritz authored Jun 28, 2024
    be0b3af
  15. 4bf35ed
  16. 5d2a1a9
  17. [Bugfix] Fix Engine Failing After Invalid Request - AsyncEngineDeadError (vllm-project#5963)
    
    Co-authored-by: Robert Shaw <rshaw@neuralmagic>
    robertgshaw2-neuralmagic and Robert Shaw authored Jun 28, 2024
    6a62cb8
  18. [Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (vllm-project#4628)
    
    Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>, bong-furiosa <bongwon.jang@furiosa.ai>
    LiuXiaoxuanPKU and bong-furiosa authored Jun 28, 2024
    7041de4

Commits on Jun 29, 2024

  1. 54814fd
  2. 7f83f40
  3. c4bca74
  4. [Misc] Extend vLLM Metrics logging API (vllm-project#5925)

    Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
    SolitaryThinker and Yard1 authored Jun 29, 2024
    906a19c
  5. [Kernel] Add punica dimensions for Granite 3b and 8b (vllm-project#5930)

    Signed-off-by: Joe Runde <joe@joerun.de>
    joerunde authored Jun 29, 2024
    ba49944
  6. 580353d
  7. [Misc] Update Phi-3-Vision Example (vllm-project#5981)

    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
    ywang96 and DarkLight1337 authored Jun 29, 2024
    329df38
  8. 51e971d
  9. 7c01f70
  10. f7dac83
  11. [ CI/Build ] Added E2E Test For Compressed Tensors (vllm-project#5839)

    Co-authored-by: Michael Goin <michael@neuralmagic.com>
    Co-authored-by: Robert Shaw <rshaw@neuralmagic>
    3 people authored Jun 29, 2024
    8dbfcd3
  12. 99397da
  13. [ CI/Build ] LM Eval Harness Based CI Testing (vllm-project#5838)

    Co-authored-by: Robert Shaw <rshaw@neuralmagic>
    robertgshaw2-neuralmagic and Robert Shaw authored Jun 29, 2024
    75aa144
  14. 9def106

Commits on Jun 30, 2024

  1. bcc6a09
  2. cff6a1f
  3. 9d47f64
  4. [ci][distributed] fix device count call

    [ci][distributed] fix some cuda init that makes it necessary to use spawn (vllm-project#5991)
    youkaichao authored Jun 30, 2024
    2be6955
  5. [Frontend]: Support base64 embedding (vllm-project#5935)

    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
    llmpros and DarkLight1337 authored Jun 30, 2024
    c6c240a
  6. [Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. (vllm-project#5909)
    
    Co-authored-by: sang <sangcho@anyscale.com>
    rkooo567 and sang authored Jun 30, 2024
    f5e73c9
  7. [ CI ] Temporarily Disable Large LM-Eval Tests (vllm-project#6005)

    Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic>
    robertgshaw2-neuralmagic and rshaw@neuralmagic.com authored Jun 30, 2024
    deacb7e
  8. 7836fdc
  9. [ Misc ] Refactor w8a8 to use process_weights_after_load (Simplify Weight Loading) (vllm-project#5940)
    
    Co-authored-by: Robert Shaw <rshaw@neuralmagic>
    robertgshaw2-neuralmagic and Robert Shaw authored Jun 30, 2024
    af9ad46

Commits on Jul 1, 2024

  1. 614aa51
  2. 80ca1e6
  3. 7076c89
  4. Revert test changes

    kzawora-intel committed Jul 1, 2024
    a3ac366
  5. cleanup

    kzawora-intel committed Jul 1, 2024
    85af27e
  6. llm engine cleanup

    kzawora-intel committed Jul 1, 2024
    f856a85
  7. utils.py cleanup

    kzawora-intel committed Jul 1, 2024
    b1f8b71
  8. custom ops refactor

    kzawora-intel committed Jul 1, 2024
    fb74454
  9. move xops to ops

    kzawora-intel committed Jul 1, 2024
    0e63941
  10. 463a8e6
  11. 0141d57
  12. 52fa486
  13. whitespace fix

    kzawora-intel committed Jul 1, 2024
    a21fe62
  14. aaf5446
  15. Fix hpugraph hashing

    kzawora-intel committed Jul 1, 2024
    1ec95c4
  16. 2394c41
  17. fix prompt bucketing:

    kzawora-intel committed Jul 1, 2024
    98fb698
  18. d76084c
  19. 4050d64
  20. bb60326
  21. [doc][misc] further lower visibility of simple api server (vllm-project#6041)
    
    Co-authored-by: Simon Mo <simon.mo@hey.com>
    youkaichao and simon-mo authored Jul 1, 2024
    8893130
  22. dec6fc6
  23. 12a5995
  24. 83bdcb6
  25. 8e0817c
  26. c4059ea
  27. c87ebc3
  28. e373853
  29. [Model] Changes to MLPSpeculator to support tie_weights and input_scale (vllm-project#5965)
    
    Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
    Co-authored-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
    tdoublep and JRosenkranz authored Jul 1, 2024
    5460070

Commits on Jul 2, 2024

  1. 3476ed0
  2. 2c37540
  3. [VLM] Remove image_input_type from VLM config (vllm-project#5852)

    Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
    Co-authored-by: Roger Wang <ywang@roblox.com>
    3 people authored Jul 2, 2024
    98d6682
  4. c365082
  5. 31354e5
  6. aee6daf
  7. d99d986