Length penalty draft #10

Draft: wants to merge 64 commits into base: release
e3180c6
Initial gRPC server and TGIS proto API mapping layer
njhill Feb 13, 2024
eda13e7
Setup Github Actions (#2)
joerunde Mar 12, 2024
1479404
Fix temperature default in sampling mode
njhill Mar 20, 2024
e7c9b2a
:bento: lift grpc_server changes
joerunde Mar 20, 2024
f9fad31
Linting, adjustments related to min_tokens updates
njhill Mar 20, 2024
3d602fc
:construction_worker: swap builds to ephemeral branch
joerunde Mar 21, 2024
080d568
[Bugfix] Fix ROCm support in CMakeLists.txt (#3534)
jamestwhedbee Mar 20, 2024
8424330
[1/n] Triton sampling kernel (#3186)
Yard1 Mar 20, 2024
6bacf66
[1/n][Chunked Prefill] Refactor input query shapes (#3236)
rkooo567 Mar 20, 2024
4534045
Migrate `logits` computation and gather to `model_runner` (#3233)
esmeetu Mar 20, 2024
e8157b7
[BugFix] Hot fix in setup.py for neuron build (#3537)
zhuohan123 Mar 21, 2024
ad8b07d
[PREFIX CACHING FOLLOW UP] OrderedDict-based evictor (#3431)
ElizaWszola Mar 21, 2024
9ec53cf
Fix 1D query issue from `_prune_hidden_states` (#3539)
rkooo567 Mar 21, 2024
216fe97
[🚀 Ready to be merged] Added support for Jais models (#3183)
grandiose-pizza Mar 21, 2024
1508b16
[Misc][Log] Add log for tokenizer length not equal to vocabulary size…
esmeetu Mar 21, 2024
443dcbe
[Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config (#3551)
WoosukKwon Mar 21, 2024
cd4af85
[BugFix] gemma loading after quantization or LoRA. (#3553)
taeminlee Mar 21, 2024
446cb1e
[Bugfix][Model] Fix Qwen2 (#3554)
esmeetu Mar 22, 2024
1c0a1bc
[Hardware][Neuron] Refactor neuron support (#3471)
zhuohan123 Mar 22, 2024
aceb4a7
[BugFix] Some fixes for custom allreduce kernels (#2760)
hanzhi713 Mar 22, 2024
8db0fa5
Dynamic scheduler delay to improve ITL performance (#3279)
tdoublep Mar 22, 2024
05b02af
Tokenization-related updates to grpc_server layer
njhill Mar 22, 2024
d75cdc0
Add flash attention to UBI docker build
njhill Mar 24, 2024
764eb47
Fix default min_new_tokens value
njhill Mar 24, 2024
b21a811
[Core] Improve detokenization performance for prefill (#3469)
Yard1 Mar 22, 2024
7c55f61
[Bugfix] use SoftLockFile instead of LockFile (#3578)
kota-iizuka Mar 23, 2024
eb3c6b1
[Misc] Fix BLOOM copyright notice (#3591)
WoosukKwon Mar 24, 2024
26c8395
[Misc] Bump transformers version (#3592)
ywang96 Mar 24, 2024
8db3274
[BugFix] Fix Falcon tied embeddings (#3590)
WoosukKwon Mar 24, 2024
2634734
[BugFix] 1D query fix for MoE models (#3597)
njhill Mar 24, 2024
fff679d
[CI] typo fix: is_hip --> is_hip() (#3595)
youkaichao Mar 24, 2024
8172159
[CI/Build] respect the common environment variable MAX_JOBS (#3600)
youkaichao Mar 25, 2024
01fe748
[CI/Build] fix flaky test (#3602)
youkaichao Mar 25, 2024
c04df90
[BugFix] tensor.get_device() -> tensor.device (#3604)
jikunshang Mar 25, 2024
7b1f301
[Bugfix] store lock file in tmp directory (#3578)" (#3599)
WoosukKwon Mar 25, 2024
42b703e
[Model] Add starcoder2 awq support (#3569)
shaonianyr Mar 25, 2024
59ecdc0
[Core] Refactor Attention Take 2 (#3462)
WoosukKwon Mar 25, 2024
9b1e0ac
[Bugfix] fix automatic prefix args and add log info (#3608)
gty111 Mar 25, 2024
5d87365
[CI] Try introducing isort. (#3495)
rkooo567 Mar 25, 2024
4a46f87
[Core] Adding token ranks along with logprobs (#3516)
SwapnilDreams100 Mar 25, 2024
e5c0825
feat: implement the min_tokens sampling parameter (#3124)
tjohnson31415 Mar 25, 2024
dabe1a1
[Bugfix] API stream returning two stops (#3450)
dylanwhawk Mar 25, 2024
92eefa2
hotfix isort on logprobs ranks pr (#3622)
simon-mo Mar 25, 2024
4626357
[Feature] Add vision language model support. (#3042)
xwjiang2010 Mar 25, 2024
73be357
Optimize `_get_ranks` in Sampler (#3623)
Yard1 Mar 25, 2024
d3e5cc3
[Misc] Include matched stop string/token in responses (#2976)
njhill Mar 26, 2024
ef46c7d
Enable more models to inference based on LoRA (#3382)
jeejeelee Mar 26, 2024
ff46bab
[Bugfix] Fix ipv6 address parsing bug (#3641)
liiliiliil Mar 26, 2024
bbb87a9
Squash 3466
joerunde Mar 26, 2024
58ba829
Squash 3512
joerunde Mar 26, 2024
abe950a
:construction_worker: build release branch
joerunde Mar 26, 2024
9ef5bb1
:construction_worker: build release branch
joerunde Mar 26, 2024
12eead8
:memo: Describe repo organization and processes (#6)
joerunde Mar 26, 2024
4ea041f
Squash 3645
joerunde Mar 26, 2024
040ac72
Fix incorrect arg validation message
njhill Mar 27, 2024
c7891e6
:sparkles: add enum for TGIS validation errors
joerunde Mar 27, 2024
4a6f7d6
:sparkles: port validation logic
joerunde Mar 27, 2024
db9c526
:recycle: use TGIS parameter validation
joerunde Mar 27, 2024
aafec14
:recycle: use tgis input validation
joerunde Mar 27, 2024
da36cde
:bug: fix validation error formatting
joerunde Mar 27, 2024
620c2b6
:bug: fixup length penalty and min token validation
joerunde Mar 27, 2024
74dde18
:bug: Update error messages for length and repetition penalties
joerunde Mar 28, 2024
2c27cc5
Merge branch 'request-validation' into length_penalty_draft
maxdebayser Mar 28, 2024
521f3fc
Simple implementation of length penalty with exponential decay
maxdebayser Mar 28, 2024
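The last commit names a length penalty with exponential decay, but its diff is not among the files shown here. As a rough illustration only, and not the code from this PR, the sketch below shows one common way such a penalty can work: once generation passes a start index, the end-of-sequence logit is boosted by a factor that grows exponentially with each further token. The function name, the parameter names (`start_index`, `decay_factor`) and the exact formula are assumptions made for this example.

```python
def apply_length_penalty(logits, eos_token_id, num_generated, start_index, decay_factor):
    """Return a copy of `logits` with the EOS logit boosted after `start_index` tokens.

    Hypothetical helper: the formula is an assumed form of exponential decay,
    not the implementation from this PR.
    """
    out = list(logits)
    if num_generated <= start_index:
        # The penalty has not kicked in yet, so the logits are unchanged.
        return out
    # The boost grows exponentially with the number of tokens past start_index.
    penalty = decay_factor ** (num_generated - start_index)
    eos = out[eos_token_id]
    # abs() makes the adjustment raise the EOS logit whether it is positive or negative.
    out[eos_token_id] = eos + abs(eos) * (penalty - 1.0)
    return out
```

With `decay_factor` greater than 1, the end-of-sequence token becomes steadily more likely after `start_index`, which nudges long generations toward stopping; a factor of exactly 1 leaves the logits untouched.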
18 changes: 18 additions & 0 deletions .buildkite/download-images.sh
@@ -0,0 +1,18 @@
#!/bin/bash

set -ex
set -o pipefail

(which wget && which curl) || (apt-get update && apt-get install -y wget curl)

# aws s3 sync s3://air-example-data-2/vllm_opensource_llava/ images/
mkdir -p images
cd images
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/stop_sign_pixel_values.pt
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/stop_sign_image_features.pt
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/cherry_blossom_pixel_values.pt
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/cherry_blossom_image_features.pt
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/stop_sign.jpg
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/cherry_blossom.jpg

cd -
15 changes: 12 additions & 3 deletions .buildkite/test-pipeline.yaml
@@ -39,15 +39,24 @@ steps:
 
 - label: Models Test
   commands:
-    - pytest -v -s models --forked
+    - bash ../.buildkite/download-images.sh
+    - pytest -v -s models --ignore=models/test_llava.py --forked
   soft_fail: true
+
+- label: Llava Test
+  commands:
+    - bash ../.buildkite/download-images.sh
+    - pytest -v -s models/test_llava.py
 
 - label: Prefix Caching Test
   commands:
     - pytest -v -s prefix_caching
 
 - label: Samplers Test
-  command: pytest -v -s samplers --forked
+  command: pytest -v -s samplers
+
+- label: LogitsProcessor Test
+  command: pytest -v -s test_logits_processor.py
 
 - label: Worker Test
   command: pytest -v -s worker
@@ -56,7 +65,7 @@ steps:
   command: pytest -v -s spec_decode
 
 - label: LoRA Test %N
-  command: pytest -v -s lora --forked --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT
+  command: pytest -v -s lora --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT
   parallelism: 4
 
 - label: Metrics Test
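The LoRA step runs with `parallelism: 4` and passes each job its own `--shard-id` together with the shared `--num-shards`. A minimal sketch of how such sharding can split a test suite is given below, assuming a hash-based assignment; the plugin that actually implements these flags may partition tests differently, and the helper names `shard_for` and `select_tests` are hypothetical.

```python
import hashlib


def shard_for(test_id: str, num_shards: int) -> int:
    """Map a test id to a shard index in [0, num_shards) using a stable hash."""
    digest = hashlib.sha256(test_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def select_tests(test_ids, shard_id: int, num_shards: int):
    """Return only the tests that belong to the given shard (assumed scheme)."""
    return [t for t in test_ids if shard_for(t, num_shards) == shard_id]
```

Because the hash depends only on the test id, every parallel job computes the same partition independently, so each test runs in exactly one shard without any coordination between jobs.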
36 changes: 36 additions & 0 deletions .github/actions/free-up-disk-space/action.yml
@@ -0,0 +1,36 @@
name: "Free up disk space"
description: "Removes non-essential tools, libraries and cached files from GitHub action runner node"

runs:
  using: "composite"
  steps:
    - name: "Remove non-essential tools and libraries"
      shell: bash
      run: |
        # https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
        echo "Disk usage before cleanup:"
        df -h
        echo "Removing non-essential tools and libraries ..."
        sudo rm -rf /opt/ghc
        sudo rm -rf /usr/local/.ghcup
        sudo rm -rf /usr/share/dotnet
        # sudo rm -rf /usr/local/share/boost
        echo "Deleting libraries for Android (12G), CodeQL (5.3G), PowerShell (1.3G), Swift (1.7G) ..."
        sudo rm -rf /usr/local/lib/android
        sudo rm -rf "${AGENT_TOOLSDIRECTORY}/CodeQL"
        sudo rm -rf /usr/local/share/powershell
        sudo rm -rf /usr/share/swift
        # ref: https://github.com/jlumbroso/free-disk-space/blob/main/action.yml
        echo "Deleting some larger apt packages:"
        sudo apt-get remove -y azure-cli google-chrome-stable firefox powershell mono-devel libgl1-mesa-dri --fix-missing || echo "::warning::The command [sudo apt-get remove -y azure-cli google-chrome-stable firefox powershell mono-devel libgl1-mesa-dri --fix-missing] failed to complete successfully. Proceeding..."
        echo "Disk usage after cleanup:"
        df -h

    - name: "Prune docker images"
      shell: bash
      run: |
        echo "Pruning docker images ..."
        docker image prune -a -f
        docker system df
        echo "Disk usage after pruning docker images:"
        df -h
128 changes: 128 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,128 @@
name: "Build"

on:
  workflow_dispatch:

  push:
    branches:
      - release
    paths-ignore:
      - "**.md"
      - "proto/**"

  pull_request:
    branches:
      - main
    paths-ignore:
      - "**.md"
      - "proto/**"

defaults:
  run:
    shell: bash

env:
  SERVER_IMAGE: "quay.io/wxpe/tgis-vllm"
  IMAGE_REGISTRY: "quay.io"

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      packages: write
      contents: read
    env:
      CACHE_IMAGE: "ghcr.io/ibm/tgis-vllm:build-cache"
      CACHE_REGISTRY: "ghcr.io"
      CACHE_PACKAGE_NAME: "tgis-vllm"

    steps:
      - name: "Checkout"
        uses: actions/checkout@v4

      - name: "Free up disk space"
        uses: ./.github/actions/free-up-disk-space

      - name: "Set up QEMU"
        uses: docker/setup-qemu-action@v3

      - name: "Set up Docker Buildx"
        uses: docker/setup-buildx-action@v3

      - name: "Log in to container registry (server-release)"
        uses: docker/login-action@v3
        if: github.event_name != 'pull_request'
        with:
          registry: ${{ env.IMAGE_REGISTRY }}
          username: ${{ secrets.WXPE_QUAY_USER }}
          password: ${{ secrets.WXPE_QUAY_TOKEN }}

      - name: "Log in to container registry (cache image)"
        uses: docker/login-action@v3
        if: github.event_name != 'pull_request'
        with:
          registry: ${{ env.CACHE_REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: "Set build cache target"
        run: |
          # For push to `main` (PR merged), push a new cache image with all layers (cache-mode=max).
          # For PR builds, use GitHub action cache which isolates cached layers by PR/branch.
          # to optimize builds for subsequent pushes to the same PR/branch.
          # Do not set a cache-to image for PR builds to not overwrite the `main` cache image and
          # to not ping-pong cache images for two or more different PRs.
          # Do not push cache images for each PR or multiple branches to not exceed GitHub package
          # usage and traffic limitations.
          # UPDATE 2024/02/26: GHA cache appears to have issues, cannot use `cache-to: gha,mode=min`
          # if `cache-from: reg...,mode=max` but `cache-to: gha,mode=max` takes longer than uncached
          # build and exhausts GHA cache size limits, so use cache `type=inline` (no external cache).
          if [ "${{ github.event_name }}" == "pull_request" ]
          then
            #CACHE_TO="type=gha,mode=min"
            CACHE_TO="type=inline"
          else
            CACHE_TO="type=registry,ref=${{ env.CACHE_IMAGE }},mode=max"
          fi
          echo "CACHE_TO=$CACHE_TO" >> $GITHUB_ENV

      - name: "Generate tags"
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: |
            ${{ env.SERVER_IMAGE }}
          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha,enable=true,priority=100,prefix=,suffix=,format=short
            type=sha,enable=true,priority=100,prefix=${{ github.ref_name }}.,suffix=,format=short

      - name: "UBI Docker build"
        uses: docker/build-push-action@v5
        with:
          context: .
          target: vllm-openai
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=registry,ref=${{ env.CACHE_IMAGE }}
          cache-to: ${{ env.CACHE_TO }}
          push: ${{ github.event_name != 'pull_request' }}
          file: Dockerfile.ubi

      - name: "Cleanup old cache images"
        uses: actions/delete-package-versions@v5
        if: ${{ github.event_name == 'push' }}
        with:
          package-name: ${{ env.CACHE_PACKAGE_NAME }}
          package-type: container
          delete-only-untagged-versions: true

      - name: "List docker images"
        run: docker images

      - name: "Check disk usage"
        shell: bash
        run: |
          docker system df
          df -h
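The "Set build cache target" step above picks the `cache-to` value from the triggering event. The decision can be sketched as a small function, written here in Python purely for illustration; the workflow itself does this in bash, and the function name `cache_to` is hypothetical.

```python
def cache_to(event_name: str, cache_image: str) -> str:
    """Mirror the workflow's choice of Docker build cache target (illustrative only)."""
    if event_name == "pull_request":
        # PR builds embed the cache in the image and push no external cache,
        # so they cannot overwrite the shared registry cache image.
        return "type=inline"
    # All other events push a full-layer cache image to the registry.
    return f"type=registry,ref={cache_image},mode=max"
```

For example, `cache_to("push", "ghcr.io/ibm/tgis-vllm:build-cache")` yields the registry target with `mode=max`, while any `pull_request` event yields `type=inline`, matching the two branches of the shell `if` in the workflow.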
102 changes: 0 additions & 102 deletions .github/workflows/publish.yml

This file was deleted.

7 changes: 5 additions & 2 deletions .github/workflows/ruff.yml
@@ -25,10 +25,13 @@ jobs:
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        pip install ruff==0.1.5 codespell==2.2.6 tomli==2.0.1
+        pip install ruff==0.1.5 codespell==2.2.6 tomli==2.0.1 isort==5.13.2
     - name: Analysing the code with ruff
       run: |
         ruff .
     - name: Spelling check with codespell
       run: |
-        codespell --toml pyproject.toml
+        codespell --toml pyproject.toml
+    - name: Run isort
+      run: |
+        isort . --check-only
20 changes: 0 additions & 20 deletions .github/workflows/scripts/build.sh

This file was deleted.

20 changes: 0 additions & 20 deletions .github/workflows/scripts/create_release.js

This file was deleted.
