[do not merge] sccache test #170

Closed
wants to merge 77 commits into from
Changes from all commits (77 commits)
dbf639e
chore: add fork OWNERS
z103cb Apr 30, 2024
a24f03d
add ubi Dockerfile
dtrifiro May 21, 2024
753f948
Dockerfile.ubi: remove references to grpc/protos
dtrifiro May 21, 2024
84eb826
Dockerfile.ubi: use vllm-tgis-adapter
dtrifiro May 28, 2024
577cb43
gha: add sync workflow
dtrifiro Jun 3, 2024
e3be40e
Dockerfile.ubi: use distributed-executor-backend=mp as default
dtrifiro Jun 10, 2024
6844cb9
Dockerfile.ubi: remove vllm-nccl workaround
dtrifiro Jun 13, 2024
cbffbf1
Dockerfile.ubi: add missing requirements-*.txt bind mounts
dtrifiro Jun 18, 2024
ae4ac8d
add triton CustomCacheManger
tdoublep May 29, 2024
bd5ea86
gha: sync-with-upstream workflow create PRs as draft
dtrifiro Jun 19, 2024
57b2cbf
add smoke/unit tests scripts
dtrifiro Jun 19, 2024
5b1ada5
extras: exit unit tests on err
dtrifiro Jun 20, 2024
3a3dae7
Dockerfile.ubi: misc improvements
dtrifiro May 28, 2024
f8e70cd
update OWNERS
dtrifiro Jun 21, 2024
57fdb00
Dockerfile.ubi: use tensorizer (#64)
prashantgupta24 Jun 25, 2024
586c30c
Dockerfile.ubi: pin vllm-tgis-adapter to 0.1.2
dtrifiro Jun 26, 2024
f0dc391
gha: fix fetch step in upstream sync workflow
dtrifiro Jul 2, 2024
cfded1b
gha: always update sync workflow PR body/title
dtrifiro Jul 2, 2024
5493671
Dockerfile.ubi: bump vllm-tgis-adapter to 0.1.3
dtrifiro Jul 3, 2024
eb82bd4
Dockerfile.ubi: get rid of --distributed-executor-backend=mp
dtrifiro Jul 10, 2024
a4bb48b
Dockerfile.ubi: add flashinfer
dtrifiro Jul 9, 2024
8dcca5c
pin adapter to 2.0.0
prashantgupta24 Jul 12, 2024
48aa285
deps: bump flashinfer to 0.0.9
dtrifiro Jul 15, 2024
8955a3b
Update OWNERS with IBM folks
heyselbi Jun 27, 2024
c66756a
Dockerfile.ubi: bind mount .git dir to allow inclusion of git commit …
dtrifiro Jul 17, 2024
9d29a26
gha: remove reminder_comment
dtrifiro Jul 17, 2024
24a9763
Dockerfile: bump vllm-tgis-adapter to 0.2.1
dtrifiro Jul 18, 2024
2e83b96
fix: update setup.py to differentiate between fork and upstream
nathan-weinberg Jul 18, 2024
6cb0961
Dockerfile.ubi: properly mount .git dir
dtrifiro Jul 19, 2024
a6ee52d
Revert "[CI/Build] fix: update setup.py to differentiate between fork…
dtrifiro Jul 19, 2024
6ada55d
Dockerfile.ubi: bump vllm-tgis-adapter to 0.2.2
dtrifiro Jul 19, 2024
155d16f
gha: remove unused upstream workflows
dtrifiro Jul 23, 2024
0835e4d
deps: bump vllm-tgis-adapter to 0.2.3
dtrifiro Jul 24, 2024
d40e335
Dockerfile.ubi: get rid of custom cache manager
dtrifiro Jul 24, 2024
2f92fe8
Dockerfile.ubi: add missing dependency
dtrifiro Aug 6, 2024
e061bb7
deps: bump vllm-tgis-adapter to 0.3.0
dtrifiro Jul 24, 2024
1f786f1
Dockerfile.ubi: force using python-installed cuda runtime libraries
dtrifiro Aug 12, 2024
7b71e0c
Dockerfile: use uv pip everywhere (it's faster)
dtrifiro Aug 12, 2024
ecb1c9d
Dockerfile.ubi: bump flashinfer to 0.1.2
dtrifiro Aug 5, 2024
a5c68a0
feat: allow long max seq length
tjohnson31415 Aug 8, 2024
367809a
smoke test: kill server on timeout
dtrifiro Aug 13, 2024
fcd3419
Dockerfile.ubi: set vllm_tgis_adapter unicorn log level to warning
dtrifiro Aug 13, 2024
35a9167
fix: enable logprobs during spec decoding by default
tjohnson31415 Aug 20, 2024
0df042d
deps: bump vllm-tgis-adapter to 0.4.0 (#132)
vaibhavjainwiz Aug 21, 2024
5a14601
Disable usage tracking
stevegrubb Aug 29, 2024
91520a7
Start by updating the image
stevegrubb Sep 4, 2024
7c76447
Update ROCm build for UBI
Xaenalt Sep 3, 2024
05dd581
Add sample chat template into vLLM container
vaibhavjainwiz Sep 10, 2024
c540965
Harden build of libsodium
stevegrubb Aug 27, 2024
d02a789
Update Dockerfile.ubi
RH-steve-grubb Sep 4, 2024
1491b9e
Update OWNERS file
vaibhavjainwiz Sep 16, 2024
fed9b25
Merge pull request #160 from vaibhavjainwiz/update_OWNER
vaibhavjainwiz Sep 16, 2024
e30866f
[CI/Build] Enable InternVL2 PP test only on single node (#8437)
Isotr0py Sep 13, 2024
616f5ad
[doc] recommend pip instead of conda (#8446)
youkaichao Sep 13, 2024
8b1f881
[Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 (#8442)
jeejeelee Sep 13, 2024
07afc6d
[misc][ci] fix quant test (#8449)
youkaichao Sep 13, 2024
2c657ec
[Installation] Gate FastAPI version for Python 3.8 (#8456)
DarkLight1337 Sep 13, 2024
f8d2bf0
[plugin][torch.compile] allow to add custom compile backend (#8445)
youkaichao Sep 13, 2024
754dc0f
[CI/Build] Reorganize models tests (#7820)
DarkLight1337 Sep 13, 2024
bf7e710
[Doc] Add oneDNN installation to CPU backend documentation (#8467)
Isotr0py Sep 13, 2024
8d32eaf
[HotFix] Fix final output truncation with stop string + streaming (#8…
njhill Sep 13, 2024
1ed711a
bump version to v0.6.1.post2 (#8473)
simon-mo Sep 13, 2024
93c04f3
Dockerfile.rocm.ubi: cleanup
dtrifiro Sep 6, 2024
66984d4
add vllm-tgis-adapter layer
dtrifiro Sep 11, 2024
f5387d0
Dockerfile.ubi: bump python to 3.12
dtrifiro Sep 12, 2024
b3abd3a
Dockerfile.ubi: bump flashinfer to 0.1.6
dtrifiro Sep 12, 2024
9f85dae
Dockerfile.rocm.ubi: do not use nightly pytorch_triton
dtrifiro Sep 16, 2024
083f0d5
Dockerfile.ubi: fix PYTHON_VERSION arg usage
dtrifiro Sep 17, 2024
c29c9f4
Dockerfile.rocm.ubi: move microdnf update in base stage
dtrifiro Sep 25, 2024
69ac6c1
Dockerfile.rocm.ubi: bump torch version to 2.5.0.dev20240912+rocm6.1
dtrifiro Sep 25, 2024
399c114
Dockerfile.rocm.ubi: get rid of build triton stage
dtrifiro Sep 25, 2024
9ebf28d
Merge pull request #167 from dtrifiro/fix-amd-build
RH-steve-grubb Sep 25, 2024
18d7da7
Sync with upstream @ v0.6.2
dtrifiro Sep 26, 2024
56fdd53
Dockerfile.rocm.ubi: add setuptools-scm build dependency
dtrifiro Sep 26, 2024
d151278
Dockerfile.ubi: add VLLM_FA_CMAKE_GPU_ARCHES
dtrifiro Sep 26, 2024
42a5a5b
:alembic: try public sccache
joerunde Sep 26, 2024
8abfa38
:fire: remove sudo
joerunde Sep 26, 2024
21 changes: 0 additions & 21 deletions .github/workflows/add_label_automerge.yml

This file was deleted.

21 changes: 0 additions & 21 deletions .github/workflows/reminder_comment.yml

This file was deleted.

84 changes: 84 additions & 0 deletions .github/workflows/sync-with-upstream.yml
@@ -0,0 +1,84 @@
name: "Sync with upstream"

on:
schedule:
- cron: 20 4 * * *

workflow_dispatch:


env:
# repo to fetch changes from
UPSTREAM_REPO: vllm-project/vllm
# branch to sync
BRANCH: main

jobs:
upstream-sync:
name: Sync with upstream
runs-on: ubuntu-latest
permissions:
pull-requests: write
contents: write

steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0

- name: Fetch upstream repo
run: |
git remote add upstream https://github.com/${UPSTREAM_REPO}
git fetch upstream

- name: Check diff
id: diff
shell: bash
run: |
echo 'diff<<EOF' >> $GITHUB_OUTPUT
git diff --stat upstream/${BRANCH} | tee -a >(cat >> $GITHUB_OUTPUT)
echo 'EOF' >> $GITHUB_OUTPUT

- name: Create PR
if: ${{ steps.diff.outputs.diff != '' }}
env:
GH_TOKEN: ${{ github.token }}
run: |
set -xeu

git_hash="$(git rev-parse upstream/${BRANCH})"
echo "git_hash=$git_hash" >> $GITHUB_OUTPUT
git_describe="$(git describe --tags upstream/${BRANCH})"
echo "git_describe=$git_describe" >> $GITHUB_OUTPUT

# echo 'commits<<EOF' >> $GITHUB_OUTPUT
# git log --oneline ..upstream/${BRANCH} >> $GITHUB_OUTPUT
# echo 'EOF' >> $GITHUB_OUTPUT

upstream_url="https://github.com/${UPSTREAM_REPO}"
upstream_branch="$upstream_url/tree/${BRANCH}"

title="Sync with upstream@${git_describe}"
body="Merge [${UPSTREAM_REPO}]($upstream_url):[${BRANCH}]($upstream_branch)@[${git_describe}](${upstream_url}/commit/$git_hash) into $BRANCH"

gh repo set-default $GITHUB_REPOSITORY
pr_number=$(gh pr list -S "Sync with upstream@" --json number --jq '.[0].number')

if [[ -z $pr_number ]]; then
echo "Creating PR"
gh pr create \
--head $(echo $UPSTREAM_REPO | sed 's|/|:|g'):${BRANCH} \
--base ${BRANCH} \
--label code-sync \
--title "$title" \
--body "$body" \
--draft \
--no-maintainer-edit
exit 0
fi

echo "Updating PR \#${pr_number}"
gh pr edit \
$pr_number \
--body "$body" \
--title "$title"
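The workflow_dispatch trigger above means the nightly sync can also be started by hand. A minimal sketch of doing so with the gh CLI (assuming it is authenticated against this fork; the workflow file name is taken from the diff header above):

    # start a sync run manually instead of waiting for the 04:20 UTC cron
    gh workflow run sync-with-upstream.yml --ref main
    # confirm the run was queued
    gh run list --workflow sync-with-upstream.yml --limit 1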
227 changes: 227 additions & 0 deletions Dockerfile.rocm.ubi
@@ -0,0 +1,227 @@
## Global Args #################################################################
ARG BASE_UBI_IMAGE_TAG=9.4
ARG PYTHON_VERSION=3.12
# Default ROCm ARCHes to build vLLM for.
ARG PYTORCH_ROCM_ARCH="gfx908;gfx90a;gfx942;gfx1100"
ARG MAX_JOBS=12

FROM registry.access.redhat.com/ubi9/ubi-minimal:${BASE_UBI_IMAGE_TAG} AS base

ARG PYTHON_VERSION

ENV VIRTUAL_ENV=/opt/vllm
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

RUN --mount=type=cache,target=/root/.cache/pip \
    microdnf -y update && \
    microdnf install -y --setopt=install_weak_deps=0 --nodocs \
        python${PYTHON_VERSION}-devel \
        python${PYTHON_VERSION}-pip \
        python${PYTHON_VERSION}-wheel && \
    python${PYTHON_VERSION} -m venv $VIRTUAL_ENV && \
    pip install -U pip wheel setuptools uv


FROM base AS rocm_base
ENV ROCM_VERSION=6.1.2

RUN printf "[amdgpu]\n\
name=amdgpu\n\
baseurl=https://repo.radeon.com/amdgpu/${ROCM_VERSION}/rhel/9.4/main/x86_64/\n\
enabled=1\n\
priority=50\n\
gpgcheck=1\n\
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key\n\
[ROCm-${ROCM_VERSION}]\n\
name=ROCm${ROCM_VERSION}\n\
baseurl=https://repo.radeon.com/rocm/rhel9/${ROCM_VERSION}/main\n\
enabled=1\n\
priority=50\n\
gpgcheck=1\n\
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key" > /etc/yum.repos.d/amdgpu.repo


RUN microdnf -y install \
        rocm-hip-libraries rocm-hip-runtime \
        miopen-hip && \
    microdnf clean all

RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=cache,target=/root/.cache/uv \
    uv pip install -v --index-url "https://download.pytorch.org/whl/nightly/rocm6.1" \
        torch==2.5.0.dev20240912+rocm6.1 \
        torchvision==0.20.0.dev20240912+rocm6.1

FROM rocm_base AS rocm_devel

ENV CCACHE_DIR=/root/.cache/ccache

RUN rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
    rpm -ql epel-release && \
    microdnf -y update && \
    microdnf -y install \
        ccache \
        git \
        rocm \
        hipcc \
        wget \
        which && \
    microdnf clean all

WORKDIR /workspace

ENV LLVM_SYMBOLIZER_PATH=/opt/rocm/llvm/bin/llvm-symbolizer
ENV PATH=$PATH:/opt/rocm/bin:/libtorch/bin
ENV CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/libtorch/include:/libtorch/include/torch/csrc/api/include:/opt/rocm/include


FROM rocm_devel AS build_amdsmi

# Build AMD SMI wheel
RUN cd /opt/rocm/share/amd_smi && \
    python3 -m pip wheel . --wheel-dir=/install

##################################################################################################

FROM rocm_devel AS build_flashattention

# Whether to install CK-based flash-attention
ARG BUILD_FA="1"
ARG TRY_FA_WHEEL="1"
# Note: the ROCm fork provides a wheel built for ROCm, but only for flash-attention 2.5.9 on
# Python 3.9, so it is incompatible with the current (Python 3.12) build
ARG FA_WHEEL_URL="https://github.com/ROCm/flash-attention/releases/download/v2.5.9post1-cktile-vllm/flash_attn-2.5.9.post1-cp39-cp39-linux_x86_64.whl"
# only required when not using the triton backend
ARG FA_GFX_ARCHS="gfx90a;gfx942"
ARG FLASH_ATTENTION_USE_TRITON_ROCM="TRUE"
# FA_BRANCH is the main_perf branch as of Sep 4 2024, which includes triton backend support,
# see https://github.com/Dao-AILab/flash-attention/pull/1203
ARG FA_BRANCH="75b5360"
ARG MAX_JOBS
ENV MAX_JOBS=${MAX_JOBS}
ENV FLASH_ATTENTION_USE_TRITON_ROCM=${FLASH_ATTENTION_USE_TRITON_ROCM}

# Build ROCm flash-attention wheel if `BUILD_FA` is set to 1
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=cache,target=/workspace/build \
    if [ "$BUILD_FA" = "1" ]; then \
        if [ "$TRY_FA_WHEEL" = "1" ] && python3 -m pip install "${FA_WHEEL_URL}"; then \
            # If a suitable wheel exists, download it instead of building FA
            mkdir -p /install && wget -N "${FA_WHEEL_URL}" -P /install; \
        else \
            mkdir -p /libs && \
            cd /libs && \
            git clone https://github.com/ROCm/flash-attention.git && \
            cd flash-attention && \
            git checkout ${FA_BRANCH} && \
            git submodule update --init && \
            uv pip install cmake ninja packaging && \
            env \
                GPU_ARCHS="${FA_GFX_ARCHS}" \
                BUILD_TARGET="rocm" \
                python3 setup.py bdist_wheel --dist-dir=/install; \
        fi; \
    else \
        # Create an empty directory, as later build stages expect one
        mkdir -p /install; \
    fi

##################################################################################################

FROM rocm_devel AS build_vllm
ARG PYTORCH_ROCM_ARCH
ARG MAX_JOBS
ENV MAX_JOBS=${MAX_JOBS}
ENV PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}


COPY . .


ENV VLLM_TARGET_DEVICE="rocm"
ENV MAX_JOBS=${MAX_JOBS}
# Make sure punica kernels are built (for LoRA)
ENV VLLM_INSTALL_PUNICA_KERNELS=1

RUN --mount=type=cache,target=/root/.cache/ccache \
    --mount=type=cache,target=/root/.cache/pip \
    --mount=type=cache,target=/root/.cache/uv \
    uv pip install -v -U \
        ninja "setuptools-scm>=8" "cmake>=3.26" packaging && \
    python3 setup.py bdist_wheel --dist-dir=dist

##################################################################################################

FROM rocm_base AS vllm-openai
ARG MAX_JOBS

WORKDIR /workspace

ENV VIRTUAL_ENV=/opt/vllm
ENV PATH=$VIRTUAL_ENV/bin:$PATH

# Required for triton
RUN microdnf install -y --setopt=install_weak_deps=0 --nodocs gcc && \
    microdnf clean all

RUN --mount=type=bind,from=build_amdsmi,src=/install,target=/install/amdsmi/ \
    --mount=type=bind,from=build_flashattention,src=/install,target=/install/flashattention \
    --mount=type=bind,from=build_vllm,src=/workspace/dist,target=/install/vllm/ \
    --mount=type=cache,target=/root/.cache/pip \
    --mount=type=cache,target=/root/.cache/uv \
    uv pip install -v \
        --index-strategy=unsafe-best-match \
        --extra-index-url "https://download.pytorch.org/whl/nightly/rocm6.1" \
        /install/amdsmi/*.whl \
        /install/flashattention/*.whl \
        /install/vllm/*.whl

# Set up a non-root user for OpenShift
RUN umask 002 && \
    useradd --uid 2000 --gid 0 vllm && \
    mkdir -p /licenses && \
    chmod g+rwx $HOME /usr/src /workspace

COPY LICENSE /licenses/vllm.md
COPY examples/*.jinja /app/data/template/

ENV HF_HUB_OFFLINE=1 \
    PORT=8000 \
    HOME=/home/vllm \
    # Allow requested max length to exceed what is extracted from the
    # config.json
    # see: https://github.com/vllm-project/vllm/pull/7080
    VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
    VLLM_USAGE_SOURCE=production-docker-image \
    VLLM_WORKER_MULTIPROC_METHOD=fork \
    VLLM_NO_USAGE_STATS=1 \
    # Silences the HF Tokenizers warning
    TOKENIZERS_PARALLELISM=false \
    RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1 \
    FLASH_ATTENTION_USE_TRITON_ROCM="TRUE" \
    OUTLINES_CACHE_DIR=/tmp/outlines \
    NUMBA_CACHE_DIR=/tmp/numba \
    TRITON_CACHE_DIR=/tmp/triton

# Switch to the non-root user
USER 2000

# Set the entrypoint
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
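# Illustrative only (not part of this PR): a typical ROCm invocation of this image
# passes the KFD/DRI devices through to the container, e.g.
#   docker run --device /dev/kfd --device /dev/dri -p 8000:8000 <image> --model <model-id>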


FROM vllm-openai AS vllm-grpc-adapter

USER root

RUN --mount=type=cache,target=/root/.cache/pip \
    pip install vllm-tgis-adapter==0.4.0

ENV GRPC_PORT=8033 \
    PORT=8000 \
    # As an optimization, vLLM disables logprobs when using spec decoding by
    # default, but this would be unexpected to users of a hosted model that
    # happens to have spec decoding
    # see: https://github.com/vllm-project/vllm/pull/6485
    DISABLE_LOGPROBS_DURING_SPEC_DECODING=false

USER 2000
ENTRYPOINT ["python3", "-m", "vllm_tgis_adapter", "--uvicorn-log-level=warning"]
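For reference, a sketch of how the final stage of this Dockerfile might be built. The arch list, job count, and image tag below are illustrative placeholders, not values taken from this PR:

    # build the gRPC adapter image (the final stage of Dockerfile.rocm.ubi)
    docker build -f Dockerfile.rocm.ubi \
        --target vllm-grpc-adapter \
        --build-arg PYTORCH_ROCM_ARCH="gfx90a;gfx942" \
        --build-arg MAX_JOBS=8 \
        -t vllm-rocm-ubi:dev .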