fix: enable logprobs during spec decoding by default
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
tjohnson31415 authored and dtrifiro committed Aug 21, 2024
1 parent 19adb9d commit b361484
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions Dockerfile.ubi
@@ -181,7 +181,6 @@ RUN --mount=type=cache,target=/root/.cache/pip \
uv pip install https://github.com/flashinfer-ai/flashinfer/releases/download/v0.1.2/flashinfer-0.1.2+cu121torch2.4-cp311-cp311-linux_x86_64.whl

ENV HF_HUB_OFFLINE=1 \
-PORT=8000 \
HOME=/home/vllm \
# Allow requested max length to exceed what is extracted from the
# config.json
@@ -208,6 +207,13 @@ USER root
RUN --mount=type=cache,target=/root/.cache/pip \
pip install vllm-tgis-adapter==0.3.0

-ENV GRPC_PORT=8033
+ENV GRPC_PORT=8033 \
+PORT=8000 \
+# As an optimization, vLLM disables logprobs when using spec decoding by
+# default, but this would be unexpected to users of a hosted model that
+# happens to have spec decoding
+# see: https://github.com/vllm-project/vllm/pull/6485
+DISABLE_LOGPROBS_DURING_SPEC_DECODING=false

USER 2000
ENTRYPOINT ["python3", "-m", "vllm_tgis_adapter", "--uvicorn-log-level=warning"]
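
The diff sets DISABLE_LOGPROBS_DURING_SPEC_DECODING to the string "false", but it does not show how the value is read on the Python side. The sketch below is only an illustration of one way a consumer could parse such a variable into a boolean; the helper name `env_flag` and the accepted spellings are assumptions, not code from vllm-tgis-adapter or vLLM. It exists to show why an explicit parse matters: in Python, any non-empty string (including "false") is truthy.

```python
import os

# Hypothetical helper, not taken from vllm-tgis-adapter or vLLM.
# The accepted spellings below are an assumption for illustration.
_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off", ""}


def env_flag(name, default, environ=None):
    """Parse a boolean-like environment variable, falling back to `default`."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        # Variable not set: keep the caller's default.
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognized boolean value")


if __name__ == "__main__":
    # With the ENV line from this commit the flag parses to False, so
    # logprobs would stay enabled. The default of True mirrors the vLLM
    # behavior described in the Dockerfile comment.
    image_env = {"DISABLE_LOGPROBS_DURING_SPEC_DECODING": "false"}
    print(env_flag("DISABLE_LOGPROBS_DURING_SPEC_DECODING", True, image_env))
```

A naive `bool(os.environ["DISABLE_LOGPROBS_DURING_SPEC_DECODING"])` would evaluate to `True` for the value set here, which is the opposite of what the commit intends.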
