Sync with upstream @ v0.6.2 #169

Merged: 152 commits, Sep 27, 2024

Commits (152)
9b4a3b2
[CI/Build] Enable InternVL2 PP test only on single node (#8437)
Isotr0py Sep 13, 2024
cab69a1
[doc] recommend pip instead of conda (#8446)
youkaichao Sep 13, 2024
06311e2
[Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 (#8442)
jeejeelee Sep 13, 2024
a246912
[misc][ci] fix quant test (#8449)
youkaichao Sep 13, 2024
ecd7a1d
[Installation] Gate FastAPI version for Python 3.8 (#8456)
DarkLight1337 Sep 13, 2024
0a4806f
[plugin][torch.compile] allow to add custom compile backend (#8445)
youkaichao Sep 13, 2024
a84e598
[CI/Build] Reorganize models tests (#7820)
DarkLight1337 Sep 13, 2024
f57092c
[Doc] Add oneDNN installation to CPU backend documentation (#8467)
Isotr0py Sep 13, 2024
18e9e1f
[HotFix] Fix final output truncation with stop string + streaming (#8…
njhill Sep 13, 2024
9ba0817
bump version to v0.6.1.post2 (#8473)
simon-mo Sep 13, 2024
8517252
[Hardware][intel GPU] bump up ipex version to 2.3 (#8365)
jikunshang Sep 13, 2024
1ef0d2e
[Kernel][Hardware][Amd]Custom paged attention kernel for rocm (#8310)
charlifu Sep 14, 2024
8a0cf1d
[Model] support minicpm3 (#8297)
SUDA-HLT-ywfang Sep 14, 2024
a36e070
[torch.compile] fix functionalization (#8480)
youkaichao Sep 14, 2024
47790f3
[torch.compile] add a flag to disable custom op (#8488)
youkaichao Sep 14, 2024
50e9ec4
[TPU] Implement multi-step scheduling (#8489)
WoosukKwon Sep 14, 2024
3724d5f
[Bugfix][Model] Fix Python 3.8 compatibility in Pixtral model by upda…
chrisociepa Sep 15, 2024
fc990f9
[Bugfix][Kernel] Add `IQ1_M` quantization implementation to GGUF kern…
Isotr0py Sep 15, 2024
a091e2d
[Kernel] Enable 8-bit weights in Fused Marlin MoE (#8032)
ElizaWszola Sep 16, 2024
837c196
[Frontend] Expose revision arg in OpenAI server (#8501)
lewtun Sep 16, 2024
acd5511
[BugFix] Fix clean shutdown issues (#8492)
njhill Sep 16, 2024
781e3b9
[Bugfix][Kernel] Fix build for sm_60 in GGUF kernel (#8506)
sasha0552 Sep 16, 2024
5d73ae4
[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270)
ProExpertProg Sep 16, 2024
2759a43
[doc] update doc on testing and debugging (#8514)
youkaichao Sep 16, 2024
47f5e03
[Bugfix] Bind api server port before starting engine (#8491)
kevin314 Sep 16, 2024
5478c4b
[perf bench] set timeout to debug hanging (#8516)
simon-mo Sep 16, 2024
5ce45eb
[misc] small qol fixes for release process (#8517)
simon-mo Sep 16, 2024
cca6164
[Bugfix] Fix 3.12 builds on main (#8510)
joerunde Sep 17, 2024
546034b
[refactor] remove triton based sampler (#8524)
simon-mo Sep 17, 2024
1c1bb38
[Frontend] Improve Nullable kv Arg Parsing (#8525)
alex-jw-brooks Sep 17, 2024
ee2bcea
[Misc][Bugfix] Disable guided decoding for mistral tokenizer (#8521)
ywang96 Sep 17, 2024
99aa4ed
[torch.compile] register allreduce operations as custom ops (#8526)
youkaichao Sep 17, 2024
cbdb252
[Misc] Limit to ray[adag] 2.35 to avoid backward incompatible change …
ruisearch42 Sep 17, 2024
1b6de83
[Benchmark] Support sample from HF datasets and image input for bench…
Isotr0py Sep 17, 2024
1009e93
[Encoder decoder] Add cuda graph support during decoding for encoder-…
sroy745 Sep 17, 2024
9855b99
[Feature][kernel] tensor parallelism with bitsandbytes quantization (…
chenqianfzh Sep 17, 2024
a54ed80
[Model] Add mistral function calling format to all models loaded with…
patrickvonplaten Sep 17, 2024
56c3de0
[Misc] Don't dump contents of kvcache tensors on errors (#8527)
njhill Sep 17, 2024
98f9713
[Bugfix] Fix TP > 1 for new granite (#8544)
joerunde Sep 17, 2024
fa0c114
[doc] improve installation doc (#8550)
youkaichao Sep 17, 2024
09deb47
[CI/Build] Excluding kernels/test_gguf.py from ROCm (#8520)
alexeykondrat Sep 17, 2024
8110e44
[Kernel] Change interface to Mamba causal_conv1d_update for continuou…
tlrmchlsmth Sep 17, 2024
95965d3
[CI/Build] fix Dockerfile.cpu on podman (#8540)
dtrifiro Sep 18, 2024
e351572
[Misc] Add argument to disable FastAPI docs (#8554)
Jeffwan Sep 18, 2024
6ffa3f3
[CI/Build] Avoid CUDA initialization (#8534)
DarkLight1337 Sep 18, 2024
9d104b5
[CI/Build] Update Ruff version (#8469)
aarnphm Sep 18, 2024
7c7714d
[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#…
alexm-neuralmagic Sep 18, 2024
a8c1d16
[Core] *Prompt* logprobs support in Multi-step (#8199)
afeldman-nm Sep 18, 2024
d65798f
[Core] zmq: bind only to 127.0.0.1 for local-only usage (#8543)
russellb Sep 18, 2024
e18749f
[Model] Support Solar Model (#8386)
shing100 Sep 18, 2024
b3195bc
[AMD][ROCm]Quantization methods on ROCm; Fix _scaled_mm call (#8380)
gshtras Sep 18, 2024
db9120c
[Kernel] Change interface to Mamba selective_state_update for continu…
tlrmchlsmth Sep 18, 2024
d9cd78e
[BugFix] Nonzero exit code if MQLLMEngine startup fails (#8572)
njhill Sep 18, 2024
0d47bf3
[Bugfix] add `dead_error` property to engine client (#8574)
joerunde Sep 18, 2024
4c34ce8
[Kernel] Remove marlin moe templating on thread_m_blocks (#8573)
tlrmchlsmth Sep 19, 2024
3118f63
[Bugfix] [Encoder-Decoder] Bugfix for encoder specific metadata const…
sroy745 Sep 19, 2024
02c9afa
Revert "[Misc][Bugfix] Disable guided decoding for mistral tokenizer"…
ywang96 Sep 19, 2024
c52ec5f
[Bugfix] fixing sonnet benchmark bug in benchmark_serving.py (#8616)
KuntaiDu Sep 19, 2024
855c8ae
[MISC] remove engine_use_ray in benchmark_throughput.py (#8615)
jikunshang Sep 19, 2024
76515f3
[Frontend] Use MQLLMEngine for embeddings models too (#8584)
njhill Sep 19, 2024
9cc373f
[Kernel][Amd] Add fp8 kv cache support for rocm custom paged attentio…
charlifu Sep 19, 2024
e42c634
[Core] simplify logits resort in _apply_top_k_top_p (#8619)
hidva Sep 19, 2024
ea4647b
[Doc] Add documentation for GGUF quantization (#8618)
Isotr0py Sep 19, 2024
9e99407
Create SECURITY.md (#8642)
simon-mo Sep 19, 2024
6cb748e
[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that…
alexeykondrat Sep 19, 2024
de6f90a
[Misc] guard against change in cuda library name (#8609)
bnellnm Sep 19, 2024
18ae428
[Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571)
garg-amit Sep 20, 2024
9e5ec35
[bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetad…
SolitaryThinker Sep 20, 2024
260d40b
[Core] Support Lora lineage and base model metadata management (#6315)
Jeffwan Sep 20, 2024
3b63de9
[Model] Add OLMoE (#7922)
Muennighoff Sep 20, 2024
2940afa
[CI/Build] Removing entrypoints/openai/test_embedding.py test from RO…
alexeykondrat Sep 20, 2024
b28298f
[Bugfix] Validate SamplingParam n is an int (#8548)
saumya-saran Sep 20, 2024
035fa89
[Misc] Show AMD GPU topology in `collect_env.py` (#8649)
DarkLight1337 Sep 20, 2024
2874bac
[Bugfix] Config got an unexpected keyword argument 'engine' (#8556)
Juelianqvq Sep 20, 2024
b4e4eda
[Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640)
patrickvonplaten Sep 20, 2024
7c8566a
[Doc] neuron documentation update (#8671)
omrishiv Sep 20, 2024
7f9c890
[Hardware][AWS] update neuron to 2.20 (#8676)
omrishiv Sep 20, 2024
0f961b3
[Bugfix] Fix incorrect llava next feature size calculation (#8496)
zyddnys Sep 20, 2024
0057894
[Core] Rename `PromptInputs` and `inputs`(#8673)
DarkLight1337 Sep 21, 2024
d4bf085
[MISC] add support custom_op check (#8557)
jikunshang Sep 21, 2024
0455c46
[Core] Factor out common code in `SequenceData` and `Sequence` (#8675)
DarkLight1337 Sep 21, 2024
0faab90
[beam search] add output for manually checking the correctness (#8684)
youkaichao Sep 21, 2024
71c6049
[Kernel] Build flash-attn from source (#8245)
ProExpertProg Sep 21, 2024
5e85f4f
[VLM] Use `SequenceData.from_token_counts` to create dummy data (#8687)
DarkLight1337 Sep 21, 2024
4dfdf43
[Doc] Fix typo in AMD installation guide (#8689)
Imss27 Sep 21, 2024
ec4aaad
[Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x…
rasmith Sep 21, 2024
9dc7c6c
[dbrx] refactor dbrx experts to extend FusedMoe class (#8518)
divakar-amd Sep 21, 2024
d66ac62
[Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (…
tlrmchlsmth Sep 21, 2024
13d88d4
[Bugfix] Refactor composite weight loading logic (#8656)
Isotr0py Sep 22, 2024
0e40ac9
[ci][build] fix vllm-flash-attn (#8699)
youkaichao Sep 22, 2024
06ed281
[Model] Refactor BLIP/BLIP-2 to support composite model loading (#8407)
DarkLight1337 Sep 22, 2024
8ca5051
[Misc] Use NamedTuple in Multi-image example (#8705)
alex-jw-brooks Sep 22, 2024
ca2b628
[MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (#8703)
statelesshz Sep 22, 2024
5b59532
[Model][VLM] Add LLaVA-Onevision model support (#8486)
litianjian Sep 22, 2024
c6bd70d
[SpecDec][Misc] Cleanup, remove bonus token logic. (#8701)
LiuXiaoxuanPKU Sep 22, 2024
d4a2ac8
[build] enable existing pytorch (for GH200, aarch64, nightly) (#8713)
youkaichao Sep 22, 2024
92ba7e7
[misc] upgrade mistral-common (#8715)
youkaichao Sep 22, 2024
3dda7c2
[Bugfix] Avoid some bogus messages RE CUTLASS's revision when buildin…
tlrmchlsmth Sep 23, 2024
57a0702
[Bugfix] Fix CPU CMake build (#8723)
ProExpertProg Sep 23, 2024
d23679e
[Bugfix] fix docker build for xpu (#8652)
yma11 Sep 23, 2024
9b8c8ba
[Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657)
alex-jw-brooks Sep 23, 2024
e551ca1
[Hardware][CPU] Refactor CPU model runner (#8729)
Isotr0py Sep 23, 2024
3e83c12
[Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model…
bigPYJ1151 Sep 23, 2024
a79e522
[Model] Support pp for qwen2-vl (#8696)
liuyanyi Sep 23, 2024
f2bd246
[VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use …
janimo Sep 23, 2024
ee5f34b
[CI/Build] use setuptools-scm to set __version__ (#4738)
dtrifiro Sep 23, 2024
86e9c8d
[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GP…
LucasWilkinson Sep 23, 2024
9b0e3ec
[Kernel][LoRA] Add assertion for punica sgmv kernels (#7585)
jeejeelee Sep 23, 2024
b05f5c9
[Core] Allow IPv6 in VLLM_HOST_IP with zmq (#8575)
russellb Sep 23, 2024
5f7bb58
Fix typical acceptance sampler with correct recovered token ids (#8562)
jiqing-feng Sep 23, 2024
1a2aef3
Add output streaming support to multi-step + async while ensuring Req…
alexm-neuralmagic Sep 23, 2024
530821d
[Hardware][AMD] ROCm6.2 upgrade (#8674)
hongxiayang Sep 24, 2024
88577ac
Fix tests in test_scheduler.py that fail with BlockManager V2 (#8728)
sroy745 Sep 24, 2024
0250dd6
re-implement beam search on top of vllm core (#8726)
youkaichao Sep 24, 2024
3185fb0
Revert "[Core] Rename `PromptInputs` to `PromptType`, and `inputs` to…
simon-mo Sep 24, 2024
b8747e8
[MISC] Skip dumping inputs when unpicklable (#8744)
comaniac Sep 24, 2024
3f06bae
[Core][Model] Support loading weights by ID within models (#7931)
petersalas Sep 24, 2024
8ff7ced
[Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658)
alex-jw-brooks Sep 24, 2024
cc4325b
[Bugfix] Fix potentially unsafe custom allreduce synchronization (#8558)
hanzhi713 Sep 24, 2024
a928ded
[Kernel] Split Marlin MoE kernels into multiple files (#8661)
ElizaWszola Sep 24, 2024
2529d09
[Frontend] Batch inference for llm.chat() API (#8648)
aandyw Sep 24, 2024
72fc97a
[Bugfix] Fix torch dynamo fixes caused by `replace_parameters` (#8748)
LucasWilkinson Sep 24, 2024
2467b64
[CI/Build] fix setuptools-scm usage (#8771)
dtrifiro Sep 24, 2024
1e7d5c0
[misc] soft drop beam search (#8763)
youkaichao Sep 24, 2024
13f9f7a
[[Misc]Upgrade bitsandbytes to the latest version 0.44.0 (#8768)
jeejeelee Sep 25, 2024
01b6f9e
[Core][Bugfix] Support prompt_logprobs returned with speculative deco…
tjohnson31415 Sep 25, 2024
6da1ab6
[Core] Adding Priority Scheduling (#5958)
apatke Sep 25, 2024
6e0c9d6
[Bugfix] Use heartbeats instead of health checks (#8583)
joerunde Sep 25, 2024
ee777d9
Fix test_schedule_swapped_simple in test_scheduler.py (#8780)
sroy745 Sep 25, 2024
b452247
[Bugfix][Kernel] Implement acquire/release polyfill for Pascal (#8776)
sasha0552 Sep 25, 2024
fc3afc2
Fix tests in test_chunked_prefill_scheduler which fail with BlockMana…
sroy745 Sep 25, 2024
e3dd069
[BugFix] Propagate 'trust_remote_code' setting in internvl and minicp…
zifeitong Sep 25, 2024
c239536
[Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend (#8770)
Isotr0py Sep 25, 2024
3e073e6
[Bugfix] load fc bias from config for eagle (#8790)
sohamparikh Sep 25, 2024
1ac3de0
[Frontend] OpenAI server: propagate usage accounting to FastAPI middl…
agt Sep 25, 2024
3368c3a
[Bugfix] Ray 2.9.x doesn't expose available_resources_per_node (#8767)
darthhexx Sep 25, 2024
8fae5ed
[Misc] Fix minor typo in scheduler (#8765)
wooyeonlee0 Sep 25, 2024
1c04644
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 up…
hongxiayang Sep 25, 2024
300da09
[Kernel] Fullgraph and opcheck tests (#8479)
bnellnm Sep 25, 2024
c6f2485
[[Misc]] Add extra deps for openai server image (#8792)
jeejeelee Sep 25, 2024
0c4d2ad
[VLM][Bugfix] internvl with num_scheduler_steps > 1 (#8614)
DefTruth Sep 25, 2024
28e1299
rename PromptInputs and inputs with backward compatibility (#8760)
DarkLight1337 Sep 25, 2024
64840df
[Frontend] MQLLMEngine supports profiling. (#8761)
abatom Sep 25, 2024
873edda
[Misc] Support FP8 MoE for compressed-tensors (#8588)
mgoin Sep 25, 2024
4f1ba08
Revert "rename PromptInputs and inputs with backward compatibility (#…
simon-mo Sep 25, 2024
770ec60
[Model] Add support for the multi-modal Llama 3.2 model (#8811)
heheda12345 Sep 25, 2024
e2c6e0a
[Doc] Update doc for Transformers 4.45 (#8817)
ywang96 Sep 25, 2024
7193774
[Misc] Support quantization of MllamaForCausalLM (#8822)
mgoin Sep 25, 2024
18d7da7
Sync with upstream @ v0.6.2
dtrifiro Sep 26, 2024
56fdd53
Dockerfile.rocm.ubi: add setuptools-scm build dependency
dtrifiro Sep 26, 2024
d151278
Dockerfile.ubi: add VLLM_FA_CMAKE_GPU_ARCHES
dtrifiro Sep 26, 2024
78a09a7
Dockerfile.ubi: use COPY . . to make repo available when building wheel
dtrifiro Sep 27, 2024
.buildkite/nightly-benchmarks/benchmark-pipeline.yaml (3 changes: 1 addition & 2 deletions)
@@ -8,8 +8,7 @@ steps:
containers:
- image: badouralix/curl-jq
command:
- sh
- .buildkite/nightly-benchmarks/scripts/wait-for-image.sh
- sh .buildkite/nightly-benchmarks/scripts/wait-for-image.sh
- wait
- label: "A100"
agents:
.buildkite/nightly-benchmarks/scripts/wait-for-image.sh (4 changes: 3 additions & 1 deletion)
@@ -2,9 +2,11 @@
TOKEN=$(curl -s -L "https://public.ecr.aws/token?service=public.ecr.aws&scope=repository:q9t5s3a7/vllm-ci-test-repo:pull" | jq -r .token)
URL="https://public.ecr.aws/v2/q9t5s3a7/vllm-ci-test-repo/manifests/$BUILDKITE_COMMIT"

TIMEOUT_SECONDS=10

retries=0
while [ $retries -lt 1000 ]; do
if [ $(curl -s -L -H "Authorization: Bearer $TOKEN" -o /dev/null -w "%{http_code}" $URL) -eq 200 ]; then
if [ $(curl -s --max-time $TIMEOUT_SECONDS -L -H "Authorization: Bearer $TOKEN" -o /dev/null -w "%{http_code}" $URL) -eq 200 ]; then
exit 0
fi

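For readers skimming the diff, this is roughly the polling loop that results from the change above: the new `--max-time $TIMEOUT_SECONDS` caps each registry probe so a stalled request can no longer hang the Buildkite step. The sleep interval, retry increment, and final failure exit below are assumptions, since those lines fall outside the visible hunk.

```bash
#!/bin/bash
# Sketch of the wait-for-image polling loop after this change.
# The sleep interval, retry increment, and final exit are assumed (not shown in the hunk).

TOKEN=$(curl -s -L "https://public.ecr.aws/token?service=public.ecr.aws&scope=repository:q9t5s3a7/vllm-ci-test-repo:pull" | jq -r .token)
URL="https://public.ecr.aws/v2/q9t5s3a7/vllm-ci-test-repo/manifests/$BUILDKITE_COMMIT"

TIMEOUT_SECONDS=10

retries=0
while [ $retries -lt 1000 ]; do
    # --max-time bounds the whole request, so a hung registry connection fails fast
    if [ "$(curl -s --max-time $TIMEOUT_SECONDS -L -H "Authorization: Bearer $TOKEN" \
            -o /dev/null -w "%{http_code}" "$URL")" -eq 200 ]; then
        exit 0
    fi
    retries=$((retries + 1))  # assumed
    sleep 5                   # assumed interval between probes
done

exit 1  # assumed: the image never became available
```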
.buildkite/run-amd-test.sh (11 changes: 11 additions & 0 deletions)
@@ -83,6 +83,7 @@ if [[ $commands == *" kernels "* ]]; then
--ignore=kernels/test_encoder_decoder_attn.py \
--ignore=kernels/test_flash_attn.py \
--ignore=kernels/test_flashinfer.py \
--ignore=kernels/test_gguf.py \
--ignore=kernels/test_int8_quant.py \
--ignore=kernels/test_machete_gemm.py \
--ignore=kernels/test_mamba_ssm.py \
@@ -93,6 +94,16 @@ if [[ $commands == *" kernels "* ]]; then
--ignore=kernels/test_sampler.py"
fi

#ignore certain Entrypoints tests
if [[ $commands == *" entrypoints/openai "* ]]; then
commands=${commands//" entrypoints/openai "/" entrypoints/openai \
--ignore=entrypoints/openai/test_accuracy.py \
--ignore=entrypoints/openai/test_audio.py \
--ignore=entrypoints/openai/test_encoder_decoder.py \
--ignore=entrypoints/openai/test_embedding.py \
--ignore=entrypoints/openai/test_oot_registration.py "}
fi

PARALLEL_JOB_COUNT=8
# check if the command contains shard flag, we will run all shards in parallel because the host have 8 GPUs.
if [[ $commands == *"--shard-id="* ]]; then
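The hunk above relies on bash's `${var//pattern/replacement}` substitution to splice extra `--ignore` flags into the test command only when it targets `entrypoints/openai`. A minimal, self-contained sketch of that idiom follows; the command string and the single `--ignore` flag here are illustrative only.

```bash
#!/bin/bash
# Illustration of the substitution idiom used in run-amd-test.sh.
# The command string below is hypothetical; only the technique matters.

commands="pytest -v -s entrypoints/openai "

# Only rewrite the command when it actually runs the OpenAI entrypoint tests.
if [[ $commands == *" entrypoints/openai "* ]]; then
  # Replace the bare target with the same target plus an --ignore flag.
  commands=${commands//" entrypoints/openai "/" entrypoints/openai --ignore=entrypoints/openai/test_audio.py "}
fi

echo "$commands"
# pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_audio.py
```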
.buildkite/run-cpu-test.sh (2 changes: 1 addition & 1 deletion)
@@ -22,7 +22,7 @@ docker exec cpu-test-avx2 bash -c "python3 examples/offline_inference.py"

# Run basic model test
docker exec cpu-test bash -c "
pip install pytest matplotlib einops transformers_stream_generator
pip install pytest matplotlib einops transformers_stream_generator datamodel_code_generator
pytest -v -s tests/models/decoder_only/language \
--ignore=tests/models/test_fp8.py \
--ignore=tests/models/decoder_only/language/test_jamba.py \
.buildkite/test-pipeline.yaml (45 changes: 34 additions & 11 deletions)
@@ -43,13 +43,15 @@ steps:
fast_check: true
source_file_dependencies:
- vllm/
- tests/mq_llm_engine
- tests/async_engine
- tests/test_inputs
- tests/multimodal
- tests/test_utils
- tests/worker
commands:
- pytest -v -s async_engine # Async Engine
- pytest -v -s mq_llm_engine # MQLLMEngine
- pytest -v -s async_engine # AsyncLLMEngine
- NUM_SCHEDULER_STEPS=4 pytest -v -s async_engine/test_async_llm_engine.py
- pytest -v -s test_inputs.py
- pytest -v -s multimodal
@@ -68,7 +70,7 @@ steps:
- VLLM_ATTENTION_BACKEND=XFORMERS pytest -v -s basic_correctness/test_chunked_prefill.py
- VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s basic_correctness/test_chunked_prefill.py
- VLLM_TEST_ENABLE_ARTIFICIAL_PREEMPT=1 pytest -v -s basic_correctness/test_preemption.py

- label: Core Test # 10min
mirror_hardwares: [amd]
fast_check: true
@@ -82,14 +84,17 @@ steps:
- label: Entrypoints Test # 20min
working_dir: "/vllm-workspace/tests"
fast_check: true
#mirror_hardwares: [amd]
mirror_hardwares: [amd]
source_file_dependencies:
- vllm/
commands:
- pip install -e ./plugins/vllm_add_dummy_model
- pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@a4987bba6e9e9b3f22bd3a6c1ecf0abd04fd5622#egg=lm_eval[api]
- pytest -v -s entrypoints/llm --ignore=entrypoints/llm/test_lazy_outlines.py
- pytest -v -s entrypoints/llm --ignore=entrypoints/llm/test_lazy_outlines.py --ignore=entrypoints/llm/test_generate.py --ignore=entrypoints/llm/test_generate_multiple_loras.py --ignore=entrypoints/llm/test_guided_generate.py
- pytest -v -s entrypoints/llm/test_lazy_outlines.py # it needs a clean process
- pytest -v -s entrypoints/llm/test_generate.py # it needs a clean process
- pytest -v -s entrypoints/llm/test_generate_multiple_loras.py # it needs a clean process
- pytest -v -s entrypoints/llm/test_guided_generate.py # it needs a clean process
- pytest -v -s entrypoints/openai
- pytest -v -s entrypoints/test_chat_utils.py
- pytest -v -s entrypoints/offline_mode # Needs to avoid interference with other tests
@@ -163,13 +168,6 @@ steps:
- python3 tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
- python3 offline_inference_encoder_decoder.py

- label: torch compile integration test
source_file_dependencies:
- vllm/
commands:
- pytest -v -s ./compile/test_full_graph.py
- pytest -v -s ./compile/test_wrapper.py

- label: Prefix Caching Test # 7min
#mirror_hardwares: [amd]
source_file_dependencies:
@@ -212,6 +210,21 @@ steps:
command: pytest -v -s lora --shard-id=$$BUILDKITE_PARALLEL_JOB --num-shards=$$BUILDKITE_PARALLEL_JOB_COUNT --ignore=lora/test_long_context.py
parallelism: 4

- label: "PyTorch Fullgraph Smoke Test"
fast_check: true
source_file_dependencies:
- vllm/
- tests/compile
commands:
- pytest -v -s compile/test_full_graph_smoke.py

- label: "PyTorch Fullgraph Test"
source_file_dependencies:
- vllm/
- tests/compile
commands:
- pytest -v -s compile/test_full_graph.py

- label: Kernels Test %N # 30min each
mirror_hardwares: [amd]
source_file_dependencies:
@@ -259,6 +272,13 @@ steps:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- bash ./run-tests.sh -c configs/models-small.txt -t 1

- label: Encoder Decoder tests # 5min
source_file_dependencies:
- vllm/
- tests/encoder_decoder
commands:
- pytest -v -s encoder_decoder

- label: OpenAI-Compatible Tool Use # 20 min
fast_check: false
mirror_hardwares: [ amd ]
@@ -348,7 +368,10 @@ steps:
- vllm/executor/
- vllm/model_executor/models/
- tests/distributed/
- vllm/compilation
commands:
- pytest -v -s ./compile/test_full_graph_multi_gpu.py
- pytest -v -s ./compile/test_wrapper.py
- VLLM_TEST_SAME_HOST=1 torchrun --nproc-per-node=4 distributed/test_same_node.py | grep -q 'Same node test passed'
- TARGET_TEST_SUITE=L4 pytest basic_correctness/ -v -s -m distributed_2_gpus
# Avoid importing model tests that cause CUDA reinitialization error
.github/workflows/ruff.yml (4 changes: 2 additions & 2 deletions)
@@ -25,10 +25,10 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install ruff==0.1.5 codespell==2.3.0 tomli==2.0.1 isort==5.13.2
pip install -r requirements-lint.txt
- name: Analysing the code with ruff
run: |
ruff .
ruff check .
- name: Spelling check with codespell
run: |
codespell --toml pyproject.toml
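The workflow change above swaps a hard-coded tool list for `requirements-lint.txt` and switches to the `ruff check` subcommand (newer Ruff releases deprecate the bare `ruff .` form). A hedged sketch of running the same checks locally, assuming `requirements-lint.txt` pins the tools used in CI:

```bash
# Run the lint steps from the workflow locally.
python -m pip install --upgrade pip
pip install -r requirements-lint.txt   # assumed to pin ruff, codespell, and friends at CI versions
ruff check .                           # "check" is the explicit lint subcommand in newer ruff
codespell --toml pyproject.toml
```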
.github/workflows/scripts/build.sh (1 change: 1 addition & 0 deletions)
@@ -15,5 +15,6 @@ $python_executable -m pip install -r requirements-cuda.txt
export MAX_JOBS=1
# Make sure release wheels are built for the following architectures
export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
export VLLM_FA_CMAKE_GPU_ARCHES="80-real;90-real"
# Build
$python_executable setup.py bdist_wheel --dist-dir=dist
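A hedged local equivalent of the release build above. The semantics of `VLLM_FA_CMAKE_GPU_ARCHES` are an assumption here: it appears to restrict the from-source vllm-flash-attn build (introduced in #8245) to the listed CUDA architectures.

```bash
#!/bin/bash
# Sketch of reproducing the release wheel build locally with the same environment.
python -m pip install -r requirements-cuda.txt

export MAX_JOBS=1                                        # limit build parallelism
export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
export VLLM_FA_CMAKE_GPU_ARCHES="80-real;90-real"        # assumed: arches for the vllm-flash-attn source build

python setup.py bdist_wheel --dist-dir=dist
```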
.gitignore (9 changes: 7 additions & 2 deletions)
@@ -1,5 +1,8 @@
# vllm commit id, generated by setup.py
vllm/commit_id.py
# version file generated by setuptools-scm
/vllm/_version.py

# vllm-flash-attn built from source
vllm/vllm_flash_attn/

# Byte-compiled / optimized / DLL files
__pycache__/
@@ -12,6 +15,8 @@ __pycache__/
# Distribution / packaging
.Python
build/
cmake-build-*/
CMakeUserPresets.json
develop-eggs/
dist/
downloads/