
Commit 9f817f0
[CI] Reduce wheel size by not shipping debug symbols (vllm-project#4602)
simon-mo authored and robertgshaw2-neuralmagic committed on May 6, 2024
1 parent (2d96b61) · commit 9f817f0
Showing 2 changed files with 5 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .buildkite/check-wheel-size.py
@@ -25,6 +25,9 @@ def check_wheel_size(directory):
                         f"compare to the allowed size ({MAX_SIZE_MB} MB).")
                     print_top_10_largest_files(wheel_path)
                     return 1
+                else:
+                    print(f"Wheel {wheel_path} is within the allowed size "
+                          f"({wheel_size_mb} MB).")
     return 0


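For readers who want to run the check locally, here is a minimal, self-contained sketch of what the whole script looks like around this hunk. Only the if/else block comes from the diff above; the 100 MB limit, the directory walk, and the body of print_top_10_largest_files are assumptions for illustration.

```python
# Sketch of .buildkite/check-wheel-size.py; only the if/else block is taken
# verbatim from the diff above, the rest is assumed for illustration.
import os
import sys
import zipfile

MAX_SIZE_MB = 100  # assumed limit; the real value is defined earlier in the script


def print_top_10_largest_files(wheel_path):
    """Print the 10 largest members of the wheel to show what to trim."""
    with zipfile.ZipFile(wheel_path, "r") as z:
        entries = [(info.file_size, info.filename) for info in z.infolist()]
    for size, name in sorted(entries, reverse=True)[:10]:
        print(f"{name}: {size / (1024 * 1024):.2f} MB")


def check_wheel_size(directory):
    # Walk the output directory and size-check every wheel found.
    for root, _, files in os.walk(directory):
        for f in files:
            if f.endswith(".whl"):
                wheel_path = os.path.join(root, f)
                wheel_size_mb = os.path.getsize(wheel_path) / (1024 * 1024)
                if wheel_size_mb > MAX_SIZE_MB:
                    print(f"Wheel {wheel_path} is too large ({wheel_size_mb} MB) "
                          f"compare to the allowed size ({MAX_SIZE_MB} MB).")
                    print_top_10_largest_files(wheel_path)
                    return 1
                else:
                    print(f"Wheel {wheel_path} is within the allowed size "
                          f"({wheel_size_mb} MB).")
    return 0


if __name__ == "__main__":
    sys.exit(check_wheel_size(sys.argv[1]))
```

Invoked as `python .buildkite/check-wheel-size.py dist`, a script like this exits non-zero when any wheel exceeds the limit, which fails the CI step.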
2 changes: 2 additions & 0 deletions .github/workflows/publish.yml
@@ -79,6 +79,8 @@ jobs:
       - name: Build wheel
         shell: bash
+        env:
+          CMAKE_BUILD_TYPE: Release # do not compile with debug symbol to reduce wheel size
         run: |
           bash -x .github/workflows/scripts/build.sh ${{ matrix.python-version }} ${{ matrix.cuda-version }}
           wheel_name=$(ls dist/*whl | xargs -n 1 basename)
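The mechanism here is CMake's build-type handling: with CMAKE_BUILD_TYPE=Release, GCC and NVCC compile without -g, so no debug symbols are emitted into the extension binaries, whereas RelWithDebInfo compiles with -g -O2 and bloats them. vLLM's actual build.sh and setup.py are not shown in this diff; the following is only a sketch, under those assumptions, of how a CMake-driven build can pick the type up from the environment.

```python
# Sketch (not vLLM's actual build script) of how a CMake-driven build can
# honor the CMAKE_BUILD_TYPE environment variable exported by the workflow.
import os
import subprocess

# The workflow above sets CMAKE_BUILD_TYPE=Release; the fallback default here
# is an assumption for illustration. RelWithDebInfo compiles with -g -O2,
# Release with -O3 and no -g, so Release builds carry no debug symbols.
build_type = os.environ.get("CMAKE_BUILD_TYPE", "RelWithDebInfo")

subprocess.check_call([
    "cmake",
    "-S", ".",       # source tree
    "-B", "build",   # out-of-tree build directory
    f"-DCMAKE_BUILD_TYPE={build_type}",
])
subprocess.check_call(["cmake", "--build", "build", "--parallel"])
```

Stripping the shared objects after the fact (for example with `strip --strip-debug`) would shrink the wheel similarly, but selecting the build type up front avoids generating the symbols at all.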

1 comment on commit 9f817f0

@github-actions


bigger_is_better

Benchmark suite (Current: 9f817f0; Previous: df1f1a0; no previous values recorded, so no ratio):

- request_throughput: 3.8359607670245874 prompts/s
- token_throughput: 1473.0089345374417 tokens/s

Both benchmarks share the same configuration: VLLM Engine throughput, synthetic; model NousResearch/Llama-2-7b-chat-hf; max_model_len 4096; benchmark_throughput with use-all-available-gpus, input-len 256, output-len 128, num-prompts 1000; GPU NVIDIA A10G x 1; vllm_version 0.2.0; python_version 3.10.12 (main, Mar 7 2024, 18:39:53) [GCC 9.4.0]; torch_version 2.3.0+cu121.

This comment was automatically generated by a workflow using github-action-benchmark.
