Benchmark performance on NVIDIA A10

Here are some preliminary Mooncake benchmark results on NVIDIA A10 GPUs with up to 2 RDMA NICs. We are currently having trouble benchmarking PyNcclConnector: for reasons we have not yet identified, it crashes frequently in inter-node disaggregated scenarios, so these results do not include PyNcclConnector yet.

We are also arranging access to machines with more RDMA NICs and more advanced GPUs. Official benchmark results will be released in due time.

Varying tp (input length = 1024, qps = 2, output length = 6)

| Setting | num_rdma_nic | Successful Requests | Duration (s) | Total Input Tokens | Total Generated Tokens | Req Throughput (req/s) | Output Token Throughput (tok/s) | Total Token Throughput (tok/s) | Mean TTFT (ms) | Median TTFT (ms) | P99 TTFT (ms) | Mean TPOT (ms) | Median TPOT (ms) | P99 TPOT (ms) | Mean ITL (ms) | Median ITL (ms) | P99 ITL (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tp = 1 | 2 | 200 | 99.47 | 201995 | 1200 | 2.01 | 12.06 | 2042.74 | 1056.76 | 635.00 | 4006.59 | 97.08 | 26.94 | 781.91 | 97.01 | 14.05 | 2205.51 |
| tp = 2 | 2 | 200 | 98.98 | 201995 | 1200 | 2.02 | 12.12 | 2052.95 | 314.87 | 231.20 | 949.40 | 25.65 | 15.56 | 129.60 | 25.62 | 15.48 | 288.06 |
| tp = 4 | 2 | 200 | 98.76 | 201995 | 1200 | 2.03 | 12.15 | 2057.44 | 198.10 | 160.03 | 461.61 | 23.52 | 18.93 | 94.38 | 23.50 | 18.01 | 187.79 |
| tp = 1 | 1 | 200 | 99.44 | 201995 | 1200 | 2.01 | 12.07 | 2043.39 | 1071.12 | 631.56 | 4361.02 | 83.93 | 26.93 | 794.75 | 83.86 | 14.13 | 1932.66 |
| tp = 2 | 1 | 200 | 98.96 | 201995 | 1200 | 2.02 | 12.13 | 2053.35 | 335.26 | 258.30 | 997.93 | 28.84 | 15.56 | 144.82 | 28.80 | 15.42 | 397.56 |
| tp = 4 | 1 | 200 | 98.78 | 201995 | 1200 | 2.02 | 12.15 | 2057.03 | 201.68 | 162.85 | 456.33 | 22.31 | 16.74 | 94.76 | 22.29 | 16.73 | 189.13 |
| tp = 1 | TCP | 200 | 99.55 | 201995 | 1200 | 2.01 | 12.05 | 2041.13 | 1414.05 | 766.23 | 6035.36 | 155.01 | 35.28 | 1191.24 | 154.91 | 14.32 | 3148.99 |
| tp = 2 | TCP | 200 | 98.97 | 201995 | 1200 | 2.02 | 12.12 | 2053.03 | 333.74 | 251.32 | 954.63 | 28.74 | 15.49 | 161.24 | 28.70 | 15.35 | 393.52 |
| tp = 4 | TCP | 200 | 98.78 | 201995 | 1200 | 2.02 | 12.15 | 2056.94 | 205.37 | 162.92 | 463.70 | 21.54 | 16.51 | 94.04 | 21.51 | 16.56 | 170.54 |
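
The three throughput columns are consistent with simple ratios of the count columns to the run duration. The sketch below assumes that relationship (it is inferred from the numbers in the table, not taken from the benchmark script) and recomputes the values for the first row (tp = 1, 2 RDMA NICs). Small differences from the table come from the duration being rounded to two decimals. The function name `derive_throughput` is illustrative only.

```python
def derive_throughput(successful_requests, duration_s, input_tokens, generated_tokens):
    """Recompute the three throughput columns from counts and duration.

    Assumption: throughput = count / duration, which matches the table
    values up to rounding of the reported duration.
    """
    return {
        "req_per_s": successful_requests / duration_s,
        "output_tok_per_s": generated_tokens / duration_s,
        "total_tok_per_s": (input_tokens + generated_tokens) / duration_s,
    }


# First row of the table above: tp = 1, num_rdma_nic = 2.
row = derive_throughput(200, 99.47, 201995, 1200)
print({name: round(value, 2) for name, value in row.items()})
```

Because every run issues the same 200 requests with 6 output tokens each, the throughput columns mostly reflect the offered request rate; the latency columns (TTFT, TPOT, ITL) are where the configurations differ.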

Varying qps (input length = 1024, tp = 4, output length = 6)

| Setting | num_rdma_nic | Successful Requests | Duration (s) | Total Input Tokens | Total Generated Tokens | Req Throughput (req/s) | Output Token Throughput (tok/s) | Total Token Throughput (tok/s) | Mean TTFT (ms) | Median TTFT (ms) | P99 TTFT (ms) | Mean TPOT (ms) | Median TPOT (ms) | P99 TPOT (ms) | Mean ITL (ms) | Median ITL (ms) | P99 ITL (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| qps = 2 | 2 | 200 | 98.77 | 201995 | 1200 | 2.02 | 12.15 | 2057.33 | 200.64 | 156.62 | 478.22 | 22.63 | 17.35 | 99.61 | 22.60 | 17.08 | 186.25 |
| qps = 4 | 2 | 200 | 49.75 | 201995 | 1200 | 4.02 | 24.12 | 4084.03 | 341.88 | 240.68 | 1430.54 | 38.36 | 18.39 | 313.45 | 38.31 | 17.17 | 588.80 |
| qps = 6 | 2 | 200 | 33.44 | 201995 | 1200 | 5.98 | 35.88 | 6075.54 | 851.15 | 501.59 | 3239.89 | 102.51 | 47.67 | 606.77 | 102.34 | 18.35 | 1704.79 |
| qps = 8 | 2 | 200 | 27.16 | 201995 | 1200 | 7.36 | 44.19 | 7482.52 | 4835.08 | 5733.45 | 8846.27 | 1276.59 | 1150.11 | 4401.23 | 1274.43 | 48.34 | 20682.35 |
| qps = 2 | 1 | 200 | 98.77 | 201995 | 1200 | 2.02 | 12.15 | 2057.31 | 201.77 | 161.53 | 473.44 | 22.13 | 16.52 | 96.18 | 22.11 | 16.51 | 190.40 |
| qps = 4 | 1 | 200 | 49.76 | 201995 | 1200 | 4.02 | 24.12 | 4083.83 | 337.31 | 243.38 | 1395.85 | 39.95 | 17.61 | 325.39 | 39.88 | 17.06 | 838.68 |
| qps = 6 | 1 | 200 | 33.44 | 201995 | 1200 | 5.98 | 35.88 | 6075.99 | 820.53 | 458.84 | 3169.52 | 83.92 | 30.50 | 663.07 | 83.78 | 17.85 | 1306.32 |
| qps = 8 | 1 | 200 | 27.19 | 201995 | 1200 | 7.36 | 44.14 | 7473.44 | 5291.91 | 6160.55 | 9596.56 | 1190.36 | 1040.63 | 4418.66 | 1188.33 | 47.61 | 20815.23 |
| qps = 2 | TCP | 200 | 98.76 | 201995 | 1200 | 2.03 | 12.15 | 2057.42 | 207.22 | 160.81 | 511.01 | 22.17 | 16.59 | 94.96 | 22.15 | 16.59 | 181.82 |
| qps = 4 | TCP | 200 | 49.79 | 201995 | 1200 | 4.02 | 24.10 | 4081.06 | 355.43 | 252.63 | 1554.91 | 40.15 | 16.92 | 314.28 | 40.09 | 16.66 | 708.50 |
| qps = 6 | TCP | 200 | 33.49 | 201995 | 1200 | 5.97 | 35.83 | 6067.71 | 907.74 | 514.85 | 3253.93 | 122.75 | 45.51 | 648.40 | 122.56 | 18.09 | 2282.92 |
| qps = 8 | TCP | 200 | 28.39 | 201995 | 1200 | 7.04 | 42.26 | 7156.09 | 6714.57 | 7885.09 | 11787.51 | 1116.06 | 408.32 | 4645.25 | 1114.29 | 46.87 | 21898.03 |
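
One way to read this table is to compare the offered request rate (qps) with the achieved request throughput: when the achieved rate falls clearly below the offered rate, requests are queuing and latency grows. The sketch below applies that check to the 2-NIC rows above. The 5% tolerance and the name `is_saturated` are arbitrary choices for illustration, not part of the benchmark.

```python
# (offered qps, achieved req/s) for the num_rdma_nic = 2 rows above.
RESULTS_2_NICS = [(2, 2.02), (4, 4.02), (6, 5.98), (8, 7.36)]


def is_saturated(offered_qps, achieved_req_per_s, tolerance=0.05):
    """Flag a run whose achieved rate is more than `tolerance` below the offered rate."""
    return achieved_req_per_s < offered_qps * (1.0 - tolerance)


saturated = [qps for qps, achieved in RESULTS_2_NICS if is_saturated(qps, achieved)]
print(saturated)  # [8]
```

With this threshold only the qps = 8 run is flagged, which lines up with the jump in mean TTFT from under 1 s at qps = 6 to several seconds at qps = 8 for all three transports.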

Varying input length (tp = 4, qps = 2, output length = 6)

| Setting | num_rdma_nic | Successful Requests | Duration (s) | Total Input Tokens | Total Generated Tokens | Req Throughput (req/s) | Output Token Throughput (tok/s) | Total Token Throughput (tok/s) | Mean TTFT (ms) | Median TTFT (ms) | P99 TTFT (ms) | Mean TPOT (ms) | Median TPOT (ms) | P99 TPOT (ms) | Mean ITL (ms) | Median ITL (ms) | P99 ITL (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1024 | 2 | 200 | 98.77 | 201995 | 1200 | 2.02 | 12.15 | 2057.32 | 195.47 | 151.55 | 482.84 | 22.83 | 19.27 | 96.55 | 22.81 | 18.12 | 158.16 |
| 2048 | 2 | 200 | 99.22 | 406707 | 1200 | 2.02 | 12.09 | 4110.95 | 723.76 | 488.67 | 2941.96 | 67.25 | 18.93 | 632.73 | 67.20 | 17.49 | 1209.54 |
| 4096 | 2 | 200 | 117.42 | 818415 | 1200 | 1.70 | 10.22 | 6979.90 | 14616.48 | 18323.82 | 23191.04 | 8042.84 | 7593.16 | 19851.11 | 8040.02 | 65.43 | 93511.26 |
| 8192 | 2 | 200 | 247.77 | 1636065 | 1200 | 0.81 | 4.84 | 6608.10 | 75783.36 | 79331.60 | 147544.42 | 16961.27 | 15140.11 | 39278.98 | 16958.32 | 90.01 | 186151.61 |
| 1024 | 1 | 200 | 98.77 | 201995 | 1200 | 2.02 | 12.15 | 2057.31 | 201.77 | 161.53 | 473.44 | 22.13 | 16.52 | 96.18 | 22.11 | 16.51 | 190.40 |
| 2048 | 1 | 200 | 99.25 | 406707 | 1200 | 2.02 | 12.09 | 4109.96 | 719.43 | 482.02 | 3208.13 | 61.92 | 17.64 | 681.26 | 61.86 | 16.83 | 978.90 |
| 4096 | 1 | 200 | 111.88 | 818415 | 1200 | 1.79 | 10.73 | 7326.16 | 20362.10 | 22807.05 | 31853.55 | 5915.16 | 4521.51 | 18739.12 | 5913.18 | 67.03 | 81600.29 |
| 8192 | 1 | 200 | 270.01 | 1636065 | 1200 | 0.74 | 4.44 | 6063.79 | 103355.40 | 106546.65 | 172025.11 | 12894.35 | 11027.66 | 35110.13 | 12892.85 | 64.84 | 151774.68 |
| 1024 | TCP | 200 | 98.81 | 201995 | 1200 | 2.02 | 12.14 | 2056.44 | 203.32 | 160.83 | 460.90 | 21.81 | 16.96 | 95.27 | 21.78 | 16.91 | 171.80 |
| 2048 | TCP | 200 | 99.27 | 406707 | 1200 | 2.01 | 12.09 | 4108.98 | 731.60 | 484.78 | 3213.69 | 68.55 | 17.88 | 639.93 | 68.49 | 17.33 | 1257.45 |
| 4096 | TCP | 200 | 118.37 | 818415 | 1200 | 1.69 | 10.14 | 6923.89 | 23735.69 | 27101.97 | 36573.47 | 6386.62 | 5102.00 | 20032.26 | 6384.71 | 69.57 | 92811.27 |
| 8192 | TCP | 200 | 278.12 | 1636065 | 1200 | 0.72 | 4.31 | 5886.95 | 106873.23 | 109941.33 | 179781.64 | 13360.87 | 12155.24 | 36022.96 | 13359.20 | 68.01 | 156716.38 |
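
To compare transports at a given input length, it can help to express mean TTFT relative to the TCP baseline. The sketch below does this for the 2-NIC and TCP rows above; the dictionary and function names are illustrative, and since these are single preliminary runs the percentages should be read as rough indications rather than stable measurements.

```python
# Mean TTFT (ms) by input length, copied from the table above.
MEAN_TTFT_2_NICS = {1024: 195.47, 2048: 723.76, 4096: 14616.48, 8192: 75783.36}
MEAN_TTFT_TCP = {1024: 203.32, 2048: 731.60, 4096: 23735.69, 8192: 106873.23}


def ttft_reduction_pct(rdma_ms, tcp_ms):
    """Percentage by which mean TTFT is lower than the TCP baseline."""
    return (tcp_ms - rdma_ms) / tcp_ms * 100.0


reduction = {
    length: ttft_reduction_pct(MEAN_TTFT_2_NICS[length], MEAN_TTFT_TCP[length])
    for length in MEAN_TTFT_TCP
}
for length, pct in reduction.items():
    print(f"input length {length}: {pct:.1f}% lower mean TTFT than TCP")
```

On these numbers the gap to TCP is only a few percent at 1024 and 2048 tokens and becomes much larger at 4096 and 8192 tokens, where the achieved request rate also drops below the offered 2 req/s.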