
ci: server: tests python env on github container ubuntu latest / fix n_predict #6935

Merged · 3 commits · Apr 27, 2024

Conversation

phymbert (Collaborator) commented Apr 26, 2024

Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 431 iterations 🚀

Details (performance-related PR):
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=10949.34ms p(95)=28697.81ms fails=, finish reason: stop=374 truncated=57
  • Prompt processing (pp): avg=114.27tk/s p(95)=509.19tk/s
  • Token generation (tg): avg=25.81tk/s p(95)=35.16tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=hp/ci/server/fix-python commit=a3764f8f0474ecb3ae56b550bab9d18fcc9b4cb9

prompt_tokens_seconds

[chart: llamacpp:prompt_tokens_seconds over the run (unix time 1714170824 to 1714171460) — "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 431 iterations"]
predicted_tokens_seconds

[chart: llamacpp:predicted_tokens_seconds over the run (unix time 1714170824 to 1714171460) — "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 431 iterations"]

Details

kv_cache_usage_ratio

[chart: llamacpp:kv_cache_usage_ratio over the run (unix time 1714170824 to 1714171460) — "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 431 iterations"]
requests_processing

[chart: llamacpp:requests_processing over the run (unix time 1714170824 to 1714171460) — "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 431 iterations"]

slaren (Collaborator) commented Apr 27, 2024

It looks like the server CI is still failing.

phymbert (Collaborator, Author)

That's because the CI runs on a `pull_request_target` event, so the workflow definition from the base branch is used, not the one from this PR.

It needs to be merged into the master branch first.

Tested on my fork (see the summary), and I am pretty confident this time :)
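For context, a minimal sketch of what a `pull_request_target`-triggered workflow might look like (the workflow name, job, and steps here are hypothetical, not the actual llama.cpp CI): GitHub runs the workflow file from the target (base) branch for this event, which is why fixes to the workflow itself only take effect once the PR is merged.

```yaml
# Hypothetical workflow illustrating the pull_request_target trigger.
# With this event, GitHub checks out and runs the workflow file from
# the base branch, not from the PR branch, so edits to this file in a
# PR only take effect after the PR is merged.
name: server-tests
on:
  pull_request_target:
    types: [opened, synchronize, reopened]
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Explicitly check out the PR head so the code under review
          # is what actually gets tested.
          ref: ${{ github.event.pull_request.head.sha }}
```

This trigger is typically chosen over plain `pull_request` when the workflow needs access to repository secrets for PRs from forks, at the cost of the base-branch workflow behavior described above.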

@phymbert phymbert merged commit b736833 into master Apr 27, 2024
61 of 64 checks passed
@phymbert phymbert deleted the hp/ci/server/fix-python branch April 27, 2024 15:50
nopperl pushed a commit to nopperl/llama.cpp that referenced this pull request May 5, 2024
…n_predict (ggerganov#6935)

* ci: server: fix python env

* ci: server: fix server tests after ggerganov#6638

* ci: server: fix windows is not building PR branch
2 participants