
CUDA non-determinism on identical requests #2838

@phiharri

Description

When layers are offloaded with CUDA, sending identical requests to the examples/server completion API returns one response for the first request and a different response for every subsequent request:

$ for x in `seq 5`; do curl -s -X POST --url 'http://miku:8080/completion' --data '{"prompt":"Some random words:","n_predict":50,"seed":1337}' | jq '.content' ; done
" hollow, glowing, tinkling, crunchy, sinking"
" apartment, blouse, bobby, carousel"
" apartment, blouse, bobby, carousel"
" apartment, blouse, bobby, carousel"
" apartment, blouse, bobby, carousel"
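The reproduction loop above can be turned into a quick pass/fail check by counting distinct responses. A deterministic server should report 1, and the behaviour described here would report 2. This is a minimal bash sketch assuming curl and jq are installed. `count_distinct_outputs` and `completion` are hypothetical names, `miku:8080` is the reporter's host, and each response is assumed to print on a single line (which `jq '.content'` does for a JSON string).

```shell
#!/usr/bin/env bash
# Sketch: run a command N times and print how many distinct outputs it produced.
# Assumes each run prints exactly one line, as `jq '.content'` does for a JSON string.
# A deterministic server should give 1; the behaviour in this report gives 2.
count_distinct_outputs() {
  local n="$1"; shift
  local i
  for ((i = 0; i < n; i++)); do
    "$@"
  done | sort -u | wc -l | tr -d ' '
}

# Hypothetical wrapper for the request used above (miku:8080 is the reporter's host).
completion() {
  curl -s -X POST --url 'http://miku:8080/completion' \
    --data '{"prompt":"Some random words:","n_predict":50,"seed":1337}' | jq '.content'
}

# Usage: count_distinct_outputs 5 completion
```

Sorting before `uniq`-style deduplication (`sort -u`) is what makes the count independent of which run produced which output.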

This seems to be cache-related: responses then stay the same until a different prompt is processed, after which the first response differs again:

$ curl -s -X POST --url 'http://miku:8080/completion' --data '{"prompt":"Building a website is as simple as","n_predict":0}' >/dev/null

$ for x in `seq 5`; do curl -s -X POST --url 'http://miku:8080/completion' --data '{"prompt":"Some random words:","n_predict":50,"seed":1337}' | jq '.content' ; done
" hollow, glowing, tinkling, crunchy, sinking"
" apartment, blouse, bobby, carousel"
" apartment, blouse, bobby, carousel"
...
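The specific pattern here (after a prompt switch, only the first response differs and the rest agree) can be distinguished from general instability by classifying the repeated outputs. This is a minimal bash sketch assuming curl and jq. `classify_repeats`, `switch_prompt` and `random_words` are hypothetical names, `miku:8080` is the reporter's host, and each run's output is compared as a whole string.

```shell
#!/usr/bin/env bash
# Sketch: run a command N (>= 2) times and classify its outputs as
#   "stable"        all runs agree,
#   "first-differs" only the first run disagrees with the rest,
#   "unstable"      some later run disagrees with the second.
classify_repeats() {
  local n="$1"; shift
  if (( n < 2 )); then
    echo "classify_repeats: need at least 2 runs" >&2
    return 1
  fi
  local -a out=()
  local i
  for ((i = 0; i < n; i++)); do
    out+=("$("$@")")
  done
  # Runs 2..n-1 must all match run 1, otherwise the output is not settling.
  for ((i = 2; i < n; i++)); do
    if [[ "${out[i]}" != "${out[1]}" ]]; then
      echo "unstable"
      return 0
    fi
  done
  if [[ "${out[0]}" == "${out[1]}" ]]; then
    echo "stable"
  else
    echo "first-differs"
  fi
}

# Hypothetical wrappers for the two requests above (miku:8080 is the reporter's host).
switch_prompt() {
  curl -s -X POST --url 'http://miku:8080/completion' \
    --data '{"prompt":"Building a website is as simple as","n_predict":0}' >/dev/null
}
random_words() {
  curl -s -X POST --url 'http://miku:8080/completion' \
    --data '{"prompt":"Some random words:","n_predict":50,"seed":1337}' | jq '.content'
}

# Usage: switch_prompt; classify_repeats 5 random_words
```

Going by the behaviour reported here, this would be expected to print `first-differs` with CUDA offload and `stable` without it.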

Expected Behaviour

Output should remain the same when parameters and seed are constant.

Other Observations

  • Not observed with Metal offload.
  • Not observed without CUDA offload, and interestingly not with a small number of offloaded layers either (e.g. CodeLlama-34b shows this behaviour with -ngl 3 but not with -ngl 2).
  • Behaviour observed with both non-K and K-quants.
  • The first response ("hollow, glowing.." above) is what examples/main returns with the same parameters.

Environment

  • Verified on the latest master commit, built with LLAMA_CUBLAS=1 make -j
  • Linux 5.15.0-79-generic x86_64
  • NVIDIA 535.86.05
  • CUDA 12.2
  • Python 3.10.12
  • GNU Make 4.3
  • g++ 11.4.0

Thanks for reading! 😎

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), good first issue (Good for newcomers)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
