Name and Version
version: 4600 (553f1e4)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
I use llama.cpp with docker-compose. My docker-compose.yaml is:

```yaml
services:
  llamacpp-server:
    image: ghcr.io/ggerganov/llama.cpp:server-cuda
    ports:
      - 127.0.0.1:8080:8080
    volumes:
      - ../Llamafile/llamafile/appVolume:/models
    environment:
      LLAMA_ARG_MODEL: /models/dolphin-2.9.3-mistral-nemo-12b.Q6_K.gguf
      LLAMA_ARG_CTX_SIZE: 17000
      LLAMA_ARG_N_GPU_LAYERS: 9999
      LLAMA_ARG_HOST: 0.0.0.0
      LLAMA_ARG_PORT: 8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
Problem description & steps to reproduce
If I try to force a grammar through the REST parameter `grammar`, it is just ignored:

```shell
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "<|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n", "grammar": "answer ::= 'yes' " }' http://localhost:8080/completion
```
Response:
```json
{"index":0,"content":"A hand has 5 fingers.","tokens":[],"id_slot":0,"stop":true,"model":"gpt-3.5-turbo","tokens_predicted":8,"tokens_evaluated":23,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":17024,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"answer ::= yes ","grammar_trigger_tokens":[],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<s><|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n","has_new_line":false,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":30,"timings":{"prompt_n":1,"prompt_ms":18.504,"prompt_per_token_ms":18.504,"prompt_per_second":54.04236921746649,"predicted_n":8,"predicted_ms":109.862,"predicted_per_token_ms":13.73275,"predicted_per_second":72.81862700478783}}
```
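Note that the response echoes the grammar the server actually received under `generation_settings.grammar`, which is useful for debugging: here it is `answer ::= yes `, i.e. the single quotes never reached the server because the shell consumed them. A minimal sketch of inspecting that field (using a trimmed excerpt of the response above):

```python
import json

# Trimmed excerpt of the /completion response above, kept only to the
# field relevant for debugging the grammar.
response_text = '{"generation_settings": {"grammar": "answer ::= yes "}}'

settings = json.loads(response_text)["generation_settings"]
# The echoed grammar shows the single quotes were stripped by the shell
# before the JSON body was ever sent.
print(settings["grammar"])
```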
On the other hand, the `json_schema` parameter seems to work fine:

```shell
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "<|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n", "json_schema": { "type": "number" } }' http://localhost:8080/completion
```
Response:
```json
{"index":0,"content":"5","tokens":[],"id_slot":0,"stop":true,"model":"gpt-3.5-turbo","tokens_predicted":2,"tokens_evaluated":23,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":17024,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"decimal-part ::= [0-9]{1,16}\nintegral-part ::= [0] | [1-9] [0-9]{0,15}\nroot ::= (\"-\"? integral-part) (\".\" decimal-part)? ([eE] [-+]? integral-part)? space\nspace ::= | \" \" | \"\\n\" [ \\t]{0,20}\n","grammar_trigger_tokens":[],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<s><|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n","has_new_line":false,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":24,"timings":{"prompt_n":1,"prompt_ms":35.208,"prompt_per_token_ms":35.208,"prompt_per_second":28.402635764598955,"predicted_n":2,"predicted_ms":25.512,"predicted_per_token_ms":12.756,"predicted_per_second":78.3944810285356}}
```
First Bad Commit
No response
Relevant log output
The string literal must be enclosed in double quotes, and the grammar's entry rule must be named `root` (note also that as written, the shell strips the single quotes from the command, so the server receives the invalid rule `answer ::= yes`). Here is a fixed version of your example:

```shell
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "<|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n", "grammar": "root ::= \"yes\" " }' http://localhost:8080/completion
```
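Because the grammar string has to be escaped inside JSON, which itself sits inside shell quoting, it is easy to lose quotes along the way. A sketch of building the payload programmatically so the JSON escaping is handled for you (the prompt and endpoint are taken from the example above):

```python
import json

# Build the /completion request body in Python so JSON escaping of the
# GBNF grammar is handled programmatically instead of by hand-balancing
# shell quotes.
prompt = (
    "<|im_start|>system\n<|im_end|>\n"
    "<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
# GBNF: generation starts at the rule named "root"; string literals
# use double quotes.
grammar = 'root ::= "yes"'

payload = json.dumps({"prompt": prompt, "grammar": grammar})
print(payload)
```

The resulting string can then be POSTed to `http://localhost:8080/completion` with a `Content-Type: application/json` header, e.g. via `urllib.request` or `requests`.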