Misc. bug: server api endpoint /completion ignoring grammar parameter #11544

Open
norteo opened this issue Jan 31, 2025 · 2 comments

norteo commented Jan 31, 2025

Name and Version

version: 4600 (553f1e4)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

I run llama-server with Docker Compose.
My docker-compose.yaml is:

services:
  llamacpp-server:
    image: ghcr.io/ggerganov/llama.cpp:server-cuda
    ports:
      - 127.0.0.1:8080:8080
    volumes:
      - ../Llamafile/llamafile/appVolume:/models
    environment:
      LLAMA_ARG_MODEL: /models/dolphin-2.9.3-mistral-nemo-12b.Q6_K.gguf
      LLAMA_ARG_CTX_SIZE: 17000
      LLAMA_ARG_N_GPU_LAYERS: 9999
      LLAMA_ARG_HOST: 0.0.0.0
      LLAMA_ARG_PORT: 8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Problem description & steps to reproduce

When I pass a grammar through the "grammar" request parameter, it is ignored:

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "<|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n", "grammar": "answer ::= 'yes' " }' http://localhost:8080/completion

Response:

{"index":0,"content":"A hand has 5 fingers.","tokens":[],"id_slot":0,"stop":true,"model":"gpt-3.5-turbo","tokens_predicted":8,"tokens_evaluated":23,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":17024,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"answer ::= yes ","grammar_trigger_tokens":[],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<s><|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n","has_new_line":false,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":30,"timings":{"prompt_n":1,"prompt_ms":18.504,"prompt_per_token_ms":18.504,"prompt_per_second":54.04236921746649,"predicted_n":8,"predicted_ms":109.862,"predicted_per_token_ms":13.73275,"predicted_per_second":72.81862700478783}}

The json_schema parameter, on the other hand, works as expected:

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "<|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n", "json_schema": { "type": "number" } }' http://localhost:8080/completion

Response:

{"index":0,"content":"5","tokens":[],"id_slot":0,"stop":true,"model":"gpt-3.5-turbo","tokens_predicted":2,"tokens_evaluated":23,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":17024,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"decimal-part ::= [0-9]{1,16}\nintegral-part ::= [0] | [1-9] [0-9]{0,15}\nroot ::= (\"-\"? integral-part) (\".\" decimal-part)? ([eE] [-+]? integral-part)? space\nspace ::= | \" \" | \"\\n\" [ \\t]{0,20}\n","grammar_trigger_tokens":[],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<s><|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n","has_new_line":false,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":24,"timings":{"prompt_n":1,"prompt_ms":35.208,"prompt_per_token_ms":35.208,"prompt_per_second":28.402635764598955,"predicted_n":2,"predicted_ms":25.512,"predicted_per_token_ms":12.756,"predicted_per_second":78.3944810285356}}

First Bad Commit

No response

Relevant log output

@matteoserva
Contributor

There are two problems in your query:

  • the start rule of the grammar must be named "root"
  • the string literal must be enclosed in double quotes (escaped as \" inside the JSON string).
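
The quoting problem is visible in the response quoted in the report: the server echoed the grammar as "answer ::= yes ", without any quotes, because the shell ends the single-quoted -d argument at the inner single quote. Below is a minimal Python sketch of that behaviour. It assumes shlex.split is a close enough stand-in for POSIX shell word splitting, the -d argument is reduced to the grammar field, and grammar_seen_by_server is a hypothetical helper name.

```python
import json
import shlex

# The -d argument as typed on the command line, reduced to the grammar field.
# Raw strings keep the backslashes exactly as the shell would see them.
broken = r"""'{"grammar": "answer ::= 'yes' "}'"""
fixed = r"""'{"grammar": "root ::= \"yes\" "}'"""


def grammar_seen_by_server(shell_arg: str) -> str:
    """Apply shell-style quote removal, then decode the JSON body."""
    (payload,) = shlex.split(shell_arg)  # exactly one argument is expected
    return json.loads(payload)["grammar"]


# The inner single quotes close and reopen the shell string, so they vanish.
print(repr(grammar_seen_by_server(broken)))  # 'answer ::= yes '
# Backslash-escaped double quotes survive the shell and are decoded by JSON.
print(repr(grammar_seen_by_server(fixed)))   # 'root ::= "yes" '
```

The first result matches the "grammar" value echoed in generation_settings of the original response.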

Here is a fixed version of your example:
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "<|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n", "grammar": "root ::= \"yes\" " }' http://localhost:8080/completion
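
To sidestep shell quoting altogether, the same request can be built programmatically. This is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the URL, the request fields and the "content" response field are taken from the examples in this thread.

```python
import json
import urllib.request

PROMPT = (
    "<|im_start|>system\n<|im_end|>\n"
    "<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
# The start rule is named "root" and the literal is in double quotes.
GRAMMAR = 'root ::= "yes" '


def build_completion_request(prompt, grammar,
                             url="http://localhost:8080/completion"):
    """Build a POST request; json.dumps takes care of escaping the quotes."""
    body = json.dumps({"prompt": prompt, "grammar": grammar}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request to a running llama-server and return the generated text."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["content"]


request = build_completion_request(PROMPT, GRAMMAR)
```

With a server listening on localhost:8080, print(send(request)) would print the constrained completion.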

@norteo
Author

norteo commented Jan 31, 2025

Thanks for the quick response.
I tested what you said and it seems to work fine.
Thanks.
