Name and Version
version: 4600 (553f1e4)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
I use llama.cpp with docker-compose. My docker-compose.yaml is:

```yaml
services:
  llamacpp-server:
    image: ghcr.io/ggerganov/llama.cpp:server-cuda
    ports:
      - 127.0.0.1:8080:8080
    volumes:
      - ../Llamafile/llamafile/appVolume:/models
    environment:
      LLAMA_ARG_MODEL: /models/dolphin-2.9.3-mistral-nemo-12b.Q6_K.gguf
      LLAMA_ARG_CTX_SIZE: 17000
      LLAMA_ARG_N_GPU_LAYERS: 9999
      LLAMA_ARG_HOST: 0.0.0.0
      LLAMA_ARG_PORT: 8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
Problem description & steps to reproduce
If I try to force a grammar through the REST parameter `grammar`, it is just ignored:

```shell
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "<|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n", "grammar": "answer ::= 'yes' " }' http://localhost:8080/completion
```
Response:
```json
{"index":0,"content":"A hand has 5 fingers.","tokens":[],"id_slot":0,"stop":true,"model":"gpt-3.5-turbo","tokens_predicted":8,"tokens_evaluated":23,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":17024,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"answer ::= yes ","grammar_trigger_tokens":[],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<s><|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n","has_new_line":false,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":30,"timings":{"prompt_n":1,"prompt_ms":18.504,"prompt_per_token_ms":18.504,"prompt_per_second":54.04236921746649,"predicted_n":8,"predicted_ms":109.862,"predicted_per_token_ms":13.73275,"predicted_per_second":72.81862700478783}}
```
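Note that the response echoes the grammar the server actually received under `generation_settings.grammar`, which is useful for debugging: here it is `answer ::= yes `, i.e. the single quotes never reached the server because the shell consumed them. A minimal sketch of inspecting that field (using a trimmed excerpt of the response above):

```python
import json

# Trimmed excerpt of the /completion response above, kept only to the
# field relevant for debugging the grammar.
response_text = '{"generation_settings": {"grammar": "answer ::= yes "}}'

settings = json.loads(response_text)["generation_settings"]
# The echoed grammar shows the single quotes were stripped by the shell
# before the JSON body was ever sent.
print(settings["grammar"])
```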
On the other hand, the `json_schema` parameter seems to work fine:

```shell
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "<|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n", "json_schema": { "type": "number" } }' http://localhost:8080/completion
```
Response:
```json
{"index":0,"content":"5","tokens":[],"id_slot":0,"stop":true,"model":"gpt-3.5-turbo","tokens_predicted":2,"tokens_evaluated":23,"generation_settings":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":17024,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"decimal-part ::= [0-9]{1,16}\nintegral-part ::= [0] | [1-9] [0-9]{0,15}\nroot ::= (\"-\"? integral-part) (\".\" decimal-part)? ([eE] [-+]? integral-part)? space\nspace ::= | \" \" | \"\\n\" [ \\t]{0,20}\n","grammar_trigger_tokens":[],"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<s><|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n","has_new_line":false,"truncated":false,"stop_type":"eos","stopping_word":"","tokens_cached":24,"timings":{"prompt_n":1,"prompt_ms":35.208,"prompt_per_token_ms":35.208,"prompt_per_second":28.402635764598955,"predicted_n":2,"predicted_ms":25.512,"predicted_per_token_ms":12.756,"predicted_per_second":78.3944810285356}}
```
First Bad Commit
No response
Relevant log output
The string literal must be enclosed in double quotes, and the grammar's entry rule must be named `root` (note also that as written, the shell strips the single quotes from the command, so the server receives the invalid rule `answer ::= yes`). Here is a fixed version of your example:

```shell
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "<|im_start|>system\n<|im_end|>\n<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n<|im_start|>assistant\n", "grammar": "root ::= \"yes\" " }' http://localhost:8080/completion
```
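Because the grammar string has to be escaped inside JSON, which itself sits inside shell quoting, it is easy to lose quotes along the way. A sketch of building the payload programmatically so the JSON escaping is handled for you (the prompt and endpoint are taken from the example above):

```python
import json

# Build the /completion request body in Python so JSON escaping of the
# GBNF grammar is handled programmatically instead of by hand-balancing
# shell quotes.
prompt = (
    "<|im_start|>system\n<|im_end|>\n"
    "<|im_start|>user\nhow many fingers does a hand have?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
# GBNF: generation starts at the rule named "root"; string literals
# use double quotes.
grammar = 'root ::= "yes"'

payload = json.dumps({"prompt": prompt, "grammar": grammar})
print(payload)
```

The resulting string can then be POSTed to `http://localhost:8080/completion` with a `Content-Type: application/json` header, e.g. via `urllib.request` or `requests`.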