Conversation

ServeurpersoCom
Collaborator

Add a generic fallback to detect trailing tags in Jinja templates and handle forced-open reasoning blocks:

  • Detect trailing <think> tags in generic chat templates, trim the trailing whitespace, and either append the closing tag or mark the reasoning block as forced-open, depending on enable_thinking (see the sketch below)
  • Add a regression test covering a fallback template that opens the reasoning block in the prompt, verifying prompt differences, forced-open behaviour, and reasoning parsing
  • Now compatible with models using the default Jinja chat template, such as https://huggingface.co/unsloth/GLM-Z1-32B-0414-GGUF
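
A minimal sketch of the fallback logic, assuming a standalone helper; the struct and function names here are illustrative, not the actual llama.cpp symbols:

```cpp
#include <string>

// Illustrative sketch only: names are hypothetical, not llama.cpp API.
struct prompt_result {
    std::string prompt;                       // final prompt text sent to the model
    bool        thinking_forced_open = false; // reasoning block left open by the template
};

// If the rendered template ends with an opening reasoning tag (e.g. "<think>"),
// either close it immediately (thinking disabled) or remember that the model
// will start inside a reasoning block (thinking enabled).
static prompt_result handle_trailing_reasoning_tag(std::string prompt,
                                                   const std::string & open_tag,   // "<think>"
                                                   const std::string & close_tag,  // "</think>"
                                                   bool enable_thinking) {
    prompt_result res;

    // trim trailing whitespace so "<think>\n" is still detected
    size_t end = prompt.find_last_not_of(" \t\r\n");
    std::string trimmed = (end == std::string::npos) ? std::string() : prompt.substr(0, end + 1);

    if (trimmed.size() >= open_tag.size() &&
        trimmed.compare(trimmed.size() - open_tag.size(), open_tag.size(), open_tag) == 0) {
        if (enable_thinking) {
            // leave the block open: completion text up to the first close_tag is reasoning
            res.thinking_forced_open = true;
            res.prompt = trimmed;
        } else {
            // close the block immediately so the model skips reasoning
            res.prompt = trimmed + close_tag;
        }
    } else {
        res.prompt = std::move(prompt);
    }
    return res;
}
```

When the block is left forced-open, the chat parser can treat everything before the first </think> in the completion as reasoning_content, which is what the after-fix responses below show.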

github-actions bot added the testing (Everything test related) label Oct 4, 2025
@ServeurpersoCom
Collaborator Author

ServeurpersoCom commented Oct 4, 2025

Test setup: llama-server --jinja

  • Before fix, "reasoning_format":"auto", "stream":false:
(root|~/llama.cpp.pascal) curl -s -N https://www.serveurperso.com/ia/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"system","content":"Tu es un assistant utile."},{"role":"user","content":"Salut"}],"stream":false,"model":"GLM-Z1-32B-0414","reasoning_format":"auto","temperature":0.8,"max_tokens":-1,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"xtc_probability":0,"xtc_threshold":0.1,"typ_p":1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"samplers":["edkypmxt"],"timings_per_token":true}'
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Okay, the user said \"Salut\", which is French for \"Hello\". I need to respond in a friendly manner.\n\nSince they used French, maybe they want the response in French too. Let me check the history. The initial prompt was in French, but the user's previous message was \"Salut\". Wait, the user message starts with \"Tu es un assistant utile.\" which is French. Then they wrote \"Salut\". So perhaps they expect a French response.\n\nBut looking at the assistant's response, it's in English. The user set the initial instruction in French, so maybe they prefer French. However, sometimes people switch languages. Let me confirm. The user's messages are in French, so I should reply in French to be consistent.\n\nSo, the correct response would be in French: \"Salut ! Comment puis-je t'aider aujourd'hui ?\" That's friendly and helpful. Make sure the greeting matches and the offer to assist is clear.\n</think>\nSalut ! comment puis-je t'aider aujourd'hui ? 😊"}}],"created":1759601227,"model":"GLM-Z1-32B-0414","system_fingerprint":"b6699-3fd608ce","object":"chat.completion","usage":{"completion_tokens":216,"prompt_tokens":19,"total_tokens":235},"id":"chatcmpl-8IbqU9EHbs3gq1r4tysaPBq4SIoJe8Az","timings":{"cache_n":14,"prompt_n":5,"prompt_ms":41.655,"prompt_per_token_ms":8.331,"prompt_per_second":120.03360941063498,"predicted_n":216,"predicted_ms":4130.0,"predicted_per_token_ms":19.12037037037037,"predicted_per_second":52.300242130750604}}
  • Before fix, "reasoning_format":"none", "stream":false:
(root|~/llama.cpp.pascal) curl -s -N https://www.serveurperso.com/ia/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"system","content":"Tu es un assistant utile."},{"role":"user","content":"Salut"}],"stream":false,"model":"GLM-Z1-32B-0414","reasoning_format":"none","temperature":0.8,"max_tokens":-1,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"xtc_probability":0,"xtc_threshold":0.1,"typ_p":1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"samplers":["edkypmxt"],"timings_per_token":true}'
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Okay, the user greeted me with \"Salut\" which is French for \"Hello\". I should respond in French to keep the conversation natural. I'll say \"Salut ! Comment puis-je t'aider aujourd'hui ?\" which means \"Hello! How can I assist you today?\" That's friendly and open for them to ask for help. Let me double-check the spelling to make sure there are no mistakes. Yep, looks good. I'll go with that.\n</think>\nSalut ! Comment puis-je t'aider aujourd'hui ?"}}],"created":1759601236,"model":"GLM-Z1-32B-0414","system_fingerprint":"b6699-3fd608ce","object":"chat.completion","usage":{"completion_tokens":113,"prompt_tokens":19,"total_tokens":132},"id":"chatcmpl-gvxkBcwrkMukqSyBsVwCBAkJVgQTEzDC","timings":{"cache_n":18,"prompt_n":1,"prompt_ms":19.573,"prompt_per_token_ms":19.573,"prompt_per_second":51.090788330863944,"predicted_n":113,"predicted_ms":2150.993,"predicted_per_token_ms":19.035336283185842,"predicted_per_second":52.53387621438099}}
  • After fix, "reasoning_format":"auto", "stream":false:
(root|~/llama.cpp.pascal) curl -s -N https://www.serveurperso.com/ia/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"system","content":"Tu es un assistant utile."},{"role":"user","content":"Salut"}],"stream":false,"model":"GLM-Z1-32B-0414","reasoning_format":"auto","temperature":0.8,"max_tokens":-1,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"xtc_probability":0,"xtc_threshold":0.1,"typ_p":1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"samplers":["edkypmxt"],"timings_per_token":true}'
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","reasoning_content":"Okay, the user said \"Salut\", which is French for \"Hi\". I should respond in a friendly manner. Let me greet them back and ask how I can assist them. Keep it simple and welcoming. Maybe \"Salut ! Comment puis-je vous aider aujourd'hui ?\" That's \"Hi! How can I assist you today?\" in French. They might expect a French response since they used French. Let me make sure the translation is correct. Yes, that works. Keep the tone polite and open-ended.","content":"Salut ! Comment puis-je vous aider aujourd'hui ?"}}],"created":1759605045,"model":"GLM-Z1-32B-0414","system_fingerprint":"b6699-3fd608ce","object":"chat.completion","usage":{"completion_tokens":122,"prompt_tokens":19,"total_tokens":141},"id":"chatcmpl-CyUbsLJowfBgwu2oJsTmR8uYEVzTAI41","timings":{"cache_n":14,"prompt_n":5,"prompt_ms":42.02,"prompt_per_token_ms":8.404,"prompt_per_second":118.99095668729176,"predicted_n":122,"predicted_ms":2321.744,"predicted_per_token_ms":19.030688524590165,"predicted_per_second":52.54670626908048}}
  • After fix, "reasoning_format":"none", "stream":false:
(root|~/llama.cpp.pascal) curl -s -N https://www.serveurperso.com/ia/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"system","content":"Tu es un assistant utile."},{"role":"user","content":"Salut"}],"stream":false,"model":"GLM-Z1-32B-0414","reasoning_format":"none","temperature":0.8,"max_tokens":-1,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"xtc_probability":0,"xtc_threshold":0.1,"typ_p":1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"samplers":["edkypmxt"],"timings_per_token":true}'
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Okay, the user said \"Salut\" which is French for \"Hi.\" I should respond in French to keep the conversation consistent.\n\nLet me make sure to use a friendly and welcoming tone. Maybe say \"Salut ! Comment puis-je t'aider aujourd'hui ?\" which means \"Hi! How can I assist you today?\"\n\nI should double-check the spelling and grammar to ensure it's correct. Also, keeping it simple and straightforward would be best.\n</think>\nSalut ! Comment puis-je t'aider aujourd'hui ?"}}],"created":1759605067,"model":"GLM-Z1-32B-0414","system_fingerprint":"b6699-3fd608ce","object":"chat.completion","usage":{"completion_tokens":110,"prompt_tokens":19,"total_tokens":129},"id":"chatcmpl-LBffKWe3rnTJlz0u5ytmlYMXesZX9UKu","timings":{"cache_n":18,"prompt_n":1,"prompt_ms":27.242,"prompt_per_token_ms":27.242,"prompt_per_second":36.708024374128186,"predicted_n":110,"predicted_ms":2091.538,"predicted_per_token_ms":19.01398181818182,"predicted_per_second":52.592876629542474}}

This also works with "stream": true, together with the streaming-aware parser from #16394.
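
For streaming, the forced-open flag tells the parser that the completion starts inside the reasoning block. A rough sketch of the idea only; this is not the parser from #16394, and the class name is made up:

```cpp
#include <string>

// Illustrative incremental splitter: routes streamed deltas to reasoning_content
// until the closing tag appears, then to content. Not the #16394 implementation.
struct reasoning_splitter {
    bool        in_reasoning;      // start true when the prompt left <think> open
    std::string buffer;            // holds a partial "</think>" split across deltas
    std::string reasoning_content; // accumulated reasoning text
    std::string content;           // accumulated visible answer text

    explicit reasoning_splitter(bool thinking_forced_open)
        : in_reasoning(thinking_forced_open) {}

    void feed(const std::string & delta, const std::string & close_tag = "</think>") {
        if (!in_reasoning) { content += delta; return; }
        buffer += delta;
        size_t pos = buffer.find(close_tag);
        if (pos == std::string::npos) {
            // keep a tail that could be the start of a close tag split across deltas
            size_t keep  = close_tag.size() - 1;
            size_t flush = buffer.size() > keep ? buffer.size() - keep : 0;
            reasoning_content += buffer.substr(0, flush);
            buffer.erase(0, flush);
        } else {
            reasoning_content += buffer.substr(0, pos);
            content += buffer.substr(pos + close_tag.size());
            buffer.clear();
            in_reasoning = false;
        }
    }
};
```

Fed the deltas of the "after fix" example above, this would accumulate the model's deliberation in reasoning_content and only the final "Salut ! …" reply in content.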
