Closed
Description
Name and Version
llama.cpp version: release b5589
Command Line:
./build/bin/llama-cli -m ../models/Mistral-Nemo-Instruct-2407/Mistral-Nemo-12B-Instruct-2407-F16.gguf -c 1024
Log events:
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Failed to infer a tool call example (possible template bug)
Tool call test:
Server:
./build/bin/llama-server -m ../models/Mistral-Nemo-Instruct-2407/Mistral-Nemo-12B-Instruct-2407-F16.gguf -c 1024 --host 0.0.0.0 --port 18087 --jinja
Request:
curl http://192.168.253.167:18087/v1/chat/completions -d '{
  "model": "gpt-3.5-turbo",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "python",
        "description": "Runs code in an ipython interpreter and returns the result of the execution after 60 seconds.",
        "parameters": {
          "type": "object",
          "properties": {
            "code": {
              "type": "string",
              "description": "The code to run in the ipython interpreter."
            }
          },
          "required": ["code"]
        }
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "Print a hello world message with python."
    }
  ]
}' | jq .choices
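For anyone who prefers to reproduce this from a script, here is a rough Python equivalent of the curl request above. It is only a sketch: the helper names `build_payload` and `post_chat`, the explicit Content-Type header and the 120-second timeout are my own assumptions, and the URL is simply the one from my setup.

```python
import json
import urllib.request

# URL copied from the curl command in this report; adjust for your own server.
URL = "http://192.168.253.167:18087/v1/chat/completions"


def build_payload():
    """Return the same request body as the curl command in this report."""
    return {
        "model": "gpt-3.5-turbo",
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "python",
                    "description": (
                        "Runs code in an ipython interpreter and returns the "
                        "result of the execution after 60 seconds."
                    ),
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "code": {
                                "type": "string",
                                "description": "The code to run in the ipython interpreter.",
                            }
                        },
                        "required": ["code"],
                    },
                },
            }
        ],
        "messages": [
            {"role": "user", "content": "Print a hello world message with python."}
        ],
    }


def post_chat(url, payload, timeout=120):
    """POST the payload as JSON and return the decoded 'choices' list."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: set explicitly
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["choices"]


# Example (needs a running llama-server, so it is left commented out):
# print(json.dumps(post_chat(URL, build_payload()), indent=2))
```

The last line mirrors the `| jq .choices` step of the curl command.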
Results:
[
  {
    "finish_reason": "stop",
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I assist you today?"
    }
  }
]
Last working version
The last working version is b4738 (https://github.com/ggml-org/llama.cpp/releases/tag/b4738)
Result:
[
  {
    "finish_reason": "tool_calls",
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "python",
            "arguments": "{\"code\":\"print('Hello, World!')'}}]```python\\nprint('Hello, World!')\\n```\\nHello, World!```\"}"
          },
          "id": "c9a2248d9"
        }
      ]
    }
  }
]
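To compare the two builds without reading the JSON by eye, a small check like the following might help. It is a sketch under my own assumptions: `summarize_choice` is a hypothetical helper, and the `arguments` string in the b4738 sample is a cleaned-up placeholder, not the exact string from the result above.

```python
import json


def summarize_choice(choice):
    """Classify one entry of 'choices' as a plain-text reply or a tool call."""
    message = choice.get("message") or {}
    calls = message.get("tool_calls") or []
    if not calls:
        return {"kind": "text", "content": message.get("content")}
    parsed = []
    for call in calls:
        function = call["function"]
        # 'arguments' arrives as a JSON-encoded string, so decode it here.
        parsed.append((function["name"], json.loads(function["arguments"])))
    return {"kind": "tool_calls", "calls": parsed}


# Shape returned by b5589 in this report.
b5589_choice = {
    "finish_reason": "stop",
    "index": 0,
    "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?",
    },
}

# Shape returned by b4738; the arguments value is a simplified placeholder.
b4738_choice = {
    "finish_reason": "tool_calls",
    "index": 0,
    "message": {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "type": "function",
                "function": {
                    "name": "python",
                    "arguments": "{\"code\": \"print('Hello, World!')\"}",
                },
                "id": "c9a2248d9",
            }
        ],
    },
}

print(summarize_choice(b5589_choice)["kind"])  # text
print(summarize_choice(b4738_choice)["kind"])  # tool_calls
```

A `text` result for a request that includes `tools` and clearly asks for the tool is the regression described in this report.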
Operating systems
Linux
GGML backends
CPU
Hardware
AMD Ryzen 5 4500
Models
Mistral Nemo Instruct 2407
Problem description & steps to reproduce
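With llama.cpp b5589, Mistral NeMo Instruct 2407 no longer produces tool calls through llama-server started with --jinja. The request above returns a plain-text reply with finish_reason "stop", whereas b4738 returned finish_reason "tool_calls" with a call to the python tool. At load time, b5589 also logs "Failed to infer a tool call example (possible template bug)".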
First Bad Commit
tool-call: refactor common chat / tool-call api (+ tests / fixes) (https://github.com/ggml-org/llama.cpp/pull/11900)
* tool-call refactoring: moved common_chat_* to chat.h, common_chat_templates_init return a unique_ptr to opaque type
* addressed clang-tidy lints in [test-]chat.*
* rm minja deps from util & common & move it to common/minja/
* add name & tool_call_id to common_chat_msg
* add common_chat_tool
* added json <-> tools, msgs conversions to chat.h
* fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens)
* fix deepseek r1 slow test (no longer <think> opening w/ new template)
* allow empty tools w/ auto + grammar
* fix & test server grammar & json_schema params w/ & w/o --jinja
Release b4739 (https://github.com/ggml-org/llama.cpp/releases/tag/b4739)
Relevant log output
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Failed to infer a tool call example (possible template bug)