Function/tool calling never resolves #2986

Open
awmartin opened this issue Feb 2, 2025 · 8 comments

awmartin commented Feb 2, 2025

Description

When using the inference client with function calling, models never seem to resolve their tool calls.

With the OpenAI pattern, the simplest function/tool call is typically a series of messages of various roles (system, user, assistant, tool), organized like this:

system → user ("what's the weather?") → assistant (tool_calls) → tool (result: "4ºC") → assistant (content: "it's 4ºC")
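
Spelled out as OpenAI-style message objects, that resolved exchange looks roughly like the sketch below (the weather tool name and values are purely illustrative):

```python
# Illustrative only: the resolved tool-calling exchange as OpenAI-style messages.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather?"},
    {   # assistant asks for a tool call
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "0",
            "type": "function",
            "function": {
                "name": "get_current_temperature",
                "arguments": '{"location": "Philadelphia, PA, US", "unit": "Celsius"}',
            },
        }],
    },
    {   # tool result, referencing the call id
        "role": "tool",
        "tool_call_id": "0",
        "name": "get_current_temperature",
        "content": "4",
    },
    # expected final turn:
    {"role": "assistant", "content": "It's 4ºC."},
]
```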

The HF docs seem to indicate this is the same pattern, although the messages have some minor differences (e.g. description: null, which never appears with OpenAI). When using the Python inference client, however, these tool_calls never resolve, even after the functions are called and their return values are included and seemingly properly referenced. Instead, the exchange looks like this:

system → user ("what's the weather?") → assistant (tool_calls) → tool (result: "4ºC") → assistant (tool_calls) …

Instead of returning a text completion, the HF inference client returns the same "assistant" message specifying the required tool_calls. With OpenAI, the calls resolve to a typical "assistant" message with text content once the function calls have been satisfied and no further calls are required.

Models used that exhibit this behavior:

  • NousResearch/Hermes-3-Llama-3.1-8B
  • Qwen/Qwen2.5-72B-Instruct
  • meta-llama/Meta-Llama-3-8B-Instruct

It's worth noting that Mistral models (e.g. mistralai/Mistral-7B-Instruct-v0.3) error out instead, complaining that a 9-character alphanumeric string is required for the tool_call_id. The models themselves don't provide such IDs, so we have to supply them ourselves, but even when we do, the same error occurs: the 9-character identifiers are reported as missing.
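
For reference, this is the kind of ID I supplied (a hypothetical helper; the client does not provide one):

```python
import random
import string

def make_tool_call_id(length: int = 9) -> str:
    """Generate a random 9-character alphanumeric ID of the kind the Mistral endpoint asks for."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))

call_id = make_tool_call_id()  # e.g. 'aB3kZ9qLm', used for both the tool_call and the matching tool message
```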

The JavaScript client fails with the same errors, plus a third: "An error occurred while fetching the blob".

System Info

  • macOS 15.2
  • Python 3.13.1
  • huggingface_hub 0.28.1

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Gist of sample error code is here: https://gist.github.com/awmartin/c64c84fbbdc3a9f0c2ce6e5ae0dab3dc

  1. Provide API token
  2. python inference-tool-calls.py
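
For convenience, here is a minimal sketch along the lines of the gist (the tool schema and values are illustrative; the gist is the authoritative reproduction):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="Qwen/Qwen2.5-72B-Instruct", token="hf_...")  # provide your API token

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Get the current temperature for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["Celsius", "Fahrenheit"]},
            },
            "required": ["location", "unit"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Philadelphia?"},
]

# First request: the model correctly responds with a tool_call.
first = client.chat_completion(messages=messages, tools=tools, max_tokens=256)
call = first.choices[0].message.tool_calls[0]

# Append the assistant tool_call turn and the tool's return value, then ask again.
messages.append({
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": call.id,
        "type": "function",
        "function": {"name": call.function.name, "arguments": call.function.arguments},
    }],
})
messages.append({"role": "tool", "tool_call_id": call.id,
                 "name": call.function.name, "content": "4"})

# Second request: expected a plain-text answer, but the same tool_call comes back.
second = client.chat_completion(messages=messages, tools=tools, max_tokens=256)
print(second.choices[0].message)
```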

The resulting message is unexpected. I expected a typical message with string content, something like "It's 4 degrees today." Instead, the client just repeats the assistant message containing the original tool_call:

[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content=None, tool_calls=[ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'unit': 'Celsius', 'location': 'Philadelphia, PA, US'}, name='get_current_temperature', description=None), id='0', type='function')]), logprobs=None)]

Expected behavior

I expected a message that resolved to something similar to "It's 4 degrees Celsius today" rather than the tool_call message repeated.


awmartin commented Feb 3, 2025

I think I opened this issue in the wrong repo. Moving it to here: huggingface/huggingface_hub#2829

awmartin closed this as completed Feb 3, 2025

awmartin commented Feb 3, 2025

Reopening as I'm more convinced this is an error with the inference API and not the clients. All the clients (HF JS, HF PY, and OpenAI) fail in the same way.

awmartin reopened this Feb 3, 2025

calycekr commented Feb 5, 2025

@awmartin TGI's OpenAI API compatibility is still lacking compared to vLLM.


awmartin commented Feb 6, 2025

@calycekr Thanks, I'll check it out!

My workaround for this bug(?) is to remove the "tools" definitions from the follow-up chat completion call that supplies the tool responses/return values (see the sketch below). It seems to work for now for short chats, but I suspect there are edge cases that will fail.
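
Roughly, assuming the same client, messages, and tools as in the reproduction above, the workaround is just:

```python
# Follow-up request that supplies the tool result: omit `tools` entirely,
# so the model answers in plain text instead of re-emitting the same tool_call.
second = client.chat_completion(messages=messages, max_tokens=256)
```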

Relatedly, I need to do the same for vision models that accept OpenAI "image_url" messages. When an image_url is supplied, tools are always triggered, seemingly at random, even though the semantics of the prompt have nothing to do with the tool descriptions (see the example below). That seems like another bug to report, but I'm not sure whether HF's intent is to be OpenAI-compatible, or to let you provide prompts, images, and tools and have them triggered properly in a more general or more HF-specific sense.
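
For context, this is the kind of request I mean (illustrative URL; the tools list is the same weather schema as above):

```python
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
    ]},
]

# Passing `tools` alongside an image_url message triggers a tool_call even though
# the prompt has nothing to do with the tool descriptions.
response = client.chat_completion(messages=messages, tools=tools, max_tokens=256)
```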


LikeSundayLikeRain commented Feb 7, 2025

I suspect this is because the input message doesn't support the tool_calls field, so the model doesn't know it has already generated a tool_call response and therefore returns a tool_call again.

https://github.com/huggingface/text-generation-inference/blob/main/router/src/lib.rs#L1180
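
If that's the case, the assistant turn in the follow-up request, something like the illustrative payload below, would lose its tool_calls on the way in, so the templated prompt never shows the model that the call already happened:

```python
# Assistant turn from the follow-up request; if the router's input Message type
# has no tool_calls field, this part would be silently dropped before templating.
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "0",
        "type": "function",
        "function": {
            "name": "get_current_temperature",
            "arguments": {"location": "Philadelphia, PA, US", "unit": "Celsius"},
        },
    }],
}
```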


awmartin commented Feb 8, 2025

Further description of the problem and my workaround here. These kinds of workarounds will work for simple cases, but they will likely fall short when multiple tool calls are required or when images should trigger a tool call, as they do in OpenAI.


qdrddr commented Feb 20, 2025

Would it be of any help that LM Studio has implemented MLX? There is also the Anemll ANE library for working with MLX, which is MIT licensed, and FastMLX, which is under an Apache 2.0 license.

awmartin commented

@qdrddr Thanks. I do use LM Studio and MLX models, but I'm not blocked on getting tool calling working in general; I'm hindered by getting it working with HF as well as it works with OpenAI. HF's inference API appears to be broken, as @LikeSundayLikeRain may have found.

The app I'm building isn't macOS-specific; it's web-based and intended to support OpenAI, HF, and arbitrary inference endpoints. These suggestions may well work for local inference setups on macOS like mine, but I haven't tested tool calls on them as extensively.

But if the MLX implementation serves as a clue for how to resolve this bug in HF, that's great. Tool behavior is highly model-dependent, but this bug may prevent correct behavior even when the model responds properly.
