`get_result_message_simple` currently validates function call arguments before calling `tool_fn` and returns the generic `reject_arguments_prompt` ("The previous tool call included unexpected arguments or argument types. Please try again with the correct arguments, or attempt a different action") if validation fails. This degrades the performance of LLM agents on tasks, because the message gives the agent no information about how to correct its previous submission.
For example, given the task instruction "Give your answer as a json object with a single key, "answer", whose value is an integer.", the LLM produces the function call:
```
submit:
{
  "answer": 80
}
```
instead of
```
submit:
{
  "submission": "{\"answer\": 80}"
}
```
If the agent were given a more detailed message, such as "unknown submit function argument answer" or "missing required submit function argument submission", it could self-correct. Currently, most of the time the agent gets stuck in a loop and eventually runs out of token budget.
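A minimal sketch of what such validation could look like; the helper name, parameter shapes, and message wording below are my own assumptions for illustration, not the actual `get_result_message_simple` implementation:

```python
# Hypothetical sketch: compare a function call's arguments against the tool's
# declared parameters and build a specific rejection message instead of the
# generic reject_arguments_prompt.

def describe_argument_errors(
    tool_name: str,
    provided_args: dict,
    expected_params: dict[str, type],
    required: set[str],
) -> str | None:
    """Return an informative error message, or None if the arguments look valid."""
    problems = []
    # Arguments the tool does not declare.
    for name in provided_args:
        if name not in expected_params:
            problems.append(f"unknown {tool_name} function argument {name!r}")
    # Required arguments that were not provided.
    for name in required:
        if name not in provided_args:
            problems.append(f"missing required {tool_name} function argument {name!r}")
    # Provided arguments of the wrong type.
    for name, value in provided_args.items():
        expected_type = expected_params.get(name)
        if expected_type is not None and not isinstance(value, expected_type):
            problems.append(
                f"{tool_name} argument {name!r} should be {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    if not problems:
        return None
    return (
        "The previous tool call was rejected: "
        + "; ".join(problems)
        + ". Please try again with the correct arguments."
    )


# The failing call from the example above would produce:
#   "unknown submit function argument 'answer'; missing required submit
#    function argument 'submission'"
print(describe_argument_errors(
    "submit",
    {"answer": 80},
    expected_params={"submission": str},
    required={"submission"},
))
```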
An important question is whether this would give the agent an unfair advantage over a human performing the same task. I would argue that it would not, considering the following scenarios:
- The human submits results in the same way as the LLM agent, e.g. by writing `submit: { "submission": "{\"answer...`. In this scenario, the human would, in my opinion, struggle just as much as the LLM agent given the non-informative feedback.
- The human submits results via the CLI, e.g. `submit "{\"answer...`. If the human mistyped the `submit` command, they would already be given informative feedback allowing them to correct the submission, so a more detailed message would not put the agent ahead here either.
P.S. Looking at the `get_result_message_simple` code, I don't see where the presence of required arguments is validated.
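If the tool's parameters are already declared as a JSON Schema (an assumption on my part; I haven't checked how the tools store them), the missing-required and unknown-argument checks could come from a standard validator such as the `jsonschema` package:

```python
# Hypothetical: validate the call's arguments against a JSON Schema that marks
# "submission" as required and forbids extra fields via additionalProperties.
from jsonschema import Draft7Validator

submit_schema = {
    "type": "object",
    "properties": {"submission": {"type": "string"}},
    "required": ["submission"],
    "additionalProperties": False,
}

validator = Draft7Validator(submit_schema)
errors = [e.message for e in validator.iter_errors({"answer": 80})]
# errors would include messages like:
#   "Additional properties are not allowed ('answer' was unexpected)"
#   "'submission' is a required property"
print(errors)
```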