Chiron - Part 7 Tool Use
---
slug: chiron-fc
description: Attempting to have Chiron perform some basic tasks
published: 2024-11-24
---
After getting the speech-to-text, text-to-speech, and basic chat components working smoothly, I dove into what turned out to be the most challenging aspect of Chiron: implementing tool use. This seemingly straightforward feature proved to be quite the stumbling block, especially when working with smaller language models.
The core issue I discovered is that these models seem to struggle with multi-step tool use. While they can handle one-shot tool calls reasonably well, asking them to chain multiple tool calls together or use tools after an initial call leads to increasingly unreliable results. This limitation became apparent across various models I tested, suggesting it's not just a quirk of one particular implementation.
To work around these limitations, I implemented a few strategic compromises:
Request Classification
I use the LLM to classify each incoming request, with a simple single-token binary output of either 'tool' or 'chat'. This not only helps determine whether a request requires tool use, but also enables streaming responses when tools aren't needed. By constraining the classification to a single token output, the system can quickly decide whether to proceed with tool use or fall back to standard chat behavior.
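To make that concrete, here is a minimal sketch of what such a single-token classifier could look like. This is an illustration rather than Chiron's actual code: it assumes a local model served through Ollama's default `localhost:11434` endpoint, and the model name is just a placeholder.

```python
# Illustrative single-token request classifier (not the actual Chiron code).
# Assumes a local Ollama server; "llama3.2" is a placeholder model name.
import requests

CLASSIFY_PROMPT = (
    "Decide whether the user's request requires calling a tool.\n"
    "Respond with exactly one word: tool or chat.\n\n"
    "User request: {request}\n"
    "Answer:"
)

def classify_request(user_text: str, model: str = "llama3.2") -> str:
    """Return 'tool' or 'chat'; fall back to 'chat' on anything unexpected."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": CLASSIFY_PROMPT.format(request=user_text),
            "stream": False,
            # Constrain the reply to a single token so the decision is cheap
            # and can't ramble into a full answer.
            "options": {"num_predict": 1, "temperature": 0},
        },
        timeout=30,
    )
    answer = resp.json().get("response", "").strip().lower()
    return "tool" if answer.startswith("tool") else "chat"
```

When the classifier comes back with 'chat', the reply can be streamed straight to the user; otherwise the request moves on to the capability check described next.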
Capability Checking
Before attempting to use any tools, the system performs a second classification to determine if the request matches either current or planned capabilities, once again outputting only a single token: either 'YES' or 'NO'. This helps prevent the model from attempting actions it can't actually perform, reducing the likelihood of hallucinated tool calls.
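Continuing the sketch above, the second gate might look like this (again purely illustrative, under the same assumptions; the capability list in the prompt is a hypothetical summary, not the real prompt):

```python
# Illustrative capability check in the same style as the classifier above.
import requests

CAPABILITY_PROMPT = (
    "Known capabilities: current date/time, countdown to a time of day, "
    "countdown to a US federal holiday.\n"
    "Can the following request be handled by one of these capabilities?\n"
    "Respond with exactly one word: YES or NO.\n\n"
    "User request: {request}\n"
    "Answer:"
)

def matches_capability(user_text: str, model: str = "llama3.2") -> bool:
    """True only if the model answers YES; anything else is treated as NO."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": CAPABILITY_PROMPT.format(request=user_text),
            "stream": False,
            "options": {"num_predict": 1, "temperature": 0},
        },
        timeout=30,
    )
    return resp.json().get("response", "").strip().upper().startswith("YES")
```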
Simplified Tool Design
Rather than creating complex, multi-purpose tools, I opted to write a larger number of narrowly scoped, well-described tools for simpler tasks. While this means more individual tools, it helps the model make more accurate decisions about which tool to use and when.
Functions added so far:
get_today_day_date_time - Retrieves the current day of the week along with the current datetime in 'MM-DD-YYYY HH:MM:SS' format
get_time_till - Retrieves the hours, minutes, and seconds until the specified time
get_time_until_holiday - Retrieves the years, months, days, hours, minutes, and seconds until the specified US federal holiday
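For illustration, the first two of these might look roughly like the following, using only the standard library. This is a sketch under my own assumptions about the parameter format (a 'HH:MM' string for get_time_till), not the actual implementations.

```python
# Illustrative versions of two of the simpler tools (not the actual Chiron code).
from datetime import datetime, timedelta

def get_today_day_date_time() -> str:
    """Current day of the week plus the current 'MM-DD-YYYY HH:MM:SS' timestamp."""
    return datetime.now().strftime("%A, %m-%d-%Y %H:%M:%S")

def get_time_till(target_hhmm: str) -> str:
    """Hours, minutes, and seconds until a 'HH:MM' time, assumed to be today
    (rolls over to tomorrow if that time has already passed)."""
    now = datetime.now()
    hour, minute = (int(part) for part in target_hhmm.split(":"))
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)
    hours, remainder = divmod(int((target - now).total_seconds()), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours} hours, {minutes} minutes, {seconds} seconds"
```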
My initial ambitions included implementing internet search capabilities, RAG (Retrieval-Augmented Generation), and IOT device integration. While these features remain on the roadmap, I decided to pause development after achieving a basic proof of concept. The current implementation, though limited, demonstrates that it's possible to build a privacy-focused AI assistant that runs entirely locally on consumer hardware.
Looking ahead, there's clearly room for improvement and expansion. The challenges I've encountered with tool use in smaller models might be addressed through further experimentation or eventually upgrading to more capable hardware, and I still need to add a cancel button that stops chat and audio generation (though I did add a system tray icon that lets the user toggle the UI display and close the application). For now, I feel I've achieved my primary goal: proving that we can build AI assistants that respect user privacy and run efficiently on local hardware, even if we have to make some compromises along the way. Maybe I'll continue progress on this in the future, but for now fatherhood and the duties it entails beckon.