
Add function calling support using bolt-fc-1b #35

Merged
merged 18 commits into main on Sep 10, 2024

Conversation

@adilhafeez (Contributor) commented Aug 8, 2024

This change adds bolt-fc-1b to the bolt gateway. Several changes went in to support bolt-fc-1b; here is a summary of the key ones.

  • add a Gradio-based chatbot-ui to interact with the bolt gateway
  • add a model file to host bolt-fc in ollama
  • there is a README that explains how to run the e2e demo built around the weather_forecast use case; please review it and make sure you can follow it
  • in the demo, ollama is marked manual because ollama runs extremely slowly when virtualized in Docker
    • the primary reason is that Docker doesn't support mapping the Mac GPU directly into a Docker container, so running ollama inside Docker means you are essentially using the CPU
    • if you have an NVIDIA GPU, however, Docker can map it directly into the container
  • in the demo, please ignore all the files related to the dashboard and Prometheus
  • note the new container called config-generator; its task is to take envoy.template.yaml and katanemo-config and generate the final envoy.yaml that Envoy uses to bootstrap
  • the code needs some improvements that I will pick up in the next PR
    • add unit tests and integration tests
    • break big files like stream_context.rs into smaller components so they are easier to read and manage
    • remove the dependency on qdrant completely and load embeddings into memory

@adilhafeez adilhafeez changed the title fix test Add function calling support using kfc-1b Aug 8, 2024
@adilhafeez adilhafeez changed the title Add function calling support using kfc-1b Add function calling support using bolt-fc-1b Aug 26, 2024
@cotran2 cotran2 self-requested a review August 26, 2024 19:49
function_resolver/app/handler.py (outdated; resolved)
function_resolver/app/template.py (outdated; resolved)
elements += self.handler._format_system(tools)

if message.role == "user":
elements += self.handler._format_user(content=message.content)
Contributor:

use dict or a better way to handle this

function_resolver/.vscode/launch.json (outdated; resolved)
Contributor:

add specific version

Contributor:

we should organize using makefile in the future

Converts elements to token ids.
"""
token_ids = []
for elem in elements:
Contributor:

this loop would be slow, any chance we can vectorize this?

@cotran2 cotran2 requested a review from nehcgs August 26, 2024 21:40
@nehcgs (Contributor) commented Aug 26, 2024

@adilhafeez Thanks for making this PR!

Here are some changes I think you need to make:

  1. Remove all redundant files related to KFC-1B, including function_resolver/app/handler.py and function_resolver/app/template.py.
  2. You may also want to remove everything related to NER in the demo.
  3. Replace KFC-1B with Bolt-FC-1B; many files need to be updated.
  4. In function_resolver/app/main.py, replace Bolt-Function-Calling-1B:Q3_K_L with Bolt-Function-Calling-1B:Q4_K_M as a tentative option.
  5. Please check the function calling system design doc to add engineering support for function calling.

@adilhafeez adilhafeez marked this pull request as ready for review September 6, 2024 18:00
@junr03 (Collaborator) left a comment:

Some comments -- I did not read the non-Rust code. Let me know if you'd like me to.

Looks like lints and tests are failing

pub score: f64,
pub struct ToolParameters {
#[serde(rename = "type")]
pub parameters_type: String,
Collaborator:

why not enum? also why not simply name it type?

Author:

it should be enum, will update

pub parameter_type: Option<String>,
pub description: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub required: Option<bool>,
Collaborator:

Why option?

Author:

This is to say that the required field itself is optional, to enable the following use case: in this spec, when required is missing, the parameter is treated as optional. For example, city is required, while days and units are optional parameters.

    parameters:
      - name: city
        required: true
        description: The city for which the weather forecast is requested.
      - name: days
        description: The number of days for which the weather forecast is requested.
      - name: units
        description: The units in which the weather forecast is requested.

Collaborator:

Interesting API choice, instead of using required: false

pub model: String,
pub created_at: String,
Collaborator:

Why not chrono::DateTime?

Author:

updated to use chrono::DateTime


#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(untagged)]
pub enum IntOrString {
Collaborator:

IMO it is more descriptive to name the int and the string by names that tell us why someone would use a string or an int in this field.

}

pub mod open_ai {
use serde::{Deserialize, Serialize};

use super::ToolsDefinition;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ChatCompletions {
#[serde(default)]
pub model: String,
Collaborator:

What is the difference between the model here and in the message?

debug!("function_resolver response str: {:?}", body_str);

let mut resp = serde_json::from_str::<FunctionCallingModelResponse>(&body_str).unwrap();
resp.resolver_name = Some(callout_context.prompt_target.as_ref().unwrap().name.clone());
Collaborator:

A lot of unwrap re-introduction all over the place in this PR. Is this the best error handling we can do?

info!("possibly some required parameters are missing, send back the response");
let resp_str = serde_json::to_string(&resp).unwrap();
self.send_http_response(
StatusCode::OK.as_u16().into(),
Collaborator:

So why would this be 200, if the response was invalid?

Author:

This part needs to be rewritten - will update this code block.

Author:

Rewrote this part; it's better now.

return;
}
};
let tool_call_details = _tool_call_details.unwrap();
Collaborator:

All of this block of code seems to be like this due to all the print line debugging, did you mean to clean this up?

Author:

cleaned it up

// add original user prompt
messages.push({
Message {
role: USER_ROLE.to_string(),
Collaborator:

should role and model be enums?

Author:

We don't want to restrict role and model to a fixed set of values. New models keep coming out, and supporting a new model would then require a change in the spec.

For Role I am still thinking about whether we should use an enum. I am leaning toward using an enum and listing in the spec all possible roles used by the API-based LLMs we support in open-message-format.

Collaborator:

With all the inline requests to local servers, I have been thinking that this code would benefit from some sort of strongly typed state machine, inspired by a common Rust pattern; examples:

  1. https://willcrichton.net/rust-api-type-patterns/state_machines.html
  2. https://hoverbear.org/blog/rust-state-machine-pattern/

Author:

@junr03 I've been thinking about it too. The orchestration inside the filter can be viewed and programmed as a state machine, and if done right, it becomes much easier to visualize and reason about.
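A minimal sketch of that typestate idea applied to a function-calling flow (state and method names invented for illustration): each state only exposes the transition that is valid next, so an out-of-order step is a compile error rather than a runtime bug.

```rust
// Hypothetical states for the gateway's function-calling flow.
struct AwaitingPrompt;

struct ResolvingFunction {
    prompt: String,
}

struct CallingUpstream {
    prompt: String,
    resolved_fn: String,
}

impl AwaitingPrompt {
    // Consuming `self` makes re-entering a past state impossible.
    fn receive_prompt(self, prompt: String) -> ResolvingFunction {
        ResolvingFunction { prompt }
    }
}

impl ResolvingFunction {
    fn resolve(self, resolved_fn: String) -> CallingUpstream {
        CallingUpstream { prompt: self.prompt, resolved_fn }
    }
}

impl CallingUpstream {
    fn dispatch(self) -> String {
        format!("calling {} for: {}", self.resolved_fn, self.prompt)
    }
}
```

Calling `dispatch()` before `resolve()` simply does not type-check, which replaces a whole class of runtime state assertions.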

@junr03 (Collaborator) left a comment:

Lots of the comments are outstanding. Nothing is truly a blocker, so I'll approve because I know you want to land this. But I believe addressing the comments will improve quality and readability.

@junr03 (Collaborator) commented Sep 10, 2024

Also -- lint and test are still not passing.

@adilhafeez adilhafeez merged commit 7b5203a into main Sep 10, 2024
3 checks passed
@junr03 junr03 deleted the adil/add_function_calling branch October 8, 2024 17:20
4 participants