Draft: Add token counting per message #460

Status: Closed (wants to merge 2 commits)
2 changes: 2 additions & 0 deletions app/controllers/messages_controller.rb
@@ -101,6 +101,8 @@ def message_params
:cancelled_at,
:branched,
:branched_from_version,
:input_token_count,
:output_token_count,
documents_attributes: [:file]
)
if modified_params.has_key?(:content_text) && modified_params[:content_text].blank?
2 changes: 1 addition & 1 deletion app/models/message.rb
@@ -1,5 +1,5 @@
class Message < ApplicationRecord
include DocumentImage, Version, Cancellable, Toolable
include DocumentImage, Version, Cancellable, Toolable, TokenCount

belongs_to :assistant
belongs_to :conversation
7 changes: 7 additions & 0 deletions app/models/message/token_count.rb
@@ -0,0 +1,7 @@
module Message::TokenCount
extend ActiveSupport::Concern
included do
attribute :input_token_count, :integer, default: 0
attribute :output_token_count, :integer, default: 0
end
end
9 changes: 9 additions & 0 deletions app/services/ai_backend/anthropic.rb
@@ -28,6 +28,15 @@ def get_next_chat_message(&chunk_received_handler)

response_handler = proc do |intermediate_response, bytesize|
chunk = intermediate_response.dig("delta", "text")

# input and output tokens are sent in different responses
if (input_tokens = intermediate_response.dig("message", "usage", "input_tokens"))
@message.input_token_count += input_tokens
end
if (output_tokens = intermediate_response.dig("message", "usage", "output_tokens"))
@message.output_token_count += output_tokens
end

print chunk if Rails.env.development?
if chunk
stream_response_text += chunk
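The Anthropic handler above works because usage arrives in two separate stream events: input tokens early in the stream and output tokens near the end, with ordinary content chunks in between carrying no usage at all. A plain-Ruby sketch (no Rails, chunk shapes assumed from the `dig` paths in the diff) of that accumulation:

```ruby
# Simulated Anthropic stream chunks: usage for input and output tokens
# arrives in different events; content chunks carry no usage.
chunks = [
  { "message" => { "usage" => { "input_tokens" => 17 } } },  # early event
  { "delta" => { "text" => "Hello" } },                      # content chunk, no usage
  { "message" => { "usage" => { "output_tokens" => 5 } } }   # late event
]

input_count = 0
output_count = 0

chunks.each do |chunk|
  # dig returns nil when any key along the path is missing,
  # so only the chunks that actually carry usage contribute.
  if (input_tokens = chunk.dig("message", "usage", "input_tokens"))
    input_count += input_tokens
  end
  if (output_tokens = chunk.dig("message", "usage", "output_tokens"))
    output_count += output_tokens
  end
end
```

Because `dig` is nil-safe, no chunk needs to be special-cased by event type; chunks without usage simply fall through both conditionals.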
7 changes: 7 additions & 0 deletions app/services/ai_backend/open_ai.rb
@@ -35,6 +35,7 @@ def get_next_chat_message(&chunk_handler)
tools: Toolbox.tools,
stream: response_handler,
max_tokens: 2000, # we should really set this dynamically, based on the model, to the max
stream_options: {include_usage: true}
})
rescue ::Faraday::UnauthorizedError => e
raise ::OpenAI::ConfigurationError
@@ -54,6 +55,12 @@ def stream_handler(&chunk_received_handler)
content_chunk = intermediate_response.dig("choices", 0, "delta", "content")
tool_calls_chunk = intermediate_response.dig("choices", 0, "delta", "tool_calls")

# input and output tokens are sent in the same response
if (input_tokens, output_tokens = intermediate_response["usage"]&.values_at("prompt_tokens", "completion_tokens"))
@message.input_token_count += input_tokens
@message.output_token_count += output_tokens
end

print content_chunk if Rails.env.development?
if content_chunk
@stream_response_text += content_chunk
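On the OpenAI side, `stream_options: {include_usage: true}` makes the API append one extra chunk at the end of the stream whose `usage` key holds both counts together, which is why the handler reads them in a single step. A plain-Ruby sketch of that pattern (chunk shapes assumed from the `values_at` keys in the diff, written with the assignment outside the condition for clarity):

```ruby
# Simulated OpenAI stream: content chunks have no "usage"; with
# include_usage enabled, a final chunk carries both counts at once.
chunks = [
  { "choices" => [{ "delta" => { "content" => "Hi" } }] },
  { "choices" => [],
    "usage"   => { "prompt_tokens" => 12, "completion_tokens" => 3 } }
]

input_count = 0
output_count = 0

chunks.each do |chunk|
  # &.values_at returns nil when "usage" is absent, so both locals
  # stay nil for ordinary content chunks.
  input_tokens, output_tokens =
    chunk["usage"]&.values_at("prompt_tokens", "completion_tokens")
  if input_tokens && output_tokens
    input_count  += input_tokens
    output_count += output_tokens
  end
end
```

Note the vocabulary difference between the two backends: OpenAI reports `prompt_tokens`/`completion_tokens`, which the diff maps onto the same `input_token_count`/`output_token_count` columns used for Anthropic.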
18 changes: 18 additions & 0 deletions app/views/messages/_message.html.erb
@@ -245,6 +245,24 @@ end
<% end %>
</menu>
</div>
<% if last_message %>
<% input_tokens = conversation.messages.sum(:input_token_count) %>
<% output_tokens = conversation.messages.sum(:output_token_count) %>

<div class="dropdown dropdown-top flex items-center ml-2">
<%= button_tag "$",
tabindex: 0,
role: :button,
class: "text-gray-600 dark:text-gray-300 hover:text-gray-900 dark:hover:text-white",
data: { role: "show-token-info" }
%>

<div tabindex="0" class="dropdown-content -ml-6 z-10 p-2 shadow-xl bg-base-100 rounded-box w-52 dark:!bg-gray-700">
<p class="py-1 px-2 text-sm text-gray-700 dark:text-gray-300">Input tokens: <%= input_tokens %></p>
<p class="py-1 px-2 text-sm text-gray-700 dark:text-gray-300">Output tokens: <%= output_tokens %></p>
</div>
</div>
<% end %>
Comment from @krschacht (Contributor), Jul 15, 2024:
@lumpidu I tried out this PR and it's looking good! But I just realized a significant issue that I didn't consider before: the _message partial re-renders constantly because we stream continual fragments. Having this partial trigger SQL queries on every render doesn't seem like a good idea.

I spent a little while looking at the UI and I made room for this next to the conversation title, in the left column. Here is the screenshot:
[screenshot omitted]

What do you think? It's this PR #463 (I just hard-coded those token numbers as placeholders, which you can make dynamic).

One implication of putting it in the left column is that we need to make sure it updates itself. I already have the front-end wired up with Turbo so that if a conversation model changes at all, it broadcasts the update to the left side and replaces the full _conversation partial. Every time a message finishes streaming, it calls message.conversation.touch to update the timestamp of the conversation object, and this also causes the conversation title to update because it has an after_touch hook (all of this is within conversation.rb). So as long as we continually update the cost on the conversation model, it will automagically stay updated on the front-end.

If you like the direction I'm heading with this, I can merge in that PR #463 and then you can pull main back into this PR. You can then update this PR to:

  • Add 3 token columns to the conversation table: input_token_count, output_token_count, and token_cost
  • Add an after_save hook within message which updates these three columns on the conversation
  • Within get_next_ai_message_job, find the conversation.touch and remove it; it's no longer necessary now that message.save always triggers the conversation to update
  • Within conversation.rb, change the after_touch to an after_save
  • Update the _conversation partial to remove "hidden" on that element and wire up those numbers
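A plain-Ruby sketch (no ActiveRecord) of the rollup those bullets describe: each saved message pushes its counts onto the conversation and recomputes the running cost. The column names mirror the bullets; the per-token rates are made-up placeholders, not real pricing.

```ruby
# Stand-in for the conversation row with the three proposed columns.
Conversation = Struct.new(:input_token_count, :output_token_count, :token_cost)

# Assumed placeholder rates in dollars per token, NOT real model pricing.
INPUT_RATE  = 3.0 / 1_000_000
OUTPUT_RATE = 15.0 / 1_000_000

# What a Message after_save hook would do: add this message's counts to
# the conversation's running totals, then recompute the total cost.
def record_message_tokens(conversation, input_tokens, output_tokens)
  conversation.input_token_count  += input_tokens
  conversation.output_token_count += output_tokens
  conversation.token_cost = conversation.input_token_count * INPUT_RATE +
                            conversation.output_token_count * OUTPUT_RATE
end

conversation = Conversation.new(0, 0, 0.0)
record_message_tokens(conversation, 1_000, 200)  # first message saved
record_message_tokens(conversation, 2_000, 400)  # second message saved
```

Keeping the totals on the conversation row (rather than summing messages in the partial) is what lets the left-column display update via the existing Turbo broadcast without any SQL in the view.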

Reply from @lumpidu (Contributor, Author):

Yes, this looks good to me. I will merge your changes and continue as proposed.

Reply from @krschacht (Contributor):

Great! It's now in main so you can pull it into this branch.

<% end %>
</div>
</turbo-frame>
6 changes: 6 additions & 0 deletions db/migrate/20240713130357_add_token_counts_to_messages.rb
@@ -0,0 +1,6 @@
class AddTokenCountsToMessages < ActiveRecord::Migration[7.1]
def change
add_column :messages, :input_token_count, :integer, default: 0, null: false
add_column :messages, :output_token_count, :integer, default: 0, null: false
end
end
4 changes: 3 additions & 1 deletion db/schema.rb
@@ -10,7 +10,7 @@
#
# It's strongly recommended that you check this file into your version control system.

ActiveRecord::Schema[7.1].define(version: 2024_06_24_100000) do
ActiveRecord::Schema[7.1].define(version: 2024_07_13_130357) do
# These are extensions that must be enabled in order to support this database
enable_extension "plpgsql"

@@ -197,6 +197,8 @@
t.integer "branched_from_version"
t.jsonb "content_tool_calls"
t.string "tool_call_id"
t.integer "input_token_count", default: 0, null: false
t.integer "output_token_count", default: 0, null: false
t.index ["assistant_id"], name: "index_messages_on_assistant_id"
t.index ["content_document_id"], name: "index_messages_on_content_document_id"
t.index ["conversation_id", "index", "version"], name: "index_messages_on_conversation_id_and_index_and_version", unique: true