Draft: Add token counting per message #460
Conversation
Force-pushed from 562cb30 to 13dbd5f
Implement token counting per message by using the usage stats returned by the LLM API backend. Anthropic already sends token stats by default; for OpenAI, the stream first needs to be configured to return them.

- Add a migration for the new Message attributes `input_token_count` and `output_token_count`
- Add the concern `Message::TokenCount` for the Message model
- Capture the token counts returned by Anthropic & OpenAI and set the Message attributes inside the async API-specific jobs. For OpenAI, configure the stream to return usage stats.

Signed-off-by: Daniel Schnell <dschnell@grammatek.com>
Co-authored-by: Keith Schacht <krschacht@gmail.com>
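A minimal sketch of the OpenAI side of this, assuming a ruby-openai-style streaming client; the names `client`, `history`, `message`, and the model string are placeholders, not the PR's actual code. The key detail is `stream_options: { include_usage: true }`, without which OpenAI omits usage stats from streamed responses:

```ruby
client.chat(
  parameters: {
    model: "gpt-4o",
    messages: history,
    # Ask OpenAI to append a final chunk containing usage stats to the stream.
    stream_options: { include_usage: true },
    stream: proc do |chunk, _bytesize|
      # Only the final chunk carries "usage"; earlier chunks carry content deltas.
      if (usage = chunk["usage"])
        message.update!(
          input_token_count:  usage["prompt_tokens"],
          output_token_count: usage["completion_tokens"]
        )
      end
    end
  }
)
```

Anthropic, by contrast, reports `input_tokens`/`output_tokens` in its streamed `message_start` and `message_delta` events without any extra configuration.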
Add a drop-down showing the input/output tokens reported by the LLM API for the ongoing conversation.

Signed-off-by: Daniel Schnell <dschnell@grammatek.com>
Force-pushed from f270bdc to 7de810f
<p class="py-1 px-2 text-sm text-gray-700 dark:text-gray-300">Output tokens: <%= output_tokens %></p>
    </div>
  </div>
<% end %>
@lumpidu I tried out this PR and it's looking good! But I just realized a fairly significant issue which I didn't consider before: the `_message` partial updates a lot because we stream continual fragments, so having this partial trigger extra SQL queries doesn't seem like a good idea.
I spent a little while looking at the UI and made room for this next to the conversation title, in the left column (screenshot attached in the original comment).

What do you think? It's this PR #463 (I just hard-coded those token numbers as placeholders, which you can make dynamic).
One implication of putting it in the left column is that we need to make sure it updates itself. I already have the front-end wired up with Turbo so that if a conversation model changes at all, it broadcasts an update to the left side and replaces the full `_conversation` partial. Every time a message finishes streaming, it does `message.conversation.touch` to update the timestamp of the conversation object, and this also causes the conversation title to update because it has an `after_touch` hook (all of this is within `conversation.rb`). So as long as we continually update the cost on the conversation model, it will automagically stay updated on the front-end.
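To make that chain concrete, here is a rough sketch of how the wiring could look; the hook and method names below are illustrative, not copied from `conversation.rb`:

```ruby
class Conversation < ApplicationRecord
  belongs_to :user
  has_many :messages

  # Fires whenever the record is touched, e.g. by message.conversation.touch
  # at the end of streaming; Turbo Streams pushes the re-rendered partial
  # to every subscribed client.
  after_touch :broadcast_conversation

  private

  def broadcast_conversation
    broadcast_replace_to user, :conversations,
      partial: "conversations/conversation",
      locals: { conversation: self }
  end
end
```

With something like this in place, any column added to `conversations` (token counts, cost) gets re-rendered for free whenever the record is touched or saved.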
If you like the direction I'm heading with this, I can merge in that PR #463 and then you can pull main back into this PR. You can then update this PR to:
- Add 3 token columns to the conversation table: `input_token_count`, `output_token_count`, and `token_cost`
- Add an `after_save` hook within message which updates these three columns on the conversation (a rough sketch of these first two items follows below)
- Within `get_next_ai_message_job`, find the `conversation.touch` and remove it; it's no longer necessary now that `message.save` always triggers the conversation to update
- Within `conversation.rb`, change the `after_touch` to an `after_save`
- Update the `_conversation` partial to remove "hidden" on that element and wire up those numbers
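As referenced in the list above, here is a hedged sketch of the first two items; the column names come from the list, but the migration version, cost logic, and method names are placeholders rather than the final code:

```ruby
class AddTokenCountsToConversations < ActiveRecord::Migration[7.1]
  def change
    add_column :conversations, :input_token_count,  :integer, default: 0, null: false
    add_column :conversations, :output_token_count, :integer, default: 0, null: false
    add_column :conversations, :token_cost, :decimal, precision: 10, scale: 6, default: 0, null: false
  end
end

class Message < ApplicationRecord
  belongs_to :conversation

  # Roll the per-message counts up onto the conversation after every save;
  # conversation.update! then fires the conversation's own callbacks, which
  # is what makes the explicit conversation.touch unnecessary.
  after_save :update_conversation_token_counts

  private

  def update_conversation_token_counts
    conversation.update!(
      input_token_count:  conversation.messages.sum(:input_token_count),
      output_token_count: conversation.messages.sum(:output_token_count),
      token_cost:         0 # placeholder: derive from per-model pricing
    )
  end
end
```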
Yes, this looks good to me. I will merge your changes and continue as proposed.
Great! It's now in main so you can pull it into this branch.
I can't push some fixes to this branch and I want to get it merged in, so I needed to create a new PR: #495
This PR is related to #402