
Draft: Add token counting per message #460

Closed
wants to merge 2 commits into main from feat/token_count

Conversation

@lumpidu (Contributor) commented Jul 14, 2024

Implement token counting per message by using the usage stats returned by the LLM API backend. Anthropic sends token stats by default; for OpenAI, the stream must first be configured to return them.
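For context, here is a minimal sketch of the OpenAI side, assuming the ruby-openai gem (the actual client code in this PR may differ): with stream_options.include_usage enabled, the final stream chunk carries the usage totals.

    require "openai"

    client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])

    input_tokens = output_tokens = nil

    client.chat(
      parameters: {
        model: "gpt-4o",
        messages: [{ role: "user", content: "Hello!" }],
        # With include_usage, every chunk has a "usage" key, but it is null
        # until the final chunk before [DONE], which carries the totals.
        stream: proc do |chunk, _bytesize|
          if (usage = chunk["usage"])
            input_tokens  = usage["prompt_tokens"]
            output_tokens = usage["completion_tokens"]
          end
        end,
        stream_options: { include_usage: true }
      }
    )

On the Anthropic side, streaming responses already include usage without extra configuration: input_tokens arrive in the message_start event and output_tokens in the final message_delta event.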

  • Add a migration for the new Message attributes input_token_count and output_token_count (a sketch follows this list)
  • Add a Message::TokenCount concern for the Message model
  • Capture the token counts returned by Anthropic and OpenAI and set the Message attributes inside the async API-specific jobs; for OpenAI, configure the stream to return usage stats
  • Add a basic usage dropdown menu beneath the last message that shows the number of input/output tokens when the "$" sign is clicked. This is just a quick demo of the functionality and will likely be changed to something else
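As a rough sketch of the first two bullets (column types, the Rails version tag, and the concern's helper are assumptions; the actual PR defines the real ones):

    # db/migrate/xxxxxxxxxxxxxx_add_token_counts_to_messages.rb
    class AddTokenCountsToMessages < ActiveRecord::Migration[7.1]
      def change
        add_column :messages, :input_token_count, :integer
        add_column :messages, :output_token_count, :integer
      end
    end

    # app/models/message/token_count.rb
    module Message::TokenCount
      extend ActiveSupport::Concern

      # Hypothetical convenience helper, not necessarily part of the PR.
      def total_token_count
        input_token_count.to_i + output_token_count.to_i
      end
    end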

This PR is related to #402

@lumpidu force-pushed the feat/token_count branch from 562cb30 to 13dbd5f on July 14, 2024 at 22:15
lumpidu and others added 2 commits on July 15, 2024.

First commit:

Implement token counting per message by using the usage stats returned by the LLM API backend. Anthropic sends token stats by default; for OpenAI, the stream must first be configured to return them.

- Add migration for new Message attributes input_token_count,
  output_token_count
- Add concern Message::TokenCount for Message model
- Implement capturing of returned token count from Anthropic & OpenAI
  and set Message attributes inside the async API-specific jobs.
  For OpenAI, configure the stream to return usage stats.

Signed-off-by: Daniel Schnell <dschnell@grammatek.com>
Co-authored-by: Keith Schacht <krschacht@gmail.com>
Second commit:

Add a drop-down showing the LLM-API-reported input/output tokens for the ongoing conversation.

Signed-off-by: Daniel Schnell <dschnell@grammatek.com>
@lumpidu force-pushed the feat/token_count branch from f270bdc to 7de810f on July 15, 2024 at 12:17
Review comment on the new usage dropdown markup:

<p class="py-1 px-2 text-sm text-gray-700 dark:text-gray-300">Output tokens: <%= output_tokens %></p>
</div>
</div>
<% end %>
@krschacht (Contributor) commented Jul 15, 2024

@lumpidu I tried out this PR and it's looking good! But I just realized a fairly significant issue that I didn't consider before: the _message partial updates a lot because we stream continual fragments. Having this partial trigger SQL queries on every update doesn't seem like a good idea.

I spent a little while looking at the UI and made room for this next to the conversation title, in the left column. Here is the screenshot:
[screenshot: input/output token counts displayed next to the conversation title in the left column]

What do you think? It's this PR: #463 (I just hard-coded those token numbers as placeholders, which you can make dynamic)

One implication of putting it in the left column is that we need to make sure it updates itself. I already have the front-end wired up with Turbo so that if the conversation model changes at all, it broadcasts the update to the left side and replaces the full _conversation partial. Every time a message finishes streaming, it calls message.conversation.touch to update the timestamp of the conversation object, which also causes the conversation title to update via an after_touch hook (all of this is within conversation.rb). So as long as we continually update the cost on the conversation model, it will automagically stay updated on the front-end.
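A rough sketch of that existing wiring, with assumed callback and partial names (not copied from conversation.rb):

    # app/models/conversation.rb (sketch)
    class Conversation < ApplicationRecord
      has_many :messages

      # message.conversation.touch (called when a message finishes streaming)
      # fires this hook, which re-renders the left column over Turbo Streams.
      after_touch :broadcast_conversation_update

      private

      def broadcast_conversation_update
        broadcast_replace_to self,
          partial: "conversations/conversation",
          locals: { conversation: self }
      end
    end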

If you like the direction I'm heading with this, then I can merge in PR #463 and you can pull main back into this PR. You can then update this PR to:

  • Add three token columns to the conversations table: input_token_count, output_token_count, and token_cost
  • Add an after_save hook within Message which updates these three columns on the conversation (a sketch follows this list)
  • Within get_next_ai_message_job, find the conversation.touch and remove it; it's no longer necessary now that message.save always triggers the conversation to update
  • Within conversation.rb, change the after_touch to an after_save
  • Update the _conversation partial to remove "hidden" and wire up those numbers
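Taken together, the proposed follow-up might look roughly like this (column types, callback names, and the cost handling are assumptions, not the final implementation):

    # db/migrate/xxxxxxxxxxxxxx_add_token_columns_to_conversations.rb
    class AddTokenColumnsToConversations < ActiveRecord::Migration[7.1]
      def change
        add_column :conversations, :input_token_count, :integer, default: 0
        add_column :conversations, :output_token_count, :integer, default: 0
        add_column :conversations, :token_cost, :decimal, precision: 10, scale: 6, default: 0
      end
    end

    # app/models/message.rb (sketch)
    class Message < ApplicationRecord
      belongs_to :conversation

      # Replaces the old conversation.touch in get_next_ai_message_job: every
      # save rolls the totals up onto the conversation, whose own after_save
      # (formerly after_touch) then rebroadcasts the left column.
      after_save :update_conversation_token_totals

      private

      def update_conversation_token_totals
        conversation.update!(
          input_token_count: conversation.messages.sum(:input_token_count),
          output_token_count: conversation.messages.sum(:output_token_count)
          # token_cost would be derived from these totals and the model's
          # per-token pricing, omitted here.
        )
      end
    end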

@lumpidu (Contributor, Author) replied:

Yes, this looks good to me. I will merge your changes and continue as proposed.

@krschacht (Contributor) replied:

Great! It's now in main so you can pull it into this branch.

@krschacht (Contributor) commented:

I can't push my fixes to this branch, and I want to get this merged, so I created a new PR: #495

@krschacht closed this on Aug 21, 2024