Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add attributes to GenAI message events indicating position within list of messages #1912

Open
alexmojaki opened this issue Feb 17, 2025 · 2 comments

Comments

@alexmojaki
Copy link

Area(s)

area:gen-ai

What's missing?

If users want to count the number of times that a user message contains some keyword, their query has to account for the fact that the same user message can appear many times in the logs, since the whole message history is resent and relogged with each request in a back-and-forth conversation. Conceptually this means they need to filter down to user messages which are the last message in the message history, because if there are other messages after it then it was sent and logged before.

Currently this is at least difficult, maybe impossible. A query has to do something like get the child event with the latest timestamp in each parent span. But as pointed out in #1883 a single message can contain both text from the user and tool call responses, generating multiple events. Getting the the last gen_ai.user.message event wouldn't work either, e.g. this would double count in the case where gen_ai.user.message is followed by a tool call and response with no further gen_ai.user.message.

Describe the solution you'd like

I propose that the gen_ai.*.message events generated for each request message as in https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/ should have two additional attributes representing:

  • The total number of messages sent in the API request, i.e. the length of the array representing the message history
  • The position/index within that array of messages of the message corresponding to the event in question

A single message can produce multiple events, so multiple events can have the same value for the position attribute. This can be used to reconstruct the actual API request based on the full list of events as requested in #1883.

Filtering for the last user message then means checking that the position is equal to the total, or the total minus one if the position attribute is a 0-based index.

@alexmojaki
Copy link
Author

Thinking about this more, clients can send multiple user messages at the end of the list, even if that's weird. so maybe it would be better to have a boolean which is set to true for user messages that have no non-user messages after it.

@michaelsafyan
Copy link
Contributor

+1 to this. It is strange that index exists on response event but not on the request event. It would be good to be able to identify the index/position for the prompt, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants