Tracking token usage when using RunnableWithMessageHistory and Streaming #5880
-
To track the token usage based on each session when using RunnableWithMessageHistory and streaming, you can attach a callback to the model and hook into the message history:

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableWithMessageHistory } from "@langchain/core/runnables";
import { UpstashRedisChatMessageHistory } from "@langchain/community/stores/message/upstash_redis";

const llm = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-0125",
  temperature: 0,
  callbacks: [
    {
      handleLLMEnd(output) {
        if (output.llmOutput && output.llmOutput.tokenUsage) {
          console.log("Token usage:", output.llmOutput.tokenUsage);
        }
      },
    },
  ],
});

const prompt = ChatPromptTemplate.fromMessages([
  ["system", system],
  ["placeholder", "{chat_history}"],
  ["human", "{input}"],
]);

const chain = prompt.pipe(llm);

const chatwithMessageHistory = new RunnableWithMessageHistory({
  runnable: chain,
  getMessageHistory: (_sessionId) =>
    new UpstashRedisChatMessageHistory({
      sessionId: `${_sessionId}-${ai}-Chat`,
      sessionTTL: 300,
      config: {
        url: process.env.REDIS_URL!,
        token: process.env.REDIS_TOKEN!,
      },
    }),
  inputMessagesKey: "input",
  historyMessagesKey: "chat_history",
  onMessage: (message) => {
    if (message.response_metadata && message.response_metadata.tokenUsage) {
      console.log("Token usage for session:", message.response_metadata.tokenUsage);
    }
  },
});

// Example usage
const stream = await chatwithMessageHistory.stream(
  {
    input: message,
  },
  {
    configurable: {
      sessionId: sessId,
    },
  }
);

const chunks = [];
for await (const chunk of stream) {
  console.log(chunk);
  chunks.push(chunk);
}

In this example, the handleLLMEnd callback logs the token usage after each LLM call, and the onMessage handler logs usage per session.
-
@dosu I get this error: Object literal may only specify known properties, and 'onMessage' does not exist in type 'RunnableWithMessageHistoryInputs<any, AIMessageChunk>'. ts(2353)
-
Try setting the streamUsage option to true on the model. As for processing the usage while streaming, I haven't used callbacks much, but maybe the documentation can help you with those. Alternatively you can try the streamEvents API; you'd be looking for the 'on_llm_end' event here.
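Roughly something like this (a sketch only; where exactly the usage ends up in event.data.output can differ by model and library version, so log the payload and inspect it first):

const events = chatwithMessageHistory.streamEvents(
  { input: message },
  { version: "v1", configurable: { sessionId: sessId } }
);

for await (const event of events) {
  if (event.event === "on_llm_end") {
    // Inspect the raw payload; token usage may sit under
    // llmOutput.tokenUsage or on the generated message, depending on setup.
    console.log(event.data.output);
  }
}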
-
const llm = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-0125",
  temperature: 0,
  streaming: true,
  streamUsage: true,
});

const prompt = ChatPromptTemplate.fromMessages([
  ["system", system],
  ["placeholder", "{chat_history}"],
  ["human", "{input}"],
]);

const chain = prompt.pipe(llm);

const chatwithMessageHistory = new RunnableWithMessageHistory({
  runnable: chain,
  getMessageHistory: (_sessionId) =>
    new UpstashRedisChatMessageHistory({
      sessionId: "session id goes here",
      sessionTTL: 300,
      config: {
        url: "url goes here",
        token: "token goes here",
      },
    }),
  inputMessagesKey: "input",
  historyMessagesKey: "chat_history",
});
/* Code above remains the same */

// Solution for streamEvents - useful for extracting extra data from the stream
let totalTokens = 0;
const streamEvent = chatwithMessageHistory.streamEvents(
  {
    input: message,
  },
  {
    version: "v1",
    configurable: {
      sessionId: sessId,
    },
  }
);

for await (const chunk of streamEvent) {
  // The outermost chain run carries no tags, so this picks up the final output
  if (chunk.event === "on_chain_end" && chunk.tags?.length === 0) {
    console.log(chunk);
    if (chunk.data.output.usage_metadata !== undefined) {
      console.log(chunk.data.output.usage_metadata);
      totalTokens = chunk.data.output.usage_metadata.total_tokens;
      console.log(
        `Session: ${chunk.metadata.sessionId}\nTotal tokens: ${totalTokens}`
      );
    }
  }
  // Intermediate streamed chunks carry the generated content
  if (chunk.event === "on_chain_stream") {
    console.log(chunk.data.chunk.content);
  }
}
// Solution for stream - better for just getting the content and usage
const stream = await chatwithMessageHistory.stream(
  {
    input: message,
  },
  {
    configurable: {
      sessionId: sessId,
    },
  }
);

for await (const chunk of stream) {
  if (chunk.usage_metadata) {
    console.log(chunk.usage_metadata.total_tokens);
  }
  console.log(chunk.content);
}

Will keep this discussion open for a day or so.
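PS: if you want a running total per session rather than per-response numbers, here's a rough sketch, replacing the stream loop above (the in-memory Map is just for illustration; you'd persist to Redis or similar in a real app):

// Hypothetical in-memory accumulator keyed by session id
const sessionTotals = new Map<string, number>();

for await (const chunk of stream) {
  if (chunk.usage_metadata) {
    const current = sessionTotals.get(sessId) ?? 0;
    sessionTotals.set(sessId, current + chunk.usage_metadata.total_tokens);
  }
  console.log(chunk.content);
}

console.log(`Running total for session ${sessId}:`, sessionTotals.get(sessId));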
-
Description
I am trying to track the token usage for each session. However, the sample code provided in the documentation only works on a bare-bones model setup, without the message history or anything else. What can I add to track the token usage with the code above? Please note that I am using streaming and NOT invoke.
System Info
Node v18.18.0
"@langchain/anthropic": "^0.1.9",
"@langchain/community": "^0.2.13",
"@langchain/core": "^0.2.9",
"@langchain/openai": "^0.1.3",
"langchain": "^0.2.5",