
Make number of generated tokens consistent with CLI #1690

Merged (2 commits, Aug 21, 2024)
Conversation

@rasbt (Collaborator) commented Aug 21, 2024

Updated the llm.benchmark() method so that the number of generated tokens used for the calculation is consistent with the CLI (litgpt generate and litgpt chat).
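
For context, a minimal sketch of how the method touched here is called from the Python API (the checkpoint name and prompt are placeholders, and the exact benchmark() keyword arguments and return value are assumptions modeled on generate(); only the method name comes from this PR):

from litgpt import LLM

# Placeholder checkpoint; any downloaded litgpt checkpoint would do.
llm = LLM.load("microsoft/phi-2")

# benchmark() measures generation speed; after this PR the token count it uses
# should match what `litgpt generate` / `litgpt chat` report for the same settings.
# The keyword arguments below mirror generate() and are assumed, not the exact signature.
metrics = llm.benchmark(prompt="What do llamas eat?", max_new_tokens=50)
print(metrics)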

@rasbt enabled auto-merge (squash) August 21, 2024 16:07
@rasbt disabled auto-merge August 21, 2024 16:11
@rasbt (Collaborator, Author) commented Aug 21, 2024

Actually, I don't think the stream=False vs stream=True difference is an issue in the LLM API. It's reproducible for the LitGPT CLI as well. @Andrei-Aksionov @apaz-cli

Here are screenshots to show that I am not making this up.

1) Stream=False

[Screenshot: benchmark output with stream=False]
  • For a fair comparison, you need to set this to False:

include_prompt: bool = True,

2) Stream=True

[Screenshot: benchmark output with stream=True]
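
For reference, the two code paths being compared can also be exercised from the Python API; a minimal sketch (placeholder checkpoint and prompt; stream is the generate() argument that switches between the non-streaming and streaming paths):

from litgpt import LLM

llm = LLM.load("microsoft/phi-2")  # placeholder checkpoint

# Non-streaming path: returns the fully decoded string at once
text = llm.generate("What do llamas eat?", max_new_tokens=50)

# Streaming path: returns a generator that yields decoded pieces
pieces = llm.generate("What do llamas eat?", max_new_tokens=50, stream=True)
streamed_text = "".join(pieces)

# Timing each path around these calls is the kind of comparison shown above.
print(len(text), len(streamed_text))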

TLDR

So if we see this difference between streaming and non-streaming, I think this is a legacy difference, not something introduced by the LLM API or the benchmark code. Maybe the unified code in #1675 will address this. Otherwise, this is a future issue to look into.

@Andrei-Aksionov (Collaborator) commented
Interesting. Thanks for the comparison @rasbt
That would be an interesting task to solve. Maybe one day.

Or perhaps @apaz-cli might have noticed something regarding this issue while working on his PR.

@apaz-cli (Contributor) commented
@rasbt I tested both streaming and non-streaming against the new impl. They both definitely call next_token() the same number of times, with the same args. Both also make the same call to cat().

Literally the only difference is

# Chat
# Scalarize the tensor for all iterations beyond the first
input_pos[-1:].add_(1)

vs

# Generate, new impl
# Post-prefill create scalar tensor directly
input_pos = input_pos.add_(1)

But this difference being significant doesn't line up with the theory. I don't know.
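
A toy illustration of the two update styles quoted above (plain PyTorch, not the litgpt internals): both advance the KV-cache position by one per decoded token; they differ only in whether a one-element slice or the scalar tensor itself is updated in place.

import torch

T = 7  # hypothetical prompt length after prefill

# "Chat" style: keep a one-element tensor and bump its last slot in place
input_pos_chat = torch.tensor([T])
input_pos_chat[-1:].add_(1)            # -> tensor([8])

# "Generate, new impl" style: keep a scalar tensor and bump it in place
input_pos_gen = torch.tensor(T)
input_pos_gen = input_pos_gen.add_(1)  # -> tensor(8)

assert input_pos_chat.item() == input_pos_gen.item()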

@rasbt enabled auto-merge (squash) August 21, 2024 17:41
@rasbt merged commit aaed893 into main Aug 21, 2024
8 of 9 checks passed
@rasbt deleted the consistent-tokens branch August 21, 2024 17:55