Resolve #1047: LLM caching: Bug when sharing prefix in target #1048
When caching labels, we currently assume that every encoded label ends with an EOS token. This is not guaranteed by every tokenizer; the GPT-2 tokenizer, for example, does not append one.

Without the EOS token, labels that share a prefix, e.g. '11' and '11111' (= '11' + '111'), both get cache entries for the shared prefix '11' but have different total label lengths (here 1 vs. 2 tokens). When generating logits for the label '11', the cache then still holds a 'next' entry (for '111') even though no label tokens are left. Because the code only checks for the EOS token (which is not present), we run into an index error.
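For reference, the tokenizer behavior can be checked directly with Hugging Face `transformers` (a quick illustrative repro, not part of this PR's code):

```python
# GPT-2 appends no EOS token, and '11111' splits into the '11' token
# followed by a '111' token, as described above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.encode("11"))                      # the '11' token, no EOS
print(tok.encode("11111"))                   # '11' then '111'
print(tok.eos_token_id in tok.encode("11"))  # False — no EOS appended
```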
The fix, in this case, is to also check whether the label we want logits for has already been fully consumed.
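A minimal sketch of the failure mode and the fix is below. The cache layout and names (`node`, `children`, `logits`, `gather_label_logits`) are hypothetical stand-ins for illustration, not this repository's actual API:

```python
EOS_ID = 50256  # GPT-2's eos_token_id, used only for illustration

def gather_label_logits(label_ids, root):
    """Walk a prefix cache along `label_ids`, collecting cached logits.

    Each cache node stores the logits for one prefix plus `children`
    entries for longer cached prefixes that extend it.
    """
    logits = []
    node, pos = root, 0
    while True:
        token_id = label_ids[pos]  # without the fix, this indexes past the end
        node = node["children"][token_id]
        logits.append(node["logits"])
        pos += 1
        # Old stop condition: only `token_id == EOS_ID`. GPT-2 labels carry
        # no EOS, so for '11' the walk saw the cached child for '111' (the
        # 'next' entry) and tried to read a label token that doesn't exist.
        # Fixed: also stop once the label is completely consumed.
        if token_id == EOS_ID or pos == len(label_ids):
            break
    return logits

# Cache built from the labels '11' (one token) and '11111' (two tokens,
# '11' + '111'); the token ids 7 and 9 are made up for the demo.
root = {"children": {7: {"logits": "logits<11>", "children": {
    9: {"logits": "logits<11 111>", "children": {}}}}}}
print(gather_label_logits([7], root))     # ['logits<11>'] — no IndexError
print(gather_label_logits([7, 9], root))  # ['logits<11>', 'logits<11 111>']
```

With only the EOS check, the first call would loop once more and raise an `IndexError` on `label_ids[1]`; the added length check stops the walk as soon as the label is exhausted.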