
Allow reusing results from llama_token_to_piece when sampling grammars #4213

Closed · wants to merge 6 commits

Conversation

MarcusDunn (Contributor):

This is an amendment to #4210 that allows passing in already-computed strings (ideally shared across calls) to avoid calling llama_token_to_piece in llama_sample_grammar. Behavior remains unchanged when passing in nullptr.

[two screenshots: performance measurements with this change applied]

See #4210 for the "before".

I'm marking this as a draft for now to allow #4210 to be merged first; then I'll clean up the git history.

I also wanted to see whether this is worth a breaking API change (if it is, I'll fix up the other spots in the code that can take advantage of the change).

The diff under review replaces the unconditional conversion with a guarded lookup:

    - const std::string piece = llama_token_to_piece(ctx, id);
    + std::string piece;
    +
    + if (pieces != nullptr && pieces[id] != nullptr) {
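To make the fragment above concrete, here is a minimal sketch of the full fallback logic it implies, assuming `pieces` is a `const char **` indexed by token id. The helper name `get_piece` is hypothetical (only the two branches are implied by the diff), and it assumes llama.cpp's internal types and the `std::string`-returning `llama_token_to_piece` shown above:

```cpp
// Hypothetical helper illustrating the diff: reuse a caller-provided,
// precomputed piece when one exists, otherwise fall back to the old
// behavior of computing it on the fly. Not a real llama.cpp function.
static std::string get_piece(const llama_context * ctx, llama_token id,
                             const char ** pieces) {
    if (pieces != nullptr && pieces[id] != nullptr) {
        return pieces[id];                // reuse the shared, precomputed piece
    }
    return llama_token_to_piece(ctx, id); // previous behavior, unchanged
}
```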
Collaborator:

The doc comment for this function doesn't clearly state that individual pieces may be nullptr.

MarcusDunn (Contributor, Author):

This is mostly playing defence, as std::string cannot handle nullptrs / empty C strings (from my understanding; I'm a complete C++ novice).

The lack of docs would allow swapping the behavior from calling token_to_piece to assuming a null entry is an empty string; from my measurements, the performance impact of either choice was negligible.

If we want to document the current behavior that's fine by me.

cebtenzzre (Collaborator) — Nov 25, 2023:

std::string can represent empty strings, but not null. If it were a C string, you would still eventually have to do something with it when it is null, to prevent a segfault.

Since this code is apparently not used yet, it's hard to reason about performance or what the correct inputs should be.
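For readers following along, a standalone example (not from the PR) showing the distinction being discussed — an empty C string is a valid std::string source, a null pointer is not:

```cpp
#include <iostream>
#include <string>

int main() {
    const char * empty   = "";      // an empty C string: fine for std::string
    const char * missing = nullptr; // a null pointer: not fine

    std::string a(empty);           // well-defined: a.size() == 0
    std::cout << a.size() << '\n';

    // std::string b(missing);      // undefined behavior: typically a segfault

    // The defensive check from the diff avoids ever reaching that case:
    std::string c = (missing != nullptr) ? std::string(missing) : std::string();
    std::cout << c.size() << '\n';  // prints 0
}
```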

ggerganov (Owner):

> I also wanted to see whether this is worth a breaking API change (if it is, I'll fix up the other spots in the code that can take advantage of the change).

Wouldn't it be better if we pre-compute the pieces when the model is loaded and store it in llama_vocab?

llama.cpp, lines 1329–1332 at commit 3e73d31:

    std::unordered_map<token, id> token_to_id;
    std::vector<token_data>       id_to_token;

This way the user does not need to do it themselves.
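A rough sketch of that suggestion — precompute every token's piece once at model load and store the cache alongside the existing vocab tables, so grammar sampling can index it instead of converting per token per sample. All names here are illustrative, not from llama.cpp (the real implementation landed later in #4330):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for llama_vocab, with an added piece cache.
struct vocab_sketch {
    std::unordered_map<std::string, int> token_to_id;
    std::vector<std::string>             id_to_token;

    // Cache filled once during model loading; grammar sampling then reads
    // token_pieces[id] instead of calling llama_token_to_piece per token.
    std::vector<std::string> token_pieces;

    // decode() stands in for the real token -> piece conversion logic.
    static std::string decode(const std::string & token) { return token; }

    void build_piece_cache() {
        token_pieces.resize(id_to_token.size());
        for (std::size_t id = 0; id < id_to_token.size(); ++id) {
            token_pieces[id] = decode(id_to_token[id]);
        }
    }
};
```

The design trade-off is a one-time cost and some memory at load time in exchange for removing the conversion from the sampling hot path, and the user no longer has to manage a shared buffer themselves.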

MarcusDunn (Contributor, Author) — Dec 4, 2023:

Closing in favor of implementing #4213 (comment)

Done in #4330

MarcusDunn closed this on Dec 4, 2023.
MarcusDunn deleted the grammar_cache_tokens branch on Dec 4, 2023.
MarcusDunn mentioned this pull request on Dec 4, 2023.