speculative : fix prompt tokenization in speculative example (ggerganov#4025)

* Support special tokens and not adding BOS to prompt in speculative

* Adapt to new should_add_bos function

* Ensure tgt and dft have same add_bos setting
AutonomicPerfectionist authored and olexiyb committed Nov 23, 2023
1 parent 083526d commit 03d6ae0
Showing 1 changed file with 15 additions and 2 deletions.
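
The third point of the commit message is the heart of the fix: speculative decoding compares draft-model tokens against target-model tokens position by position, so both models must agree on whether a BOS token is prepended. A minimal, self-contained sketch of the failure mode the new guard prevents (the token ids below are hypothetical and stand in for real tokenizer output):

#include <cstdio>
#include <vector>

int main() {
    const int BOS = 1;
    std::vector<int> prompt_ids = {4521, 338, 263}; // hypothetical prompt tokens

    // The mismatched case: the target model prepends BOS, the draft does not.
    std::vector<int> tgt = prompt_ids;
    tgt.insert(tgt.begin(), BOS);
    std::vector<int> dft = prompt_ids;

    // Every position the speculative loop compares is now off by one,
    // so draft tokens can never match the target's sampled tokens.
    for (size_t i = 0; i < dft.size(); ++i) {
        std::printf("pos %zu: tgt=%d dft=%d %s\n",
                    i, tgt[i], dft[i], tgt[i] == dft[i] ? "match" : "MISMATCH");
    }
    return 0;
}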
examples/speculative/speculative.cpp — 17 changes: 15 additions & 2 deletions
@@ -94,9 +94,22 @@ int main(int argc, char ** argv) {
         }
     }
 
-    // tokenize the prompt
+
+    // Tokenize the prompt
+    const bool add_bos_tgt = llama_should_add_bos_token(model_tgt);
+    LOG("add_bos tgt: %d\n", add_bos_tgt);
+
+    const bool add_bos_dft = llama_should_add_bos_token(model_dft);
+    LOG("add_bos dft: %d\n", add_bos_dft);
+
+    if (add_bos_tgt != add_bos_dft) {
+        fprintf(stderr, "%s: error: draft model add_bos must match target model to use speculation but ", __func__);
+        fprintf(stderr, "add_bos_dft = %d while add_bos_tgt = %d\n", add_bos_dft, add_bos_tgt);
+        return 1;
+    }
+
     std::vector<llama_token> inp;
-    inp = ::llama_tokenize(ctx_tgt, params.prompt, true);
+    inp = ::llama_tokenize(ctx_tgt, params.prompt, add_bos_tgt, true);
 
     const int max_context_size = llama_n_ctx(ctx_tgt);
     const int max_tokens_list_size = max_context_size - 4;
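
In isolation, the pattern the commit adopts looks like the fragment below. This is a hedged sketch assuming the common.h helpers of this vintage (llama_should_add_bos_token and the ::llama_tokenize overload taking an add_bos flag plus a special-token flag), with model, ctx, and params coming from the example's existing setup:

// Ask the model whether its tokenizer expects a leading BOS token,
// rather than hard-coding `true` as the example did before this commit.
const bool add_bos = llama_should_add_bos_token(model);

// Tokenize the prompt honoring that setting; the final `true` asks the
// tokenizer to parse special tokens in the prompt instead of treating
// them as plain text.
std::vector<llama_token> tokens = ::llama_tokenize(ctx, params.prompt, add_bos, true);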
