
Make reverse prompt option act as a stop token in non-interactive scenarios #1032

Merged
7 commits merged into ggerganov:master on May 19, 2023

Conversation

data-angel
Contributor

This use case is important for exposing the command-line main example in non-interactive mode as an API through a tool like Cortex. Without a stop token, generation just continues until the token limit is reached, which is not ideal if you know what kind of output you're expecting. This is particularly useful if you're trying to use a few-shot markup like ChatML to separate instruction or chat responses.

This change separates the reverse prompt from interactive mode, so interactive mode now needs to be specified along with the reverse prompt for interactive scenarios. The help text has been updated accordingly. I could have added a separate stop-token string instead, but that seemed bloaty.
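
For illustration only, the intended control flow amounts to roughly the sketch below; GenState and check_reverse_prompts are made-up names for this sketch, not code from the patch:

    #include <string>
    #include <vector>

    // Illustrative stand-in for the example's state; the real code keeps these
    // as locals and params fields rather than a struct.
    struct GenState {
        bool        interactive    = false;  // only true when interactive mode is requested
        bool        is_interacting = false;  // true while waiting for user input
        std::string last_output;             // recently generated text
    };

    // Returns true when generation should stop (the non-interactive stop-token case).
    bool check_reverse_prompts(GenState & st, const std::vector<std::string> & antiprompts) {
        for (const std::string & antiprompt : antiprompts) {
            if (st.last_output.find(antiprompt) == std::string::npos) {
                continue;                   // this reverse prompt has not appeared yet
            }
            if (st.interactive) {
                st.is_interacting = true;   // hand control back to the user
                return false;               // keep the session alive
            }
            return true;                    // non-interactive: treat the match as a stop token
        }
        return false;
    }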

@SlyEcho
Collaborator

SlyEcho commented Apr 18, 2023

Reverse prompt will switch to interactive mode right now, but if you pipe /dev/null into stdin, the program will exit. I've used this method to connect main.cpp to LangChain, for example.

@data-angel
Contributor Author

Reverse prompt will switch to interactive mode right now, but if you pipe /dev/null into stdin, the program will exit. I've used this method to connect main.cpp to LangChain, for example.

Thanks for the info - that's good to know! I still like the idea of making it a first-class citizen to align better with the way we typically use other gen models.

@ejones mentioned this pull request May 11, 2023
examples/common.cpp (review comments: outdated, resolved)
examples/main/main.cpp (review comments: outdated, resolved)
Comment on lines 367 to 376
    size_t extra_padding = params.interactive ? 0 : 2;
    size_t search_start_pos = last_output.length() > static_cast<size_t>(antiprompt.length() + extra_padding)
        ? last_output.length() - static_cast<size_t>(antiprompt.length() + extra_padding)
        : 0;

    if (last_output.find(antiprompt.c_str(), search_start_pos) != std::string::npos) {
        if (params.interactive) {
            is_interacting = true;
            set_console_color(con_st, CONSOLE_COLOR_USER_INPUT);
        }
Collaborator

@DannyDaemonic May 11, 2023

Nothing in this PR causes this issue. The problem this is trying to fix is the one that pops up when people end their reverse prompts with a space. If you remove the trailing spaces from your reverse prompt, it should work. There's been talk of adding a warning to the program when users include a trailing space. I'd suggest we just remove it.

Suggested change

    size_t extra_padding = params.interactive ? 0 : 2;
    size_t search_start_pos = last_output.length() > static_cast<size_t>(antiprompt.length() + extra_padding)
        ? last_output.length() - static_cast<size_t>(antiprompt.length() + extra_padding)
        : 0;
    if (last_output.find(antiprompt.c_str(), search_start_pos) != std::string::npos) {
        if (params.interactive) {
            is_interacting = true;
            set_console_color(con_st, CONSOLE_COLOR_USER_INPUT);
        }

Contributor Author

@data-angel May 11, 2023

Not done. I've found this code is necessary in several scenarios. For example, if your stop token is ChatML markup like "<|IM_END|>", it is often tokenized with trailing characters such as "\", so if you don't do a fuzzy search for the stop token you miss it in the generated stream. In the non-interactive case, having a few extra trailing characters in the response is not a big deal, as that can easily be handled by the code that processes the response.
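
The padded tail search being defended here can be sketched as a standalone helper (illustrative only; tail_contains is not the PR's code):

    #include <string>

    // Search only the tail of the generated text, with a couple of characters of
    // slack, so a stop string is still caught when its final token carries extra
    // trailing characters (e.g. "<|IM_END|>" arriving as "...END|>\n").
    bool tail_contains(const std::string & output, const std::string & stop, size_t padding = 2) {
        const size_t window = stop.length() + padding;
        const size_t start  = output.length() > window ? output.length() - window : 0;
        return output.find(stop, start) != std::string::npos;
    }

With a padding of 2, up to two trailing characters after the stop string are tolerated, which mirrors the extra_padding of 2 used in the hunk above for the non-interactive case.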

Collaborator

@DannyDaemonic May 11, 2023

I think there's a proper way to do this such that it works with both interactive and non-interactive mode: look at the next token right when we first get it from llama_sample, before it's printed to the terminal or evaluated by llama_eval. It would be more efficient as well because you're skipping an eval. Of course I don't expect you to do that. It's far from a trivial fix and should be its own pull request anyway.

I guess what's holding me back from approving this is that we've made everyone else adjust their reverse prompts not to include the token that gets split. So we tell people to remove the trailing space from their reverse prompt, or in your case the equivalent suggestion would be to drop the right angle bracket (>) from your reverse prompt (it may take dropping both |>).

The decision to rigidly enforce the reverse prompt is from before I became a collaborator, so as much as I agree that reverse prompts should not automatically enable interactive mode, this isn't part of that. I'm going to have to wait for someone more senior than me to approve this particular part of the PR.
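
For reference, the sample-time check described above could look roughly like the sketch below; stop_string_completed is an illustrative name, and the sampling and detokenization calls are left to the caller because the real llama.cpp calls are not quoted here:

    #include <string>

    // Called with the text of a freshly sampled token, *before* that token is
    // printed or fed back for evaluation. Returns true once the rolling tail
    // contains the stop string, so the caller can drop the token and stop,
    // saving one eval. Sketch only; not the example's actual code.
    bool stop_string_completed(std::string & tail, const std::string & token_text, const std::string & stop) {
        tail += token_text;
        if (tail.size() > 2 * stop.size()) {
            tail.erase(0, tail.size() - 2 * stop.size());  // keep a bounded window
        }
        return tail.find(stop) != std::string::npos;
    }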

@data-angel
Contributor Author

I understand your perspective, but I think it would be a mistake to require arcane tokenization knowledge for people to "properly" set a stop token. Better to just set it to the string you want to look for. I agree that the more correct answer is to wire the stop-token detection in at a deeper level, but as you pointed out, that also comes with more risk and complexity. I think this is a fine compromise, and the potential for negative impact seems minimal.

Collaborator

Can we split this hunk out into its own PR, perhaps? It appears everything else here has consensus and could be landed. Even without this, non-interactive -r would be a net improvement, and it would at worst be consistent with the current (buggy) behavior of -r.

Contributor Author

@data-angel May 13, 2023

Sure, we could, but I feel like it's a pretty concise, contained, low-risk change as it sits. Allowing a more natural expression of the reverse prompt in the non-interactive case is, I think, far more accessible for users who aren't going to understand exactly how their prompt string is tokenized, and more maintainable for those folks down the road. Imagine a case in which your reverse prompt is expressed as "<|IM_END" in your code: you'd need a comment explaining that it's written that way as a quirk of the underlying platform, so that someone doesn't change it to the more correct-looking "<|IM_END|>" and then be confused when that doesn't terminate execution despite clearly being present in the output.

The way the code is currently written, it only changes the behavior in the non-interactive case, which is pretty surgical. IMO the bigger issue with this PR is that it's a breaking change for people who are just using -r to trigger interactivity in their scripts, but I think that part is acceptable and folks seem to agree.

@DannyDaemonic - who should take a look at this to see if they're bothered by allowing reverse prompts that aren't precisely token aligned?

@DannyDaemonic
Collaborator

How does everyone feel about this approach? I think it's cleaner than having a separate set of exit-program prompts that are just a copy and paste of the antiprompt checks. It may require a few scripts to be updated, but with #1305 happening, people can just fix both at once.

@data-angel Are you willing to bring this up to date?

@SlyEcho
Collaborator

SlyEcho commented May 11, 2023

I like this more since it is predictable: if I don't enable interactive mode, it is not enabled for me.

@data-angel
Contributor Author

@DannyDaemonic Thanks for the feedback! It should be up to date now. Let me know if you need anything else.

Collaborator

@DannyDaemonic left a comment

You should probably also update common.cpp's gpt_params_parse to reflect the test for interactivity because antiprompts no longer trigger interactive mode.
So something like this:

    if (params.prompt_cache_all && 
        (params.interactive || params.interactive_first || params.instruct)) {
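
For context, the surrounding guard would then read roughly as follows; the error message and exit path are an assumption about common.cpp, not a quote of it:

    if (params.prompt_cache_all &&
            (params.interactive || params.interactive_first || params.instruct)) {
        // Assumed error handling: report the unsupported combination and bail out.
        fprintf(stderr, "error: --prompt-cache-all is not supported in interactive mode yet\n");
        exit(1);
    }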


@data-angel
Contributor Author

You should probably also update common.cpp's gpt_params_parse to reflect the test for interactivity because antiprompts no longer trigger interactive mode. So something like this:

    if (params.prompt_cache_all && 
        (params.interactive || params.interactive_first || params.instruct)) {

Good catch - will update that.

@data-angel
Contributor Author

This is done.

@ggerganov merged commit 7694b52 into ggerganov:master May 19, 2023
ggerganov pushed a commit to JohannesGaessler/llama.cpp that referenced this pull request May 20, 2023
main : make reverse prompt option act as a stop token in non-interactive mode (ggerganov#1032)

* Make reverse prompt option act as a stop token in non-interactive scenarios

* Making requested review changes

* Update gpt_params_parse and fix a merge error

* Revert "Update gpt_params_parse and fix a merge error"

This reverts commit 2bb2ff1.

* Update gpt_params_parse and fix a merge error take 2
ggerganov added a commit that referenced this pull request May 20, 2023
…oadcasting for ggml_mul (#1483)

@dwillie mentioned this pull request Jun 12, 2023