llamamodel: always print special tokens #2701
Merged
This assertion fails when gemma-2-9b-it tries to print `<end_of_text>`, which is not its EOS token. Historically, llama.cpp has rendered special tokens in output as an empty string, and GPT4All has done the same (gpt4all/gpt4all-chat/chatllm.cpp, line 694 in 54ed309).
This is a familiar problem with other models such as Hermes 2 Pro Mistral 7B and even Llama 3 (prior to the upstream fix); see also #2167.
This works around the problem by printing the tokens instead of rendering them as blanks, which recently became possible via the `special` argument to `llama_token_to_piece`. We should also fix the bugs that cause empty tokens to crash or hang GPT4All, since nothing strictly prevents `tokenToString` from returning an empty string, but this should get us by for now.

With this change, Hermes 2 Pro Mistral 7B generates garbage after its response, since it was never trained on generations past the EOS token it tries to output. But at least you can now stop the generation instead of having to restart GPT4All due to the hang.
The changelog is not merged yet, but the entry for this PR should be under "Fixed" and read: