The new tokenizer no longer encodes spaces properly #2721
Labels
high priority
Very important issue
Comments
Tagging @goerch for this.
goerch added a commit to goerch/llama.cpp that referenced this issue on Aug 22, 2023
Merged
ggerganov pushed a commit that referenced this issue on Sep 13, 2023:
* Fix for #2721
* Reenable tokenizer test for LLaMa
* Add `console.cpp` dependency
* Fix dependency to `common`
* Fixing wrong fix.
* Make console usage platform specific. Work on compiler warnings.
* Adapting makefile
* Remove trailing whitespace
* Adapting the other parts of the makefile
* Fix typo.
slaren pushed a commit that referenced this issue on Sep 16, 2023:
…izer-1 (#3170)
* Fix for #2721
* Reenable tokenizer test for LLaMa
* Add `console.cpp` dependency
* Fix dependency to `common`
* Fixing wrong fix.
* Make console usage platform specific. Work on compiler warnings.
* Adapting makefile
* Remove trailing whitespace
* Adapting the other parts of the makefile
* Fix typo.
* Fixing the last deviations from sentencepiece indicated by test-tokenizer-1
* Simplify logic
* Add missing change...
* Fix ugly compiler warning
* llama_tokenize should accept strings containing NUL now
* Adding huichen's test case
pkrmf pushed a commit to morlockstudios-com/llama.cpp that referenced this issue on Sep 26, 2023:
…ov#3096)
* Fix for ggerganov#2721
* Reenable tokenizer test for LLaMa
* Add `console.cpp` dependency
* Fix dependency to `common`
* Fixing wrong fix.
* Make console usage platform specific. Work on compiler warnings.
* Adapting makefile
* Remove trailing whitespace
* Adapting the other parts of the makefile
* Fix typo.
pkrmf pushed a commit to morlockstudios-com/llama.cpp that referenced this issue on Sep 26, 2023:
…izer-1 (ggerganov#3170)
* Fix for ggerganov#2721
* Reenable tokenizer test for LLaMa
* Add `console.cpp` dependency
* Fix dependency to `common`
* Fixing wrong fix.
* Make console usage platform specific. Work on compiler warnings.
* Adapting makefile
* Remove trailing whitespace
* Adapting the other parts of the makefile
* Fix typo.
* Fixing the last deviations from sentencepiece indicated by test-tokenizer-1
* Simplify logic
* Add missing change...
* Fix ugly compiler warning
* llama_tokenize should accept strings containing NUL now
* Adding huichen's test case
Expected Behavior
llama.tokenizer
Previous version, the one in PR #2306 before the GGUF merge
Current Behavior