llama : add Deepseek support #5981
Comments
Hello, @ggerganov, I'd like to try working on it as my good first issue.
Ok 👍 keep us posted
Looking forward to this. The DeepSeek models are known for strong coding and Korean language ability.
DeepSeek is listed among the supported models. I am able to use DeepSeekCoder-33B with results comparable to their API. Can someone please clarify what the failure cases are with the current tokenization?
@fostiropoulos There may be a problem with deepseek coder 1.3b, which might not be reproducible with 6.7b, 7b, and 33b.
Any progress with Deepseek support?
Hey, so could the DeepSeek-coder model be supported once #6920 is merged? Thanks!
I've updated the description of the issue with the latest state. Support is pretty much complete, though there are some edge cases during tokenization that are handled incorrectly. For example the letter
I think we can declare DeepSeek models supported and move the problem above into a separate task related to correct processing of added tokens, since it is not specific to DeepSeek models.
Support is almost complete. There is a dangling issue with the pre-tokenizer: #7036
A useful discussion related to that is here: #7144
Outdated below
Creating this issue for more visibility
The main problem is around tokenization support, since the models use some variation of the BPE pre-processing regex. There are also some issues with the conversion scripts.
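To illustrate why the pre-processing regex matters, here is a minimal sketch showing how two different pre-tokenization patterns split the same input into different chunks before any BPE merges run. Both patterns and the `pre_tokenize` helper are hypothetical and made up for this example; they are not the regexes used by DeepSeek, GPT-2, or llama.cpp.

```python
import re

# Hypothetical patterns for illustration only. They are NOT the actual
# pre-tokenizer regexes of any real model or of llama.cpp.
# Pattern A keeps a run of digits together (with an optional leading space).
PATTERN_A = re.compile(r"\s?[A-Za-z]+|\s?\d+|\s?[^\sA-Za-z\d]+|\s+")
# Pattern B emits every digit as its own chunk.
PATTERN_B = re.compile(r"\s?[A-Za-z]+|\d|\s?[^\sA-Za-z\d]+|\s+")


def pre_tokenize(pattern, text):
    """Split text into the chunks that BPE merges would later operate on."""
    return pattern.findall(text)


text = "x = 1234"
print(pre_tokenize(PATTERN_A, text))  # ['x', ' =', ' 1234']
print(pre_tokenize(PATTERN_B, text))  # ['x', ' =', ' ', '1', '2', '3', '4']
```

BPE merges are applied only within each chunk, so a model trained with one split will see different token sequences if the runtime uses the other. That is why a mismatch in this regex can produce wrong tokenization even when the vocabulary and merges are converted correctly.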
Anyway, looking for contributions to help with this
Previous unfinished work:
Possible implementation plan: #5464 (comment)