
Request: Deepseek Coder V2 model #2451

Closed
kba-tmn3 opened this issue Jun 20, 2024 · 16 comments
@kba-tmn3

I would like to use bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF, but I have a problem launching it properly with TabbyML.

I forked a registry and the model downloads correctly, but after downloading Tabby does not respond. I tried waiting, but it doesn't help and the logs are completely empty.

Additional context
Registry: https://github.com/kba-tmn3/registry-tabby
Command line:

docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model kba-tmn3/DeepseekCoder-V2 --device cuda

Source: https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
GGUF: https://huggingface.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF
Accuracy of the model
Comparison with other models


Please reply with a 👍 if you want this feature.

@kba-tmn3 added the enhancement label Jun 20, 2024
@wsxiaoys self-assigned this Jun 20, 2024
@wsxiaoys
Member

Thanks for the feature request.

I forked a registry and the model downloads correctly, but after downloading Tabby does not respond. I tried waiting, but it doesn't help and the logs are completely empty.

Could you turn on RUST_LOG=debug RUST_BACKTRACE=1 in your docker environment and share its output?

@kba-tmn3
Author

Could you turn on RUST_LOG=debug RUST_BACKTRACE=1 in your docker environment and share its output?

Sorry, but it does not log anything. I added the environment variables and the output is still completely empty; the container hangs in the running state and nothing works.

Command line RUST_LOG=debug RUST_BACKTRACE=1 docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model kba-tmn3/DeepseekCoder-V2 --device cuda

It utilizes 98.42% CPU in Docker Desktop, and Task Manager's hardware monitor looks like this:

[image: Task Manager hardware monitor screenshot]

@wsxiaoys
Member

To pass an environment flag to Docker you need to do something like -e RUST_BACKTRACE=1 - could you try again?

It's very likely the hang is caused by model loading / computation, though.
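
For reference, a sketch of the command above with the flags passed into the container (same model and mounts; adjust as needed):

docker run -it --gpus all \
  -e RUST_LOG=debug -e RUST_BACKTRACE=1 \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby serve --model kba-tmn3/DeepseekCoder-V2 --device cuda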

@kba-tmn3
Author

I guess it's because of the version of llama.cpp that Tabby is using.

The model was quantized with this llama.cpp release: https://github.com/ggerganov/llama.cpp/releases/tag/b3166

Logs
logs.txt

@kba-tmn3
Author

Sorry for the external link, but I found some people stuck on the same problem on the Russian forum Habr:
https://habr.com/ru/news/822503/comments/#:~:text=llama.cpp%20unknown%20model%20architecture%3A%20%27deepseek2%27

The linked comment shows the error text they hit: llama.cpp unknown model architecture: 'deepseek2' (I don't know whether I received the same message).

@wsxiaoys
Member

Right - this means support for DeepseekCoder V2 was only added to llama.cpp very recently. We will try to include it in the upcoming 0.13 release.

@Mizzlr

Mizzlr commented Jun 22, 2024

Just for added context, Ollama just added support for DeepseekCoder V2. See https://github.com/ollama/ollama/releases/tag/v0.1.45

I was wondering about the same for Tabby.

Thanks again, looking forward to release 0.13

@wsxiaoys
Member

wsxiaoys commented Jun 22, 2024

For context - you can actually connect Tabby to Ollama by using a config.toml-based model configuration: https://tabby.tabbyml.com/docs/administration/model/#ollama

@f6ra07nk14

f6ra07nk14 commented Jun 25, 2024

For context - you can actually connect Tabby to Ollama by using a config.toml-based model configuration: https://tabby.tabbyml.com/docs/administration/model/#ollama

Can you give an example showing how to create a Tabby server with this model configuration using Docker?
Thank you in advance.

@LLukas22

LLukas22 commented Jun 27, 2024

@f6ra07nk14

Here is a simple example I'm currently using to run Tabby with an Ollama server as its LLM backend, using deepseek-coder (v1) for code completions and deepseek-coder-v2 as the chat model.

config.toml

[model.completion.http]
kind = "ollama/completion"
model_name = "deepseek-coder"
api_endpoint = "http://ollama:11434" # Insert your URL here
prompt_template = "<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"

[model.chat.http]
kind = "ollama/chat"
model_name = "deepseek-coder-v2"
api_endpoint = "http://ollama:11434" # Insert your URL here

docker-compose.yml

version: '3.5'
services:
  tabby:
    restart: always
    image: ghcr.io/tabbyml/tabby:0.13.0-rc.3
    command: serve 
    volumes:
      - "./tabby:/data"
      - "./config.toml:/data/config.toml"
    ports:
      - 8080:8080

Basically, to use the config.toml you have to mount it into the /data/ directory of the container. Contrary to the documentation, you also have to provide a model_name in the [model.completion.http] section for the completion model to work.
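
For anyone not using Compose, a roughly equivalent plain docker run invocation would be something like this (a sketch with assumed local paths; add --gpus all and -e flags only if you also run a local model):

docker run -d -p 8080:8080 \
  -v "$PWD/tabby:/data" \
  -v "$PWD/config.toml:/data/config.toml" \
  ghcr.io/tabbyml/tabby:0.13.0-rc.3 serve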

@wsxiaoys
Member

Thanks for contributing such an example, @LLukas22!

Right - Ollama is a backend able to serve multiple models concurrently. If you're interested, consider making an edit at https://github.com/TabbyML/tabby/edit/main/website/docs/administration/model.md to contribute a PR, thank you!

@JohnSmithToYou

JohnSmithToYou commented Jun 27, 2024

Hi @LLukas22, I noticed you're specifying "prompt_template" for Ollama. As far as I know, Ollama expects pure prompt text; it maintains its own templates in its modelfiles. Is Tabby ignoring "prompt_template" for Ollama? Otherwise, if Tabby formats its prompts using "prompt_template" and passes that to Ollama, the results won't be correct.

Edit: Oh, the state of prompt templates is still a total mess! Ollama doesn't support FIM in prompt templates yet. See unit-mesh/auto-dev-vscode#61 and ollama/ollama#5207. It looks like CodeGPT is trying to make some Ollama changes (carlrobertoh/CodeGPT#510), but they realized llama.cpp can't get it right either (carlrobertoh/CodeGPT#510 (comment)). What a mess!

I guess defining "prompt_template" is the only reliable way to implement FIM with Ollama and llama.cpp?
@wsxiaoys Does this mean I need to define a blank prompt template in my Ollama .modelfile or is Tabby blanking out the prompt template in the request?

@LLukas22

@JohnSmithToYou

As far as I can tell, the prompt template from the Ollama modelfile only contains the base structure of the prompt, e.g.:

{{ .System }}
### Instruction:
{{ .Prompt }}
### Response:

Then, to perform the fill-in-the-middle (FIM) task, Tabby has to format the instruction as a FIM task by applying the prompt_template provided, e.g.:

<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>

This basically results in the following combined prompt:

{{ .System }}
### Instruction:
<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>
### Response:

The prefix and suffix are inserted by Tabby. But I could be wrong here, since Tabby uses the completions endpoint of Ollama instead of the chat endpoint for code completions, and that maybe doesn't apply any template at all on the Ollama side 🤔

@JohnSmithToYou

JohnSmithToYou commented Jun 27, 2024

@LLukas22 Thanks for the response. I dug in deeper and figured out a few things:

  1. FIM is not supported in the instruct version of Deepseek-coder v2
  2. The correct prompt_template supporting FIM for deepseek coder v2 is:
prompt_template = """<|fim▁begin|>{prefix}
<|fim▁hole|>
{suffix}<|fim▁end|>"""

The new lines matter.
This also requires a custom Ollama modelfile to contain:

# Ollama does not support FIM prompt templates. Instead we rely on the invoker to implement it.
TEMPLATE {{ .Prompt }}

PARAMETER stop "<|fim▁begin|>"
PARAMETER stop "<|fim▁hole|>"
PARAMETER stop "<|fim▁end|>"

This allows Tabby's prompt to pass through without anything being added to it. System context is not supported (I read this on DeepSeek's GitHub).

  3. I used deepseek-coder-instruct v2 in [model.chat.http], but I didn't use a prompt_template. Instead, I defined a custom Ollama modelfile:
#Note: https://github.com/deepseek-ai/DeepSeek-Coder-V2/issues/12#issuecomment-2181637976
TEMPLATE """{{ if .System }}{{ .System }}

{{ end }}{{ if .Prompt }}User: {{ .Prompt }}

{{ end }}Assistant:{{ .Response }}<|end▁of▁sentence|>"""

PARAMETER stop "User:"
PARAMETER stop "Assistant:"
PARAMETER stop "<|end▁of▁sentence|>"

You must leave off the begin▁of▁sentence token.
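
As a usage sketch: custom modelfiles like the ones above (each also needing a FROM line pointing at the base model) are registered with ollama create, and the chosen names are what go into model_name in config.toml. The file and model names here are hypothetical:

ollama create deepseek-coder-v2-fim -f Modelfile.fim
ollama create deepseek-coder-v2-chat -f Modelfile.chat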

@SpeedCrash100
Contributor

SpeedCrash100 commented Jun 30, 2024

@LLukas22
I have implemented changing the prompt template specified by the model here, so there is no need to override it in the modelfile. You still must set prompt_template for FIM.

The debug log from Ollama shows that everything is fine with that. I used a different model for now:

time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:179 msg="generate handler" prompt="<fim_prefix>def fib(n):\n    <fim_suffix>\n        return fib(n - 1) + fib(n - 2)<fim_middle>"
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:180 msg="generate handler" template="{{ .Prompt }}"
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:181 msg="generate handler" system=""
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:212 msg="generate handler" prompt="<fim_prefix>def fib(n):\n    <fim_suffix>\n        return fib(n - 1) + fib(n - 2)<fim_middle>"

However, stop words are a common problem with Ollama; starcoder2 has the same issue, for example, and creating a modelfile is required.
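
As a sketch of what that looks like for a StarCoder-style model (the model name and FIM tokens are assumed from the log above; adjust to your setup):

config.toml

[model.completion.http]
kind = "ollama/completion"
model_name = "starcoder2" # assumption: the tag the model was pulled under
api_endpoint = "http://ollama:11434"
prompt_template = "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

Modelfile (adds the stop words)

FROM starcoder2
PARAMETER stop "<fim_prefix>"
PARAMETER stop "<fim_suffix>"
PARAMETER stop "<fim_middle>"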

@wsxiaoys
Member

Supported since https://github.com/TabbyML/tabby/releases/tag/v0.13.1 (though we haven't added it to the official registry yet).
