
Request: Deepseek Coder V2 model #2451

Closed
kba-tmn3 opened this issue Jun 20, 2024 · 16 comments
@kba-tmn3

I would like to use bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF, but I have a problem launching it properly with TabbyML.

I forked a registry and the model downloads correctly, but after downloading Tabby does not respond. I tried waiting, but it doesn't help and the logs are completely empty.

Additional context
Registry: https://github.com/kba-tmn3/registry-tabby
Command line:

docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model kba-tmn3/DeepseekCoder-V2 --device cuda

Source: https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
GGUF: https://huggingface.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF
Accuracy of the model
Comparison with other models


Please reply with a 👍 if you want this feature.

@kba-tmn3 added the enhancement label Jun 20, 2024
@wsxiaoys self-assigned this Jun 20, 2024
@wsxiaoys
Member

Thanks for the feature request.

I forked a registry and the model downloads correctly, but after downloading Tabby does not respond. I tried waiting, but it doesn't help and the logs are completely empty.

Could you turn on RUST_LOG=debug RUST_BACKTRACE=1 in your docker environment and share its output?

@kba-tmn3
Author

Could you turn on RUST_LOG=debug RUST_BACKTRACE=1 in your docker environment and share its output?

Sorry, but it does not log anything. I added the environment variables and the output is still completely empty; the container hangs in the running state and nothing works.

Command line RUST_LOG=debug RUST_BACKTRACE=1 docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model kba-tmn3/DeepseekCoder-V2 --device cuda

It utilizes 98.42% CPU in Docker Desktop, and Task Manager's hardware monitor looks like this:

[image: Task Manager hardware monitor screenshot]

@wsxiaoys
Member

To pass an environment flag to Docker you need to do something like -e RUST_BACKTRACE=1 - could you try again?

It's very likely the hang is caused by model loading / computation, though.
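
For reference, a sketch of the command above with the flags passed into the container (same model and mounts; adjust as needed):

docker run -it --gpus all \
  -e RUST_LOG=debug -e RUST_BACKTRACE=1 \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby serve --model kba-tmn3/DeepseekCoder-V2 --device cuda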

@kba-tmn3
Author

I guess it's because of the version of llama.cpp that Tabby is using.

The model was quantized with this llama.cpp release: https://github.com/ggerganov/llama.cpp/releases/tag/b3166

Logs
logs.txt

@kba-tmn3
Author

Sorry for the external link, but I found some people stuck on the same problem on the Russian forum Habr:
https://habr.com/ru/news/822503/comments/#:~:text=llama.cpp%20unknown%20model%20architecture%3A%20%27deepseek2%27

The linked comment shows the error text they hit: llama.cpp unknown model architecture: 'deepseek2' (I don't know whether I received the same message).

@wsxiaoys
Member

Right - this means support for DeepseekCoder V2 was only added to llama.cpp very recently. We will try to include it in the upcoming 0.13 release.

@Mizzlr

Mizzlr commented Jun 22, 2024

Just for added context, Ollama just added support for DeepseekCoder V2. See https://github.com/ollama/ollama/releases/tag/v0.1.45

I was wondering about the same for Tabby.

Thanks again, looking forward to release 0.13

@wsxiaoys
Member

wsxiaoys commented Jun 22, 2024

For context - you can actually connect Tabby to Ollama by using a config.toml-based model configuration: https://tabby.tabbyml.com/docs/administration/model/#ollama

@f6ra07nk14

f6ra07nk14 commented Jun 25, 2024

For context - you can actually connect Tabby to Ollama by using a config.toml-based model configuration: https://tabby.tabbyml.com/docs/administration/model/#ollama

Can you give an example showing how to create a Tabby server with this model configuration using Docker?
Thank you in advance.

@LLukas22

LLukas22 commented Jun 27, 2024

@f6ra07nk14

Here is a simple example I'm currently using to run Tabby with an Ollama server as its LLM backend, using deepseek-coder (v1) for code completions and deepseek-coder-v2 as the chat model.

config.toml

[model.completion.http]
kind = "ollama/completion"
model_name = "deepseek-coder"
api_endpoint = "http://ollama:11434" # Insert your URL here
prompt_template = "<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"

[model.chat.http]
kind = "ollama/chat"
model_name = "deepseek-coder-v2"
api_endpoint = "http://ollama:11434" # Insert your URL here

docker-compose.yml

version: '3.5'
services:
  tabby:
    restart: always
    image: ghcr.io/tabbyml/tabby:0.13.0-rc.3
    command: serve 
    volumes:
      - "./tabby:/data"
      - "./config.toml:/data/config.toml"
    ports:
      - 8080:8080

Basically, to use the config.toml you have to mount it into the /data/ directory of the container. Contrary to the documentation, you also have to provide a model_name in the [model.completion.http] section for the completion model to work.
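
For anyone not using Compose, a roughly equivalent plain docker run invocation would be something like this (a sketch with assumed local paths; add --gpus all and -e flags only if you also run a local model):

docker run -d -p 8080:8080 \
  -v "$PWD/tabby:/data" \
  -v "$PWD/config.toml:/data/config.toml" \
  ghcr.io/tabbyml/tabby:0.13.0-rc.3 serve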

@wsxiaoys
Member

Thanks for contributing such an example, @LLukas22!

Right - Ollama is a backend able to serve multiple models concurrently. If you're interested, consider making an edit at https://github.com/TabbyML/tabby/edit/main/website/docs/administration/model.md to contribute a PR, thank you!

@JohnSmithToYou

JohnSmithToYou commented Jun 27, 2024

Hi @LLukas22, I noticed you're specifying "prompt_template" for Ollama. As far as I know, Ollama expects pure prompt text; it maintains its own templates in its modelfiles. Is Tabby ignoring "prompt_template" for Ollama? Otherwise, if Tabby formats its prompts using "prompt_template" and passes that to Ollama, the results won't be correct.

Edit: Oh, the state of prompt templates is still a total mess! Ollama doesn't support FIM in prompt templates yet. See unit-mesh/auto-dev-vscode#61 and ollama/ollama#5207. It looks like CodeGPT is trying to make some Ollama changes (carlrobertoh/CodeGPT#510), but they realized llama.cpp can't get it right either (carlrobertoh/CodeGPT#510 (comment)). What a mess!

I guess defining "prompt_template" is the only reliable way to implement FIM with Ollama and llama.cpp?
@wsxiaoys Does this mean I need to define a blank prompt template in my Ollama .modelfile or is Tabby blanking out the prompt template in the request?

@LLukas22

@JohnSmithToYou

As far as I can tell, the prompt template from the Ollama modelfile only contains the base structure of the prompt, e.g.:

{{ .System }}
### Instruction:
{{ .Prompt }}
### Response:

Then, to perform the fill-in-the-middle (FIM) task, Tabby has to format the instruction as a FIM task by applying the prompt_template provided, e.g.:

<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>

This basically results in the following combined prompt:

{{ .System }}
### Instruction:
<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>
### Response:

The prefix and suffix are inserted by Tabby. But I could be wrong here, since Tabby uses the completions endpoint of Ollama instead of the chat endpoint for code completions, and that maybe doesn't apply any template at all on the Ollama side 🤔

@JohnSmithToYou

JohnSmithToYou commented Jun 27, 2024

@LLukas22 Thanks for the response. I dug in deeper and figured out a few things:

  1. FIM is not supported in the instruct version of Deepseek-coder v2
  2. The correct prompt_template supporting FIM for deepseek coder v2 is:
prompt_template = """<|fim▁begin|>{prefix}
<|fim▁hole|>
{suffix}<|fim▁end|>"""

The new lines matter.
This also requires a custom Ollama modelfile to contain:

# Ollama does not support FIM prompt templates. Instead we rely on the invoker to implement it.
TEMPLATE {{ .Prompt }}

PARAMETER stop "<|fim▁begin|>"
PARAMETER stop "<|fim▁hole|>"
PARAMETER stop "<|fim▁end|>"

This allows Tabby's prompt to pass through without anything being added to it. System context is not supported (I read this on DeepSeek's GitHub).

  3. I used deepseek-coder-instruct v2 in [model.chat.http], but I didn't use a prompt_template. Instead, I defined a custom Ollama modelfile:
#Note: https://github.com/deepseek-ai/DeepSeek-Coder-V2/issues/12#issuecomment-2181637976
TEMPLATE """{{ if .System }}{{ .System }}

{{ end }}{{ if .Prompt }}User: {{ .Prompt }}

{{ end }}Assistant:{{ .Response }}<|end▁of▁sentence|>"""

PARAMETER stop "User:"
PARAMETER stop "Assistant:"
PARAMETER stop "<|end▁of▁sentence|>"

You must leave off the begin▁of▁sentence token.
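
As a usage sketch: custom modelfiles like the ones above (each also needing a FROM line pointing at the base model) are registered with ollama create, and the chosen names are what go into model_name in config.toml. The file and model names here are hypothetical:

ollama create deepseek-coder-v2-fim -f Modelfile.fim
ollama create deepseek-coder-v2-chat -f Modelfile.chat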

@SpeedCrash100
Contributor

SpeedCrash100 commented Jun 30, 2024

@LLukas22
I have implemented changing the prompt template specified by the model here, so there is no need to override it in the modelfile. You still must set prompt_template for FIM.

The debug log from Ollama shows that everything is fine with that. I used a different model for now:

time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:179 msg="generate handler" prompt="<fim_prefix>def fib(n):\n    <fim_suffix>\n        return fib(n - 1) + fib(n - 2)<fim_middle>"
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:180 msg="generate handler" template="{{ .Prompt }}"
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:181 msg="generate handler" system=""
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:212 msg="generate handler" prompt="<fim_prefix>def fib(n):\n    <fim_suffix>\n        return fib(n - 1) + fib(n - 2)<fim_middle>"

However, stop words are a common problem with Ollama; starcoder2 has the same issue, for example, and creating a modelfile is required.
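
As a sketch of what that looks like for a StarCoder-style model (the model name and FIM tokens are assumed from the log above; adjust to your setup):

config.toml

[model.completion.http]
kind = "ollama/completion"
model_name = "starcoder2" # assumption: the tag the model was pulled under
api_endpoint = "http://ollama:11434"
prompt_template = "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

Modelfile (adds the stop words)

FROM starcoder2
PARAMETER stop "<fim_prefix>"
PARAMETER stop "<fim_suffix>"
PARAMETER stop "<fim_middle>"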

@wsxiaoys
Member

Supported since https://github.com/TabbyML/tabby/releases/tag/v0.13.1 (though we haven't added it to the official registry yet).
