Error Loading DeepSeek-R1-Distill-Qwen-1.5B-uncensored.Q6_K.gguf in oobabooga Web UI: Unknown Pre-Tokenizer Type 'deepseek-r1-qwen' #6710
Comments
mradermacher: Almost certainly you need a new enough version of oobabooga to run this model. Check for an update. (I don't know if this is the problem.)
I cannot run any DeepSeek R1 GGUF distills, even though I have the latest version.
There is an existing issue for this: the problem is that llama-cpp-python needs to be updated; that's what we are waiting for. It looks like it's being worked on currently, with new wheels possibly coming soon.
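For reference, a quick way to confirm which llama-cpp-python build the web UI is actually using is to query the package from inside the webui's Python environment (a minimal sketch; run it with the same interpreter that runs server.py):

import llama_cpp

# The installed wheel version tells you whether the deepseek-r1-qwen
# pre-tokenizer support has landed in your environment yet.
print(llama_cpp.__version__)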
Yes, I am waiting for the update and looking forward to it.
[offtopic] Is there a chance that …
Just updated to the latest commit (thanks, Ooba, for the quick response). Testing now, but it should work in llama.cpp.
I updated everything already and I still get an error while using the Q6_K_L of the 14B. I tried lowering the context size and the GPU layers; nothing worked. This is Bartowski's GGUF.

EDIT/Update: After removing the folder called "installer_files" and letting it reinstall everything, the error fixed itself and I can now load the model. However, a few seconds into generation the UI suddenly freezes and stops: CPU usage drops to 0 and the model just stops generating. The UI still says it is typing/generating, but it becomes unresponsive; I cannot unload the model, change it, etc. This now happens with any model, and if I restore the previous installer_files I had, it works fine. I tried setting the CPU threads to 12 to see whether lowering it to half of my CPU makes it better. I get this: Turning off text streaming doesn't fix it either. I think llama-cpp-python 0.3.7 is bugged right now. I also tried a fresh install of the whole UI and it still gets stuck when generating. I also tried reinstalling with both CUDA 12.1 and 11.8; the issue persists on both.
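One way to tell whether the freeze lives in the web UI or in llama-cpp-python itself is to load a model with the library directly, outside the UI (a minimal sketch; the model path, context size, and prompt are placeholders taken from this thread):

from llama_cpp import Llama

# Load the GGUF directly, bypassing the web UI entirely.
llm = Llama(
    model_path="models/DeepSeek-R1-Distill-Qwen-1.5B-uncensored.Q6_K.gguf",
    n_ctx=2048,    # small context to keep RAM use low
    n_threads=4,   # match your physical core count
)

# If this call also hangs, the regression is in llama-cpp-python,
# not in the web UI.
print(llm("Hello, world.", max_tokens=16)["choices"][0]["text"])

If the standalone test hangs too, rolling llama-cpp-python back to the previous wheel would confirm a library regression rather than a UI bug.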
DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf
Hi everyone! I’ve been experimenting with running small quantized models on my CPU using the oobabooga text-generation-webui, and I recently came across the DeepSeek-R1-Distill-Qwen-1.5B-uncensored model. I saw that there’s a GGUF version available (this one), so I decided to give it a try. Unfortunately, I’m running into an issue when trying to load the model, and I’m not sure what’s going wrong.
Here’s what I’ve done so far:
Downloaded the DeepSeek-R1-Distill-Qwen-1.5B-uncensored.Q6_K.gguf model from Hugging Face.
Placed the model in the models folder of my text-generation-webui directory.
Started the server using python server.py --cpu.
When I try to load the model, I get this error:
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'deepseek-r1-qwen'
llama_model_load_from_file: failed to load model
ValueError: Failed to load model from file: models/DeepSeek-R1-Distill-Qwen-1.5B-uncensored.Q6_K.gguf
I’m not super experienced with this stuff, so I’m not sure what the issue is. Is the deepseek-r1-qwen pre-tokenizer supported by llama.cpp? Or is there something else I need to do to get this model working?
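In case it helps with debugging: the pre-tokenizer name is just a metadata string inside the GGUF file, and llama.cpp refuses to load files whose string it does not recognize, so support for 'deepseek-r1-qwen' only exists in new enough llama.cpp builds. The string can be inspected without loading the weights (a sketch using the gguf Python package; the parts/data access pattern is an assumption about its reader API):

from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("models/DeepSeek-R1-Distill-Qwen-1.5B-uncensored.Q6_K.gguf")
field = reader.fields["tokenizer.ggml.pre"]
# String values are stored as raw byte parts; data indexes the payload.
print(bytes(field.parts[field.data[0]]).decode("utf-8"))  # e.g. 'deepseek-r1-qwen'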
Here’s some info about my setup:
OS: Ubuntu 22.04.5 LTS
CPU: Intel i3-3225 (4 cores, 3.3 GHz)
RAM: 8 GB
Environment: Running everything on CPU with llama.cpp in the oobabooga Web UI.
python server.py --cpu
21:54:13-523178 INFO Starting Text generation web UI
Running on local URL: http://127.0.0.1:7860/
21:54:37-758021 INFO Loading "DeepSeek-R1-Distill-Qwen-1.5B-uncensored.Q6_K.gguf"
21:54:38-248012 INFO llama.cpp weights detected: "models/DeepSeek-R1-Distill-Qwen-1.5B-uncensored.Q6_K.gguf"
llama_model_loader: loaded meta data with 35 key-value pairs and 339 tensors from models/DeepSeek-R1-Distill-Qwen-1.5B-uncensored.Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen2.5 1.5B Instruct
llama_model_loader: - kv 3: general.organization str = Qwen
llama_model_loader: - kv 4: general.finetune str = Instruct
llama_model_loader: - kv 5: general.basename str = Qwen2.5
llama_model_loader: - kv 6: general.size_label str = 1.5B
llama_model_loader: - kv 7: qwen2.block_count u32 = 28
llama_model_loader: - kv 8: qwen2.context_length u32 = 131072
llama_model_loader: - kv 9: qwen2.embedding_length u32 = 1536
llama_model_loader: - kv 10: qwen2.feed_forward_length u32 = 8960
llama_model_loader: - kv 11: qwen2.attention.head_count u32 = 12
llama_model_loader: - kv 12: qwen2.attention.head_count_kv u32 = 2
llama_model_loader: - kv 13: qwen2.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 14: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 15: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 16: tokenizer.ggml.pre str = deepseek-r1-qwen
llama_model_loader: - kv 17: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 20: tokenizer.ggml.bos_token_id u32 = 151646
llama_model_loader: - kv 21: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv 22: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 24: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 25: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - kv 26: general.quantization_version u32 = 2
llama_model_loader: - kv 27: general.file_type u32 = 18
llama_model_loader: - kv 28: general.url str = https://huggingface.co/mradermacher/D...
llama_model_loader: - kv 29: mradermacher.quantize_version str = 2
llama_model_loader: - kv 30: mradermacher.quantized_by str = mradermacher
llama_model_loader: - kv 31: mradermacher.quantized_at str = 2025-01-23T16:25:41+01:00
llama_model_loader: - kv 32: mradermacher.quantized_on str = back
llama_model_loader: - kv 33: general.source.url str = https://huggingface.co/thirdeyeai/Dee...
llama_model_loader: - kv 34: mradermacher.convert_type str = hf
llama_model_loader: - type f32: 141 tensors
llama_model_loader: - type q6_K: 198 tensors
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'deepseek-r1-qwen'
llama_model_load_from_file: failed to load model
21:54:38-606632 ERROR Failed to load the model.
Traceback (most recent call last):
File "/home/lts/text-generation-webui/modules/ui_model_menu.py", line 214, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lts/text-generation-webui/modules/models.py", line 90, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lts/text-generation-webui/modules/models.py", line 280, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lts/text-generation-webui/modules/llamacpp_model.py", line 111, in from_pretrained
result.model = Llama(**params)
^^^^^^^^^^^^^^^
File "/home/lts/miniconda3/envs/textgen/lib/python3.11/site-packages/llama_cpp/llama.py", line 369, in init
internals.LlamaModel(
File "/home/lts/miniconda3/envs/textgen/lib/python3.11/site-packages/llama_cpp/_internals.py", line 56, in init
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models/DeepSeek-R1-Distill-Qwen-1.5B-uncensored.Q6_K.gguf
Exception ignored in: <function LlamaCppModel.__del__ at 0x7fb234b5cc20>
Traceback (most recent call last):
File "/home/lts/text-generation-webui/modules/llamacpp_model.py", line 62, in del
del self.model
^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
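The final AttributeError is just fallout from the failed load: __init__ raised before self.model was ever assigned, so __del__ touches an attribute that does not exist. A defensive pattern avoids the noise (a sketch, not the webui's actual code):

class LlamaCppModel:
    def __del__(self):
        # Guard against partially constructed objects: if __init__ failed
        # before self.model was set, there is nothing to clean up.
        if getattr(self, "model", None) is not None:
            del self.model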