
Error running speculative inference #441

Open
pwilkin opened this issue Feb 17, 2025 · 6 comments
Labels
more-info-needed Need more information to diagnose the problem

Comments


pwilkin commented Feb 17, 2025

I've tried to use the Speculative Inference function, but using qwen2.5-14b-instruct-1m with qwen2.5-coder-0.5b-instruct as the draft model yields the following error:

2025-02-17 10:14:59 [DEBUG] 
llama_model_load: error loading model: invalid vector subscript
llama_model_load_from_file_impl: failed to load model
2025-02-17 10:14:59 [DEBUG] 
common_init_from_params: failed to load model 'D:\models\lmstudio-community\Qwen2.5-Coder-0.5B-Instruct-GGUF\Qwen2.5-Coder-0.5B-Instruct-Q4_K_M.gguf'
2025-02-17 10:14:59 [DEBUG] 
[10388:0217/101459.695:ERROR:crashpad_client_win.cc(868)] not connected

which causes the following in the app:

[2025-02-17 10:15:00.030] [error] [LMSInternal][Client=LM Studio][Endpoint=sendMessage] Error in RPC handler: Error: Rehydrated error
Model has unloaded or crashed.
    at _0x8702b8.<computed> (C:\Program Files\LM Studio\resources\app\.webpack\main\index.js:453:109001)
    at _0x2584f6.subscriber (C:\Program Files\LM Studio\resources\app\.webpack\main\index.js:126:1877)
    at _0x2584f6.notifier (C:\Program Files\LM Studio\resources\app\.webpack\main\index.js:287:139189)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
- Caused By: Error: Channel Error
    at <computed> (C:\Program Files\LM Studio\resources\app\.webpack\main\index.js:148:61183)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
- Caused By: Error: Model has unloaded or crashed.
    at _0x3a48f8._0x38ddde (C:\Program Files\LM Studio\resources\app\.webpack\main\index.js:40:283917)
    at _0x3a48f8.emit (node:events:531:35)
    at _0x3a48f8.onChildExit (C:\Program Files\LM Studio\resources\app\.webpack\main\index.js:40:245731)
    at _0xc979e6.<anonymous> (C:\Program Files\LM Studio\resources\app\.webpack\main\index.js:40:244990)
    at _0xc979e6.emit (node:events:519:28)
    at ForkUtilityProcess.<anonymous> (C:\Program Files\LM Studio\resources\app\.webpack\main\index.js:201:7864)
    at ForkUtilityProcess.emit (node:events:519:28)
    at ForkUtilityProcess.a.emit (node:electron/js2c/browser_init:2:71438)

Running 0.3.10beta4 on Windows.
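The "invalid vector subscript" text appears to be the what() message of the std::out_of_range exception that MSVC's standard library throws from std::vector::at on an out-of-bounds index; llama.cpp catches exceptions during model loading and logs them as "error loading model: ...". A minimal sketch of how such a message would surface (illustrative only, not llama.cpp's actual loading code):

```cpp
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    // Stand-in for some size-limited per-model table built at load time.
    std::vector<float> table(4);

    try {
        // On MSVC, an out-of-bounds .at() throws std::out_of_range whose
        // what() text is "invalid vector subscript".
        float f = table.at(100);
        (void) f;
    } catch (const std::exception & e) {
        // llama.cpp wraps model loading in a try/catch and logs the
        // exception text, which is how the message reaches the log above.
        std::cerr << "llama_model_load: error loading model: " << e.what() << "\n";
    }
    return 0;
}
```

That the wording is MSVC-specific lines up with the later report in this thread that the same setup works on Linux.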

yagil (Member) commented Feb 17, 2025

Hi, thanks for the bug report.

Does qwen2.5-coder-0.5b-instruct load successfully on its own when you pick it from the main model loader? (ctrl + L)

@yagil yagil transferred this issue from lmstudio-ai/lms Feb 17, 2025
@yagil yagil added the more-info-needed Need more information to diagnose the problem label Feb 18, 2025
ka-admin commented Feb 20, 2025

llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4080) - 0 MiB free
2025-02-20 15:46:06 [DEBUG]
llama_model_loader: loaded meta data with 26 key-value pairs and 435 tensors from F:\LMModels\Qwen\Qwen2.5-Coder-3B-Instruct-GGUF\qwen2.5-coder-3b-instruct-q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 3B Instruct AWQ
llama_model_loader: - kv 3: general.finetune str = Instruct-AWQ
llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder
llama_model_loader: - kv 5: general.size_label str = 3B
llama_model_loader: - kv 6: qwen2.block_count u32 = 36
llama_model_loader: - kv 7: qwen2.context_length u32 = 32768
llama_model_loader: - kv 8: qwen2.embedding_length u32 = 2048
llama_model_loader: - kv 9: qwen2.feed_forward_length u32 = 11008
llama_model_loader: - kv 10: qwen2.attention.head_count u32 = 16
llama_model_loader: - kv 11: qwen2.attention.head_count_kv u32 = 2
llama_model_loader: - kv 12: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 13: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 14: general.file_type u32 = 7
llama_model_loader: - kv 15: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 16: tokenizer.ggml.pre str = qwen2
2025-02-20 15:46:06 [DEBUG]
llama_model_loader: - kv 17: tokenizer.ggml.tokens arr[str,151936] = ["!", """, "#", "$", "%", "&", "'", ...
2025-02-20 15:46:06 [DEBUG]
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
2025-02-20 15:46:06 [DEBUG]
llama_model_loader: - kv 19: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 24: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 25: general.quantization_version u32 = 2
llama_model_loader: - type f32: 181 tensors
llama_model_loader: - type q8_0: 254 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 3.36 GiB (8.50 BPW)
2025-02-20 15:46:07 [DEBUG]
init_tokenizer: initializing tokenizer for type 2
2025-02-20 15:46:07 [DEBUG]
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
2025-02-20 15:46:07 [DEBUG]
load: control token: 151649 '<|box_end|>' is not marked as EOG
2025-02-20 15:46:07 [DEBUG]
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
2025-02-20 15:46:07 [DEBUG]
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
2025-02-20 15:46:07 [DEBUG]
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
2025-02-20 15:46:07 [DEBUG]
load: control token: 151648 '<|box_start|>' is not marked as EOG
2025-02-20 15:46:07 [DEBUG]
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
2025-02-20 15:46:07 [DEBUG]
load: special tokens cache size = 22
2025-02-20 15:46:07 [DEBUG]
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: n_ctx_train = 32768
print_info: n_embd = 2048
print_info: n_layer = 36
print_info: n_head = 16
print_info: n_head_kv = 2
print_info: n_rot = 128
print_info: n_swa = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 8
print_info: n_embd_k_gqa = 256
print_info: n_embd_v_gqa = 256
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 11008
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
2025-02-20 15:46:07 [DEBUG]
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 32768
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 3B
print_info: model params = 3.40 B
print_info: general.name = Qwen2.5 Coder 3B Instruct AWQ
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 148848 'ÄĬ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
2025-02-20 15:46:07 [DEBUG]
llama_model_load: error loading model: invalid vector subscript
llama_model_load_from_file_impl: failed to load model
2025-02-20 15:46:07 [DEBUG]
common_init_from_params: failed to load model 'F:\LMModels\Qwen\Qwen2.5-Coder-3B-Instruct-GGUF\qwen2.5-coder-3b-instruct-q8_0.gguf'
2025-02-20 15:46:07 [DEBUG]
[13252:0220/154607.088:ERROR:crashpad_client_win.cc(868)] not connected

All models load successfully, but any attempt to send a prompt crashes and unloads the main model.

pwilkin (Author) commented Feb 20, 2025

On an interesting note, I tried an LM Studio install on Linux (on the same box, with dual boot) and it works OK.

pwilkin (Author) commented Feb 20, 2025

And yes, I even did the test with both models already loaded.

pwilkin (Author) commented Feb 21, 2025

Can confirm that on Windows, 0.3.10b6 still fails with the same error message (llama_model_load: error loading model: invalid vector subscript).

Ryvix commented Mar 3, 2025

I've had this happen as well. I've found that if I set the context length to 16384 instead of the maximum 131072 (which the button offers to set automatically when loading with custom settings), and then select the smaller model on the Inference tab, it loads without errors.

Also, I've found that if I save the preset there and later load it, the smaller model isn't loaded unless I manually select it again in the dropdown menu.
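Ryvix's observation fits the loader logs above: the draft models report n_ctx_train = 32768, while the 1M main model invites far larger context settings. If the draft model is loaded with the main model's context length, an out-of-bounds index into a table sized for the draft's own training context could produce exactly this MSVC message. A hypothetical guard (illustrative only; not LM Studio's or llama.cpp's actual code) would clamp the draft's context before loading:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: choose the context length for the draft model,
// never exceeding what it was trained for (n_ctx_train). The names here
// are illustrative and do not come from LM Studio or llama.cpp.
uint32_t choose_draft_n_ctx(uint32_t requested_n_ctx, uint32_t draft_n_ctx_train) {
    return std::min(requested_n_ctx, draft_n_ctx_train);
}

// With these inputs, a 131072-token request against a draft trained at
// 32768 is clamped to 32768 -- the same range in which Ryvix's manual
// 16384 setting succeeds.
```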
