Error running speculative inference #441
Comments
Hi, thanks for the bug report. Does …
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4080) - 0 MiB free

All models load successfully, but any attempt to send a prompt causes a crash and unloads the main model.
On an interesting note, I tried with an LM Studio install on Linux (on the same box, with dual boot) and it works OK.
And yes, I even did the test with both models already loaded.
Can confirm that on Windows, 0.3.10b6 still fails with the same error message (llama_model_load: error loading model: invalid vector subscript).
I've had this happen as well. I've found that if I set the context length to 16384 instead of the 131072 maximum that the button fills in automatically when loading with custom settings, then when I select the smaller model on the Inference tab it loads without errors. I've also found that if I save the preset there and load it, it doesn't load the smaller model unless I manually select it in the dropdown menu.
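For reference, here is a minimal sketch of what that context setting corresponds to if the draft context is created through the llama.cpp C API. This is an assumption about the load path, not this app's actual code; the GGUF file name is a placeholder, and the function names follow a recent llama.cpp (older releases use llama_load_model_from_file and llama_new_context_with_model instead).

```cpp
// Minimal sketch of the workaround: cap the context at 16384 tokens instead of the
// 131072 maximum that the button fills in automatically. Placeholder file name;
// not the app's actual code path.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * draft = llama_model_load_from_file("qwen2.5-coder-0.5b-instruct.gguf", mparams);
    if (draft == nullptr) {
        fprintf(stderr, "draft model load failed\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 16384;    // reduced context length reported to avoid the error
    // cparams.n_ctx = 131072; // the automatic maximum that reportedly triggers the crash

    llama_context * ctx = llama_init_from_model(draft, cparams);
    printf("draft context %s\n", ctx != nullptr ? "created" : "failed");

    if (ctx != nullptr) {
        llama_free(ctx);
    }
    llama_model_free(draft);
    llama_backend_free();
    return 0;
}
```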
I've tried to use the Speculative Inference function, but using it with qwen2.5-14b-instruct-1m as the main model and qwen2.5-coder-0.5b-instruct as the draft yields the following error:
which causes the following in the app:
Running 0.3.10beta4 on Windows.
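For anyone who wants to narrow this down outside the app, below is a hedged, standalone sketch that loads the same main/draft pairing directly through the llama.cpp C API and prints the vocabulary sizes (speculative decoding needs compatible vocabularies between the two models). The GGUF file names are placeholders and the function names assume a recent llama.cpp; this is only an approximation of the load path, not the app's real code.

```cpp
// Standalone sketch: load the target and draft models from the pairing described
// above and create a context for each with the reduced 16384-token length.
// Placeholder GGUF file names; not the app's real code.
#include "llama.h"
#include <cstdio>

static llama_model * load_model(const char * path) {
    llama_model_params mparams = llama_model_default_params();
    return llama_model_load_from_file(path, mparams);
}

int main() {
    llama_backend_init();

    llama_model * target = load_model("qwen2.5-14b-instruct-1m.gguf");
    llama_model * draft  = load_model("qwen2.5-coder-0.5b-instruct.gguf");
    if (target == nullptr || draft == nullptr) {
        fprintf(stderr, "model load failed\n");
        return 1;
    }

    // A cheap first sanity check: print the vocabulary sizes of both models.
    printf("target vocab: %d, draft vocab: %d\n",
           llama_vocab_n_tokens(llama_model_get_vocab(target)),
           llama_vocab_n_tokens(llama_model_get_vocab(draft)));

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 16384; // the reduced context length from the workaround above

    llama_context * ctx_target = llama_init_from_model(target, cparams);
    llama_context * ctx_draft  = llama_init_from_model(draft,  cparams);
    printf("contexts: target=%s draft=%s\n",
           ctx_target != nullptr ? "ok" : "failed",
           ctx_draft  != nullptr ? "ok" : "failed");

    if (ctx_draft  != nullptr) llama_free(ctx_draft);
    if (ctx_target != nullptr) llama_free(ctx_target);
    llama_model_free(draft);
    llama_model_free(target);
    llama_backend_free();
    return 0;
}
```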