Windows CUDA exit code 18446744072635812000 #471

Open
neilmehta24 opened this issue Feb 28, 2025 · 5 comments

Comments

@neilmehta24
Member

Discussion moved from here: #414 (comment)

@neilmehta24
Member Author

From the other issue: #414 (comment)

Hello, this sounds like a different bug. Are you still seeing this issue on the latest build of LM Studio, 0.3.10? If you are, could you please download this debug build and send us the logs when using v1.16.1? We would need the app logs and the verbose logs from the server page.

@Starfiresg1

Starfiresg1 commented Feb 28, 2025

I've run into the same issue since the runtime was updated to v1.17 with CUDA. The Vulkan variant works fine, and the older v1.15.3 also works fine with the same settings.

App log
main.log

Server log
2025-02-28.2.log

Windows Event Log also logs an error in nvlddmkm:

```xml
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="nvlddmkm" />
    <EventID Qualifiers="0">153</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2025-02-28T22:56:09.6635069Z" />
    <EventRecordID>45372</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="432" />
    <Channel>System</Channel>
    <Computer>Starfire</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Video3</Data>
    <Data>Error occurred on GPUID: a00</Data>
    <Binary>00000000020030000000000099000000000000000000000000000000000000000000000000000000</Binary>
  </EventData>
</Event>
```

@ref202404

ref202404 commented Mar 3, 2025

Hi, I just tested the debug version of 0.3.10 build 6 (backend CUDA llama.cpp 1.18.0) and here's the error info:

```
🥲 Failed to load the model

Error loading model.
(Exit code: 18446744072635812000). Unknown error. Try a different model and/or config.
```

Windows event logger doesn't have a lot of meaningful info, but anyway here it is (source = nvlddmkm):

```
EventData

\Device\Video3
Error occurred on GPUID: 100
00000000020030000000000099000000000000000000000000000000000000000000000000000000
```

Again, if I switch to the older CUDA llama.cpp backend version 1.15.3, it works fine. Any version after that results in the error code 18446744072635812000.
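For what it's worth, that huge number is just a 32-bit Windows exit code viewed as an unsigned 64-bit integer, and the trailing zeros suggest it was additionally rounded through a JavaScript `Number` (values above 2^53 lose precision), so the low bits may not be exact. A quick sketch of the reinterpretation:

```python
# Sketch: reinterpret the reported exit code. The value itself appears
# precision-truncated, so the low bits may be off.
code = 18446744072635812000

# As a signed 64-bit integer (how the negative status likely started out):
print(code - 2**64)            # -1073739616

# Low 32 bits, in the NTSTATUS-style 0xCxxxxxxx error range:
print(hex(code & 0xFFFFFFFF))  # 0xc00008a0
```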

As for your request for verbose logs from the server page: I'm not sure if you mean the log from the Developer page with LM STUDIO SERVER enabled, but the logs are as follows:

```
2025-03-04 00:30:53 [DEBUG]
llama_kv_cache_init: CUDA0 KV buffer size = 2176.00 MiB
llama_init_from_model: KV self size = 2176.00 MiB, K (q8_0): 1088.00 MiB, V (q8_0): 1088.00 MiB
2025-03-04 00:30:53 [DEBUG]
llama_init_from_model: CUDA_Host output buffer size = 0.58 MiB
2025-03-04 00:30:53 [DEBUG]
llama_init_from_model: CUDA0 compute buffer size = 307.00 MiB
llama_init_from_model: CUDA_Host compute buffer size = 42.01 MiB
llama_init_from_model: graph nodes = 1991
llama_init_from_model: graph splits = 2
2025-03-04 00:30:53 [DEBUG]
common_init_from_params: setting dry_penalty_last_n to ctx_size = 16384
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
2025-03-04 00:30:54 [DEBUG]
CUDA error: unspecified launch failure
current device: 0, in function ggml_cuda_op_mul_mat at C:\a\llmster\llmster\electron\vendor\llm-engine\llama.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:1516
cudaGetLastError()
llama.cpp abort:73: CUDA error

------------------ end of log --------------------
```

@neilmehta24
Member Author

@Starfiresg1 @ref202404 we believe this issue is caused by an underlying change in llama.cpp that broke flash attention on Turing-architecture GPUs when the Volta-architecture CUDA code path is used. Please turn off flash attention as a workaround while we decide on the best path forward, and let us know if you still see this error with flash attention turned off.
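If you want to double-check whether your card is in the affected range, here is a small sketch (assuming the `pynvml` bindings are installed; this is not what LM Studio itself does) that prints each GPU's CUDA compute capability. Turing cards (RTX 20xx / GTX 16xx) report 7.5; Volta reports 7.0:

```python
# Sketch: list CUDA compute capability per GPU via NVIDIA's NVML bindings.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        print(f"GPU {i}: {name} -> compute capability {major}.{minor}")
finally:
    pynvml.nvmlShutdown()
```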

@ref202404

@neilmehta24 You are correct. Once flash attention is turned off it works fine, though I had to disable K/V cache quantization as well, since it depends on flash attention. Understood that it's due to the new llama.cpp's flash attention incompatibility with the old RTX 20... Is there a way to solve this problem?
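For reference, the same coupling shows up outside LM Studio too. A minimal sketch using the llama-cpp-python bindings (assuming a recent version; the model path is a hypothetical placeholder), where a quantized K/V cache requires flash attention, so turning flash attention off means falling back to the default f16 cache:

```python
# Sketch of the workaround with llama-cpp-python (assumed recent version).
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",  # hypothetical placeholder path
    n_gpu_layers=-1,          # offload all layers to the GPU
    flash_attn=False,         # workaround: avoid the broken Turing FA path
    # type_k/type_v left unset: a quantized (e.g. q8_0) K/V cache requires
    # flash attention in llama.cpp, so the cache stays at the f16 default here
)
```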
