Windows CUDA exit code 18446744072635812000 #471

Open
neilmehta24 opened this issue Feb 28, 2025 · 5 comments

Comments

@neilmehta24
Member

Discussion moved from here: #414 (comment)

@neilmehta24
Member Author

From the other issue: #414 (comment)

Hello, this sounds like a different bug. Are you still seeing this issue on the latest build of LM Studio, 0.3.10? If you are, could you please download this debug build and send us the logs when using v1.16.1? We would need the app logs and the verbose logs from the server page.

@Starfiresg1

Starfiresg1 commented Feb 28, 2025

I've run into the same issue since the runtime was updated to v1.17 with CUDA. The Vulkan variant works fine, and the older v1.15.3 also works fine with the same settings.

App log
main.log

Server log
2025-02-28.2.log

Windows Event Log also logs an error in nvlddmkm:

```xml
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="nvlddmkm" />
    <EventID Qualifiers="0">153</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2025-02-28T22:56:09.6635069Z" />
    <EventRecordID>45372</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="432" />
    <Channel>System</Channel>
    <Computer>Starfire</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Video3</Data>
    <Data>Error occurred on GPUID: a00</Data>
    <Binary>00000000020030000000000099000000000000000000000000000000000000000000000000000000</Binary>
  </EventData>
</Event>
```

@ref202404

ref202404 commented Mar 3, 2025

Hi, I just tested the debug version of 0.3.10 build 6 (backend CUDA llama.cpp 1.18.0) and here's the error info:

```
🥲 Failed to load the model

Error loading model.
(Exit code: 18446744072635812000). Unknown error. Try a different model and/or config.
```

Windows event logger doesn't have a lot of meaningful info, but anyway here it is (source = nvlddmkm):

```
EventData

\Device\Video3
Error occurred on GPUID: 100
00000000020030000000000099000000000000000000000000000000000000000000000000000000
```

Again, if I switch to the older CUDA llama.cpp backend version 1.15.3, it works fine. Any version after that results in the error code 18446744072635812000.
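For what it's worth, that huge number is just a 32-bit Windows exit code viewed as an unsigned 64-bit integer, and the trailing zeros suggest it was additionally rounded through a JavaScript `Number` (values above 2^53 lose precision), so the low bits may not be exact. A quick sketch of the reinterpretation:

```python
# Sketch: reinterpret the reported exit code. The value itself appears
# precision-truncated, so the low bits may be off.
code = 18446744072635812000

# As a signed 64-bit integer (how the negative status likely started out):
print(code - 2**64)            # -1073739616

# Low 32 bits, in the NTSTATUS-style 0xCxxxxxxx error range:
print(hex(code & 0xFFFFFFFF))  # 0xc00008a0
```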

As for your request for verbose logs from the server page: I'm not sure if you mean the log from the Developer page with LM STUDIO SERVER enabled, but the logs are as follows:

```
2025-03-04 00:30:53 [DEBUG]
llama_kv_cache_init: CUDA0 KV buffer size = 2176.00 MiB
llama_init_from_model: KV self size = 2176.00 MiB, K (q8_0): 1088.00 MiB, V (q8_0): 1088.00 MiB
2025-03-04 00:30:53 [DEBUG]
llama_init_from_model: CUDA_Host output buffer size = 0.58 MiB
2025-03-04 00:30:53 [DEBUG]
llama_init_from_model: CUDA0 compute buffer size = 307.00 MiB
llama_init_from_model: CUDA_Host compute buffer size = 42.01 MiB
llama_init_from_model: graph nodes = 1991
llama_init_from_model: graph splits = 2
2025-03-04 00:30:53 [DEBUG]
common_init_from_params: setting dry_penalty_last_n to ctx_size = 16384
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
2025-03-04 00:30:54 [DEBUG]
CUDA error: unspecified launch failure
current device: 0, in function ggml_cuda_op_mul_mat at C:\a\llmster\llmster\electron\vendor\llm-engine\llama.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:1516
cudaGetLastError()
llama.cpp abort:73: CUDA error

------------------ end of log --------------------
```

@neilmehta24
Member Author

@Starfiresg1 @ref202404 we believe this issue is caused by an underlying change in llama.cpp that broke flash attention on Turing-architecture GPUs when the Volta-architecture CUDA code path is used. Please turn off flash attention as a workaround while we decide on the best path forward, and let us know if you still see this error with flash attention turned off.
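If you want to double-check whether your card is in the affected range, here is a small sketch (assuming the `pynvml` bindings are installed; this is not what LM Studio itself does) that prints each GPU's CUDA compute capability. Turing cards (RTX 20xx / GTX 16xx) report 7.5; Volta reports 7.0:

```python
# Sketch: list CUDA compute capability per GPU via NVIDIA's NVML bindings.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        print(f"GPU {i}: {name} -> compute capability {major}.{minor}")
finally:
    pynvml.nvmlShutdown()
```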

@ref202404

@neilmehta24 You are correct. Once flash attention is turned off it works fine, though I had to disable K/V cache quantization as well, since it depends on flash attention. Understood that it's due to the new llama.cpp's flash attention incompatibility with the old RTX 20... Is there a way to solve this problem?
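For reference, the same coupling shows up outside LM Studio too. A minimal sketch using the llama-cpp-python bindings (assuming a recent version; the model path is a hypothetical placeholder), where a quantized K/V cache requires flash attention, so turning flash attention off means falling back to the default f16 cache:

```python
# Sketch of the workaround with llama-cpp-python (assumed recent version).
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",  # hypothetical placeholder path
    n_gpu_layers=-1,          # offload all layers to the GPU
    flash_attn=False,         # workaround: avoid the broken Turing FA path
    # type_k/type_v left unset: a quantized (e.g. q8_0) K/V cache requires
    # flash attention in llama.cpp, so the cache stays at the f16 default here
)
```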
