Server CUDA Infill Segmentation Fault #6672

Closed
kherud opened this issue Apr 14, 2024 · 7 comments

Labels: bug (Something isn't working), good first issue (Good for newcomers)

kherud (Contributor) commented Apr 14, 2024

With a CUDA build of the server, a segmentation fault can occur when using the /infill endpoint.
I tested this with release b2667, but the problem seems to have been present for at least 1-2 weeks.

The segmentation fault only seems to happen with models that don't support infilling (whatever that means), but the situation should probably be handled more gracefully.

For example, CodeLlama-7B-GGUF does not produce a seg fault, but Mistral-7B-Instruct-v0.2-GGUF does.
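
If "supporting infilling" comes down to the GGUF vocabulary defining the fill-in-the-middle (FIM) special tokens, then a guard along the following lines might be what graceful handling looks like. This is only a sketch of mine: it assumes the llama_token_prefix/llama_token_middle/llama_token_suffix accessors from llama.h and that a negative id means "not defined"; exact names and signatures may differ between releases:

// fim_check.cpp - sketch only; llama.h names/signatures may differ between releases
#include "llama.h"
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init();
    llama_model * model = llama_load_model_from_file(argv[1], llama_model_default_params());
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // FIM (fill-in-the-middle) special tokens; a negative id is assumed to mean "not defined"
    const llama_token pre = llama_token_prefix(model);
    const llama_token mid = llama_token_middle(model);
    const llama_token suf = llama_token_suffix(model);

    printf("prefix=%d middle=%d suffix=%d -> infill %ssupported\n",
           pre, mid, suf, (pre >= 0 && mid >= 0 && suf >= 0) ? "" : "NOT ");

    llama_free_model(model);
    llama_backend_free();
    return 0;
}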

Steps to reproduce:

System:

  • OS: Arch Linux
  • GPU: RTX 4090

Building the library:

mkdir build
cd build
cmake -DLLAMA_CUDA=ON -DLLAMA_CURL=ON ..
cmake --build . --config Release -j

Starting the server:

mkdir -p models/7B
./server -ngl 43 -mu https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q2_K.gguf

Making an infill request:

curl --request POST \
--url http://localhost:8080/infill \
--header "Content-Type: application/json" \
--data '{
    "input_prefix": "def remove_non_ascii(s: str) -> str:\n    \"\"\" ",
    "input_suffix": "\n    return result\n",
    "prompt": ""
}'

josh-ramer (Contributor) commented Apr 30, 2024

I'd like to take this up as a first issue. Do you assign issues or just wait on PRs? Should I open a PR preemptively & tag it to this issue?

slaren (Collaborator) commented Apr 30, 2024

@josh-ramer I have assigned it to you.

josh-ramer (Contributor) commented

This bug was fixed by tag b2680, which handles the case @kherud explained above. The problem was that some special tokens were being assigned incorrectly during model conversion. I planned on adding a test case that failed for the b2667 tag above & then passed for >= b2680, but after I made one, lol, I found a later tag where someone is testing the whole vocabulary against each tokenizer, which should be a more comprehensive test & cover this scenario.
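
For reference, the rough shape of such a vocabulary-wide check is sketched below. This is only an illustration of mine, not the test that actually landed in the repository, and the llama.h calls and signatures may differ between releases:

// vocab_roundtrip.cpp - sketch of the whole-vocabulary idea mentioned above,
// not the actual test from the repository; llama.h signatures may differ between releases
#include "llama.h"
#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init();
    llama_model * model = llama_load_model_from_file(argv[1], llama_model_default_params());
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    const int n_vocab = llama_n_vocab(model);
    int n_mismatch = 0;

    for (llama_token id = 0; id < n_vocab; ++id) {
        // render the token to text ...
        char buf[256];
        const int n = llama_token_to_piece(model, id, buf, sizeof(buf));
        if (n <= 0) {
            continue; // nothing printable (e.g. control tokens)
        }
        const std::string piece(buf, n);

        // ... then tokenize that text again and render it back
        std::vector<llama_token> toks(piece.size() + 8);
        const int n_toks = llama_tokenize(model, piece.c_str(), (int) piece.size(),
                                          toks.data(), (int) toks.size(),
                                          /*add_special*/ false, /*parse_special*/ true);
        std::string back;
        for (int i = 0; i < n_toks; ++i) {
            char tmp[256];
            const int m = llama_token_to_piece(model, toks[i], tmp, sizeof(tmp));
            if (m > 0) {
                back.append(tmp, m);
            }
        }

        if (back != piece) {
            n_mismatch++;
        }
    }

    printf("%d / %d pieces did not round-trip\n", n_mismatch, n_vocab);

    llama_free_model(model);
    llama_backend_free();
    return 0;
}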

It looks like this issue can be closed (IMHO).

josh-ramer (Contributor) commented May 6, 2024

I've been searching around the repository for information about how to test & debug things, and I spent a good amount of time just figuring that out. I think I'll add a short document to the docs folder about how I got up & running, if that's something you welcome.

josh-ramer (Contributor) commented

I put together a little document about my workflow during this first issue that I think could be helpful to some people. #7096

kherud (Contributor, Author) commented May 6, 2024

Awesome, thank you @josh-ramer! I'll test it later today and close the ticket then.

kherud (Contributor, Author) commented May 6, 2024

Everything works, thanks again!

kherud closed this as completed May 6, 2024