Server CUDA Infill Segmentation Fault #6672

Closed
kherud opened this issue Apr 14, 2024 · 7 comments

Labels: bug (Something isn't working), good first issue (Good for newcomers)

kherud (Contributor) commented Apr 14, 2024

With a CUDA build of the server, a segmentation fault can occur when using the /infill endpoint.
I tested this with release b2667, but the problem seems to have been present for at least 1-2 weeks.

The segmentation fault only seems to happen with models that don't support infilling (whatever that means), but the situation should probably be handled more gracefully.

For example, CodeLlama-7B-GGUF does not produce a seg fault, but Mistral-7B-Instruct-v0.2-GGUF does.
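
If "supporting infilling" comes down to the GGUF vocabulary defining the fill-in-the-middle (FIM) special tokens, then a guard along the following lines might be what graceful handling looks like. This is only a sketch of mine: it assumes the llama_token_prefix/llama_token_middle/llama_token_suffix accessors from llama.h and that a negative id means "not defined"; exact names and signatures may differ between releases:

// fim_check.cpp - sketch only; llama.h names/signatures may differ between releases
#include "llama.h"
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init();
    llama_model * model = llama_load_model_from_file(argv[1], llama_model_default_params());
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // FIM (fill-in-the-middle) special tokens; a negative id is assumed to mean "not defined"
    const llama_token pre = llama_token_prefix(model);
    const llama_token mid = llama_token_middle(model);
    const llama_token suf = llama_token_suffix(model);

    printf("prefix=%d middle=%d suffix=%d -> infill %ssupported\n",
           pre, mid, suf, (pre >= 0 && mid >= 0 && suf >= 0) ? "" : "NOT ");

    llama_free_model(model);
    llama_backend_free();
    return 0;
}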

Steps to reproduce:

System:

  • OS: Arch Linux
  • GPU: RTX 4090

Building the library:

mkdir build
cd build
cmake -DLLAMA_CUDA=ON -DLLAMA_CURL=ON ..
cmake --build . --config Release -j

Starting the server:

mkdir -p models/7B
./server -ngl 43 -mu https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q2_K.gguf

Making an infill request:

curl --request POST \
--url http://localhost:8080/infill \
--header "Content-Type: application/json" \
--data '{
    "input_prefix": "def remove_non_ascii(s: str) -> str:\n    \"\"\" ",
    "input_suffix": "\n    return result\n",
    "prompt": ""
}'

josh-ramer (Contributor) commented Apr 30, 2024

I'd like to take this up as a first issue. Do you assign issues or just wait on PRs? Should I open a PR preemptively & tag it to this issue?

slaren (Collaborator) commented Apr 30, 2024

@josh-ramer I have assigned it to you.

josh-ramer (Contributor) commented

This bug was fixed by tag b2680, which handles the case @kherud explained above. The problem was that some special tokens were being assigned incorrectly during model conversion. I planned on adding a test case that failed for the b2667 tag above & then passed for >= b2680, but after I made one, lol, I found a later tag where someone is testing the whole vocabulary against each tokenizer, which should be a more comprehensive test & cover this scenario.
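
For reference, the rough shape of such a vocabulary-wide check is sketched below. This is only an illustration of mine, not the test that actually landed in the repository, and the llama.h calls and signatures may differ between releases:

// vocab_roundtrip.cpp - sketch of the whole-vocabulary idea mentioned above,
// not the actual test from the repository; llama.h signatures may differ between releases
#include "llama.h"
#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init();
    llama_model * model = llama_load_model_from_file(argv[1], llama_model_default_params());
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    const int n_vocab = llama_n_vocab(model);
    int n_mismatch = 0;

    for (llama_token id = 0; id < n_vocab; ++id) {
        // render the token to text ...
        char buf[256];
        const int n = llama_token_to_piece(model, id, buf, sizeof(buf));
        if (n <= 0) {
            continue; // nothing printable (e.g. control tokens)
        }
        const std::string piece(buf, n);

        // ... then tokenize that text again and render it back
        std::vector<llama_token> toks(piece.size() + 8);
        const int n_toks = llama_tokenize(model, piece.c_str(), (int) piece.size(),
                                          toks.data(), (int) toks.size(),
                                          /*add_special*/ false, /*parse_special*/ true);
        std::string back;
        for (int i = 0; i < n_toks; ++i) {
            char tmp[256];
            const int m = llama_token_to_piece(model, toks[i], tmp, sizeof(tmp));
            if (m > 0) {
                back.append(tmp, m);
            }
        }

        if (back != piece) {
            n_mismatch++;
        }
    }

    printf("%d / %d pieces did not round-trip\n", n_mismatch, n_vocab);

    llama_free_model(model);
    llama_backend_free();
    return 0;
}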

It looks like this issue can be closed (IMHO).

josh-ramer (Contributor) commented May 6, 2024

I've been searching around the repository for information about how to test & debug things, and I spent a good amount of time just figuring that out. I think I'll add a short document to the docs folder about how I got up & running, if that's something you welcome.

josh-ramer (Contributor) commented

I put together a little document about my workflow during this first issue that I think could be helpful to some people. #7096

kherud (Contributor, Author) commented May 6, 2024

Awesome, thank you @josh-ramer! I'll test it later today and close the ticket then.

kherud (Contributor, Author) commented May 6, 2024

Everything works, thanks again!

kherud closed this as completed May 6, 2024