Server CUDA Infill Segmentation Fault #6672
Comments
I'd like to take this up as a first issue. Do you assign issues, or do you just wait on PRs? Should I open a PR preemptively & tag it to this issue?
@josh-ramer I have assigned it to you.
This bug was fixed by tag b2680, which handled the case @kherud explained above. The problem was that during model conversion some special tokens were being assigned incorrectly. I planned on adding a test case that failed for the b2667 tag above & then passed for >= b2680, but after I made one, I found a later tag where someone tests the whole vocabulary against each tokenizer, which should be a more comprehensive test & cover this scenario. It looks like this issue can be closed (IMHO).
I've been searching around the repository for information about how to test & debug things, and I spent a good amount of time just figuring that out. I think I'll add a short document to the docs folder about how I got up & running, if that's something you'd welcome.
I put together a little document about my workflow during this first issue that I think could be helpful to some people: #7096
Awesome, thank you @josh-ramer! I'll test it later today and close the ticket then.
Everything works, thanks again!
With a CUDA build of the server, a segmentation fault can occur when using the /infill endpoint. I tested this with release b2667, but the problem seems to have been present for at least 1-2 weeks. The segmentation fault only seems to happen with models that don't support infilling (whatever that means exactly), but the situation should probably be handled more gracefully.
For example, CodeLlama-7B-GGUF does not produce a seg fault, but Mistral-7B-Instruct-v0.2-GGUF does.
Steps to reproduce:
System:
Building the library:
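A CUDA server build of a release from that era can be sketched as follows. This is an assumption, not the reporter's exact commands; the `LLAMA_CUDA=1` make flag was in use around b2667, while older tags used `LLAMA_CUBLAS=1` instead.

```shell
# Sketch: build the llama.cpp server with CUDA support at the
# release under test (flag names vary between releases).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout b2667
make LLAMA_CUDA=1 server
```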
Starting the server:
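A typical server invocation looks like the sketch below; the model filename, port, and layer count are placeholders, not the reporter's original values.

```shell
# Sketch: start the server with full GPU offload (-ngl) on a
# CUDA build. The GGUF path here is a placeholder.
./server \
  -m models/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
  -ngl 99 \
  --host 0.0.0.0 \
  --port 8080
```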
Making an infill request
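A minimal /infill request can be sent with curl. `input_prefix` and `input_suffix` are the endpoint's fill-in-the-middle fields; the prompt text and port below are illustrative assumptions, not the reporter's original request.

```shell
# Sketch: ask the server to fill in the middle of a function body.
curl --request POST \
  --url http://localhost:8080/infill \
  --header "Content-Type: application/json" \
  --data '{
    "input_prefix": "def add(a, b):\n",
    "input_suffix": "\n    return result\n",
    "n_predict": 32
  }'
```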