
server : update prompt on slot restore #9800

Merged: 1 commit into gg/infill-0 from gg/server-fix-slot-restore on Oct 11, 2024

Conversation

ggerganov (Member)

ref #9781

The slot.prompt field was not being updated after restoring a slot's state from a file. Since we don't store the original JSON representation of the prompt, we simply set it to a descriptive message so the reported state of the slot is not misleading.
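
A minimal sketch of this approach, as an editor's illustration only: the server_slot struct below is a simplified stand-in for the server's real slot state, and the placeholder text is hypothetical rather than the actual string used by the patch.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using llama_token = int32_t; // mirrors the typedef in llama.h

// Simplified stand-in for the server's per-slot state.
struct server_slot {
    std::string              prompt;       // prompt as reported in the slot data
    std::vector<llama_token> cache_tokens; // tokens currently held in the KV cache
};

// After restoring tokens from a file, the old prompt string is stale:
// replace it with a descriptive placeholder instead of leaving it unchanged.
void on_slot_restore(server_slot & slot, std::vector<llama_token> restored) {
    slot.cache_tokens = std::move(restored);
    slot.prompt = "[restored " + std::to_string(slot.cache_tokens.size()) +
                  " tokens from file]"; // placeholder wording is illustrative
}
```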

Another option might be to detokenize the restored tokens. Not sure if it's worth it.

ngxson (Collaborator) commented Oct 9, 2024

Another option might be to detokenize the restored tokens. Not sure if it's worth it.

I don't have a strong preference either. IMO the prompt should not be present in the returned slot data at all, as it is an intermediate variable. The real information to expose is the array of tokens in the cache.

In any case, the current slot save/restore API is quite low-level; I think we should communicate this in the docs so that users don't expect it to be a production-ready feature. I'm looking forward to reorganizing this feature to match the Claude prompt caching API, which should be more intuitive for end users.
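
For reference, a sketch of the detokenization alternative mentioned above (editor's illustration; llama_token_to_piece is a real llama.cpp API, but its exact signature has changed across versions, so treat the call below as approximate):

```cpp
#include <cstdint>
#include <string>
#include <vector>

#include "llama.h"

// Rebuild a human-readable prompt string from restored tokens by converting
// each token back to its text piece and concatenating the results.
static std::string detokenize_restored(const llama_model * model,
                                       const std::vector<llama_token> & tokens) {
    std::string text;
    char buf[256];
    for (const llama_token tok : tokens) {
        // Signature as of late 2024: (model, token, buf, length, lstrip, special);
        // newer versions take a llama_vocab * instead of a llama_model *.
        const int32_t n = llama_token_to_piece(model, tok, buf, (int32_t) sizeof(buf),
                                               /*lstrip =*/ 0, /*special =*/ true);
        if (n > 0) {
            text.append(buf, n);
        }
    }
    return text;
}
```

The placeholder approach is what ultimately landed; as noted above, it was unclear whether detokenization was worth the extra effort.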

ggerganov merged commit 32da4a2 into gg/infill-0 on Oct 11, 2024
53 checks passed
ggerganov deleted the gg/server-fix-slot-restore branch on October 11, 2024 at 06:16
ggerganov added a commit that referenced this pull request Oct 12, 2024
* llama : improve infill support

ggml-ci

* llama : add more FIM token strings

ggml-ci

* server : update prompt on slot restore (#9800)

* gguf : deprecate old FIM token KVs
drollings pushed a commit to drollings/llama.cpp that referenced this pull request Oct 18, 2024
…9798)

dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
…9798)

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
…9798)

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
…9798)
