llama_kv_cache_tokens_rm functioning on index and not position #3840
Comments
Looking again, it looks like the intent of …
This does look pretty weird. There are only a few places that don't call it with … The …
But if it doesn't operate on the token position, then talking about “future” makes no sense, and this wouldn't function properly as far as I can see.
… and …, which would be broken for the same reason. (I think those functions are deprecated currently.) Seems like all the calls to …
I fixed this locally by just basically copying and pasting the …
Modifying the code as above seems to have resolved my last issue integrating with the new API. All of the tokens are now present and accounted for. I expect that since …
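As a rough sketch only, not the commenter's actual patch: a position-based removal could look roughly like the following, where the internal names (`llama_kv_cache`, `cells`, `pos`, `seq_id`) are assumptions about the kv-cache layout in llama.cpp at the time.

```cpp
// Sketch: free every cell whose stored *position* falls in [p0, p1),
// regardless of the cell's index in the ring buffer.
// Hypothetical helper; field names are assumptions, not the actual patch.
static void kv_cache_rm_by_pos(llama_kv_cache & cache, llama_pos p0, llama_pos p1) {
    if (p0 < 0) {
        p0 = 0;
    }
    for (uint32_t i = 0; i < cache.size; ++i) {
        const llama_pos pos = cache.cells[i].pos;
        if (pos >= p0 && (p1 < 0 || pos < p1)) {
            cache.cells[i].pos = -1;        // mark the cell as free
            cache.cells[i].seq_id.clear();  // drop sequence ownership
        }
    }
}
```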
Yeah, pretty much the same as what I did in #3843 here. (Just extends the existing function to allow a "seq id doesn't matter" value.)
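For reference, that approach reduces the truncation to a single position-based call, with a negative `seq_id` as the "seq id doesn't matter" value (a sketch, assuming the `llama_kv_cache_seq_rm` signature in the C API of that period):

```cpp
// Remove all tokens at positions [n_past, end) from every sequence,
// rather than clearing cells by index. seq_id = -1 means "any sequence",
// p1 = -1 means "to the end of the cache".
llama_kv_cache_seq_rm(ctx, -1, n_past, -1);
```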
In those cases, yeah. …
Yes, now I understand. Technically, …
This bug is related to the issue in #3825, described here: #3825 (comment)
`llama_eval` calls `llama_kv_cache_tokens_rm` using `n_past` in an attempt to truncate all cache data beyond the point currently being evaluated. `llama_kv_cache_tokens_rm` executes this cache clear by iterating over all cells of the kv cache; however, instead of honoring the position property of each cell, it clears a cell based on that cell's index within the cache.

This functionality is not compatible with the new "ring cache" design, because given the following cache values:
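(As a hypothetical illustration, not the reporter's original values: after a context shift, a 16-cell cache could end up holding its eight live tokens in cells 8..15 while their positions are 0..7.)

```
cell index : 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
position   : -  -  -  -  -  -  -  -  0  1  2  3  4  5  6  7
```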
Calling Eval with an `n_past` of 8 in this scenario will completely clear all data from the kv cache, when it should be expected to retain all data.

I'm not sure if the proper method to resolve this would be to modify `llama_eval` to point to the new `llama_kv_cache_seq_rm` method with a `seq_id` of zero, or if another adjustment should be made to resolve this issue.
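A minimal sketch of that first option, assuming the index-based call inside `llama_eval` is swapped for the position-based one (the exact call site and argument conventions would need to be checked against the code):

```cpp
// Sketch: replace the index-based clear, roughly
//     llama_kv_cache_tokens_rm(ctx, n_past, -1);   // clears cells by index
// with a position-based removal for sequence 0 (the only sequence
// llama_eval uses), dropping everything at position >= n_past:
llama_kv_cache_seq_rm(ctx, 0, n_past, -1);
```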