Request: NTK rope support #479
I have looked a little deeper and found that this implementation is actually simple; there is no need to edit any .cu files. I have drafted a version to support NTK. See if it works.
I have tested NTK support in vLLM and it works; it can extrapolate up to 8k without any finetuning.
Here was the main modification:
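The modification itself did not survive in the thread. As a rough sketch of the NTK-aware RoPE idea being discussed (all names here are illustrative, not vLLM's actual code): the rotary base is rescaled by the context-extension factor before the inverse frequencies are computed.

```python
def ntk_inv_freq(dim: int, base: float = 10000.0, alpha: float = 4.0):
    """Inverse frequencies for NTK-aware RoPE (illustrative sketch).

    alpha is the desired context-extension factor; alpha=1.0
    reproduces vanilla RoPE.
    """
    # NTK scaling raises the base so the highest-frequency component
    # is preserved while lower frequencies are stretched to cover the
    # longer context.
    scaled_base = base * alpha ** (dim / (dim - 2))
    # Same exponents as vanilla RoPE: 2i/dim for i = 0 .. dim/2 - 1.
    return [scaled_base ** (-2 * i / dim) for i in range(dim // 2)]
```

With alpha > 1 every frequency except the first shrinks, which is what stretches the effective context without retraining.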
Great, can you please tell me how to use it?
Do you know if this can be extended to a 16k context size? If so, could you please provide the necessary code? @lucasjinreal
Does seq_len in this forward function equal key.size(0) + key_cache.size(0)?
@ShadowTeamCN I'm not sure; it should be the same as the length on the torch side.
In which file do you make this change, exactly?
I passed in two samples in a batch, with lengths of 6 and 8 respectively.
Closing as RoPE is now supported. If this is incorrect, feel free to re-open this issue.
Hi, there are some very successful experiments showing that NTK-based RoPE can achieve good extrapolation ability without any finetuning.
I have tested it as well, and it works well: a model trained with a 1024 context can have very impressive long-context ability with NTK RoPE.
Would you consider supporting it, since it doesn't require many changes (maybe)?
However, the position op is implemented inside the baked-in .cu kernel.
Currently I can use torch code to check whether the context length is bigger than 2048 and then apply NTK, but wouldn't it be better if vLLM supported it out of the box?
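The torch-side gate described above might look something like this minimal sketch (the function name, trained-window size, and scaling rule are assumptions for illustration, not vLLM code):

```python
def maybe_ntk_base(seq_len: int, dim: int,
                   base: float = 10000.0, max_trained: int = 2048) -> float:
    """Return the RoPE base, rescaled only when the sequence exceeds
    the trained context window (dynamic-NTK style; illustrative sketch)."""
    if seq_len <= max_trained:
        # Within the trained window: vanilla RoPE, no scaling.
        return base
    # Beyond the window: pick the extension factor from the actual
    # overshoot and rescale the base accordingly.
    alpha = seq_len / max_trained
    return base * alpha ** (dim / (dim - 2))
```

Doing this check outside the CUDA kernel is exactly the workaround the comment describes; native support would move the rescaled base into the kernel's frequency computation.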