[FEATURE] (v1.3.9) - ROPE calculator in launcher, please. #375

Closed
SabinStargem opened this issue Aug 7, 2023 · 3 comments

@SabinStargem

I have been finding that the default RoPE in KoboldCPP is very unreliable. It takes some tweaking to find the right setting. Problem is, I have to go over to the llama.cpp GitHub and dig around to find workable settings from people who are experimenting with RoPE settings.

It would be nice if there were a RoPE calculator in the launcher, so that I could homebrew scaling myself. An example of the scaling I am using for Airoboros 33B 16k:

0.5, 70000 (frequency scale, frequency base)

Going from what I saw when trawling the GitHub repos, the big number should be the only one that is changed; apparently that reduces perplexity, since it is NTK-aware scaling. Problem is, I don't know how to calculate the scaling.

Jxy's GitHub post has some calculation numbers. Being terrible at math, I don't understand them.

Implement customizable RoPE

@LostRuins
Owner

LostRuins commented Aug 9, 2023

Generally this is more of an art than a science. You usually only want to use either NTK-aware scaling (change the big number) or linear scaling (change the small number), not both together.
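To make "big number" and "small number" concrete, here is a minimal sketch of the RoPE rotation angle for each dimension pair at a given token position. It assumes the angle is computed as `freq_scale * position * freq_base ** (-2i / n_dims)`, which is how I understand llama.cpp to apply `rope-freq-base` and `rope-freq-scale`; the function name `rope_angles` is hypothetical and this is illustrative, not KoboldCPP's actual code.

```python
def rope_angles(position: int, n_dims: int, freq_base: float = 10000.0,
                freq_scale: float = 1.0) -> list[float]:
    """Rotation angle (radians) for each of the n_dims // 2 dimension pairs.

    Assumed formula: freq_scale * position * freq_base ** (-2i / n_dims).
    freq_scale ("small number", linear scaling) shrinks the effective position.
    freq_base ("big number", NTK-aware scaling) slows rotation mostly in the
    higher dimension pairs, leaving the first pair unchanged.
    """
    return [
        freq_scale * position * freq_base ** (-2.0 * i / n_dims)
        for i in range(n_dims // 2)
    ]
```

With linear scaling at 0.5, position 2000 gets exactly the same angles as position 1000 unscaled, which is why halving the scale roughly doubles the usable context. Raising the base instead leaves the first pair alone and stretches the slower-rotating pairs.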

For linear, the target is to find the largest number that still results in coherent output. For 2x context this is 0.5, for 4x context it is 0.25, and so on.
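The linear rule of thumb above is just the native context divided by the target context. A minimal sketch, assuming a native context of 2048 tokens (the original LLaMA value; substitute your model's) and using the hypothetical helper name `linear_rope_scale`:

```python
def linear_rope_scale(native_ctx: int, target_ctx: int) -> float:
    """Starting rope-freq-scale for linear scaling: native / target context."""
    if native_ctx <= 0 or target_ctx <= 0:
        raise ValueError("context sizes must be positive")
    return native_ctx / target_ctx


# Assumes a model trained with a 2048-token context.
for factor in (2, 4, 8):
    print(f"{factor}x context -> rope-freq-scale {linear_rope_scale(2048, 2048 * factor)}")
# 2x context -> rope-freq-scale 0.5
# 4x context -> rope-freq-scale 0.25
# 8x context -> rope-freq-scale 0.125
```

This only gives the starting point; per the advice above, you would still check that output stays coherent at that value.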

For NTK-aware, the target is to find the smallest number that still results in coherent output. The relationship seems to be non-linear, but for 2x it's somewhere around raising the base from 10000 to 32000, and for 4x maybe about 80k. You may have to use trial and error.
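For a first guess at the NTK-aware base, one commonly cited formula from the original NTK-aware scaled RoPE proposal is `base * alpha ** (d / (d - 2))`, where `alpha` is the context multiplier and `d` is the rotary dimension per attention head (assumed here to be 128, as in LLaMA models). This is a sketch, not what KoboldCPP or llama.cpp compute; the helper name `ntk_aware_base` is hypothetical.

```python
def ntk_aware_base(alpha: float, n_dims: int = 128, base: float = 10000.0) -> float:
    """Estimate rope-freq-base for an alpha-times context extension.

    Assumed formula: base * alpha ** (n_dims / (n_dims - 2)).
    n_dims is the rotary dimension per attention head (128 assumed).
    """
    if alpha <= 0 or n_dims <= 2:
        raise ValueError("alpha must be positive and n_dims greater than 2")
    return base * alpha ** (n_dims / (n_dims - 2))


for alpha in (2, 4):
    print(f"{alpha}x context -> rope-freq-base ~{ntk_aware_base(alpha):.0f}")
```

For 2x this gives only about 20,200, noticeably lower than the ~32000 reported above from experimentation, so treat it as a lower starting point for the trial and error rather than a final answer.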

For more info, refer to ggerganov#2402, but ultimately you need to use trial and error.

rope-scale will not be added since it's a nothingburger; you can calculate it yourself as 1/rope-freq-scale.
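The conversion mentioned above is a single reciprocal. A minimal sketch, with the hypothetical helper name `rope_scale_from_freq_scale`:

```python
def rope_scale_from_freq_scale(rope_freq_scale: float) -> float:
    """rope-scale is the reciprocal of rope-freq-scale."""
    if rope_freq_scale <= 0:
        raise ValueError("rope-freq-scale must be positive")
    return 1.0 / rope_freq_scale


print(rope_scale_from_freq_scale(0.5))   # 2.0
print(rope_scale_from_freq_scale(0.25))  # 4.0
```

So a rope-freq-scale of 0.5 corresponds to a rope-scale of 2, i.e. the same 2x context extension expressed the other way round.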

@SabinStargem
Author

SabinStargem commented Aug 9, 2023

High for linear, low for NTK? That is a useful detail. Thank you, that points me in the right direction. Having rules and structure for the madness is very good. :)

@LostRuins
Owner

Not "high", but more like "try to keep it as high as it can go while it still works".

Perplexity as a function of the RoPE scale follows a CURVE: too high or too low will give bad results.
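Since the advice boils down to trial and error along that curve, the search itself can be sketched as a sweep over candidate values, keeping the one with the lowest measured perplexity. The `measure_perplexity` callable is a hypothetical stand-in for however you actually evaluate the model (for example, a perplexity run at each setting); the quadratic below is a toy curve for illustration, not real measurements.

```python
from typing import Callable, Iterable


def pick_best(candidates: Iterable[float],
              measure_perplexity: Callable[[float], float]) -> tuple[float, float]:
    """Evaluate each candidate setting and return (best_value, its_perplexity).

    Lower perplexity is better; ties go to the smaller candidate value.
    """
    results = [(measure_perplexity(c), c) for c in candidates]
    if not results:
        raise ValueError("no candidates given")
    ppl, best = min(results)
    return best, ppl


# Toy stand-in for a real measurement: a curve with its minimum at 32000.
toy_curve = lambda base: 5.0 + ((base - 32000) / 10000) ** 2
print(pick_best([10000, 20000, 32000, 50000, 80000], toy_curve))
```

A coarse sweep like this only narrows the range; you would then refine around the best candidate and still sanity-check that generated text stays coherent.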
