[Oversight] Ideal rope for CodeLlama 2 based models differs vastly from Llama 2 #3090
I did not discover this. A user of KoboldCPP posted that the auto-rope for Code Llama was incorrect. Just in case this applies to llama.cpp, I wanted to draw attention to the issue. Here is a quote of their findings.
Here is my log from booting up c34b. Question: is the value "1.0e-05" in my log correct? There is a llama.cpp thread where slaren said that `--rope-freq-base 1e6` is what CodeLlama uses.
I see. There isn't any "automatic" rope scaling stuff in base llama.cpp as far as I know. However, as of #2793 it should respect the parameters if they're in `config.json`. Just for example, https://huggingface.co/Phind/Phind-CodeLlama-34B-Python-v1/blob/main/config.json has:

"rope_scaling": null,
"rope_theta": 1000000,

Assuming the model was converted with a version that included the pull I mentioned, it should include the correct rope scaling in the GGUF file.
Checking the pytorch config for Airoboros c34b, that looks to be the right value for theta, so on that front it looks to be a KoboldCPP issue. However, that still leaves a point of concern for me. Slaren said that `--rope-freq-base 1e6` is what CodeLlama uses, yet in Phind's and Airoboros's pytorch files I am seeing `"rms_norm_eps": 1e-05`. Assuming I am not misunderstanding, the llama.cpp tools might be assigning the wrong rms_norm_eps. In KoboldCPP, 1e-05 pops up for both Airo and WizardLM 34b. The Bloke said it should be 1e6 and that it should be baked straight into the GGUF. That was about 15 days ago, but the Airo and WizardLM models I downloaded are from about 4 days ago according to the repo. Knowing me, I am likely to be wrong, but I still wanted to bring this up, just in case.
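For reference, the values a checkpoint declares are easy to check directly. A minimal sketch, not from the thread, assuming a local copy of the config (the path is hypothetical):

```python
# Minimal sketch: print the rope/eps fields a Hugging Face checkpoint
# declares in its config.json before conversion. Path is hypothetical.
import json

with open("Phind-CodeLlama-34B-Python-v1/config.json") as f:
    cfg = json.load(f)

# rope_theta and rms_norm_eps are unrelated fields that merely look
# alike in scientific notation (1e6 vs. 1e-05).
print("rope_scaling:", cfg.get("rope_scaling"))  # null for CodeLlama
print("rope_theta:  ", cfg.get("rope_theta"))    # 1000000 expected
print("rms_norm_eps:", cfg.get("rms_norm_eps"))  # 1e-05 expected
```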
I'm not sure I understand. Aside from using scientific notation to express the numbers, there's no relationship between `rope_freq_base` (1e6) and `rms_norm_eps` (1e-05). Also, the log messages you pasted from loading the model seem to have the correct EPS value, and it's the same in `config.json`: `"rms_norm_eps": 1e-05`.
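For completeness, the values actually baked into a converted file can be inspected with the `gguf` Python package from the llama.cpp repo. A rough sketch, assuming its reader API (which may differ across versions) and a hypothetical file name:

```python
# Rough sketch: dump rope/eps metadata from a GGUF file.
# Assumes the gguf-py package shipped in the llama.cpp repo; the reader
# API may differ between versions. The file name is hypothetical.
from gguf import GGUFReader

reader = GGUFReader("codellama-34b.gguf")
for name, field in reader.fields.items():
    # e.g. llama.rope.freq_base and llama.attention.layer_norm_rms_epsilon
    if "rope" in name or "epsilon" in name:
        print(name, field.parts[field.data[0]])
```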
In that case, I stand corrected. Thank you. :)
This issue was closed because it has been inactive for 14 days since being marked as stale.