
Convert fixes #3967

Closed · wants to merge 2 commits

Conversation

lorddoskias

Fixes for two issues in the convert.py script:

  1. Treat a norm_eps value of 1e-06 as also indicating LLaMA-2, so such models are no longer misdetected as LLaMA-1 (a sketch of the resulting heuristic follows this list).
  2. Handle vocab_size: -1 in a LLaMA-2 params.json by dropping the bogus value and falling back to the embedding tensor (see the sketch after the commit messages).
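
A minimal sketch of the context-length heuristic with this change applied; this is not the exact convert.py code, and the function name guess_n_ctx is illustrative only. `config` stands for the dict parsed from params.json.

```python
# Sketch of the "hack to determine LLaMA v1 vs v2" with this PR's change applied.
# `config` is the dict parsed from params.json.
def guess_n_ctx(config: dict) -> int:
    if config.get("rope_theta") == 1000000:
        # CodeLlama
        return 16384
    elif config["norm_eps"] in (1e-05, 1e-06):  # 1e-06 now also counts as LLaMA v2
        # LLaMA v2
        return 4096
    else:
        # LLaMA v1
        return 2048
```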

Nikolay Borisov added 2 commits November 6, 2023 13:49
In recent downloads of the LLaMA-2 model files, norm_eps is set to 1e-06. This leads convert.py to erroneously consider the model to be LLaMA-1 and to set the context to 2k tokens.

Fix it by extending the existing hack to also check for the 1e-06 value.
When vocab_size is detected to be -1, simply remove its value from the parsed params.json and fall back to using tok_embeddings.weight.

Fixes ggerganov#3900
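
A rough sketch of the vocab_size fallback described above; the helper name resolve_vocab_size and the shape argument are illustrative, not the script's actual interface.

```python
# Sketch of the vocab_size fallback: FB writes vocab_size = -1 in some LLaMA-2
# params.json files; drop the bogus value and use the embedding matrix's rows.
def resolve_vocab_size(config: dict, tok_embeddings_shape: tuple) -> int:
    if config.get("vocab_size", -1) == -1:
        config.pop("vocab_size", None)   # discard the bogus -1
        return tok_embeddings_shape[0]   # first dim of tok_embeddings.weight
    return config["vocab_size"]
```
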
@@ -250,9 +250,14 @@ def loadOriginalParamsJson(model: LazyModel, config_path: Path) -> Params:
         if config.get("rope_theta") == 1000000:
             # CodeLlama
             n_ctx = 16384
-        elif config["norm_eps"] == 1e-05:
+        elif config["norm_eps"] in (1e-05, 1e-06):
Collaborator

Then how do we detect LLaMA-1? See #2384

Author

Well, the LLaMA-2 files ship with norm_eps equal to 1e-06.

cebtenzzre (Collaborator), Nov 7, 2023

Could you tell me where to get an original LLaMA-2 model with norm_eps=1e-6?

Author

Well, I got it via the LLaMA-2 repository's download script, using the link that was sent to my email. Since the link is tied to my identity and to accepting the EULA, I'd say it's not prudent to share it. I guess if you create a new registration now and try to download llama 7b you'd see it?

Collaborator

I just downloaded LLaMA-2 7B and got this:

{"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-05, "vocab_size": -1}

So you'll have to be more specific.

lorddoskias (Author), Nov 7, 2023

OK, so what I have actually downloaded is llama-2-7b-chat:

cat llama-2-7b-chat/params.json
{"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}

cat llama-2-7b-chat/checklist.chk
0c4837f3ef97f648452f91faed308a07 consolidated.00.pth
1c39bc3c6b51079fd807cc105b86c9df params.json

Collaborator

I can confirm this.

@ggerganov Our "hack to determine LLaMA v1 vs v2" is apparently insufficient. Maybe we need a CLI arg to disambiguate?

Owner

Ugh, this is quite messy. I guess we do need to add a CLI arg that would be required in cases where the vocab_size is -1. In such cases we abort and ask the user to specify explicitly LLaMA v1 or LLaMA v2.
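
A hypothetical sketch of what such a flag might look like; the --llama-version name and the argparse wiring are illustrative only and are not part of this PR or of convert.py.

```python
# Hypothetical disambiguation flag; the flag name and wiring are illustrative
# only and not part of convert.py.
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--llama-version", choices=["1", "2"],
                    help="required when params.json contains vocab_size == -1")
args = parser.parse_args()

vocab_size = -1  # value read from params.json in the ambiguous case
if vocab_size == -1 and args.llama_version is None:
    sys.exit("vocab_size is -1: pass --llama-version 1 or --llama-version 2")

n_ctx = 2048 if args.llama_version == "1" else 4096
```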

# LLaMA v2
n_ctx = 4096
# For some reason FB writes -1 to vocab size for their LLAMA2 models
# simply remove this bogus value and let the return statement belo
Collaborator

spelling: "belo" should be "below"

cebtenzzre (Collaborator)

Partially obsoleted by #4258

ggerganov closed this Jan 13, 2024