
30B model support #13

Closed
nsarrazin opened this issue Mar 22, 2023 · 12 comments

Comments

@nsarrazin
Member

No description provided.

@nsarrazin
Member Author

I've made the 30B model available to download, but I don't have the hardware to test it. So if someone feels like it, feel free to download it! Instructions are in the README.
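For what it's worth, a plausible download invocation based on the argparse setup quoted later in this thread (the exact entrypoint path is an assumption; the README is the authoritative source):

```bash
# Entrypoint path assumed; the model names match download.py's argparse choices.
python api/utils/download.py 30B tokenizer
```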

@dacamp

dacamp commented Mar 22, 2023

I can test it, but it looks like the 7B tokenizer is downloaded. I'm running main at ref 58cf7d0.

[Screenshot: download output showing the 7B tokenizer being fetched]

@dacamp

dacamp commented Mar 22, 2023

I hacked it locally with this, but it's pretty jank. I think the model should determine the tokenizer.

```diff
index 73298b5..d0eafcb 100644
--- a/api/utils/download.py
+++ b/api/utils/download.py
@@ -10,6 +10,7 @@ models_info = {
     "13B": ["Pi3141/alpaca-13B-ggml", "ggml-model-q4_0.bin"],
     "30B": ["Pi3141/alpaca-30B-ggml", "ggml-model-q4_0.bin"],
     "tokenizer": ["decapoda-research/llama-7b-hf", "tokenizer.model"],
+    "30B-tokenizer": ["decapoda-research/llama-30b-hf", "tokenizer.model"],
 }
 
 
@@ -21,7 +22,7 @@ def parse_args():
         "model",
         help="Model name",
         nargs="+",
-        choices=["7B", "13B", "30B", "tokenizer"],
+        choices=["7B", "13B", "30B", "tokenizer", "30B-tokenizer"],
     )
 
     return parser.parse_args()
```
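A minimal sketch of what "the model determines the tokenizer" could look like, assuming download.py keeps a dict of model info; this restructuring is hypothetical, not the patch above (the 7B repo name follows the pattern in the diff):

```python
# Hypothetical restructuring of models_info so each model entry carries its
# own tokenizer repo, and the chosen model determines which tokenizer to fetch.
models_info = {
    "7B": {
        "weights": ("Pi3141/alpaca-7B-ggml", "ggml-model-q4_0.bin"),
        "tokenizer": ("decapoda-research/llama-7b-hf", "tokenizer.model"),
    },
    "13B": {
        "weights": ("Pi3141/alpaca-13B-ggml", "ggml-model-q4_0.bin"),
        "tokenizer": ("decapoda-research/llama-13b-hf", "tokenizer.model"),
    },
    "30B": {
        "weights": ("Pi3141/alpaca-30B-ggml", "ggml-model-q4_0.bin"),
        "tokenizer": ("decapoda-research/llama-30b-hf", "tokenizer.model"),
    },
}


def files_to_download(model: str) -> list[tuple[str, str]]:
    """Return (repo, filename) pairs for the weights and the matching tokenizer."""
    info = models_info[model]
    return [info["weights"], info["tokenizer"]]
```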

@maxime-dlabai

maxime-dlabai commented Mar 22, 2023

Hi guys, I just tested the 30B model and it works fine (using the manual conversion from https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82). Don't forget the modification in llama.cpp to load a single file for the 30B model.

@nsarrazin
Member Author

> Hi guys, I just tested the 30B model and it works fine (using the manual conversion from https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82). Don't forget the modification in llama.cpp to load a single file for the 30B model.

Are you sure this is needed? I was pretty sure the --n_parts argument to llama.cpp lets you handle it without modifying the source.

In serge it's handled here:
https://github.com/nsarrazin/serge/blob/58cf7d0f451035843d8b7e8cac048513fe47e01c/api/utils/generate.py#L47-L48
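For context, a minimal sketch of what passing that flag looks like when shelling out to llama.cpp; the binary name, file paths, and surrounding arguments here are assumptions, not the actual serge code:

```python
import subprocess

# --n_parts 1 tells llama.cpp to load the weights from one consolidated file
# instead of the N split parts the original LLaMA checkpoints ship as.
args = [
    "./main",                           # llama.cpp binary (name assumed)
    "-m", "ggml-alpaca-30B-q4_0.bin",   # quantized weights (path assumed)
    "--n_parts", "1",                   # single-file model
    "-p", "Hello from the 30B model.",  # prompt
]
subprocess.run(args, check=True)
```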

@maxime-dlabai

Sorry, you're right. I still had the old method in my head and had forgotten about the command-line options.

@nsarrazin
Member Author

> I hacked it locally with this, but it's pretty jank. I think the model should determine the tokenizer.
>
> [diff quoted above]

Thanks for doing this! Do you know if it's actually necessary to grab the matching tokenizer.model?

The instructions here (ggml-org/llama.cpp#382 (comment)) just mention grabbing a tokenizer, so I assumed you could use the tokenizer from the 7B repo for all the weights. I'm gonna test for myself if that still works.

And are you able to get any outputs from the 30B model with Serge so far? @dacamp

@nsarrazin
Member Author

I don't think you need to grab a different tokenizer; I believe they're exactly the same.

You can check it here:
https://huggingface.co/decapoda-research/llama-30b-hf/blob/main/tokenizer.model
https://huggingface.co/decapoda-research/llama-13b-hf/blob/main/tokenizer.model
https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

They all have the same SHA-256 hash.
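As a quick way to verify that claim locally, a minimal sketch (the file paths are assumptions for wherever the three files were downloaded):

```python
import hashlib


def sha256(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


# Hypothetical local paths to the three downloaded tokenizer.model files.
paths = ["7b/tokenizer.model", "13b/tokenizer.model", "30b/tokenizer.model"]
digests = {p: sha256(p) for p in paths}
print(digests)
assert len(set(digests.values())) == 1, "tokenizer files differ"
```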

@maxime-dlabai

Yes, you're right, no need; I used the same one.

@dacamp

dacamp commented Mar 22, 2023

Thanks @maximeseth

@nsarrazin
Member Author

Seems like this could be closed then?

@nsarrazin
Member Author

I'm closing this, I think it works. If it doesn't, I'll reopen it.
