ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported. #35
@ilovedbsql Thank you. Please try
Thank you for your prompt response. I tried the method you suggested, but I'm still experiencing the same issue. Here are the results of my attempt:

(textgen) jjjj@jjjj-gm:~/Project/00.TextGen/pyllama$ pip install pyllama -U
Requirement already satisfied: pyllama in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (0.0.6)
Can you open a python shell and run?
This is the result of running the command in the terminal:

(textgen) jjjj@jjjj-gm:~/Project/00.TextGen/pyllama$ python
Then it should work. Can you reproduce the error in a Google Colab? I can take a look then.
I tried to reproduce the error on Colab, but encountered a different issue and could not reproduce it. Here's the link to the Colab notebook: https://colab.research.google.com/drive/1odpM3NxO9j8J2kubJOJxOyPXgqym82EN?usp=sharing
I changed your colab's runtime type to GPU and it is working now!
This is the result of running the code in Colab, and it shows that the same error is occurring. Here is the link below. .....
I have found the cause and a solution for the problem from the link provided. The issue was caused by the recent change in the transformers source from LLaMATokenizer to LlamaTokenizer. Please refer to this link for more information: huggingface/transformers#22222

The tokenizer_config.json on the site where the model was downloaded (https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer_config.json) still says LLaMATokenizer, while recent transformers versions use LlamaTokenizer, which seems to have caused the problem.

Therefore, I uninstalled the transformers package that I installed yesterday using "pip uninstall transformers" and reinstalled a fork that retains LLaMATokenizer, using "pip install git+https://github.com/mbehm/transformers". I am not sure if this is an official source, but for now the problem is resolved. Thank you for your support.
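An alternative to reinstalling a forked transformers is to patch the checkpoint's tokenizer_config.json so its tokenizer_class matches the name current transformers exports. A minimal sketch, assuming a locally downloaded config whose path you would substitute for your own (the demo below creates a stand-in file in a temp directory rather than touching a real checkpoint):

```python
import json
import os
import tempfile

# Demo stand-in for the checkpoint's tokenizer_config.json, which (like the
# decapoda-research/llama-7b-hf one) still uses the pre-rename class name.
# In practice, point config_path at the file inside your downloaded model dir.
config_path = os.path.join(tempfile.mkdtemp(), "tokenizer_config.json")
with open(config_path, "w") as f:
    json.dump({"tokenizer_class": "LLaMATokenizer"}, f)

# Rewrite the old class name to the one current transformers recognizes.
with open(config_path) as f:
    config = json.load(f)

if config.get("tokenizer_class") == "LLaMATokenizer":
    config["tokenizer_class"] = "LlamaTokenizer"
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)

with open(config_path) as f:
    print(json.load(f)["tokenizer_class"])  # → LlamaTokenizer
```

This keeps your transformers install at the current release instead of pinning to an unofficial fork; the trade-off is that you are editing files inside a downloaded checkpoint.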
This worked for me; I needed this version of transformers because the model I have still references LLaMA.
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 33/33 [00:12<00:00, 2.68it/s]
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
  File "/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/jjjj/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 474, in <module>
    run()
  File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 437, in run
    dataloader, testloader = get_loaders(
  File "/home/jjjj/miniconda3/lib/python3.10/site-packages/gptq/datautils.py", line 112, in get_loaders
    return get_c4(nsamples, seed, seqlen, model, tokenizer)
  File "/home/jjjj/miniconda3/lib/python3.10/site-packages/gptq/datautils.py", line 67, in get_c4
    tokenizer = tokenizer or AutoTokenizer.from_pretrained(model, use_fast=False)
  File "/home/jjjj/miniconda3/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 655, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
This error might be caused by the fact that LLaMATokenizer was changed to LlamaTokenizer. Where should I make the modification?
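One quick way to confirm that this is the rename described above is to check which class name your installed transformers actually exports. A small sketch (the helper function name is my own, not part of any library); after huggingface/transformers#22222, current releases export LlamaTokenizer, while checkpoints whose tokenizer_config.json still says LLaMATokenizer fail AutoTokenizer's lookup with exactly this ValueError:

```python
def tokenizer_class_available(name: str) -> bool:
    """Return True if the installed transformers package exports `name`.

    AutoTokenizer resolves the `tokenizer_class` string from the checkpoint's
    tokenizer_config.json against the classes transformers exports, so a
    mismatch here produces the "does not exist or is not currently imported"
    ValueError seen in the traceback above.
    """
    try:
        import transformers
    except ImportError:
        return False
    return hasattr(transformers, name)

# Compare both spellings; exactly one should be True for a given install.
for name in ("LlamaTokenizer", "LLaMATokenizer"):
    print(name, tokenizer_class_available(name))
```

If LlamaTokenizer is available but the checkpoint's config says LLaMATokenizer, the modification belongs in the checkpoint's tokenizer_config.json (or in pinning an older/forked transformers, as described in the comments above).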