
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported. #35

Closed
ilovedbsql opened this issue Mar 17, 2023 · 10 comments


@ilovedbsql

ilovedbsql commented Mar 17, 2023

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 33/33 [00:12<00:00, 2.68it/s]
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
File "/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jjjj/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 474, in <module>
run()
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 437, in run
dataloader, testloader = get_loaders(
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/gptq/datautils.py", line 112, in get_loaders
return get_c4(nsamples, seed, seqlen, model, tokenizer)
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/gptq/datautils.py", line 67, in get_c4
tokenizer = tokenizer or AutoTokenizer.from_pretrained(model, use_fast=False)
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 655, in from_pretrained
raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
This error might be caused by LLaMATokenizer having been renamed to LlamaTokenizer. Where should I make the modification?
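For context on where the class name in the error comes from: AutoTokenizer reads the tokenizer_class field from the model's tokenizer_config.json and tries to import a class with exactly that name, which is why the old LLaMATokenizer spelling fails. A minimal sketch of inspecting that field (the helper name and directory layout are illustrative, not part of pyllama):

```python
import json
from pathlib import Path

def read_tokenizer_class(model_dir):
    """Return the tokenizer_class named in a downloaded model directory's
    tokenizer_config.json; AutoTokenizer tries to import this exact name.
    Returns None if the config file is absent."""
    cfg = Path(model_dir) / "tokenizer_config.json"
    if not cfg.exists():
        return None
    return json.loads(cfg.read_text()).get("tokenizer_class")
```

If this returns LLaMATokenizer while the installed transformers only exposes LlamaTokenizer, the mismatch above is the cause.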

@juncongmoo
Owner

@ilovedbsql Thank you. Please try "pip install pyllama -U" and it should fix the issue.

@ilovedbsql
Author

ilovedbsql commented Mar 18, 2023

Thank you for your prompt response. I tried the method you suggested, but I'm still experiencing the same issue. Here are the results of my attempt.

(textgen) jjjj@jjjj-gm:~/Project/00.TextGen/pyllama$ pip install pyllama -U

Requirement already satisfied: pyllama in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (0.0.6)
Collecting pyllama
Using cached pyllama-0.0.8-py3-none-any.whl (51 kB)
Requirement already satisfied: hiq-python>=1.1.9 in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from pyllama) (1.1.9)
Requirement already satisfied: torch>=1.12.0 in /home/jjjj/.local/lib/python3.10/site-packages (from pyllama) (1.13.1)
Requirement already satisfied: fire~=0.5.0 in /home/jjjj/.local/lib/python3.10/site-packages (from pyllama) (0.5.0)
Requirement already satisfied: sentencepiece==0.1.97 in /home/jjjj/.local/lib/python3.10/site-packages (from pyllama) (0.1.97)
Requirement already satisfied: fairscale>=0.4.13 in /home/jjjj/.local/lib/python3.10/site-packages (from pyllama) (0.4.13)
Requirement already satisfied: numpy>=1.22.0 in /home/jjjj/.local/lib/python3.10/site-packages (from fairscale>=0.4.13->pyllama) (1.24.2)
Requirement already satisfied: six in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from fire~=0.5.0->pyllama) (1.16.0)
Requirement already satisfied: termcolor in /home/jjjj/.local/lib/python3.10/site-packages (from fire~=0.5.0->pyllama) (2.2.0)
Requirement already satisfied: cachetools in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (5.3.0)
Requirement already satisfied: urllib3 in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (1.26.15)
Requirement already satisfied: py-itree in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (0.0.18)
Requirement already satisfied: PyYAML in /home/jjjj/.local/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (6.0)
Requirement already satisfied: psutil in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (5.9.4)
Requirement already satisfied: requests in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (2.28.2)
Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in /home/jjjj/.local/lib/python3.10/site-packages (from torch>=1.12.0->pyllama) (11.10.3.66)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in /home/jjjj/.local/lib/python3.10/site-packages (from torch>=1.12.0->pyllama) (11.7.99)
Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in /home/jjjj/.local/lib/python3.10/site-packages (from torch>=1.12.0->pyllama) (8.5.0.96)
Requirement already satisfied: typing-extensions in /home/jjjj/.local/lib/python3.10/site-packages (from torch>=1.12.0->pyllama) (4.5.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in /home/jjjj/.local/lib/python3.10/site-packages (from torch>=1.12.0->pyllama) (11.7.99)
Requirement already satisfied: wheel in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.12.0->pyllama) (0.38.4)
Requirement already satisfied: setuptools in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.12.0->pyllama) (67.6.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/jjjj/.local/lib/python3.10/site-packages (from requests->hiq-python>=1.1.9->pyllama) (3.1.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from requests->hiq-python>=1.1.9->pyllama) (2022.12.7)
Requirement already satisfied: idna<4,>=2.5 in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from requests->hiq-python>=1.1.9->pyllama) (3.4)
Installing collected packages: pyllama
Attempting uninstall: pyllama
Found existing installation: pyllama 0.0.6
Uninstalling pyllama-0.0.6:
Successfully uninstalled pyllama-0.0.6
Successfully installed pyllama-0.0.8
(textgen) jjjj@jjjj-gm:~/Project/00.TextGen/pyllama$ python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 33/33 [00:09<00:00, 3.31it/s]
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
File "/home/jjjj/miniconda3/envs/textgen/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jjjj/miniconda3/envs/textgen/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 474, in <module>
run()
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 437, in run
dataloader, testloader = get_loaders(
File "/home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages/gptq/datautils.py", line 112, in get_loaders
return get_c4(nsamples, seed, seqlen, model, tokenizer)
File "/home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages/gptq/datautils.py", line 67, in get_c4
tokenizer = tokenizer or AutoTokenizer.from_pretrained(model, use_fast=False)
File "/home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 676, in from_pretrained
raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

@juncongmoo
Owner

juncongmoo commented Mar 18, 2023

Can you open a Python shell and run the following?

>>> from llama.hf import LLaMATokenizer
>>> 

@ilovedbsql
Author

ilovedbsql commented Mar 18, 2023

This is the result of running the command in the terminal:

(textgen) jjjj@jjjj-gm:~/Project/00.TextGen/pyllama$ python
Python 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama.hf import LLaMATokenizer
>>>
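Since the direct import succeeds, the failure is on the AutoTokenizer resolution path rather than in pyllama's own module. A quick, generic way to check which spellings an installed library exposes (the helper is illustrative, not part of pyllama or transformers):

```python
import importlib

def exposed_names(module_name, candidates=("LlamaTokenizer", "LLaMATokenizer")):
    """Return which of the candidate attribute names the named module exposes;
    an empty list means the module is missing or exposes neither spelling."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return []
    return [name for name in candidates if hasattr(mod, name)]
```

Running exposed_names("transformers") shows whether the installed build carries the old LLaMATokenizer spelling, the renamed LlamaTokenizer, or both.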

@juncongmoo
Owner

Then it should work. Can you reproduce the error in a Google Colab? I can take a look there.

@ilovedbsql
Author

I tried to reproduce the error on Colab, but encountered a different issue and could not reproduce it. Here's the link to the Colab notebook: https://colab.research.google.com/drive/1odpM3NxO9j8J2kubJOJxOyPXgqym82EN?usp=sharing

@juncongmoo
Owner

I changed your Colab's runtime type to GPU and it is working now!

@ilovedbsql
Author

ilovedbsql commented Mar 18, 2023

This is the result of running the code in Colab; it shows the same error occurring. Here is the link:
https://colab.research.google.com/drive/1odpM3NxO9j8J2kubJOJxOyPXgqym82EN?usp=sharing

.....
Downloading data files: 100% 1/1 [00:07<00:00, 7.15s/it]
Extracting data files: 100% 1/1 [00:06<00:00, 6.49s/it]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
Downloading and preparing dataset json/allenai--c4 to /root/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...
Downloading data files: 0% 0/1 [00:00<?, ?it/s]
Downloading data: 0% 0.00/40.5M [00:00<?, ?B/s]
Downloading data: 19% 7.76M/40.5M [00:00<00:00, 77.6MB/s]
Downloading data: 38% 15.5M/40.5M [00:00<00:00, 73.9MB/s]
Downloading data: 57% 22.9M/40.5M [00:00<00:00, 72.8MB/s]
Downloading data: 75% 30.2M/40.5M [00:00<00:00, 70.0MB/s]
Downloading data: 100% 40.5M/40.5M [00:00<00:00, 69.3MB/s]
Downloading data files: 100% 1/1 [00:02<00:00, 2.45s/it]
Extracting data files: 100% 1/1 [00:00<00:00, 1.21it/s]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
Downloading (…)okenizer_config.json: 100% 141/141 [00:00<00:00, 22.1kB/s]
Traceback (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/content/pyllama/llama/llama_quant.py", line 475, in <module>
run()
File "/content/pyllama/llama/llama_quant.py", line 438, in run
dataloader, testloader = get_loaders(
File "/usr/local/lib/python3.9/dist-packages/gptq/datautils.py", line 112, in get_loaders
return get_c4(nsamples, seed, seqlen, model, tokenizer)
File "/usr/local/lib/python3.9/dist-packages/gptq/datautils.py", line 67, in get_c4
tokenizer = tokenizer or AutoTokenizer.from_pretrained(model, use_fast=False)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/auto/tokenization_auto.py", line 676, in from_pretrained
raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

@ilovedbsql
Author

I have found the cause and a solution from the link provided. The issue was caused by the recent rename in the transformers source from LLaMATokenizer to LlamaTokenizer; see huggingface/transformers#22222 for details.

The tokenizer_config.json of the downloaded model (https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer_config.json) still names LLaMATokenizer, while recent transformers only has LlamaTokenizer, which is what caused the mismatch.

Therefore, I uninstalled the transformers build I installed yesterday with "pip uninstall transformers" and reinstalled a fork that keeps LLaMATokenizer, using "pip install git+https://github.com/mbehm/transformers". I am not sure whether this is an official source, but for now the problem is resolved.
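As an alternative to pinning a forked transformers, the tokenizer_config.json of the locally downloaded model can be patched to the new class name. A sketch, assuming a local copy of the config file (the helper name is illustrative, not an official API):

```python
import json
from pathlib import Path

def patch_tokenizer_class(config_path):
    """Rewrite tokenizer_class from the pre-rename LLaMATokenizer spelling
    to LlamaTokenizer; returns True if the file was changed."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    if config.get("tokenizer_class") == "LLaMATokenizer":
        config["tokenizer_class"] = "LlamaTokenizer"
        path.write_text(json.dumps(config, indent=2))
        return True
    return False
```

This keeps the stock transformers install and only touches the model's local config, so subsequent AutoTokenizer.from_pretrained calls resolve the renamed class.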

Thank you for your support.

@thorhamma

> The issue was caused by the recent change in the transformers source from LLaMATokenizer to LlamaTokenizer. […] I installed it using "pip install git+https://github.com/mbehm/transformers".

This worked for me. I needed this version of transformers because my model still references LLaMATokenizer.
