
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported. #35

Closed
ilovedbsql opened this issue Mar 17, 2023 · 10 comments


@ilovedbsql

ilovedbsql commented Mar 17, 2023

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 33/33 [00:12<00:00, 2.68it/s]
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
File "/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jjjj/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 474, in <module>
run()
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 437, in run
dataloader, testloader = get_loaders(
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/gptq/datautils.py", line 112, in get_loaders
return get_c4(nsamples, seed, seqlen, model, tokenizer)
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/gptq/datautils.py", line 67, in get_c4
tokenizer = tokenizer or AutoTokenizer.from_pretrained(model, use_fast=False)
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 655, in from_pretrained
raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
This error might be caused by LLaMATokenizer having been renamed to LlamaTokenizer. Where should I make the modification?
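For context on where the class name in the error comes from: AutoTokenizer reads the tokenizer_class field from the model's tokenizer_config.json and tries to import a class with exactly that name, which is why the old LLaMATokenizer spelling fails. A minimal sketch of inspecting that field (the helper name and directory layout are illustrative, not part of pyllama):

```python
import json
from pathlib import Path

def read_tokenizer_class(model_dir):
    """Return the tokenizer_class named in a downloaded model directory's
    tokenizer_config.json; AutoTokenizer tries to import this exact name.
    Returns None if the config file is absent."""
    cfg = Path(model_dir) / "tokenizer_config.json"
    if not cfg.exists():
        return None
    return json.loads(cfg.read_text()).get("tokenizer_class")
```

If this returns LLaMATokenizer while the installed transformers only exposes LlamaTokenizer, the mismatch above is the cause.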

@juncongmoo
Owner

@ilovedbsql Thank you. Please try "pip install pyllama -U" and it should fix the issue.

@ilovedbsql
Author

ilovedbsql commented Mar 18, 2023

Thank you for your prompt response. I tried the method you suggested, but I'm still experiencing the same issue. Here are the results of my attempt.

(textgen) jjjj@jjjj-gm:~/Project/00.TextGen/pyllama$ pip install pyllama -U

Requirement already satisfied: pyllama in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (0.0.6)
Collecting pyllama
Using cached pyllama-0.0.8-py3-none-any.whl (51 kB)
Requirement already satisfied: hiq-python>=1.1.9 in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from pyllama) (1.1.9)
Requirement already satisfied: torch>=1.12.0 in /home/jjjj/.local/lib/python3.10/site-packages (from pyllama) (1.13.1)
Requirement already satisfied: fire~=0.5.0 in /home/jjjj/.local/lib/python3.10/site-packages (from pyllama) (0.5.0)
Requirement already satisfied: sentencepiece==0.1.97 in /home/jjjj/.local/lib/python3.10/site-packages (from pyllama) (0.1.97)
Requirement already satisfied: fairscale>=0.4.13 in /home/jjjj/.local/lib/python3.10/site-packages (from pyllama) (0.4.13)
Requirement already satisfied: numpy>=1.22.0 in /home/jjjj/.local/lib/python3.10/site-packages (from fairscale>=0.4.13->pyllama) (1.24.2)
Requirement already satisfied: six in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from fire~=0.5.0->pyllama) (1.16.0)
Requirement already satisfied: termcolor in /home/jjjj/.local/lib/python3.10/site-packages (from fire~=0.5.0->pyllama) (2.2.0)
Requirement already satisfied: cachetools in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (5.3.0)
Requirement already satisfied: urllib3 in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (1.26.15)
Requirement already satisfied: py-itree in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (0.0.18)
Requirement already satisfied: PyYAML in /home/jjjj/.local/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (6.0)
Requirement already satisfied: psutil in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (5.9.4)
Requirement already satisfied: requests in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from hiq-python>=1.1.9->pyllama) (2.28.2)
Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in /home/jjjj/.local/lib/python3.10/site-packages (from torch>=1.12.0->pyllama) (11.10.3.66)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in /home/jjjj/.local/lib/python3.10/site-packages (from torch>=1.12.0->pyllama) (11.7.99)
Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in /home/jjjj/.local/lib/python3.10/site-packages (from torch>=1.12.0->pyllama) (8.5.0.96)
Requirement already satisfied: typing-extensions in /home/jjjj/.local/lib/python3.10/site-packages (from torch>=1.12.0->pyllama) (4.5.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in /home/jjjj/.local/lib/python3.10/site-packages (from torch>=1.12.0->pyllama) (11.7.99)
Requirement already satisfied: wheel in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.12.0->pyllama) (0.38.4)
Requirement already satisfied: setuptools in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.12.0->pyllama) (67.6.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/jjjj/.local/lib/python3.10/site-packages (from requests->hiq-python>=1.1.9->pyllama) (3.1.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from requests->hiq-python>=1.1.9->pyllama) (2022.12.7)
Requirement already satisfied: idna<4,>=2.5 in /home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages (from requests->hiq-python>=1.1.9->pyllama) (3.4)
Installing collected packages: pyllama
Attempting uninstall: pyllama
Found existing installation: pyllama 0.0.6
Uninstalling pyllama-0.0.6:
Successfully uninstalled pyllama-0.0.6
Successfully installed pyllama-0.0.8
(textgen) jjjj@jjjj-gm:~/Project/00.TextGen/pyllama$ python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 33/33 [00:09<00:00, 3.31it/s]
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
File "/home/jjjj/miniconda3/envs/textgen/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jjjj/miniconda3/envs/textgen/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 474, in <module>
run()
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 437, in run
dataloader, testloader = get_loaders(
File "/home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages/gptq/datautils.py", line 112, in get_loaders
return get_c4(nsamples, seed, seqlen, model, tokenizer)
File "/home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages/gptq/datautils.py", line 67, in get_c4
tokenizer = tokenizer or AutoTokenizer.from_pretrained(model, use_fast=False)
File "/home/jjjj/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 676, in from_pretrained
raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

@juncongmoo
Owner

juncongmoo commented Mar 18, 2023

Can you open a Python shell and run the following?

>>> from llama.hf import LLaMATokenizer
>>> 

@ilovedbsql
Author

ilovedbsql commented Mar 18, 2023

This is the result of running the command in the terminal:

(textgen) jjjj@jjjj-gm:~/Project/00.TextGen/pyllama$ python
Python 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama.hf import LLaMATokenizer
>>>
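Since the direct import succeeds, the failure is on the AutoTokenizer resolution path rather than in pyllama's own module. A quick, generic way to check which spellings an installed library exposes (the helper is illustrative, not part of pyllama or transformers):

```python
import importlib

def exposed_names(module_name, candidates=("LlamaTokenizer", "LLaMATokenizer")):
    """Return which of the candidate attribute names the named module exposes;
    an empty list means the module is missing or exposes neither spelling."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return []
    return [name for name in candidates if hasattr(mod, name)]
```

Running exposed_names("transformers") shows whether the installed build carries the old LLaMATokenizer spelling, the renamed LlamaTokenizer, or both.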

@juncongmoo
Owner

Then it should work. Can you reproduce the error in a Google Colab? I can take a look there.

@ilovedbsql
Author

I tried to reproduce the error on Colab, but encountered a different issue and could not reproduce it. Here's the link to the Colab notebook: https://colab.research.google.com/drive/1odpM3NxO9j8J2kubJOJxOyPXgqym82EN?usp=sharing

@juncongmoo
Owner

I changed your Colab's runtime type to GPU and it is working now!

@ilovedbsql
Author

ilovedbsql commented Mar 18, 2023

This is the result of running the code in Colab; it shows the same error occurring. Here is the link:
https://colab.research.google.com/drive/1odpM3NxO9j8J2kubJOJxOyPXgqym82EN?usp=sharing

.....
Downloading data files: 100% 1/1 [00:07<00:00, 7.15s/it]
Extracting data files: 100% 1/1 [00:06<00:00, 6.49s/it]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
Downloading and preparing dataset json/allenai--c4 to /root/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...
Downloading data files: 0% 0/1 [00:00<?, ?it/s]
Downloading data: 0% 0.00/40.5M [00:00<?, ?B/s]
Downloading data: 19% 7.76M/40.5M [00:00<00:00, 77.6MB/s]
Downloading data: 38% 15.5M/40.5M [00:00<00:00, 73.9MB/s]
Downloading data: 57% 22.9M/40.5M [00:00<00:00, 72.8MB/s]
Downloading data: 75% 30.2M/40.5M [00:00<00:00, 70.0MB/s]
Downloading data: 100% 40.5M/40.5M [00:00<00:00, 69.3MB/s]
Downloading data files: 100% 1/1 [00:02<00:00, 2.45s/it]
Extracting data files: 100% 1/1 [00:00<00:00, 1.21it/s]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
Downloading (…)okenizer_config.json: 100% 141/141 [00:00<00:00, 22.1kB/s]
Traceback (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/content/pyllama/llama/llama_quant.py", line 475, in <module>
run()
File "/content/pyllama/llama/llama_quant.py", line 438, in run
dataloader, testloader = get_loaders(
File "/usr/local/lib/python3.9/dist-packages/gptq/datautils.py", line 112, in get_loaders
return get_c4(nsamples, seed, seqlen, model, tokenizer)
File "/usr/local/lib/python3.9/dist-packages/gptq/datautils.py", line 67, in get_c4
tokenizer = tokenizer or AutoTokenizer.from_pretrained(model, use_fast=False)
File "/usr/local/lib/python3.9/dist-packages/transformers/models/auto/tokenization_auto.py", line 676, in from_pretrained
raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

@ilovedbsql
Author

I have found the cause and a solution from the link provided. The issue was caused by the recent rename in the transformers source from LLaMATokenizer to LlamaTokenizer; see huggingface/transformers#22222 for details.

The tokenizer_config.json of the downloaded model (https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer_config.json) still names LLaMATokenizer, while recent transformers only has LlamaTokenizer, which is what caused the mismatch.

Therefore, I uninstalled the transformers build I installed yesterday with "pip uninstall transformers" and reinstalled a fork that keeps LLaMATokenizer, using "pip install git+https://github.com/mbehm/transformers". I am not sure whether this is an official source, but for now the problem is resolved.
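As an alternative to pinning a forked transformers, the tokenizer_config.json of the locally downloaded model can be patched to the new class name. A sketch, assuming a local copy of the config file (the helper name is illustrative, not an official API):

```python
import json
from pathlib import Path

def patch_tokenizer_class(config_path):
    """Rewrite tokenizer_class from the pre-rename LLaMATokenizer spelling
    to LlamaTokenizer; returns True if the file was changed."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    if config.get("tokenizer_class") == "LLaMATokenizer":
        config["tokenizer_class"] = "LlamaTokenizer"
        path.write_text(json.dumps(config, indent=2))
        return True
    return False
```

This keeps the stock transformers install and only touches the model's local config, so subsequent AutoTokenizer.from_pretrained calls resolve the renamed class.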

Thank you for your support.

@thorhamma

> The issue was caused by the recent change in the transformers source from LLaMATokenizer to LlamaTokenizer. […] I installed it using "pip install git+https://github.com/mbehm/transformers".

This worked for me. I needed this version of transformers because my model still references LLaMATokenizer.
