
[BUG]: Can not train llama-7b-hf due to “Tokenizer class LLaMATokenizer does not exist or is not currently imported.” on 3090(24GB) #3372

Closed
x22x22 opened this issue Mar 31, 2023 · 6 comments
Labels
bug Something isn't working

Comments

x22x22 commented Mar 31, 2023

🐛 Describe the bug

train_sft.sh file content:
```bash
torchrun --standalone --nproc_per_node=1 train_sft.py \
    --pretrain "/hy-tmp/ai/colossal-ai-chat/models/llama-7b-hf/" \
    --model 'llama' \
    --strategy colossalai_zero2 \
    --log_interval 10 \
    --save_path /hy-tmp/ai/colossal-ai-chat/train/models/coati-llama-7b-hf \
    --dataset /hy-tmp/ai/colossal-ai-chat/dataset/instinwild_cn.json \
    --batch_size 1 \
    --accimulation_steps 8 \
    --lr 2e-5 \
    --max_datasets_size 512 \
    --max_epochs 1
```
Running ./train_sft.sh produces the following error:
```bash
device: 0
device = _get_device_index(device): 0
[03/31/23 15:04:16] INFO     colossalai - colossalai - INFO: /hy-tmp/conda/colossal-chat/lib/python3.8/site-packages/colossalai/context/parallel_context.py:522 set_device                      
                    INFO     colossalai - colossalai - INFO: process rank 0 is bound to device 0                                                                                                
[03/31/23 15:04:18] INFO     colossalai - colossalai - INFO: /hy-tmp/conda/colossal-chat/lib/python3.8/site-packages/colossalai/context/parallel_context.py:558 set_seed                        
                    INFO     colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 42, python random: 42, ParallelMode.DATA: 42, ParallelMode.TENSOR: 42,the default parallel seed 
                             is ParallelMode.DATA.                                                                                                                                              
                    INFO     colossalai - colossalai - INFO: /hy-tmp/conda/colossal-chat/lib/python3.8/site-packages/colossalai/initialize.py:116 launch                                        
                    INFO     colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 1                  
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:25<00:00,  1.30it/s]
Traceback (most recent call last):
  File "train_sft.py", line 184, in <module>
    train(args)
  File "train_sft.py", line 65, in train
    tokenizer = AutoTokenizer.from_pretrained(
  File "/hy-tmp/conda/colossal-chat/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 678, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 32609) of binary: /hy-tmp/conda/colossal-chat/bin/python
Traceback (most recent call last):
  File "/hy-tmp/conda/colossal-chat/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/hy-tmp/conda/colossal-chat/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/hy-tmp/conda/colossal-chat/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/hy-tmp/conda/colossal-chat/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/hy-tmp/conda/colossal-chat/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/hy-tmp/conda/colossal-chat/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train_sft.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-03-31_15:06:14
  host      : I11b07783a900101bd6
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 32609)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Environment

torch==1.12.1+cu113
torchvision==0.13.1+cu113
torchaudio==0.12.1
Python==3.8.16
OS: Ubuntu 20.04.4 LTS

cauyxy commented Mar 31, 2023

Install transformers from source:
pip install git+https://github.com/huggingface/transformers.git



x22x22 commented Mar 31, 2023

> Install transformers from source: pip install git+https://github.com/huggingface/transformers.git

I have already installed it that way, and at first I also thought that was the problem, so I reinstalled it several times. But the problem remains the same.


x22x22 commented Mar 31, 2023

Found the cause:

huggingface/transformers#22222

Hi @candowu, thanks for raising this issue. This is arising, because the tokenizer in the config on the hub points to LLaMATokenizer. However, the tokenizer in the library is LlamaTokenizer.

This is likely due to the configuration files being created before the final PR was merged in.

Change the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer and it works like a charm.
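For reference, the one-line edit above can also be applied programmatically. This is a minimal sketch, assuming the model directory contains a standard `tokenizer_config.json` with a `tokenizer_class` key; the `fix_tokenizer_class` helper name and the example path are illustrative, not part of the original report.

```python
import json
from pathlib import Path

def fix_tokenizer_class(model_dir: str) -> None:
    """Rewrite tokenizer_config.json so the class name matches the one
    shipped in recent transformers releases (LlamaTokenizer, not
    LLaMATokenizer -- class names are case-sensitive)."""
    cfg_path = Path(model_dir) / "tokenizer_config.json"
    cfg = json.loads(cfg_path.read_text())
    if cfg.get("tokenizer_class") == "LLaMATokenizer":
        cfg["tokenizer_class"] = "LlamaTokenizer"
        cfg_path.write_text(json.dumps(cfg, indent=2))

# Example usage (placeholder path from the report above):
# fix_tokenizer_class("/hy-tmp/ai/colossal-ai-chat/models/llama-7b-hf")
```

After the rewrite, `AutoTokenizer.from_pretrained` should resolve the class without raising the `ValueError`.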

@x22x22 x22x22 closed this as completed Mar 31, 2023
@Issues-translate-bot
Bot detected the issue body's language is not English, translated it automatically.


Hello, do you use a single GPU for training, and how much video memory does it have?


xyfigo commented May 27, 2023

I had the same problem; this solved it.
