
"AutoTokenizer.from_pretrained" does not work when loading a pretrained Albert model #8748

Closed
1 of 4 tasks
iamfaith opened this issue Nov 24, 2020 · 5 comments

iamfaith commented Nov 24, 2020

Environment info

  • transformers version:
  • Platform: 5.4.0-53-generic #59~18.04.1-Ubuntu SMP Wed Oct 21 12:14:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Python version: 3.7.9
  • PyTorch version (GPU?): 1.7.0
  • Tensorflow version (GPU?): N/A
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

Information

Model I am using (Bert, XLNet ...):

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Install PyTorch from the official website and transformers via pip.
  2. Load the following pre-trained model:
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ckiplab/albert-tiny-chinese")

model = AutoModelForMaskedLM.from_pretrained("ckiplab/albert-tiny-chinese")
  3. Error:
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 683/683 [00:00<00:00, 1.32MB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 112/112 [00:00<00:00, 215kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 174/174 [00:00<00:00, 334kB/s]
Traceback (most recent call last):
  File "/home/faith/torch_tutorials/torch_chatbot.py", line 30, in <module>
    tokenizer = AutoTokenizer.from_pretrained("ckiplab/albert-tiny-chinese")
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 341, in from_pretrained
    return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1653, in from_pretrained
    resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1725, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/transformers/tokenization_albert.py", line 149, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/sentencepiece.py", line 367, in Load
    return self.LoadFromFile(model_file)
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/sentencepiece.py", line 177, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string

Expected behavior

Expected the model and tokenizer to download and load correctly, or at least to fail with a descriptive error message instead of the bare `TypeError: not a string`.
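Judging from the traceback, the TypeError comes from `AlbertTokenizer.__init__` passing `vocab_file` (which is `None`, since this repo apparently ships no `spiece.model`) straight into sentencepiece's `Load()`. A minimal sketch of the kind of up-front check that would produce the clearer error asked for above — `check_spiece_model` is a hypothetical helper, not part of the transformers API:

```python
import os

def check_spiece_model(vocab_file):
    """Hypothetical guard: fail with a descriptive message instead of
    sentencepiece's bare 'TypeError: not a string' when the model repo
    ships no SentencePiece model file."""
    if not isinstance(vocab_file, str):
        # AutoTokenizer resolved no spiece.model for this repo, so the
        # Albert tokenizer receives vocab_file=None.
        raise ValueError(
            f"vocab_file is {vocab_file!r}; the model repo likely contains "
            "no SentencePiece model (spiece.model). Try loading the "
            "tokenizer with the tokenizer class the repo actually supports."
        )
    if not os.path.isfile(vocab_file):
        raise ValueError(f"SentencePiece model not found: {vocab_file!r}")
```

With a check like this, `AutoTokenizer.from_pretrained("ckiplab/albert-tiny-chinese")` would point at the missing `spiece.model` instead of raising deep inside sentencepiece. As a possible workaround (assuming, as the missing `spiece.model` suggests, that this repo uses a BERT-style `vocab.txt`), loading the tokenizer explicitly with a BERT tokenizer class may work.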

thomwolf (Member) commented:

Can you share your versions of transformers and tokenizers?

NielsRogge (Contributor) commented:

I can reproduce this in a Colab notebook when doing pip install transformers.

  • Transformers version 3.5.1
  • Tokenizers version 0.9.3

Might be solved with v4?


akar5h commented Dec 24, 2020

I am having the same issue with AlbertTokenizer.from_pretrained


github-actions bot commented Mar 6, 2021

This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.


wenHK commented Aug 12, 2021

I have the same issue!
