
"AutoTokenizer.from_pretrained" does not work when loading a pretrained Albert model #8748

Closed
1 of 4 tasks
iamfaith opened this issue Nov 24, 2020 · 5 comments

iamfaith commented Nov 24, 2020

Environment info

  • transformers version:
  • Platform: 5.4.0-53-generic #59~18.04.1-Ubuntu SMP Wed Oct 21 12:14:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Python version: 3.7.9
  • PyTorch version (GPU?): 1.7.0
  • Tensorflow version (GPU?): N/A
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

Information

Model I am using (Bert, XLNet ...):

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Install PyTorch from the official website and transformers via pip.
  2. Load the following pre-trained model:
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ckiplab/albert-tiny-chinese")

model = AutoModelForMaskedLM.from_pretrained("ckiplab/albert-tiny-chinese")
  3. Error:
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 683/683 [00:00<00:00, 1.32MB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 112/112 [00:00<00:00, 215kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 174/174 [00:00<00:00, 334kB/s]
Traceback (most recent call last):
  File "/home/faith/torch_tutorials/torch_chatbot.py", line 30, in <module>
    tokenizer = AutoTokenizer.from_pretrained("ckiplab/albert-tiny-chinese")
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 341, in from_pretrained
    return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1653, in from_pretrained
    resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1725, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/transformers/tokenization_albert.py", line 149, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/sentencepiece.py", line 367, in Load
    return self.LoadFromFile(model_file)
  File "/home/faith/miniconda3/envs/torch/lib/python3.7/site-packages/sentencepiece.py", line 177, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string

Expected behavior

Expected the model and tokenizer to download and load correctly, or at least to fail with a descriptive error message instead of the bare `TypeError: not a string`.
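Judging from the traceback, the TypeError comes from `AlbertTokenizer.__init__` passing `vocab_file` (which is `None`, since this repo apparently ships no `spiece.model`) straight into sentencepiece's `Load()`. A minimal sketch of the kind of up-front check that would produce the clearer error asked for above — `check_spiece_model` is a hypothetical helper, not part of the transformers API:

```python
import os

def check_spiece_model(vocab_file):
    """Hypothetical guard: fail with a descriptive message instead of
    sentencepiece's bare 'TypeError: not a string' when the model repo
    ships no SentencePiece model file."""
    if not isinstance(vocab_file, str):
        # AutoTokenizer resolved no spiece.model for this repo, so the
        # Albert tokenizer receives vocab_file=None.
        raise ValueError(
            f"vocab_file is {vocab_file!r}; the model repo likely contains "
            "no SentencePiece model (spiece.model). Try loading the "
            "tokenizer with the tokenizer class the repo actually supports."
        )
    if not os.path.isfile(vocab_file):
        raise ValueError(f"SentencePiece model not found: {vocab_file!r}")
```

With a check like this, `AutoTokenizer.from_pretrained("ckiplab/albert-tiny-chinese")` would point at the missing `spiece.model` instead of raising deep inside sentencepiece. As a possible workaround (assuming, as the missing `spiece.model` suggests, that this repo uses a BERT-style `vocab.txt`), loading the tokenizer explicitly with a BERT tokenizer class may work.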

thomwolf (Member) commented:

Can you share your versions of transformers and tokenizers?

NielsRogge (Contributor) commented:

I can reproduce this in a Colab notebook when doing pip install transformers.

  • Transformers version 3.5.1
  • Tokenizers version 0.9.3

Might be solved with v4?


akar5h commented Dec 24, 2020

I am having the same issue with AlbertTokenizer.from_pretrained


github-actions bot commented Mar 6, 2021

This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.


wenHK commented Aug 12, 2021

I have the same issue!
