
An Error report about pipeline #3227

Closed
SizhaoXu opened this issue Mar 11, 2020 · 8 comments · Fixed by #3439
Labels
Core: Pipeline Internals of the library; Pipeline. Version mismatch

Comments

@SizhaoXu

🐛 Bug

Information

This may be an easy question, but it has been bothering me all day.

When I run the code:
nlp = pipeline("question-answering")

It always tells me:
Couldn't reach server at 'https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-base-cased-distilled-squad-modelcard.json' to download model card file.
Creating an empty model card.

If I ignore it and continue to run the rest of the code:
nlp({
'question': 'What is the name of the repository ?',
'context': 'Pipeline have been included in the huggingface/transformers repository'
})

The following error appears:
KeyError: 'token_type_ids'
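The crash happens because the pipeline's SQuAD preprocessing reads `token_type_ids` directly from the tokenizer's output, and some tokenizers versions paired with DistilBERT omit that key. A minimal, self-contained illustration (the dicts below are hypothetical stand-ins for tokenizer output, not actual library calls):

```python
# Hypothetical tokenizer outputs: one encoding carries token_type_ids,
# one (as with some fast DistilBERT tokenizer versions) omits it.
with_segments = {"input_ids": [101, 2054, 102], "token_type_ids": [0, 0, 0]}
without_segments = {"input_ids": [101, 2054, 102]}

# The preprocessing accesses the key directly, so the second dict raises:
try:
    p_mask = without_segments["token_type_ids"]
except KeyError as err:
    print(f"KeyError: {err}")  # KeyError: 'token_type_ids'
```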

@EllieRoseS

EllieRoseS commented Mar 11, 2020

I have this same issue, but have no problems running:

nlp = pipeline("question-answering")

Note: To install the library, I had to install tokenizers version 0.6.0 separately, git clone the transformers repo and edit the setup.py file before installing as per @dafraile's answer for issue: #2831

Update: This error was fixed when I installed tokenizers==0.5.2

@LysandreJik LysandreJik added Core: Pipeline Internals of the library; Pipeline. Version mismatch labels Mar 19, 2020
@nreimers
Contributor

Unfortunately I have this issue too with the newest transformers 2.6.0 release.

Tokenizers is at version 0.5.2, but the newest version of tokenizers doesn't work either.

Any solutions to fix this issue?

@maximuslee1226

I have the same issue here. I first ran with my own tokenizer, which failed; then I tried the 03-pipelines.ipynb code with the QnA example and got the following error.

Environment:
tensorflow==2.0.0
tensorflow-estimator==2.0.1
tensorflow-gpu==2.0.0
torch==1.4.0
transformers==2.5.1
tokenizers==0.6.0

Code that I ran:
nlp_qa = pipeline('question-answering')
nlp_qa(context='Hugging Face is a French company based in New-York.', question='Where is based Hugging Face ?')

Error output:

HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…

convert squad examples to features: 0%| | 0/1 [00:00<?, ?it/s]

RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/brandon/anaconda3/envs/transformers/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/brandon/anaconda3/envs/transformers/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/brandon/anaconda3/envs/transformers/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 198, in squad_convert_example_to_features
p_mask = np.array(span["token_type_ids"])
KeyError: 'token_type_ids'
"""

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
in ()
1 nlp_qa = pipeline('question-answering')
----> 2 nlp_qa(context='Hugging Face is a French company based in New-York.', question='Where is based Hugging Face ?')

~/anaconda3/envs/transformers/lib/python3.7/site-packages/transformers/pipelines.py in call(self, *texts, **kwargs)
968 False,
969 )
--> 970 for example in examples
971 ]
972 all_answers = []

~/anaconda3/envs/transformers/lib/python3.7/site-packages/transformers/pipelines.py in (.0)
968 False,
969 )
--> 970 for example in examples
971 ]
972 all_answers = []

~/anaconda3/envs/transformers/lib/python3.7/site-packages/transformers/data/processors/squad.py in squad_convert_examples_to_features(examples, tokenizer, max_seq_length, doc_stride, max_query_length, is_training, return_dataset, threads)
314 p.imap(annotate_, examples, chunksize=32),
315 total=len(examples),
--> 316 desc="convert squad examples to features",
317 )
318 )

~/anaconda3/envs/transformers/lib/python3.7/site-packages/tqdm/std.py in iter(self)
1106 fp_write=getattr(self.fp, 'write', sys.stderr.write))
1107
-> 1108 for obj in iterable:
1109 yield obj
1110 # Update and possibly print the progressbar.

~/anaconda3/envs/transformers/lib/python3.7/multiprocessing/pool.py in (.0)
323 result._set_length
324 ))
--> 325 return (item for chunk in result for item in chunk)
326
327 def imap_unordered(self, func, iterable, chunksize=1):

~/anaconda3/envs/transformers/lib/python3.7/multiprocessing/pool.py in next(self, timeout)
746 if success:
747 return value
--> 748 raise value
749
750 next = next # XXX

KeyError: 'token_type_ids'
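For reference, the failing line in squad.py (`p_mask = np.array(span["token_type_ids"])`) is a direct key access. A defensive variant (a sketch only, not the library's actual fix) would fall back to all-zero segment ids when the tokenizer doesn't return them:

```python
# `span` stands in for the encoding dict built inside
# squad_convert_example_to_features (hypothetical data).
span = {"input_ids": [101, 7592, 102]}  # no token_type_ids key

# span["token_type_ids"] would raise KeyError here.
# Defensive fallback: treat every token as segment 0.
p_mask = span.get("token_type_ids", [0] * len(span["input_ids"]))
print(p_mask)  # [0, 0, 0]
```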

@maximuslee1226

Any help would be greatly appreciated!

@paras55

paras55 commented Mar 27, 2020

Use:
pip install transformers==2.5.1
instead of:
pip install transformers

@maximuslee1226

Thank you @paras55, your solution worked for me!

@LysandreJik
Member

Installing v2.7.0 should work as well.

@ypapanik

ypapanik commented Apr 2, 2020

2.7.0 fails with the same error (at least with tokenizers==0.5.2)
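Since the failures in this thread come down to a transformers/tokenizers version mismatch, it helps to print the installed versions before reporting. A stdlib-only sketch (`importlib.metadata` requires Python 3.8+):

```python
from importlib.metadata import version, PackageNotFoundError

# Print the installed version of each package, or note its absence.
for pkg in ("transformers", "tokenizers"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "is not installed")
```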

julien-c added a commit that referenced this issue Apr 7, 2020

7 participants