KeyError in GLUE data tokenization with RoBERTA #3313
Comments
* Minimal example
* Proposal 2
* Proposal 2 for fast tokenizers
* Typings
* Docs
* Revert "Docs" for easier review (reverts commit eaf0f97)
* Remove unnecessary assignments
* Tests
* Fix faulty type
* Remove prints
* return_outputs -> model_input_names
* Revert "Revert "Docs" for easier review" (reverts commit 6fdc694)
* Code quality
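The "return_outputs -> model_input_names" commit above hints at the shape of the eventual fix: each tokenizer declares which input keys its model expects, so preprocessing code can build the feature dict generically instead of hard-coding `token_type_ids`. A minimal sketch of that idea (the class names here are illustrative stand-ins, though `model_input_names` is a real tokenizer attribute in transformers):

```python
# Sketch of the model_input_names idea: each tokenizer lists the keys its
# model consumes, and preprocessing filters the encoding down to those keys.
# BertLikeTokenizer / RobertaLikeTokenizer are illustrative, not library classes.

class BertLikeTokenizer:
    model_input_names = ["input_ids", "token_type_ids", "attention_mask"]

class RobertaLikeTokenizer:
    # RoBERTa takes no segment ids, so the key is simply not listed.
    model_input_names = ["input_ids", "attention_mask"]

def features_from_encoding(tokenizer, encoding):
    # Keep only the keys the model actually consumes; never raises KeyError.
    return {name: encoding[name]
            for name in tokenizer.model_input_names
            if name in encoding}
```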
I also have this issue when I run run_multiple_choice.py on RACE data with RoBERTa.
I get the same error when I try to fine-tune on SQuAD.
Tagging @LysandreJik
Same here. Any solution?
@nielingyun @orena1 @Onur90 maybe try pulling again from the latest version of the repo and see if it works? The error went away after I pulled recently; I'm not sure whether that fixed it or something else I did. Let me know if that worked.
@ethanjperez by "latest version" do you mean the latest commit or the latest release (v2.6.0)? It is still not working with the latest commit.
🐛 Bug
I'm getting a KeyError in `examples/run_glue.py` when using RoBERTa: the data preprocessing tries to access `token_type_ids`, possibly because of this commit removing `token_type_ids` from RoBERTa (and DistilBERT). I get the error when fine-tuning RoBERTa on CoLA and RTE. I haven't tried other tasks, but I think you'd get the same error.
I don't get the error when fine-tuning XLNet (presumably because XLNet does use `token_type_ids`), and I don't get the error when I do `pip install transformers` instead of `pip install .` (which I think means the issue comes from a recent commit). Here's the full error message:
Information
Model I am using (Bert, XLNet ...): RoBERTa. I think DistilBERT may run into the same issue as well.
Language I am using the model on (English, Chinese ...): English
The problem arises when using:
I've made slight modifications to the training loop in the official `examples/run_glue.py`, but I did not touch the data pre-processing, which is where the error occurs (before any training).
The tasks I am working on are:
I've run into the error on CoLA and RTE, though I think the error should happen on all GLUE tasks.
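The failure mode described above can be illustrated without transformers at all. This is a hedged sketch: the dict below mirrors what RoBERTa's tokenizer returns (no `token_type_ids` key), and `build_inputs` is a hypothetical helper, not the script's actual code.

```python
# Minimal sketch of the KeyError and one defensive fix. `encoded` mirrors a
# RoBERTa tokenizer output, which omits "token_type_ids"; `build_inputs` is
# a hypothetical helper, not code from run_glue.py.

def build_inputs(encoded, model_type):
    inputs = {
        "input_ids": encoded["input_ids"],
        "attention_mask": encoded["attention_mask"],
    }
    # Guard the lookup: RoBERTa and DistilBERT produce no segment ids, so
    # indexing encoded["token_type_ids"] unconditionally raises KeyError.
    if model_type not in ("roberta", "distilbert"):
        inputs["token_type_ids"] = encoded.get("token_type_ids")
    return inputs

roberta_encoded = {"input_ids": [0, 31414, 2], "attention_mask": [1, 1, 1]}
```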
To reproduce
Steps to reproduce the behavior:
1. Install `transformers` using the latest clone (use `pip install .`, not `pip install transformers`)
2. Download the GLUE data into `data/RTE` (using the GLUE download scripts in this repo)
Expected behavior
`load_and_cache_examples` (and specifically, the call to `convert_examples_to_features`) in `examples/run_glue.py` should run without error, to load, preprocess, and tokenize the dataset.
Environment info
`transformers` version: 2.5.1