Token indices sequence length is longer than the specified maximum sequence length for this model. #6939
Hi, your script doesn't stop and you still end up with a processed Obviously the final annotation can be affected by the fact that some sequences are truncated, but this allows arbitrary texts to be processed without errors and then aligned back with the spacy tokens. There can also be some issues with some model types that don't provide their
Yes, the script is still running.
This issue has been automatically closed because it was answered and there was no follow-up discussion.
Hi, I have a follow-up question. I am facing this 512-token warning while using a distilbert model, but my script runs fairly well and the model is able to reach a decent score. So my question is: is there a chance that just resolving this warning gives me a boost in score? It would be very helpful if you could explain in detail.
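A toy illustration of why the warning can matter for scores (plain Python only; no spacy or transformers involved, and the numbers simply mirror the warning reported in this issue): if an over-long input is naively truncated to the model's maximum, every token past the limit is invisible to the model, so any entities in that tail can never be predicted, capping recall on long documents.

```python
# Hypothetical token-id sequence; 1313 and 512 are the lengths from the
# warning in this issue, the variable names are illustrative.
MAX_LEN = 512
token_ids = list(range(1313))

truncated = token_ids[:MAX_LEN]          # what naive truncation keeps
dropped = len(token_ids) - len(truncated)  # tokens the model never sees

print(f"kept {len(truncated)} tokens, dropped {dropped}")
```

Whether fixing the warning improves your score therefore depends on how often your inputs exceed the limit and whether the labels you care about fall in the truncated tail.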
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
How to reproduce the behaviour
This problem arose when I switched to spacy 3.0 and started using 'en_core_web_trf' instead of 'en_core_web_lg'.
After executing, it results in the warning below:
Token indices sequence length is longer than the specified maximum sequence length for this model (1313 > 512). Running this sequence through the model will result in indexing errors
The problem seems to be in the transformer Pipeline
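For context, transformer pipelines typically avoid hard truncation by splitting a long document into overlapping strided windows so every token is seen at least once. The sketch below is a pure-Python illustration of that idea, not the actual spacy-transformers implementation; the function name and the window/stride values are assumptions, with the sequence length taken from the warning above.

```python
def strided_windows(ids, window=512, stride=256):
    """Split a long token-id list into overlapping windows of at most
    `window` items, advancing by `stride`, so no token is dropped."""
    if len(ids) <= window:
        return [ids]
    spans = []
    start = 0
    while start < len(ids):
        spans.append(ids[start:start + window])
        if start + window >= len(ids):
            break  # this window already reaches the end of the sequence
        start += stride
    return spans

# 1313 tokens, as in the reported warning: five overlapping windows,
# each within the 512-token model limit.
spans = strided_windows(list(range(1313)))
print(len(spans), [len(s) for s in spans])
```

Predictions from the overlapping windows then have to be merged and aligned back to the original spacy tokens, which is where the truncation-related differences mentioned earlier can show up.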
Your Environment