QuestionAnsweringPipeline returns full context in Japanese #17706
Comments
I suspect that "encoding" in Japanese models does not work at https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/question_answering.py#L452
Hi @KoichiYasuoka 👋 As per our issues guidelines, we reserve GitHub issues for bugs in the repository and/or feature requests. For any other requests, we'd like to invite you to use our forum 🤗 (Since the issue is about the quality of the output, it's probably model-related and not a bug per se. In any case, if you suspect it is due to a bug in …)
Hi @KoichiYasuoka, this seems to be caused by the pipeline's attempt to align the answer on "words". The problem is that this Japanese tokenizer never cuts on "words", so the whole context is a single word, and the realignment therefore forgets all about the actual answer, which is a bit sad. I created a PR that adds a new parameter to disable this behavior so it works for your use case (I personally think it should be the default, but we cannot change it because of backward compatibility).
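A minimal sketch of how the new parameter might be used once the PR is merged. The checkpoint name and the question below are placeholders, and `align_to_words` is the name this parameter carries in current transformers releases; the exact name is not stated in this thread, so treat it as an assumption.

```python
from transformers import pipeline

# Placeholder checkpoint; substitute the Japanese QA model you are actually testing.
qa = pipeline("question-answering", model="your-japanese-qa-model")

result = qa(
    question="何に挿し絵が用いられていますか",  # illustrative question, not quoted from the issue
    context="全学年にわたって小学校の国語の教科書に挿し絵が用いられている",
    align_to_words=False,  # assumed name of the parameter added by the PR
)
print(result)  # with alignment disabled, a short span such as '教科書' is expected
```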
Thank you @Narsil for creating the new PR with the new parameter.
Hi, the PR is not merged yet, and it will take a few days before it lands on the API (the API doesn't run master). Afterwards, while it is undocumented and thus may be deactivated at any time (though we rarely do this), you could send the new parameter in your API request. Unfortunately the widget itself will not use parameters. Does that answer your question?
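For illustration only, a sketch of sending such a pipeline parameter to the hosted Inference API through a top-level `parameters` field; the model id, the token, and the assumption that this field is forwarded to the pipeline are not confirmed in this thread.

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/your-japanese-qa-model"  # placeholder model id
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # placeholder token

payload = {
    "inputs": {
        "question": "何に挿し絵が用いられていますか",  # illustrative question
        "context": "全学年にわたって小学校の国語の教科書に挿し絵が用いられている",
    },
    # Assumption: the undocumented pipeline parameter can be passed through a
    # top-level "parameters" field once the PR has landed on the API.
    "parameters": {"align_to_words": False},
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```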
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Who can help?
@Narsil @sgugger
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
QuestionAnsweringPipeline (almost always) returns the full context in Japanese. For example, it returns
{'score': 0.9999955892562866, 'start': 0, 'end': 30, 'answer': '全学年にわたって小学校の国語の教科書に挿し絵が用いられている'}.
On the other hand, directly with torch.argmax the model returns the answer "教科書" correctly.
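A sketch of the two readings compared above, assuming a placeholder checkpoint and an illustrative question (neither is given verbatim here): the pipeline call that comes back with the whole context, and a direct torch.argmax over the start/end logits that picks out the short span "教科書".

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "your-japanese-qa-model"  # placeholder checkpoint
question = "何に挿し絵が用いられていますか"  # illustrative question
context = "全学年にわたって小学校の国語の教科書に挿し絵が用いられている"

# 1) Pipeline: the word realignment returns (almost always) the full context.
qa = pipeline("question-answering", model=model_name)
print(qa(question=question, context=context))

# 2) Direct argmax over the start/end logits: yields the short span "教科書".
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits))
print(tokenizer.decode(inputs["input_ids"][0, start : end + 1]))
```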
Expected behavior
Return the right answer "教科書" instead of the full context.