Flexible handling of non-existent message_attribute for given training sample #4445
Comments
Thanks for submitting this feature request 🚀 @dakshvar22 will get back to you about it soon! ✨
Hi @dakshvar22, I racked my brain over a solution here. I tried several scenarios and can now understand your problems with the architectural decision in this situation. At least for now I'd say that it is more or less impossible to change things on the The simple conclusion is: They are doing things with their Doc-object that simply can't be done with an empty Doc at the moment, e.g. because actually there are no I then came back to the idea to question this part:
The more I thought about that, the more I could sympathize with your struggle. One idea was to change
in I understand your thoughts about obeying the order of the What should/could we do with them? I am running out of ideas.
Hey @JulianGerhard21 , thanks for giving this a detailed look. I think, going by your observations, we can't rely on spacy or spacy-pytorch-transformers to help us out here.
I think it makes sense to do this because people can build custom components based on pre-trained BERT using spacy docs or integrating any other library that comes up and relies on spacy docs. Since we already have a What do you think?
Hi @dakshvar22, all right, I agree with you and will start working on this this afternoon. I will get back to you with a code proposal as soon as it is ready. Thanks for your help! Regards
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Closing as this is in a minor release around 1.3.x.
Description of Problem:
Currently, the `SpacyNLP` component provides the following:
provides = ["spacy_doc", "spacy_nlp", "intent_spacy_doc", "response_spacy_doc"]
This makes it necessary to handle non-existent / `None`-valued attributes for a given training sample. Currently this is done by converting `None` values to empty strings, since spaCy can't handle `None` values when creating its `Doc` objects from them.
Simply filtering out those training samples would break their order and cause follow-on problems, so we need to find a more flexible solution.
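To make the current behaviour concrete, here is a minimal sketch of the `None`-to-empty-string conversion described above. It assumes training samples are plain dicts; in Rasa they are message objects, and the helper name `texts_for_attribute` is hypothetical, not part of the codebase.

```python
from typing import Dict, List, Optional


def texts_for_attribute(
    samples: List[Dict[str, Optional[str]]], attribute: str
) -> List[str]:
    """Collect one text per sample, replacing missing or None values with "".

    Hypothetical helper: it mirrors the workaround described in this issue,
    not the actual Rasa implementation.
    """
    return [sample.get(attribute) or "" for sample in samples]


samples = [
    {"text": "hello", "response": None},
    {"text": "bye", "response": "see you"},
    {"text": "thanks"},  # attribute not present at all
]
response_texts = texts_for_attribute(samples, "response")
```

The order of the samples is preserved, but every missing attribute still produces an empty string that is later turned into an empty `Doc`.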
Overview of the Solution:
I am going to think about a robust solution and update this issue likewise.
Examples:
If there are no samples for the `response` attribute, this currently results in a list of empty `Doc` objects when calling `pipe` on the texts:
docs = [doc for doc in self.nlp.pipe(texts, batch_size=50)]
[, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ]
An empty string is valid input for a `Doc` object, but empty `Doc` objects are a problem for libraries like `spacy-pytorch-transformers` and for other custom components that can't handle these cases properly.
The corresponding forum entry for this conversation can be found here @dakshvar22
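One possible direction, sketched here only as an illustration and not as the solution that was eventually implemented: run the pipeline over the non-empty texts only and put `None` back at the positions of the missing ones, so the sample order is kept and no empty `Doc` is ever created. The function name `pipe_preserving_order` and the generic `pipe` callable are assumptions made for this sketch.

```python
from typing import Callable, Iterable, List, Optional, Sequence, TypeVar

D = TypeVar("D")


def pipe_preserving_order(
    texts: Sequence[Optional[str]],
    pipe: Callable[[List[str]], Iterable[D]],
) -> List[Optional[D]]:
    """Apply `pipe` to the non-empty texts and keep None for the others.

    Hypothetical helper: `pipe` is any callable that maps a list of strings
    to an iterable of results in the same order.
    """
    # Remember the original position of every text that is worth processing.
    indexed = [(i, text) for i, text in enumerate(texts) if text]
    results: List[Optional[D]] = [None] * len(texts)
    processed = pipe([text for _, text in indexed])
    for (i, _), doc in zip(indexed, processed):
        results[i] = doc
    return results


# Stand-in for a real pipeline so the sketch runs without spaCy installed.
def fake_pipe(batch: List[str]) -> Iterable[str]:
    return (text.upper() for text in batch)


docs = pipe_preserving_order(["hi there", None, "", "bye"], fake_pipe)
```

With a loaded spaCy model `nlp`, the callable could be `lambda batch: nlp.pipe(batch, batch_size=50)`. Downstream components would then have to accept `None` in place of a `Doc`, which is exactly the trade-off under discussion in this thread.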