How does spancat recognize spans and compute confidence scores? #9063
-
I am using spaCy to extract a single entity type from text documents (there is only one entity type in the documents, say, PERSON). I used spaCy's named entity recognizer for this and it works pretty well, except that I also want a confidence score for each extracted entity. I know spaCy uses a greedy transition-based parser to label entities, so it is not straightforward to compute a confidence score. I tried the new Span Categorizer component because the docs say I can access the scores for each predicted span. I have been looking through the code, but I could not figure out how exactly the Span Categorizer works. From the architecture configuration it does not use a transition-based parser, so I would really appreciate a pointer to the algorithm spancat uses to identify spans. Ultimately, I want to know how the confidence score is computed.

Update: After reading discussion #3961 I understand this a bit better. I see that spancat does classification over suggested spans. My question now is: if my training data only has one entity type, will all predicted spans get the same label? If so, how is the confidence score computed, given that there are no other labels (classes)? Thanks
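For context, spancat works in two stages: a suggester proposes candidate spans (by default, all n-grams of a few configured lengths), and a classifier then scores each candidate per label. The following is a plain-Python sketch of the n-gram suggestion idea only, not spaCy's actual implementation (which returns a ragged array of token indices); the token list and sizes are invented for illustration:

```python
# Sketch of spancat's candidate-generation idea: enumerate every n-gram
# of the configured sizes. Each candidate is later scored independently
# per label by the classifier.
def ngram_suggester(tokens, sizes=(1, 2, 3)):
    """Return (start, end) token-index pairs for every n-gram of the given sizes."""
    spans = []
    for size in sizes:
        for start in range(len(tokens) - size + 1):
            spans.append((start, start + size))
    return spans

tokens = ["Alice", "met", "Bob", "yesterday"]
candidates = ngram_suggester(tokens)
print(candidates)  # 9 candidate spans for 4 tokens with sizes 1-3
```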
-
If you look at the implementation of spancat, you can see that it uses a logistic activation. This is different from, say, textcat, which uses a softmax activation when you have exclusive classes. In a softmax activation, all probabilities add up to one, because you're picking the best option out of a list of options. But with spancat that's not what you're doing: like multilabel textcat, a span can have all labels or no labels. So the decision about each label for a span is made more or less independently, and there's no requirement that the scores for different labels add up to one (requiring that would, as you noticed, make a one-label spancat meaningless). Since it's a neural network it's not really interpretable, but it learns weights to combine with the inputs in such a way that it maximizes scores during training. You can think of it as calculating a yes-or-no decision for each label for each span, and the score comes from that yes-or-no decision.
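The difference is easy to see with a tiny numeric example (plain Python, illustrative logits only): softmax makes the labels compete so their scores sum to one, while the logistic (sigmoid) activation scores each label on its own, so even a single label gets a meaningful probability.

```python
import math

def softmax(logits):
    """Exclusive classes: scores compete and sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Independent per-label decisions: each score stands alone."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

logits = [2.0]  # a single label, as in a one-label spancat
print(softmax(logits))  # always [1.0] -- uninformative with one label
print(sigmoid(logits))  # ~0.88 -- a real yes/no confidence
```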
-
Thank you for the answer. So I guess if the score is meaningless, I'd be better off with the entity recognizer, since I see its accuracy (F1) is slightly better. Can you please correct me if I am wrong about how the greedy parser works? From what I understand, it calculates a probability distribution over the available transition actions (BILUO), picks the action with the highest probability, and applies the corresponding label. If that's the case, I wonder if there is a way to access that probability so we can apply some sort of threshold to accept or reject the resulting parse. Say the action to apply a 'B' tag scores highest, but the value itself is only 0.45 and the threshold is 0.5; then I would not accept that result, and consequently I would say the sentence does not have any 'B' tag. Alternatively, I quite like the beam search idea, but is it true that spaCy v3 no longer supports it? I saw the code is still there, but in some discussion posts you advise against using it. I appreciate your feedback and suggestions. Thank you.
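For what it's worth, if you do stay with spancat, the thresholding described above is straightforward, because each suggested span already carries an independent per-label probability. Here is a minimal sketch of the filtering step; the `(text, label, score)` triples are hypothetical predictions, standing in for whatever spans and scores the spancat component produces:

```python
# Reject predicted spans whose score falls below a cutoff.
# The prediction triples below are invented for illustration.
def filter_spans(predictions, threshold=0.5):
    """Keep only (text, label, score) predictions meeting the threshold."""
    return [p for p in predictions if p[2] >= threshold]

predictions = [
    ("Alice Smith", "PERSON", 0.91),
    ("the board", "PERSON", 0.45),   # below threshold: rejected
    ("Bob Jones", "PERSON", 0.62),
]
accepted = filter_spans(predictions, threshold=0.5)
print([p[0] for p in accepted])  # ['Alice Smith', 'Bob Jones']
```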