NER on whole paragraphs, instead of just keywords #12359
Hi, I am currently annotating about 510 contracts. In most cases I tag whole paragraphs, rather than just words or phrases, with entities that describe the paragraph (e.g. 'termination clause'). I have now realised that NER may not work for recognising whole paragraphs. All I want is for the model to guess what a paragraph within the contract is about. On the examples I have run in displaCy so far it seems to do well, but I wonder whether there is a better way to work with paragraphs. Thanks :)
Replies: 1 comment 4 replies
The NER model isn't really intended for longer texts like this, and this sounds like a good use case for spancat, which you could use to directly label all paragraphs in a text.

You'd need a solid working definition of "paragraph" so that you can annotate the same units as paragraphs in your training data and also suggest those exact same units for new texts with a custom suggester function for your spancat component.

I'm surprised that I didn't easily find this kind of suggester somewhere in our projects, but here's a third-party example that suggests sentences, which should be similar in practice to suggesting paragraphs:
https://github.com/thiippal/MoodCat/blob/867438444fd3c0d1cae3e680…

(Related discussion: #10657)
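For illustration, here is a minimal sketch of what a paragraph suggester for spancat might look like. It is not taken from the linked example or from spaCy itself. The names `paragraph_token_spans` and `build_paragraph_suggester` are made up for this sketch, and the working definition of "paragraph" (a run of tokens delimited by whitespace-only tokens containing at least two newlines) is an assumption you would need to check against how your own contracts are tokenized.

```python
from typing import List, Sequence, Tuple


def paragraph_token_spans(tokens: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets, end exclusive, one pair per paragraph."""
    # Assumption: a paragraph break is a whitespace-only token that contains
    # two or more newline characters (i.e. a blank line in the raw text).
    spans = []
    start = None  # first non-whitespace token of the current paragraph
    last = None   # most recent non-whitespace token seen
    for i, text in enumerate(tokens):
        if text.strip() == "":
            if text.count("\n") >= 2 and start is not None:
                spans.append((start, last + 1))
                start = None
            continue
        if start is None:
            start = i
        last = i
    if start is not None:
        spans.append((start, last + 1))
    return spans


def build_paragraph_suggester():
    """Build a suggester that proposes exactly one candidate span per paragraph."""
    # Imported lazily so the helper above also works without spaCy installed.
    from thinc.api import get_current_ops
    from thinc.types import Ragged

    def paragraph_suggester(docs, *, ops=None):
        if ops is None:
            ops = get_current_ops()
        spans = []    # (start, end) pairs for all docs, concatenated
        lengths = []  # number of suggested spans per doc
        for doc in docs:
            doc_spans = paragraph_token_spans([token.text for token in doc])
            spans.extend(doc_spans)
            lengths.append(len(doc_spans))
        lengths_array = ops.asarray(lengths, dtype="i")
        if spans:
            return Ragged(ops.asarray(spans, dtype="i"), lengths_array)
        return Ragged(ops.xp.zeros((0, 0), dtype="i"), lengths_array)

    return paragraph_suggester
```

To use something like this from a training config, the builder would need to be registered, for example with `@spacy.registry.misc("paragraph_suggester.v1")` (the name is hypothetical), and then referenced in the `[components.spancat.suggester]` block via `@misc`. The same boundary rule should be applied when converting your annotations, so that the annotated spans and the suggested spans line up token for token.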