Feature Request: general back section (section) #698
Comments
Hi @kermitt2 in #652 (comment) you suggested to use |
Hi @de-code ! Yes the approach is to use |
Okay, thank you for that. In that case, how do you differentiate between those "back matters" sections and the appendix? (I somehow thought |
Well this is for the training data, when sections are recognized explicitly they fall at the right place in the TEI result. |
I have a similar question relating figures and tables. The annotation guideline specifies that the should be part of the |
Relating to my last question, there seems to be a problem (or I may be misunderstanding the guideline). |
For the segmentation model, figures and tables normally go in the "zone" where they belong (where they are primarily referenced), as mentioned here: https://grobid.readthedocs.io/en/latest/training/segmentation/#tables-and-figures. So, for instance, they go in the header if a figure is part of the abstract, or in an annex if they are part of it. Maybe the guidelines are not drafted clearly enough, because the general rule (figure/table in the body) is emphasized too much? In preprint/submission formats, all the figures frequently appear at the very end of the article (sometimes separated from their captions); in this case they should be labelled as "body", since they are usually figures/tables for the body part, even though they come after the bibliographical section and annex for formatting reasons. |
Okay, maybe I have misinterpreted the general rule as the overriding rule. Perhaps we could say that figures and tables belong where they are first referenced? I.e., if a figure is referenced from a body section, then it belongs to the body, but if it is only referenced from a back section, then it belongs there?
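To make the proposed "first reference" rule concrete, here is a minimal sketch. It is only an illustration of the rule as worded above, not anything GROBID implements: the function name `zone_for_float`, the `(position, zone)` tuple layout, and the fallback to `"body"` for unreferenced floats are all assumptions.

```python
from typing import List, Tuple


def zone_for_float(references: List[Tuple[int, str]], default: str = "body") -> str:
    """Return the zone label for a figure or table.

    `references` holds (position_in_document, zone_of_the_referencing_text)
    pairs, one per callout to the figure/table. Hypothetical layout, chosen
    for illustration only.
    """
    if not references:
        # Assumption: an unreferenced float falls back to the general rule.
        return default
    # The zone of the earliest callout decides where the float belongs.
    first_position, first_zone = min(references, key=lambda ref: ref[0])
    return first_zone


# Referenced from the body first, later from an annex: stays in the body.
print(zone_for_float([(120, "annex"), (40, "body")]))
# Referenced only from an annex: belongs to the annex.
print(zone_for_float([(300, "annex")]))
```

Under this sketch, figures collected at the very end of a preprint would still be labelled "body" as long as their first callout is in the body text.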
(There may also be the question whether it makes sense to extract sections titles like |
It only occurs to me now that there doesn't seem to be a generic back section (section).

The Annotation guidelines for the 'segmentation' model do not mention any `back` section. The existing training data wraps elements in a `back` section, probably to keep the general TEI structure.

It does support the following specific back section elements:

- `listBibl`
- `annex`
- `acknowledgment`

Out of those, `acknowledgment` is probably a sort of general back section (section).

But there could be others, e.g. relating to:

It would be good to be able to just extract general back sections (with title and paragraph(s)).

(Then `acknowledgment` could just be a special case of that)
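As a rough illustration of what "extracting general back sections" could look like on the consumer side, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample document is hand-written for this example and is not actual GROBID output; the `type="section"` attribute value and the "Author contributions" section are hypothetical, as is the helper name `extract_back_sections`.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample, NOT real GROBID output. The generic
# <div type="section"> is a hypothetical encoding of a general back section.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body><div><head>Introduction</head><p>Body text.</p></div></body>
    <back>
      <div type="acknowledgment"><head>Acknowledgments</head><p>We thank the reviewers.</p></div>
      <div type="section"><head>Author contributions</head><p>A.B. wrote the paper.</p><p>C.D. ran the experiments.</p></div>
      <div type="annex"><head>Appendix A</head><p>Extra details.</p></div>
      <listBibl/>
    </back>
  </text>
</TEI>"""


def extract_back_sections(tei_xml: str) -> list:
    """Collect every <div> directly under <back> as title plus paragraphs."""
    root = ET.fromstring(tei_xml)
    sections = []
    for div in root.findall(".//tei:back/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        title = "".join(head.itertext()).strip() if head is not None else ""
        paragraphs = [
            "".join(p.itertext()).strip() for p in div.findall("tei:p", TEI_NS)
        ]
        sections.append(
            {
                # Untyped divs are treated as generic sections (assumption).
                "type": div.get("type", "section"),
                "title": title,
                "paragraphs": paragraphs,
            }
        )
    return sections


for section in extract_back_sections(SAMPLE_TEI):
    print(section["type"], "|", section["title"], "|", len(section["paragraphs"]))
```

In this sketch `acknowledgment` is handled by the same code path as any other back section and differs only in its `type` value, which matches the idea of treating it as a special case.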