Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BITS-enabled JATS reader #9138

Merged
merged 8 commits into from
Oct 26, 2023
Merged

BITS-enabled JATS reader #9138

merged 8 commits into from
Oct 26, 2023

Conversation

kamoe
Copy link
Contributor

@kamoe kamoe commented Oct 15, 2023

Proposal for development of new BITS reader by expanding the JATS reader.

@kamoe kamoe marked this pull request as draft October 15, 2023 20:24
This was referenced Oct 15, 2023
@kamoe kamoe marked this pull request as ready for review October 18, 2023 18:28
@kamoe kamoe changed the title BITS-enabled reader BITS-enabled JATS reader Oct 19, 2023
@kamoe
Copy link
Contributor Author

kamoe commented Oct 24, 2023

@jgm I have updated this PR with all unit tests I needed. They all pass, and all other existing checks pass. Below is the scope of the enhancement to make the JATS reader capable to processing BITS documents:

<title>: two new relevant attributes, @display-as which indicates the Header level at which to display a title; and @suppress, which indicates whether the title should be shown in the document.

<book> and <book-part-wrapper>: the two possible top-level elements of books. These simply set the jatsBook variable to True, then cascade execution to children.

<book-title> and <book-title-group>: These define the structure for <title> elements for books. I created a new getBookTitle function, invoked from parseMeta if document is a book, and renamed getTitle to getArticleTitle, for clarity.

<book-meta>: metadata element for books. Processed via parseMeta, which also invokes a new function called getBookAuthors, if document is a book. Renamed getAuthors to getArticleAuthors, for clarity.

<dedication>, <foreword>, <preface>: Named book sections with assumed titles. Processed via parseBlockWithHeader.

<legend>: New element with title, essential for figures and tables. Processed via parseBlockWithHeader

Index and TOC content elements: book-specific elements, with a title. Processed via parseBlockWithHeader.

Index and TOC title group elements: contain the title for its parent element. Their behaviour is incorporated into wrapWithHeader, therefore, empty output to avoid duplication.

wrapWithHeader: Rewritten to accommodate the new title attributes, named book sections, and index and TOC elements.

The relevant checks have been put in place for <title> and parseMeta when it is necessary to make the distinction between the document as an article, and as a book. Otherwise, given most of the above elements are specific to books, the reader will not have to deal with them if processing an article.

Let me know what you think.

Copy link
Owner

@jgm jgm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I noted one test that doesn't seem right, unless I'm missing something.

</permissions>
</article-meta>
^D
[]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what you intended? Shouldn't this test be run with -s so we get the metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. Just added this, plus a few additional tests for metadata in both book and book-part-wrapper cases. Thanks for pointing out!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also separate tests from when we just want to see content, not metadata.

Nothing -> do
let name = qName (elName e)
if (name == "dedication" || name == "foreword" || name == "preface")
then return $ str $ T.toTitle name
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to use localized (translated) versions of these names rather than the English.
However, we don't currently have localizations for these, so this will have to wait.

Copy link
Contributor Author

@kamoe kamoe Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The rationale is that, if there are non-English language named sections, then these will probably have a <title>, captured in the case just above. But might still be something to consider in the future.

@jgm
Copy link
Owner

jgm commented Oct 26, 2023

Thanks for the changes. I thought of a couple other things, regarding the tests.
First, it's not ideal to have a test called book-meta.md (for example); better to add a bits- prefix to all these named tests so we can more easily see what they concern.
Second, if you can reduce the amount of text in these tests, it will reduce the size of the test files (and hence the release tarball).

@kamoe
Copy link
Contributor Author

kamoe commented Oct 26, 2023

How about these?

@jgm
Copy link
Owner

jgm commented Oct 26, 2023

Perfect!

@jgm
Copy link
Owner

jgm commented Oct 26, 2023

Questions:

  • should we add something to the manual describing BITS support?
  • should we allow -f bits (I guess it would just act like -f jats?) Or is BITS considered a dialect of JATS so that -f jats would still be appropriate?

@kamoe
Copy link
Contributor Author

kamoe commented Oct 26, 2023

Uhm, I would say yes, and yes.

BITS it's a very similar but officially different schema, so I think it is appropriate to treat as a separate format, with -bits.

Do you need any particular help wrt wording any docs?

@jgm
Copy link
Owner

jgm commented Oct 26, 2023

Would -f bits behave any differently from -f jats? Or would it essentially be a synonym? My understanding is that there is no special "setting" for the reader that is needed to parse BITS.

@kamoe
Copy link
Contributor Author

kamoe commented Oct 26, 2023

It would be a complete synonym. No setting needed. It's the top or root element that tells the reader whether it's a jats or bits document.

@jgm jgm merged commit f2a22e7 into jgm:main Oct 26, 2023
9 of 12 checks passed
@jgm
Copy link
Owner

jgm commented Oct 26, 2023

Great, I'll merge this and add the synonym and manual bits myself.

@kamoe
Copy link
Contributor Author

kamoe commented Oct 31, 2023

Wonderful. I think it would be consistent to add "← BITS" below the XML formats supported in the main page.

@jgm
Copy link
Owner

jgm commented Oct 31, 2023

Will do

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants