Rework FormattedText model to better support USX3/USFM3 import #93
Comments
Lately I have not been actively working with this tool; I mostly use it to convert from USX (2/3) to USFM (3) (which my application, surprisingly, can parse faster than XML). From my perspective I don't have any remarks about your plans to rework |
Updated the issue to not forget to add support for UBXF alignment milestones. |
@Rolf-Smit just a heads up: in a553d4b I changed the intermediate format used by Paratext formats by moving Figure, VerseStart and VerseEnd to be BookContent instead of CharacterContent (all Paratext formats supported so far do not support those nested in character tags or footnotes anyway). This makes some parsing easier and removes some ugly workarounds that made extending the format harder. Not sure if that affects your use cases. |
UBXF alignment milestones are now implemented, see 93eeed0 (part of main branch). A very early alpha of the new |
Hi Michael, thanks for working on this. Seems to run ok. Has USFM<->USX changed at all, or just when converting to/from other formats? |
USFM<->USX has changed a lot since I added support for the USFM 3 format. I also found a few bugs while implementing this feature and backported the fixes to the normal version. As a consequence, the current nightly build has many changes in USFM<->USX conversion compared to the previously released version; on the other hand, USFM<->USX behaviour in the normal nightly version and the one from the |
So just to clarify, I'll get the USFM3 upgrade if I use builds from master here: https://github.com/schierlm/BibleMultiConverter/actions ? |
Correct. |
Am I right in saying USFM1-2 is also valid USFM3, whereas USX1-2 is not valid USX3 because it doesn't have end verse markers? |
Yes, all USFM formats are also valid in later versions of the standard. USX is not, first because of the end markers, and second because of the different XML schema which dropped some deprecated attributes and values. |
The current FormattedText model, which is used as the intermediate format for every conversion (except conversions between two Paratext formats), has been there since the beginning of BibleMultiConverter. Yet other Bible formats have evolved. Therefore, rework the internal model. Some ideas:
FormattingInstructionKind: Add new constants PSALM_TITLE (titles of Psalms, which are sometimes part of verse 1 and sometimes placed before it) and ADDED_TEXT (text added by the translator which is not linked to the original source, often conjunctions). When exporting those to a format that does not support them, treat both as ITALIC.
Add Speaker markup to mark text spoken by a person other than Jesus. Speakers can be identified by labels (e.g. "Moses") or Strong's numbers (e.g. "H4872").
Rework LineBreakKind based on ExtendedLineBreakKind used for Paratext export
GrammarInformation: Add optional suffix letters for Strong's numbers; also add a way to attach arbitrary key-value pairs (like in OSIS or Paratext). Values need not be ASCII-only (e.g. a Greek lemma).
Links: Support
Footnotes: Add a flag indicating whether a footnote contains text or cross references. For now, this is done by adding XREF_MARKER to the beginning of the footnote text, but many new formats make this distinction, and parsing for magic strings gets cumbersome.
Cross References: Support cross references that span more than one book; also support cross references that do not reference individual verses, but whole chapters or books.
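Two of the ideas above (the ITALIC fallback for the new formatting constants, and a footnote flag replacing the XREF_MARKER prefix) could look roughly like the following. This is only a sketch under assumptions: the class `FormattedTextSketch`, the enums `FormattingKind` and `FootnoteKind`, the `Footnote` class, and the methods `forExport` and `fromLegacyText` are hypothetical names, not the existing BibleMultiConverter API, and the marker string shown is a placeholder rather than the project's real value.

```java
import java.util.EnumSet;
import java.util.Set;

class FormattedTextSketch {

    // Hypothetical stand-in for FormattingInstructionKind; only the names
    // ITALIC, PSALM_TITLE and ADDED_TEXT are taken from the issue text.
    enum FormattingKind {
        ITALIC, PSALM_TITLE, ADDED_TEXT;

        // Kind to use when a target format does not know this one.
        FormattingKind fallback() {
            switch (this) {
                case PSALM_TITLE:
                case ADDED_TEXT:
                    return ITALIC;
                default:
                    return this;
            }
        }
    }

    // Returns the kind itself if the target format supports it, else its fallback.
    static FormattingKind forExport(FormattingKind kind, Set<FormattingKind> supported) {
        return supported.contains(kind) ? kind : kind.fallback();
    }

    // Hypothetical flag replacing the magic-string prefix.
    enum FootnoteKind { TEXT, CROSS_REFERENCES }

    // Placeholder value, NOT the real marker string used by the project.
    static final String XREF_MARKER = "[xref] ";

    static final class Footnote {
        final FootnoteKind kind;
        final String text;

        Footnote(FootnoteKind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Migrates a legacy footnote string: strips the marker and sets the flag.
    static Footnote fromLegacyText(String legacy) {
        if (legacy.startsWith(XREF_MARKER)) {
            return new Footnote(FootnoteKind.CROSS_REFERENCES,
                    legacy.substring(XREF_MARKER.length()));
        }
        return new Footnote(FootnoteKind.TEXT, legacy);
    }

    public static void main(String[] args) {
        Set<FormattingKind> olderFormat = EnumSet.of(FormattingKind.ITALIC);
        System.out.println(forExport(FormattingKind.PSALM_TITLE, olderFormat));
        Footnote note = fromLegacyText(XREF_MARKER + "Gen 1:1");
        System.out.println(note.kind + ": " + note.text);
    }
}
```

With an explicit flag, exporters can branch on the footnote kind directly, and only importers of older data would still need to look for the marker prefix.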
As this is a major task (it needs to touch most of the modules), my plan is, in a first step, to only update the roundtrip formats and make the other formats "just" work again (using fallbacks or ignoring the new options). I will keep a status list for the modules (e.g. compiles again, tested, compared against format spec), trying not to make any format worse than before at any point in the process.
When exporting other features from USFM to FormattedText, use ExtraAttributes wherever possible. This should also include custom tags and custom milestones. There should be an option to convert UBXF alignment milestones (for a single alignment source) to GrammarInformation instead of extra attributes.
Did I miss anything? To qualify, a feature should be present in both USFM3/USX3 and in more than one other format.
// cc @Rolf-Smit @Michahel @shadow-light @paul1149