TSV/CSV Conversion Should Default to `relaxed` mode #97

joelthe1 · 2020-11-13T22:05:06Z

When converting a USFM file to TSV or CSV from the command line, the conversion errors out unless the option `-l relaxed` is used. I think we should default to relaxed mode whenever the user just wants to convert to TSV or CSV. Specifically, I was trying to parse the 01-GEN.usfm file from here.

Also, I think we should find out why some of those files won't parse. On the command line I just see this:

Error parsing the input USFM. 
Cannot read property 'bookCode' of undefined

Maybe report filename at least?


kavitharaju · 2020-11-16T05:47:32Z

As per the updated JSON validation we get an error like this

{
  _messages: {
    _error: [
      'instance requires property "book"',
      'instance requires property "chapters"'
    ]
  }
}

But here it should report the error in the USFM instead. I will work on that.

But I don't know if defaulting to relaxed mode for all TSV/CSV conversion is a good idea. Relaxed mode may accept incorrect usage and give incorrect data, so I feel the user should choose it explicitly. Otherwise we should use the stricter normal mode, which ensures the correctness of the data we extract.

kavitharaju · 2020-11-16T06:05:43Z

The above error was shown because the USFM had errors.
We first convert the input USFM to JSON and that JSON is passed on to be converted to CSV and TSV.
In this case the resultant JSON of USFM parsing had the error report and it was passed on to the TSV converter method.

I have changed this flow now. It checks whether the USFM to JSON conversion was successful before passing the result on to toCSV() or toTSV().
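A minimal sketch of that check, for illustration only. `parseUsfm`, `convertToCsv`, and the toy `toCSV` below are hypothetical stand-ins, not the library's real functions; the only thing taken from this thread is the `_messages._error` shape of the error report. Prefixing the file name is an assumption that follows the request above.

```javascript
// Hypothetical stand-in for the USFM-to-JSON step (NOT the library's real API).
// Toy rule: input that does not start with an \id marker is a parse failure.
function parseUsfm(usfmText) {
  const match = /^\\id (\w+)/.exec(usfmText);
  if (!match) {
    return { _messages: { _error: "Line 1, col 1: expected \\id marker" } };
  }
  return { book: { bookCode: match[1] }, chapters: [] };
}

// Hypothetical stand-in for the converter: emits only a header row here.
function toCSV(json) {
  return "Book,Chapter,Verse,Text\n";
}

// Check the parse result before converting, so an error report is never
// handed to the converter as if it were parsed data.
function convertToCsv(usfmText, fileName) {
  const json = parseUsfm(usfmText);
  if (json._messages && json._messages._error) {
    // Assumption: prefix the file name so the user can tell which input failed.
    return { ok: false, error: `${fileName}: ${json._messages._error}` };
  }
  return { ok: true, output: toCSV(json) };
}
```

With this shape, the caller prints `error` when `ok` is false and never reaches the converter with an object that has no `book` property.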

Now the error report in the output will be as follows:

{
  _messages: {
    _error: 'Line 3977, col 6:\n' +
      '  3976 | \\v 18 \\zaln-s | x-strong="H0853" x-lemma="אֵת" x-morph="He,To" x-occurrence="1" x-occurrences="1" x-content="אֶת"\\*\\w परन्तु|x-occurrence="1" x-occurrences="1"\\w*\\zaln-e\\*\n' +
      '> 3977 | \\tl \\zaln-s | x-strong="H0854" x-lemma="אֵת" x-morph="He,R:Sp2fs" x-occurrence="1" x-occurrences="1" x-content="אִתָּ֑⁠ךְ"\\*\\w तेरे|x-occurrence="1" x-occurrences="1"\\w*\n' +
      '              ^\n' +
      '  3978 | \\w संग|x-occurrence="1" x-occurrences="1"\\w*\n' +
      'Expected "xt", "ex", "x", "ef", "fe", "f", "+", "+liv", "+jmp", "+w", "+rb", "+cat", "+ior", "+rq", "+lik", "+litl", "+qac", "+qs", "+wa", "+wh", "+wg", "+ndx", "+sup", "+sc", "+no", "+bdit", "+it", "+bd", "+em", "+wj", "+tl", "+sls", "+sig", "+qt", "+addpn", "+png", "+pn", "+ord", "+nd", "+k", "+dc", "+bk", or "+add"'
  }
}

joelthe1 · 2020-11-18T02:13:42Z

We will leave the decision to the user whether to use relaxed mode or not. The improved error reporting is good.
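A rough sketch of what that decision means for argument handling, assuming the level names `normal` and `relaxed` used in this thread. `resolveLevel` is a hypothetical helper, not the tool's actual code: relaxed mode is selected only when the user passes `-l relaxed`, and every other invocation stays in normal mode.

```javascript
// Hypothetical helper (not the tool's actual argument handling).
// Returns "relaxed" only when the user explicitly passes "-l relaxed".
function resolveLevel(argv) {
  const i = argv.indexOf("-l");
  if (i !== -1 && argv[i + 1] === "relaxed") {
    return "relaxed";
  }
  // Default: the stricter normal mode, regardless of output format.
  return "normal";
}
```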

joelthe1 added the question and discuss labels Nov 13, 2020

kavitharaju mentioned this issue Nov 16, 2020

Add tests for CLI mode #96

Merged

joelthe1 closed this as completed Nov 18, 2020
