Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSV/CSV Conversion Should Default to relaxed mode #97

Closed
joelthe1 opened this issue Nov 13, 2020 · 3 comments
Closed

TSV/CSV Conversion Should Default to relaxed mode #97

joelthe1 opened this issue Nov 13, 2020 · 3 comments
Labels
discuss Discuss about this question

Comments

@joelthe1
Copy link
Collaborator

joelthe1 commented Nov 13, 2020

When converting to TSV and CSV mode from the command line the same USFM file errors out when trying to convert to TSV or CSV except when using the option -l relaxed. I think we should default to relaxed mode whenever the user just wants to convert to TSV or CSV. Specifically I was trying to parse 01-GEN.usfm from here file.

Also, I think we should find out why some of those files won't parse. On the command line I just see this:

Error parsing the input USFM. 
Cannot read property 'bookCode' of undefined

Maybe report filename at least?

@joelthe1 joelthe1 added question discuss Discuss about this labels Nov 13, 2020
@kavitharaju
Copy link
Collaborator

As per the updated JSON validation we get an error like this

{
  _messages: {
    _error: [
      'instance requires property "book"',
      'instance requires property "chapters"'
    ]
  }
}

But here it should report the error in USFM. Shall work on that.

But I dont know if defaulting to relaxed mode for all TSV/CSV conversion is a good idea. As the relaxed mode may accept incorrect usages and give incorrect data, I feel the user should choose that explicitly, other wise we should use, the stricter, normal mode, ensuring correctness of data we extract.

@kavitharaju
Copy link
Collaborator

The above error was shown because the USFM had errors.
We first convert the input USFM to JSON and that JSON is passed on to be converted to CSV and TSV.
In this case the resultant JSON of USFM parsing had the error report and it was passed on to the TSV converter method.

Changed this flow now. It checks if the USFM to JSON converson was successfull before passing it on to the toCSV() or toTSV().

Now the output, error report will be as follows

{
  _messages: {
    _error: 'Line 3977, col 6:\n' +
      '  3976 | \\v 18 \\zaln-s | x-strong="H0853" x-lemma="אֵת" x-morph="He,To" x-occurrence="1" x-occurrences="1" x-content="אֶת"\\*\\w परन्तु|x-occurrence="1" x-occurrences="1"\\w*\\zaln-e\\*\n' +
      '> 3977 | \\tl \\zaln-s | x-strong="H0854" x-lemma="אֵת" x-morph="He,R:Sp2fs" x-occurrence="1" x-occurrences="1" x-content="אִתָּ֑⁠ךְ"\\*\\w तेरे|x-occurrence="1" x-occurrences="1"\\w*\n' +
      '              ^\n' +
      '  3978 | \\w संग|x-occurrence="1" x-occurrences="1"\\w*\n' +
      'Expected "xt", "ex", "x", "ef", "fe", "f", "+", "+liv", "+jmp", "+w", "+rb", "+cat", "+ior", "+rq", "+lik", "+litl", "+qac", "+qs", "+wa", "+wh", "+wg", "+ndx", "+sup", "+sc", "+no", "+bdit", "+it", "+bd", "+em", "+wj", "+tl", "+sls", "+sig", "+qt", "+addpn", "+png", "+pn", "+ord", "+nd", "+k", "+dc", "+bk", or "+add"'
  }
}

@joelthe1
Copy link
Collaborator Author

We will leave the decision to the user whether to use relaxed mode or not. The improved error reporting is good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Discuss about this question
Projects
None yet
Development

No branches or pull requests

2 participants