comparison between the contents of the <Word> elements and the <TextEquivType><Unicode> elements : Schematron #16

Closed
tboenig opened this issue Sep 17, 2018 · 5 comments

@tboenig
Contributor

tboenig commented Sep 17, 2018

A comparison between the content of a <Word> element and its <TextEquiv><Unicode> element can reveal differences. A Schematron rule is needed to check for such differences.

@bertsky
Contributor

bertsky commented Sep 17, 2018

You surely mean a comparison between some element's TextEquiv:Unicode and the TextEquiv:Unicode of its sub-component elements, as in:

  • between TextRegion and its TextLine sequence
  • between TextLine and its Word sequence
  • between Word and its Glyph sequence

Should the consistency principle not be added to the spec in PAGE.md?

I sometimes use XSL transformations to concatenate sub-components (joining them by whitespace or newline, depending on position). Maybe this is a good starting point for such a Schematron. If you think those would in fact be useful, where can I put them?
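
The concatenation idea can also be sketched in Python instead of XSLT. The following is a minimal illustration, not the check that was later added to core: the names `LEVELS`, `local_name`, `own_text` and `find_inconsistencies` are hypothetical, the separators (newline between lines, space between words, nothing between glyphs) are assumptions, and only the first TextEquiv of each element is considered.

```python
import xml.etree.ElementTree as ET

# Parent level -> (child level, separator used when joining child texts).
# The separators are assumptions: newline between lines, space between
# words, nothing between glyphs.
LEVELS = {
    "TextRegion": ("TextLine", "\n"),
    "TextLine": ("Word", " "),
    "Word": ("Glyph", ""),
}


def local_name(element):
    """Return the tag name without its XML namespace."""
    return element.tag.rsplit("}", 1)[-1]


def own_text(element):
    """Return the first TextEquiv/Unicode text directly under element, or None."""
    for child in element:
        if local_name(child) == "TextEquiv":
            for grandchild in child:
                if local_name(grandchild) == "Unicode":
                    return grandchild.text or ""
    return None


def find_inconsistencies(root):
    """Yield (id, parent_text, joined_child_text) for every mismatch."""
    for element in root.iter():
        level = LEVELS.get(local_name(element))
        if level is None:
            continue
        child_name, separator = level
        parent_text = own_text(element)
        child_texts = [own_text(c) for c in element if local_name(c) == child_name]
        # Skip elements without text or without fully annotated sub-components.
        if parent_text is None or not child_texts or None in child_texts:
            continue
        joined = separator.join(child_texts)
        if joined != parent_text:
            yield element.get("id"), parent_text, joined


# Hypothetical sample: the line text has a typo that its words do not.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"><Page>
<TextRegion id="r1">
  <TextLine id="l1">
    <Word id="w1"><TextEquiv><Unicode>Hello</Unicode></TextEquiv></Word>
    <Word id="w2"><TextEquiv><Unicode>world</Unicode></TextEquiv></Word>
    <TextEquiv><Unicode>Hello word</Unicode></TextEquiv>
  </TextLine>
  <TextEquiv><Unicode>Hello word</Unicode></TextEquiv>
</TextRegion>
</Page></PcGts>"""

print(list(find_inconsistencies(ET.fromstring(SAMPLE))))
# → [('l1', 'Hello word', 'Hello world')]
```

Matching on local tag names keeps the sketch independent of the PAGE namespace version. Whether whitespace normalization or multiple TextEquiv alternatives should count as mismatches is left open here.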

(But then again, why not add a function to WorkspaceValidator.validate() in core instead?)

@kba
Member

kba commented Sep 17, 2018

> Should the consistency principle not be added to the spec in PAGE.md?

It should.

> Why not add a function to WorkspaceValidator.validate() in core instead?

We could. Preferably, define it in the spec first and then implement it with reference to the spec.

@bertsky
Contributor

bertsky commented Oct 30, 2018

Shouldn't we at least start an issue on core to support PAGE-related consistency in WorkspaceValidator.validate() before closing here? The problem is that we might have to actually look at the GT data in order to get the consistency principle right in the details. See this comment.

@kba
Member

kba commented Oct 30, 2018

Sure. I closed it because I thought OCR-D/spec#82 would be the fix, but that covers only the spec, not the implementation.

kba reopened this Oct 30, 2018

@kba
Member

kba commented Dec 18, 2018

Implemented in OCR-D/core#223
