[FEATURE] Add support for ALTO v3 and v4 #1117
I would suggest adding feedback for future "unsupported" versions. This could be a log entry or a message written directly to the command line (via the SymfonyStyle). That way we at least know that something is off. Proposal: Add an |
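To illustrate the kind of feedback suggested here, the following is a minimal Python sketch, not the PHP code of this PR. It logs a warning when the root namespace of a file is not one of the ALTO v2, v3 or v4 namespaces. The logger name `alto_indexer`, the function `check_alto_version` and the `source` parameter are hypothetical names chosen for the example.

```python
import logging
import xml.etree.ElementTree as ET

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("alto_indexer")

# Official ALTO namespace URIs for schema versions 2, 3 and 4.
SUPPORTED_ALTO_NAMESPACES = {
    "http://www.loc.gov/standards/alto/ns-v2#",
    "http://www.loc.gov/standards/alto/ns-v3#",
    "http://www.loc.gov/standards/alto/ns-v4#",
}


def check_alto_version(xml_text, source="<unknown>"):
    """Return True if the root element uses a supported ALTO namespace.

    Otherwise log a warning so the operator notices the skipped file.
    """
    root = ET.fromstring(xml_text)
    # ElementTree renders a namespaced tag as "{namespace-uri}localname".
    uri = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if uri in SUPPORTED_ALTO_NAMESPACES:
        return True
    logger.warning(
        "Unsupported or missing ALTO namespace %r in %s", uri, source
    )
    return False
```

In a Symfony console command, the same message could be written to the terminal instead of (or in addition to) the log.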
I have tested it in our development environment with SOLR 8.11.2.
Co-authored-by: Stefan Weil <sw@weilnetz.de> Co-authored-by: Christos Sidiropoulos <csidirop@runbox.com>
As proposed by @stweil, I have moved the namespace registration to a private function to reduce code duplication. ...
While I like this idea, I currently have no time to look into the depths of how to achieve this, but we should keep it in mind for the future. Looking at what OCR engines currently output, we cover the necessary schemas.
Add ALTO formats 3 and 4 to documentation
Thanks!
This PR makes the ALTO parser agnostic of the ALTO schema version used in the provided file. It checks the document's namespaces and registers the matching ALTO namespace accordingly. This way we can index ALTO files in schema versions 2, 3 and 4.
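The idea can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the PHP implementation in this PR: it reads the namespace from the root element, accepts it only if it matches the ALTO v2, v3 or v4 namespace URI, and then binds that URI to a fixed prefix for queries. The function names `detect_alto_namespace` and `extract_words` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ALTO namespace URIs differ only in the version digit.
ALTO_NS_PATTERN = re.compile(
    r"^http://www\.loc\.gov/standards/alto/ns-v([234])#$"
)


def detect_alto_namespace(root):
    """Return the ALTO namespace URI of the root element, or None."""
    # ElementTree renders a namespaced tag as "{namespace-uri}localname".
    if not root.tag.startswith("{"):
        return None
    uri = root.tag[1:].split("}", 1)[0]
    return uri if ALTO_NS_PATTERN.match(uri) else None


def extract_words(xml_text):
    """Collect the CONTENT of all String elements, whatever the version."""
    root = ET.fromstring(xml_text)
    uri = detect_alto_namespace(root)
    if uri is None:
        raise ValueError("no supported ALTO namespace found")
    # Register the detected URI under one fixed prefix so that the same
    # query works for ALTO v2, v3 and v4 documents.
    namespaces = {"alto": uri}
    return [
        node.get("CONTENT")
        for node in root.iterfind(".//alto:String", namespaces)
    ]
```

Because the prefix is bound at runtime, the query itself never mentions a schema version; only the accepted-URI pattern needs to change if further versions are to be supported.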
I have tested it in our development environment with SOLR 9.4.0 and the ocr-highlighting module in version 0.8.3 and have not encountered any issues. It has been rolled out in our production environment as well.
While it does not resolve the general namespace issue, it does at least resolve the ALTO issue.
This is related to #488.