Is it possible to use XML ALTO as an input file? #1013

Open
keto33 opened this issue May 16, 2023 · 1 comment

Comments

@keto33

keto33 commented May 16, 2023

I wonder whether it is possible to save the output of pdfalto as an ALTO XML file and parse it later with GROBID, since that is what GROBID does internally anyway.

pdfalto input.pdf alto.xml

curl -sS --form input=@alto.xml localhost:8070/api/processFulltextDocument > /parsed.xml

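A minimal sketch of the same request as a reusable shell function, assuming a GROBID service on the default port 8070. GROBID's documentation describes processFulltextDocument as taking a PDF in the input form field. Whether it would accept a pdfalto ALTO file instead is exactly what this issue asks, so the sketch sends the PDF. GROBID_URL and CURL are hypothetical override variables introduced only for this example.

```shell
#!/bin/sh
# Sketch: send a PDF to a running GROBID service and print the TEI result.
# GROBID_URL and CURL are hypothetical overrides for this example only;
# the endpoint and the "input" form field match the command in the question.
GROBID_URL="${GROBID_URL:-http://localhost:8070}"
CURL="${CURL:-curl}"

# grobid_fulltext INPUT.pdf  -> TEI XML on stdout
grobid_fulltext() {
    if [ "$#" -ne 1 ]; then
        echo "usage: grobid_fulltext INPUT.pdf" >&2
        return 2
    fi
    # The leading "@" makes curl upload the file's contents;
    # without it, curl sends the file name as a literal string value.
    "$CURL" -sS --form "input=@$1" "$GROBID_URL/api/processFulltextDocument"
}
```

For example, grobid_fulltext input.pdf > parsed.xml writes the TEI output to parsed.xml in the current directory.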
@flckv

flckv commented Nov 25, 2024

Maybe something like this, using the pdfalto_server binary bundled with GROBID:

/opt/grobid/grobid-home/pdfalto/lin-64/pdfalto_server -fullFontName -noLineNumbers -noImage -annotation -filesLimit 2000 -l 2 /tmp/bao.pdf /tmp/bao.lxml --timeout 120

See: #1014 (comment)
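A small wrapper around the invocation above, as a sketch for saving pdfalto output to disk ahead of time. The binary path and every flag are copied verbatim from that command. GROBID_PDFALTO is a hypothetical override variable for this example, and nothing here shows that GROBID can later read the saved file back in. The -l 2 option is kept as given: in xpdf-derived tools such as pdfalto, -l sets the last page to convert, so removing it should process the whole PDF.

```shell
#!/bin/sh
# Sketch: save pdfalto's ALTO XML for a PDF, using the pdfalto_server binary
# bundled with GROBID. Path and flags are copied from the comment above;
# GROBID_PDFALTO is a hypothetical override for a different install location.
PDFALTO="${GROBID_PDFALTO:-/opt/grobid/grobid-home/pdfalto/lin-64/pdfalto_server}"

# run_pdfalto INPUT.pdf OUTPUT.lxml
run_pdfalto() {
    if [ "$#" -ne 2 ]; then
        echo "usage: run_pdfalto INPUT.pdf OUTPUT.lxml" >&2
        return 2
    fi
    # -l 2 is kept from the comment; drop it to convert every page.
    "$PDFALTO" -fullFontName -noLineNumbers -noImage -annotation \
        -filesLimit 2000 -l 2 "$1" "$2" --timeout 120
}
```

For example, run_pdfalto input.pdf alto.lxml writes the ALTO output to alto.lxml.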
