Concatenate PageXML-formatted XML files.
pagexmlcat [OPTIONS] [FILES...]
Concatenate FILE(s) to standard output. With no file, or if file is -,
read standard input.
-h
print help
-index
comma-separated list of (zero-based) indices to select from
multiple TextEquiv elements (negative indices count from the end)
-serial
ignore region ordering in the document and use the explicit
region ordering of the document
-id
prefix output lines with their respective line (or word) ids
-conf
prefix output lines with their respective confidences (if
available)
-region
set region type to output (line|word|glyph|block); default is line
-filename
output the filename of printed regions
-norm
replace each space with _ in output text
pagexmlcat a.xml - b.xml
Output a.xml's contents, then standard
input, then b.xml's contents.
pagexmlcat
Output document from standard input to standard output.
pagexmlcat -index 0,-1
Output the first and last TextEquiv element
for each line from standard input to standard output.
pagexmlcat -region word a.xml
Output a.xml's words to standard
output.
pagexmlcat -region block a.xml
Output a.xml's text regions to
standard output.
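The index selection described for -index and the replacement described
for -norm can be sketched in a few lines of Python. This is only an
illustration of the documented semantics, not pagexmlcat's actual code:
the function names are hypothetical, and silently skipping out-of-range
indices is an assumption that the help text does not confirm.

```python
def parse_indices(arg):
    """Parse a comma-separated index list such as "0,-1" into integers."""
    return [int(part) for part in arg.split(",") if part.strip()]


def select_text_equivs(text_equivs, indices):
    """Pick entries by zero-based index; negative indices count from the end."""
    selected = []
    for i in indices:
        # Assumption: indices outside the list are skipped rather than
        # reported as errors. The help text does not say which happens.
        if -len(text_equivs) <= i < len(text_equivs):
            selected.append(text_equivs[i])
    return selected


def normalize(text):
    """Replace each space with an underscore, as described for -norm."""
    return text.replace(" ", "_")


# Three alternative transcriptions of one line (made-up sample data).
candidates = ["first reading", "second reading", "last reading"]
picked = select_text_equivs(candidates, parse_indices("0,-1"))
print([normalize(t) for t in picked])  # ['first_reading', 'last_reading']
```

With the index list 0,-1 the sketch keeps the first and last candidate,
mirroring the -index 0,-1 example above.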
Written by Florian Fink