
Converting DTDs into XML for analysis

Klortho edited this page Dec 18, 2012 · 2 revisions

dtdanalyzer produces an XML representation of a DTD that describes its elements and attributes. Here are some usage examples:

Convert a local DTD into XML format, and write the results onto standard output:

dtdanalyzer example.dtd

It also works over HTTP. The following command reads and analyzes a JATS DTD and writes the output to a file:

dtdanalyzer http://jats.nlm.nih.gov/archiving/1.0/JATS-archivearticle1.dtd archive.daz.xml

You can specify the DTD with an instance XML document:

dtdanalyzer -d test1.xml

Or with a public identifier. The following example also specifies an OASIS catalog file, which is used to resolve the public identifier. Make sure you enter this command all on one line.

dtdanalyzer -p "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v0.4 20110131//EN" -c catalog.xml
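dtdanalyzer does this lookup itself when given -c. To illustrate what a catalog entry does, here is a minimal Python sketch of how a <public> entry in an OASIS XML catalog maps a public identifier to a DTD location. It is a simplification: it checks only direct <public> children of <catalog>, ignores <system>, <group>, and <nextCatalog> entries, and returns the uri as written instead of resolving it against the catalog's location. The function names and the sample catalog (including the local path dtds/archive.dtd) are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace defined by the OASIS XML Catalogs specification.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog mapping the JATS public identifier to a local file.
JATS_PUBLIC_ID = ("-//NLM//DTD JATS (Z39.96) Journal Archiving and "
                  "Interchange DTD v0.4 20110131//EN")
SAMPLE_CATALOG = f"""<catalog xmlns="{CATALOG_NS}">
  <public publicId="{JATS_PUBLIC_ID}" uri="dtds/archive.dtd"/>
</catalog>"""


def normalize_public_id(public_id: str) -> str:
    # Collapse runs of whitespace to a single space and trim both ends,
    # which the catalog spec requires before comparing public identifiers.
    return " ".join(public_id.split())


def resolve_public(catalog_xml: str, public_id: str) -> Optional[str]:
    """Return the uri of the first matching <public> entry, or None."""
    root = ET.fromstring(catalog_xml)
    wanted = normalize_public_id(public_id)
    # Only direct <public> children of <catalog> are examined here.
    for entry in root.findall(f"{{{CATALOG_NS}}}public"):
        if normalize_public_id(entry.get("publicId", "")) == wanted:
            return entry.get("uri")
    return None


if __name__ == "__main__":
    print(resolve_public(SAMPLE_CATALOG, JATS_PUBLIC_ID))  # dtds/archive.dtd
```

A real catalog.xml passed with -c would contain an entry of the same shape, pointing at wherever the DTD actually lives.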

Detailed usage information

Usage:

dtdanalyzer [-d <xml-file> | -s <system-id> | -p <public-id>] [other-options] [<output file>]

Most options have both a short and a long form (--docproc has only a long form). The DTD to be processed can be specified in one of three ways: with a system identifier (using "-s", or as a bare argument), with a public identifier (using "-p"), or with an instance document (using "-d").

  • -s, --system system-id - Use the given system identifier to find the DTD. This can be a relative pathname, if the DTD is a file on your system, or an HTTP URL.
  • -d, --doc xml-file - Specify an XML instance document whose doctype declaration is used to find the DTD. This can be a "stub" file that contains nothing but the doctype declaration and a root element. The file doesn't need to be valid according to the DTD.
  • -p, --public public-id - Use the given public identifier to find the DTD. This is typically used together with an OASIS catalog file (see -c).
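A stub instance document for -d needs only a doctype declaration and a root element. The Python sketch below builds one. The helper names stub_document and write_stub are hypothetical, and the usage example assumes article as the root element of the JATS DTD referenced earlier on this page.

```python
from pathlib import Path
from typing import Optional


def stub_document(root: str, system_id: str, public_id: Optional[str] = None) -> str:
    """Build a minimal instance document: an XML declaration, a doctype
    declaration pointing at the DTD, and an empty root element."""
    if public_id is None:
        doctype = f'<!DOCTYPE {root} SYSTEM "{system_id}">'
    else:
        doctype = f'<!DOCTYPE {root} PUBLIC "{public_id}" "{system_id}">'
    # The root element is left empty; the stub need not be valid.
    return f'<?xml version="1.0"?>\n{doctype}\n<{root}/>\n'


def write_stub(path: str, root: str, system_id: str,
               public_id: Optional[str] = None) -> None:
    # Write the stub as UTF-8, the XML default encoding.
    Path(path).write_text(stub_document(root, system_id, public_id), encoding="utf-8")
```

For example, write_stub("test1.xml", "article", "http://jats.nlm.nih.gov/archiving/1.0/JATS-archivearticle1.dtd") creates a file that can then be passed as dtdanalyzer -d test1.xml.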

Other possible options and arguments are:

  • -h, --help - Print usage information and exit.
  • -v, --version - Print version information and exit.
  • -c, --catalog catalog-file - Specify a file to use as the OASIS catalog, to resolve system and public identifiers.
  • -x, --xslt xslt-file - Specify an XSLT script to run to post-process the output.
  • -P, --param param=value - A parameter name and value to pass to the XSLT script. This option can be given more than once.
  • -t, --title dtd-title - Specify the title of this DTD. This will be output within a <title> element under the root <declarations> element of the output XML.
  • -r, --roots roots - Specify the set of possible root elements for documents conforming to this DTD, as a space-delimited list of element names (quote the list on the command line if it contains more than one name, so that the shell passes it as a single argument). These elements are tagged with a root=true attribute in the output. The DtdAnalyzer also finds the elements that are not reachable from this set of possible roots, and tags them with a reachable=false attribute.
  • -m, --markdown - Process structured comments as Markdown. This requires pandoc to be installed on the system and accessible to this process. Same as --docproc pandoc.
  • --docproc cmd - The command used to process structured comments. It should read its input on stdin and write valid XHTML fragments (not a complete XHTML document) to stdout.
  • <output file> - Name of the file to write the output to. If this argument is not given, the output is written to standard output.
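The reachability check that -r enables is, in essence, a graph traversal: start from the given roots, follow each element's content model to the child elements it allows, and any declared element never visited is unreachable. The Python sketch below illustrates the idea; it is not DtdAnalyzer's actual code. It assumes the content models have already been reduced to a mapping from each element name to the child element names it may contain, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def parse_roots(arg: str) -> Set[str]:
    # The -r argument is a space-delimited list of element names.
    return set(arg.split())


def unreachable_elements(children: Dict[str, Iterable[str]], roots: Set[str]) -> Set[str]:
    """Return the declared elements that cannot be reached from any root.

    `children` maps each declared element to the element names its
    content model allows (a simplifying assumption for this sketch).
    """
    # Breadth-first search starting from every declared root element.
    queue = deque(r for r in roots if r in children)
    seen: Set[str] = set(queue)
    while queue:
        name = queue.popleft()
        for child in children.get(name, ()):
            if child not in seen:  # the seen set also guards against cycles
                seen.add(child)
                queue.append(child)
    return set(children) - seen
```

With a hypothetical model such as {"article": ["front", "body"], "front": ["title"], "body": ["p"], "title": [], "p": [], "legacy-note": ["p"]} and roots parsed from "article", only legacy-note is reported as unreachable, which corresponds to the element that would be tagged reachable=false.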
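Any program that follows the --docproc contract (comment text on stdin, XHTML fragment on stdout) can serve as the processor. As a minimal sketch, and explicitly not a Markdown processor or a pandoc replacement, the following Python script treats blank-line-separated blocks of comment text as paragraphs, escapes XML special characters, and wraps each block in a <p> element. The script name paragraphs.py is hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical structured-comment processor for dtdanalyzer --docproc.

Reads comment text on stdin and writes an XHTML fragment on stdout:
each blank-line-separated block becomes one <p> element. It does not
interpret Markdown.
"""
import re
import sys
from xml.sax.saxutils import escape


def to_xhtml_fragment(text: str) -> str:
    # Split on blank lines (lines that are empty or whitespace-only).
    blocks = re.split(r"\n\s*\n", text)
    # Collapse internal whitespace; drop blocks that are empty.
    paragraphs = [" ".join(b.split()) for b in blocks if b.strip()]
    # escape() replaces &, <, and > so the fragment stays well-formed.
    return "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)


def main() -> None:
    sys.stdout.write(to_xhtml_fragment(sys.stdin.read()) + "\n")


if __name__ == "__main__":
    main()
```

Made executable (chmod +x paragraphs.py), it could presumably be passed as --docproc ./paragraphs.py; how dtdanalyzer splits or locates the command is not covered on this page.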