Chapter 3. Parsing DocBook Documents

Table of Contents
Validating Your Documents
Understanding Parse Errors
Considering Other Schema Languages

A key feature of SGML and XML markup is that it can be validated. The DocBook DTD is a precise description of valid nesting, the order of elements, and their content. All DocBook documents must conform to this description or they are not DocBook documents (by definition).

A validating parser is a program that can read the DTD and a particular document and determine whether the exact nesting and order of elements in the document is valid according to the DTD.

If you are not using a structured editor that can enforce the markup as you type, validation with an external parser is a particularly important step in the document creation process. You cannot expect to get rational results from subsequent processing (such as document publishing) if your documents are not valid.

The most popular free SGML parser is SP by James Clark, available at http://www.jclark.com/.

SP includes nsgmls, a fast command-line parser. In the world of free validating XML parsers, James Clark's xp is a popular choice.

Note

Not all XML parsers are validating, and although a non-validating parser may have many uses, it cannot ensure that your documents are valid according to the DTD.
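
As a rough illustration of this point, the following Python sketch hands a document that breaks its own DTD to a non-validating parser. The parser (Python's standard xml.etree.ElementTree module, which checks well-formedness only) and the miniature article DTD are assumptions made for this example; neither is part of DocBook or of the tools discussed in this chapter.

```python
import xml.etree.ElementTree as ET

# A hypothetical miniature DTD: every article must begin with a title.
# The document omits the title, so it is well formed but NOT valid.
INVALID_DOC = """<?xml version="1.0"?>
<!DOCTYPE article [
  <!ELEMENT article (title, para+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT para (#PCDATA)>
]>
<article><para>No title here.</para></article>
"""


def parse_without_validation(text):
    """Parse text with ElementTree, which does not check the DTD's content models."""
    return ET.fromstring(text)


root = parse_without_validation(INVALID_DOC)
# The parse succeeds and raises no error, although the content model is violated.
print(root.tag, [child.tag for child in root])
```

The parser reports nothing unusual, which is exactly why a non-validating parse cannot stand in for validation.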

Validating Your Documents

How you run a parser depends on which parser you use. For information about your particular parser, consult the documentation that came with it.

Using nsgmls

The nsgmls command from SP is a validating SGML parser. The options used in the example below suppress the normal output, leaving only error messages (-s); print the version number (-v); and name the catalog file used to map public identifiers to system identifiers (-c). Printing the version number guarantees that you always get some output, so you know that the command ran:


[n:\dbtdg] nsgmls -sv -c \share\sgml\catalog test.sgm
m:\jade\nsgmls.exe:I: SP version "1.3.2"

Because no error messages were printed, we know our document is valid. If you're working with a document that you discover has many errors, the -f option offers a handy way to direct the errors to a file so they don't all scroll off your screen.

If you want to validate an XML document with SP, you must make sure that SP uses the correct SGML declaration. An SGML declaration for XML, called xml.dcl, is included with SP.

The easiest way to make sure that SP uses xml.dcl is to include the declaration explicitly on the command line when you run nsgmls (or Jade, or other SP tools):


[n:\dbtdg] nsgmls -sv -c \share\sgml\catalog m:\jade\xml.dcl test.xml
m:\jade\nsgmls.exe:I: SP version "1.3.2"
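
If you validate often, the invocations shown in this section can be wrapped in a small script. The following Python sketch is one possible approach, not part of SP: the function names are hypothetical, it assumes nsgmls is on your search path, and it assumes the version banner always contains the text "SP version", as it does in the sample output above.

```python
import subprocess


def build_nsgmls_command(document, catalog, declaration=None, error_file=None):
    """Return the nsgmls argument list for a validation-only run."""
    command = ["nsgmls", "-sv", "-c", catalog]
    if error_file is not None:
        # -f directs error messages to a file instead of the screen.
        command += ["-f", error_file]
    if declaration is not None:
        # For XML, the declaration file (xml.dcl) precedes the document.
        command.append(declaration)
    command.append(document)
    return command


def diagnostics(output):
    """Return every non-blank output line except the version banner from -v."""
    return [line for line in output.splitlines()
            if line.strip() and "SP version" not in line]


def validate(document, catalog, declaration=None):
    """Run nsgmls; an empty result means no error messages were printed."""
    result = subprocess.run(
        build_nsgmls_command(document, catalog, declaration),
        capture_output=True, text=True,
    )
    # Collect messages from both streams so that nothing is missed.
    return diagnostics(result.stdout + "\n" + result.stderr)
```

With these definitions, a call such as validate("test.xml", r"\share\sgml\catalog", r"m:\jade\xml.dcl") would correspond to the XML example above, and an empty list would mean that only the version banner was printed.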

Using xp

The xp distribution includes several sample programs. One of these programs, Time, performs a validating parse of the document and prints the amount of time required to parse the DTD and the document. This program makes an excellent validity checker:


java com.jclark.xml.apps.Time examples\simple.xml
6.639

The output shows that parsing the DTD and the document took 6.639 seconds. Because nothing else was printed, the document is valid; if it were invalid, error messages would be displayed as well.