The DocBook DTD as XML

Converting the DocBook DTD to XML is much more challenging than converting the instances. It is probably not possible to construct an XML DTD whose validation power matches that of DocBook. The list below identifies most of the issues that must be addressed and describes how the DocBook XML DTD deals with them:

Comments are not allowed inside markup declarations

Most of them have been moved to comment declarations preceding the markup declaration that used to contain them. A few small inline comments, which would have been out of context if moved before the declaration, were simply deleted.
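A minimal sketch of the change, using a hypothetical element named `sample` rather than a declaration copied from DocBook. SGML lets a `-- comment --` sit between the parameters of a markup declaration; XML only has standalone `<!-- ... -->` comment declarations. The `- -` tag-omission indicators in the SGML line are also SGML-only and disappear in the XML version.

```xml
<!-- SGML: the comment lives inside the element declaration -->
<!ELEMENT sample - - (title, para+) -- a titled group of paragraphs -->

<!-- XML: the same comment moved to its own comment declaration -->
<!-- a titled group of paragraphs -->
<!ELEMENT sample (title, para+)>
```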

Name groups are not allowed in element or attribute list declarations

The small number of places in which DocBook uses name groups have been expanded.

There's one downside: DocBook uses %admon.class; in a name group to define the content model and the attribute list for every element in the admonitions class. DocBook XML cannot express this shortcut. If additional admonitions are added, the element and attribute list declarations will have to be copied for each of them.
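The difference can be sketched with simplified declarations. The entity value and the content model below are assumptions for illustration, not the exact text of either DTD, and `%admon.mix;` and `%common.attrib;` are assumed to be declared elsewhere.

```xml
<!-- SGML: one name group declares every admonition at once -->
<!ENTITY % admon.class "caution | important | note | tip | warning">
<!ELEMENT (%admon.class;) - - (title?, (%admon.mix;)+)>
<!ATTLIST (%admon.class;) %common.attrib;>

<!-- XML: each admonition needs its own pair of declarations -->
<!ELEMENT caution (title?, (%admon.mix;)+)>
<!ATTLIST caution %common.attrib;>
<!ELEMENT important (title?, (%admon.mix;)+)>
<!ATTLIST important %common.attrib;>
<!-- ...and likewise for note, tip, and warning -->
```

Adding a new admonition in SGML means adding one name to the entity; in XML it means writing another pair of declarations.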

No CDATA or RCDATA declared content

Graphic and InlineGraphic have been made EMPTY. The content model for SynopFragmentRef, the only RCDATA element in DocBook, has been changed to (arg | group)+.
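A simplified before-and-after sketch of these two changes; attribute lists are omitted, and the SGML lines show only the declared-content keywords, not the full DocBook declarations.

```xml
<!-- SGML: declared content keywords -->
<!ELEMENT graphic - - CDATA>
<!ELEMENT synopfragmentref - - RCDATA>

<!-- XML: declared content is gone, so these become ordinary models -->
<!ELEMENT graphic EMPTY>
<!ELEMENT synopfragmentref (arg | group)+>
```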

No exclusions or inclusions on element declarations

They had to be removed.

In DocBook, exclusions are used to exclude the following:



Removing these exclusions from DocBook XML means that it is now valid, in the XML sense, to do some things that don't make much sense (such as putting a Footnote inside a Footnote). Be careful.
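A simplified sketch of the footnote case; the real DocBook exclusion list may name more elements, and `%footnote.mix;` is assumed to be declared elsewhere.

```xml
<!-- SGML: the exclusion forbids footnote anywhere inside a footnote,
     at any depth -->
<!ELEMENT footnote - - ((%footnote.mix;)+) -(footnote)>

<!-- XML: exclusions are not allowed, so nothing in the DTD prevents
     a footnote from being nested inside another footnote -->
<!ELEMENT footnote ((%footnote.mix;)+)>
```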

Inclusions in DocBook are used to add the ubiquitous elements (indexterm and BeginPage) unconditionally to a large number of contexts. In order to make these elements available in DocBook XML, they have been added to most of the parameter entities that include #PCDATA. If new locations are discovered where these elements are desired, DocBook XML will be updated.
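A hypothetical, heavily simplified sketch of the two approaches. The entity value and the choice of `chapter` are illustrative assumptions, and `%divcomponent.mix;` is assumed to be declared elsewhere.

```xml
<!-- SGML: an inclusion makes indexterm and beginpage legal anywhere
     below chapter, including inside every descendant element -->
<!ELEMENT chapter - - (title, (%divcomponent.mix;)+) +(indexterm | beginpage)>

<!-- XML: no inclusions, so the ubiquitous elements are listed
     explicitly in a mixed-content entity and are legal only where
     that entity is used -->
<!ENTITY % para.char.mix "#PCDATA | emphasis | indexterm | beginpage">
<!ELEMENT para (%para.char.mix;)*>
```

An SGML inclusion reaches every descendant of the element that carries it, while the XML entity only affects the content models that reference it. That is why locations can be missed and may need to be added later.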

Elements with mixed content must have #PCDATA first

The content models of many elements have been rewritten as a repeatable OR group beginning with #PCDATA.
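A minimal sketch with a hypothetical element `sample`. XML accepts mixed content only in the form `(#PCDATA | a | b ...)*`: #PCDATA first, only `|` connectors, and the whole group followed by `*`.

```xml
<!-- SGML: #PCDATA may appear anywhere in the group, here with + -->
<!ELEMENT sample - - (emphasis | #PCDATA)+>

<!-- XML: #PCDATA first, in a single OR group repeated with * -->
<!ELEMENT sample (#PCDATA | emphasis)*>
```

Because XML requires `*` rather than `+`, the rewritten model also permits an empty `sample` element, which the SGML version did not.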

Many declared attribute types (NAME, NUMBER, NUTOKEN, and so on) are not allowed

They have all been replaced by NMTOKEN or CDATA.
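A sketch using hypothetical attributes on a hypothetical element. The mapping shown (NUMBER to CDATA, NAME to NMTOKEN) is one plausible choice and is not taken from the DTD.

```xml
<!-- SGML: NUMBER and NAME are declared value types -->
<!ATTLIST sample
          level   NUMBER   #IMPLIED
          label   NAME     #IMPLIED>

<!-- XML: those types do not exist, so looser ones are used -->
<!ATTLIST sample
          level   CDATA    #IMPLIED
          label   NMTOKEN  #IMPLIED>
```

Some validation power is lost here: the parser no longer checks that `level` is numeric, and NMTOKEN accepts values such as `2nd` that NAME would reject.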

No #CONREF attributes allowed

The #CONREF attributes on indexterm, GlossSee, and GlossSeeAlso were changed to #IMPLIED. The content model of indexterm was modified so that it can be empty.
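A simplified sketch for GlossSee. Only the relevant attribute is shown, and the real declarations carry additional attributes.

```xml
<!-- SGML: if otherterm is specified, the element must be empty -->
<!ATTLIST glosssee otherterm IDREF #CONREF>

<!-- XML: #CONREF does not exist, so the attribute is simply optional -->
<!ATTLIST glosssee otherterm IDREF #IMPLIED>
```

With #IMPLIED, the XML DTD no longer ties the presence of the attribute to empty content. A document may supply both an `otherterm` value and text content, and the parser will not object.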

Attribute default values must be quoted

Quotes were added wherever necessary.
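A minimal sketch with a hypothetical element and an enumerated attribute:

```xml
<!-- SGML: an unquoted default value is allowed -->
<!ATTLIST sample format (linespecific) linespecific>

<!-- XML: the default must be a quoted literal -->
<!ATTLIST sample format (linespecific) "linespecific">
```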