To create DocBook documents in XML, you'll need an XML version of DocBook. We've included one on the CD, but it hasn't been officially adopted by the OASIS DocBook Technical Committee yet. If you're interested in the technical details, Appendix B describes the specific differences between the SGML and XML versions of DocBook.
XML, like SGML, requires a specific prologue in your document. The following sections describe the features of the XML prologue.
XML documents should begin with an XML declaration. Unlike the SGML declaration, which is a grab bag of features, the XML declaration identifies a few simple aspects of the document:
<?xml version="1.0" standalone="no"?>
Identifying the version of XML ensures that future changes to the XML specification will not alter the semantics of this document. The standalone declaration simply makes explicit the fact that this document cannot “stand alone,” and that it relies on an external DTD. The complete details of the XML declaration are described in the XML specification.
Strictly speaking, XML documents don't require a DTD. Realistically, DocBook XML documents will have one.
The document type declaration identifies the DTD that will be used by the document and what the root element of the document will be. A typical doctype declaration for a DocBook document looks like this:
<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
               "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd">
This declaration indicates that the root element will be book and that the DTD used will be the one identified by the public identifier -//Norman Walsh//DTD DocBk XML V3.1.4//EN. External declarations in XML must include a system identifier (the public identifier is optional). In this example, the DTD is stored on a web server.
System identifiers in XML must be URIs. Many systems may accept filenames and interpret them locally as file: URLs, but it's always correct to fully qualify them.
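As a minimal sketch of the two points above, the following document omits the public identifier and names the DTD with a fully qualified file: URI. The local path is hypothetical; substitute the location where the DTD is actually installed on your system.

```xml
<?xml version="1.0" standalone="no"?>
<!-- SYSTEM alone is sufficient: the public identifier is optional in XML. -->
<!-- The path below is an assumed install location, not a real default. -->
<!DOCTYPE book SYSTEM "file:///usr/local/share/docbook/xml/3.1.4/db3xml.dtd">
<book>...</book>
```

A bare filename such as db3xml.dtd may work in many tools, but the fully qualified form leaves nothing to local interpretation.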
It's also possible to provide additional declarations in a document by placing them in the document type declaration:
<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
               "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd" [
  <!ENTITY nwalsh "Norman Walsh">
  <!ENTITY chap1 SYSTEM "chap1.sgm">
  <!ENTITY chap2 SYSTEM "chap2.sgm">
]>
These declarations form what is known as the internal subset. The declarations stored in the file referenced by the public or system identifier in the DOCTYPE declaration are called the external subset, which is technically optional. It is legal to put the DTD in the internal subset and to have no external subset, but for a DTD as large as DocBook, that would make very little sense.
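To make the last point concrete, here is a small sketch of a document whose entire DTD lives in the internal subset. The memo vocabulary is invented purely for illustration and is not part of DocBook; it is small enough that an internal-only DTD is practical.

```xml
<?xml version="1.0" standalone="yes"?>
<!-- No public or system identifier: the whole DTD is the internal subset. -->
<!-- The memo, title, and para declarations are a hypothetical toy DTD. -->
<!DOCTYPE memo [
  <!ELEMENT memo  (title, para+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT para  (#PCDATA)>
]>
<memo>
  <title>Internal subset only</title>
  <para>This document carries its own DTD.</para>
</memo>
```

Because nothing external is referenced, this document can declare standalone="yes".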
Although comments and processing instructions may occur between the document type declaration and the root element, the root element usually immediately follows the document type declaration:
<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
               "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd" [
  <!ENTITY nwalsh "Norman Walsh">
  <!ENTITY chap1 SYSTEM "chap1.sgm">
  <!ENTITY chap2 SYSTEM "chap2.sgm">
]>
<book>...</book>
The important point is that the root element must be physically present in the document itself, after the document type declaration. You cannot place the root element of the document in an external entity.
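The sketch below shows one way the entities declared in the internal subset might be used. The root element's tags appear in the document itself, while its content is pulled in by reference. The title text is invented for illustration, and each chapter file is assumed to contain a single, complete chapter element.

```xml
<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
               "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd" [
  <!ENTITY nwalsh "Norman Walsh">
  <!ENTITY chap1 SYSTEM "chap1.sgm">
  <!ENTITY chap2 SYSTEM "chap2.sgm">
]>
<!-- The start and end tags of the root element are physically present here. -->
<book>
  <!-- &nwalsh; expands to the replacement text declared above. -->
  <title>A Sample Book by &nwalsh;</title>
  <!-- Assumption: each external entity holds one complete chapter element. -->
  &chap1;
  &chap2;
</book>
```

The relative system identifiers chap1.sgm and chap2.sgm are resolved relative to the location of the document that declares them.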
If you are entering XML using a text editor such as Emacs or vi, there are a few things to keep in mind. Using a structured text editor designed for XML hides most of these issues.
In XML, all markup is case-sensitive. In the XML version of DocBook, you must always type all element, attribute, and entity names in lowercase.
You are required to quote all attribute values in XML.
When quoting attribute values, you can use either a straight single quote ('), or a straight double quote ("). Don't use the “curly” quotes (“ and ”) in your editing tool.
Empty elements in XML are marked with a distinctive syntax: <xref/>.
Processing instructions in XML begin and end with a question mark: <?pitarget data?>.
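The rules above can be seen together in one short fragment. This is only a sketch: the id and linkend values are hypothetical, and pitarget stands in for whatever processing instruction target your tools recognize.

```xml
<!-- Element and attribute names are lowercase; every attribute value is quoted. -->
<para id="intro">
  <!-- Straight single quotes are as acceptable as straight double quotes. -->
  <!-- The empty element carries a trailing slash. -->
  See <xref linkend='chap1'/> for details.
  <!-- The processing instruction ends with a question mark. -->
  <?pitarget data?>
</para>
```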
XML was designed to be served, received, and processed over the Web. Two of its most important design principles are ease of implementation and interoperability with both SGML and HTML.
The markup minimization features of SGML make documents more difficult to process and parsers harder to write; these features also run counter to the XML design principles named above. As a result, XML does not support them.
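As a rough illustration of what is lost, the sketch below contrasts a minimized SGML fragment with the fully written form that XML requires. The minimized line is generic SGML shown only for comparison; whether a particular SGML system accepts it depends on its SGML declaration, and the id value is hypothetical.

```xml
<!-- SGML with minimization (not well-formed XML): the attribute value is
     unquoted and the emphasis end tag is abbreviated to an empty end tag.

       <para id=intro>Some <emphasis>important</> text.</para>
-->

<!-- The same markup written out in full, as XML requires: -->
<para id="intro">Some <emphasis>important</emphasis> text.</para>
```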
Luckily, a good authoring environment can offer all of the features of markup minimization without interfering with the interoperability of documents. And because XML tools are easier to write, it's likely that good, inexpensive XML authoring environments will be available eventually.
Conceptually, almost everything in this book applies equally to SGML and XML. But because DocBook V3.1 is an SGML DTD, we naturally tend to use SGML conventions in our writing. If you're primarily interested in XML, there are just a few small details to keep in mind.
XML is case-sensitive, while the SGML version of DocBook is not. In this book, we've chosen to present the element names using mixed case (Book, indexterm, XRef, and so on), but in the DocBook XML DTD, all element, attribute, and entity names are strictly lowercase.
Empty element start tags in XML are marked with a distinctive syntax: <xref/>. In SGML, the trailing slash is not present, so some of our examples need slight revisions to be valid XML elements.
Processing instructions in XML begin and end with a question mark: <?pitarget data?>. In SGML, the trailing question mark is not present, so some of our examples need slight revisions to be well-formed XML.
Generally we use public identifiers in examples, but whenever system identifiers are used, don't forget that XML system identifiers must be Uniform Resource Identifiers (URIs), whereas SGML system identifiers are usually simple filenames.
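To show the kind of revision involved, here is a sketch of one SGML-style line, written the way examples appear in this book, followed by its XML equivalent. The linkend value is hypothetical.

```xml
<!-- As it might appear in an SGML-style example (not well-formed XML):

       <Para>See <XRef LinkEnd="chap1">.<?pitarget data></Para>
-->

<!-- Revised for the XML version of DocBook: lowercase names, a trailing
     slash on the empty element, and a closing question mark on the
     processing instruction. -->
<para>See <xref linkend="chap1"/>.<?pitarget data?></para>
```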
For a more detailed discussion of DocBook and XML, see Appendix B.