Chapter 2. Creating DocBook Documents

Table of Contents
Making an SGML Document
Making an XML Document
Public Identifiers, System Identifiers, and Catalog Files
Physical Divisions: Breaking a Document into Physical Chunks
Logical Divisions: The Categories of Elements in DocBook
Making a DocBook Book
Making a Chapter
Making an Article
Making a Reference Page
Making Front- and Backmatter

This chapter explains in concrete, practical terms how to make DocBook documents. It's an overview of all the kinds of markup that are possible in DocBook documents. It explains how to create several kinds of DocBook documents: books, sets of books, chapters, articles, and reference manual entries. The idea is to give you enough basic information to actually start writing. The information here is intentionally skeletal; you can find “the details” in the reference section of this book.

Before we can examine DocBook markup, we have to take a look at what an SGML or XML system requires.

Making an SGML Document

SGML requires that your document have a specific prologue. The following sections describe the features of the prologue.

An SGML Declaration

SGML documents begin with an optional SGML Declaration. The declaration can precede the document instance, but generally it is stored in a separate file that is associated with the DTD. The SGML Declaration is a grab bag of SGML defaults. DocBook includes an SGML Declaration that is appropriate for most DocBook documents, so we won't go into a lot of detail here about the SGML Declaration.

In brief, the SGML Declaration describes, among other things, what characters are markup delimiters (the default is angle brackets), what characters can compose tag and attribute names (usually the alphabetical and numeric characters plus the dash and the period), what characters can legally occur within your document, how long SGML “names” and “numbers” can be, what sort of minimizations (abbreviation of markup) are allowed, and so on. Changing the SGML Declaration is rarely necessary, and because many tools only partially support changes to the declaration, changing it is best avoided, if possible.
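
To make this description more concrete, here is a sketch of two sections of an SGML Declaration. It is an illustrative excerpt, not the text of the declaration distributed with DocBook, and the values shown are assumptions chosen to match the behavior described in this chapter. A complete declaration contains several more sections.

```sgml
-- Excerpt only: a complete SGML Declaration also has CHARSET, CAPACITY,
   SCOPE, and further SYNTAX sections. Values are illustrative. --

NAMING   LCNMSTRT ""
         UCNMSTRT ""
         LCNMCHAR ".-"   -- extra name characters besides letters and digits --
         UCNMCHAR ".-"
         NAMECASE GENERAL YES   -- element and attribute names ignore case --
                  ENTITY  NO    -- entity names are case-sensitive --

FEATURES
MINIMIZE DATATAG  NO
         OMITTAG  NO    -- tags may not be omitted entirely --
         RANK     NO
         SHORTTAG YES   -- empty tags, null end tags, unquoted attribute values --
```

The NAMING section controls which characters may appear in names and whether case matters; the MINIMIZE line controls which abbreviations of markup the parser accepts.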

Wayne Wohler has written an excellent tutorial on the SGML Declaration; if you're interested in more details, see http://www.oasis-open.org/cover/wlw11.html.

A Document Type Declaration

All SGML documents must begin with a document type declaration. This identifies the DTD that will be used by the document and what the root element of the document will be. A typical doctype declaration for a DocBook document looks like this:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN">

This declaration indicates that the root element, which is the first element in the hierarchical structure of the document, will be book and that the DTD used will be the one identified by the public identifier -//OASIS//DTD DocBook V3.1//EN. See the section called “Public Identifiers, System Identifiers, and Catalog Files” later in this chapter.
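
The root element named in the declaration does not have to be book. As a minimal sketch, a complete document whose root element is chapter might look like this (the title and paragraph text are invented for illustration):

```sgml
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<!-- The name after DOCTYPE must match the outermost element below. -->
<chapter>
<title>A Sample Chapter</title>
<para>
The root element of this document is chapter, not book.
</para>
</chapter>
```

The public identifier is unchanged; only the declared root element differs.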

An Internal Subset

It's also possible to provide additional declarations in a document by placing them in the document type declaration:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
<!ENTITY nwalsh "Norman Walsh">
<!ENTITY chap1 SYSTEM "chap1.sgm">
<!ENTITY chap2 SYSTEM "chap2.sgm">
]>

These declarations form what is known as the internal subset. The declarations stored in the file referenced by the public or system identifier in the DOCTYPE declaration are called the external subset, which is technically optional. It is legal to put the entire DTD in the internal subset and to have no external subset, but for a DTD as large as DocBook, that wouldn't make much sense.

Note

The internal subset is parsed first and, if multiple declarations for an entity occur, the first declaration is used. Declarations in the internal subset override declarations in the external subset.
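
The first-declaration-wins rule can be seen in a small sketch. The entity name author and both replacement strings are hypothetical, chosen only to show which declaration takes effect:

```sgml
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
<!-- First declaration of author: this is the one that is used. -->
<!ENTITY author "Norman Walsh">
<!-- Second declaration of the same name: legal, but ignored. -->
<!ENTITY author "Someone Else">
]>
<book>
<title>Entity Precedence</title>
<chapter>
<title>Example</title>
<para>
Written by &author;.
</para>
</chapter>
</book>
```

Here &author; expands to the text of the first declaration. The same rule explains why an entity declared in the internal subset takes precedence over a declaration of the same name in the external subset: the internal one is seen first. Some parsers may issue a warning about the duplicate declaration.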

The Document (or Root) Element

Although comments and processing instructions may occur between the document type declaration and the root element, the root element usually immediately follows the document type declaration:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
<!ENTITY nwalsh "Norman Walsh">
<!ENTITY chap1 SYSTEM "chap1.sgm">
<!ENTITY chap2 SYSTEM "chap2.sgm">
]>
<book>
&chap1;
&chap2;
</book>

You cannot place the root element of the document in an external entity.
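
The external entities hold the content that goes inside the root element. As a sketch, assuming chap1.sgm holds a single chapter (the title and text are invented for illustration), the file might look like this:

```sgml
<!-- Contents of chap1.sgm. There is no DOCTYPE declaration here;
     that belongs only in the document that references this file. -->
<chapter>
<title>First Chapter</title>
<para>
The text of the first chapter goes here.
</para>
</chapter>
```

When the parser encounters &chap1; in the main document, it reads this file as if its contents had been typed at that point, inside the book element.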

Typing an SGML Document

If you are entering SGML using a text editor such as Emacs or vi, there are a few things to keep in mind.[1] Using a structured text editor designed for SGML hides most of these issues.

  • DocBook element and attribute names are not case-sensitive. There's no difference between Para and pArA. Entity names are case-sensitive, however.

    If you are interested in future XML compatibility, input all element and attribute names strictly in lowercase.

  • If attribute values contain spaces or punctuation characters, you must quote them. You are not required to quote attribute values if they consist of a single word or number, although it is not wrong to do so.

    When quoting attribute values, you can use either a straight single quote ('), or a straight double quote ("). Don't use the “curly” quotes (“ and ”) in your editing tool.

    If you are interested in future XML compatibility, always quote all attribute values.

  • Several forms of markup minimization are allowed, including empty tags. Instead of typing the entire end tag for an element, you can type simply </>. For example:

    
    <para>
    This is <emphasis>important</>: never stick the tines of a fork
    in an electrical outlet.
    </para>
    
    

    You can use this technique for any and every tag, but it will make your documents very hard to understand and difficult to debug if you introduce errors. It is best to use this technique only for inline elements containing a short string of text.

    Empty start tags are also possible, but may be even more confusing. For the record, if you encounter an empty start tag, the SGML parser uses the element that ended last:

    
    <para>
    This is <emphasis>important</emphasis>.  So is <>this</emphasis>.
    </para>
    
    

    Both "important" and "this" are emphasized.

    If you are interested in future XML compatibility, don't use any of these tricks.

  • The null end tag (NET) minimization feature allows constructions like this:

    
    <para>
    This is <emphasis/important/: never stick the tines of a fork
    in an electrical outlet.
    </para>
    
    

    If, instead of ending a start tag with >, you end it with a slash, then the next occurrence of a slash ends the element.

    If you are interested in future XML compatibility, don't use NET minimization either.

If you are willing to modify both the declaration and the DTD, even more dramatic minimizations are possible, including completely omitted tags and "shortcut" markup.
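
To pull the XML-compatibility guidelines in the list above together, here is the same paragraph written twice. This is a sketch: the role attribute value strong is a hypothetical value used only to show attribute quoting.

```sgml
<!-- Legal SGML, but not XML-compatible:
     mixed-case names, an unquoted attribute value, and an empty end tag. -->
<Para>
This is <Emphasis Role=strong>important</>: never stick the tines of a fork
in an electrical outlet.
</Para>

<!-- The same paragraph in XML-compatible form:
     lowercase names, a quoted attribute value, and a full end tag. -->
<para>
This is <emphasis role="strong">important</emphasis>: never stick the tines of a fork
in an electrical outlet.
</para>
```

An SGML parser using the standard DocBook declaration treats the two forms identically; only the second can be carried over to XML unchanged.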

Note: Removing Minimizations

Although we've made a point of reminding you about which of these minimization features are not valid in XML, that's not really a sufficient reason to avoid using them. (The fact that many of the minimization features can lead to confusing, difficult-to-author documents might be.)

If you want to convert one of these documents to XML at some point in the future, you can run it through a program like sgmlnorm, which will remove all the minimizations and insert the correct, verbose markup. The sgmlnorm program is part of the SP and Jade distributions, which are on the CD-ROM.

Notes

[1]

Many of these things are influenced by the SGML declaration in use. For the purpose of this discussion, we assume you are using the standard DocBook declaration.