Making Front- and Backmatter

DocBook contains markup for the usual variety of front- and backmatter necessary for books and articles: indexes, glossaries, bibliographies, and tables of contents. In many cases, these components are generated automatically, at least in part, from your document by an external processor, but you can create them by hand, and in either case, store them in DocBook.

Some forms of backmatter, like indexes and glossaries, usually require additional markup in the document to make generation by an application possible. Bibliographies are usually composed by hand like the rest of your text, unless you are automatically selecting bibliographic entries out of some larger database. Our principal concern here is to acquaint you with the kind of markup you need to include in your documents if you want to construct these components.

Frontmatter, like the table of contents, is almost always generated automatically from the text of a document by the processing application. If you need information about how to mark up a table of contents in DocBook, please consult the reference page for ToC.

Making an Index

In some highly structured documents, such as reference manuals, you can automate the whole process of generating an index without altering or adding to the original source: a processing application selects the information and compiles it into an adequate index. But this is rare.

In most cases—and even in the case of some reference manuals—a useful index still requires human intervention to mark occurrences of words or concepts that will appear in the text of the index.

Marking index terms

DocBook distinguishes two kinds of index markers: those that are singular and result in a single page entry in the index itself, and those that are multiple and refer to a range of pages.

You put a singular index marker where the subject it refers to actually occurs in your text:


<para>
The tiger<indexterm>
<primary>Big Cats</primary>
<secondary>Tigers</secondary></indexterm>
is a very large cat indeed.
</para>
This index term has two levels, primary and secondary. They correspond to an increasing amount of indented text in the resultant index. DocBook allows for three levels of index terms, with the third labeled tertiary.
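
A three-level term follows the same pattern. The sketch below extends the tiger example with a Tertiary element; the subspecies and the sentence around it are made up for illustration:

<para>
The Bengal tiger<indexterm>
<primary>Big Cats</primary>
<secondary>Tigers</secondary>
<tertiary>Bengal</tertiary></indexterm>
is found on the Indian subcontinent.
</para>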

There are two ways that you can index a range of text. The first is to put index marks at both the beginning and end of the discussion. The mark at the beginning asserts that it is the start of a range, and the mark at the end refers back to the beginning. In this way, the processing application can determine what range of text is indexed. Here's the previous tiger example recast as starting and ending index terms:


<para>
The tiger<indexterm id="tiger-desc" class="startofrange">
<primary>Big Cats</primary>
<secondary>Tigers</secondary></indexterm>
is a very large cat indeed…
</para>
⋮
<para>
So much for tigers<indexterm startref="tiger-desc" class="endofrange">. Let's talk about
leopards.  
</para>

Note that the mark at the start of the range identifies itself as the start of a range with the Class attribute, and provides an ID. The mark at the end of the range points back to the start.

Another way to mark up a range of text is to specify that the entire content of an element, such as a chapter or section, is the complete range. In this case, all you need is for the index term to point to the ID of the element that contains the content in question. The Zone attribute of indexterm provides this functionality.

One of the interesting features of this method is that the actual index marks do not have to occur anywhere near the text being indexed. You can collect all of them together in one file, for example, but it is equally valid to place an index marker near the element it indexes.

Suppose the discussion of tigers in your document comprises a whole text object (like a Sect1 or a Chapter) with an ID value of tiger-desc. You can put the following tag anywhere in your document to index that range of text:


<indexterm zone="tiger-desc">
<primary>Big Cats</primary>
<secondary>Tigers</secondary></indexterm>
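
For the Zone reference to resolve, some element in the document must carry the matching ID. A minimal sketch of such a target, assuming the discussion is a Sect1 (the title and paragraph text are illustrative only):

<sect1 id="tiger-desc"><title>Tigers</title>
<para>
The tiger is a very large cat indeed…
</para>
⋮
</sect1>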


DocBook also contains markup for index hits that point to other index hits of the same type, such as "See Cats, big" or "See also Lions". See the reference pages for See and SeeAlso.
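
As a rough sketch, the two cross-references mentioned above might be marked up as follows. The primary terms (Felines and Tigers) are assumptions chosen for illustration; only the See and SeeAlso text comes from the examples in the paragraph:

<indexterm>
<primary>Felines</primary>
<see>Cats, big</see></indexterm>

<indexterm>
<primary>Tigers</primary>
<seealso>Lions</seealso></indexterm>

A See entry usually replaces a page reference, while a SeeAlso entry supplements one. How either is rendered depends on the processing application.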

Printing an index

After you have added the appropriate markup to your document, an external application can use this information to build an index. The resulting index must include the page numbers on which the concepts appear, so it is usually the document formatter that builds it. In that case, the index may never be instantiated as DocBook markup.

However, there are applications that can produce an index marked up in DocBook. The following example includes some one- and two-level IndexEntry elements (which correspond to the primary and secondary levels in the indexterms themselves) that begin with the letter D:


<!DOCTYPE index PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<index><title>Index</title>
<indexdiv><title>D</title>
<indexentry>
  <primaryie>database (bibliographic), 253, 255</primaryie>
     <secondaryie>structure, 255</secondaryie>
     <secondaryie>tools, 259</secondaryie>
</indexentry>
<indexentry>
  <primaryie>dates (language specific), 179</primaryie>
</indexentry>
<indexentry>
  <primaryie>DC fonts, <emphasis>172</emphasis>, 177</primaryie>
     <secondaryie>Math fonts, 177</secondaryie>
</indexentry>
</indexdiv>
</index>


Making a Glossary

Glossaries, like bibliographies, are often constructed by hand. However, some applications are capable of building a skeletal glossary from glossary term markup in the document. If all of your terms are defined in some glossary database, it may even be possible to construct the complete glossary automatically.

To enable automatic glossary generation, or simply automatic linking from glossary terms in the text to glossary entries, you must add markup to your documents. In the text, you mark up a term for later compilation with the inline GlossTerm tag. This tag can have a LinkEnd attribute whose value is the ID of the actual entry in the glossary.[1]

For instance, if you have this markup in your document:


<glossterm linkend="xml">Extensible Markup Language</glossterm> is a new standard… 

your glossary might look like this:


<!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<glossary><title>Example Glossary</title>
⋮
<glossdiv><title>E</title>

<glossentry id="xml"><glossterm>Extensible Markup Language</glossterm>
  <acronym>XML</acronym>
<glossdef>
  <para>Some reasonable definition here.</para>
  <glossseealso otherterm="sgml">
</glossdef>
</glossentry>

</glossdiv>
</glossary>

Note that the GlossTerm tag reappears in the glossary to mark up the term and distinguish it from its definition within the GlossEntry. The ID that the GlossTerm in the text references is the ID of the GlossEntry in the Glossary itself. You can use the link between source and glossary to create a link in the online form of your document, as we have done with the online form of the glossary in this book.
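
The GlossSeeAlso in the example points at an entry whose ID is sgml, which is not shown. A sketch of what that entry might look like, assuming it sits in its own GlossDiv for the letter S (the definition text is a placeholder):

<glossdiv><title>S</title>

<glossentry id="sgml"><glossterm>Standard Generalized Markup Language</glossterm>
  <acronym>SGML</acronym>
<glossdef>
  <para>Some reasonable definition here.</para>
</glossdef>
</glossentry>

</glossdiv>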

Making a Bibliography

There are two ways to set up a bibliography in DocBook: you can have the data raw or cooked. Here's an example of a raw bibliographical item, wrapped in the BiblioEntry element:


<biblioentry xreflabel="Kites75">
  <authorgroup>
    <author><firstname>Andrea</firstname><surname>Bahadur</surname></author>
    <author><firstname>Mark</firstname><surname>Shwarek</surname></author>
  </authorgroup>
  <copyright><year>1974</year><year>1975</year>
     <holder>Product Development International Holding N. V.</holder>
     </copyright>
  <isbn>0-88459-021-6</isbn>    
  <publisher>
    <publishername>Plenary Publications International, Inc.</publishername>
  </publisher>
  <title>Kites</title>
  <subtitle>Ancient Craft to Modern Sport</subtitle>
  <pagenums>988-999</pagenums>
  <seriesinfo>
    <title>The Family Creative Workshop</title>
    <seriesvolnums>1-22</seriesvolnums>
    <editor>
      <firstname>Allen</firstname>
      <othername role="middle">Davenport</othername>
      <surname>Bragdon</surname>
      <contrib>Editor in Chief</contrib>
    </editor>
  </seriesinfo>
</biblioentry>

The “raw” data in a BiblioEntry is comprehensive to a fault: there are enough fields to suit a host of different bibliographical styles, and that is the point. An abundance of data requires processing applications to select, punctuate, order, and format the bibliographical data, and it is unlikely that all the information provided will actually be output.

All the “cooked” data in a BiblioMixed entry in a bibliography, on the other hand, is intended to be presented to the reader in the form and sequence in which it is provided. It even includes punctuation between the fields of data:


<bibliomixed>
  <bibliomset relation="article">
    <surname>Walsh</surname>, <firstname>Norman</firstname>.
    <title role="article">Introduction to Cascading Style Sheets</title>.
  </bibliomset>
  <bibliomset relation="journal">
    <title>The World Wide Web Journal</title>
    <volumenum>2</volumenum><issuenum>1</issuenum>.
    <publishername>O'Reilly &amp; Associates, Inc.</publishername> and
    <corpname>The World Wide Web Consortium</corpname>.
    <pubdate>Winter, 1996</pubdate></bibliomset>.
</bibliomixed>

Clearly, these two ways of marking up bibliographical entries are suited to different circumstances. You should use one or the other for your bibliography, not both. Strictly speaking, mingling the raw and the cooked may be “kosher” as far as the DTD is concerned, but it will almost certainly cause problems for most processing applications.
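
Whichever form you choose, the entries sit inside a Bibliography element. A skeleton of an all-raw bibliography might look like this; the title is an assumption, and the contents of each entry are elided:

<bibliography><title>References</title>
<biblioentry xreflabel="Kites75">
  ⋮
</biblioentry>
<biblioentry>
  ⋮
</biblioentry>
</bibliography>

An all-cooked bibliography would have the same shape, with BiblioMixed in place of each BiblioEntry.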

Notes

[1]

Some sophisticated formatters might even be able to establish the link simply by examining the content of the terms and the glossary. In that case, the author is not required to make explicit links.