This document describes dbtotexi, a simple utility for converting XML documents that conform to a subset of the DocBook DTD into GNU texinfo format. The dbtotexi program is implemented using the XSL Transformations (XSLT) language as described in the W3C working draft http://www.w3.org/TR/1999/WD-xslt-19990421. A Java-based XSL engine[1] carries out the actual transformation as determined by the style sheet dbtotexi.xsl. A small amount of additional Java code provides a few utility routines that the XSL implementation does not provide.
This software is subject to the terms of the GNU General Public License. Please see the file COPYING for details. The license terms that apply to the supplied third-party software contained in the files sax.jar, xp.jar and xt.jar are specified in the files sax-copying.txt, xp-copying.txt and xt-copying.txt respectively.
Once the tar archive has been unpacked[2], check the Makefile to see if the settings at the top are suitable for your site and then just type make and make install. By default, the dbtotexi bash shell script goes into /usr/local/bin and the support files into /usr/local/share/dbtotexi. A compiled version of the Java support code is supplied, so you do not need a Java compiler unless you change the Java code.
The installation defaults to using Sun's jre VM, but any JDK 1.1-compliant implementation (such as Kaffe[3]) should work. No GUI facilities or additional libraries are required. If you use a different VM, the shell script, dbtotexi.sh, may need editing.
A DocBook source file, foo.xml, is converted to texinfo format very simply:
dbtotexi foo.xml
This produces output in foo.texinfo. The name of the output file can be specified explicitly as a second argument; if it is given as -, the output is sent to stdout. A third argument specifies the name of the info file to produce; this defaults to the input file name modified to have a .info suffix. Any DocBook elements that are not recognised (either because of an error in the input document or because the translator does not yet support that element) are reported on stderr and shown in bold in the output.
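The argument defaults described above can be summarised as a small function. This is a hypothetical Python sketch for illustration only (dbtotexi itself is a bash script driving an XSL engine); the name resolve_outputs is invented, and the sketch assumes that "modified to have a .info suffix" means replacing the input file's extension, in the same way that foo.xml yields foo.texinfo.

```python
import os.path
from typing import Optional, Sequence, Tuple


def resolve_outputs(args: Sequence[str]) -> Tuple[str, Optional[str], str]:
    """Hypothetical sketch of how dbtotexi's command-line arguments default.

    Returns (input file, texinfo output file or None for stdout, info file name).
    """
    if not 1 <= len(args) <= 3:
        raise ValueError("expected 1 to 3 arguments")
    input_file = args[0]
    # Assumption: default names are formed by replacing the input's extension.
    stem = os.path.splitext(input_file)[0]
    output = args[1] if len(args) >= 2 else stem + ".texinfo"
    texinfo_output = None if output == "-" else output  # "-" means stdout
    info_name = args[2] if len(args) == 3 else stem + ".info"
    return input_file, texinfo_output, info_name
```

For example, resolve_outputs(["foo.xml", "-"]) gives ("foo.xml", None, "foo.info"): the texinfo goes to stdout, while the info file name still derives from the input file.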
A document that conforms to the SGML DocBook DTD must first be converted to XML before it can be processed by dbtotexi. This can be done using the sx program that is part of James Clark's SP SGML toolset. Typical usage would be:
sx -xlower foo.sgm > foo.xml
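The two-step conversion of an SGML document can be scripted. The following hypothetical Python sketch runs the sx command shown above and then dbtotexi on its output. It assumes both programs are on the PATH and that the intermediate XML file is named after the SGML file with its extension replaced by .xml; the function names are illustrative only.

```python
import os.path
import subprocess
from typing import List, Tuple


def conversion_commands(sgml_path: str) -> Tuple[List[str], str, List[str]]:
    """Return (sx argv, intermediate XML path, dbtotexi argv)."""
    # Assumption: foo.sgm becomes foo.xml, mirroring the example above.
    xml_path = os.path.splitext(sgml_path)[0] + ".xml"
    sx_cmd = ["sx", "-xlower", sgml_path]  # same options as the example above
    dbtotexi_cmd = ["dbtotexi", xml_path]
    return sx_cmd, xml_path, dbtotexi_cmd


def convert_sgml(sgml_path: str) -> None:
    """Run 'sx -xlower foo.sgm > foo.xml' and then 'dbtotexi foo.xml'."""
    sx_cmd, xml_path, dbtotexi_cmd = conversion_commands(sgml_path)
    with open(xml_path, "w") as out:
        # Send sx's standard output to the XML file, like '>' in the shell.
        subprocess.run(sx_cmd, stdout=out, check=True)
    subprocess.run(dbtotexi_cmd, check=True)
```

Keeping the command construction separate from execution makes it easy to check which commands would run without having sx or dbtotexi installed.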
Note
The XML version of the DocBook DTD is not actually required by the conversion process (but see the texinfo Processing Instruction section). In fact, if the document to be converted doesn't contain a DOCTYPE declaration, the conversion is somewhat quicker. Whether or not the document contains a DOCTYPE declaration, it should still be valid (i.e. it should conform to the DocBook XML DTD).
This section describes how the translation of some elements is influenced by the setting of the element's role attribute.
indexterm
The role attribute can be set to one of c, f, v, k, p and d to indicate which index the entry should be entered in. If the role attribute is not specified, the entry is entered into the concept index by default.
index
The role attribute can be set to one of c, f, v, k, p and d to indicate which index should be output. If the role attribute is not specified, the concept index is output by default.
variablelist
The role attribute can be set to either bold or fixed to indicate that the list's terms should be displayed in a bold or fixed-width font respectively. If the role attribute is not specified, the list's terms are displayed as is.
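The three role-attribute rules above amount to small lookup tables. The Python sketch below shows one plausible mapping onto texinfo commands; it is not taken from dbtotexi.xsl and the function names are hypothetical. The letters c, f, v, k and p line up with texinfo's predefined concept, function, variable, keystroke and program indices (@cindex through @pindex), but treating d as the data-type index (tp, @tindex) is an assumption, as is the choice of @b and @code as the @table formatting commands for bold and fixed.

```python
from typing import Optional

# Role letter -> texinfo predefined index name.
# Assumption: "d" selects the data-type index "tp".
INDEX_NAMES = {"c": "cp", "f": "fn", "v": "vr", "k": "ky", "p": "pg", "d": "tp"}

# variablelist role -> @table formatting command (assumed choices).
TABLE_FORMATS = {"bold": "@b", "fixed": "@code"}


def index_entry(term: str, role: Optional[str] = None) -> str:
    """Texinfo line for an <indexterm>; the concept index is the default."""
    index = INDEX_NAMES[role or "c"]
    # @cindex, @findex, @vindex, @kindex, @pindex and @tindex are named
    # after the first letter of the index name.
    return "@%sindex %s" % (index[0], term)


def print_index(role: Optional[str] = None) -> str:
    """Texinfo command for an <index>; the concept index is the default."""
    return "@printindex " + INDEX_NAMES[role or "c"]


def table_start(role: Optional[str] = None) -> str:
    """Opening @table line for a <variablelist>; terms shown as is by default."""
    return "@table " + TABLE_FORMATS.get(role, "@asis")
```

In this sketch an unrecognised index letter raises KeyError, whereas an unrecognised variablelist role falls back to @asis; the text above does not say how dbtotexi treats either case.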
texinfo Processing Instruction
The texinfo processing instruction can be used within a document to insert arbitrary markup into the output. The characters @, { and } in its content are not escaped. This facility can be used to define entities that contain texinfo markup. For example, given the following general entity declaration in the DTD subset:
<!ENTITY hellip "<?texinfo @dots{}?>">
One can then write &hellip; in the document and expect to get texinfo's @dots{} in the output.
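The difference between ordinary text and the content of a texinfo processing instruction can be sketched with Python's xml.dom.minidom. Ordinary character data has @, { and } quoted as @@, @{ and @} (standard texinfo quoting), while the data of a texinfo processing instruction is copied through verbatim. This is an illustration only, not the XSL code dbtotexi uses; the function names are hypothetical, and child elements are simply descended into rather than translated.

```python
from xml.dom import Node, minidom


def escape_texinfo(text: str) -> str:
    """Quote the characters that are special in texinfo source."""
    # Replace '@' first so the '@' added by the later replacements survives.
    return text.replace("@", "@@").replace("{", "@{").replace("}", "@}")


def to_texinfo(node) -> str:
    """Concatenate a node's content, escaping text but not texinfo PIs."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            parts.append(escape_texinfo(child.data))
        elif (child.nodeType == Node.PROCESSING_INSTRUCTION_NODE
              and child.target == "texinfo"):
            parts.append(child.data)  # inserted as-is: no escaping
        elif child.nodeType == Node.ELEMENT_NODE:
            parts.append(to_texinfo(child))  # recurse; tags are not translated
    return "".join(parts)


doc = minidom.parseString("<para>Wait<?texinfo @dots{}?> {sic}</para>")
print(to_texinfo(doc.documentElement))  # prints: Wait@dots{} @{sic@}
```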
dircategory & direntry Processing Instructions
The dircategory and direntry processing instructions may be used to set the resulting info file's directory category and menu entry. These processing instructions are best positioned after the document type declaration but before the first element (<book> or <article>). Here's what this document uses:
<?dircategory Texinfo documentation system?>
<?direntry * Dbtotexi: (dbtotexi). DocBook to Texinfo convertor.?>
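These two processing instructions correspond to texinfo's @dircategory command and @direntry block. A minimal Python sketch, assuming each instruction's data is copied into the texinfo header unchanged (the exact output dbtotexi produces is not shown here), could collect them from outside the root element like this; dir_header is a hypothetical name.

```python
from xml.dom import Node, minidom


def dir_header(xml_text: str) -> str:
    """Return @dircategory/@direntry lines for top-level PIs (sketch)."""
    doc = minidom.parseString(xml_text)
    lines = []
    # Only direct children of the document node are examined, i.e. PIs
    # outside the root element, matching the recommended placement.
    for node in doc.childNodes:
        if node.nodeType != Node.PROCESSING_INSTRUCTION_NODE:
            continue
        if node.target == "dircategory":
            lines.append("@dircategory " + node.data)
        elif node.target == "direntry":
            # Assumption: the PI data is one complete menu entry line.
            lines.extend(["@direntry", node.data, "@end direntry"])
    return "\n".join(lines)
```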
A number of Unicode characters are recognised in element content and converted into the equivalent texinfo commands. Unrecognised Unicode characters are passed through unchanged. Norman Walsh's DocBook XML DTD defines the ISO entity sets in terms of Unicode characters. The Unicode character table in the appendix lists the characters that are currently recognised.
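The character conversion can be sketched as a per-character lookup. This hypothetical Python version handles a few of the tabled symbols with an explicit dictionary and, for accented letters, decomposes the character with unicodedata and maps the combining accent to a texinfo accent command (@" @' @, @= @^ @` @~). Using Unicode decomposition is a substitution made for this sketch, not necessarily how dbtotexi.xsl works, and it also converts some accented letters that are not in the table; anything it does not recognise is passed through unchanged, as described above.

```python
import unicodedata

# A few of the tabled characters mapped to texinfo commands (subset only).
SPECIAL = {
    "\u00a1": "@exclamdown{}",
    "\u00a3": "@pounds{}",
    "\u00a9": "@copyright{}",
    "\u00bf": "@questiondown{}",
    "\u00c6": "@AE{}",
    "\u00df": "@ss{}",
    "\u00e6": "@ae{}",
    "\u0131": "@dotless{i}",
    "\u2022": "@bullet{}",
    "\u2026": "@dots{}",
}

# Combining accent -> texinfo accent command character.
ACCENTS = {
    "\u0300": "`",  # grave
    "\u0301": "'",  # acute
    "\u0302": "^",  # circumflex
    "\u0303": "~",  # tilde
    "\u0304": "=",  # macron
    "\u0308": '"',  # diaeresis (umlaut)
    "\u0327": ",",  # cedilla
}


def char_to_texinfo(ch: str) -> str:
    if ch in SPECIAL:
        return SPECIAL[ch]
    # Substitute technique: NFD splits e.g. "\u00e4" into "a" + U+0308.
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) == 2 and decomposed[1] in ACCENTS:
        return "@%s{%s}" % (ACCENTS[decomposed[1]], decomposed[0])
    return ch  # unrecognised characters pass through unchanged


def unicode_to_texinfo(text: str) -> str:
    return "".join(char_to_texinfo(ch) for ch in text)
```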
A couple of points should be borne in mind:
More information can be found at these links:
http://www.w3.org/TR/WD-xslt
http://www.jclark.com/ (xt and the SP toolset)
http://nwalsh.com/
http://www.kaffe.org/
The following table lists the set of Unicode characters that are currently recognised. The name of the XML entity that yields each character is also listed.
Unicode Character | Rendered As | Entity Name
00a0 | (non-breaking space) | nbsp
00a1 | ¡ | iexcl
00a3 | £ | pound
00a9 | © | copy
00bf | ¿ | iquest
00c6 | Æ | AElig
00df | ß | szlig
00e6 | æ | aelig
2022 | (bullet) | bull
2026 | ... | hellip
0131 | i | inodot
00a8 | ¨ | uml
00e4 | ä | auml
00c4 | Ä | Auml
00eb | ë | euml
00cb | Ë | Euml
00ef | ¨i | iuml
00cf | Ï | Iuml
00f6 | ö | ouml
00d6 | Ö | Ouml
00fc | ü | uuml
00dc | Ü | Uuml
00ff | ÿ | yuml
0178 | ¨Y | Yuml
00b4 | ´ | acute
00e1 | á | aacute
00c1 | Á | Aacute
00e9 | é | eacute
00c9 | É | Eacute
00ed | ´i | iacute
00cd | Í | Iacute
00f3 | ó | oacute
00d3 | Ó | Oacute
00fa | ú | uacute
00da | Ú | Uacute
00fd | ý | yacute
00dd | Ý | Yacute
0107 | ´c | cacute
0106 | ´C | Cacute
01f5 | ´g | gacute
013a | ´l | lacute
0139 | ´L | Lacute
0144 | ´n | nacute
0143 | ´N | Nacute
0155 | ´r | racute
0154 | ´R | Racute
015b | ´s | sacute
015a | ´S | Sacute
017a | ´z | zacute
0179 | ´Z | Zacute
00b8 | ¸ | cedil
00e7 | ç | ccedil
00c7 | Ç | Ccedil
0122 | ¸G | Gcedil
0137 | ¸k | kcedil
0136 | ¸K | Kcedil
013c | ¸l | lcedil
013b | ¸L | Lcedil
0146 | ¸n | ncedil
0145 | ¸N | Ncedil
0157 | ¸r | rcedil
0156 | ¸R | Rcedil
015f | ¸s | scedil
015e | ¸S | Scedil
0163 | ¸t | tcedil
0162 | ¸T | Tcedil
00af | ¯ | macr
0101 | a¯ | amacr
0100 | A¯ | Amacr
0113 | e¯ | emacr
0112 | E¯ | Emacr
012a | I¯ | Imacr
012b | i¯ | imacr
014c | O¯ | Omacr
014d | o¯ | omacr
016b | u¯ | umacr
016a | U¯ | Umacr
00e2 | â | acirc
00c2 | Â | Acirc
00ea | ê | ecirc
00ca | Ê | Ecirc
00ee | ^i | icirc
00ce | Î | Icirc
00f4 | ô | ocirc
00d4 | Ô | Ocirc
00fb | û | ucirc
00db | Û | Ucirc
0109 | ^c | ccirc
0108 | ^C | Ccirc
011d | ^g | gcirc
011c | ^G | Gcirc
0125 | ^h | hcirc
0124 | ^H | Hcirc
0135 | ^j | jcirc
0134 | ^J | Jcirc
015d | ^s | scirc
015c | ^S | Scirc
0175 | ^w | wcirc
0174 | ^W | Wcirc
0177 | ^y | ycirc
0176 | ^Y | Ycirc
00e0 | à | agrave
00c0 | À | Agrave
00e8 | è | egrave
00c8 | È | Egrave
00ec | `i | igrave
00cc | Ì | Igrave
00f2 | ò | ograve
00d2 | Ò | Ograve
00f9 | ù | ugrave
00d9 | Ù | Ugrave
00e3 | ã | atilde
00c3 | Ã | Atilde
00f1 | ñ | ntilde
00d1 | ~N | Ntilde
00f5 | õ | otilde
00d5 | Õ | Otilde
0129 | ~i | itilde
0128 | ~I | Itilde
0169 | ~u | utilde
0168 | ~U | Utilde
[1] Currently, I am using James Clark's xt.
[2] You must have done that already to be reading this!
[3] http://www.kaffe.org/