debiandoc-sgml to docbook-xml conversion

==========================================================================
        (original concept) Philippe Batailler <pbatailler@teaser.fr>
        (original concept) Adam DiCarlo       <aph@debian.org>
        (ghost writer) Osamu Aoki             <osamu@debian.org>
                                                  Sat Dec 14 00:54:21 2002
==========================================================================

Why convert?

Because it is cool to be XML :)


How to read this?

Use a table-capable web browser if you are reading the HTML version.


Step by step guide.

This is a rehash of a tutorial that Philippe Batailler gave to Osamu Aoki through private e-mails in 2002.

To convert debiandoc-sgml into docbook-xml, take the following steps:

  1. Install debiandoc2dbxml
  2. Manually make the source file compatible with the script

    The conversion script has some limitations. If you have problems converting a file, apply the source touch-up rules presented below. The script may already fix some of these issues itself, but applying the rules by hand does no harm.

  3. Normalize the SGML into an XML-compatible format (debiandoc-tidy)

    $ debiandoc-tidy foo.sgml
    $ debiandoc-tidy -e bar.ent
    

  4. Convert SGML tags into XML tags (debiandoc2dbxml)

    If foo.sgml is a smaller article in a single file, without a DTD subset:

    $ debiandoc2dbxml -a path/to/foo.sgml
    

    If foo.sgml is a larger book in a single file, without a DTD subset:

    $ debiandoc2dbxml -b path/to/foo.sgml
    

    If foo.sgml is a larger book with many included files (DTD subset):

    $ debiandoc2dbxml -s -b path/to/foo.sgml
    

    Now we have a single large foo.xml.

    If foo.sgml is a larger book with many included files (DTD subset) and you want split-file output in the directory path/to/locale, add the -S option and the -l option (locale = en, fr, es...):

    $ debiandoc2dbxml -S -s -b -l fr path/to/foo.sgml
    

    Now we have foo.xml in path/to/, with many chunk files under path/to/fr/.

    For debugging, use "-k" to keep intermediate files and use "-t" to trace shell activities.

  5. Test it with emacs and psgml, or nsgmls:

    $ nsgmls -s /usr/share/sgml/declaration/xml.decl foo.xml
    
  6. Format source for readability

    To make the source more readable, some reformatting may be a good idea. For example, to add a newline after each </listitem>:

    $ perl -i -p -e's,</listitem>,</listitem>\n,g' foo.xml
    

  7. Building output

    There are a few strategies for building output:

    Stylesheet   Back end
    ----------   ----------------
    DSSSL        jade and jadetex
    CSS          mozilla?
    XSL          passivetex?

    More documentation is needed on creating output files (plain text, multi-file, HTML, PS, PDF).
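
Steps 3 to 6 above can be chained for a single-file book. The following is only a sketch that replays the commands already shown in those steps. The function names run and convert_book and the DRY_RUN variable are hypothetical, and the sketch assumes that foo.xml is written next to foo.sgml in the current directory, which the steps above do not state.

```shell
# Print a command, then run it unless DRY_RUN=1 is set.
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Replay steps 3 to 6 for one single-file book (the -b case).
# ASSUMPTION: the converted foo.xml ends up beside foo.sgml.
convert_book() {
    base=${1%.sgml}
    run debiandoc-tidy "$base.sgml" &&
    run debiandoc2dbxml -b "$base.sgml" &&
    run nsgmls -s /usr/share/sgml/declaration/xml.decl "$base.xml" &&
    run perl -i -p -e 's,</listitem>,</listitem>\n,g' "$base.xml"
}
```

Calling "DRY_RUN=1 convert_book foo.sgml" only prints the four commands, which is a cheap way to check them before running anything for real. Because the commands are joined with &&, a failure in any step stops the chain.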


How are tags converted?

Here is the list of tag conversions from debiandoc-sgml to docbook-xml. The three columns are, in order:

  1. the original debiandoc-sgml tag
  2. the docbook-xml tag produced by the XSLT conversion
  3. an alternative docbook-xml tag

debiandoc-sgml      converted docbook-xml (XSLT)                        alternative docbook-xml
--------------      ----------------------------                        -----------------------
book                book (-b option)                                    article (-a option)
title               title
author              author
name                firstname + surname
email               affiliation + address + email (in author element)
                    email (other places)
version             releaseinfo
abstract            abstract + para
copyright           copyright
toc                 (presentation tool takes care)                      (stylesheet is needed?) (oa)
chapt               chapter (-b option)                                 section (-a option)
appendix            appendix
sect                section
sect1               section
sect2               section
sect3               section
sect4               section
p                   para
em                  emphasis
strong              emphasis role="strong" (aph)                        emphasis role="bold" (pb)
                                                                        emphasis role="important"
var                 replaceable
package             systemitem role="package"
prgn                command                                             ??? (what to use for well known file w/o path)
file                filename                                            filename class="directory"
                    filename class="directory" (if it ends with /)
tt                  literal                                             command (this should have been prgn but many documents do this),
                                                                        constant, computeroutput, envar, function, keycap, keycode,
                                                                        keycombo, keysym, markup, option, parameter, prompt, property,
                                                                        returnvalue, sgmltag, symbol, token, userinput, varname,
                                                                        wordasword
                                                                        (do we need all these? are all in docbook-simple?)
qref                link                                                citation ?
ref                 xref (empty element)
manref              citerefentry + refentrytitle + manvolnum
ftpsite (old)       (convert original tag to url in debiandoc source)
ftppath (old)       (convert original tag to url in debiandoc source)
httpsite (old)      (convert original tag to url in debiandoc source)
httppath (old)      (convert original tag to url in debiandoc source)
url                 ulink
footnote            footnote
list                itemizedlist
list compact        itemizedlist spacing="compact"
enumlist            orderedlist
enumlist compact    orderedlist spacing="compact"
taglist             variablelist
taglist compact     variablelist (there is no "spacing" attribute)      (possibly converting to table)
item                listitem + para
tag                 varlistentry + term
example             screen                                              literallayout class="monospaced"
heading             title
comment             remark                                              caution, tip, warning, note
comment/p           phrase
*HTML* (table)
*HTML* (tr)
*HTML* (th)
*HTML* (td)
*HTML* (img src)    ?

The *HTML* entries above are not real debiandoc-sgml tags. They stand for features missing from debiandoc-sgml that would be needed to produce the corresponding HTML tags.
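
To make a few of the one-to-one mappings listed above concrete, here is a small sed sketch. It is not how debiandoc2dbxml works (the real conversion is done with XSLT). It only renames five simple tags (p, em, var, prgn, file) and ignores attributes, nesting rules and every other row of the list. The function name rename_simple_tags is hypothetical.

```shell
# Rename a handful of one-to-one tags on standard input.
# ILLUSTRATION ONLY: the real conversion uses XSLT, not sed.
rename_simple_tags() {
    # Each pattern needs ">" right after the tag name, so <p> does not
    # match <package> or <prgn>, and <em> does not match <email>.
    sed -E \
        -e 's,<(/?)p>,<\1para>,g' \
        -e 's,<(/?)em>,<\1emphasis>,g' \
        -e 's,<(/?)var>,<\1replaceable>,g' \
        -e 's,<(/?)prgn>,<\1command>,g' \
        -e 's,<(/?)file>,<\1filename>,g'
}
```

For example, feeding it the line "<p>Run <prgn>ls</prgn> on <file>/etc</file>.</p>" yields "<para>Run <command>ls</command> on <filename>/etc</filename>.</para>".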

File splitting has a funny bug which creates titletoc.xml in the scripts directory. Also, multi-file XML requires entity declarations like:

<!ENTITY titletoc       SYSTEM "en/titletoc.sgml">

Currently, this is a manual process.
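
Until the script does this itself, the declarations could be generated with a few lines of shell. This is a sketch under assumptions that the text above does not confirm: the entity name is taken to be the chunk's file name without its extension, and the locale directory is taken to contain nothing but chunk files. The function name entity_decls is hypothetical.

```shell
# Print one <!ENTITY ...> declaration per file in a locale directory.
# ASSUMPTIONS: entity name = file name without extension, and the
# directory holds only chunk files.
entity_decls() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and empty globs
        name=$(basename "$f")
        name=${name%.*}              # strip the extension
        printf '<!ENTITY %-14s SYSTEM "%s">\n' "$name" "$f"
    done
}
```

Run it from the directory that holds the main file, for example "entity_decls en", and paste the output into the document type declaration after checking it by hand.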