Txt2tags User Guide

Aurelio, Sun Nov 30 18:01:09 2003



About this document

"Hi! I'm the txt2tags manual document.

Here you'll find all available information about the txt2tags text conversion tool.

You can find my latest version at http://txt2tags.sf.net/userguide/

For more information and recent releases, please visit the txt2tags website.

Enjoy!"


Part I - Introducing Txt2tags

The First Questions You May Have

This chapter is an overview of txt2tags, introducing the program's purpose and features.


What is it?

Txt2tags is a text formatting and conversion tool.

Txt2tags converts a plain text file with a few simple marks to any of the supported targets: txt, sgml, html, pm6, mgp, moin, man and tex.


Why should I use it?

You'll find txt2tags really useful if you:

And the main motivation is:

Why is it a good choice among other tools?

Txt2tags grows in a very straightforward way, following a few basic concepts. These are the highlights:

Source file readable
Txt2tags marks are very simple, almost natural.
Target document readable
Like the source file, the target document is also readable, with indentation and short lines.
Consistent marks
Txt2tags marks are unique enough to fit all kinds of documents without being confused with the document contents.
Consistent Rules
Like the marks, the rules that apply to them are tied to each other; there are no "exceptions" or "special cases".
Simple structures
All supported formatting is simple, with no extra options or complicated behaviour modifiers. A mark is just a mark, with no options at all.
Easy to learn
With simple marks and readable source, the txt2tags learning curve is user friendly.
Nice examples
The sample files included in the package give real-life examples of simple and over-complicated documents written for txt2tags.
Valuable Tools
The syntax files included in the package (for the vim and emacs editors) help you write documents with no syntax errors.
Three user interfaces
There is a Graphical Tk interface that is very user friendly, a Web interface to use it remotely or on the intranet, and a Command Line interface for power users and scripting.
Scripting
With the full-featured command line mode, an experienced user can automate tasks and do post-editing on the converted files.
Download and run / Multi-platform
Txt2tags is a single Python script. There is no need to compile it or download extra modules, so it runs nicely on *NIX, Linux, Windows and Macintosh machines.
Frequent Updates
The program has an active mailing list with users who suggest corrections and improvements. The author himself is an extensive user at home and at work, so development won't stop any time soon.


Do I have to pay for it?

Absolutely NO!

It's free, GPL, open source, public domain, <put-your-favorite-buzzword-here>.

You can copy, use, modify, sell, release as yours. Software politics/copyright is not one of the author's major concerns.


Supported Formatting Structures

The following is a list of all the structures supported by txt2tags.


Supported Targets

SGML
It is a common document format which has powerful sgmltools conversion applications. From a single sgml file you can generate html, pdf, ps, info, latex, lyx, rtf and xml documents. The sgml2* tools also generate an automatic TOC and break sections into subpages (sgml2html).

Txt2tags generates SGML files in the linuxdoc system type, ready to be converted with sgml2* tools without any extra catalog files or any annoying SGML requirements.

HTML
Everybody knows what HTML is. (hint: internet)

Txt2tags generates clean HTML documents that look pretty and have readable source. It DOES NOT use javascript, frames or other futile formatting techniques that aren't required for simple, techie documents. But a separate CSS file can be used if wanted.

LATEX
TODO.

PM6
I guess you didn't know, but Adobe PageMaker 6.0 has its own tagged language! Styles, colortable, beautifiers, and most of the PageMaker mouse-clicking features are also available in its tagged language. You just need to use the "Import tagged text" menu item. Just for the record, it's an <HTML "like"> tag format.

Txt2tags generates all the tags and already defines an extensive and working header, setting paragraph styles and formatting. This is the hard part. GOTCHA: No line breaks! A paragraph must be one single line.

Author's note: My entire Portuguese regular expressions book was written in vi, converted to PageMaker with txt2tags and went to press.

MGP
Magic Point is a very handy presentation tool (hint: Microsoft PowerPoint), that uses a tagged language to define all the screens. So you can do complex presentations in vi/emacs/notepad.

Txt2tags generates a ready-to-use .mgp file, defining all the necessary headers for fonts and appearance definitions, as well as ISO-8859 accents support.

HOTSPOT 1: the .mgp file created by txt2tags uses the XFree86 Type1 fonts! So you do not need to carry TrueType font files with your presentation.

HOTSPOT 2: the color definitions for fonts are clean, so even on a poor color palette system (as startx -- -bpp 8), the presentation will look pretty!

The key is: convert and use. No quickfixes or requirements needed.

MAN
UNIX man pages have endured over the years. Document formats come and go, and there they are, unbeatable.

There are other tools to generate man documents, but txt2tags has one advantage: one source, multi targets. So the same man page contents can be converted to an HTML page, a Magic Point presentation, etc.

MOIN
You don't know what MoinMoin is? It is a WikiWiki!

Moin syntax is kinda boring when you need to keep {{{'''''adding braces and quotes'''''}}}, so txt2tags comes with the simplified marks and unified solution: one source, multi targets.

TXT
TXT is text. The only true formatting type.

Although txt2tags marks are very intuitive and discreet, you can remove them by converting the file to pure TXT.

The titles are underlined, and the text is basically left as it is in the source.


Status of Supported Structures by Target

Structure          txt  html  sgml  tex  mgp  pm6  moin  man
headers            Y    Y     Y     Y    Y    N    N     Y
section title      Y    Y     Y     Y    Y    Y    Y     Y
paragraphs         Y    Y     Y     Y    Y    Y    Y     Y
bold               -    Y     Y     Y    Y    Y    Y     Y
italic             -    Y     Y     Y    Y    Y    Y     Y
bold-italic        -    Y     Y     Y    Y    Y    Y     Y
underline          -    Y     -     Y    Y    Y    ?     -
preformatted       -    Y     Y     Y    Y    Y    Y     -
preformatted line  -    Y     Y     Y    Y    Y    Y     Y
preformatted area  -    Y     Y     Y    Y    Y    Y     Y
quoted area        Y    Y     Y     Y    Y    Y    ?     N
internet links     -    Y     Y     -    -    -    Y     -
e-mail links       -    Y     Y     -    -    -    Y     -
local links        -    Y     Y     N    -    -    Y     -
named links        -    Y     Y     -    -    -    Y     -
bulleted list      Y    Y     Y     Y    Y    Y    Y     Y
numbered list      Y    Y     Y     Y    Y    Y    Y     N
definition list    Y    Y     Y     Y    N    N    N     Y
horizontal line    Y    Y     -     Y    Y    N    Y     -
image              -    Y     Y     Y    Y    N    Y     -
table              N    Y     Y     Y    N    N    Y     N

Legend
Y supported
N not supported (may be in future releases)
- not supported (can't be done on this target)
? not supported (not sure if it can be done or not)


The Three User Interfaces: Gui, Web and Command Line

As different users have different needs and environments, txt2tags is very flexible in how it runs.

There are three User Interfaces for the program, each one with its own purpose and features.


Graphical Tk Interface

Since version 1.0, there is a nice Graphical Interface that works on Linux, Windows, Mac and others.

It's pretty simple and easy to use:

And it also has the ability to dump the result file to a window instead of writing to the disc, so you can do quick tests before saving the target file:


Web Interface

The Web Interface is up and running on the internet at http://txt2tags.sf.net/online.php, so you can use and test the program instantly, before downloading it.

One can also put this interface on the local intranet, which avoids installing txt2tags on every machine.


Command Line Interface

For command line power users, the --help should be enough:

  Usage: txt2tags -t <type> [OPTIONS] file.t2t
  
    -t, --type         set target document type. currently supported:
                       txt, sgml, html, pm6, mgp, moin, man, tex
  
    -o, --outfile=FILE set FILE as the output filename ('-' for STDOUT)   	  
        --stdout       same as '-o -' or '--outfile -' (deprecated option)
    -H, --noheaders    suppress header, title and footer information
    -n, --enumtitle    enumerate all title lines as 1, 1.1, 1.1.1, etc
        --maskemail    hide email from spam robots. x@y.z turns <x (a) y z>
  
        --toc          add TOC (Table of Contents) to target document
        --toconly      print document TOC and exit
        --toclevel=N   set maximum TOC level (depth) to N
  
        --gui          invoke Graphical Tk Interface
        --style=FILE   use FILE as the document style (like Html CSS)
  
    -h, --help         print this help information and exit
    -V, --version      print program version and exit
  
  Extra options for HTML target (needs sgml-tools):
        --split        split documents. values: 0, 1, 2 (default 0)
        --lang         document language (default english)
  
  By default, converted output is saved to 'file.<type>'.
  Use --outfile to force an output filename.
  If input file is '-', reads from STDIN.
  If output file is '-', dumps output to STDOUT.

Examples

Assuming you have written a file.t2t marked file, let's have some converting fun.

Convert to HTML
  $ txt2tags -t html file.t2t
The same, using redirection
  $ txt2tags -t html -o - file.t2t > file.html

Including Table Of Contents
  $ txt2tags -t html --toc file.t2t
And also, numbering titles
  $ txt2tags -t html --toc --enumtitle file.t2t

Contents quick view
  $ txt2tags --toconly file.t2t
Maybe enumerate them?
  $ txt2tags --toconly --enumtitle file.t2t

One liners from STDIN
  $ echo -e "\n**bold**" | txt2tags -t html --noheaders -
Testing Mask Email feature
  $ echo -e "\njohn.wayne@farwest.com" | txt2tags -t txt --maskemail --noheaders -
Post-convert editing
  $ txt2tags -t html -o- file.t2t | sed "s/<BODY .*/<BODY BGCOLOR=green>/" > file.html
Note
Since version 1.6, you can do pre- and post-processing with the %!preproc and %!postproc configuration macros.

Part II - OK, I want it. Now what?

Just download the program and run it on your machine.

Download & Install Python

First of all, you must download and install the Python interpreter on your system. If you already have it, just skip this step.

Python is one of the nicest programming languages out there. It works on Windows, Linux, UNIX, Macintosh and others, and it can be downloaded from the Python web site. Installation hints can be found on the same site. Txt2tags works with Python version 1.5 or newer.

If you are not sure if you have Python or not, open a console (tty, xterm, MSDOS) and type python. If it is not installed, the system will tell you.

Download txt2tags

The official location for txt2tags distribution is on the program homepage, at http://txt2tags.sf.net/src.

All the program's files are in the tarball (.tgz file), which can be expanded by most compression utilities (including Winzip).

Just get the latest one (more recent date, higher version number). The previous versions remain for historical purposes only.

Install txt2tags

As a single Python script, txt2tags needs no installation at all.

The only file needed to use the program is the txt2tags script. The other files of the tarball are documentation, tools and sample files.

The fail-proof way to run txt2tags is to call Python with it:

  prompt$ python txt2tags

If you want to "install" txt2tags on the system as a stand alone program, just copy (or link) the txt2tags script to a System PATH directory and make sure the system knows how to run it.

UNIX/Linux
Make the script executable (chmod +x txt2tags) and copy it to a $PATH directory (cp txt2tags /usr/local/bin)

Windows
Rename the script adding the .py extension (ren txt2tags txt2tags.py) and copy it to a system PATH directory (copy txt2tags.py C:\WINNT)
After that, you can create an icon on your desktop for it, if you want to use the program's Graphical Interface.

Special Packages for Windows Users

There are also two .EXE distribution files for txt2tags, which install the program on Windows machines with just a few clicks:

Please visit the Txt2tags-Win site to download these packages: http://txt2tags-win.sf.net


Part III - Writing and Converting Your First Document

Sorry, this chapter is still a draft.

The .t2t document Areas

Txt2tags marked files are divided into 3 areas. Each area has its own rules and purpose. They are:

Headers Area
Place for Document Title, Author, Version and Date information. (optional)
Config Area
Place for general Document Settings and Parser behaviour modifiers. (optional)
Body Area
Place for the Document Content. (required)

As seen above, the first two areas are optional; the Body Area is the only required one. (Note: The Config Area was introduced in txt2tags version 1.3)

The areas are delimited by special rules, which will be seen in detail on the next chapter. For now, this is a graphical representation of the areas on a document:

               ____________
              |            |
              |   HEADERS  |       1. First, the Headers
              |            |
              |   CONFIG   |       2. Then the Settings
              |            |
              |    BODY    |       3. And finally the Document Body,
              |            |
              |    ...     |          which goes until the end
              |    ...     |
              |____________|
  

In short, this is how the areas are defined:

Headers
First 3 lines of the file, or the first line blank for No Headers.
Config
Begins right after the Header (4th or 2nd line) and ends when the Body Area starts.
Body
The first valid text line (not comment or setting) after the Headers Area.

Full Example

  My nice doc Title
  Mr. John Doe
  Last Updated: %%date(%c)
  
  %! Style   : fancy.css
  %! Encoding: iso-8859-1
  %! Cmdline : -t html --toc --enumtitle
  
  Hi! This is my test document.
  Its content will end here.
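
To make the area rules concrete, here is a minimal Python sketch that splits a source text into the three areas. It is only an illustration of the rules stated above, not the txt2tags parser itself: the function name split_areas is made up, and it assumes that blank lines and plain comments between the Headers and the Body are simply skipped.

```python
def split_areas(source):
    """Split .t2t source text into (headers, config, body) line lists."""
    lines = source.split("\n")

    # A blank first line means "no headers"; otherwise the first
    # three lines are the headers, whatever they contain.
    if lines[0].strip() == "":
        headers, rest = [], lines[1:]
    else:
        headers, rest = lines[:3], lines[3:]

    config = []
    body_start = len(rest)          # stays here if no body line is found
    for index, line in enumerate(rest):
        if line.startswith("%!"):
            config.append(line)     # a setting line
        elif line.strip() == "" or line.startswith("%"):
            continue                # blank or plain comment (assumed skippable)
        else:
            body_start = index      # first valid text line: the body begins
            break

    return headers, config, rest[body_start:]


SAMPLE = """My nice doc Title
Mr. John Doe
Last Updated: %%date(%c)

%! Style   : fancy.css
%! Encoding: iso-8859-1
%! Cmdline : -t html --toc --enumtitle

Hi! This is my test document.
Its content will end here."""

headers, config, body = split_areas(SAMPLE)
print(headers)
print(config)
print(body)
```

Run on the Full Example, this should give three header lines, three setting lines and the two body lines.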


Part IV - Mastering Txt2tags

The Headers Area

Location:

The Headers Area is the only one with a fixed, line-oriented position: it occupies the first three lines of the source file.

These lines are content-free, with no fixed type of information required. But the following is recommended for most documents:

Keep in mind that the first 3 lines of the source document will be the first 3 lines of the target document, separated from the text body and in high contrast to it (i.e. big letters, bold). If paging is allowed, the headers will be alone and centered on the first page.

Fewer (or No) Header Lines

Sometimes the user wants to specify fewer than three header lines, giving just the document title and/or date information.

Just leave the 2nd and/or the 3rd line empty (blank) and that position will be omitted from the target document. But keep in mind that even when blank, these lines are still part of the headers, so the document body must start after the 3rd line anyway.

The title (the first line) is the only required header, but if you leave it blank, you are saying that your document has no headers. The Body Area will then begin right after it, on the 2nd line.

Having no headers in the document is often useful if you want to specify your own customized headers after converting. The command line option --noheaders is usually required for this kind of operation.

Straight to the point

In short: "Headers are just positions, not contents"

Place some text on the first line, and it will appear on the target's first line. The same goes for the 2nd and 3rd header lines.


The Config Area

Location:

The Config Area is optional. An average user can write lots of txt2tags files without even knowing it exists, but experienced users will enjoy the power and control it provides.

The primary use of this area is to define settings that affect the program's behaviour.

So, how to set something? What's the syntax?

Setting lines are special comment lines, marked by a leading identifier ("!") that makes them different from plain comments. The syntax is as simple as a variable assignment: a keyword and a value, separated from each other by a colon (":").

  %! keyword : value

Syntax Details: The exclamation mark must be placed right after the comment char ("%!"), with no spaces between them. The spaces around the keyword and the separator are optional, and both keyword and value are case insensitive (case doesn't matter).

What can I set? Which are the valid keywords?

The available settings are Style, Encoding, Cmdline, PreProc and PostProc.

The Style setting is only supported by the HTML target, to define a Cascading Style Sheet (CSS) file.

The Encoding setting is needed by non-English writers who use accented letters and other locale-specific details, so the target document Character Set must be customized (if allowed).

The Cmdline setting is useful to specify default command line options for the source file. These options can be overridden by the real command line. Using this setting, one can convert the document with the following simple command: txt2tags file.t2t

The PreProc setting is a filter. It defines "find and replace" rules which will be applied to the document source before any parsing by txt2tags.

The PostProc setting is a filter. It defines "find and replace" rules which will be applied to the target document after all parsing by txt2tags.

Example:

  %! Style   : fancy.css
  %! Encoding: iso-8859-1
  %! Cmdline : -t html --toc --toclevel 3
  %! PreProc : "amj"       "Aurelio Marinho Jargas"
  %! PostProc: '<BODY.*?>' '<BODY bgcolor="yellow">'
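
The setting syntax is regular enough to be matched with a single pattern. The following Python sketch shows one possible way to do it. The pattern and the parse_setting name are assumptions made for illustration, not the ones used inside txt2tags. It also accepts the optional "(target)" part described later in "Defining a Config for a Specific Target".

```python
import re

# Hypothetical pattern for "%! keyword : value" and "%!keyword(target): value".
# No space is allowed between "%" and "!", spaces elsewhere are optional.
SETTING = re.compile(r"^%!\s*(\w+)\s*(?:\((\w+)\))?\s*:\s*(.*?)\s*$")

def parse_setting(line):
    """Return (keyword, target, value), or None if line is not a setting."""
    match = SETTING.match(line)
    if match is None:
        return None
    keyword, target, value = match.groups()
    # Keywords are case insensitive, so normalize them to lowercase.
    return keyword.lower(), target, value

print(parse_setting("%! Style   : fancy.css"))
print(parse_setting("%! Cmdline : -t html --toc --toclevel 3"))
print(parse_setting("% just a plain comment"))
```

The value is kept untouched here (only surrounding spaces are trimmed), because filenames and filter patterns should not be changed by the parser.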

Some rules about Settings


The Body Area

Location:

Well, the body is anything outside Headers and Config Areas.

The body holds the document contents and all formatting and structures txt2tags can recognize. Inside the body you can also put comments for TODOs and self notes.

You can use the --noheaders command line option to convert only the document body, suppressing the headers. This is useful to set your own headers in a separate file, then join them with the converted body.


Marks (RULES)

All marks and syntax used by txt2tags are detailed on a separate RULES file.


The %%date macro

The %%date macro, called alone, returns the current date in the ISO yyyymmdd format. Optional formatting can be specified using the %%date(format-string) syntax.

This format-string is made of plain text plus the formatting directives, which are composed of a percent sign % followed by an identification character.

The following is a list of some commonly used directives. The full list can be found at http://www.python.org/doc/current/lib/module-time.html.

Directive Description
%a Locale's abbreviated weekday name.
%A Locale's full weekday name.
%b Locale's abbreviated month name.
%B Locale's full month name.
%c Locale's appropriate date and time representation.
%d Day of the month as a decimal number [01,31].
%H Hour (24-hour clock) as a decimal number [00,23].
%I Hour (12-hour clock) as a decimal number [01,12].
%m Month as a decimal number [01,12].
%M Minute as a decimal number [00,59].
%p Locale's equivalent of either AM or PM.
%S Second as a decimal number [00,61]. (1)
%x Locale's appropriate date representation.
%X Locale's appropriate time representation.
%y Year without century as a decimal number [00,99].
%Y Year with century as a decimal number.
%% A literal "%" character.

Examples

%%date(format)          Results for: 2002, Jan 31, 15:00
Last Update: %c         Last Update: Thu Jan 31 15:00:00 2002
%Y-%m-%d                2002-01-31
%I:%M %p                03:00 PM
Today is %A, on %B.     Today is Thursday, on January.
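
Since the directives are the ones from Python's time module, you can try a format-string directly in Python before putting it in a document. This is just a quick testing sketch: the fixed moment below is the one used in the examples above, and the locale-dependent directives (%c, %A, %B, %p) will follow your system's locale.

```python
import time

# The moment used in the examples: Thursday, 2002-01-31 15:00:00.
# Fields: year, month, day, hour, minute, second,
#         weekday (Monday is 0), day of the year, DST flag.
moment = time.struct_time((2002, 1, 31, 15, 0, 0, 3, 31, 0))

formats = ("%Y%m%d", "Last Update: %c", "%Y-%m-%d",
           "%I:%M %p", "Today is %A, on %B.")

for fmt in formats:
    print(fmt.ljust(22), "->", time.strftime(fmt, moment))
```

The first format, %Y%m%d, corresponds to what %%date gives when called alone.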


The %!include command

Since txt2tags version 1.7 there is an include command, which pastes the contents of an external file into the source document body.

So %!include is not a config, but a command, and it is valid on the document BODY Area.

The include command is useful to split a large document into smaller pieces (like chapters in a book) or to include the full contents of an external file into the document source. Sample:

  My first book
  Dr. John Doe
  1st Edition
  
  %!include: intro.t2t
  %!include: chapter1.t2t
  %!include: chapter2.t2t
  ...
  %!include: chapter9.t2t
  %!include: ending.t2t

So you just give the filename after the %!include string. The optional target specification is also supported, so this is valid too:

  %!include(html): file.t2t

Note that include will insert the file BODY Area into the source document. The included file HEAD and CONF Areas are ignored. This way you can convert the included file alone or inside the main document.

But there are two other types of include:

The Verbatim type includes a text file preserving its original spaces and formatting, just as if the text were inside the txt2tags VERB area (---). Just enclose the filename in backquotes:

  %!include: `/etc/fstab`

And the Parsed type is passed directly to the resulting document, with NO parsing or escaping performed by txt2tags. This way you can include additional tagged parts in your document. It is useful for default header or footer information, or for more complicated tagged code that txt2tags does not support:

  %!include(html): 'footer.html'

Note that the filename is enclosed with single quotes, and as the text inserted is already parsed, you must specify the target to avoid mistakes.
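
The three include types differ only in how the filename is quoted, so they can be told apart with a small classifier. The Python sketch below illustrates the idea. The classify_include function and the kind names "t2t", "verbatim" and "tagged" are invented for this example and are not txt2tags internals.

```python
import re

# Hypothetical pattern for "%!include: file" and "%!include(target): file".
INCLUDE = re.compile(r"^%!include\s*(?:\((\w+)\))?\s*:\s*(.+?)\s*$",
                     re.IGNORECASE)

def classify_include(line):
    """Return (target, kind, filename), or None if line is not an include."""
    match = INCLUDE.match(line)
    if match is None:
        return None
    target, name = match.groups()
    if len(name) > 1 and name[0] == name[-1] == "`":
        return target, "verbatim", name[1:-1]   # keep spaces and formatting
    if len(name) > 1 and name[0] == name[-1] == "'":
        return target, "tagged", name[1:-1]     # already tagged: pass through
    return target, "t2t", name                  # insert the file's BODY Area

print(classify_include("%!include: chapter1.t2t"))
print(classify_include("%!include: `/etc/fstab`"))
print(classify_include("%!include(html): 'footer.html'"))
```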


Part V - Mastering Config Directives

The Config Directives are all optional. The average user can live fine without them. But they are addictive: if you start using them, you'll never stop :)


%!Cmdline

Writing long command lines every time you need to convert a document is boring and error prone. The Cmdline setting lets the user save all the converting options together with the source document.

This way the user can just call

  $ txt2tags file.t2t

And the conversion will be done, taking the target definition and other settings from the %!cmdline config. This also ensures that the document will always be converted the same way, with the same options.

Just write it with no syntax errors, as if you were on the real command line. But omit the "txt2tags" program call at the beginning and the source filename at the end.

For example, if you use this command line to convert your document:

  $ txt2tags -t html --toc --toclevel 2 --enumtitle file.t2t

You can save yourself from typing pain using this Cmdline setting inside the document source:

  %!cmdline: -t html --toc --toclevel 2 --enumtitle

As the real command line is now just "txt2tags file.t2t", you can run the conversion right inside your favorite text editor, while editing the document source. In Vi, this is:

  :!txt2tags %


%!Encoding

The Encoding setting is needed by non-English writers who use accented letters and other locale-specific details, so the target document Character Set must be customized (if allowed).

The valid values for the Encoding setting are the same charset names valid for HTML documents, like iso-8859-1 and koi8-r. If you're not sure which encoding you should use, this complete (and long!) list should help.

The LaTeX target uses alias names for encodings. This is not a problem for the user, because txt2tags translates the names internally. Some examples:

txt2tags/HTML  >>>  LaTeX
windows-1250   >>>  cp1250
windows-1252   >>>  cp1252
ibm850         >>>  cp850
ibm852         >>>  cp852
iso-8859-1     >>>  latin1
iso-8859-2     >>>  latin2
koi8-r         >>>  koi8-r

If the value is unknown to txt2tags, it will be passed "as is", allowing the user to specify custom encodings.
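
The translation can be pictured as a plain lookup table with a fallback. Here is a minimal Python sketch of that idea. The table only holds the examples listed above, and the names LATEX_ENCODINGS and latex_encoding are made up for illustration; the table inside txt2tags may be different or longer.

```python
# Alias table copied from the examples above (not the full txt2tags table).
LATEX_ENCODINGS = {
    "windows-1250": "cp1250",
    "windows-1252": "cp1252",
    "ibm850": "cp850",
    "ibm852": "cp852",
    "iso-8859-1": "latin1",
    "iso-8859-2": "latin2",
    "koi8-r": "koi8-r",
}

def latex_encoding(name):
    """Translate an HTML charset name to its LaTeX alias.

    Unknown names are returned unchanged ("as is").
    """
    # Setting values are case insensitive, so look up the lowercase name.
    return LATEX_ENCODINGS.get(name.lower(), name)

print(latex_encoding("iso-8859-1"))
print(latex_encoding("my-custom-encoding"))
```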


%!PreProc

The PreProc user-defined filter is a "find and replace" feature which is applied right after each line is read from the document source, before any parsing by txt2tags.

It is useful for defining abbreviations for commonly typed text, such as:

  %!preproc: amj          "Aurelio Marinho Jargas"
  %!preproc: RELEASE_DATE "2003-05-01"
  %!preproc: BULLET       "[images/tiny/bullet_blue.png]"

So the user can write a line like:

  Hi, I'm amj. Today is RELEASE_DATE.

And txt2tags will "see" this line as:

  Hi, I'm Aurelio Marinho Jargas. Today is 2003-05-01.

This filter is a component that acts between the document author and the txt2tags conversion. It's like a first conversion before the "real" one. Its behaviour is just like an external Sed/Perl filter, called this way:

  $ cat file.t2t | preproc-script.sh | txt2tags -

So the txt2tags parsing will begin after all the PreProc substitutions have been applied.
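
In Python terms, the PreProc step behaves roughly like a series of re.sub() calls on each source line. The sketch below reproduces the abbreviation example under that assumption. The PREPROC_RULES list and the preproc function are illustrative names, not the txt2tags code.

```python
import re

# (pattern, replacement) pairs, in the order they appear in the source.
PREPROC_RULES = [
    ("amj", "Aurelio Marinho Jargas"),
    ("RELEASE_DATE", "2003-05-01"),
    ("BULLET", "[images/tiny/bullet_blue.png]"),
]

def preproc(line, rules=PREPROC_RULES):
    """Apply every find-and-replace rule to one source line."""
    for pattern, replacement in rules:
        line = re.sub(pattern, replacement, line)
    return line

print(preproc("Hi, I'm amj. Today is RELEASE_DATE."))
```

Note that a short pattern like "amj" will also match inside longer words, so real abbreviations are better kept unusual.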


%!PostProc

The PostProc user-defined filter is a "find and replace" feature which is applied to the resulting file, after all txt2tags parsing and processing is done.

It is useful to do some refinements on the generated document, change tags and add extra text or tags. Quick samples:

  %!postproc(html): '<BODY.*?>' '<BODY BGCOLOR="green">'
  %!postproc(tex) : "\\newpage" ""

These filters change the background color of the HTML page and remove the page breaks on the LaTeX target.

The PostProc rules are just like an external Sed/Perl filter, called this way:

  $ txt2tags -t html -o- file.t2t | postproc-script.sh > file.html

Before this feature was introduced, it was very common to have little scripts to "adjust" the txt2tags results. These scripts were in fact just lots of sed (or similar) commands, doing "substitute this for that" actions. Now these replacement strings can be saved together with the document text, and as a bonus you can use Python's powerful Regular Expression engine to find patterns.
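
As a rough picture of what the two sample filters do, here is a Python sketch that applies them to already converted text with re.sub(). The rule list and the postproc function are hypothetical names for this example. The only behaviour assumed is the one described above: each rule is a regular expression replacement, used only for its own target.

```python
import re

# (target, pattern, replacement) triples mirroring the two samples above.
POSTPROC_RULES = [
    ("html", r"<BODY.*?>", '<BODY BGCOLOR="green">'),
    ("tex", r"\\newpage", ""),
]

def postproc(text, target, rules=POSTPROC_RULES):
    """Apply the rules defined for this target to the converted text."""
    for rule_target, pattern, replacement in rules:
        if rule_target in (None, target):   # None would mean "all targets"
            text = re.sub(pattern, replacement, text)
    return text

print(postproc('<HTML>\n<BODY TEXT="black">\nHi\n</BODY>', "html"))
print(postproc("First page\\newpage Second page", "tex"))
```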


%!Style


Details for PreProc and PostProc Filters


Defining a Config for a Specific Target

Since txt2tags version 1.6, all the Config Directives can be specified for a specific target, using the %!key(target): value syntax. This way the user can define different configs for different targets.

This is especially useful in the pre/postproc filters, but is applicable to all directives. For example, setting different Encoding values for HTML and LaTeX:

  %!encoding(html): iso-8859-1
  %!encoding(tex): latin1

Note: The Encoding mappings for LaTeX special names are already inside txt2tags itself, this is just a silly example.

For %!cmdline it's interesting:

  %!cmdline: -t sgml --toc
  %!cmdline(html): --style foo.css
  %!cmdline(txt): --toconly --toclevel 2

So the default target is Sgml with TOC. If the user runs:

  $ txt2tags -t html file.t2t

The target HTML will be done and only the %!cmdline(html) options will be used. So the --style option will be used and the HTML will have no TOC.

Precedence is Different

In general, for config directives, the last one found is used, but explicit target directives take precedence over generic ones, no matter which came first. So

  %!encoding(html): iso-8859-1
  %!encoding: latin1

Will expand to 'iso-8859-1' when called as "-t html", even though 'latin1' is defined afterwards.
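
The precedence rule can be written down in a few lines of Python. This is only a sketch of the rule as described here, with made-up names (resolve, SETTINGS); settings are represented as (keyword, target, value) tuples in document order, with None meaning "no explicit target".

```python
def resolve(settings, key, target):
    """Pick the value of a config directive for the given target."""
    # Explicit target directives win, no matter where they were defined.
    specific = [v for k, t, v in settings if k == key and t == target]
    if specific:
        return specific[-1]
    # Otherwise the last generic definition found is used.
    generic = [v for k, t, v in settings if k == key and t is None]
    return generic[-1] if generic else None

SETTINGS = [
    ("encoding", "html", "iso-8859-1"),
    ("encoding", None, "latin1"),
    ("cmdline", None, "-t sgml --toc"),
    ("cmdline", "html", "--style foo.css"),
    ("cmdline", "txt", "--toconly --toclevel 2"),
]

print(resolve(SETTINGS, "encoding", "html"))
print(resolve(SETTINGS, "encoding", "tex"))
print(resolve(SETTINGS, "cmdline", "html"))
```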

Filters are Cumulative

The pre/postproc filters do not have precedence or fit the "last found" scheme; they're cumulative. The user can set multiple filters, and they will be applied in the same order.

For example:

  %!postproc     : ^ \t
  %!postproc(txt): ^ '> '

With these filters, all targets will be indented by one TAB. If the target is TXT, it will also be quoted like e-mail messages.

  So                 My nice line.
  will turn to       \t> My nice line.

In Short


Part VI - Black Magic

This chapter is really not recommended for newbies. It demonstrates how to do strange things with txt2tags filters, abusing complex patterns and Regular Expressions.

BEWARE! The following procedures are NOT encouraged and can break things. Even some text from the document source can be lost on the conversion process, not appearing on the target document. Just use these tactics if you really need them and know what you are doing.

Filters are a powerful feature, but can be dangerous!

Bad filters do generate unexpected results.

Keep that in mind, please.


Inserting Multiple Lines with %!PostProc (like CSS rules)

In filters, the replacement pattern can include multiple lines using the \n line break char.

This can be handy for including really short CSS rules in the HTML target, with no need to create a separate file. This is the case with this User Guide, which uses these filters:

  %!postproc: <HEAD>      '<HEAD>\n<STYLE TYPE="text/css">\n</STYLE>'
  %!postproc: (</STYLE>)  'body     { margin:3em               ;} \n\1'
  %!postproc: (</STYLE>)  'a        { text-decoration:none     ;} \n\1'
  %!postproc: (</STYLE>)  'pre,code { background-color:#ffffcc ;} \n\1'
  %!postproc: (</STYLE>)  'th       { background-color:yellow  ;} \n\1'

All the filters are tied to the first one, by replacing a string that it has inserted. So a single "<HEAD>" turns to:

  <HEAD>
  <STYLE TYPE="text/css">
  body     { margin:3em               ;}
  a        { text-decoration:none     ;}
  pre,code { background-color:#ffffcc ;}
  th       { background-color:yellow  ;}
  </STYLE>
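
The chaining effect is easy to reproduce with plain re.sub() calls in Python, which is a handy way to test such filters before adding them to a document. The sketch below assumes the filters behave like re.sub() with \n and \1 expanded in the replacement; the RULES list is just the five filters above rewritten as Python strings.

```python
import re

# The five filters above, as (pattern, replacement) pairs.
RULES = [
    (r"<HEAD>",     r'<HEAD>\n<STYLE TYPE="text/css">\n</STYLE>'),
    (r"(</STYLE>)", r"body     { margin:3em               ;} \n\1"),
    (r"(</STYLE>)", r"a        { text-decoration:none     ;} \n\1"),
    (r"(</STYLE>)", r"pre,code { background-color:#ffffcc ;} \n\1"),
    (r"(</STYLE>)", r"th       { background-color:yellow  ;} \n\1"),
]

result = "<HEAD>"
for pattern, replacement in RULES:
    # Each rule pushes one CSS line in front of the closing tag.
    result = re.sub(pattern, replacement, result)

print(result)
```

Each CSS line keeps the trailing space that comes before \n in its filter.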


Creating "Target-Specific" Contents with %!PreProc

Sometimes you need to insert some text on a specific target, but not on the others. This kind of strange behaviour can be done using some PreProc tricks.

The idea is to insert this extra text on the document source as comments, but mark it in a way that a target-specific filter will "uncomment" those lines.

For example, suppose an extra paragraph must be added only to the HTML target. Place the text as special comments, like this:

  %html% This HTML page is Powered by [txt2tags http://txt2tags.sf.net].
  %html% See the source TXT file [here source.t2t].

As those lines start with %, they are plain comment lines and will be ignored. But when adding this special filter:

  %!preproc(html): '^%html% ' ''

The leading string is removed and those lines will be "activated", not being comments anymore. As an explicit target config, this filter will be processed for HTML targets only.
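
The trick can be simulated in Python to see which lines survive for each target. This is a simplified sketch with invented names (visible_lines, SOURCE_LINES): it only models the html-only filter followed by the dropping of comment lines, nothing else of the txt2tags parsing.

```python
import re

SOURCE_LINES = [
    "Regular paragraph, present on every target.",
    "%html% This HTML page is Powered by [txt2tags http://txt2tags.sf.net].",
    "% An ordinary comment, never shown.",
]

def visible_lines(lines, target):
    """Return the lines that reach the parser for the given target."""
    kept = []
    for line in lines:
        if target == "html":
            # The html-only PreProc filter: "uncomment" the marked lines.
            line = re.sub(r"^%html% ", "", line)
        if not line.startswith("%"):
            kept.append(line)       # comment lines are dropped
    return kept

print(visible_lines(SOURCE_LINES, "html"))
print(visible_lines(SOURCE_LINES, "tex"))
```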


Changing Txt2tags Marks with %!PreProc

Being a Regular Expressions guru, the user can customize the document source syntax, changing the txt2tags default marks to ones he finds more comfortable.

For example, a leading TAB is the Quotation mark. If the user doesn't like it, or his text editor has some strange relationship with TABs, he can define a new mark for Quoted text. Say a leading ">>> " was his choice. Then he will write this simple filter:

  %!PreProc: '>>> ' '\t'

And on the document source, the quoted text will be something like:

  >>> This is a quoted text.
  >>> The user defined this >>> mark.
  >>> But they will be converted to TABs by PreProc.

Before the parsing begins, the strange ">>> " will be converted to TABs and txt2tags will recognize the Quote mark.

BEWARE! Extreme PreProc rules could eventually change the entire marks syntax, even generating conflicts between marks. Be really really careful when doing this.


Part VII - Txt2tags HISTORY

The first public release of txt2tags (v0.1) was launched in July 2001. But its origins date back more than a year before that...

This chapter describes in a few words the development of the tool, from its very first draft to the current series.

1999 January: Pre-History

From the author:

"My very first attempts at a text conversion tool began back in 1999, as a very simple and limited Bourne Shell script that converted marked text to an HTML page. Yes, Yet-Another txt2html tool. Everyone, everywhere must have done one of these already... In short, it just recognized simple marks such as *bold*, /italic/, _under_, and escaped the classic < & > HTML special characters. Not impressive, but hey! I was young ;)"

1999 June: Still Pre-History

The author wants to speak some more:

"Some months passed, and a big Sgml hype arrived at the company where I was working (Conectiva). So the txt2html turned into a txt2sgml script. I was really trying to learn about SED* at that moment, so txt2sgml was a 110-line Bourne Shell script with lots of SED code."

* SED: UNIX Stream EDitor - an automatic text editing tool

This improved Sgml version supported more structures, such as lists and preformatted text. In the following sample file, you can see the origins of the txt2tags marks:

  		 * This was a bold line (BOLD line oriented? well...)
  
  		   --
  		 - bullet list was very similar to txt2tags list
  		 - but with these -- to begin and close a list
  		   --
  
  		 =----------------------
  		 Preformatted text was delimited by the =-- pattern.
  		 The other ------- was just cosmetic.
  		 =----------------------

Still not impressive, but the big step is coming...

2000 August: Not Pre-History anymore

TODO (txt2sgml.sed)

2001 July: Debut of 0.x series (World Release)

TODO

2002 September: Debut of 1.x series

Announce
This release starts my 1.x series.

After more than a year of almost-monthly updates, the 0.x series gave me a nice set of features, such as Command Line and Web interfaces, TOC handling, numbered titles and lists, STDIN/STDOUT facilities, vim/emacs syntax files and seven supported target formats.

For the upcoming 1.x series, I'll try to spread myself out, providing a nice GUI, extensive documentation, a mailing list, a user base, full Unix/Windows/Mac compatibility and more targets (such as tex, rtf and xhtml).

In this 1.0 release I'm already at full speed ahead, with a new suit (Graphical Tk Interface) and compatibility with Unix/Windows/Mac, handling line breaks and other platform-specific issues. Fortunately, my master can now reach Linux, Windows 2000, Cygwin and MacOS 8.6 systems for testing me.

2002, 2003... and History never stops

06 November 2002 - Released my new version 1.1

Attention! Please read carefully before upgrading me!

This new v1.1 is a compatibility break release.

This means that after upgrading, you will have to check your current txt2tags files, and possibly fix them. Ignoring this release is not an option :) because the next versions will assume you have fixed your files.

Please read the "Survivor's Guide for v1.1 Upgrade" for more detailed information about what has changed and how to fix your files.

But there is good news too:

03 December 2002 - Released my new version 1.2 (LaTeX release)

Hi! Finally I've learned one of the geeks' most used document formats: LaTeX. Just run me with "-t tex" and see your current .t2t files turn into LaTeX, ready for compiling!

Ah! I'm getting smart, and now I can handle more than one file at the same time. Hint: "*.t2t"

And my last new feature is the ability to strip down an HTML file to an "almost finished" .t2t file. Sorry !vimmers, because this works on Vim only.

What are you waiting for? Download me right now and test my new features: txt2tags --toc -t tex *.t2t

20 December 2002 - Released my new version 1.3

Weeee, I'm tired! Lots of changes were made to me for this release and now I need a break!

The news includes a brand new mark (rare to happen!) for raw text, so you can easily include marks in the target document, and I promise I won't parse them!

A new Encoding command is also available, so users worldwide can specify the Character Set of their documents.

And there are more improvements and bugfixes, but please read the ChangeLog file because now I'll go to sleep. zzZZzzzzZZZZzz...

18 February 2003 - Released my new version 1.4

I've enjoyed my X-mas/New Year vacation, then some lazy days on a hot Brazilian beach, what a great time! But now it is time to get back to work, so there's a new release of the most sexy text eater out there: me!

The burning sun has made lots of changes on me. I'm especially proud of my new ability to master table alignment. I can place the table centered or not, and I can place each table cell's contents wherever I want! Left, Right, Centered... See the AbuseMe! file for a demonstration.

The good news for HTML users is that now I've learned about that Cascading thing... SCC, CCS, CSS, I don't remember... Just use my new --style command line option or the %!Style: setting.

There's also a new --toclevel option, and I promise I won't make the TOC deeper than the number you pass me.

Mmmmmm, there were some bug fixes also, but let's forget about them, you know I'm not buggy! ;)

09 May 2003 - Released my new version 1.5

Tsc, Tsc... Almost three months since the last version, what a shame! But the new features will be worth the delay.

The most important improvement is my new %!cmdline setting. Using it you can define default options for each document, and when converting, you can call me with no options at all! Example: You place a "%!cmdline: -t html --toc" line in your source document, then you can convert it with the simple "txt2tags file.t2t" command.

There is also a new --outfile option (-o for short) to set the output filename. If you specify "-o -", you get the same behaviour as the old --stdout option (which is now deprecated).

Talking about options, there are new short forms -H and -n for the existing --noheaders and --enumtitle options. A nice quick example:

  txt2tags-v1.4 -t html --enumtitle --stdout file.t2t > new.html
  txt2tags-v1.5 -t html -n -o new.html file.t2t

Ah! LaTeX target now supports images :)

23 July 2003 - Released my new version 1.6

Hi there! Here I am again with fresh news.

Today my v1.6 was released. The main improvements are the new %!preproc: and %!postproc: user defined filters. They are used to do some strange things on documents (see User Guide). There's also a new mark + for explicit numbered titles, +like this+, so now you can mix normal titles with numbered ones, like a book with an Appendix.

Now all the config settings can be linked to a specific target, using the new %!key(target): value syntax. The target specification is optional, so the parentheses and their contents can be omitted. A nice sample: %!encoding(html): iso-8859-1
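
To illustrate how such a settings line breaks down into its parts, here is a minimal Python sketch. It is not the real txt2tags parser: the SETTING pattern and the parse_setting name are hypothetical, and treating the key as case-insensitive is an assumption made for this example.

```python
import re

# Hypothetical pattern for a settings line: %!key(target): value
# The "(target)" part is optional.
SETTING = re.compile(r"^%!\s*(\w+)\s*(?:\((\w*)\))?\s*:\s*(.*?)\s*$")

def parse_setting(line):
    """Return (key, target, value), or None if the line is not a setting.

    target is None when no "(target)" part was given.
    Illustration only, not txt2tags code.
    """
    match = SETTING.match(line)
    if not match:
        return None
    key, target, value = match.groups()
    # Assumption: keys are compared case-insensitively
    return key.lower(), target or None, value

with_target = parse_setting("%!encoding(html): iso-8859-1")
without_target = parse_setting("%!cmdline: -t html --toc")
print(with_target)
print(without_target)
```

In this sketch, a setting with no target comes back with None in the target position, which a caller could read as "applies to every target".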

Lots of user-reported bugs were fixed, titles on LaTeX are now unnumbered by default (as on other targets) and the Gui was improved, showing %!cmdline contents (if any) and refreshing checkboxes when a new file is loaded. The Gui can also receive options from the command line, as in txt2tags --gui -n file.t2t


The End. (see source)