Html2Wml - Program that can convert HTML pages to WML pages
in a shell:
html2wml.cgi [options] <file|url>
as a CGI:
/cgi-bin/html2wml.cgi?url=<url>
Html2Wml converts HTML pages to WML pages, suitable for being viewed on a Wap device. The conversion can be done either on the command line to create static WML pages or on-the-fly by calling this program as a CGI.
As of version 0.3, the resulting WML should be well-formed and, in most cases, valid. This is not guaranteed, but it should work for most HTML pages. More precisely, the validity of the WML depends on the quality of the input HTML. Pages created with software that conforms to W3C standards are the most likely to produce valid WML. To check your HTML pages, you can use W3C's excellent HTML Tidy, written by Dave Raggett.
Note that most of these options can be used when calling Html2Wml as a CGI. See the file form.html in the t/ directory for an example.
When this option is on, named HTML entities are converted to US-ASCII using the same 7-bit approximations as Lynx. By default, this is off, so named entities are converted into numeric entities.
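The default behavior (named entities turned into numeric ones) can be illustrated with a short Python sketch. This is not Html2Wml's own code: the function name is hypothetical, the entity table comes from Python's standard library, and leaving the entities predefined by XML untouched is an assumption made for the example.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML itself; this sketch leaves them untouched (an assumption).
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(text):
    """Replace named HTML entities such as &eacute; with numeric ones such as &#233;."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML built-ins and unknown names as they are
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)

print(named_to_numeric("caf&eacute; &amp; cr&egrave;me"))
```

A WML browser only has to know the few entities defined by XML, which is presumably why numeric references are the safer output.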
This option tells Html2Wml to collapse redundant white space characters and empty paragraphs. This option is on by default, but you can deactivate it by using --nocollapse.
This behavior is not really standard, but the aim is to reduce the size of the output. WML pages are primarily intended for Wap devices, which usually have slow connections: the smaller the WML output, the faster it downloads. Furthermore, collapsing white space is the normal behavior for HTML pages.
Empty paragraphs are also collapsed (this is really not standard), but it should avoid empty screens: the display of a Wap device is usually small, and it can be annoying to scroll down a lot because of many empty lines.
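As a rough illustration of what collapsing means, here is a minimal Python sketch. It is not the algorithm used by Html2Wml: the function name is hypothetical, it works on a plain string with regular expressions, and it ignores cases such as preformatted text where white space must be kept.

```python
import re

def collapse(markup):
    """Collapse white space runs and drop empty <p> elements (simplified sketch)."""
    # Any run of spaces, tabs or newlines becomes a single space.
    markup = re.sub(r"\s+", " ", markup)
    # Remove paragraphs that contain nothing but white space.
    markup = re.sub(r"<p>\s*</p>\s*", "", markup)
    return markup.strip()

print(collapse("<p>Hello   \n  world</p>\n<p>  </p>\n<p>Bye</p>"))
```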
This option uses the WML compiler from WML Tools to convert the WML to a compact binary representation of the WML deck.
This option sets the template that will be used to reconstruct the href links.
See Links Reconstruction for more information.
This option is on by default. It makes Html2Wml flatten tables à la Lynx. I think this is better than trying to use WML tables because, unlike HTML tables, they have extremely limited features (in particular, they can't be nested). It is therefore quite difficult to decide what to do when you have three nested tables. Furthermore, calculations on tables are quite CPU-intensive, and Wap devices are not supposed to be powerful.
This option tells Html2Wml not to use the <pre> tag. This is useful if you want to use the WML compiler from WML Tools 0.0.4, which doesn't recognize this tag.
This option sets the template that will be used to reconstruct the src links.
See Links Reconstruction for more information.
This option allows you to limit the size of the generated cards. The value is given in bytes. Default is 2000 bytes, which should be small enough to be loaded on any Wap device.
Splitting can occur when the size of the current card is between max-card-size - card-split-threshold and max-card-size.
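The splitting window can be written down as a simple test. The Python sketch below only restates the rule above; the function name is hypothetical, the threshold value of 200 in the usage lines is an arbitrary example (not a documented default), and treating both bounds as inclusive is an assumption.

```python
def in_split_window(card_size, max_card_size, card_split_threshold):
    """Return True when a card of card_size bytes lies in the window where a split may occur."""
    lower_bound = max_card_size - card_split_threshold
    # Both bounds are treated as inclusive here; the real boundary handling may differ.
    return lower_bound <= card_size <= max_card_size

print(in_split_window(1900, 2000, 200))
print(in_split_window(1500, 2000, 200))
```

With a card limit of 2000 bytes and an example threshold of 200, a card of 1900 bytes falls inside the window while a card of 1500 bytes is still too small to be split.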
This option sets the label of the link that allows the user to go to the next card. Default is ``[&gt;&gt;]'' (which will be rendered as ``[>>]'').
This option activates the debug mode. This prints the output result in HTML with line numbering and with the result of the XML check. This option is very useful for debugging as you can use any web browser for that purpose.
When this option is on, Html2Wml sends the WML output to XML::Parser to check its well-formedness.
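To show what a well-formedness check amounts to, here is a small Python analogue. It does not use XML::Parser; it relies on Python's standard Expat binding instead (XML::Parser is also built on Expat), and the function name is hypothetical.

```python
import xml.parsers.expat

def is_well_formed(document):
    """Return True if the document parses as well-formed XML, False otherwise."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this chunk as the final one.
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True

print(is_well_formed('<wml><card id="c1"><p>Hello</p></card></wml>'))
print(is_well_formed('<wml><card id="c1"><p>Hello</card></wml>'))
```

Such a check only confirms that tags are properly nested and closed. It says nothing about whether the deck is valid against the WML DTD.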
In order to match the low memory capabilities of many Wap devices, Html2Wml
can convert the HTML document into a WML deck that contains several
cards. The upper size limit of these cards can be set using the max-card-size
option. This limit is not a guarantee, because the size is only calculated
approximately (if you wonder why I don't do an exact calculation, it's because it
would be difficult in the current architecture of Html2Wml).
Actions are a feature similar to the SSI (Server Side Includes) available on web servers like Apache. So as not to interfere with real SSI while keeping the syntax easy to learn, Actions differ from SSI in only a few points.
Syntax
The syntax to execute an action is:
<!-- [action param1="value" param2='value'] -->
Note that the square brackets are part of the syntax. Apart from that, the Actions syntax is very similar to SSI syntax.
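The syntax above is regular enough to be recognized with a pair of regular expressions. The following Python sketch is only an illustration of the format, not the parser used by Html2Wml; the names ACTION_RE, PARAM_RE and parse_action are hypothetical.

```python
import re

# An HTML comment holding [action name="value" ...]; values may use either quote style.
ACTION_RE = re.compile(r"""<!--\s*\[(\w+)((?:\s+\w+=(?:"[^"]*"|'[^']*'))*)\s*\]\s*-->""")
PARAM_RE = re.compile(r"""(\w+)=(?:"([^"]*)"|'([^']*)')""")

def parse_action(comment):
    """Return (action_name, params) for an action comment, or None for anything else."""
    match = ACTION_RE.search(comment)
    if match is None:
        return None
    name, raw_params = match.groups()
    params = {}
    for key, double_quoted, single_quoted in PARAM_RE.findall(raw_params):
        # Only one of the two groups is non-empty for a given parameter.
        params[key] = double_quoted or single_quoted
    return name, params

print(parse_action('<!-- [include virtual="nav.wml"] -->'))
```

An ordinary comment, or a real SSI directive, does not match the pattern and is reported as None, which is how the two syntaxes stay out of each other's way.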
Available actions
Description
Includes a file in the document at the current point. Please note that Html2Wml neither checks nor parses the file and, if the file cannot be found, dies silently (this is the same behavior as SSI).
Parameters
virtual=url
The file is fetched over HTTP.
file=path
The file is read from the local disk.
Note
If you use the file parameter, an absolute path is recommended.
Description
Returns the size of a file at the current point of the document.
Parameters
You can use the same parameters as for the include action.
Examples
To include a small navigation bar:
<!-- [include virtual="nav.wml"] -->
This engine allows you to reconstruct the links of the HTML document being converted. It has two modes, depending upon whether Html2Wml was launched from the shell or as a CGI.
When used as a CGI, this engine reconstructs the links of the HTML document so that all the URLs are passed to Html2Wml in order to convert the files they point to (pages or images). This is completely automatic and can't be customized for now (but I don't think it would be really useful).
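The idea behind the CGI mode can be sketched in Python as follows. This is a guess at the general mechanism, not a description of the actual implementation: the function name is hypothetical, and resolving relative links against the page URL and percent-encoding the result are both assumptions.

```python
from urllib.parse import quote, urljoin

def cgi_link(base_url, href, cgi_path="/cgi-bin/html2wml.cgi"):
    """Rewrite a link so that following it sends the target back through the CGI."""
    # Resolve relative links against the URL of the page being converted.
    absolute = urljoin(base_url, href)
    # Percent-encode the target so it can travel safely inside the url parameter.
    return cgi_path + "?url=" + quote(absolute, safe="")

print(cgi_link("http://www.example.com/docs/index.html", "about.html"))
```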
When used from the shell, this engine reconstructs the links with the URL template (the parameter of the hreftmpl option). Note that absolute URLs will be left untouched. The template can be customized using the following syntax. If no template is supplied, the links will be left untouched.
Syntax
The template is a string that contains the new URL. You can interpolate
parameters by simply including them in the template between curly brackets: {param}
If the URL contains a query part or a fragment part, they will be appended to the result of the template.
Available parameters
URL
This parameter contains the original URL from the href or src attribute.
FILENAME
This parameter contains the base name of the file.
FILEPATH
This parameter contains the leading path of the file.
FILETYPE
This parameter contains the suffix of the file.
Examples
To add a path option:
{URL}$wap
Using Apache, you can then add a Rewrite directive so that URLs ending with $wap will be redirected to Html2Wml:
RewriteRule ^(/.*)\$wap$ /cgi-bin/html2wml.cgi?url=$1
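To see what this rule captures, the same pattern can be tried out in Python. This only exercises the regular expression; it does not emulate Apache, and the function name is hypothetical. Python's re module is not identical to the regex engine used by Apache, but a pattern this simple behaves the same way in both.

```python
import re

# Same pattern as in the RewriteRule: a path starting with "/" and ending with a literal "$wap".
REWRITE_PATTERN = re.compile(r"^(/.*)\$wap$")

def rewrite(path):
    """Return the CGI target for a path ending in $wap, or None if the rule does not match."""
    match = REWRITE_PATTERN.match(path)
    if match is None:
        return None
    return "/cgi-bin/html2wml.cgi?url=" + match.group(1)

print(rewrite("/docs/index.html$wap"))
print(rewrite("/docs/index.html"))
```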
To change the base name of the file:
{FILEPATH}{FILENAME}_wap{FILETYPE}
To change the extension of the file:
{FILEPATH}{FILENAME}.wap
Note that FILETYPE contains all the extensions of the file, so if its name is index.html.fr, for example, FILETYPE contains ``.html.fr''.
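The template mechanism described in this section can be summarized with a Python sketch. It is an illustration built from the rules stated above, not the code of Html2Wml: the function name is hypothetical, and it assumes that {URL} stands for the original URL without its query and fragment parts, since those are appended separately.

```python
import posixpath
import re
from urllib.parse import urlsplit

def expand_template(template, url):
    """Apply a link template such as "{FILEPATH}{FILENAME}.wap" to a relative URL."""
    parts = urlsplit(url)
    if parts.scheme or parts.netloc:
        return url  # absolute URLs are left untouched
    directory, base = posixpath.split(parts.path)
    filepath = directory + "/" if directory and not directory.endswith("/") else directory
    # FILETYPE holds every extension, so split at the first dot of the base name.
    filename, dot, extensions = base.partition(".")
    values = {
        "URL": parts.path,
        "FILENAME": filename,
        "FILEPATH": filepath,
        "FILETYPE": dot + extensions,
    }
    # Unknown {names} are kept as they are.
    result = re.sub(r"\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), template)
    # The query and fragment parts are appended to the result of the template.
    if parts.query:
        result += "?" + parts.query
    if parts.fragment:
        result += "#" + parts.fragment
    return result

print(expand_template("{FILEPATH}{FILENAME}.wap", "docs/index.html.fr#top"))
print(expand_template("{URL}$wap", "/news/today.html?id=3"))
```

The first call shows why replacing FILETYPE changes the whole ``.html.fr'' suffix at once, and the second shows the query part being carried over after the template has been applied.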
Currently, only the well-formedness of the resulting WML can be tested, not its validity.
Inverted tags (like ``<b>bold <i>italic</b></i>'') may produce unexpected results. But only bad software does bad stuff like this.
Werner Heuser - for his numerous ideas, his advice and his help with debugging
Sébastien Aperghis-Tramoni <madingue@resus.univ-mrs.fr>
Html2Wml is Copyright (C)2000 Sébastien Aperghis-Tramoni.
This program is free software. You can redistribute it and/or modify it under the terms of either the Perl Artistic License or the GNU General Public License, version 2 or later.