Karrigell Documentation

Version 2.3.3 15 01 07


17. Internationalization and Unicode

As you may have guessed from reading this documentation, I'm not from an English-speaking country (I'm French, and more precisely Breton; the name Karrigell is a Breton word). So I've included a program to facilitate the internationalization of scripts.

17.1 Translation

In a script, every time you want a message to be translated into a given language, pass it to a function called _ instead of writing it as a plain quoted string, like this:

print _("Hello everybody")
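To make the idea concrete, here is a minimal sketch of what a function like _ might do: look the string up in a translation table and return the first translation that matches the user's language preferences. The table TRANSLATIONS and the helper make_translator are hypothetical names used only for illustration; Karrigell's actual storage format and lookup code are not described in this section. The sketch is written for Python 3, not the Python 2 syntax used in the scripts above.

```python
# Hypothetical translation table: original string -> {language code: translation}.
# Karrigell's real storage format is not shown in this documentation.
TRANSLATIONS = {
    "Hello everybody": {"fr": "Bonjour tout le monde", "de": "Hallo zusammen"},
}

def make_translator(preferred_languages):
    """Return a _ function bound to a list of language codes, best first."""
    def _(message):
        known = TRANSLATIONS.get(message, {})
        for lang in preferred_languages:
            if lang in known:
                return known[lang]
        # No translation available: fall back to the original string.
        return message
    return _

_ = make_translator(["fr", "en"])
print(_("Hello everybody"))   # Bonjour tout le monde
print(_("Goodbye"))           # Goodbye (untranslated strings pass through)
```

Falling back to the original string means a script keeps working while its translations are still incomplete.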

In Python Inside HTML (PIH) you can use the shortcut <%_ %>:

<%_ "Hello everybody" %>

Karrigell provides a simple web interface to create and modify translations of strings. For security reasons, the script that manages translations is reserved for the administrator. An authentication script is run, relying on md5 digests stored in a file called admin.ini, which the administrator must create by running the script k_password.py in the admin directory.

With your browser, call the script http://localhost/admin/internat.pih . It opens a directory browser with which you can access all the files that may contain strings to translate (that is, all the files with the extension .py, .pyw, .pih or .hip). When you click on a file name, a page appears with all the strings to translate (the arguments of the _() function) and asks for a translation in each of the languages currently selected in the browser's language preferences(1). If translations have already been made, they appear in the form fields.

Fill in the fields and submit the form; this creates or updates the translations.

You can check the effect by calling the script you've just modified and changing the language order in the browser preferences.
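The browser communicates its language order through the Accept-Language request header, for example "fr-FR,fr;q=0.9,en;q=0.8". The following sketch shows one way such a header can be turned into an ordered list of language codes. The function name parse_accept_language is hypothetical, and whether Karrigell parses the header exactly this way is an assumption; the code targets Python 3.

```python
def parse_accept_language(header):
    """Return the language codes of an Accept-Language value, best first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        lang = parts[0].strip().lower()
        if not lang:
            continue
        quality = 1.0  # a language without a q value has the highest weight
        for param in parts[1:]:
            name, sep, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by position in the header.
        entries.append((-quality, position, lang))
    return [lang for neg_quality, position, lang in sorted(entries)]

print(parse_accept_language("fr-FR,fr;q=0.9,en;q=0.8"))
# ['fr-fr', 'fr', 'en']
```

Reordering the languages in the browser changes the q values sent, which changes the order of the resulting list and therefore which translation is picked first.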

Translations are held in a single file shared by all the files in the same directory. You can also edit the whole dictionary by clicking on the first item in the script list.

17.2 Unicode support

New in version 2.2.2

mostly written by Radovan Garabik

Unicode is a standard for representing all the writing systems of the world. For each character (a letter in any alphabet, an ideogram in an Asian language) Unicode defines a unique number, called a "code point". Since computers and networks only handle bytes, a mapping between code points and sequences of one or more bytes must be defined; these mappings are called "encodings".
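The distinction between a code point and its encoded bytes can be checked directly in Python. This is an illustrative Python 3 sketch (Karrigell 2.3 itself ran on Python 2); the helper name describe is hypothetical.

```python
import unicodedata

def describe(char):
    """Return (code point, UTF-8 bytes, Latin-1 bytes or None) for one character."""
    try:
        latin1 = char.encode("latin-1")
    except UnicodeEncodeError:
        latin1 = None   # the character has no representation in Latin-1
    return ord(char), char.encode("utf-8"), latin1

for char in ("A", "é", "Ω", "中"):
    code_point, utf8, latin1 = describe(char)
    # Print the official character name so the output stays plain ASCII.
    print(f"U+{code_point:04X} {unicodedata.name(char)}: "
          f"utf-8={utf8!r} latin-1={latin1!r}")
```

One code point can map to one byte in one encoding, several bytes in another, and nothing at all in a third, which is why the encoding always has to be known.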

Because there are many different encodings, a program that has to display a character (a Greek letter, a math symbol, a Chinese character) must receive two pieces of information: the sequence of bytes representing the character and the encoding used. If it receives only the bytes, the program can try to guess the encoding (this is what a web browser usually does), but with no guarantee of success.
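The following Python 3 sketch illustrates why the bytes alone are not enough: the same three bytes give different text, or no text at all, depending on the encoding assumed. The helper name try_decodings is hypothetical.

```python
def try_decodings(data, encodings):
    """Decode the same bytes with each encoding; None marks a failed decode."""
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = data.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results

# The bytes a Latin-1 program writes for the French word for "summer".
data = b"\xe9t\xe9"

results = try_decodings(data, ["latin-1", "iso8859-7", "cp1251", "utf-8"])
for encoding, text in results.items():
    # ascii() keeps the output readable on any terminal.
    print(f"{encoding:10} -> {ascii(text)}")
```

Here latin-1 yields the intended word, the Greek and Cyrillic encodings yield valid but wrong letters, and utf-8 rejects the bytes outright. A program that guesses has no way to tell the first three apart.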

Karrigell defines two parameters in the configuration file to handle Unicode:

  • encodeFormData: if set to 1, Karrigell tries to convert form data into Unicode strings by trying different encodings one after the other. More precisely, if the received data contains only ASCII characters, no conversion is made and the string is kept as is; otherwise several encodings are tried and the first one that succeeds is used.
    So the form data available in QUERY are either bytestrings or Unicode strings.
    encodeFormData defaults to 0: no Unicode conversion is made.
  • outputEncoding: defines the encoding that the browser will use to interpret the strings it is asked to display. The default value is latin-1, an encoding used to represent the characters of Western European languages based on the Latin alphabet. You can choose another encoding by setting outputEncoding to another value in the [Server] section of the configuration file.

    You can see examples in the demo/unicode directory: set outputEncoding to utf-8 to see the expected result.
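The behaviour of the two parameters can be sketched as follows. This is not Karrigell's own code: the function names, the list of candidate encodings and their order, the use of a charset in the Content-Type header, and the replacement of unencodable characters by numeric character references are all assumptions made for illustration. The sketch targets Python 3, where a bytestring is a bytes object and a Unicode string is a str.

```python
# Assumed candidate list; the encodings Karrigell really tries are not documented here.
CANDIDATE_ENCODINGS = ("utf-8", "latin-1")

def decode_form_value(raw, encodings=CANDIDATE_ENCODINGS):
    """Mimic encodeFormData=1: keep ASCII-only data, otherwise decode it."""
    if all(byte < 128 for byte in raw):
        return raw                      # ASCII only: kept as a bytestring
    for encoding in encodings:
        try:
            return raw.decode(encoding)  # first encoding that succeeds wins
        except UnicodeDecodeError:
            continue
    return raw                          # nothing worked: leave the bytes alone

def encode_response(text, output_encoding="latin-1"):
    """Mimic outputEncoding: encode the page and announce the encoding."""
    # Assumption: characters missing from the encoding become &#NNNN; references.
    body = text.encode(output_encoding, errors="xmlcharrefreplace")
    content_type = f"text/html; charset={output_encoding}"
    return content_type, body

print(decode_form_value(b"hello"))                  # b'hello'
print(ascii(decode_form_value(b"caf\xc3\xa9")))     # 'caf\xe9'
print(encode_response("caf\u00e9", "utf-8"))
```

Note that latin-1 accepts every possible byte, so in this sketch it acts as a catch-all and must come last in the candidate list. This also shows why "the first one that succeeds" is a guess and not a guarantee.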


(1) In Microsoft Internet Explorer the language preference is set in Tools/Internet Options/General/Languages; accepted languages are chosen from a list and ordered by preference. In Firefox use Edit/Preferences/Languages.