Files to translate


Right to left languages

Translation

When translating into bidirectional languages in OmegaT, there are two issues to consider: 1) text input and 2) text display. When writing RTL text (e.g. Urdu or Arabic) in the LTR view, which is the default, text will not always be displayed in the order in which it was entered. This is not obvious when purely RTL text is being input, and in such cases the default (LTR) view can be used.

In many cases, however, when the target language is an RTL language, it becomes necessary to embed LTR text. For example, some product names have to remain in English, or placeholders have to remain in their original form during the localization of GUI strings.

When LTR text is embedded in RTL text in the default LTR view, it is not displayed in the order in which it was input. To have OmegaT display it in the correct order, toggle the view using Ctrl+Shift+O.

Note: The same shortcut can be used on Mac OS X.

It is also easier to input RTL text when the view has been toggled to RTL, because in the default view, when RTL text is entered, spaces entered first appear to the right of the input text, and the next RTL character then appears to the left.

Using Ctrl+Shift+O causes both text input and display in OmegaT to be changed to RTL. It can be used separately for all three panes (Editing, Fuzzy Matches and Glossary).
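As a rough illustration of when embedded LTR text becomes an issue, the following Python sketch uses the Unicode bidirectional classes to detect segments that mix strong RTL and strong LTR characters. The function names are hypothetical and this is not how OmegaT itself decides anything; it only shows one way to spot segments for which the RTL view is likely to matter.

```python
import unicodedata

# Unicode bidirectional classes for strongly directional characters.
RTL_CLASSES = {"R", "AL"}   # Hebrew-type letters, Arabic-type letters
LTR_CLASSES = {"L"}         # Latin and other left-to-right letters


def strong_directions(text):
    """Return the set of strong directions ('LTR', 'RTL') found in text."""
    found = set()
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            found.add("RTL")
        elif bidi in LTR_CLASSES:
            found.add("LTR")
    return found


def mixes_rtl_and_ltr(text):
    """True when the segment contains both RTL and LTR strong characters."""
    return strong_directions(text) == {"LTR", "RTL"}
```

A segment such as an Arabic sentence containing the product name "OmegaT" would be reported as mixed, while a purely Arabic or purely English segment would not.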

Creating translated documents

When the translated document is created, its display direction will be LTR, like the original document. This must be changed manually. Each output format has specific ways to deal with RTL display; check the relevant application manuals for details.


File formats

With OmegaT you can translate files in a number of file formats. There are basically two types of file formats: plain text formats and formatted text formats.

Plain text files

Plain text files contain text only, so their translation is as simple as typing the translation.

There are several methods to specify the file's encoding so that its contents are not garbled when opened in OmegaT.

Such files contain no formatting beyond the white space used for indentation and alignment. They are usually edited in text editors, and it is generally not possible to store font, color or margin information in them.
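One way to avoid garbled content, independent of OmegaT's own settings, is to convert a plain text file to UTF-8 before importing it. The sketch below assumes you already know the file's original encoding; the function names are hypothetical.

```python
from pathlib import Path


def to_utf8(raw, source_encoding):
    """Decode raw bytes from source_encoding and re-encode them as UTF-8."""
    # A wrong source_encoding either raises UnicodeDecodeError or
    # silently produces wrong characters, so check the result by eye.
    return raw.decode(source_encoding).encode("utf-8")


def convert_file_to_utf8(src, dst, source_encoding):
    """Write a UTF-8 copy of src to dst, working on bytes so that
    line endings are left untouched."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))
```

For example, `convert_file_to_utf8("readme.txt", "readme-utf8.txt", "latin-1")` would produce a UTF-8 copy of a Latin-1 file.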

Currently, OmegaT supports the following plain text formats:

Other plain text file types can be handled by OmegaT by associating their file extension with a supported file type (for example, .pod files could be associated with the ASCII text filter) and by pre-processing them with specific segmentation rules.

Formatted text files

Formatted text files contain text as well as information such as font type, size, color etc. They are commonly created in word processors or home page editors.

Such file formats are designed to retain formatting information. This information can be as simple as "this is bold" or as complex as table data with different font sizes, colors, positions etc.

In most translation jobs it is considered important to have the translated document look similar to the original. OmegaT allows you to do this by marking the characters or words that have special formatting with easy-to-manipulate tags.

Simplifying the formatting of the original text greatly reduces the number of tags. Unifying the fonts, font sizes, colors etc. used in the document should be considered where possible, to simplify the translation and reduce the number of possible tag errors.
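To illustrate what a tag error is, here is a minimal Python sketch that compares the tags in a source segment with those in its translation. It assumes tags of the short form `<f0>`, `</f0>` and `<br1/>`, which this section does not specify, and the function name is hypothetical. It is not OmegaT's own tag validation.

```python
import re
from collections import Counter

# Assumed tag shape: letters followed by a number, e.g. <f0>, </f0>, <br1/>.
TAG_PATTERN = re.compile(r"</?[a-zA-Z]+\d+/?>")


def tag_differences(source, target):
    """Return (missing, extra): tags absent from or added to the target."""
    src = Counter(TAG_PATTERN.findall(source))
    tgt = Counter(TAG_PATTERN.findall(target))
    missing = sorted((src - tgt).elements())  # in source, not in target
    extra = sorted((tgt - src).elements())    # in target, not in source
    return missing, extra
```

The fewer tags a segment carries, the fewer entries such a check can report, which is why simplifying the source formatting pays off.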

Currently, OmegaT supports the following formatted text formats:

Other formatted text file types can be handled by OmegaT by associating their file extension with a supported file type and by pre-processing them with specific segmentation rules.

File format specifics

Each file type is handled differently in OmegaT. Specific behaviour can be set up in the file filters.

Other file formats

Other plain text or formatted text file formats may be accessible in OmegaT.

External tools can be used to convert unsupported files to supported formats. The translated file will then need to be converted back to the original format. In this way, a number of plain text formats (including LaTeX etc.) can be translated in OmegaT through conversion to the PO format. Similarly, a number of formatted text formats (including Microsoft Office files) can be translated in OmegaT through conversion to the Open Document format.

The quality of the translated file will depend on the quality of the round-trip conversion. Make sure you have tested all your options before proceeding with such conversions.
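One simple way to test a round trip for text-based formats is to convert a file to the intermediate format and straight back without translating it, then compare the result with the original. The Python sketch below shows only the comparison step; the function name is hypothetical and the conversion itself is left to whichever tool you choose.

```python
import difflib


def roundtrip_diff(original_text, roundtrip_text):
    """Return unified-diff lines between the original and round-tripped text.

    An empty list means the round trip reproduced the text exactly.
    """
    return list(difflib.unified_diff(
        original_text.splitlines(),
        roundtrip_text.splitlines(),
        fromfile="original",
        tofile="roundtrip",
        lineterm="",
    ))
```

Any lines reported here point to content that the conversion altered before any translation took place.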

Available free conversion tools include:

OpenOffice.org

OmegaT does not offer direct support for the Microsoft Office formats Word, Excel and PowerPoint. However, OpenOffice.org (and variants) can be used to convert such formats to OpenDocument, which OmegaT supports natively.

OpenOffice.org official page
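For batch work, the conversion can also be scripted. The sketch below assumes the LibreOffice variant, whose `soffice` program accepts the `--headless`, `--convert-to` and `--outdir` options, and assumes that `soffice` is on the PATH. The helper names and file names are hypothetical, and other OpenOffice.org variants may not offer the same options.

```python
import subprocess


def build_convert_command(input_path, output_dir, target_format="odt"):
    """Build a LibreOffice command line that converts one file."""
    return [
        "soffice", "--headless",
        "--convert-to", target_format,
        "--outdir", output_dir,
        input_path,
    ]


def convert(input_path, output_dir, target_format="odt"):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.run(
        build_convert_command(input_path, output_dir, target_format),
        check=True,
    )
```

For instance, `convert("report.doc", "source")` would be expected to write `report.odt` into the `source` folder; check the converted file before translating it.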

Okapi Framework

The Text Extraction Utility from the Okapi Framework has an option for creating an OmegaT project folder tree. It is also possible to create an OmegaT specific XLIFF file. Okapi has recently released tools that run under Mono, a free platform available on most operating systems.

Okapi for Mono release page, tutorial

Translate Toolkit

The Translate Toolkit, a Python tool set, provides users with a number of converters to and from Portable Object, including converters for Mozilla .properties and DTD files, CSV files, Qt .ts files and XLIFF files. It also includes a number of tools to manipulate such files before or after their translation in OmegaT.

Translate Toolkit official page

po4a

po4a is a Debian Perl tool. It can convert file formats such as LaTeX, TeX, POD etc. to and from Portable Object.

po4a official page

