I am not inclined to accept a charset you just invented and which nobody uses yet: convince your friends and community first!
In previous versions of recode, a single colon `:' was used instead of the two dots `..' for separating charsets, but this created problems because colons are allowed in official charset names. The old request syntax is still recognised for compatibility purposes, but is deprecated.
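
To illustrate why the colon separator is troublesome, here is a minimal Python sketch. The helper `split_request` is hypothetical and is not recode's actual parser; it assumes a request with exactly two charsets and no surfaces. The name `ISO_8859-1:1987` is used as an example of an official charset name that contains a colon.

```python
def split_request(request: str) -> tuple[str, str]:
    """Split 'BEFORE..AFTER' into its two charset names (hypothetical helper)."""
    # Preferred form: two dots never occur inside a charset name here.
    if ".." in request:
        before, _, after = request.partition("..")
        return before, after
    # Deprecated form: a single colon. It is only safe when the request
    # holds exactly one colon, since names may contain colons themselves.
    if request.count(":") == 1:
        before, _, after = request.partition(":")
        return before, after
    raise ValueError(f"cannot split request unambiguously: {request!r}")


print(split_request("latin1..ibmpc"))
print(split_request("latin1:ibmpc"))
print(split_request("ISO_8859-1:1987..ibmpc"))
```

With the old syntax, a request such as `ISO_8859-1:1987:ibmpc` holds two colons and the sketch refuses to guess where the separator is.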
More precisely, pc is an alias for the charset IBM-PC.
Both before and after may be omitted, in which case the double dot separator is mandatory. This is not very useful, as the recoding reduces to a mere copy in that case.
MS-DOS is one of those systems for which the default charset has implied surfaces, CR-LF here. Such surfaces are automatically removed or applied whenever the default charset is read or written, exactly as would happen for any other charset. In the example above, on such systems, the hexadecimal surface would then replace the implied surfaces. To add a hexadecimal surface without removing any, one should write the request as `/../x'.
There are still some cases of ambiguous output which are rather difficult to detect, and for which the protection is not active.
The minimality of a UTF-8 encoding is guaranteed on output, but currently, it is not checked on input.
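
As an illustration of what checking minimality on input could look like, here is a small Python sketch. It is not taken from recode; the function names are made up for the example, and it assumes sequences of at most four bytes. A sequence is non-minimal ("overlong") when the code point it carries would have fitted in a shorter sequence.

```python
# Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
MIN_FOR_LENGTH = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def sequence_length(lead: int) -> int:
    """Return the sequence length announced by a lead byte."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    raise ValueError(f"invalid lead byte: {lead:#x}")


def is_minimal_utf8(data: bytes) -> bool:
    """Return False as soon as one sequence is longer than necessary."""
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        chunk = data[i:i + n]
        if len(chunk) != n:
            raise ValueError("truncated sequence")
        if n == 1:
            value = chunk[0]
        else:
            # Keep only the payload bits of the lead byte.
            value = chunk[0] & (0xFF >> (n + 1))
            for byte in chunk[1:]:
                if byte & 0xC0 != 0x80:
                    raise ValueError("invalid continuation byte")
                value = (value << 6) | (byte & 0x3F)
        if value < MIN_FOR_LENGTH[n]:
            return False
        i += n
    return True


print(is_minimal_utf8("é".encode("utf-8")))  # minimal two-byte sequence
print(is_minimal_utf8(b"\xc0\xaf"))          # '/' stored on two bytes
```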
Another approach would have been to define the level symbols as masks instead, to give masks to threshold setting routines, and to retain all errors. Yet I have never met such a need in practice myself, so I fear it would be overkill. On the other hand, it might be interesting to maintain counters about how many times each kind of error occurred.
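
The idea of keeping ordered levels with a threshold, plus a counter per kind of error, might be sketched as follows in Python. This is only an illustration: the class `ErrorLog` and the level names are assumptions made for the example, not recode's API.

```python
from collections import Counter
from enum import IntEnum


class Level(IntEnum):
    """Illustrative error levels, ordered from least to most severe."""
    NO_ERROR = 0
    NOT_CANONICAL = 1
    AMBIGUOUS_OUTPUT = 2
    UNTRANSLATABLE = 3
    INVALID_INPUT = 4


class ErrorLog:
    """Track the worst level seen, and count each kind of error."""

    def __init__(self, abort_level: Level) -> None:
        self.abort_level = abort_level
        self.worst = Level.NO_ERROR
        self.counts: Counter = Counter()

    def report(self, level: Level) -> bool:
        """Record one error; return True when processing may continue."""
        self.counts[level] += 1
        self.worst = max(self.worst, level)
        return level < self.abort_level


log = ErrorLog(abort_level=Level.INVALID_INPUT)
log.report(Level.NOT_CANONICAL)
log.report(Level.UNTRANSLATABLE)
log.report(Level.NOT_CANONICAL)
print(log.worst.name, dict(log.counts))
```

A threshold comparison only needs the levels to be ordered, while the counter retains how often each kind occurred without requiring masks.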
It is not probable that recode will ever support UTF-1.
This is when the goal charset allows for 16 bits. For shorter charsets, the `--strict' (`-s') option decides what happens: either the character is dropped, or a reversible mapping is produced on the fly.
On DOS/Windows, stock shells do not know that apostrophes quote special characters like |, so one needs to use double quotes instead of apostrophes.
This convention replaced an older one saying that up to 4 immediately preceding pairs of zero bytes, going backward, are to be considered as part of the end of line and not interpreted as ::.
There are supposed to be seven words in this case. So, one is missing.
Look at one of the following sentences (the second has to be interpreted with the `-c' option):
"Ai"e! Voici le proble`me que j'ai"
Ai:e! Voici le proble`me que j'ai:
There is an ambiguity between the small animal and the indicative future of avoir (first person singular), when followed by what could be a diaeresis mark. Fortunately, the case is solved by the fact that an apostrophe always precedes the verb and almost never the animal.
I did not pay attention to proper nouns, but this one showed up as being fairly evident.
Usually, quail means quail egg in Japanese, while egg alone is usually chicken egg. Both quail egg and chicken egg are popular food in Japan. The quail input system was so named because it is smaller than the previous EGG system.
As for EGG, it is the translation of TAMAGO. This word comes from the Japanese sentence takusan matasete gomennasai, meaning sorry to have kept you waiting so long. Of course, the publication of EGG has been delayed many times…
(Story by Takahashi Naoto)
These are mere examples to explain the concept; recode actually only has Base64 and CR-LF.
If strict mapping is requested, another efficient device will be used instead of a permutation.
This document was generated by root on March, 8 2007 using texi2html 1.76.