Library: Foundation
Package: Text
Header: Poco/TextEncoding.h
An abstract base class for implementing text encodings like UTF-8 or ISO 8859-1.
Subclasses must override the characterMap() and convert() methods.
Known Derived Classes: ASCIIEncoding, Latin9Encoding, Latin1Encoding, UTF16Encoding, UTF8Encoding, Windows1252Encoding
Member Functions: characterMap, convert
typedef int CharacterMap[256];
The map[b] member gives information about byte sequences whose first byte is b. If map[b] is c where c is >= 0, then b by itself encodes the Unicode scalar value c. If map[b] is -1, then the byte sequence is malformed. If map[b] is -n, where n >= 2, then b is the first byte of an n-byte sequence that encodes a single Unicode scalar value. Byte sequences up to 6 bytes in length are supported.
The maximum character byte sequence length supported (6 bytes).
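The three cases of a CharacterMap entry can be sketched as a small classifier. This is a standalone illustration, not Poco code: the local CharacterMap typedef mirrors the one above, and LeadKind, LeadInfo, classifyLeadByte and fillToyMap are hypothetical names introduced only for this example. The toy map uses a simplified UTF-8-like layout and is not the table any shipped encoding uses.

```cpp
#include <cassert>

// Local stand-in that mirrors the typedef documented above.
typedef int CharacterMap[256];

// Hypothetical helper types for this sketch.
enum class LeadKind { SingleByte, Malformed, MultiByte };

struct LeadInfo {
    LeadKind kind;
    int value;   // Unicode scalar value for SingleByte, otherwise -1
    int length;  // sequence length in bytes, 0 if malformed
};

// Interprets map[b] according to the rules described above.
LeadInfo classifyLeadByte(const CharacterMap& map, unsigned char b)
{
    int entry = map[b];
    if (entry >= 0)
        return {LeadKind::SingleByte, entry, 1};  // b alone encodes entry
    if (entry == -1)
        return {LeadKind::Malformed, -1, 0};      // b cannot start a sequence
    assert(entry >= -6);                          // at most 6 bytes are supported
    return {LeadKind::MultiByte, -1, -entry};     // b starts an n-byte sequence
}

// Fills a toy map with a simplified UTF-8-like layout (illustration only).
void fillToyMap(CharacterMap& map)
{
    for (int b = 0; b < 256; ++b) {
        if (b < 0x80)                    map[b] = b;   // ASCII maps to itself
        else if (b >= 0xC2 && b <= 0xDF) map[b] = -2;  // 2-byte lead
        else if (b >= 0xE0 && b <= 0xEF) map[b] = -3;  // 3-byte lead
        else                             map[b] = -1;  // everything else rejected
    }
}
```

A decoder would call such a classifier on the first byte and hand multibyte sequences to convert().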
virtual ~TextEncoding();
Destroys the encoding.
virtual const CharacterMap & characterMap() const = 0;
Returns the CharacterMap for the encoding. Because characterMap() can be called frequently, it should do nothing more than return a map kept in a static member. If the map is built at runtime, this should be done in the constructor.
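One way to follow this advice is sketched below. It is a standalone illustration that does not include the Poco headers: EncodingBase is a hypothetical stand-in for the abstract interface, and ToyAsciiEncoding and its nested Table type are invented for the example. The map lives in a static member whose constructor fills it once, so characterMap() only returns a reference.

```cpp
// Local stand-in that mirrors the documented typedef.
typedef int CharacterMap[256];

// Hypothetical stand-in for the abstract base class (only the part needed here).
class EncodingBase {
public:
    virtual ~EncodingBase() {}
    virtual const CharacterMap& characterMap() const = 0;
};

// Toy 7-bit encoding: bytes below 0x80 map to themselves, the rest are malformed.
class ToyAsciiEncoding : public EncodingBase {
public:
    // Cheap to call: no work is done here, the static map is returned as is.
    const CharacterMap& characterMap() const override { return _table.map; }

private:
    // Wrapper whose constructor builds the map once at static initialisation.
    struct Table {
        CharacterMap map;
        Table()
        {
            for (int b = 0; b < 256; ++b)
                map[b] = (b < 0x80) ? b : -1;
        }
    };
    static const Table _table;
};

// Definition of the static member holding the map.
const ToyAsciiEncoding::Table ToyAsciiEncoding::_table;
```

Every instance of the class shares the same table, so repeated calls return a reference to the same array.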
virtual int convert(
const unsigned char * bytes
) const;
The convert function is used to convert multibyte sequences; bytes will point to a byte sequence of n bytes where characterMap()[*bytes] == -n.
The convert function must return the Unicode scalar value represented by this byte sequence or -1 if the byte sequence is malformed. The default implementation returns (int) bytes[0].
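As an illustration of this contract, the following standalone sketch decodes a UTF-8 multibyte sequence. It is not the library's implementation: decodeSequence and sequenceLength are hypothetical free functions, and sequenceLength stands in for looking up -n in the character map. Checks for overlong forms and surrogates are left out to keep the sketch short.

```cpp
// Hypothetical stand-in for reading the sequence length n from the
// character map entry of the lead byte. Returns 0 for a byte that does
// not start a multibyte sequence.
int sequenceLength(unsigned char lead)
{
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Returns the Unicode scalar value encoded by the sequence starting at
// bytes, or -1 if the sequence is malformed.
int decodeSequence(const unsigned char* bytes)
{
    int n = sequenceLength(bytes[0]);
    if (n < 2) return -1;

    // Keep only the payload bits of the lead byte.
    int ch = bytes[0] & (0xFF >> (n + 1));

    // Each continuation byte must look like 10xxxxxx and adds 6 bits.
    for (int i = 1; i < n; ++i) {
        if ((bytes[i] & 0xC0) != 0x80) return -1;
        ch = (ch << 6) | (bytes[i] & 0x3F);
    }
    return ch;
}
```

For the three bytes E2 82 AC this yields 0x20AC (the euro sign); a sequence whose second byte is not a continuation byte yields -1.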
virtual int convert(
int ch,
unsigned char * bytes,
int length
) const;
Transforms the Unicode character ch into the encoding's byte sequence and returns the number of bytes used. The method must not write more than length bytes. bytes can be null and length 0; in this case only the number of bytes required to represent ch is returned. If the character cannot be converted, 0 is returned and the byte sequence remains unchanged. The default implementation simply returns 0.
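The encoding direction can be sketched the same way. The function below is a standalone UTF-8 example with the same parameter list as this overload; encodeUtf8 is a hypothetical name, not a library function. One detail is an assumption of this sketch rather than something stated above: when the buffer is non-null but too small, nothing is written and the required byte count is still returned.

```cpp
// Encodes ch as UTF-8. Returns the number of bytes the character needs,
// or 0 if ch cannot be encoded. Writes only when bytes is non-null and
// length is large enough (assumption of this sketch).
int encodeUtf8(int ch, unsigned char* bytes, int length)
{
    // Negative values, surrogates and values above U+10FFFF are rejected.
    if (ch < 0 || ch > 0x10FFFF) return 0;
    if (ch >= 0xD800 && ch <= 0xDFFF) return 0;

    int needed;
    if (ch <= 0x7F)        needed = 1;
    else if (ch <= 0x7FF)  needed = 2;
    else if (ch <= 0xFFFF) needed = 3;
    else                   needed = 4;

    if (bytes && length >= needed) {
        // Lead-byte prefix bits, indexed by sequence length.
        static const unsigned char leadMarks[] = {0x00, 0x00, 0xC0, 0xE0, 0xF0};

        // Fill continuation bytes from the end, 6 bits at a time.
        for (int i = needed - 1; i > 0; --i) {
            bytes[i] = static_cast<unsigned char>(0x80 | (ch & 0x3F));
            ch >>= 6;
        }
        bytes[0] = static_cast<unsigned char>(leadMarks[needed] | ch);
    }
    return needed;
}
```

A caller can first pass a null pointer to learn the required size, then call again with a buffer of at least that many bytes.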