#include <convert.h>
Public Member Functions | |
UnicodeConverter () | |
Creates a Unicode conversion object that defaults to converting between LATIN1 and Unicode. | |
UnicodeConverter (const char *name, UErrorCode &err) | |
Creates a Unicode conversion object from the specified codepage name. | |
UnicodeConverter (const UnicodeString &name, UErrorCode &err) | |
Creates a UnicodeConverter object with the names specified as unicode strings. | |
UnicodeConverter (int32_t codepageNumber, UConverterPlatform platform, UErrorCode &err) | |
Creates a Unicode conversion object from the codepage ID number. | |
void | fromUnicodeString (char *target, int32_t &targetSize, const UnicodeString &source, UErrorCode &err) const |
Transcodes the source UnicodeString to the target string in a codepage encoding with the specified Unicode converter. | |
void | toUnicodeString (UnicodeString &target, const char *source, int32_t sourceSize, UErrorCode &err) const |
Transcode the source string in codepage encoding to the target string in Unicode encoding. | |
void | fromUnicode (char *&target, const char *targetLimit, const UChar *&source, const UChar *sourceLimit, int32_t *offsets, UBool flush, UErrorCode &err) |
Transcodes an array of unicode characters to an array of codepage characters. | |
void | toUnicode (UChar *&target, const UChar *targetLimit, const char *&source, const char *sourceLimit, int32_t *offsets, UBool flush, UErrorCode &err) |
Converts an array of codepage characters into an array of unicode characters. | |
int8_t | getMaxBytesPerChar (void) const |
Returns the maximum length of bytes used by a character. | |
int8_t | getMinBytesPerChar (void) const |
Returns the minimum byte length for characters in this codepage. | |
UConverterType | getType (void) const |
Gets the type of conversion associated with the converter, e.g. SBCS or MBCS. | |
void | getStarters (UBool starters[256], UErrorCode &err) const |
Gets the "starter" bytes for converters of type MBCS. Sets err to U_ILLEGAL_ARGUMENT_ERROR if the converter is not MBCS. | |
void | getSubstitutionChars (char *subChars, int8_t &len, UErrorCode &err) const |
Fills in the output parameter, subChars, with the substitution characters as multiple bytes. | |
void | setSubstitutionChars (const char *subChars, int8_t len, UErrorCode &err) |
Sets the substitution chars when converting from unicode to a codepage. | |
void | resetState (void) |
Resets the state of stateful conversion to the default state. | |
const char * | getName (UErrorCode &err) const |
Gets the name of the converter (zero-terminated). | |
int32_t | getCodepage (UErrorCode &err) const |
Gets a codepage number associated with the converter. | |
void | getMissingCharAction (UConverterToUCallback *action, const void **context) const |
Returns the action currently set for when a character from a codepage is missing, a byte sequence is illegal, etc. | |
void | getMissingUnicodeAction (UConverterFromUCallback *action, const void **context) const |
Returns the action currently set for when a Unicode character is missing, there is an unpaired surrogate, etc. | |
void | setMissingCharAction (UConverterToUCallback newAction, const void *newContext, UConverterToUCallback *oldAction, const void **oldContext, UErrorCode &err) |
Sets the action taken when a character from a codepage is missing. | |
void | setMissingUnicodeAction (UConverterFromUCallback newAction, const void *newContext, UConverterFromUCallback *oldAction, const void **oldContext, UErrorCode &err) |
Sets the action taken when a Unicode character is missing. | |
void | getDisplayName (const Locale &displayLocale, UnicodeString &displayName) const |
Returns the localized name of the UnicodeConverter. If for any reason it is unavailable, the internal name is returned instead. | |
UConverterPlatform | getCodepagePlatform (UErrorCode &err) const |
Returns the platform (the ICU-defined UConverterPlatform enum) of the UnicodeConverter. | |
UnicodeConverter & | operator= (const UnicodeConverter &that) |
UBool | operator== (const UnicodeConverter &that) const |
UBool | operator!= (const UnicodeConverter &that) const |
UnicodeConverter (const UnicodeConverter &that) | |
void | fixFileSeparator (UnicodeString &source) const |
Fixes the backslash character mismapping. | |
UBool | isAmbiguous (void) const |
Determines if the converter contains ambiguous mappings of the same character or not. | |
Static Public Member Functions | |
const char *const * | getAvailableNames (int32_t &num, UErrorCode &err) |
Returns the available names. | |
int32_t | flushCache (void) |
Iterates through every cached converter and frees all the unused ones. |
Use the more powerful C conversion API with the UConverter type and ucnv_... functions.
There are also two new functions in ICU 2.0 that convert a UnicodeString and extract a UnicodeString using a UConverter (search unistr.h for UConverter). They replace the fromUnicodeString() and toUnicodeString() functions here. All other UnicodeConverter functions are basically aliases of C API functions.
Old documentation:
UnicodeConverter is a C++ wrapper class for UConverter. You need one UnicodeConverter object in place of one UConverter object. For details on the API and implementation of the codepage converter interface see ucnv.h.
|
Creates a Unicode conversion object that defaults to converting between LATIN1 and Unicode.
|
|
Creates a Unicode conversion object from the specified codepage name. The name string is in ASCII format.
|
|
Creates a UnicodeConverter object with the names specified as unicode strings. The name should be limited to the ASCII-7 alphanumerics. Dash and underscore characters are allowed for readability, but are ignored in the search.
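The rule that dashes and underscores are ignored in the search can be pictured as a normalization step applied before names are compared. The following is a minimal sketch under that reading only; normalizeConverterName and sameConverterName are hypothetical helpers, not ICU functions, and the sketch deliberately does nothing beyond dropping the two characters named above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an ICU function): drops '-' and '_' so that
// names differing only in those characters produce the same lookup key.
std::string normalizeConverterName(const std::string &name) {
    std::string key;
    key.reserve(name.size());
    for (char c : name) {
        if (c != '-' && c != '_') {
            key.push_back(c);
        }
    }
    return key;
}

// Hypothetical helper: true when two names would match in such a search.
bool sameConverterName(const std::string &a, const std::string &b) {
    return normalizeConverterName(a) == normalizeConverterName(b);
}

// Usage: both spellings reduce to the same key.
void exampleNameLookup() {
    assert(sameConverterName("ISO_8859-1", "ISO88591"));
}
```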
|
|
Creates a Unicode conversion object from the codepage ID number.
|
|
Fixes the backslash character mismapping. For example, in SJIS, the backslash character in the ASCII portion is also used to represent the yen currency sign. When mapping from Unicode character 0x005C, it's unclear whether to map the character back to yen or backslash in SJIS. This function will take the input buffer and replace all the yen sign characters with backslash. This is necessary when the user tries to open a file with the input buffer on Windows.
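The replacement described above amounts to a single pass over the Unicode buffer. The sketch below shows only that pass, on a std::u16string rather than a UnicodeString; replaceYenWithBackslash is a hypothetical stand-in and does not reproduce any check the real function may perform on the converter itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the behaviour described above, not ICU's code:
// every yen sign (U+00A5) in the buffer becomes a backslash (U+005C).
void replaceYenWithBackslash(std::u16string &buffer) {
    for (char16_t &unit : buffer) {
        if (unit == u'\u00A5') {
            unit = u'\\';
        }
    }
}

// Usage: a path decoded with yen signs becomes usable as a Windows path.
void exampleFixSeparator() {
    std::u16string path = u"C:\u00A5temp\u00A5log.txt";
    replaceYenWithBackslash(path);
    assert(path == u"C:\\temp\\log.txt");
}
```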
|
|
Iterates through every cached converter and frees all the unused ones.
|
|
Transcodes an array of Unicode characters to an array of codepage characters. The source pointer is an I/O parameter: it starts out pointing at the place to begin translating and ends up pointing just after the first sequence it encounters that is semantically invalid. If T_UnicodeConverter_setMissingCharAction is called with an action other than STOP before this API is called, consumed and source should point to the same place (unless target ends with an incomplete sequence of bytes and flush is FALSE).
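The in/out pointer contract is easier to follow with a toy converter. The sketch below is not ICU code: toLatin1 and the Status enum are hypothetical, and the UTF-16 to Latin-1 mapping is chosen only because it fits in a few lines. It illustrates how target and source are advanced in place, so that a caller can drain a full buffer and call again, and how source is left just past an offending unit.

```cpp
#include <cassert>

// Hypothetical status codes for this sketch (not ICU's UErrorCode values).
enum class Status { Ok, BufferFull, Unmappable };

// Toy UTF-16 -> Latin-1 converter that mimics the pointer contract:
// both `target` and `source` are advanced in place as data is consumed.
Status toLatin1(char *&target, const char *targetLimit,
                const char16_t *&source, const char16_t *sourceLimit) {
    while (source < sourceLimit) {
        if (*source > 0xFF) {
            ++source;                  // leave source just past the bad unit
            return Status::Unmappable;
        }
        if (target >= targetLimit) {
            return Status::BufferFull; // caller drains target and calls again
        }
        *target++ = static_cast<char>(*source++);
    }
    return Status::Ok;
}

void exampleChunkedConversion() {
    const char16_t text[] = {u'a', u'b', u'c'};
    const char16_t *src = text;
    char out[2];
    char *dst = out;

    // First call fills the 2-byte buffer and stops with one unit left.
    assert(toLatin1(dst, out + 2, src, text + 3) == Status::BufferFull);
    assert(src == text + 2 && dst == out + 2);

    // After draining the buffer, the second call resumes where it stopped.
    dst = out;
    assert(toLatin1(dst, out + 2, src, text + 3) == Status::Ok);
    assert(src == text + 3 && out[0] == 'c');
}
```

Passing the pointers by reference is what lets a caller process input of any length with a fixed-size target buffer.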
|
|
Transcodes the source UnicodeString to the target string in a codepage encoding with the specified Unicode converter. For example, if a Unicode to/from JIS converter is specified, the source string in Unicode will be transcoded to JIS encoding. The result will be stored in JIS encoding.
|
|
Returns the available names. The list is lazily evaluated, and the library owns the storage.
|
|
Gets a codepage number associated with the converter. This is not guaranteed to be the one used to create the converter. Some converters do not represent IBM registered codepages and return zero for the codepage number. The error code fill-in parameter indicates if the codepage number is available.
|
|
Returns the platform (the ICU-defined UConverterPlatform enum) of the UnicodeConverter.
|
|
Returns the localized name of the UnicodeConverter. If for any reason it is unavailable, the internal name is returned instead.
|
|
Returns the maximum number of bytes used by a character. This varies between 1 and 4.
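One common use of this value is a first estimate of how large a target buffer should be before converting from Unicode. The sketch below shows only the arithmetic; estimateTargetBytes is a hypothetical helper, and the product should be treated as a starting point rather than a guarantee, since stateful encodings may also need room for shift or escape sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rough estimate of target bytes for `unitCount` UTF-16 code units, given
// the value a converter reports from getMaxBytesPerChar(). Hypothetical
// helper; it does not account for shift or escape sequences.
std::size_t estimateTargetBytes(std::size_t unitCount, std::int8_t maxBytesPerChar) {
    assert(maxBytesPerChar >= 1 && maxBytesPerChar <= 4); // range stated above
    return unitCount * static_cast<std::size_t>(maxBytesPerChar);
}
```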
|
|
Returns the minimum byte length for characters in this codepage. This is either 1 or 2 for all supported codepages.
|
|
Returns the action currently set for when a character from a codepage is missing, a byte sequence is illegal, etc.
|
|
Returns the action currently set for when a Unicode character is missing, there is an unpaired surrogate, etc.
|
|
Gets the name of the converter (zero-terminated). The name will be the internal name of the converter.
|
|
Gets the "starter" bytes for converters of type MBCS. Fills in an array of booleans, with the value of the byte as the offset into the array. On return, if TRUE is found at offset 0x20, it means that the byte 0x20 is a starter byte in this converter.
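The array layout can be illustrated with a toy table indexed by byte value. In the sketch below, fillToyStarters and countCharacters are hypothetical helpers, not ICU functions, and the lead-byte ranges are an assumed Shift-JIS-like layout used only for illustration; a real converter reports its own table.

```cpp
#include <cassert>

// Fills `starters` the way the text describes: starters[b] is true when
// byte value b begins a multi-byte character. The ranges 0x81-0x9F and
// 0xE0-0xFC are an assumed Shift-JIS-like layout, for illustration only.
void fillToyStarters(bool (&starters)[256]) {
    for (int b = 0; b < 256; ++b) {
        starters[b] = (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }
}

// Counts characters in a byte string by skipping the trail byte that
// follows each starter byte (two-byte characters only in this toy model).
int countCharacters(const unsigned char *bytes, int length,
                    const bool (&starters)[256]) {
    int count = 0;
    for (int i = 0; i < length; ++count) {
        i += starters[bytes[i]] ? 2 : 1;
    }
    return count;
}

void exampleStarters() {
    bool starters[256];
    fillToyStarters(starters);
    assert(!starters[0x20]);   // in this toy table, 0x20 is not a starter
    const unsigned char text[] = {0x41, 0x82, 0xA0, 0x42}; // 1 + 2 + 1 bytes
    assert(countCharacters(text, 4, starters) == 3);
}
```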
|
|
Fills in the output parameter, subChars, with the substitution characters as multiple bytes.
|
|
Gets the type of conversion associated with the converter, e.g. SBCS, MBCS, DBCS, UTF8, UTF16_BE, UTF16_LE, ISO_2022, EBCDIC_STATEFUL, LATIN_1.
|
|
Determines if the converter contains ambiguous mappings of the same character or not.
|
|
Resets the state of stateful conversion to the default state. This is used in the case of error to restart a conversion from a known default state.
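Why a reset matters is clearest with an encoder that remembers a mode between calls. The sketch below is a toy model, not ICU code: ToyStatefulEncoder, the shift bytes 0x0E and 0x0F, and the rule that any unit above 0xFF needs double-byte mode are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Toy stateful encoder, loosely modelled on shift-out/shift-in schemes.
// The byte values 0x0E and 0x0F and the "unit > 0xFF means double-byte"
// rule are assumptions made for this sketch.
struct ToyStatefulEncoder {
    bool inDoubleByte = false;            // default state: single-byte mode

    void resetState() { inDoubleByte = false; }

    void encode(char16_t unit, std::string &out) {
        const bool needsDouble = unit > 0xFF;
        if (needsDouble != inDoubleByte) {
            out.push_back(needsDouble ? '\x0E' : '\x0F'); // announce mode switch
            inDoubleByte = needsDouble;
        }
        if (needsDouble) {
            out.push_back(static_cast<char>(unit >> 8));
        }
        out.push_back(static_cast<char>(unit & 0xFF));
    }
};

void exampleReset() {
    ToyStatefulEncoder enc;
    std::string first;
    enc.encode(0x3042, first);            // leaves the encoder in double-byte mode
    assert(first.size() == 3);

    enc.resetState();                     // abandon that stream after an error
    std::string second;
    enc.encode(u'A', second);
    assert(second == "A");                // no stray shift byte from the old state
}
```

Without the reset, the second output would begin with a shift byte that refers to a stream the caller has already discarded.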
|
|
Sets the action taken when a character from a codepage is missing (currently STOP or SUBSTITUTE).
|
|
Sets the action taken when a Unicode character is missing. Currently T_UnicodeConverter_MissingUnicodeAction is either STOP or SUBSTITUTE; SKIP, CLOSEST_MATCH, and ESCAPE_SEQ may be added in the future.
|
|
Sets the substitution characters used when converting from Unicode to a codepage. The substitution is specified as a string of 1-4 bytes and may contain a null byte. The fill-in parameter err receives the error status on return.
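Because the substitution sequence may contain a null byte, it has to be carried with an explicit length rather than treated as a C string. The sketch below shows that handling in isolation; SubstitutionChars and setSubstitution are hypothetical names, not ICU types or functions, and the boolean return stands in for the err fill-in parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical holder for a substitution sequence (not an ICU type).
struct SubstitutionChars {
    char bytes[4];
    std::int8_t len;
};

// Copies `len` bytes from `subChars`; returns false (leaving `out`
// untouched) when len is outside the 1-4 range stated above.
bool setSubstitution(SubstitutionChars &out, const char *subChars, std::int8_t len) {
    if (subChars == nullptr || len < 1 || len > 4) {
        return false;
    }
    // memcpy rather than strcpy: a 0x00 byte is legal data here.
    std::memcpy(out.bytes, subChars, static_cast<std::size_t>(len));
    out.len = len;
    return true;
}

void exampleSubstitution() {
    SubstitutionChars sub{};
    const char withNull[] = {'\x00', '\x3F'};   // a null byte followed by '?'
    assert(setSubstitution(sub, withNull, 2));
    assert(sub.len == 2 && sub.bytes[1] == '\x3F');
    assert(!setSubstitution(sub, withNull, 5)); // too long: rejected
}
```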
|
|
Converts an array of codepage characters into an array of Unicode characters. The source pointer is an I/O parameter: it starts out pointing at the place to begin translating and ends up pointing just after the first sequence of bytes it encounters that is semantically invalid. If T_UnicodeConverter_setMissingUnicodeAction is called with an action other than STOP before this API is called, consumed and source should point to the same place (unless target ends with an incomplete sequence of bytes and flush is FALSE).
|
|
Transcode the source string in codepage encoding to the target string in Unicode encoding. For example, if a Unicode to/from JIS converter is specified, the source string in JIS encoding will be transcoded to Unicode encoding. The result will be stored in Unicode encoding.
|