Accessing Locale Information

There are several ways to access locale information. The simplest way is to let the C library itself do the work. Several of the functions in this library implicitly access the locale data, and use what information is provided by the currently selected locale. This is how the locale model is meant to work normally.

As an example, take the strftime function, which is meant to nicely format date and time information (the section called “Formatting Calendar Time”). Part of the standard information contained in the LC_TIME category is the names of the months and weekdays. Instead of requiring the programmer to provide the translations, the strftime function does this all by itself. For instance, %A in the format string is replaced by the appropriate weekday name of the locale currently selected by LC_TIME. This is an easy example, and wherever possible functions do things automatically in this way.

But quite often there are situations when there is simply no function to perform the task, or it is simply not possible to do the work automatically. For these cases it is necessary to access the information in the locale directly. To do this the C library provides two functions: localeconv and nl_langinfo. The former is part of ISO C and therefore portable, but has a brain-damaged interface. The latter is part of the Unix interface and is portable insofar as the system follows the Unix standards.

localeconv: It is portable but …

Together with the setlocale function the ISO C people invented the localeconv function. It is a masterpiece of poor design. It is expensive to use, not extendable, and not generally usable as it provides access to only LC_MONETARY and LC_NUMERIC related information. Nevertheless, if it is applicable to a given situation it should be used since it is very portable. The function strfmon formats monetary amounts according to the selected locale using this information.

struct lconv * localeconv (void)

The localeconv function returns a pointer to a structure whose components contain information about how numeric and monetary values should be formatted in the current locale.

You should not modify the structure or its contents. The structure might be overwritten by subsequent calls to localeconv, or by calls to setlocale, but no other function in the library overwrites this value.

struct lconv

localeconv's return value is of this data type. Its elements are described in the following subsections.

If a member of the structure struct lconv has type char, and the value is CHAR_MAX, it means that the current locale has no value for that parameter.
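A minimal sketch of reading the structure, under the rules just stated: the result is treated as read-only, string values are copied out before any later call can overwrite them, and CHAR_MAX in a char member is treated as "no value". The helper names local_frac_digits and copy_decimal_point are hypothetical.

```c
#include <limits.h>
#include <locale.h>
#include <stdio.h>

/* Return the frac_digits value of the current locale, or -1 if the
   locale has no value for it (the member is CHAR_MAX).  */
int
local_frac_digits (void)
{
  const struct lconv *lc = localeconv ();
  return lc->frac_digits == CHAR_MAX ? -1 : lc->frac_digits;
}

/* Copy the decimal-point string into BUF, because the structure may
   be overwritten by later localeconv or setlocale calls.  Returns the
   length of the string, as snprintf does.  */
int
copy_decimal_point (char *buf, size_t len)
{
  const struct lconv *lc = localeconv ();
  return snprintf (buf, len, "%s", lc->decimal_point);
}
```

What a caller does with the -1 result is a policy decision; the sketch only reports that the locale is silent on the matter.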

Generic Numeric Formatting Parameters

These are the standard members of struct lconv; there may be others.

char *decimal_point, char *mon_decimal_point

These are the decimal-point separators used in formatting non-monetary and monetary quantities, respectively. In the C locale, the value of decimal_point is ".", and the value of mon_decimal_point is "".

char *thousands_sep, char *mon_thousands_sep

These are the separators used to delimit groups of digits to the left of the decimal point in formatting non-monetary and monetary quantities, respectively. In the C locale, both members have a value of "" (the empty string).

char *grouping, char *mon_grouping

These are strings that specify how to group the digits to the left of the decimal point. grouping applies to non-monetary quantities and mon_grouping applies to monetary quantities. Use either thousands_sep or mon_thousands_sep to separate the digit groups. Each element of these strings is to be interpreted as an integer value of type char. Successive numbers (from left to right) give the sizes of successive groups (from right to left, starting at the decimal point). The last element is either 0, in which case the previous element is used over and over again for all the remaining groups, or CHAR_MAX, in which case there is no more grouping--or, put another way, any remaining digits form one large group without separators.

For example, if grouping is "\04\03\02", the correct grouping for the number 123456787654321 is 12, 34, 56, 78, 765, 4321. This uses a group of 4 digits at the end, preceded by a group of 3 digits, preceded by groups of 2 digits (as many as needed). With a separator of ,, the number would be printed as 12,34,56,78,765,4321.

A value of "\03" indicates repeated groups of three digits, as normally used in the U.S.

In the standard C locale, both grouping and mon_grouping have a value of "". This value specifies no grouping at all.

char int_frac_digits, char frac_digits

These are small integers indicating how many fractional digits (to the right of the decimal point) should be displayed in a monetary value in international and local formats, respectively. (Most often, both members have the same value.)

In the standard C locale, both of these members have the value CHAR_MAX, meaning "unspecified". The ISO standard doesn't say what to do when you find this value; we recommend printing no fractional digits. (This locale also specifies the empty string for mon_decimal_point, so printing any fractional digits would be confusing!)
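The grouping rules described in this section can be sketched as a small routine that walks the digit string from right to left. The function name group_digits and its interface are assumptions made for this illustration; they are not part of the C library.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Insert SEP into the digit string DIGITS as directed by GROUPING,
   which has the format of the grouping and mon_grouping members.
   The result is stored in OUT.  Returns 0 on success, or -1 if OUT
   (of size OUTLEN) is too small.  (Hypothetical helper.)  */
int
group_digits (const char *digits, const char *grouping, const char *sep,
              char *out, size_t outlen)
{
  size_t left = strlen (digits);  /* Digits not yet copied.  */
  size_t seplen = strlen (sep);
  size_t end = outlen;            /* OUT is filled from the right.  */
  size_t size = 0;                /* Group size; 0 means "no limit".  */
  int more = 1;                   /* Still reading GROUPING?  */

  if (outlen == 0)
    return -1;
  out[--end] = '\0';

  while (left > 0)
    {
      size_t take;

      if (more)
        {
          if (*grouping == '\0')
            more = 0;             /* Repeat the previous size.  */
          else if (*grouping == CHAR_MAX)
            {
              size = 0;           /* One large final group.  */
              more = 0;
            }
          else
            size = (size_t) (unsigned char) *grouping++;
        }

      take = (size == 0 || size > left) ? left : size;
      if (end < take)
        return -1;
      end -= take;
      memcpy (out + end, digits + left - take, take);
      left -= take;

      if (left > 0)
        {
          if (end < seplen)
            return -1;
          end -= seplen;
          memcpy (out + end, sep, seplen);
        }
    }

  /* Move the finished string, including its terminator, to the
     start of OUT.  */
  memmove (out, out + end, outlen - end);
  return 0;
}
```

With the grouping "\04\03\02" and the separator "," this turns 123456787654321 into 12,34,56,78,765,4321, matching the example above. An empty grouping string leaves the digits untouched.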

Printing the Currency Symbol

These members of the struct lconv structure specify how to print the symbol to identify a monetary value--the international analog of $ for US dollars.

Each country has two standard currency symbols. The local currency symbol is used commonly within the country, while the international currency symbol is used internationally to refer to that country's currency when it is necessary to indicate the country unambiguously.

For example, many countries use the dollar as their monetary unit, and when dealing with international currencies it's important to specify that one is dealing with (say) Canadian dollars instead of U.S. dollars or Australian dollars. But when the context is known to be Canada, there is no need to make this explicit--dollar amounts are implicitly assumed to be in Canadian dollars.

char *currency_symbol

The local currency symbol for the selected locale.

In the standard C locale, this member has a value of "" (the empty string), meaning "unspecified". The ISO standard doesn't say what to do when you find this value; we recommend you simply print the empty string as you would print any other string pointed to by this variable.

char *int_curr_symbol

The international currency symbol for the selected locale.

The value of int_curr_symbol should normally consist of a three-letter abbreviation determined by the international standard [ISO 4217 Codes for the Representation of Currency and Funds], followed by a one-character separator (often a space).

In the standard C locale, this member has a value of "" (the empty string), meaning "unspecified". We recommend you simply print the empty string as you would print any other string pointed to by this variable.

char p_cs_precedes, char n_cs_precedes, char int_p_cs_precedes, char int_n_cs_precedes

These members are 1 if the currency_symbol or int_curr_symbol strings should precede the value of a monetary amount, or 0 if the strings should follow the value. The p_cs_precedes and int_p_cs_precedes members apply to positive amounts (or zero), and the n_cs_precedes and int_n_cs_precedes members apply to negative amounts.

In the standard C locale, all of these members have a value of CHAR_MAX, meaning "unspecified". The ISO standard doesn't say what to do when you find this value. We recommend printing the currency symbol before the amount, which is right for most countries. In other words, treat all nonzero values alike in these members.

The members with the int_ prefix apply to the int_curr_symbol while the other two apply to currency_symbol.

char p_sep_by_space, char n_sep_by_space, char int_p_sep_by_space, char int_n_sep_by_space

These members are 1 if a space should appear between the currency_symbol or int_curr_symbol strings and the amount, or 0 if no space should appear. The p_sep_by_space and int_p_sep_by_space members apply to positive amounts (or zero), and the n_sep_by_space and int_n_sep_by_space members apply to negative amounts.

In the standard C locale, all of these members have a value of CHAR_MAX, meaning "unspecified". The ISO standard doesn't say what you should do when you find this value; we suggest you treat it as 1 (print a space). In other words, treat all nonzero values alike in these members.

The members with the int_ prefix apply to the int_curr_symbol while the other two apply to currency_symbol. There is one specialty with the int_curr_symbol, though. Since all legal values contain a space at the end of the string, one either prints this space (if the currency symbol must appear in front and must be separated) or has to avoid printing this character at all (especially when the symbol comes at the end of the string).

Printing the Sign of a Monetary Amount

These members of the struct lconv structure specify how to print the sign (if any) of a monetary value.

char *positive_sign, char *negative_sign

These are strings used to indicate positive (or zero) and negative monetary quantities, respectively.

In the standard C locale, both of these members have a value of "" (the empty string), meaning "unspecified".

The ISO standard doesn't say what to do when you find this value; we recommend printing positive_sign as you find it, even if it is empty. For a negative value, print negative_sign as you find it unless both it and positive_sign are empty, in which case print - instead. (Failing to indicate the sign at all seems rather unreasonable.)

char p_sign_posn, char n_sign_posn, char int_p_sign_posn, char int_n_sign_posn

These members are small integers that indicate how to position the sign for nonnegative and negative monetary quantities, respectively. (The string used by the sign is what was specified with positive_sign or negative_sign.) The possible values are as follows:

0

The currency symbol and quantity should be surrounded by parentheses.

1

Print the sign string before the quantity and currency symbol.

2

Print the sign string after the quantity and currency symbol.

3

Print the sign string right before the currency symbol.

4

Print the sign string right after the currency symbol.

CHAR_MAX

"Unspecified". All of these members have this value in the standard C locale.

The ISO standard doesn't say what you should do when the value is CHAR_MAX. We recommend you print the sign after the currency symbol.

The members with the int_ prefix apply to the int_curr_symbol while the other two apply to currency_symbol.

Pinpoint Access to Locale Data

When writing the X/Open Portability Guide the authors realized that the localeconv function is not enough to provide reasonable access to locale information. The information which was meant to be available in the locale (as later specified in the POSIX.1 standard) requires more ways to access it. Therefore the nl_langinfo function was introduced.

char * nl_langinfo (nl_item item)

The nl_langinfo function can be used to access individual elements of the locale categories. Unlike the localeconv function, which returns all the information, nl_langinfo lets the caller select what information it requires. This is very fast and it is not a problem to call this function multiple times.

A second advantage is that in addition to the numeric and monetary formatting information, information from the LC_TIME and LC_MESSAGES categories is available.

The type nl_item is defined in nl_types.h. The argument item is a numeric value defined in the header langinfo.h. The X/Open standard defines the following values:

CODESET

nl_langinfo returns a string with the name of the coded character set used in the selected locale.

ABDAY_1, ABDAY_2, ABDAY_3, ABDAY_4, ABDAY_5, ABDAY_6, ABDAY_7

nl_langinfo returns the abbreviated weekday name. ABDAY_1 corresponds to Sunday.

DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7

Similar to ABDAY_1 etc., but here the return value is the unabbreviated weekday name.

ABMON_1, ABMON_2, ABMON_3, ABMON_4, ABMON_5, ABMON_6, ABMON_7, ABMON_8, ABMON_9, ABMON_10, ABMON_11, ABMON_12

The return value is the abbreviated name of the month. ABMON_1 corresponds to January.

MON_1, MON_2, MON_3, MON_4, MON_5, MON_6, MON_7, MON_8, MON_9, MON_10, MON_11, MON_12

Similar to ABMON_1 etc., but here the month names are not abbreviated. Here the first value MON_1 also corresponds to January.

AM_STR, PM_STR

The return values are strings which can be used in the representation of time as an hour from 1 to 12 plus an am/pm specifier.

Note that in locales which do not use this time representation these strings might be empty, in which case the am/pm format cannot be used at all.

D_T_FMT

The return value can be used as a format string for strftime to represent time and date in a locale-specific way.

D_FMT

The return value can be used as a format string for strftime to represent a date in a locale-specific way.

T_FMT

The return value can be used as a format string for strftime to represent time in a locale-specific way.

T_FMT_AMPM

The return value can be used as a format string for strftime to represent time in the am/pm format.

Note that if the am/pm format does not make any sense for the selected locale, the return value might be the same as the one for T_FMT.

ERA

The return value represents the era used in the current locale.

Most locales do not define this value. An example of a locale which does define this value is the Japanese one. In Japan, the traditional representation of dates includes the name of the era corresponding to the then-emperor's reign.

Normally it should not be necessary to use this value directly. Specifying the E modifier in their format strings causes the strftime functions to use this information. The format of the returned string is not specified, and therefore you should not assume knowledge of it on different systems.

ERA_YEAR

The return value gives the year in the relevant era of the locale. As for ERA it should not be necessary to use this value directly.

ERA_D_T_FMT

This return value can be used as a format string for strftime to represent dates and times in a locale-specific era-based way.

ERA_D_FMT

This return value can be used as a format string for strftime to represent a date in a locale-specific era-based way.

ERA_T_FMT

This return value can be used as a format string for strftime to represent time in a locale-specific era-based way.

ALT_DIGITS

The return value is a representation of up to 100 values used to represent the values 0 to 99. As for ERA this value is not intended to be used directly, but instead indirectly through the strftime function. When the modifier O is used in a format which would otherwise use numerals to represent hours, minutes, seconds, weekdays, months, or weeks, the appropriate value for the locale is used instead.

INT_CURR_SYMBOL

The same as the value returned by localeconv in the int_curr_symbol element of the struct lconv.

CURRENCY_SYMBOL, CRNCYSTR

The same as the value returned by localeconv in the currency_symbol element of the struct lconv.

CRNCYSTR is a deprecated alias still required by Unix98.

MON_DECIMAL_POINT

The same as the value returned by localeconv in the mon_decimal_point element of the struct lconv.

MON_THOUSANDS_SEP

The same as the value returned by localeconv in the mon_thousands_sep element of the struct lconv.

MON_GROUPING

The same as the value returned by localeconv in the mon_grouping element of the struct lconv.

POSITIVE_SIGN

The same as the value returned by localeconv in the positive_sign element of the struct lconv.

NEGATIVE_SIGN

The same as the value returned by localeconv in the negative_sign element of the struct lconv.

INT_FRAC_DIGITS

The same as the value returned by localeconv in the int_frac_digits element of the struct lconv.

FRAC_DIGITS

The same as the value returned by localeconv in the frac_digits element of the struct lconv.

P_CS_PRECEDES

The same as the value returned by localeconv in the p_cs_precedes element of the struct lconv.

P_SEP_BY_SPACE

The same as the value returned by localeconv in the p_sep_by_space element of the struct lconv.

N_CS_PRECEDES

The same as the value returned by localeconv in the n_cs_precedes element of the struct lconv.

N_SEP_BY_SPACE

The same as the value returned by localeconv in the n_sep_by_space element of the struct lconv.

P_SIGN_POSN

The same as the value returned by localeconv in the p_sign_posn element of the struct lconv.

N_SIGN_POSN

The same as the value returned by localeconv in the n_sign_posn element of the struct lconv.

INT_P_CS_PRECEDES

The same as the value returned by localeconv in the int_p_cs_precedes element of the struct lconv.

INT_P_SEP_BY_SPACE

The same as the value returned by localeconv in the int_p_sep_by_space element of the struct lconv.

INT_N_CS_PRECEDES

The same as the value returned by localeconv in the int_n_cs_precedes element of the struct lconv.

INT_N_SEP_BY_SPACE

The same as the value returned by localeconv in the int_n_sep_by_space element of the struct lconv.

INT_P_SIGN_POSN

The same as the value returned by localeconv in the int_p_sign_posn element of the struct lconv.

INT_N_SIGN_POSN

The same as the value returned by localeconv in the int_n_sign_posn element of the struct lconv.

DECIMAL_POINT, RADIXCHAR

The same as the value returned by localeconv in the decimal_point element of the struct lconv.

The name RADIXCHAR is a deprecated alias still used in Unix98.

THOUSANDS_SEP, THOUSEP

The same as the value returned by localeconv in the thousands_sep element of the struct lconv.

The name THOUSEP is a deprecated alias still used in Unix98.

GROUPING

The same as the value returned by localeconv in the grouping element of the struct lconv.

YESEXPR

The return value is a regular expression which can be used with the regular expression functions (regcomp and regexec) to recognize a positive response to a yes/no question. The GNU C library provides the rpmatch function for easier handling in applications.

NOEXPR

The return value is a regular expression which can be used with the regular expression functions (regcomp and regexec) to recognize a negative response to a yes/no question.

YESSTR

The return value is a locale-specific translation of the positive response to a yes/no question.

The use of this symbol is deprecated since it is a very special case of message translation, and is better handled by the message translation functions (Chapter 9).

NOSTR

The return value is a locale-specific translation of the negative response to a yes/no question. What is said for YESSTR is also true here: the use of this symbol is deprecated, and message translation should be used instead.

The file langinfo.h defines a lot more symbols but none of them is official. Using them is not portable, and the format of the return values might change. Therefore we recommend that you not use them.

Note that the return value for any valid argument can be used in all situations (with the possible exception of the am/pm time formatting codes). If the user has not selected any locale for the appropriate category, nl_langinfo returns the information from the "C" locale. It is therefore possible to use this function as shown in the example below.

If the argument item is not valid, a pointer to an empty string is returned.
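Because nl_langinfo always returns a usable string, its results can be compared directly with the corresponding members of struct lconv. The following sketch checks the equivalence stated above for two items. It uses the names RADIXCHAR and THOUSEP on the assumption that they are visible without extra feature macros, which may not hold for DECIMAL_POINT and THOUSANDS_SEP on every system. The helper name numeric_items_agree is hypothetical.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Return 1 if nl_langinfo and localeconv report the same decimal
   point and thousands separator for the current locale, 0 otherwise.
   (Hypothetical helper.)  */
int
numeric_items_agree (void)
{
  const struct lconv *lc = localeconv ();

  return (strcmp (nl_langinfo (RADIXCHAR), lc->decimal_point) == 0
          && strcmp (nl_langinfo (THOUSEP), lc->thousands_sep) == 0);
}
```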

An example of nl_langinfo usage is a function which has to print a given date and time in a locale-specific way. At first one might think that, since strftime internally uses the locale information, writing something like the following is enough:

#include <time.h>

size_t
i18n_time_n_data (char *s, size_t len, const struct tm *tp)
{
  /* %D always expands to %m/%d/%y, whatever the locale.  */
  return strftime (s, len, "%X %D", tp);
}

The format contains no weekday or month names and therefore is internationally usable. Wrong! The output produced is something like "hh:mm:ss MM/DD/YY". This format is only recognizable in the USA. Other countries use different formats. Therefore the function should be rewritten like this:

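#include <langinfo.h>
#include <time.h>

size_t
i18n_time_n_data (char *s, size_t len, const struct tm *tp)
{
  /* D_T_FMT is the date and time format of the current locale.  */
  return strftime (s, len, nl_langinfo (D_T_FMT), tp);
}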

Now it uses the date and time format of the locale selected when the program runs. If the user selects the locale correctly there should never be a misunderstanding over the time and date format.