Most of the time, if you choose the proper C data type for each object in your program, you need not be concerned with just how it is represented or how many bits it uses. When you do need such information, the C language itself does not provide a way to get it. The header files limits.h and float.h contain macros which give you this information in full detail.
The most common reason that a program needs to know how many bits are in an integer type is for using an array of long int as a bit vector. You can access the bit at index n with
vector[n / LONGBITS] & (1L << (n % LONGBITS))
provided you define LONGBITS as the number of bits in a long int.
There is no operator in the C language that can give you the number of bits in an integer data type. But you can compute it from the macro CHAR_BIT, defined in the header file limits.h.
This is the number of bits in a char--eight, on most systems. The value has type int.
For any data type, written type below, you can compute the number of bits like this:
sizeof (type) * CHAR_BIT
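As a minimal sketch of the bit vector technique described above, the following helpers define LONGBITS from CHAR_BIT and then set, clear, and test individual bits. The helper names bit_set, bit_clear, and bit_test are hypothetical, and the sketch uses unsigned long rather than long int as an assumption, so that shifting into the top bit stays well defined.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Number of bits in an unsigned long, computed as sizeof (type) * CHAR_BIT. */
#define LONGBITS (sizeof (unsigned long) * CHAR_BIT)

/* The helper names below are hypothetical; they are not library functions. */

/* Set bit n of the vector. */
static void
bit_set (unsigned long *vector, size_t n)
{
  vector[n / LONGBITS] |= 1UL << (n % LONGBITS);
}

/* Clear bit n of the vector. */
static void
bit_clear (unsigned long *vector, size_t n)
{
  vector[n / LONGBITS] &= ~(1UL << (n % LONGBITS));
}

/* Return 1 if bit n of the vector is set, 0 otherwise. */
static int
bit_test (const unsigned long *vector, size_t n)
{
  return (int) ((vector[n / LONGBITS] >> (n % LONGBITS)) & 1UL);
}
```

A vector declared as unsigned long v[4] = {0} holds at least 128 bits, because an unsigned long has at least 32 bits.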
Suppose you need to store an integer value which can range from zero to one million. Which is the smallest type you can use? There is no general rule; it depends on the C compiler and target machine. You can use the MIN and MAX macros in limits.h to determine which type will work.
Each signed integer type has a pair of macros which give the smallest and largest values that it can hold. Each unsigned integer type has one such macro, for the maximum value; the minimum value is, of course, zero.
The values of these macros are all integer constant expressions. The MAX and MIN macros for char and short int types have values of type int. The MAX and MIN macros for the other types have values of the same type described by the macro--thus, ULONG_MAX has type unsigned long int.
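For the zero-to-one-million question above, one possible approach is to let the preprocessor compare the MAX macros against the required range. This is only a sketch; the typedef name million_t is hypothetical.

```c
#include <limits.h>

/* Pick the smallest standard unsigned type whose maximum is at least
   1000000.  The name million_t is hypothetical.  The last branch needs
   no test, because ISO C requires ULONG_MAX to be at least 4294967295.  */
#if UCHAR_MAX >= 1000000
typedef unsigned char million_t;
#elif USHRT_MAX >= 1000000
typedef unsigned short million_t;
#elif UINT_MAX >= 1000000
typedef unsigned int million_t;
#else
typedef unsigned long million_t;
#endif
```

The comparison happens at compile time, so the choice costs nothing at run time and adapts automatically when the program is rebuilt for a different target.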
SCHAR_MIN
This is the minimum value that can be represented by a signed char.
SCHAR_MAX, UCHAR_MAX
These are the maximum values that can be represented by a signed char and unsigned char, respectively.
CHAR_MIN
This is the minimum value that can be represented by a char. It's equal to SCHAR_MIN if char is signed, or zero otherwise.
CHAR_MAX
This is the maximum value that can be represented by a char. It's equal to SCHAR_MAX if char is signed, or UCHAR_MAX otherwise.
SHRT_MIN
This is the minimum value that can be represented by a signed short int. On most machines that the GNU C library runs on, short integers are 16-bit quantities.
SHRT_MAX, USHRT_MAX
These are the maximum values that can be represented by a signed short int and unsigned short int, respectively.
INT_MIN
This is the minimum value that can be represented by a signed int. On most machines that the GNU C system runs on, an int is a 32-bit quantity.
INT_MAX, UINT_MAX
These are the maximum values that can be represented by, respectively, the type signed int and the type unsigned int.
LONG_MIN
This is the minimum value that can be represented by a signed long int. On 32-bit machines that the GNU C system runs on, long integers are 32-bit quantities, the same size as int; on most 64-bit machines they are 64-bit quantities.
LONG_MAX, ULONG_MAX
These are the maximum values that can be represented by a signed long int and unsigned long int, respectively.
LLONG_MIN
This is the minimum value that can be represented by a signed long long int. On most machines that the GNU C system runs on, long long integers are 64-bit quantities.
LLONG_MAX, ULLONG_MAX
These are the maximum values that can be represented by a signed long long int and unsigned long long int, respectively.
WCHAR_MAX
This is the maximum value that can be represented by a wchar_t. See the section called “Introduction to Extended Characters”.
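The relationship between the limits of plain char and those of signed char and unsigned char can be checked directly. The sketch below is illustrative only; the function names char_is_signed and char_limits_consistent are hypothetical.

```c
#include <limits.h>

/* Return 1 if plain char is a signed type on this implementation. */
static int
char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* Return 1 if the limits of plain char match those of signed char
   (when char is signed) or of unsigned char (otherwise). */
static int
char_limits_consistent (void)
{
  if (char_is_signed ())
    return CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX;
  return CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX;
}
```

Testing CHAR_MIN < 0 is a common way to find out whether char is signed without relying on compiler-specific macros.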
The header file limits.h also defines some additional constants that parameterize various operating system and file system limits. These constants are described in Chapter 32.
The specific representation of floating point numbers varies from machine to machine. Because floating point numbers are represented internally as approximate quantities, algorithms for manipulating floating point data often need to take account of the precise details of the machine's floating point representation.
Some of the functions in the C library itself need this information; for example, the algorithms for printing and reading floating point numbers (Chapter 13) and for calculating trigonometric and irrational functions (Chapter 20) use it to avoid round-off error and loss of accuracy. User programs that implement numerical analysis techniques also often need this information in order to minimize or compute error bounds.
The header file float.h describes the format used by your machine.
This section introduces the terminology for describing floating point representations.
You are probably already familiar with most of these concepts in terms of scientific or exponential notation for floating point numbers. For example, the number 123456.0 could be expressed in exponential notation as 1.23456e+05, a shorthand notation indicating that the mantissa 1.23456 is multiplied by the base 10 raised to power 5.
More formally, the internal representation of a floating point number can be characterized in terms of the following parameters:
The base or radix for exponentiation, an integer greater than 1. This is a constant for a particular representation.
The exponent to which the base is raised. The upper and lower bounds of the exponent value are constants for a particular representation.
Sometimes, in the actual bits representing the floating point number, the exponent is biased by adding a constant to it, to make it always be represented as an unsigned quantity. This is only important if you have some reason to pick apart the bit fields making up the floating point number by hand, which is something for which the GNU library provides no support. So this is ignored in the discussion that follows.
The mantissa or significand is an unsigned integer which is a part of each floating point number.
The precision of the mantissa. If the base of the representation is b, then the precision is the number of base-b digits in the mantissa. This is a constant for a particular representation.
Many floating point representations have an implicit hidden bit in the mantissa. This is a bit which is present virtually in the mantissa, but not stored in memory because its value is always 1 in a normalized number. The precision figure (see above) includes any hidden bits.
Again, the GNU library provides no facilities for dealing with such low-level aspects of the representation.
The mantissa of a floating point number represents an implicit fraction whose denominator is the base raised to the power of the precision. Since the largest representable mantissa is one less than this denominator, the value of the fraction is always strictly less than 1. The mathematical value of a floating point number is then the product of this fraction, the sign, and the base raised to the exponent.
We say that the floating point number is normalized if the fraction is at least 1/b, where b is the base. In other words, the mantissa would be too large to fit if it were multiplied by the base. Non-normalized numbers are sometimes called denormal; they contain less precision than the representation normally can hold.
If the number is not normalized, then you can subtract 1 from the exponent while multiplying the mantissa by the base, and get another floating point number with the same value. Normalization consists of doing this repeatedly until the number is normalized. Two distinct normalized floating point numbers cannot be equal in value.
(There is an exception to this rule: if the mantissa is zero, it is considered normalized. Another exception happens on certain machines where the exponent is as small as the representation can hold. Then it is impossible to subtract 1 from the exponent, so a number may be normalized even if its fraction is less than 1/b.)
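The normalization rule can be illustrated with ordinary arithmetic on a double. The following sketch splits a positive, finite value into a fraction and an exponent so that the fraction is at least 1/b and less than 1, following the convention used in this section. The function name normalize_positive is hypothetical, and the loop works on the numeric value only; it does not inspect the bits of the representation.

```c
#include <float.h>   /* FLT_RADIX */

/* Split VALUE, which must be positive and finite, into a fraction and an
   exponent such that VALUE == fraction * FLT_RADIX^exponent and
   1/FLT_RADIX <= fraction < 1.  The name is hypothetical.  */
static double
normalize_positive (double value, int *exponent)
{
  const double base = FLT_RADIX;
  int e = 0;

  /* Fraction too large: divide by the base and add 1 to the exponent. */
  while (value >= 1.0)
    {
      value /= base;
      e++;
    }
  /* Fraction too small: multiply by the base and subtract 1 from the
     exponent, as in the normalization step described above. */
  while (value < 1.0 / base)
    {
      value *= base;
      e--;
    }
  *exponent = e;
  return value;
}
```

With a base of 2, the value 123456.0 comes out as the fraction 0.94189453125 with exponent 17, since 2 raised to the power 17 is 131072.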
These macro definitions can be accessed by including the header file float.h in your program.
Macro names starting with FLT_ refer to the float type, while names beginning with DBL_ refer to the double type and names beginning with LDBL_ refer to the long double type. (If GCC does not support long double as a distinct data type on a target machine then the values for the LDBL_ constants are equal to the corresponding constants for the double type.)
Of these macros, only FLT_RADIX is guaranteed to be a constant expression. The other macros listed here cannot be reliably used in places that require constant expressions, such as #if preprocessing directives or in the dimensions of static arrays.
Although the ISO C standard specifies minimum and maximum values for most of these parameters, the GNU C implementation uses whatever values describe the floating point representation of the target machine. So in principle GNU C actually satisfies the ISO C requirements only if the target machine is suitable. In practice, all the machines currently supported are suitable.
FLT_ROUNDS
This value characterizes the rounding mode for floating point addition. The following values indicate standard rounding modes:
-1    The mode is indeterminable.
0     Rounding is towards zero.
1     Rounding is to the nearest number.
2     Rounding is towards positive infinity.
3     Rounding is towards negative infinity.
Any other value represents a machine-dependent nonstandard rounding mode.
On most machines, the value is 1, in accordance with the IEEE standard for floating point.
Here is a table showing how certain values round for each possible value of FLT_ROUNDS, if the other aspects of the representation match the IEEE single-precision standard.
                0       1              2              3
 1.00000003     1.0     1.0            1.00000012     1.0
 1.00000007     1.0     1.00000012     1.00000012     1.0
-1.00000003    -1.0    -1.0           -1.0           -1.00000012
-1.00000007    -1.0    -1.00000012    -1.0           -1.00000012
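A program can turn the value of FLT_ROUNDS into a readable description. This is a small sketch; the function name rounding_mode_name is hypothetical, and the strings are chosen here for illustration.

```c
#include <float.h>   /* FLT_ROUNDS */

/* Map a FLT_ROUNDS value to a description of the rounding mode.
   The function name is hypothetical. */
static const char *
rounding_mode_name (int mode)
{
  switch (mode)
    {
    case -1: return "indeterminable";
    case 0:  return "toward zero";
    case 1:  return "to nearest";
    case 2:  return "toward positive infinity";
    case 3:  return "toward negative infinity";
    default: return "machine-dependent";
    }
}
```

Calling rounding_mode_name (FLT_ROUNDS) gives the description for the mode in effect at that point in the program.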
FLT_RADIX
This is the value of the base, or radix, of the exponent representation. This is guaranteed to be a constant expression, unlike the other macros described in this section. The value is 2 on all machines we know of except the IBM 360 and derivatives.
FLT_MANT_DIG
This is the number of base-FLT_RADIX digits in the floating point mantissa for the float data type. The following expression yields 1.0 (even though mathematically it should not) due to the limited number of mantissa digits:
float radix = FLT_RADIX;
1.0f + 1.0f / radix / radix / … / radix
where radix appears FLT_MANT_DIG times.
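The expression above can be written out as a loop. This sketch assumes the default round-to-nearest mode; the function name one_plus_tiny is hypothetical, and the volatile qualifiers are an added precaution so that the result is really rounded to float on machines that evaluate expressions in a wider format.

```c
#include <float.h>   /* FLT_RADIX, FLT_MANT_DIG */

/* Compute 1.0f + 1.0f / radix / ... / radix, with FLT_MANT_DIG divisions,
   and return the sum rounded to float.  The name is hypothetical. */
static float
one_plus_tiny (void)
{
  volatile float term = 1.0f;
  volatile float sum;
  int i;

  for (i = 0; i < FLT_MANT_DIG; i++)
    term = term / FLT_RADIX;   /* one division by the radix per digit */

  sum = 1.0f + term;           /* the volatile store forces float precision */
  return sum;
}
```

Under these assumptions the added term is too small to change the mantissa, so one_plus_tiny () compares equal to 1.0f.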
DBL_MANT_DIG, LDBL_MANT_DIG
This is the number of base-FLT_RADIX digits in the floating point mantissa for the data types double and long double, respectively.
FLT_DIG
This is the number of decimal digits of precision for the float data type. Technically, if p and b are the precision and base (respectively) for the representation, then the decimal precision q is the maximum number of decimal digits such that any floating point number with q base 10 digits can be rounded to a floating point number with p base b digits and back again, without change to the q decimal digits.
The value of this macro is supposed to be at least 6, to satisfy ISO C.
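The round-trip property behind FLT_DIG can be tried out with the standard conversion functions. In this sketch, which is illustrative only, a decimal string is converted to float with strtof and printed back with FLT_DIG significant digits; the function name survives_float_round_trip is hypothetical, and the input is assumed to be written the way the %g conversion would print it.

```c
#include <float.h>    /* FLT_DIG */
#include <stdio.h>    /* snprintf */
#include <stdlib.h>   /* strtof */
#include <string.h>   /* strcmp */

/* Return 1 if DECIMAL reads back unchanged after conversion to float and
   printing with FLT_DIG significant digits.  The name is hypothetical;
   DECIMAL is assumed to be in the form that %g produces. */
static int
survives_float_round_trip (const char *decimal)
{
  char buffer[64];
  float value = strtof (decimal, NULL);

  snprintf (buffer, sizeof buffer, "%.*g", FLT_DIG, (double) value);
  return strcmp (buffer, decimal) == 0;
}
```

A string with six significant digits, such as "0.123456", survives the trip; a string with more digits than FLT_DIG, such as "0.1234567" when FLT_DIG is 6, comes back shortened and therefore differs.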
DBL_DIG, LDBL_DIG
These are similar to FLT_DIG, but for the data types double and long double, respectively. The values of these macros are supposed to be at least 10.
FLT_MIN_EXP
This is the smallest possible exponent value for type float. More precisely, this is the minimum negative integer such that the value FLT_RADIX raised to one less than this power can be represented as a normalized floating point number of type float.
DBL_MIN_EXP, LDBL_MIN_EXP
These are similar to FLT_MIN_EXP, but for the data types double and long double, respectively.
FLT_MIN_10_EXP
This is the minimum negative integer such that 10 raised to this power can be represented as a normalized floating point number of type float. This is supposed to be -37 or even less.
DBL_MIN_10_EXP, LDBL_MIN_10_EXP
These are similar to FLT_MIN_10_EXP, but for the data types double and long double, respectively.
FLT_MAX_EXP
This is the largest possible exponent value for type float. More precisely, this is the maximum positive integer such that the value FLT_RADIX raised to one less than this power can be represented as a floating point number of type float.
DBL_MAX_EXP, LDBL_MAX_EXP
These are similar to FLT_MAX_EXP, but for the data types double and long double, respectively.
FLT_MAX_10_EXP
This is the maximum positive integer such that 10 raised to this power can be represented as a normalized floating point number of type float. This is supposed to be at least 37.
DBL_MAX_10_EXP, LDBL_MAX_10_EXP
These are similar to FLT_MAX_10_EXP, but for the data types double and long double, respectively.
FLT_MAX
The value of this macro is the maximum number representable in type float. It is supposed to be at least 1E+37. The value has type float.
The smallest representable number is -FLT_MAX.
DBL_MAX, LDBL_MAX
These are similar to FLT_MAX, but for the data types double and long double, respectively. The type of the macro's value is the same as the type it describes.
FLT_MIN
The value of this macro is the minimum normalized positive floating point number that is representable in type float. It is supposed to be no more than 1E-37.
DBL_MIN, LDBL_MIN
These are similar to FLT_MIN, but for the data types double and long double, respectively. The type of the macro's value is the same as the type it describes.
FLT_EPSILON
This is the difference between 1 and the smallest floating point number of type float that is greater than 1. Consequently, 1.0 + FLT_EPSILON != 1.0 is true. It's supposed to be no greater than 1E-5.
DBL_EPSILON, LDBL_EPSILON
These are similar to FLT_EPSILON, but for the data types double and long double, respectively. The type of the macro's value is the same as the type it describes. The values are not supposed to be greater than 1E-9.
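The effect of FLT_EPSILON is easy to observe. The sketch below assumes the default round-to-nearest mode; the function name float_sum_differs_from_one is hypothetical, and the volatile store is an added precaution so that the sum is rounded to float before the comparison.

```c
#include <float.h>   /* FLT_EPSILON */

/* Return 1 if 1.0f + DELTA, rounded to float, differs from 1.0f.
   The name is hypothetical. */
static int
float_sum_differs_from_one (float delta)
{
  volatile float sum = 1.0f + delta;   /* force rounding to float */
  return sum != 1.0f;
}
```

Adding FLT_EPSILON itself changes the result, while a much smaller value such as FLT_EPSILON / 4 is lost in rounding and leaves the sum equal to 1.0f.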
Here is an example showing how the floating type measurements come out for the most common floating point representation, specified by the [IEEE Standard for Binary Floating Point Arithmetic (ANSI/IEEE Std 754-1985)]. Nearly all computers designed since the 1980s use this format.
The IEEE single-precision float representation uses a base of 2. There is a sign bit, a mantissa with 23 bits plus one hidden bit (so the total precision is 24 base-2 digits), and an 8-bit exponent that can represent values in the range -125 to 128, inclusive.
So, for an implementation that uses this representation for the float data type, appropriate values for the corresponding parameters are:
FLT_RADIX          2
FLT_MANT_DIG       24
FLT_DIG            6
FLT_MIN_EXP        -125
FLT_MIN_10_EXP     -37
FLT_MAX_EXP        128
FLT_MAX_10_EXP     +38
FLT_MIN            1.17549435E-38F
FLT_MAX            3.40282347E+38F
FLT_EPSILON        1.19209290E-07F
Here are the values for the double data type:
DBL_MANT_DIG       53
DBL_DIG            15
DBL_MIN_EXP        -1021
DBL_MIN_10_EXP     -307
DBL_MAX_EXP        1024
DBL_MAX_10_EXP     308
DBL_MAX            1.7976931348623157E+308
DBL_MIN            2.2250738585072014E-308
DBL_EPSILON        2.2204460492503131E-016
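A program that depends on the IEEE formats can compare the float.h parameters against the values listed above. This is only a sketch that checks a few of the parameters; the function names are hypothetical, and a match on these parameters does not prove that every aspect of the IEEE standard is implemented.

```c
#include <float.h>

/* Return 1 if the parameters of float match the IEEE single-precision
   values listed above.  The name is hypothetical. */
static int
float_matches_ieee_single (void)
{
  return FLT_RADIX == 2 && FLT_MANT_DIG == 24
         && FLT_MIN_EXP == -125 && FLT_MAX_EXP == 128;
}

/* Return 1 if the parameters of double match the IEEE double-precision
   values listed above.  The name is hypothetical. */
static int
double_matches_ieee_double (void)
{
  return FLT_RADIX == 2 && DBL_MANT_DIG == 53
         && DBL_MIN_EXP == -1021 && DBL_MAX_EXP == 1024;
}
```

The checks are done in ordinary functions rather than in #if directives, which keeps the sketch usable even where these macros are not constant expressions.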
You can use offsetof to measure the location within a structure type of a particular structure member.
size_t offsetof (type, member)
This expands to an integer constant expression that is the offset of the structure member named member in the structure type type. For example, offsetof (struct s, elem) is the offset, in bytes, of the member elem in a struct s.
This macro won't work if member is a bit field; you get an error from the C compiler in that case.
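Here is a short illustration of offsetof. The structure type struct record and the function record_count_offset are hypothetical; the exact offsets of members other than the first depend on the padding that the compiler inserts.

```c
#include <stddef.h>   /* offsetof, size_t */

/* A hypothetical structure used only for illustration. */
struct record
{
  char tag;
  int count;
  double weight;
};

/* Return the offset, in bytes, of the member count within struct record. */
static size_t
record_count_offset (void)
{
  return offsetof (struct record, count);
}
```

The first member always has offset zero, and later members have increasing offsets; on many machines count is placed at offset 4 because of alignment padding after tag, but that value is not guaranteed.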