Arithmetic Functions

The C library provides functions to do basic operations on floating-point numbers. These include absolute value, maximum and minimum, normalization, bit twiddling, rounding, and a few others.

Absolute Value

These functions are provided for obtaining the absolute value (or magnitude) of a number. The absolute value of a real number x is x if x is positive, −x if x is negative. For a complex number z, whose real part is x and whose imaginary part is y, the absolute value is sqrt (x*x + y*y).

Prototypes for abs, labs and llabs are in stdlib.h; imaxabs is declared in inttypes.h; fabs, fabsf and fabsl are declared in math.h. cabs, cabsf and cabsl are declared in complex.h.

int abs (int number)
long int labs (long int number)
long long int llabs (long long int number)
intmax_t imaxabs (intmax_t number)

These functions return the absolute value of number.

Most computers use a two's complement integer representation, in which the absolute value of INT_MIN (the smallest possible int) cannot be represented; thus, abs (INT_MIN) is not defined.
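As a rough illustration of guarding against this case, the sketch below wraps abs in a range check. The name checked_abs is hypothetical and is not part of the C library.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: store |n| in *result and return true, or
   return false when |n| cannot be represented as an int.  */
bool
checked_abs (int n, int *result)
{
  /* In two's complement only INT_MIN is below -INT_MAX.  On other
     representations this test is never true.  */
  if (n < -INT_MAX)
    return false;
  *result = abs (n);
  return true;
}
```

A caller would test the return value before using *result, for example when taking the magnitude of untrusted input.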

llabs and imaxabs are new to ISO C99.

See the section called “Integers” for a description of the intmax_t type.

double fabs (double number)
float fabsf (float number)
long double fabsl (long double number)

These functions return the absolute value of the floating-point number number.

double cabs (complex double z)
float cabsf (complex float z)
long double cabsl (complex long double z)

These functions return the absolute value of the complex number z (see the section called “Complex Numbers”). The absolute value of a complex number is:

sqrt (creal (z) * creal (z) + cimag (z) * cimag (z))

This function should always be used instead of the direct formula because it takes special care to avoid losing precision. It may also take advantage of hardware support for this operation. See hypot in the section called “Exponentiation and Logarithms”.

Normalization Functions

The functions described in this section are primarily provided as a way to efficiently perform certain low-level manipulations on floating point numbers that are represented internally using a binary radix; see the section called “Floating Point Representation Concepts”. These functions are required to have equivalent behavior even if the representation does not use a radix of 2, but of course they are unlikely to be particularly efficient in those cases.

All these functions are declared in math.h.

double frexp (double value, int *exponent)
float frexpf (float value, int *exponent)
long double frexpl (long double value, int *exponent)

These functions are used to split the number value into a normalized fraction and an exponent.

If the argument value is not zero, the return value is value times a power of two, and is always in the range 1/2 (inclusive) to 1 (exclusive). The corresponding exponent is stored in *exponent; the return value multiplied by 2 raised to this exponent equals the original number value.

For example, frexp (12.8, &exponent) returns 0.8 and stores 4 in exponent.

If value is zero, then the return value is zero and zero is stored in *exponent.

double ldexp (double value, int exponent)
float ldexpf (float value, int exponent)
long double ldexpl (long double value, int exponent)

These functions return the result of multiplying the floating-point number value by 2 raised to the power exponent. (They can be used to reassemble floating-point numbers that were taken apart by frexp.)

For example, ldexp (0.8, 4) returns 12.8.
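The two examples above can be combined into a round trip. This is a minimal sketch; split_and_rebuild is a hypothetical name, and the exact result is only expected for finite inputs on a binary floating-point format.

```c
#include <math.h>

/* Hypothetical helper: split value with frexp, then rebuild it with
   ldexp.  The fraction and exponent are also passed back to the
   caller.  */
double
split_and_rebuild (double value, double *fraction, int *exponent)
{
  *fraction = frexp (value, exponent);
  return ldexp (*fraction, *exponent);
}
```

Because both steps only scale by a power of two, no rounding takes place and the rebuilt number compares equal to the original.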

The following functions, which come from BSD, provide facilities equivalent to those of ldexp and frexp. See also the ISO C function logb which originally also appeared in BSD.

double scalb (double value, double exponent)
float scalbf (float value, float exponent)
long double scalbl (long double value, long double exponent)

The scalb function is the BSD counterpart of ldexp; it takes the exponent as a floating-point number.

double scalbn (double x, int n)
float scalbnf (float x, int n)
long double scalbnl (long double x, int n)

scalbn is identical to scalb, except that the exponent n is an int instead of a floating-point number.

double scalbln (double x, long int n)
float scalblnf (float x, long int n)
long double scalblnl (long double x, long int n)

scalbln is identical to scalb, except that the exponent n is a long int instead of a floating-point number.

double significand (double x)
float significandf (float x)
long double significandl (long double x)

significand returns the mantissa of x scaled to the range [1, 2). It is equivalent to scalb (x, (double) -ilogb (x)).

This function exists mainly for use in certain standardized tests of IEEE 754 conformance.

Rounding Functions

The functions listed here perform operations such as rounding and truncation of floating-point values. Some of these functions convert floating point numbers to integer values. They are all declared in math.h.

You can also convert floating-point numbers to integers simply by casting them to int. This discards the fractional part, effectively rounding towards zero. However, this only works if the result can actually be represented as an int--for very large numbers, this is impossible. The functions listed here return the result as a double instead to get around this problem.

double ceil (double x)
float ceilf (float x)
long double ceill (long double x)

These functions round x upwards to the nearest integer, returning that value as a double. Thus, ceil (1.5) is 2.0.

double floor (double x)
float floorf (float x)
long double floorl (long double x)

These functions round x downwards to the nearest integer, returning that value as a double. Thus, floor (1.5) is 1.0 and floor (-1.5) is -2.0.

double trunc (double x)
float truncf (float x)
long double truncl (long double x)

The trunc functions round x towards zero to the nearest integer (returned in floating-point format). Thus, trunc (1.5) is 1.0 and trunc (-1.5) is -1.0.

double rint (double x)
float rintf (float x)
long double rintl (long double x)

These functions round x to an integer value according to the current rounding mode. See the section called “Floating Point Parameters” for information about the various rounding modes. The default rounding mode is to round to the nearest integer; some machines support other modes, but round-to-nearest is always used unless you explicitly select another.

If x was not initially an integer, these functions raise the inexact exception.

double nearbyint (double x)
float nearbyintf (float x)
long double nearbyintl (long double x)

These functions return the same value as the rint functions, but do not raise the inexact exception if x is not an integer.

double round (double x)
float roundf (float x)
long double roundl (long double x)

These functions are similar to rint, but they round halfway cases away from zero instead of to the nearest even integer.

long int lrint (double x)
long int lrintf (float x)
long int lrintl (long double x)

These functions are just like rint, but they return a long int instead of a floating-point number.

long long int llrint (double x)
long long int llrintf (float x)
long long int llrintl (long double x)

These functions are just like rint, but they return a long long int instead of a floating-point number.

long int lround (double x)
long int lroundf (float x)
long int lroundl (long double x)

These functions are just like round, but they return a long int instead of a floating-point number.

long long int llround (double x)
long long int llroundf (float x)
long long int llroundl (long double x)

These functions are just like round, but they return a long long int instead of a floating-point number.

double modf (double value, double *integer-part)
float modff (float value, float *integer-part)
long double modfl (long double value, long double *integer-part)

These functions break the argument value into an integer part and a fractional part (between -1 and 1, exclusive). Their sum equals value. Each of the parts has the same sign as value, and the integer part is always rounded toward zero.

modf stores the integer part in *integer-part, and returns the fractional part. For example, modf (2.5, &intpart) returns 0.5 and stores 2.0 into intpart.

Remainder Functions

The functions in this section compute the remainder on division of two floating-point numbers. Each is a little different; pick the one that suits your problem.

double fmod (double numerator, double denominator)
float fmodf (float numerator, float denominator)
long double fmodl (long double numerator, long double denominator)

These functions compute the remainder from the division of numerator by denominator. Specifically, the return value is numerator - n * denominator, where n is the quotient of numerator divided by denominator, rounded towards zero to an integer. Thus, fmod (6.5, 2.3) returns 1.9, which is 6.5 minus 4.6.

The result has the same sign as the numerator and has magnitude less than the magnitude of the denominator.

If denominator is zero, fmod signals a domain error.

double drem (double numerator, double denominator)
float dremf (float numerator, float denominator)
long double dreml (long double numerator, long double denominator)

These functions are like fmod except that they round the internal quotient n to the nearest integer instead of towards zero to an integer. For example, drem (6.5, 2.3) returns -0.4, which is 6.5 minus 6.9.

The absolute value of the result is less than or equal to half the absolute value of the denominator. The difference between fmod (numerator, denominator) and drem (numerator, denominator) is always either denominator, minus denominator, or zero.

If denominator is zero, drem signals a domain error.

double remainder (double numerator, double denominator)
float remainderf (float numerator, float denominator)
long double remainderl (long double numerator, long double denominator)

These functions are another name for drem.

Setting and modifying single bits of FP values

There are some operations that are too complicated or expensive to perform by hand on floating-point numbers. ISO C99 defines functions to do these operations, which mostly involve changing single bits.

double copysign (double x, double y)
float copysignf (float x, float y)
long double copysignl (long double x, long double y)

These functions return x but with the sign of y. They work even if x or y is NaN or zero. Both NaN and zero can carry a sign (although not all implementations support it), and copysign is one of the few operations that can tell the difference.

copysign never raises an exception.

This function is defined in IEC 559 (and the appendix with recommended functions in IEEE 754/IEEE 854).

int signbit (float-type x)

signbit is a generic macro which can work on all floating-point types. It returns a nonzero value if the value of x has its sign bit set.

This is not the same as x < 0.0, because IEEE 754 floating point allows zero to be signed. The comparison -0.0 < 0.0 is false, but signbit (-0.0) will return a nonzero value.
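A small sketch of the distinction, assuming an implementation with signed zeros; is_negative_zero is a hypothetical name.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true only for negative zero.  An ordinary
   comparison cannot tell the two zeros apart, so the sign bit is
   inspected directly.  */
bool
is_negative_zero (double x)
{
  return x == 0.0 && signbit (x) != 0;
}
```

The same information can be carried over to another value with copysign: copysign (1.0, -0.0) is -1.0.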

double nextafter (double x, double y)
float nextafterf (float x, float y)
long double nextafterl (long double x, long double y)

The nextafter function returns the next representable neighbor of x in the direction towards y. The size of the step between x and the result depends on the type of the result. If x = y the function simply returns y. If either value is NaN, NaN is returned. Otherwise a value corresponding to the value of the least significant bit in the mantissa is added or subtracted, depending on the direction. nextafter will signal overflow or underflow if the result goes outside of the range of normalized numbers.

This function is defined in IEC 559 (and the appendix with recommended functions in IEEE 754/IEEE 854).

double nexttoward (double x, long double y)
float nexttowardf (float x, long double y)
long double nexttowardl (long double x, long double y)

These functions are identical to the corresponding versions of nextafter except that their second argument is a long double.

double nan (const char *tagp)
float nanf (const char *tagp)
long double nanl (const char *tagp)

The nan function returns a representation of NaN, provided that NaN is supported by the target platform. nan ("n-char-sequence") is equivalent to strtod ("NAN(n-char-sequence)", NULL).

The argument tagp is used in an unspecified manner. On IEEE 754 systems, there are many representations of NaN, and tagp selects one. On other systems it may do nothing.

Floating-Point Comparison Functions

The standard C comparison operators provoke exceptions when one or other of the operands is NaN. For example,

int v = a < 1.0;

will raise an exception if a is NaN. (This does not happen with == and !=; those merely return false and true, respectively, when NaN is examined.) Frequently this exception is undesirable. ISO C99 therefore defines comparison functions that do not raise exceptions when NaN is examined. All of the functions are implemented as macros which allow their arguments to be of any floating-point type. The macros are guaranteed to evaluate their arguments only once.

int isgreater (real-floating x, real-floating y)

This macro determines whether the argument x is greater than y. It is equivalent to (x) > (y), but no exception is raised if x or y are NaN.

int isgreaterequal (real-floating x, real-floating y)

This macro determines whether the argument x is greater than or equal to y. It is equivalent to (x) >= (y), but no exception is raised if x or y are NaN.

int isless (real-floating x, real-floating y)

This macro determines whether the argument x is less than y. It is equivalent to (x) < (y), but no exception is raised if x or y are NaN.

int islessequal (real-floating x, real-floating y)

This macro determines whether the argument x is less than or equal to y. It is equivalent to (x) <= (y), but no exception is raised if x or y are NaN.

int islessgreater (real-floating x, real-floating y)

This macro determines whether the argument x is less or greater than y. It is equivalent to (x) < (y) || (x) > (y) (although it only evaluates x and y once), but no exception is raised if x or y are NaN.

This macro is not equivalent to x != y, because that expression is true if x or y are NaN.

int isunordered (real-floating x, real-floating y)

This macro determines whether its arguments are unordered. In other words, it is true if x or y are NaN, and false otherwise.

Not all machines provide hardware support for these operations. On machines that don't, the macros can be very slow. Therefore, you should not use these functions when NaN is not a concern.

Note: There are no macros isequal or isunequal. They are unnecessary, because the == and != operators do not raise an exception if one or both of the operands are NaN.
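As an illustration of how these macros combine, the sketch below builds a three-way comparison that reports NaN operands through its return value. compare_quiet and its return convention are hypothetical.

```c
#include <math.h>

/* Hypothetical helper: three-way comparison built from the quiet
   comparison macros.  Returns -1, 0, or 1 for ordered operands and
   2 if either operand is NaN.  */
int
compare_quiet (double a, double b)
{
  if (isunordered (a, b))
    return 2;
  if (isless (a, b))
    return -1;
  return isgreater (a, b) ? 1 : 0;
}
```

A sort routine that must tolerate NaN in its input could use such a function to place unordered elements at one end.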

Miscellaneous FP arithmetic functions

The functions in this section perform miscellaneous but common operations that are awkward to express with C operators. On some processors these functions can use special machine instructions to perform these operations faster than the equivalent C code.

double fmin (double x, double y)
float fminf (float x, float y)
long double fminl (long double x, long double y)

The fmin function returns the lesser of the two values x and y. It is similar to the expression

((x) < (y) ? (x) : (y))

except that x and y are only evaluated once.

If an argument is NaN, the other argument is returned. If both arguments are NaN, NaN is returned.

double fmax (double x, double y)
float fmaxf (float x, float y)
long double fmaxl (long double x, long double y)

The fmax function returns the greater of the two values x and y.

If an argument is NaN, the other argument is returned. If both arguments are NaN, NaN is returned.

double fdim (double x, double y)
float fdimf (float x, float y)
long double fdiml (long double x, long double y)

The fdim function returns the positive difference between x and y. The positive difference is x - y if x is greater than y, and 0 otherwise.

If x, y, or both are NaN, NaN is returned.

double fma (double x, double y, double z)
float fmaf (float x, float y, float z)
long double fmal (long double x, long double y, long double z)

The fma function performs floating-point multiply-add. This is the operation (x * y) + z, but the intermediate result is not rounded to the destination type. This can sometimes improve the precision of a calculation.

This function was introduced because some processors have a special instruction to perform multiply-add. The C compiler cannot use it directly, because the expression x*y + z is defined to round the intermediate result. fma lets you choose when you want to round only once.

On processors which do not implement multiply-add in hardware, fma can be very slow since it must avoid intermediate rounding. math.h defines the symbols FP_FAST_FMA, FP_FAST_FMAF, and FP_FAST_FMAL when the corresponding version of fma is no slower than the expression x*y + z. In the GNU C library, this always means the operation is implemented in hardware.