Formatted Input

The functions described in this section (scanf and related functions) provide facilities for formatted input analogous to the formatted output facilities. These functions provide a mechanism for reading arbitrary values under the control of a format string or template string.

Formatted Input Basics

Calls to scanf are superficially similar to calls to printf in that arbitrary arguments are read under the control of a template string. While the syntax of the conversion specifications in the template is very similar to that for printf, the interpretation of the template is oriented more towards free-format input and simple pattern matching, rather than fixed-field formatting. For example, most scanf conversions skip over any amount of "white space" (including spaces, tabs, and newlines) in the input file, and there is no concept of precision for the numeric input conversions as there is for the corresponding output conversions. Ordinarily, non-whitespace characters in the template are expected to match characters in the input stream exactly, but a matching failure is distinct from an input error on the stream. Another area of difference between scanf and printf is that you must remember to supply pointers rather than immediate values as the optional arguments to scanf; the values that are read are stored in the objects that the pointers point to. Even experienced programmers tend to forget this occasionally, so if your program is getting strange errors that seem to be related to scanf, you might want to double-check this.

When a matching failure occurs, scanf returns immediately, leaving the first non-matching character as the next character to be read from the stream. The normal return value from scanf is the number of values that were assigned, so you can use this to determine if a matching error happened before all the expected values were read. The scanf function is typically used for things like reading in the contents of tables. For example, here is a function that uses scanf to initialize an array of double:

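#include <stdio.h>

void invalid_input_error (void);   /* assumed to be defined elsewhere */

/* Read n doubles from stdin into array.  */
void
readarray (double *array, int n)
{
  int i;
  for (i = 0; i < n; i++)
    if (scanf (" %lf", &array[i]) != 1)   /* pass a pointer, not a value */
      invalid_input_error ();
}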
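The return-value rule, and the way the first non-matching character is left unread, can be checked with a small sketch. It assumes a hypothetical helper, scan_then_peek, that writes its test input to a temporary file with tmpfile so that the character remaining after a matching failure can be inspected with getc:

```c
#include <stdio.h>

/* Hypothetical helper: scan two integers from `text` via a temporary file,
   then store the next unread character in *next (EOF if none).
   Returns the fscanf result, or -2 if no temporary file is available.  */
int
scan_then_peek (const char *text, int value[2], int *next)
{
  FILE *fp = tmpfile ();
  if (fp == NULL)
    return -2;

  fputs (text, fp);
  rewind (fp);                      /* read back what was just written */

  int count = fscanf (fp, "%d %d", &value[0], &value[1]);
  *next = getc (fp);                /* a matching failure leaves this unread */
  fclose (fp);
  return count;
}
```

With the input "42abc", only the first %d is assigned, so the result is 1 and the next character read is 'a'.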

The formatted input functions are not used as frequently as the formatted output functions. Partly, this is because it takes some care to use them properly. Another reason is that it is difficult to recover from a matching error.

If you are trying to read input that doesn't match a single, fixed pattern, you may be better off using a tool such as Flex to generate a lexical scanner, or Bison to generate a parser, rather than using scanf. For more information about these tools, see the Flex manual and the Bison manual.

Input Conversion Syntax

A scanf template string is a string that contains ordinary multibyte characters interspersed with conversion specifications that start with %.

Any whitespace character (as defined by the isspace function; see the section called “Classification of Characters”) in the template causes any number of whitespace characters in the input stream to be read and discarded. The whitespace characters that are matched need not be exactly the same whitespace characters that appear in the template string. For example, write " , " (a space, a comma, and another space) in the template to recognize a comma with optional whitespace before and after.
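As a sketch of the comma example, the hypothetical function parse_pair below uses the template "%d , %d". The space before the comma is what permits whitespace there; %d already skips leading whitespace by itself, so the space after the comma is redundant but harmless:

```c
#include <stdio.h>

/* Parse two integers separated by a comma, with optional whitespace
   around the comma.  Returns the number of integers assigned.  */
int
parse_pair (const char *s, int *a, int *b)
{
  return sscanf (s, "%d , %d", a, b);
}
```

Here "3,4" and "3 ,\t 4" both yield 2, while "3;4" yields 1 because ';' does not match the literal comma.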

Other characters in the template string that are not part of conversion specifications must match characters in the input stream exactly; if this is not the case, a matching failure occurs.

The conversion specifications in a scanf template string have the general form:

% flags width type conversion

In more detail, an input conversion specification consists of an initial % character followed in sequence by:

  • An optional flag character *, which says to ignore the text read for this specification. When scanf finds a conversion specification that uses this flag, it reads input as directed by the rest of the conversion specification, but it discards this input, does not use a pointer argument, and does not increment the count of successful assignments.

  • An optional flag character a (valid with string conversions only) which requests allocation of a buffer long enough to store the string in. (This is a GNU extension.) See the section called “Dynamically Allocating String Conversions”.

  • An optional decimal integer that specifies the maximum field width. Reading of characters from the input stream stops either when this maximum is reached or when a non-matching character is found, whichever happens first. Most conversions discard initial whitespace characters (those that don't are explicitly documented), and these discarded characters don't count towards the maximum field width. String input conversions store a null character to mark the end of the input; the maximum field width does not include this terminator.

  • An optional type modifier character. For example, you can specify a type modifier of l with integer conversions such as %d to specify that the argument is a pointer to a long int rather than a pointer to an int.

  • A character that specifies the conversion to be applied.
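A small sketch combining field widths with the * flag, assuming input in the form YYYY-MM-DD; parse_date is a hypothetical name. The separators are read with %*c, so they need no pointer argument and are not counted in the return value:

```c
#include <stdio.h>

/* Split a date such as "2024-06-15" into its parts.  The widths 4 and 2
   bound each numeric field; %*c consumes each separator and discards it.
   Returns the number of fields assigned (3 on success).  */
int
parse_date (const char *s, int *year, int *month, int *day)
{
  return sscanf (s, "%4d%*c%2d%*c%2d", year, month, day);
}
```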

The exact options that are permitted and how they are interpreted vary between the different conversion specifiers. See the descriptions of the individual conversions for information about the particular options that they allow.

With the -Wformat option, the GNU C compiler checks calls to scanf and related functions. It examines the format string and verifies that the correct number and types of arguments are supplied. There is also a GNU C syntax to tell the compiler that a function you write uses a scanf-style format string. See the documentation of the format function attribute in the GCC manual for more information.

Table of Input Conversions

Here is a table that summarizes the various conversion specifications:

%d

Matches an optionally signed integer written in decimal. See the section called “Numeric Input Conversions”.

%i

Matches an optionally signed integer in any of the formats that the C language defines for specifying an integer constant. See the section called “Numeric Input Conversions”.

%o

Matches an unsigned integer written in octal radix. See the section called “Numeric Input Conversions”.

%u

Matches an unsigned integer written in decimal radix. See the section called “Numeric Input Conversions”.

%x, %X

Matches an unsigned integer written in hexadecimal radix. See the section called “Numeric Input Conversions”.

%e, %f, %g, %E, %G

Matches an optionally signed floating-point number. See the section called “Numeric Input Conversions”.

%s

Matches a string containing only non-whitespace characters. See the section called “String Input Conversions”. The presence of the l modifier determines whether the output is stored as a wide character string or a multibyte string. If %s is used in a wide character function, the string is converted into a multibyte string as if by repeated calls to wcrtomb. This means that the buffer must provide room for MB_CUR_MAX bytes for each wide character read. If %ls is used in a multibyte function, the result is converted into wide characters as if by repeated calls to mbrtowc before being stored in the user-provided buffer.

%S

This is an alias for %ls which is supported for compatibility with the Unix standard.

%[

Matches a string of characters that belong to a specified set. See the section called “String Input Conversions”. The presence of the l modifier determines whether the output is stored as a wide character string or a multibyte string. If %[ is used in a wide character function, the string is converted into a multibyte string as if by repeated calls to wcrtomb. This means that the buffer must provide room for MB_CUR_MAX bytes for each wide character read. If %l[ is used in a multibyte function, the result is converted into wide characters as if by repeated calls to mbrtowc before being stored in the user-provided buffer.

%c

Matches a string of one or more characters; the number of characters read is controlled by the maximum field width given for the conversion. See the section called “String Input Conversions”.

If %c is used in a wide stream function, the value read is converted from a wide character to the corresponding multibyte character before it is stored. Note that this conversion can produce more than one byte of output, and therefore the provided buffer must be large enough for up to MB_CUR_MAX bytes for each character. If %lc is used in a multibyte function, the input is treated as a multibyte sequence (and not as bytes) and the result is converted as if by calls to mbrtowc.

%C

This is an alias for %lc which is supported for compatibility with the Unix standard.

%p

Matches a pointer value in the same implementation-defined format used by the %p output conversion for printf. See the section called “Other Input Conversions”.

%n

This conversion doesn't read any characters; it records the number of characters read so far by this call. See the section called “Other Input Conversions”.

%%

This matches a literal % character in the input stream. No corresponding argument is used. See the section called “Other Input Conversions”.

If the syntax of a conversion specification is invalid, the behavior is undefined. If there aren't enough function arguments provided to supply addresses for all the conversion specifications in the template string that perform assignments, or if the arguments are not of the correct types, the behavior is also undefined. On the other hand, extra arguments are simply ignored.

Numeric Input Conversions

This section describes the scanf conversions for reading numeric values.

The %d conversion matches an optionally signed integer in decimal radix. The syntax that is recognized is the same as that for the strtol function (the section called “Parsing of Integers”) with the value 10 for the base argument.

The %i conversion matches an optionally signed integer in any of the formats that the C language defines for specifying an integer constant. The syntax that is recognized is the same as that for the strtol function (the section called “Parsing of Integers”) with the value 0 for the base argument. (You can print integers in this syntax with printf by using the # flag character with the %x or %o conversion, or by using %d for plain decimal. See the section called “Integer Conversions”.)

For example, any of the strings 10, 0xa, or 012 could be read in as integers under the %i conversion. Each of these specifies a number with decimal value 10.
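A minimal sketch that reads each of those three strings with %i; read_c_integer is a hypothetical name:

```c
#include <stdio.h>

/* Read one integer using C constant syntax: decimal, 0x-prefixed
   hexadecimal, or 0-prefixed octal (like strtol with base 0).  */
int
read_c_integer (const char *s, int *out)
{
  return sscanf (s, "%i", out);
}
```

Each of "10", "0xa", and "012" stores 10 in *out.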

The %o, %u, and %x conversions match unsigned integers in octal, decimal, and hexadecimal radices, respectively. The syntax that is recognized is the same as that for the strtoul function (the section called “Parsing of Integers”) with the appropriate value (8, 10, or 16) for the base argument.

The %X conversion is identical to the %x conversion. They both permit either uppercase or lowercase letters to be used as digits.
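For instance, this sketch (with the hypothetical helper read_unsigned_fields) reads one value with each unsigned conversion into unsigned int objects, the default argument type for these conversions:

```c
#include <stdio.h>

/* Read a hex, a hex (either case), an octal, and a decimal value.  */
int
read_unsigned_fields (const char *s, unsigned int out[4])
{
  return sscanf (s, "%x %X %o %u", &out[0], &out[1], &out[2], &out[3]);
}
```

The input "ff FF 777 42" yields 255, 255, 511, and 42.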

The default type of the corresponding argument for the %d and %i conversions is int *, and unsigned int * for the other integer conversions. You can use the following type modifiers to specify other sizes of integer:

hh

Specifies that the argument is a signed char * or unsigned char *.

This modifier was introduced in ISO C99.

h

Specifies that the argument is a short int * or unsigned short int *.

j

Specifies that the argument is an intmax_t * or uintmax_t *.

This modifier was introduced in ISO C99.

l

Specifies that the argument is a long int * or unsigned long int *. Two l characters are like the L modifier, below.

If used with %c or %s the corresponding parameter is considered as a pointer to a wide character or wide character string respectively. This use of l was introduced in Amendment 1 to ISO C90.

ll, L, q

Specifies that the argument is a long long int * or unsigned long long int *. (The long long type was standardized in ISO C99 and was previously supported as an extension by the GNU C compiler. For systems that don't provide extra-long integers, this is the same as long int.)

The q modifier is another name for the same thing, which comes from 4.4 BSD; a long long int is sometimes called a "quad" int.

t

Specifies that the argument is a ptrdiff_t *.

This modifier was introduced in ISO C99.

z

Specifies that the argument is a size_t *.

This modifier was introduced in ISO C99.
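The following sketch, with the hypothetical helper read_sized, pairs several of these modifiers with matching pointer types. A mismatch between modifier and pointer type is undefined behavior, which gcc -Wformat reports:

```c
#include <stddef.h>
#include <stdio.h>

/* Each conversion's modifier matches the pointed-to type:
   %hd -> short, %hhu -> unsigned char, %lld -> long long, %zu -> size_t. */
int
read_sized (const char *s, short *sh, unsigned char *uc,
            long long *ll, size_t *sz)
{
  return sscanf (s, "%hd %hhu %lld %zu", sh, uc, ll, sz);
}
```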

All of the %e, %f, %g, %E, and %G input conversions are interchangeable. They all match an optionally signed floating point number, in the same syntax as for the strtod function (the section called “Parsing of Floats”).

For the floating-point input conversions, the default argument type is float *. (This is different from the corresponding output conversions, where the default type is double; remember that float arguments to printf are converted to double by the default argument promotions, but float * arguments are not promoted to double *.) You can specify other sizes of float using these type modifiers:

l

Specifies that the argument is of type double *.

L

Specifies that the argument is of type long double *.
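A short sketch of the three floating-point argument types (read_floats is a hypothetical name). The sample values are exactly representable in binary, so they can be compared exactly:

```c
#include <stdio.h>

/* %f stores into a float, %lf into a double, %Lf into a long double.
   Passing a double * to plain %f would be undefined behavior.  */
int
read_floats (const char *s, float *f, double *d, long double *ld)
{
  return sscanf (s, "%f %lf %Lf", f, d, ld);
}
```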

For all the above number parsing formats there is an additional optional flag character, ' (an apostrophe). When this flag is given, the scanf function expects the number in the input string to be formatted according to the grouping rules of the currently selected locale (see the section called “Generic Numeric Formatting Parameters”).

If the "C" or "POSIX" locale is selected there is no difference. But for a locale that specifies values for the grouping fields, the input must be grouped correctly; otherwise only the longest prefix with a correct form is processed.
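The "C" locale case can be sketched as follows. This relies on the GNU ' flag, so it assumes glibc; read_grouped_in_c_locale is a hypothetical name, and the function changes the global LC_NUMERIC setting as a side effect:

```c
#include <locale.h>
#include <stdio.h>

/* GNU extension: %'d accepts locale-specific digit grouping.  The "C"
   locale defines no thousands separator, so %'d behaves like %d and
   stops at a comma.  */
int
read_grouped_in_c_locale (const char *s, int *out)
{
  setlocale (LC_NUMERIC, "C");
  return sscanf (s, "%'d", out);
}
```

Under these assumptions "1234" is read as 1234, while "1,234" is read only as 1.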

String Input Conversions

This section describes the scanf input conversions for reading string and character values: %s, %S, %[, %c, and %C.

You have two options for how to receive the input from these conversions:

  • Provide a buffer to store it in. This is the default. You should provide an argument of type char * or wchar_t * (the latter if the l modifier is present).

    Warning: To make a robust program, you must make sure that the input (plus its terminating null) cannot possibly exceed the size of the buffer you provide. In general, the only way to do this is to specify a maximum field width one less than the buffer size. If you provide the buffer, always specify a maximum field width to prevent overflow.

  • Ask scanf to allocate a big enough buffer, by specifying the a flag character. This is a GNU extension. You should provide an argument of type char ** for the buffer address to be stored in. See the section called “Dynamically Allocating String Conversions”.
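One way to keep the field width and the buffer size in step, sketched under the assumption that a fixed maximum length is acceptable, is to build the template from a macro with the preprocessor's stringizing operator. MAX_WORD, STR_, STR, and read_word are illustrative names, not part of any library:

```c
#include <stdio.h>

#define MAX_WORD 15             /* longest word accepted, in characters */
#define STR_(x) #x              /* stringize the argument as written */
#define STR(x) STR_(x)          /* expand macros first, then stringize */

/* word must have room for MAX_WORD + 1 bytes.  */
int
read_word (const char *s, char *word)
{
  /* The template expands to "%15s": at most 15 characters plus a null. */
  return sscanf (s, "%" STR (MAX_WORD) "s", word);
}
```

A longer input word is truncated to its first 15 characters instead of overflowing the buffer.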

The %c conversion is the simplest: it matches a fixed number of characters, always. The maximum field width says how many characters to read; if you don't specify the maximum, the default is 1. This conversion doesn't append a null character to the end of the text it reads. It also does not skip over initial whitespace characters. It reads precisely the next n characters, and fails if it cannot get that many. Since there is always a maximum field width with %c (whether specified, or 1 by default), you can always prevent overflow by making the buffer long enough.

If the format is %lc or %C, the function stores wide characters, which are converted from the external byte stream using the conversion determined at the time the stream was opened. The number of bytes read from the medium is limited to MB_CUR_MAX * n, but at most n wide characters are stored in the output string.

The %s conversion matches a string of non-whitespace characters. It skips and discards initial whitespace, but stops when it encounters more whitespace after having read something. It stores a null character at the end of the text that it reads.

For example, reading the input:

 hello, world

with the conversion %10c produces " hello, wo", but reading the same input with the conversion %10s produces "hello,".
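The same comparison as a sketch (compare_c_and_s is a hypothetical name). Because %c stores no terminating null character, the code adds one itself:

```c
#include <stdio.h>

/* Both buffers must hold at least 11 bytes.  */
void
compare_c_and_s (const char *input, char *c_buf, char *s_buf)
{
  /* %10c copies exactly ten characters, leading space included.  */
  if (sscanf (input, "%10c", c_buf) == 1)
    c_buf[10] = '\0';
  else
    c_buf[0] = '\0';

  /* %10s skips leading whitespace and stops at the next whitespace.  */
  if (sscanf (input, "%10s", s_buf) != 1)
    s_buf[0] = '\0';
}
```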

Warning: If you do not specify a field width for %s, then the number of characters read is limited only by where the next whitespace character appears. This almost certainly means that invalid input can make your program crash, which is a bug.

The %ls and %S formats are handled just like %s, except that the external byte sequence is converted to wide characters using the conversion associated with the stream. A field width specified with the format does not directly determine how many bytes are read from the stream, since it counts wide characters. But an upper limit can be computed by multiplying the width by MB_CUR_MAX.
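A brief sketch of %ls in a byte-oriented function, assuming ASCII input so that the conversion works in the default "C" locale; read_wide_word is a hypothetical name:

```c
#include <stdio.h>
#include <wchar.h>

/* The width 9 counts wide characters, so wbuf needs room for 9 of them
   plus the terminating null wide character.  */
int
read_wide_word (const char *s, wchar_t wbuf[10])
{
  return sscanf (s, "%9ls", wbuf);
}
```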

To read in characters that belong to an arbitrary set of your choice, use the %[ conversion. You specify the set between the [ character and a following ] character, using the same syntax used in regular expressions. As special cases:

  • A literal ] character can be specified as the first character of the set.

  • An embedded - character (that is, one that is not the first or last character of the set) is used to specify a range of characters.

  • If a caret character ^ immediately follows the initial [, then the set of allowed input characters is everything except the characters listed.

The %[ conversion does not skip over initial whitespace characters.

Here are some examples of %[ conversions and what they mean:

%25[1234567890]

Matches a string of up to 25 digits.

%25[][]

Matches a string of up to 25 square brackets.

%25[^ \f\n\r\t\v]

Matches a string up to 25 characters long that doesn't contain any of the standard whitespace characters. This is slightly different from %s, because if the input begins with a whitespace character, %[ reports a matching failure while %s simply discards the initial whitespace.

%25[a-z]

Matches up to 25 lowercase characters.
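A sketch using two of these sets back to back (split_letters_digits is a hypothetical name); it also shows the matching failure caused by leading whitespace, since %[ does not skip it:

```c
#include <stdio.h>

/* Split input such as "abc123" into its leading lowercase run and the
   digit run that follows.  Each buffer holds 25 characters plus a null. */
int
split_letters_digits (const char *s, char letters[26], char digits[26])
{
  return sscanf (s, "%25[a-z]%25[1234567890]", letters, digits);
}
```

For "abc123" both sets match and the result is 2; for " abc123" the leading space is outside [a-z], so the result is 0.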

As with %c and %s, the %[ format also produces wide characters if the l modifier is present. Everything said about %ls above also applies to %l[.

One more reminder: the %s and %[ conversions are dangerous if you don't specify a maximum width or use the a flag, because input too long would overflow whatever buffer you have provided for it. No matter how long your buffer is, a user could supply input that is longer. A well-written program reports invalid input with a comprehensible error message, not with a crash.

Dynamically Allocating String Conversions

A GNU extension to formatted input lets you safely read a string with no maximum size. Using this feature, you don't supply a buffer; instead, scanf allocates a buffer big enough to hold the data and gives you its address. To use this feature, write a as a flag character, as in %as or %a[0-9a-z].

The pointer argument you supply for where to store the input should have type char **. The scanf function allocates a buffer and stores its address in the word that the argument points to. You should free the buffer with free when you no longer need it.

Here is an example of using the a flag with the %[…] conversion specification to read a "variable assignment" of the form variable = value.

{
  char *variable, *value;

  if (2 > scanf ("%a[a-zA-Z0-9] = %a[^\n]\n",
                 &variable, &value))
    {
      invalid_input_error ();
      return 0;
    }

  …
}
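A self-contained variant of the example above, sketched with sscanf so it can be tried on a fixed string. It substitutes the m modifier from POSIX.1-2008 for the a flag, because ISO C99 also defines %a as a floating-point conversion, which makes the a flag ambiguous unless GNU extensions are enabled; glibc supports m as well. parse_assignment is a hypothetical name:

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse "variable = value" from line.  On success returns 1 and the
   caller must free *variable and *value; otherwise returns 0 and both
   pointers are NULL.  */
int
parse_assignment (const char *line, char **variable, char **value)
{
  *variable = NULL;
  *value = NULL;
  if (sscanf (line, "%m[a-zA-Z0-9] = %m[^\n]", variable, value) != 2)
    {
      /* The first string may have been allocated before the failure;
         free (NULL) is a no-op, so both calls are safe.  */
      free (*variable);
      free (*value);
      *variable = *value = NULL;
      return 0;
    }
  return 1;
}
```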

Other Input Conversions

This section describes the miscellaneous input conversions.

The %p conversion is used to read a pointer value. It recognizes the same syntax used by the %p output conversion for printf (see the section called “Other Output Conversions”); that is, a hexadecimal number just as the %x conversion accepts. The corresponding argument should be of type void **; that is, the address of a place to store a pointer.

The resulting pointer value is not guaranteed to be valid if it was not originally written during the same program execution that reads it in.
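A minimal sketch of the safe case, a round trip within one run of the program (pointer_round_trips is a hypothetical name):

```c
#include <stdio.h>

/* Format a non-null pointer with printf-style %p, read it back with
   scanf-style %p, and report whether the same pointer came back.  */
int
pointer_round_trips (void *original)
{
  char text[64];
  void *restored = NULL;

  snprintf (text, sizeof text, "%p", original);
  if (sscanf (text, "%p", &restored) != 1)
    return 0;
  return restored == original;
}
```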

The %n conversion produces the number of characters read so far by this call. The corresponding argument should be of type int *. This conversion works in the same way as the %n conversion for printf; see the section called “Other Output Conversions”, for an example.

The %n conversion is the only mechanism for determining the success of literal matches or conversions with suppressed assignments. If the %n follows the locus of a matching failure, then no value is stored for it since scanf returns before processing the %n. If you store -1 in that argument slot before calling scanf, the presence of -1 after scanf indicates an error occurred before the %n was reached.
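The sentinel technique can be sketched as follows; has_key_prefix is a hypothetical name. Since neither %*d nor %n counts as an assignment, the return value of sscanf alone cannot tell success from failure here:

```c
#include <stdio.h>

/* Return 1 if s begins with "key=" followed by an integer.  *consumed
   receives the number of characters matched, or stays -1 if scanning
   stopped before the %n was reached.  */
int
has_key_prefix (const char *s, int *consumed)
{
  *consumed = -1;
  (void) sscanf (s, "key=%*d%n", consumed);
  return *consumed != -1;
}
```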

Finally, the %% conversion matches a literal % character in the input stream, without using an argument. This conversion does not permit any flags, field width, or type modifier to be specified.
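Because %% assigns nothing, the return value does not show whether it matched. This sketch (read_percentage is a hypothetical name) adds a trailing %n for that purpose:

```c
#include <stdio.h>

/* Read an integer followed by a literal percent sign, as in "75%".
   Returns 1 only if both the number and the '%' were matched.  */
int
read_percentage (const char *s, int *pct)
{
  int end = -1;                 /* set only if the %% matched */
  if (sscanf (s, "%d%%%n", pct, &end) != 1 || end == -1)
    return 0;
  return 1;
}
```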

Formatted Input Functions

Here are the descriptions of the functions for performing formatted input. Prototypes for these functions are in the header file stdio.h; the wide character variants (wscanf, fwscanf, and swscanf) are declared in wchar.h.

int scanf (const char *template, …)

The scanf function reads formatted input from the stream stdin under the control of the template string template. The optional arguments are pointers to the places which receive the resulting values.

The return value is normally the number of successful assignments. If an end-of-file condition is detected before any matches are performed, including matches against whitespace and literal characters in the template, then EOF is returned.

int wscanf (const wchar_t *template, …)

The wscanf function reads formatted input from the stream stdin under the control of the template string template. The optional arguments are pointers to the places which receive the resulting values.

The return value is normally the number of successful assignments. If an end-of-file condition is detected before any matches are performed, including matches against whitespace and literal characters in the template, then EOF is returned.

int fscanf (FILE *stream, const char *template, …)

This function is just like scanf, except that the input is read from the stream stream instead of stdin.

int fwscanf (FILE *stream, const wchar_t *template, …)

This function is just like wscanf, except that the input is read from the stream stream instead of stdin.

int sscanf (const char *s, const char *template, …)

This is like scanf, except that the characters are taken from the null-terminated string s instead of from a stream. Reaching the end of the string is treated as an end-of-file condition.

The behavior of this function is undefined if copying takes place between objects that overlap, for example, if s is also given as an argument to receive a string read under control of the %s, %S, or %[ conversion.
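The end-of-string rule and the EOF return value combine into a common loop, sketched here with the hypothetical function sum_ints. Each call uses %n to find where to resume, and an EOF result means only whitespace (or nothing) was left:

```c
#include <stdio.h>

/* Sum the whitespace-separated integers in s.  Returns how many were
   read, or -1 if a token that is not an integer is found.  */
int
sum_ints (const char *s, long *sum)
{
  int count = 0;
  *sum = 0;
  for (;;)
    {
      long v;
      int used;
      int r = sscanf (s, "%ld%n", &v, &used);
      if (r == EOF)             /* end of string before any conversion */
        return count;
      if (r != 1)               /* matching failure: not an integer */
        return -1;
      *sum += v;
      s += used;                /* resume after the characters consumed */
      count++;
    }
}
```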

int swscanf (const wchar_t *ws, const wchar_t *template, …)

This is like wscanf, except that the characters are taken from the null-terminated string ws instead of from a stream. Reaching the end of the string is treated as an end-of-file condition.

The behavior of this function is undefined if copying takes place between objects that overlap, for example, if ws is also given as an argument to receive a string read under control of the %s, %S, or %[ conversion.
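A sketch of swscanf with a wide template (read_count_and_word and WORD_BUF_SIZE are hypothetical names). The plain %s conversion stores a multibyte string, so the buffer is sized for MB_LEN_MAX bytes per character, a compile-time upper bound on MB_CUR_MAX:

```c
#include <limits.h>
#include <wchar.h>

/* Room for 15 characters of up to MB_LEN_MAX bytes each, plus a null.  */
#define WORD_BUF_SIZE (15 * MB_LEN_MAX + 1)

/* Read an integer and a word from a wide string.  %15s limits the word
   to 15 wide characters and stores it as a multibyte string in buf,
   which must hold WORD_BUF_SIZE bytes.  */
int
read_count_and_word (const wchar_t *ws, int *count, char *buf)
{
  return swscanf (ws, L"%d %15s", count, buf);
}
```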

Variable Arguments Input Functions

The functions vscanf and friends are provided so that you can define your own variadic scanf-like functions that make use of the same internals as the built-in formatted input functions. These functions are analogous to the vprintf series of output functions. See the section called “Variable Arguments Output Functions” for important information on how to use them.

Portability Note: The functions listed in this section were introduced in ISO C99; before that they were available as GNU extensions.

int vscanf (const char *template, va_list ap)

This function is similar to scanf, but instead of taking a variable number of arguments directly, it takes an argument list pointer ap of type va_list (see the section called “Variadic Functions”).

int vwscanf (const wchar_t *template, va_list ap)

This function is similar to wscanf, but instead of taking a variable number of arguments directly, it takes an argument list pointer ap of type va_list (see the section called “Variadic Functions”).

int vfscanf (FILE *stream, const char *template, va_list ap)

This is the equivalent of fscanf with the variable argument list specified directly as for vscanf.

int vfwscanf (FILE *stream, const wchar_t *template, va_list ap)

This is the equivalent of fwscanf with the variable argument list specified directly as for vwscanf.

int vsscanf (const char *s, const char *template, va_list ap)

This is the equivalent of sscanf with the variable argument list specified directly as for vscanf.

int vswscanf (const wchar_t *s, const wchar_t *template, va_list ap)

This is the equivalent of swscanf with the variable argument list specified directly as for vwscanf.

In GNU C, there is a special construct you can use to let the compiler know that a function uses a scanf-style format string. Then it can check the number and types of arguments in each call to the function, and warn you when they do not match the format string. For details, see the documentation of the format function attribute in the GCC manual.
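As a sketch of both ideas, the hypothetical helper scan_exact below forwards its arguments to vsscanf and is declared with GCC's format attribute, so calls are checked against the template (argument 3) and the variadic arguments (starting at argument 4):

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: scan s under template and report whether exactly
   `expected` values were assigned.  */
int scan_exact (const char *s, int expected, const char *template, ...)
  __attribute__ ((format (scanf, 3, 4)));

int
scan_exact (const char *s, int expected, const char *template, ...)
{
  va_list ap;
  va_start (ap, template);
  int assigned = vsscanf (s, template, ap);   /* does the actual scanning */
  va_end (ap);
  return assigned == expected;
}
```

With the attribute in place, a call such as scan_exact (s, 1, "%d", &some_double) draws a -Wformat warning, just as the same mistake would with sscanf.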