Chapter 26. The Basic Program/System Interface

Processes are the primitive units for allocation of system resources. Each process has its own address space and (usually) one thread of control. A process executes a program; you can have multiple processes executing the same program, but each process has its own copy of the program within its own address space and executes it independently of the other copies. Though it may have multiple threads of control within the same program and a program may be composed of multiple logically separate modules, a process always executes exactly one program.

Note that we are using a specific definition of "program" for the purposes of this manual, which corresponds to a common definition in the context of Unix systems. In popular usage, "program" enjoys a much broader definition; it can refer, for example, to a system's kernel, an editor macro, a complex package of software, or a discrete section of code executing within a process.

Writing the program is what this manual is all about. This chapter explains the most basic interface between your program and the system that runs, or calls, it. This includes passing parameters (arguments and environment) from the system to the program, requesting basic services from the system, and telling the system that the program is done.

A program starts another program with the exec family of system calls. This chapter looks at program startup from the execee's point of view. For the execor's point of view, see the section called “Executing a File”.

Program Arguments

The system starts a C program by calling the function main. It is up to you to write a function named main; otherwise, you won't even be able to link your program without errors.

In ISO C you can define main either to take no arguments, or to take two arguments that represent the command line arguments to the program, like this:

int main (int argc, char *argv[])

The command line arguments are the whitespace-separated tokens given in the shell command used to invoke the program; thus, in cat foo bar, the arguments are foo and bar. The only way a program can look at its command line arguments is via the arguments of main. If main doesn't take arguments, then you cannot get at the command line.

The value of the argc argument is the number of command line arguments. The argv argument is a vector of C strings; its elements are the individual command line argument strings. The file name of the program being run is also included in the vector as the first element; the value of argc counts this element. A null pointer always follows the last element: argv[argc] is this null pointer.

For the command cat foo bar, argc is 3 and argv has three elements, "cat", "foo" and "bar".
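As a minimal sketch of this layout, the two helpers below (print_args and count_args are hypothetical names, not library functions) walk the argument vector. They are written as ordinary functions so that main can call them; count_args relies only on the terminating null pointer, so for a vector passed to main it returns the same value as argc.

```c
#include <stdio.h>

/* Print each element of the argument vector with its index.
   argv[0] is the program's file name; argv[1] through argv[argc - 1]
   are the command line arguments proper. */
static void print_args(int argc, char *argv[])
{
  for (int i = 0; i < argc; i++)
    printf("argv[%d] = \"%s\"\n", i, argv[i]);
}

/* Count the elements of argv by walking up to the terminating null
   pointer.  Because argv[argc] is always a null pointer, this returns
   the same value as argc. */
static int count_args(char *argv[])
{
  int n = 0;
  while (argv[n] != NULL)
    n++;
  return n;
}
```

Calling print_args (argc, argv) from main for the command cat foo bar prints one line each for "cat", "foo" and "bar".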

In Unix systems you can define main a third way, using three arguments:

int main (int argc, char *argv[], char *envp[])

The first two arguments are just the same. The third argument, envp, gives the program's environment; it is the same as the value of environ (see the section called “Environment Variables”). POSIX.1 does not allow this three-argument form, so to be portable it is best to write main to take two arguments and use the value of environ.
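The portable alternative to envp can be sketched as follows, assuming the program declares environ itself. The helpers print_environment and lookup_env are hypothetical; in real programs getenv is the usual way to read a single variable, and the scan is written out here only to show that each element of the vector is a "NAME=value" string and that the vector ends with a null pointer, just like argv.

```c
#include <stdio.h>
#include <string.h>

/* The program declares environ itself; glibc's <unistd.h> declares it
   only when _GNU_SOURCE is defined. */
extern char **environ;

/* Print every "NAME=value" string in the environment, one per line. */
static void print_environment(void)
{
  for (char **ep = environ; ep != NULL && *ep != NULL; ep++)
    puts(*ep);
}

/* Find the value of NAME by scanning environ directly.  Returns a
   pointer to the text after '=', or NULL if NAME is not set.  The
   check for '=' keeps "FO" from matching "FOO=bar". */
static const char *lookup_env(const char *name)
{
  size_t len = strlen(name);

  for (char **ep = environ; ep != NULL && *ep != NULL; ep++)
    if (strncmp(*ep, name, len) == 0 && (*ep)[len] == '=')
      return *ep + len + 1;
  return NULL;
}
```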

Program Argument Syntax Conventions

POSIX recommends these conventions for command line arguments. getopt (the section called “Parsing program options using getopt”) and argp_parse (the section called “Parsing Program Options with Argp”) make it easy to implement them.

  • Arguments are options if they begin with a hyphen delimiter (-).

  • Multiple options may follow a hyphen delimiter in a single token if the options do not take arguments. Thus, -abc is equivalent to -a -b -c.

  • Option names are single alphanumeric characters (as for isalnum; the section called “Classification of Characters”).

  • Certain options require an argument. For example, the -o option of the ld command requires an argument: an output file name.

  • An option and its argument may or may not appear as separate tokens. (In other words, the whitespace separating them is optional.) Thus, -o foo and -ofoo are equivalent.

  • Options typically precede other non-option arguments.

    The implementations of getopt and argp_parse in the GNU C library normally make it appear as if all the option arguments were specified before all the non-option arguments for the purposes of parsing, even if the user of your program intermixed option and non-option arguments. They do this by reordering the elements of the argv array. This behavior is nonstandard; if you want to suppress it, define the POSIXLY_CORRECT environment variable (see the section called “Standard Environment Variables”).

  • The argument -- terminates all options; any following arguments are treated as non-option arguments, even if they begin with a hyphen.

  • A token consisting of a single hyphen character is interpreted as an ordinary non-option argument. By convention, it is used to specify input from or output to the standard input and output streams.

  • Options may be supplied in any order, or appear multiple times. The interpretation is left up to the particular application program.

GNU adds long options to these conventions. Long options consist of -- followed by a name made of alphanumeric characters and dashes. Option names are typically one to three words long, with hyphens to separate words. Users can abbreviate the option names as long as the abbreviations are unique.

To specify an argument for a long option, write --name=value. Because the argument is attached with =, this syntax lets a long option accept an argument that is itself optional: --name alone supplies no argument.
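A minimal sketch of long options using the GNU getopt_long function follows. The option names --verbose, --output and --color, the default value "auto", and the names long_settings, default_color and parse_long are hypothetical. Because --color is declared with optional_argument, its value is recognized only in the --color=WHEN form; in --color always, the word always is left as an ordinary non-option argument. Unique abbreviations such as --verb are accepted for --verbose.

```c
#include <getopt.h>
#include <stddef.h>

/* Value used when --color is given without "=WHEN" (hypothetical default). */
static const char default_color[] = "auto";

/* Hypothetical settings for --verbose, --output=FILE and --color[=WHEN]. */
struct long_settings {
  int verbose;
  const char *output;
  const char *color;
  int first_operand;
};

static int parse_long(int argc, char *argv[], struct long_settings *s)
{
  /* has_arg is no_argument, required_argument or optional_argument;
     with flag == NULL, getopt_long returns val when the option is seen. */
  static const struct option longopts[] = {
    { "verbose", no_argument,       NULL, 'v' },
    { "output",  required_argument, NULL, 'o' },
    { "color",   optional_argument, NULL, 'c' },
    { NULL, 0, NULL, 0 }
  };
  int c;

  s->verbose = 0;
  s->output = NULL;
  s->color = NULL;
  optind = 1;           /* restart scanning at argv[1] */

  /* "vo:" also accepts -v and -o FILE; --color has no short form. */
  while ((c = getopt_long(argc, argv, "vo:", longopts, NULL)) != -1)
    switch (c)
      {
      case 'v':
        s->verbose = 1;
        break;
      case 'o':
        s->output = optarg;   /* "--output=FILE" or "--output FILE" */
        break;
      case 'c':
        /* An optional argument is only taken from "--color=WHEN". */
        s->color = optarg != NULL ? optarg : default_color;
        break;
      default:
        return -1;
      }

  s->first_operand = optind;
  return 0;
}
```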

Eventually, the GNU system will provide completion for long option names in the shell.

Parsing Program Arguments

If the syntax for the command line arguments to your program is simple enough, you can pick the arguments off argv by hand. But unless your program takes a fixed number of arguments, or all of the arguments are interpreted in the same way (as file names, for example), you are usually better off using getopt (the section called “Parsing program options using getopt”) or argp_parse (the section called “Parsing Program Options with Argp”) to do the parsing.
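When a program takes a fixed number of arguments, checking argc and reading argv directly is often enough. This sketch assumes a hypothetical copy-style program that takes exactly two file names; check_copy_args and its parameters are illustrative names only.

```c
#include <stdio.h>

/* Hypothetical command line: exactly two file names, SOURCE and DEST.
   On success, store them and return 0; otherwise print a usage message
   and return -1. */
static int check_copy_args(int argc, char *argv[],
                           const char **source, const char **dest)
{
  if (argc != 3)
    {
      /* Use argv[0] in the message only if it is present. */
      fprintf(stderr, "usage: %s SOURCE DEST\n",
              argc > 0 ? argv[0] : "copy");
      return -1;
    }
  *source = argv[1];
  *dest = argv[2];
  return 0;
}
```

In main, a call such as if (check_copy_args (argc, argv, &src, &dst) != 0) return 1; is all the parsing such a program needs.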

getopt is more standard (the short-option-only version of it is part of the POSIX standard), but argp_parse is often easier to use, for both very simple and very complex option structures, because it does more of the dirty work for you.
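For comparison, here is a sketch of a small interface written with argp_parse: an option table, a parser callback, and the struct argp that ties them together. The names struct arguments, parse_opt and parse_arguments, the options, and the documentation strings are hypothetical. argp_parse supplies --help and --usage automatically, and with flags set to 0 it prints a message and exits on a usage error; that is the kind of dirty work it saves you.

```c
#include <argp.h>
#include <stddef.h>

/* Hypothetical settings filled in by the parser. */
struct arguments {
  int verbose;
  const char *output;
  char *files[2];      /* exactly two non-option arguments: SOURCE DEST */
};

static const struct argp_option options[] = {
  { "verbose", 'v', NULL,   0, "Produce verbose output", 0 },
  { "output",  'o', "FILE", 0, "Write output to FILE",   0 },
  { NULL, 0, NULL, 0, NULL, 0 }
};

/* Called by argp_parse once per option, once per non-option argument
   (ARGP_KEY_ARG), and once at the end (ARGP_KEY_END); other keys are
   passed back with ARGP_ERR_UNKNOWN. */
static error_t parse_opt(int key, char *arg, struct argp_state *state)
{
  struct arguments *a = state->input;

  switch (key)
    {
    case 'v':
      a->verbose = 1;
      break;
    case 'o':
      a->output = arg;
      break;
    case ARGP_KEY_ARG:
      if (state->arg_num >= 2)
        argp_usage(state);       /* too many arguments: print usage, exit */
      a->files[state->arg_num] = arg;
      break;
    case ARGP_KEY_END:
      if (state->arg_num < 2)
        argp_usage(state);       /* too few arguments */
      break;
    default:
      return ARGP_ERR_UNKNOWN;
    }
  return 0;
}

static const struct argp argp = {
  options, parse_opt, "SOURCE DEST",
  "Copy SOURCE to DEST (example program).",
  NULL, NULL, NULL
};

/* Parse ARGV into *A; returns 0 on success. */
static error_t parse_arguments(int argc, char *argv[], struct arguments *a)
{
  a->verbose = 0;
  a->output = NULL;
  return argp_parse(&argp, argc, argv, 0, NULL, a);
}
```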