This section describes what a shell must do to implement job control, by presenting an extensive sample program to illustrate the concepts involved.
All of the program examples included in this chapter are part of a simple shell program. This section presents data structures and utility functions which are used throughout the example.
The sample shell deals mainly with two data structures. The job type contains information about a job, which is a set of subprocesses linked together with pipes. The process type holds information about a single subprocess. Here are the relevant data structure declarations:
    /* A process is a single process.  */
    typedef struct process
    {
      struct process *next;       /* next process in pipeline */
      char **argv;                /* for exec */
      pid_t pid;                  /* process ID */
      char completed;             /* true if process has completed */
      char stopped;               /* true if process has stopped */
      int status;                 /* reported status value */
    } process;

    /* A job is a pipeline of processes.  */
    typedef struct job
    {
      struct job *next;           /* next active job */
      char *command;              /* command line, used for messages */
      process *first_process;     /* list of processes in this job */
      pid_t pgid;                 /* process group ID */
      char notified;              /* true if user told about stopped job */
      struct termios tmodes;      /* saved terminal modes */
      int stdin, stdout, stderr;  /* standard i/o channels */
    } job;

    /* The active jobs are linked into a list.  This is its head.  */
    job *first_job = NULL;
Here are some utility functions that are used for operating on job objects.
    /* Find the active job with the indicated pgid.  */
    job *
    find_job (pid_t pgid)
    {
      job *j;

      for (j = first_job; j; j = j->next)
        if (j->pgid == pgid)
          return j;
      return NULL;
    }

    /* Return true if all processes in the job have stopped or completed.  */
    int
    job_is_stopped (job *j)
    {
      process *p;

      for (p = j->first_process; p; p = p->next)
        if (!p->completed && !p->stopped)
          return 0;
      return 1;
    }

    /* Return true if all processes in the job have completed.  */
    int
    job_is_completed (job *j)
    {
      process *p;

      for (p = j->first_process; p; p = p->next)
        if (!p->completed)
          return 0;
      return 1;
    }
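To see how the two predicates differ, here is a small sketch that classifies a job by querying them. The type definitions are trimmed copies of the declarations above, keeping only the fields the predicates read, and the helper name classify_job is hypothetical rather than part of the sample shell.

```c
#include <stddef.h>
#include <sys/types.h>

/* Trimmed copies of the sample shell's types: only the fields that
   the predicates read are kept.  */
typedef struct process
{
  struct process *next;
  pid_t pid;
  char completed;
  char stopped;
} process;

typedef struct job
{
  process *first_process;
} job;

static int
job_is_stopped (job *j)
{
  process *p;

  for (p = j->first_process; p; p = p->next)
    if (!p->completed && !p->stopped)
      return 0;
  return 1;
}

static int
job_is_completed (job *j)
{
  process *p;

  for (p = j->first_process; p; p = p->next)
    if (!p->completed)
      return 0;
  return 1;
}

/* Hypothetical helper: 0 = still running, 1 = stopped, 2 = completed.
   Completion is tested first because job_is_stopped is also true
   for a job whose processes have all completed.  */
int
classify_job (job *j)
{
  if (job_is_completed (j))
    return 2;
  if (job_is_stopped (j))
    return 1;
  return 0;
}
```

A job counts as running as long as at least one of its processes is neither stopped nor completed, which is why a pipeline with one finished and one live process is still reported as running.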
When a shell program that normally performs job control is started, it has to be careful in case it has been invoked from another shell that is already doing its own job control.
A subshell that runs interactively has to ensure that it has been placed in the foreground by its parent shell before it can enable job control itself. It does this by getting its initial process group ID with the getpgrp function, and comparing it to the process group ID of the current foreground job associated with its controlling terminal (which can be retrieved using the tcgetpgrp function).
If the subshell is not running as a foreground job, it must stop itself by sending a SIGTTIN signal to its own process group. It may not arbitrarily put itself into the foreground; it must wait for the user to tell the parent shell to do this. If the subshell is continued again, it should repeat the check and stop itself again if it is still not in the foreground.
Once the subshell has been placed into the foreground by its parent shell, it can enable its own job control. It does this by calling setpgid to put itself into its own process group, and then calling tcsetpgrp to place this process group into the foreground.
When a shell enables job control, it should set itself to ignore all the job control stop signals so that it doesn't accidentally stop itself. You can do this by setting the action for all the stop signals to SIG_IGN.
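As a minimal sketch of this step, the following uses sigaction rather than the signal function that the sample shell uses; either works for installing SIG_IGN. The function names ignore_stop_signals and signal_is_ignored are hypothetical. Note that SIGSTOP is not in the list, since its action cannot be changed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Set the action for each job control stop signal to SIG_IGN.
   Returns 0 on success, -1 if any sigaction call fails.  */
int
ignore_stop_signals (void)
{
  static const int stop_signals[] = { SIGTSTP, SIGTTIN, SIGTTOU };
  struct sigaction sa;
  size_t i;

  sa.sa_handler = SIG_IGN;
  sa.sa_flags = 0;
  sigemptyset (&sa.sa_mask);

  for (i = 0; i < sizeof stop_signals / sizeof stop_signals[0]; i++)
    if (sigaction (stop_signals[i], &sa, NULL) < 0)
      return -1;
  return 0;
}

/* Report whether SIGNUM is currently ignored (1), not ignored (0),
   or could not be queried (-1).  */
int
signal_is_ignored (int signum)
{
  struct sigaction cur;

  if (sigaction (signum, NULL, &cur) < 0)
    return -1;
  return cur.sa_handler == SIG_IGN;
}
```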
A subshell that runs non-interactively cannot and should not support job control. It must leave all processes it creates in the same process group as the shell itself; this allows the non-interactive shell and its child processes to be treated as a single job by the parent shell. This is easy to do (just don't use any of the job control primitives), but you must remember to make the shell do it.
Here is the initialization code for the sample shell that shows how to do all of this.
    /* Keep track of attributes of the shell.  */

    #include <sys/types.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <termios.h>
    #include <unistd.h>

    pid_t shell_pgid;
    struct termios shell_tmodes;
    int shell_terminal;
    int shell_is_interactive;

    /* Make sure the shell is running interactively as the foreground job
       before proceeding.  */
    void
    init_shell (void)
    {
      /* See if we are running interactively.  */
      shell_terminal = STDIN_FILENO;
      shell_is_interactive = isatty (shell_terminal);

      if (shell_is_interactive)
        {
          /* Loop until we are in the foreground.  */
          while (tcgetpgrp (shell_terminal) != (shell_pgid = getpgrp ()))
            kill (- shell_pgid, SIGTTIN);

          /* Ignore interactive and job-control signals.  */
          signal (SIGINT, SIG_IGN);
          signal (SIGQUIT, SIG_IGN);
          signal (SIGTSTP, SIG_IGN);
          signal (SIGTTIN, SIG_IGN);
          signal (SIGTTOU, SIG_IGN);

          /* Leave SIGCHLD at its default action, which discards the
             signal.  Setting it to SIG_IGN explicitly would, on many
             systems, keep waitpid from reporting child status.  */
          signal (SIGCHLD, SIG_DFL);

          /* Put ourselves in our own process group.  */
          shell_pgid = getpid ();
          if (setpgid (shell_pgid, shell_pgid) < 0)
            {
              perror ("Couldn't put the shell in its own process group");
              exit (1);
            }

          /* Grab control of the terminal.  */
          tcsetpgrp (shell_terminal, shell_pgid);

          /* Save default terminal attributes for shell.  */
          tcgetattr (shell_terminal, &shell_tmodes);
        }
    }
Once the shell has taken responsibility for performing job control on its controlling terminal, it can launch jobs in response to commands typed by the user.
To create the processes in a process group, you use the same fork and exec functions described in the section called “Process Creation Concepts”. Since there are multiple child processes involved, though, things are a little more complicated and you must be careful to do things in the right order. Otherwise, nasty race conditions can result.
You have two choices for how to structure the tree of parent-child relationships among the processes. You can either make all the processes in the process group be children of the shell process, or you can make one process in the group be the ancestor of all the other processes in that group. The sample shell program presented in this chapter uses the first approach because it makes bookkeeping somewhat simpler.
As each process is forked, it should put itself in the new process group by calling setpgid; see the section called “Process Group Functions”. The first process in the new group becomes its process group leader, and its process ID becomes the process group ID for the group.
The shell should also call setpgid to put each of its child processes into the new process group. This is because there is a potential timing problem: each child process must be put in the process group before it begins executing a new program, and the shell depends on having all the child processes in the group before it continues executing. If both the child processes and the shell call setpgid, this ensures that the right things happen no matter which process gets to it first.
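The double call can be sketched in isolation as follows. This is an illustration under simplifying assumptions, not code from the sample shell: the function name spawn_group_leader is hypothetical, and the child blocks in pause at the point where a real shell's child would call exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Fork a child that leads a new process group.  Both sides call
   setpgid, so the group exists no matter which process runs first.
   Returns the child's pid, or -1 if fork fails.  */
pid_t
spawn_group_leader (void)
{
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* Child: a pid of 0 means the calling process, and a pgid of 0
         means use that process's own ID as the group ID.  */
      setpgid (0, 0);
      for (;;)
        pause ();               /* a real shell would exec here */
    }

  /* Parent: the same request, naming the child explicitly.  Whichever
     call runs second finds the work already done.  */
  setpgid (pid, pid);
  return pid;
}
```

Because the parent has made its own setpgid call before returning, it can rely on getpgid (pid) being equal to pid immediately, even if the child has not yet been scheduled.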
If the job is being launched as a foreground job, the new process group also needs to be put into the foreground on the controlling terminal using tcsetpgrp. Again, this should be done by the shell as well as by each of its child processes, to avoid race conditions.
The next thing each child process should do is to reset its signal actions.
During initialization, the shell process set itself to ignore job control signals; see the section called “Initializing the Shell”. As a result, any child processes it creates also ignore these signals by inheritance. This is definitely undesirable, so each child process should explicitly set the actions for these signals back to SIG_DFL just after it is forked.
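The inheritance described here is easy to observe. The sketch below forks a child while SIGTSTP is ignored in the parent and asks the child what action it sees, with and without an explicit reset. The function name child_ignores_sigtstp is hypothetical and exists only for this demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child while SIGTSTP is ignored in the parent and report
   whether the child still sees it as ignored.  If RESET_FIRST is
   nonzero, the child restores SIG_DFL before checking, as a shell's
   child should.  Returns 1 (ignored), 0 (not ignored), -1 on error.  */
int
child_ignores_sigtstp (int reset_first)
{
  struct sigaction ign, old, cur;
  int status, result = -1;
  pid_t pid;

  ign.sa_handler = SIG_IGN;
  ign.sa_flags = 0;
  sigemptyset (&ign.sa_mask);
  if (sigaction (SIGTSTP, &ign, &old) < 0)
    return -1;

  pid = fork ();
  if (pid == 0)
    {
      if (reset_first)
        signal (SIGTSTP, SIG_DFL);
      if (sigaction (SIGTSTP, NULL, &cur) < 0)
        _exit (2);
      _exit (cur.sa_handler == SIG_IGN);
    }

  if (pid > 0 && waitpid (pid, &status, 0) == pid
      && WIFEXITED (status) && WEXITSTATUS (status) < 2)
    result = WEXITSTATUS (status);

  /* Put the caller's original action back.  */
  sigaction (SIGTSTP, &old, NULL);
  return result;
}
```

Calling child_ignores_sigtstp (0) shows the inherited SIG_IGN action, while child_ignores_sigtstp (1) shows the default action after the reset. Ignored signals also stay ignored across exec, which is why the reset has to happen before the new program starts.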
Since shells follow this convention, applications can assume that they inherit the correct handling of these signals from the parent process. But every application has a responsibility not to mess up the handling of stop signals. Applications that disable the normal interpretation of the SUSP character should provide some other mechanism for the user to stop the job. When the user invokes this mechanism, the program should send a SIGTSTP signal to the process group of the process, not just to the process itself. See the section called “Signaling Another Process”.
Finally, each child process should call exec in the normal way. This is also the point at which redirection of the standard input and output channels should be handled. See the section called “Duplicating Descriptors” for an explanation of how to do this.
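As a small, self-contained illustration of redirection in the child, the sketch below points a child's standard output at a pipe with dup2 and reads back what the child prints. The function name capture_child_output is hypothetical, and the child writes the text itself at the point where a real shell would call exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child whose standard output is the write end of a pipe and
   copy what it prints into BUF (at most SIZE - 1 bytes, always
   null-terminated).  Returns the byte count, or -1 on error.  */
ssize_t
capture_child_output (const char *text, char *buf, size_t size)
{
  int fds[2];
  ssize_t n, total = 0;
  pid_t pid;

  if (size == 0 || pipe (fds) < 0)
    return -1;

  pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  if (pid == 0)
    {
      close (fds[0]);
      if (fds[1] != STDOUT_FILENO)
        {
          dup2 (fds[1], STDOUT_FILENO);
          close (fds[1]);
        }
      /* A real shell would call exec here.  */
      if (write (STDOUT_FILENO, text, strlen (text)) < 0)
        _exit (1);
      _exit (0);
    }

  /* Parent: close the write end so that read sees end-of-file once
     the child exits.  */
  close (fds[1]);
  while ((size_t) total < size - 1
         && (n = read (fds[0], buf + total, size - 1 - total)) > 0)
    total += n;
  buf[total] = '\0';
  close (fds[0]);
  waitpid (pid, NULL, 0);
  return total;
}
```

The descriptor is closed after dup2 so that the child holds only one reference to the pipe, which is the same pattern the sample shell uses for each of its three standard channels.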
Here is the function from the sample shell program that is responsible for launching a program. The function is executed by each child process immediately after it has been forked by the shell, and never returns.
    void
    launch_process (process *p, pid_t pgid,
                    int infile, int outfile, int errfile,
                    int foreground)
    {
      pid_t pid;

      if (shell_is_interactive)
        {
          /* Put the process into the process group and give the process group
             the terminal, if appropriate.
             This has to be done both by the shell and in the individual
             child processes because of potential race conditions.  */
          pid = getpid ();
          if (pgid == 0)
            pgid = pid;
          setpgid (pid, pgid);
          if (foreground)
            tcsetpgrp (shell_terminal, pgid);

          /* Set the handling for job control signals back to the default.  */
          signal (SIGINT, SIG_DFL);
          signal (SIGQUIT, SIG_DFL);
          signal (SIGTSTP, SIG_DFL);
          signal (SIGTTIN, SIG_DFL);
          signal (SIGTTOU, SIG_DFL);
          signal (SIGCHLD, SIG_DFL);
        }

      /* Set the standard input/output channels of the new process.  */
      if (infile != STDIN_FILENO)
        {
          dup2 (infile, STDIN_FILENO);
          close (infile);
        }
      if (outfile != STDOUT_FILENO)
        {
          dup2 (outfile, STDOUT_FILENO);
          close (outfile);
        }
      if (errfile != STDERR_FILENO)
        {
          dup2 (errfile, STDERR_FILENO);
          close (errfile);
        }

      /* Exec the new process.  Make sure we exit.  */
      execvp (p->argv[0], p->argv);
      perror ("execvp");
      exit (1);
    }
If the shell is not running interactively, this function does not do anything with process groups or signals. Remember that a shell not performing job control must keep all of its subprocesses in the same process group as the shell itself.
Next, here is the function that actually launches a complete job. After creating the child processes, this function calls some other functions to put the newly created job into the foreground or background; these are discussed in the section called “Foreground and Background”.
    void
    launch_job (job *j, int foreground)
    {
      process *p;
      pid_t pid;
      int mypipe[2], infile, outfile;

      infile = j->stdin;
      for (p = j->first_process; p; p = p->next)
        {
          /* Set up pipes, if necessary.  */
          if (p->next)
            {
              if (pipe (mypipe) < 0)
                {
                  perror ("pipe");
                  exit (1);
                }
              outfile = mypipe[1];
            }
          else
            outfile = j->stdout;

          /* Fork the child processes.  */
          pid = fork ();
          if (pid == 0)
            /* This is the child process.  */
            launch_process (p, j->pgid, infile,
                            outfile, j->stderr, foreground);
          else if (pid < 0)
            {
              /* The fork failed.  */
              perror ("fork");
              exit (1);
            }
          else
            {
              /* This is the parent process.  */
              p->pid = pid;
              if (shell_is_interactive)
                {
                  if (!j->pgid)
                    j->pgid = pid;
                  setpgid (pid, j->pgid);
                }
            }

          /* Clean up after pipes.  */
          if (infile != j->stdin)
            close (infile);
          if (outfile != j->stdout)
            close (outfile);
          infile = mypipe[0];
        }

      format_job_info (j, "launched");

      if (!shell_is_interactive)
        wait_for_job (j);
      else if (foreground)
        put_job_in_foreground (j, 0);
      else
        put_job_in_background (j, 0);
    }
Now let's consider what actions must be taken by the shell when it launches a job into the foreground, and how this differs from what must be done when a background job is launched.
When a foreground job is launched, the shell must first give it access to the controlling terminal by calling tcsetpgrp. Then, the shell should wait for processes in that process group to terminate or stop. This is discussed in more detail in the section called “Stopped and Terminated Jobs”.
When all of the processes in the group have either completed or stopped, the shell should regain control of the terminal for its own process group by calling tcsetpgrp again. Since stop signals caused by I/O from a background process or a SUSP character typed by the user are sent to the process group, normally all the processes in the job stop together.
The foreground job may have left the terminal in a strange state, so the shell should restore its own saved terminal modes before continuing. In case the job is merely stopped, the shell should first save the current terminal modes so that it can restore them later if the job is continued. The functions for dealing with terminal modes are tcgetattr and tcsetattr; these are described in the section called “Terminal Modes”.
Here is the sample shell's function for doing all of this.
    /* Put job j in the foreground.  If cont is nonzero,
       restore the saved terminal modes and send the process group a
       SIGCONT signal to wake it up before we block.  */
    void
    put_job_in_foreground (job *j, int cont)
    {
      /* Put the job into the foreground.  */
      tcsetpgrp (shell_terminal, j->pgid);

      /* Send the job a continue signal, if necessary.  */
      if (cont)
        {
          tcsetattr (shell_terminal, TCSADRAIN, &j->tmodes);
          if (kill (- j->pgid, SIGCONT) < 0)
            perror ("kill (SIGCONT)");
        }

      /* Wait for it to report.  */
      wait_for_job (j);

      /* Put the shell back in the foreground.  */
      tcsetpgrp (shell_terminal, shell_pgid);

      /* Save the job's terminal modes, then restore the shell's.  */
      tcgetattr (shell_terminal, &j->tmodes);
      tcsetattr (shell_terminal, TCSADRAIN, &shell_tmodes);
    }
If the process group is launched as a background job, the shell should remain in the foreground itself and continue to read commands from the terminal.
In the sample shell, there is not much that needs to be done to put a job into the background. Here is the function it uses:
    /* Put a job in the background.  If the cont argument is true, send
       the process group a SIGCONT signal to wake it up.  */
    void
    put_job_in_background (job *j, int cont)
    {
      /* Send the job a continue signal, if necessary.  */
      if (cont)
        if (kill (-j->pgid, SIGCONT) < 0)
          perror ("kill (SIGCONT)");
    }
When a foreground process is launched, the shell must block until all of the processes in that job have either terminated or stopped. It can do this by calling the waitpid function; see the section called “Process Completion”. Use the WUNTRACED option so that status is reported for processes that stop as well as processes that terminate.
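The effect of WUNTRACED can be seen with a child that stops itself. In this sketch the function name wait_reports_stop is hypothetical, and raise (SIGSTOP) stands in for a stop caused by the terminal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stops itself, and wait for it with WUNTRACED.
   Returns the signal that stopped the child, or -1 if waitpid did
   not report a stop.  The child is killed and reaped before
   returning.  */
int
wait_reports_stop (void)
{
  int status, stop_signal = -1;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      raise (SIGSTOP);
      _exit (0);
    }

  /* Without WUNTRACED this call would keep blocking while the child
     sits stopped, because only termination would be reported.  */
  if (waitpid (pid, &status, WUNTRACED) == pid && WIFSTOPPED (status))
    stop_signal = WSTOPSIG (status);

  /* Clean up: SIGKILL terminates even a stopped process.  */
  kill (pid, SIGKILL);
  waitpid (pid, &status, 0);
  return stop_signal;
}
```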
The shell must also check on the status of background jobs so that it can report terminated and stopped jobs to the user; this can be done by calling waitpid with the WNOHANG option. A good place to put such a check for terminated and stopped jobs is just before prompting for a new command.
The shell can also receive asynchronous notification that there is status information available for a child process by establishing a handler for SIGCHLD signals. See Chapter 25.
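A non-blocking check of this kind might look like the following sketch. The function name poll_children is hypothetical; unlike the sample shell's own status-checking code, it only counts the reports instead of recording them in the job list.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Collect status from every child that has something to report,
   without blocking.  Returns how many children reported.  */
int
poll_children (void)
{
  int status, count = 0;

  /* With WNOHANG, waitpid returns 0 when children exist but none has
     new status, and -1 when there are no children at all.  Either
     result ends the loop.  */
  while (waitpid (-1, &status, WUNTRACED | WNOHANG) > 0)
    count++;
  return count;
}
```

Calling such a function just before printing the prompt keeps the check cheap: if nothing has changed, the first waitpid call returns immediately.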
In the sample shell program, the SIGCHLD signal is normally ignored. This is to avoid reentrancy problems involving the global data structures the shell manipulates. But at specific times when the shell is not using these data structures (such as when it is waiting for input on the terminal) it makes sense to enable a handler for SIGCHLD. The same function that is used to do the synchronous status checks (do_job_notification, in this case) can also be called from within this handler.
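One common way to sidestep the reentrancy problem, shown here as an alternative sketch and not as what the sample shell does, is to have the handler do nothing but set a flag, and to perform the actual status check from the main flow of control. The names child_status_pending, note_sigchld and wait_via_sigchld are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler; the main flow checks it at a safe point.  */
static volatile sig_atomic_t child_status_pending = 0;

static void
note_sigchld (int signum)
{
  (void) signum;
  child_status_pending = 1;
}

/* Install the handler, fork a child that exits at once, and sleep in
   sigsuspend until SIGCHLD arrives.  Returns the child's exit code,
   or -1 on error.  */
int
wait_via_sigchld (int exit_code)
{
  struct sigaction sa, old;
  sigset_t block, prev, wait_mask;
  int status, result = -1;
  pid_t pid;

  sa.sa_handler = note_sigchld;
  sa.sa_flags = 0;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGCHLD, &sa, &old) < 0)
    return -1;

  /* Block SIGCHLD so that it cannot arrive between the flag test
     and the call to sigsuspend.  */
  sigemptyset (&block);
  sigaddset (&block, SIGCHLD);
  sigprocmask (SIG_BLOCK, &block, &prev);
  wait_mask = prev;
  sigdelset (&wait_mask, SIGCHLD);

  child_status_pending = 0;
  pid = fork ();
  if (pid == 0)
    _exit (exit_code);

  if (pid > 0)
    {
      while (!child_status_pending)
        sigsuspend (&wait_mask);   /* atomically unblock and wait */
      if (waitpid (pid, &status, 0) == pid && WIFEXITED (status))
        result = WEXITSTATUS (status);
    }

  /* Restore the caller's signal mask and SIGCHLD action.  */
  sigprocmask (SIG_SETMASK, &prev, NULL);
  sigaction (SIGCHLD, &old, NULL);
  return result;
}
```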
Here are the parts of the sample shell program that deal with checking the status of jobs and reporting the information to the user.
    #include <errno.h>
    #include <sys/wait.h>

    /* Store the status of the process pid that was returned by waitpid.
       Return 0 if all went well, nonzero otherwise.  */
    int
    mark_process_status (pid_t pid, int status)
    {
      job *j;
      process *p;

      if (pid > 0)
        {
          /* Update the record for the process.  */
          for (j = first_job; j; j = j->next)
            for (p = j->first_process; p; p = p->next)
              if (p->pid == pid)
                {
                  p->status = status;
                  if (WIFSTOPPED (status))
                    p->stopped = 1;
                  else
                    {
                      p->completed = 1;
                      if (WIFSIGNALED (status))
                        fprintf (stderr, "%d: Terminated by signal %d.\n",
                                 (int) pid, WTERMSIG (p->status));
                    }
                  return 0;
                }
          fprintf (stderr, "No child process %d.\n", (int) pid);
          return -1;
        }
      else if (pid == 0 || errno == ECHILD)
        /* No processes ready to report.  */
        return -1;
      else
        {
          /* Other weird errors.  */
          perror ("waitpid");
          return -1;
        }
    }

    /* Check for processes that have status information available,
       without blocking.  */
    void
    update_status (void)
    {
      int status;
      pid_t pid;

      do
        pid = waitpid (WAIT_ANY, &status, WUNTRACED|WNOHANG);
      while (!mark_process_status (pid, status));
    }

    /* Check for processes that have status information available,
       blocking until all processes in the given job have reported.  */
    void
    wait_for_job (job *j)
    {
      int status;
      pid_t pid;

      do
        pid = waitpid (WAIT_ANY, &status, WUNTRACED);
      while (!mark_process_status (pid, status)
             && !job_is_stopped (j)
             && !job_is_completed (j));
    }

    /* Format information about job status for the user to look at.  */
    void
    format_job_info (job *j, const char *status)
    {
      fprintf (stderr, "%ld (%s): %s\n", (long) j->pgid, status, j->command);
    }

    /* Notify the user about stopped or terminated jobs.
       Delete terminated jobs from the active job list.  */
    void
    do_job_notification (void)
    {
      job *j, *jlast, *jnext;

      /* Update status information for child processes.  */
      update_status ();

      jlast = NULL;
      for (j = first_job; j; j = jnext)
        {
          jnext = j->next;

          /* If all processes have completed, tell the user the job has
             completed and delete it from the list of active jobs.  */
          if (job_is_completed (j))
            {
              format_job_info (j, "completed");
              if (jlast)
                jlast->next = jnext;
              else
                first_job = jnext;
              free_job (j);
            }

          /* Notify the user about stopped jobs,
             marking them so that we won't do this more than once.  */
          else if (job_is_stopped (j) && !j->notified)
            {
              format_job_info (j, "stopped");
              j->notified = 1;
              jlast = j;
            }

          /* Don't say anything about jobs that are still running.  */
          else
            jlast = j;
        }
    }
The shell can continue a stopped job by sending a SIGCONT signal to its process group. If the job is being continued in the foreground, the shell should first invoke tcsetpgrp to give the job access to the terminal, and restore the saved terminal settings. After continuing a job in the foreground, the shell should wait for the job to stop or complete, as if the job had just been launched in the foreground.
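The core of continuing a job, sending SIGCONT to the whole process group, can be tried out without a terminal. In the sketch below the function name stop_and_continue_group is hypothetical, the job consists of a single child in its own process group, and raise (SIGSTOP) stands in for the user typing the SUSP character.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child into its own process group, let it stop itself, then
   continue the whole group with SIGCONT.  Returns the child's exit
   code, or -1 on error.  */
int
stop_and_continue_group (void)
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      setpgid (0, 0);
      raise (SIGSTOP);          /* stands in for a stop from the terminal */
      _exit (42);               /* reached only after SIGCONT */
    }
  setpgid (pid, pid);

  /* Wait until the job reports that it has stopped.  */
  if (waitpid (pid, &status, WUNTRACED) != pid)
    return -1;
  if (!WIFSTOPPED (status))
    return -1;

  /* A negative first argument addresses every process in the
     process group whose ID is PID.  */
  if (kill (-pid, SIGCONT) < 0)
    {
      kill (pid, SIGKILL);
      waitpid (pid, &status, 0);
      return -1;
    }

  /* Wait again, as if the job had just been launched.  */
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```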
The sample shell program handles both newly created and continued jobs with the same pair of functions, put_job_in_foreground and put_job_in_background. The definitions of these functions were given in the section called “Foreground and Background”. When continuing a stopped job, a nonzero value is passed as the cont argument to ensure that the SIGCONT signal is sent and the terminal modes reset, as appropriate.
This leaves only a function for updating the shell's internal bookkeeping about the job being continued:
    /* Mark a stopped job J as being running again.  */
    void
    mark_job_as_running (job *j)
    {
      process *p;

      for (p = j->first_process; p; p = p->next)
        p->stopped = 0;
      j->notified = 0;
    }

    /* Continue the job J.  */
    void
    continue_job (job *j, int foreground)
    {
      mark_job_as_running (j);
      if (foreground)
        put_job_in_foreground (j, 1);
      else
        put_job_in_background (j, 1);
    }
The code extracts for the sample shell included in this chapter are only a part of the entire shell program. In particular, nothing at all has been said about how job and process data structures are allocated and initialized.
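For orientation only, here is one way the allocation side could look. Everything in this sketch is an assumption: the names new_job and add_process are hypothetical, free_job is only a guess at what the function called by do_job_notification might do, the types are trimmed copies that omit the terminal modes and I/O channel fields, and the argv vectors are borrowed from the caller, not copied.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Trimmed copies of the sample shell's types (no tmodes, no I/O
   channel fields).  */
typedef struct process
{
  struct process *next;
  char **argv;
  pid_t pid;
  char completed;
  char stopped;
  int status;
} process;

typedef struct job
{
  struct job *next;
  char *command;
  process *first_process;
  pid_t pgid;
  char notified;
} job;

/* Allocate a zero-initialized job with its own copy of COMMAND.
   Returns NULL if memory runs out.  */
job *
new_job (const char *command)
{
  job *j = calloc (1, sizeof *j);

  if (j == NULL)
    return NULL;
  j->command = malloc (strlen (command) + 1);
  if (j->command == NULL)
    {
      free (j);
      return NULL;
    }
  strcpy (j->command, command);
  return j;
}

/* Append a process to the end of J's pipeline.  ARGV is borrowed.  */
process *
add_process (job *j, char **argv)
{
  process **tail = &j->first_process;
  process *p = calloc (1, sizeof *p);

  if (p == NULL)
    return NULL;
  p->argv = argv;
  while (*tail)
    tail = &(*tail)->next;
  *tail = p;
  return p;
}

/* Release a job and all of its process records.  */
void
free_job (job *j)
{
  process *p, *pnext;

  for (p = j->first_process; p; p = pnext)
    {
      pnext = p->next;
      free (p);
    }
  free (j->command);
  free (j);
}
```

A full version would also have to set the three I/O channel fields, for example to STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO when the command line has no redirections, since launch_job compares against them. Leaving pgid at zero matters too, because launch_job uses a zero pgid to recognize the first process in a new job.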
Most real shells provide a complex user interface that has support for a command language; variables; abbreviations, substitutions, and pattern matching on file names; and the like. All of this is far too complicated to explain here! Instead, we have concentrated on showing how to implement the core process creation and job control functions that can be called from such a shell.
Here is a table summarizing the major entry points we have presented:
void init_shell (void)
    Initialize the shell's internal state. See the section called “Initializing the Shell”.
void launch_job (job *j, int foreground)
    Launch the job j as either a foreground or background job. See the section called “Launching Jobs”.
void do_job_notification (void)
    Check for and report any jobs that have terminated or stopped. Can be called synchronously or within a handler for SIGCHLD signals. See the section called “Stopped and Terminated Jobs”.
void continue_job (job *j, int foreground)
    Continue the job j. See the section called “Continuing Stopped Jobs”.
Of course, a real shell would also want to provide other functions for managing jobs. For example, it would be useful to have commands to list all active jobs or to send a signal (such as SIGKILL) to a job.
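As an example of such an extra function, here is a sketch of a job listing routine. It is not part of the sample shell: the names job_state and list_jobs are hypothetical, the types are trimmed copies of the ones declared earlier, and the output goes into a caller-supplied buffer so that the routine is easy to test. The line format mirrors the one used by format_job_info.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Trimmed copies of the sample shell's types.  */
typedef struct process
{
  struct process *next;
  char completed;
  char stopped;
} process;

typedef struct job
{
  struct job *next;
  const char *command;
  process *first_process;
  pid_t pgid;
} job;

/* Describe the state of J in one word.  */
static const char *
job_state (const job *j)
{
  const process *p;
  int all_completed = 1, all_halted = 1;

  for (p = j->first_process; p; p = p->next)
    {
      if (!p->completed)
        all_completed = 0;
      if (!p->completed && !p->stopped)
        all_halted = 0;
    }
  if (all_completed)
    return "completed";
  return all_halted ? "stopped" : "running";
}

/* Write one line per job into BUF, which holds SIZE bytes.  Stops
   early instead of emitting a truncated line.  Returns the number
   of jobs listed.  */
int
list_jobs (const job *first, char *buf, size_t size)
{
  const job *j;
  size_t used = 0;
  int count = 0;

  if (size > 0)
    buf[0] = '\0';
  for (j = first; j; j = j->next)
    {
      int n = snprintf (buf + used, size - used, "%ld (%s): %s\n",
                        (long) j->pgid, job_state (j), j->command);
      if (n < 0 || (size_t) n >= size - used)
        {
          /* Out of room: drop the partial line.  */
          if (size > 0)
            buf[used] = '\0';
          break;
        }
      used += (size_t) n;
      count++;
    }
  return count;
}
```

A command for sending a signal to a job would follow the same pattern as put_job_in_background: call kill with the negated process group ID so that every process in the job receives the signal.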