Word expansion means the process of splitting a string into words and substituting for variables, commands, and wildcards just as the shell does.
For example, when you write ls -l foo.c, this string is split into three separate words--ls, -l and foo.c. This is the most basic function of word expansion.
When you write ls *.c, this can become many words, because the word *.c can be replaced with any number of file names. This is called wildcard expansion, and it is also a part of word expansion.
When you use echo $PATH to print your path, you are taking advantage of variable substitution, which is also part of word expansion.
Ordinary programs can perform word expansion just like the shell by calling the library function wordexp.
When word expansion is applied to a sequence of words, it performs the following transformations in the order shown here:
Tilde expansion: Replacement of ~foo with the name of the home directory of foo.
Next, three different transformations are applied in the same step, from left to right:
Variable substitution: Environment variables are substituted for references such as $foo.
Command substitution: Constructs such as `cat foo` and the equivalent $(cat foo) are replaced with the output from the inner command.
Arithmetic expansion: Constructs such as $(($x-1)) are replaced with the result of the arithmetic computation.
Wildcard expansion: The replacement of a construct such as *.c with a list of .c file names. Wildcard expansion applies to an entire word at a time, and replaces that word with 0 or more file names that are themselves words.
Quote removal: The deletion of string-quotes, now that they have done their job by inhibiting the above transformations when appropriate.
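As a rough illustration of the stages listed above, the following sketch expands a string and returns a copy of its first word. The helper name first_expanded_word is made up for this example; only wordexp and wordfree come from the library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: expand INPUT and return a malloc'd copy of the
   first resulting word, or NULL if expansion fails or yields no words.
   The caller frees the returned string.  */
char *
first_expanded_word (const char *input)
{
  wordexp_t we;
  char *copy = NULL;
  int err = wordexp (input, &we, 0);

  if (err != 0)
    {
      /* Only WRDE_NOSPACE may leave partial results behind.  */
      if (err == WRDE_NOSPACE)
        wordfree (&we);
      return NULL;
    }
  if (we.we_wordc > 0)
    copy = strdup (we.we_wordv[0]);
  wordfree (&we);
  return copy;
}
```

Passing an arithmetic construct such as $((6*7)) or a command substitution such as $(cat foo) to this helper exercises the corresponding stage; quote removal has already happened by the time the word is copied.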
For the details of these transformations, and how to write the constructs that use them, see [The BASH Manual] (to appear).
All the functions, constants and data types for word expansion are declared in the header file wordexp.h.
Word expansion produces a vector of words (strings). To return this vector, wordexp uses a special data type, wordexp_t, which is a structure. You pass wordexp the address of the structure, and it fills in the structure's fields to tell you about the results.
wordexp_t: This data type holds a pointer to a word vector. More precisely, it records both the address of the word vector and its size. It has the following fields:
we_wordc: The number of elements in the vector.
we_wordv: The address of the vector. This field has type char **.
we_offs: The offset of the first real element of the vector, from its nominal address in the we_wordv field. Unlike the other fields, this is always an input to wordexp, rather than an output from it.
If you use a nonzero offset, then that many elements at the beginning of the vector are left empty. (The wordexp function fills them with null pointers.)
The we_offs field is meaningful only if you use the WRDE_DOOFFS flag. Otherwise, the offset is always zero regardless of what is in this field, and the first real element comes at the beginning of the vector.
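A minimal sketch of how we_offs and WRDE_DOOFFS work together, assuming the caller wants one empty slot at the front of the vector. The function name expand_with_reserved_slot is hypothetical.

```c
#include <wordexp.h>

/* Hypothetical helper: expand ARGS into *WE, leaving one empty slot at
   the beginning of the vector.  Returns whatever wordexp returns; on
   success the caller must eventually call wordfree on WE.  */
int
expand_with_reserved_slot (const char *args, wordexp_t *we)
{
  /* we_offs is an input field, so it is set before the call.  */
  we->we_offs = 1;
  return wordexp (args, we, WRDE_DOOFFS);
}
```

After a successful call with the string -l foo.c, we_wordc is 2, we_wordv[0] is a null pointer, and the two expanded words sit in we_wordv[1] and we_wordv[2].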
int wordexp (const char *words, wordexp_t *word-vector-ptr, int flags): Perform word expansion on the string words, putting the result in a newly allocated vector, and store the size and address of this vector into *word-vector-ptr. The argument flags is a combination of bit flags; see the section called “Flags for Word Expansion” for details of the flags.
You shouldn't use any of the characters |&;<> in the string words unless they are quoted; likewise for newline. If you use these characters unquoted, you will get the WRDE_BADCHAR error code. Don't use parentheses or braces unless they are quoted or part of a word expansion construct. If you use quotation characters '"`, they should come in pairs that balance.
The results of word expansion are a sequence of words. The function wordexp allocates a string for each resulting word, then allocates a vector of type char ** to store the addresses of these strings. The last element of the vector is a null pointer. This vector is called the word vector.
To return this vector, wordexp stores both its address and its length (number of elements, not counting the terminating null pointer) into *word-vector-ptr.
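The word vector can be walked like any other array of strings. The following sketch prints each word on its own line; print_expansion is an illustrative name, not a library function.

```c
#include <stdio.h>
#include <wordexp.h>

/* Hypothetical helper: expand WORDS, print each resulting word on its
   own line, and return the number of words, or -1 if wordexp fails.  */
int
print_expansion (const char *words)
{
  wordexp_t we;
  int err = wordexp (words, &we, 0);
  int count;

  if (err != 0)
    {
      /* Only WRDE_NOSPACE may leave partial results behind.  */
      if (err == WRDE_NOSPACE)
        wordfree (&we);
      return -1;
    }

  /* we_wordc does not count the terminating null pointer.  */
  for (size_t i = 0; i < we.we_wordc; i++)
    puts (we.we_wordv[i]);

  count = (int) we.we_wordc;
  wordfree (&we);
  return count;
}
```

Called with the string ls -l foo.c, this prints three lines and returns 3.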
If wordexp succeeds, it returns 0. Otherwise, it returns one of these error codes:
WRDE_BADCHAR: The input string words contains an unquoted invalid character such as |.
WRDE_BADVAL: The input string refers to an undefined shell variable, and you used the flag WRDE_UNDEF to forbid such references.
WRDE_CMDSUB: The input string uses command substitution, and you used the flag WRDE_NOCMD to forbid command substitution.
WRDE_NOSPACE: It was impossible to allocate memory to hold the result. In this case, wordexp can store part of the results--as much as it could allocate room for.
WRDE_SYNTAX: There was a syntax error in the input string. For example, an unmatched quoting character is a syntax error.
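One way to act on these return values is sketched below. Both function names are hypothetical, and the message strings merely paraphrase the descriptions above.

```c
#include <wordexp.h>

/* Hypothetical helper: map a wordexp return value to a short message.  */
const char *
wordexp_error_text (int code)
{
  switch (code)
    {
    case 0:            return "success";
    case WRDE_BADCHAR: return "unquoted invalid character";
    case WRDE_BADVAL:  return "undefined variable with WRDE_UNDEF";
    case WRDE_CMDSUB:  return "command substitution with WRDE_NOCMD";
    case WRDE_NOSPACE: return "out of memory";
    case WRDE_SYNTAX:  return "syntax error";
    default:           return "unknown error";
    }
}

/* Hypothetical helper: expand WORDS with FLAGS, discard the result,
   and return the code that wordexp reported.  */
int
try_expand (const char *words, int flags)
{
  wordexp_t we;
  int err = wordexp (words, &we, flags);

  /* Free on success, and on WRDE_NOSPACE where part of the result
     may have been allocated.  */
  if (err == 0 || err == WRDE_NOSPACE)
    wordfree (&we);
  return err;
}
```

For instance, try_expand ("ls | wc", 0) reports WRDE_BADCHAR, and an unmatched quote such as 'abc reports WRDE_SYNTAX.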
void wordfree (wordexp_t *word-vector-ptr): Free the storage used for the word-strings and vector that *word-vector-ptr points to. This does not free the structure *word-vector-ptr itself--only the other data it points to.
This section describes the flags that you can specify in the flags argument to wordexp. Choose the flags you want, and combine them with the C operator |.
WRDE_APPEND: Append the words from this expansion to the vector of words produced by previous calls to wordexp. This way you can effectively expand several words as if they were concatenated with spaces between them.
In order for appending to work, you must not modify the contents of the word vector structure between calls to wordexp. And, if you set WRDE_DOOFFS in the first call to wordexp, you must also set it when you append to the results.
WRDE_DOOFFS: Leave blank slots at the beginning of the vector of words. The we_offs field says how many slots to leave. The blank slots contain null pointers.
WRDE_NOCMD: Don't do command substitution; if the input requests command substitution, report an error.
WRDE_REUSE: Reuse a word vector made by a previous call to wordexp. Instead of allocating a new vector of words, this call to wordexp will use the vector that already exists (making it larger if necessary).
Note that the vector may move, so it is not safe to save an old pointer and use it again after calling wordexp. You must fetch we_wordv anew after each call.
WRDE_SHOWERR: Do show any error messages printed by commands run by command substitution. More precisely, allow these commands to inherit the standard error output stream of the current process. By default, wordexp gives these commands a standard error stream that discards all output.
WRDE_UNDEF: If the input refers to a shell variable that is not defined, report an error.
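As a sketch of the reuse flag, the function below expands a list of strings one after another through a single wordexp_t. The name count_words_reusing is invented for this example, and the error handling is deliberately minimal.

```c
#include <stddef.h>
#include <wordexp.h>

/* Hypothetical helper: expand each string in INPUTS (an array ending
   with a null pointer) in turn, reusing one word vector, and return
   the total number of words produced, or -1 on the first failure.  */
long
count_words_reusing (const char *const *inputs)
{
  wordexp_t we;
  long total = 0;
  int flags = 0;               /* The first call allocates a new vector.  */

  for (size_t i = 0; inputs[i] != NULL; i++)
    {
      int err = wordexp (inputs[i], &we, flags);
      if (err != 0)
        {
          if (err == WRDE_NOSPACE)
            wordfree (&we);
          return -1;
        }
      /* Read the fields anew after each call; the vector may move.  */
      total += (long) we.we_wordc;
      flags = WRDE_REUSE;      /* Later calls reuse the existing vector.  */
    }

  if (flags == WRDE_REUSE)
    wordfree (&we);
  return total;
}
```

With the two inputs ls -l foo.c and one two, the function returns 5, because each call replaces the previous contents rather than appending to them.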
Here is an example of using wordexp to expand several strings and use the results to run a shell command. It also shows the use of WRDE_APPEND to concatenate the expansions and of wordfree to free the space allocated by wordexp.
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <wordexp.h>

    int
    expand_and_execute (const char *program, const char **options)
    {
      wordexp_t result;
      pid_t pid;
      int status, i;

      /* Expand the string for the program to run.  */
      switch (wordexp (program, &result, 0))
        {
        case 0:                 /* Successful.  */
          break;
        case WRDE_NOSPACE:
          /* If the error was WRDE_NOSPACE,
             then perhaps part of the result was allocated.  */
          wordfree (&result);
          /* Fall through.  */
        default:                /* Some other error.  */
          return -1;
        }

      /* Expand the strings specified for the arguments.  */
      for (i = 0; options[i] != NULL; i++)
        {
          if (wordexp (options[i], &result, WRDE_APPEND))
            {
              wordfree (&result);
              return -1;
            }
        }

      pid = fork ();
      if (pid == 0)
        {
          /* This is the child process.  Execute the command.  */
          execv (result.we_wordv[0], result.we_wordv);
          exit (EXIT_FAILURE);
        }
      else if (pid < 0)
        /* The fork failed.  Report failure.  */
        status = -1;
      else
        /* This is the parent process.  Wait for the child to complete.  */
        if (waitpid (pid, &status, 0) != pid)
          status = -1;

      wordfree (&result);
      return status;
    }
It's a standard part of shell syntax that you can use ~ at the beginning of a file name to stand for your own home directory. You can use ~user to stand for user's home directory.
Tilde expansion is the process of converting these abbreviations to the directory names that they stand for.
Tilde expansion applies to the ~ plus all following characters up to whitespace or a slash. It takes place only at the beginning of a word, and only if none of the characters to be transformed is quoted in any way.
Plain ~ uses the value of the environment variable HOME as the proper home directory name. ~ followed by a user name uses getpwnam to look up that user in the user database, and uses whatever directory is recorded there. Thus, ~ followed by your own name can give different results from plain ~, if the value of HOME is not really your home directory.
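The relationship between plain ~ and HOME can be checked with a small sketch like the following. The function name expands_under_home is hypothetical, and the check only covers the plain ~ form, not ~user.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: TILDE_PATH must start with a plain ~, for
   example "~" or "~/src".  Return 1 if it expands to the value of HOME
   followed by the rest of the path, 0 if it expands to something else,
   and -1 if HOME is unset, the argument is malformed, or wordexp fails.  */
int
expands_under_home (const char *tilde_path)
{
  const char *home = getenv ("HOME");
  wordexp_t we;
  int result = 0;

  if (home == NULL || tilde_path[0] != '~')
    return -1;
  if (wordexp (tilde_path, &we, 0) != 0)
    return -1;

  if (we.we_wordc == 1)
    {
      const char *word = we.we_wordv[0];
      size_t n = strlen (home);
      /* The word should be HOME plus whatever followed the tilde.  */
      result = strncmp (word, home, n) == 0
               && strcmp (word + n, tilde_path + 1) == 0;
    }
  wordfree (&we);
  return result;
}
```

A path such as ~/src/foo.c need not exist for this check, since tilde expansion only rewrites the prefix.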
Part of ordinary shell syntax is the use of $variable to substitute the value of a shell variable into a command. This is called variable substitution, and it is one part of doing word expansion.
There are two basic ways you can write a variable reference for substitution:
${variable}: If you write braces around the variable name, then it is completely unambiguous where the variable name ends. You can concatenate additional letters onto the end of the variable value by writing them immediately after the close brace. For example, if the value of foo is tractor, ${foo}s expands into tractors.
$variable: If you do not put braces around the variable name, then the variable name consists of all the alphanumeric characters and underscores that follow the $. The next punctuation character ends the variable name. Thus, $foo-bar refers to the variable foo and expands into tractor-bar.
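Both forms can be tried from C with a sketch along these lines. It assumes the variable foo holds tractor, as in the examples above; the names expands_to and check_variable_forms are invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: return 1 if INPUT expands to the single word
   EXPECTED, 0 if it expands to anything else, -1 if wordexp fails.  */
int
expands_to (const char *input, const char *expected)
{
  wordexp_t we;
  int match;

  if (wordexp (input, &we, 0) != 0)
    return -1;
  match = (we.we_wordc == 1 && strcmp (we.we_wordv[0], expected) == 0);
  wordfree (&we);
  return match;
}

/* Set foo to "tractor" and check the braced and unbraced forms.
   Returns 1 if both behave as described, 0 if not, -1 on error.  */
int
check_variable_forms (void)
{
  if (setenv ("foo", "tractor", 1) != 0)
    return -1;
  return expands_to ("${foo}s", "tractors") == 1
         && expands_to ("$foo-bar", "tractor-bar") == 1;
}
```

Writing $foos without braces would instead refer to a different variable named foos, which is why the braces matter in the first case.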
When you use braces, you can also use various constructs to modify the value that is substituted, or test it in various ways.
${variable:-default}: Substitute the value of variable, but if that is empty or undefined, use default instead.
${variable:=default}: Substitute the value of variable, but if that is empty or undefined, use default instead and set the variable to default.
${variable:?message}: If variable is defined and not empty, substitute its value. Otherwise, print message as an error message on the standard error stream, and consider word expansion a failure.
${variable:+replacement}: Substitute replacement, but only if variable is defined and nonempty. Otherwise, substitute nothing for this construct.
${#variable}: Substitute a numeral which expresses in base ten the number of characters in the value of variable. If the value of foo is tractor, ${#foo} stands for 7, because tractor is seven characters.
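A few of these modifiers can be exercised with the sketch below. It assumes foo is set to tractor and that the made-up variable wordexp_demo_unset is undefined; expand_into and print_modifier_examples are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: expand EXPR and copy its single resulting word
   into BUF, which holds SIZE bytes.  Returns 0 on success, -1 if the
   expansion fails, yields a different number of words, or is too long.  */
int
expand_into (const char *expr, char *buf, size_t size)
{
  wordexp_t we;
  int status = -1;

  if (wordexp (expr, &we, 0) != 0)
    return -1;
  if (we.we_wordc == 1)
    {
      size_t len = strlen (we.we_wordv[0]);
      if (len < size)
        {
          memcpy (buf, we.we_wordv[0], len + 1);
          status = 0;
        }
    }
  wordfree (&we);
  return status;
}

/* Print the results of three modifier forms and return how many of
   them expanded successfully.  */
int
print_modifier_examples (void)
{
  static const char *const exprs[] = {
    "${wordexp_demo_unset:-default}",
    "${foo:+replacement}",
    "${#foo}",
  };
  char buf[64];
  int ok = 0;

  setenv ("foo", "tractor", 1);
  unsetenv ("wordexp_demo_unset");
  for (size_t i = 0; i < sizeof exprs / sizeof exprs[0]; i++)
    if (expand_into (exprs[i], buf, sizeof buf) == 0)
      {
        printf ("%s -> %s\n", exprs[i], buf);
        ok++;
      }
  return ok;
}
```

Under the stated assumptions, the three expressions yield default, replacement, and 7 respectively.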
These variants of variable substitution let you remove part of the variable's value before substituting it. The prefix and suffix are not mere strings; they are wildcard patterns, just like the patterns that you use to match multiple file names. But in this context, they match against parts of the variable value rather than against file names.
${variable%%suffix}: Substitute the value of variable, but first discard from that value any portion at the end that matches the pattern suffix.
If there is more than one alternative for how to match against suffix, this construct uses the longest possible match.
Thus, ${foo%%r*} substitutes t, because the largest match for r* at the end of tractor is ractor.
${variable%suffix}: Substitute the value of variable, but first discard from that value any portion at the end that matches the pattern suffix.
If there is more than one alternative for how to match against suffix, this construct uses the shortest possible alternative.
Thus, ${foo%r*} substitutes tracto, because the shortest match for r* at the end of tractor is just r.
${variable##prefix}: Substitute the value of variable, but first discard from that value any portion at the beginning that matches the pattern prefix.
If there is more than one alternative for how to match against prefix, this construct uses the longest possible match.
Thus, ${foo##*t} substitutes or, because the largest match for *t at the beginning of tractor is tract.
${variable#prefix}: Substitute the value of variable, but first discard from that value any portion at the beginning that matches the pattern prefix.
If there is more than one alternative for how to match against prefix, this construct uses the shortest possible alternative.
Thus, ${foo#*t} substitutes ractor, because the shortest match for *t at the beginning of tractor is just t.
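The four trimming forms can be compared side by side with a table-driven sketch such as this one. It assumes foo is tractor; the structure trim_case and the function check_trim_cases are illustrative names only.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* One expression and the word it should produce when foo is "tractor".  */
struct trim_case
{
  const char *expr;
  const char *expected;
};

static const struct trim_case trim_cases[] = {
  { "${foo%%r*}", "t" },       /* Longest suffix matching r* removed.  */
  { "${foo%r*}",  "tracto" },  /* Shortest suffix matching r* removed.  */
  { "${foo##*t}", "or" },      /* Longest prefix matching *t removed.  */
  { "${foo#*t}",  "ractor" },  /* Shortest prefix matching *t removed.  */
};

/* Expand every case and return the number of mismatches, or -1 if the
   variable could not be set.  */
int
check_trim_cases (void)
{
  int mismatches = 0;

  if (setenv ("foo", "tractor", 1) != 0)
    return -1;

  for (size_t i = 0; i < sizeof trim_cases / sizeof trim_cases[0]; i++)
    {
      wordexp_t we;

      if (wordexp (trim_cases[i].expr, &we, 0) != 0)
        {
          mismatches++;
          continue;
        }
      if (we.we_wordc == 1
          && strcmp (we.we_wordv[0], trim_cases[i].expected) == 0)
        printf ("%s -> %s\n", trim_cases[i].expr, we.we_wordv[0]);
      else
        mismatches++;
      wordfree (&we);
    }
  return mismatches;
}
```

A return value of 0 means every form behaved as the descriptions above say it should.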