Chapter 9. Programming in M4

Autoconf is written on top of two layers: M4sugar, which provides convenient macros for pure M4 programming, and M4sh, which provides macros dedicated to shell script generation.

As of this version of Autoconf, these two layers are still experimental, and their interface might change in the future. As a matter of fact, anything that is not documented must not be used.

M4 Quotation

The most common problem with existing macros is improper quotation. This section, which users of Autoconf can skip but which macro writers must read, first justifies the quotation scheme that was chosen for Autoconf and then ends with a rule of thumb. Understanding the former helps one to follow the latter.

Active Characters

To fully understand where proper quotation is important, you first need to know which characters are special in Autoconf: # introduces a comment inside which no macro expansion is performed, , separates arguments, [ and ] are the quotes themselves, and finally ( and ) (which m4 tries to match by pairs).

In order to understand the delicate case of macro calls, we first have to present some obvious failures. Below they are "obvious-ified"; you do find them in real life, but usually in disguise.

Comments, introduced by a hash and running up to the newline, are opaque tokens to the top level: active characters are turned off, and there is no macro expansion:

# define([def], ine)
=># define([def], ine)

Each time there can be a macro expansion, there is a quotation expansion; i.e., one level of quotes is stripped:

int tab[10];
=>int tab10;
[int tab[10];]
=>int tab[10];

Without this in mind, the reader will try hopelessly to use her macro array:

define([array], [int tab[10];])
array
=>int tab10;
[array]
=>array

How can you correctly output the intended results[16]?
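Following the hint in the footnote, here is a minimal sketch of one possible answer. It assumes plain GNU m4 with the quotes switched to [ and ] by hand (Autoconf already does this for you); the expected output is given in dnl comments so that the input stays runnable:

```m4
changequote([, ])dnl
define([array], [int tab[10];])dnl
dnl A plain call is rescanned after expansion, so the inner quotes are lost:
array
dnl => int tab10;
dnl defn returns the definition quoted, so the rescan only strips the
dnl pair of quotes that defn itself added:
defn([array])
dnl => int tab[10];
```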

One Macro Call

Let's proceed to the interaction between active characters and macros with this small macro, which just returns its first argument:

define([car], [$1])

The two pairs of quotes above are not part of the arguments of define; rather, they are understood by the top level when it tries to find the arguments of define. Therefore, it is equivalent to write:

define(car, $1)

But, while it is acceptable for a configure.ac to avoid unnecessary quotes, it is bad practice for Autoconf macros which must both be more robust and also advocate perfect style.
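One way to see why unquoted names are fragile: if the name being defined already exists as a macro, it is expanded before define ever sees it. The sketch below assumes plain GNU m4 with the quotes switched to [ and ] by hand, and uses the hypothetical macro names greeting and hello purely for illustration:

```m4
changequote([, ])dnl
define([greeting], [hello])dnl
dnl The name is unquoted, so greeting is expanded to hello first: this
dnl line defines a macro named hello and leaves greeting untouched.
define(greeting, [goodbye])dnl
greeting
dnl => goodbye
hello
dnl => goodbye
```

Here greeting still expands to hello, which in turn is now a macro expanding to goodbye. Writing define([greeting], [goodbye]) would have redefined greeting as intended.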

At the top level, there are only two possibilities: either you quote or you don't:

car(foo, bar, baz)
=>foo
[car(foo, bar, baz)]
=>car(foo, bar, baz)

Let's pay attention to the special characters:

car(#)
error-->EOF in argument list

The closing parenthesis is hidden in the comment; with a hypothetical quoting, the top level understood it this way:

car([#)]

Proper quotation, of course, fixes the problem:

car([#])
=>#

The reader will easily understand the following examples:

car(foo, bar)
=>foo
car([foo, bar])
=>foo, bar
car((foo, bar))
=>(foo, bar)
car([(foo], [bar)])
=>(foo
car([], [])
=>
car([[]], [[]])
=>[]

With this in mind, we can explore the cases where macros invoke macros…

Quotation and Nested Macros

The examples below use the following macros:

define([car], [$1])
define([active], [ACT, IVE])
define([array], [int tab[10]])

Each additional embedded macro call introduces other possible interesting quotations:

car(active)
=>ACT
car([active])
=>ACT, IVE
car([[active]])
=>active

In the first case, the top level looks for the arguments of car, and finds active. Because m4 evaluates its arguments before applying the macro, active is expanded, which results in:

car(ACT, IVE)
=>ACT

In the second case, the top level gives active as first and only argument of car, which results in:

active
=>ACT, IVE

i.e., the argument is evaluated after the macro that invokes it. In the third case, car receives [active], which results in:

[active]
=>active

exactly as we already saw above.

The example above, applied to a more realistic example, gives:

car(int tab[10];)
=>int tab10;
car([int tab[10];])
=>int tab10;
car([[int tab[10];]])
=>int tab[10];

Huh? The first case is easily understood, but why is the second wrong, and the third right? To understand that, you must know that after m4 expands a macro, the resulting text is immediately subjected to macro expansion and quote removal. This means that the quote removal occurs twice: first before the argument is passed to the car macro, and second after the car macro expands to the first argument.

As the author of the Autoconf macro car, you then consider it to be incorrect that your users have to double-quote the arguments of car, so you "fix" your macro. Let's call it qar for quoted car:

define([qar], [[$1]])

and check that qar is properly fixed:

qar([int tab[10];])
=>int tab[10];

Ahhh! That's much better.

But note what you've done: now that the arguments are literal strings, if the user wants to use the results of expansions as arguments, she has to use an unquoted macro call:

qar(active)
=>ACT

where she wanted to reproduce what she used to do with car:

car([active])
=>ACT, IVE

Worse yet: she wants to use a macro that produces a set of cpp macros:

define([my_includes], [#include <stdio.h>])
car([my_includes])
=>#include <stdio.h>
qar(my_includes)
error-->EOF in argument list

This macro, qar, because it double quotes its arguments, forces its users to leave their macro calls unquoted, which is dangerous. Commas and other active symbols are interpreted by m4 before they are given to the macro, often not in the way the users expect. Also, because qar behaves differently from the other macros, it's an exception that should be avoided in Autoconf.

changequote is Evil

The temptation is often high to bypass proper quotation, in particular when it's late at night. Then, many experienced Autoconf hackers finally surrender to the dark side of the force and use the ultimate weapon: changequote.

The M4 builtin changequote belongs to a set of primitives that allow one to adjust the syntax of the language to her needs. For instance, by default M4 uses ` and ' as quotes, but in the context of shell programming (and actually of most programming languages), it's about the worst choice one can make: because of strings and back-quoted expressions in shell (such as 'this' and `that`), and because of literal characters in usual programming languages (as in '0'), there are many unbalanced ` and '. Proper M4 quotation then becomes a nightmare, if not impossible. In order to make M4 useful in such a context, its designers have equipped it with changequote, which makes it possible to choose another pair of quotes. M4sugar, M4sh, Autoconf, and Autotest all have chosen to use [ and ]. Not especially because they are unlikely characters, but because they are characters unlikely to be unbalanced.

There are other magic primitives, such as changecom to specify what syntactic forms are comments (it is common to see changecom(<!--, -->) when M4 is used to produce HTML pages), and changeword and changesyntax to change other syntactic details (such as the character to denote the n-th argument, $ by default, the parentheses around arguments, etc.).
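As a small illustration of changecom in the HTML setting, here is a sketch that assumes plain GNU m4 with the quotes switched to [ and ] by hand; the macro name title is hypothetical. Once the comment delimiters are changed, text inside an HTML comment is copied through without expansion:

```m4
changequote([, ])dnl
define([title], [My Page])dnl
changecom([<!--], [-->])dnl
<!-- title stays untouched inside an HTML comment -->
<h1>title</h1>
dnl Expected output:
dnl => <!-- title stays untouched inside an HTML comment -->
dnl => <h1>My Page</h1>
```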

These primitives are really meant to make M4 more useful for specific domains: they should be considered like command line options: --quotes, --comments, --words, and --syntax. Nevertheless, they are implemented as M4 builtins, as it makes M4 libraries self-contained (no need for additional options).

There lies the problem...

The problem is that it is then tempting to use them in the middle of an M4 script, as opposed to its initialization. This, if not carefully thought out, can lead to disastrous effects: you are changing the language in the middle of the execution. Changing and restoring the syntax is often not enough: if you happened to invoke macros in between, these macros will be lost, as the current syntax will probably not be the one they were implemented with.
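The effect can be sketched as follows, assuming plain GNU m4 with the quotes switched to [ and ] by hand. The macro active is the one used earlier in this chapter; show is a hypothetical macro whose body relies on [ and ] being the quotes:

```m4
changequote([, ])dnl
define([active], [ACT, IVE])dnl
define([show], [[$1]])dnl
show([active])
dnl => active
changequote(<<, >>)dnl
dnl The body of show still contains [ and ], which are now ordinary
dnl characters, so its argument is no longer protected on rescan:
show(<<active>>)
dnl => [ACT, IVE]
changequote([, ])dnl
```

The same macro, called with a correctly quoted argument in both cases, gives two different results depending on the syntax in force at the time of the call.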

Quadrigraphs

When writing an Autoconf macro you may occasionally need to generate special characters that are difficult to express with the standard Autoconf quoting rules. For example, you may need to output the regular expression [^[], which matches any character other than [. This expression contains unbalanced brackets so it cannot be put easily into an M4 macro.

You can work around this problem by using one of the following quadrigraphs:

@<:@
    [

@:>@
    ]

@S|@
    $

@%:@
    #

@&t@
    Expands to nothing.

Quadrigraphs are replaced at a late stage of the translation process, after m4 is run, so they do not get in the way of M4 quoting. For example, the string ^@<:@, independently of its quotation, will appear as ^[ in the output.
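For the regular expression mentioned above, a minimal configure.ac sketch might look like the following. The package name demo and the shell variable demo_regex are hypothetical, chosen only for illustration:

```m4
AC_INIT([demo], [1.0])
dnl demo_regex is a hypothetical shell variable used only for illustration.
dnl Each bracket of the regular expression is spelled as a quadrigraph, so
dnl m4 never sees an unbalanced quote.
demo_regex='@<:@^@<:@@:>@'
AC_OUTPUT
```

Under these assumptions, the generated configure script should contain the assignment with the quadrigraphs replaced, that is, demo_regex='[^[]'.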

The empty quadrigraph can be used:

  • to mark explicitly trailing spaces

    Trailing spaces are smashed by autom4te. This is a feature.

  • to produce other quadrigraphs

    For instance @<@&t@:@ produces @<:@.

  • to escape occurrences of forbidden patterns

    For instance you might want to mention AC_FOO in a comment, while still being sure that autom4te will catch unexpanded AC_*. Then write AC@&t@_FOO.

The name @&t@ was suggested by Paul Eggert:

I should give some credit to the @&t@ pun. The & is my own invention, but the t came from the source code of the algol68c compiler, written by Steve Bourne (of Bourne shell fame), and which used mt to denote the empty string. In C, it would have looked like something like:

char const mt[] = "";

but of course the source code was written in Algol 68.

I don't know where he got mt from: it could have been his own invention, and I suppose it could have been a common pun around the Cambridge University computer lab at the time.

Quotation Rule Of Thumb

To conclude, the quotation rule of thumb is:

One pair of quotes per pair of parentheses. Never over-quote, never under-quote, in particular in the definition of macros. In the few places where the macros need to use brackets (usually in C program text or regular expressions), properly quote the arguments!

It is common to read Autoconf programs with snippets like:

AC_TRY_LINK(
changequote(<<, >>)dnl
<<#include <time.h>
#ifndef tzname /* For SGI.  */
extern char *tzname[]; /* RS6000 and others reject char **tzname.  */
#endif>>,
changequote([, ])dnl
[atoi (*tzname);], ac_cv_var_tzname=yes, ac_cv_var_tzname=no)

which is incredibly useless since AC_TRY_LINK is already double quoting, so you just need:

AC_TRY_LINK(
[#include <time.h>
#ifndef tzname /* For SGI.  */
extern char *tzname[]; /* RS6000 and others reject char **tzname.  */
#endif],
            [atoi (*tzname);],
            [ac_cv_var_tzname=yes],
            [ac_cv_var_tzname=no])

The M4-fluent reader will note that these two examples are rigorously equivalent, since m4 swallows both the changequote(<<, >>) and the << >> when it collects the arguments: these quotes are not part of the arguments!

Simplified, the example above is just doing this:

changequote(<<, >>)dnl
<<[]>>
changequote([, ])dnl

instead of simply:

[[]]

With macros that do not double quote their arguments (which is the rule), double-quote the (risky) literals:

AC_LINK_IFELSE([AC_LANG_PROGRAM(
[[#include <time.h>
#ifndef tzname /* For SGI.  */
extern char *tzname[]; /* RS6000 and others reject char **tzname.  */
#endif]],
                                [atoi (*tzname);])],
               [ac_cv_var_tzname=yes],
               [ac_cv_var_tzname=no])

See the section called “Quadrigraphs” for what to do if you run into a hopeless case where quoting does not suffice.

When you create a configure script using newly written macros, examine it carefully to check whether you need to add more quotes in your macros. If one or more words have disappeared in the m4 output, you need more quotes. When in doubt, quote.

However, it's also possible to put on too many layers of quotes. If this happens, the resulting configure script will contain unexpanded macros. The autoconf program checks for this problem by doing grep AC_ configure.
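Both symptoms can be reproduced in a few lines. This is a minimal sketch, assuming plain GNU m4 with the quotes switched to [ and ] by hand; note and list are hypothetical macros standing in for a real Autoconf macro and its argument:

```m4
changequote([, ])dnl
define([note], [Checking for $1])dnl
define([list], [alpha, beta])dnl
dnl Under-quoted: list is expanded while the arguments are collected, its
dnl comma splits it into two arguments, and the second word disappears.
note(list)
dnl => Checking for alpha
dnl One pair of quotes per pair of parentheses: list is expanded after note.
note([list])
dnl => Checking for alpha, beta
dnl Over-quoted: the macro name survives unexpanded in the output.
note([[list]])
dnl => Checking for list
```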



[16] Using defn.