This chapter describes various builtin macros for controlling the input to m4.
8.1 Deleting whitespace in input
8.2 Changing the quote characters
8.3 Changing the comment delimiters
8.4 Changing the lexical structure of words
8.5 Saving text until end of input
8.1 Deleting whitespace in input
The builtin dnl stands for “Discard to Next Line”:
Builtin: dnl
All characters, up to and including the next newline, are discarded without performing any macro expansion. A warning is issued if the end of the file is encountered without a newline.
The expansion of dnl is void.
It is often used in connection with define, to remove the newline that follows the call to define. Thus
    define(`foo', `Macro `foo'.')dnl A very simple macro, indeed.
    foo
    ⇒Macro foo.
The input up to and including the next newline is discarded, as opposed to the way comments are treated (see section Comments in m4 input).
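The difference can be seen in a small sketch (the macro name greeting is only an illustration, not something from the m4 distribution): text after dnl disappears together with its newline, while a comment is copied to the output without being expanded.

```m4
define(`greeting', `hello')dnl this trailing text is discarded, newline included
greeting # greeting inside a comment is copied, not expanded
greeting
```

With the default quote and comment delimiters, this should print ‘hello # greeting inside a comment is copied, not expanded’ followed by ‘hello’ on a line of its own, with no blank line left behind by the define.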
Usually, dnl is immediately followed by an end of line or some other whitespace. GNU m4 will produce a warning diagnostic if dnl is followed by an open parenthesis. In this case, dnl will collect and process all arguments, looking for a matching close parenthesis. All predictable side effects resulting from this collection will take place. dnl will return no output. The input following the matching close parenthesis, up to and including the next newline on whichever line contains it, will still be discarded.
    dnl(`args are ignored, but side effects occur',
    define(`foo', `like this')) while this text is ignored: undefine(`foo')
    error-->m4:stdin:1: Warning: excess arguments to builtin `dnl' ignored
    See how `foo' was defined, foo?
    ⇒See how foo was defined, like this?
If the end of file is encountered without a newline character, a warning is issued and dnl stops consuming input.
    m4wrap(`m4wrap(`2 hi
    ')0 hi dnl 1 hi')
    ⇒
    define(`hi', `HI')
    ⇒
    ^D
    error-->m4:stdin:1: Warning: end of file treated as newline
    ⇒0 HI 2 HI
8.2 Changing the quote characters
The default quote delimiters can be changed with the builtin changequote:
Builtin: changequote ([start], [end])
This sets start as the new begin-quote delimiter and end as the new end-quote delimiter. If both arguments are missing, the default quotes (` and ') are used. If start is void, then quoting is disabled. Otherwise, if end is missing or void, the default end-quote delimiter (') is used. The quote delimiters can be of any length.
The expansion of changequote is void.
    changequote(`[', `]')
    ⇒
    define([foo], [Macro [foo].])
    ⇒
    foo
    ⇒Macro foo.
The quotation strings can safely contain eight-bit characters. If no single character is appropriate, start and end can be of any length. Other implementations cap the delimiter length to five characters, but GNU has no inherent limit.
    changequote(`[[[', `]]]')
    ⇒
    define([[[foo]]], [[[Macro [[[[[foo]]]]].]]])
    ⇒
    foo
    ⇒Macro [[foo]].
Calling changequote with start as the empty string will effectively disable the quoting mechanism, leaving no way to quote text. However, using an empty string is not portable, as some other implementations of m4 revert to the default quoting, while others preserve the prior non-empty delimiter. If start is not empty, then an empty end will use the default end-quote delimiter of ‘'’, as otherwise, it would be impossible to end a quoted string. Again, this is not portable, as some other m4 implementations reuse start as the end-quote delimiter, while others preserve the previous non-empty value. Omitting both arguments restores the default begin-quote and end-quote delimiters; fortunately this behavior is portable to all implementations of m4.
    define(`foo', `Macro `FOO'.')
    ⇒
    changequote(`', `')
    ⇒
    foo
    ⇒Macro `FOO'.
    `foo'
    ⇒`Macro `FOO'.'
    changequote(`,)
    ⇒
    foo
    ⇒Macro FOO.
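Because calling changequote with no arguments is the portable way back to the defaults, a script that switches delimiters only for a while might be sketched as follows. The ‘<<’ and ‘>>’ delimiters and the greet macro are arbitrary choices made for this illustration.

```m4
changequote(`<<', `>>')dnl
define(<<greet>>, <<Hello, $1!>>)dnl
changequote`'dnl
greet(`world')
```

The empty quoted string after the bare changequote is read with the restored default quotes, and merely separates the macro name from dnl. The last line should expand to ‘Hello, world!’.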
There is no way in m4 to quote a string containing an unmatched begin-quote, except using changequote to change the current quotes.
If the quotes should be changed from, say, ‘[’ to ‘[[’, temporary quote characters have to be defined. To achieve this, two calls of changequote must be made, one for the temporary quotes and one for the new quotes.
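A minimal sketch of that two-step change is shown here, assuming that ‘{’ and ‘}’ are free to serve as the temporary quotes (any pair that does not occur in the new delimiters would do):

```m4
changequote(`[', `]')dnl
changequote([{], [}])dnl temporary quotes, written with the old ones
changequote({[[}, {]]})dnl new quotes, written with the temporary ones
define([[foo]], [[Macro [[foo]].]])dnl
foo
```

After the third line the quotes are ‘[[’ and ‘]]’, so the final line should expand to ‘Macro foo.’.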
Macros are recognized in preference to the begin-quote string, so if a prefix of start can be recognized as part of a potential macro name, the quoting mechanism is effectively disabled. Unless you use changeword (see section Changing the lexical structure of words), this means that start should not begin with a letter, digit, or ‘_’ (underscore). However, even though quoted strings are not recognized, the quote characters can still be discerned in macro expansion and in trace output.
    define(`echo', `$@')
    ⇒
    define(`hi', `HI')
    ⇒
    changequote(`q', `Q')
    ⇒
    q hi Q hi
    ⇒q HI Q HI
    echo(hi)
    ⇒qHIQ
    changequote
    ⇒
    changequote(`-', `EOF')
    ⇒
    - hi EOF hi
    ⇒ hi  HI
    changequote
    ⇒
    changequote(`1', `2')
    ⇒
    hi1hi2
    ⇒hi1hi2
    hi 1hi2
    ⇒HI hi
Quotes are recognized in preference to argument collection. In particular, if start is a single ‘(’, then argument collection is effectively disabled. For portability with other implementations, it is a good idea to avoid ‘(’, ‘,’, and ‘)’ as the first character in start.
    define(`echo', `$#:$@:')
    ⇒
    define(`hi', `HI')
    ⇒
    changequote(`(',`)')
    ⇒
    echo(hi)
    ⇒0::hi
    changequote
    ⇒
    changequote(`((', `))')
    ⇒
    echo(hi)
    ⇒1:HI:
    echo((hi))
    ⇒0::hi
    changequote
    ⇒
    changequote(`,', `)')
    ⇒
    echo(hi,hi)bye)
    ⇒1:HIhibye:
However, if you are not worried about portability, using ‘(’ and ‘)’ as quoting characters has an interesting property: you can use it to compute a quoted string containing the expansion of any quoted text, as long as the expansion results in both balanced quotes and balanced parentheses. The trick is realizing expand uses ‘$1’ unquoted, to trigger its expansion using the normal quoting characters, but uses extra parentheses to group unquoted commas that occur in the expansion without consuming whitespace following those commas. Then _expand uses changequote to convert the extra parentheses back into quoting characters. Note that it takes two more changequote invocations to restore the original quotes. Contrast the behavior on whitespace when using ‘$*’, via quote, to attempt the same task.
    changequote(`[', `]')dnl
    define([a], [1, (b)])dnl
    define([b], [2])dnl
    define([quote], [[$*]])dnl
    define([expand], [_$0(($1))])dnl
    define([_expand],
      [changequote([(], [)])$1changequote`'changequote(`[', `]')])dnl
    expand([a, a, [a, a], [[a, a]]])
    ⇒1, (2), 1, (2), a, a, [a, a]
    quote(a, a, [a, a], [[a, a]])
    ⇒1,(2),1,(2),a, a,[a, a]
If end is a prefix of start, the end-quote will be recognized in preference to a nested begin-quote. In particular, changing the quotes to have the same string for start and end disables nesting of quotes. When quote nesting is disabled, it is impossible to double-quote strings across macro expansions, so using the same string is not done very often.
    define(`hi', `HI')
    ⇒
    changequote(`""', `"')
    ⇒
    ""hi"""hi"
    ⇒hihi
    ""hi" ""hi"
    ⇒hi hi
    ""hi"" "hi"
    ⇒hi" "HI"
    changequote
    ⇒
    `hi`hi'hi'
    ⇒hi`hi'hi
    changequote(`"', `"')
    ⇒
    "hi"hi"hi"
    ⇒hiHIhi
It is an error if the end of file occurs within a quoted string.
    `hello world'
    ⇒hello world
    `dangling quote
    ^D
    error-->m4:stdin:2: ERROR: end of file in string

    ifelse(`dangling quote
    ^D
    error-->m4:stdin:1: ERROR: end of file in string
8.3 Changing the comment delimiters
The default comment delimiters can be changed with the builtin macro changecom:
Builtin: changecom ([start], [end])
This sets start as the new begin-comment delimiter and end as the new end-comment delimiter. If both arguments are missing, or start is void, then comments are disabled. Otherwise, if end is missing or void, the default end-comment delimiter of newline is used. The comment delimiters can be of any length.
The expansion of changecom is void.
    define(`comment', `COMMENT')
    ⇒
    # A normal comment
    ⇒# A normal comment
    changecom(`/*', `*/')
    ⇒
    # Not a comment anymore
    ⇒# Not a COMMENT anymore
    But: /* this is a comment now */ while this is not a comment
    ⇒But: /* this is a comment now */ while this is not a COMMENT
Note how comments are copied to the output, much as if they were quoted strings. If you want the text inside a comment expanded, quote the begin-comment delimiter.
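As a small sketch of that last point, with the default ‘#’ comment delimiter and the same illustrative comment macro as in the example before it, quoting the delimiter turns it into ordinary text so that the rest of the line is scanned for macros:

```m4
define(`comment', `COMMENT')dnl
# this comment is copied verbatim
`#' this comment text is expanded
```

The first output line should be copied unchanged, while the second should read ‘# this COMMENT text is expanded’.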
Calling changecom without any arguments, or with start as the empty string, will effectively disable the commenting mechanism. To restore the original comment start of ‘#’, you must explicitly ask for it. If start is not empty, then an empty end will use the default end-comment delimiter of newline, as otherwise, it would be impossible to end a comment. However, this is not portable, as some other m4 implementations preserve the previous non-empty delimiters instead.
    define(`comment', `COMMENT')
    ⇒
    changecom
    ⇒
    # Not a comment anymore
    ⇒# Not a COMMENT anymore
    changecom(`#', `')
    ⇒
    # comment again
    ⇒# comment again
The comment strings can safely contain eight-bit characters. If no single character is appropriate, start and end can be of any length. Other implementations cap the delimiter length to five characters, but GNU has no inherent limit.
Comments are recognized in preference to macros. However, this is not compatible with other implementations, where macros and even quoting take precedence over comments, so it may change in a future release. For portability, this means that start should not begin with a letter, digit, or ‘_’ (underscore), and that neither the start-quote nor the start-comment string should be a prefix of the other.
    define(`hi', `HI')
    ⇒
    define(`hi1hi2', `hello')
    ⇒
    changecom(`q', `Q')
    ⇒
    q hi Q hi
    ⇒q hi Q HI
    changecom(`1', `2')
    ⇒
    hi1hi2
    ⇒hello
    hi 1hi2
    ⇒HI 1hi2
Comments are recognized in preference to argument collection. In particular, if start is a single ‘(’, then argument collection is effectively disabled. For portability with other implementations, it is a good idea to avoid ‘(’, ‘,’, and ‘)’ as the first character in start.
    define(`echo', `$#:$*:$@:')
    ⇒
    define(`hi', `HI')
    ⇒
    changecom(`(',`)')
    ⇒
    echo(hi)
    ⇒0:::(hi)
    changecom
    ⇒
    changecom(`((', `))')
    ⇒
    echo(hi)
    ⇒1:HI:HI:
    echo((hi))
    ⇒0:::((hi))
    changecom(`,', `)')
    ⇒
    echo(hi,hi)bye)
    ⇒1:HI,hi)bye:HI,hi)bye:
    changecom
    ⇒
    echo(hi,`,`'hi',hi)
    ⇒3:HI,,HI,HI:HI,,`'hi,HI:
    echo(hi,`,`'hi',hi`'changecom(`,,', `hi'))
    ⇒3:HI,,`'hi,HI:HI,,`'hi,HI:
It is an error if the end of file occurs within a comment.
    changecom(`/*', `*/')
    ⇒
    /*dangling comment
    ^D
    error-->m4:stdin:2: ERROR: end of file in comment
8.4 Changing the lexical structure of words
The macro changeword and all associated functionality is experimental. It is only available if the ‘--enable-changeword’ option was given to configure, at GNU m4 installation time. The functionality will go away in the future, to be replaced by other new features that are more efficient at providing the same capabilities. Do not rely on it. Please direct your comments about it the same way you would do for bugs.
A file being processed by m4 is split into quoted strings, words (potential macro names) and simple tokens (any other single character). Initially a word is defined by the following regular expression:
    [_a-zA-Z][_a-zA-Z0-9]*
Using changeword, you can change this regular expression:
Optional builtin: changeword (regex)
Changes the regular expression for recognizing macro names to be regex. If regex is empty, use ‘[_a-zA-Z][_a-zA-Z0-9]*’. regex must obey the constraint that every prefix of the desired final pattern is also accepted by the regular expression. If regex contains grouping parentheses, the macro invoked is the portion that matched the first group, rather than the entire matching string.
The expansion of changeword is void.
The macro changeword is recognized only with parameters.
Relaxing the lexical rules of m4 might be useful (for example) if you wanted to apply translations to a file of numbers:
    ifdef(`changeword', `', `errprint(` skipping: no changeword support
    ')m4exit(`77')')dnl
    changeword(`[_a-zA-Z0-9]+')
    ⇒
    define(`1', `0')1
    ⇒0
Tightening the lexical rules is less useful, because it will generally make some of the builtins unavailable. You could use it to prevent accidental call of builtins, for example:
    ifdef(`changeword', `', `errprint(` skipping: no changeword support
    ')m4exit(`77')')dnl
    define(`_indir', defn(`indir'))
    ⇒
    changeword(`_[_a-zA-Z0-9]*')
    ⇒
    esyscmd(`foo')
    ⇒esyscmd(foo)
    _indir(`esyscmd', `echo hi')
    ⇒hi
    ⇒
Because m4 constructs its words a character at a time, there is a restriction on the regular expressions that may be passed to changeword. This is that if your regular expression accepts ‘foo’, it must also accept ‘f’ and ‘fo’.
    ifdef(`changeword', `', `errprint(` skipping: no changeword support
    ')m4exit(`77')')dnl
    define(`foo
    ', `bar
    ')
    ⇒
    dnl This example wants to recognize changeword, dnl, and `foo\n'.
    dnl First, we check that our regexp will match.
    regexp(`changeword', `[cd][a-z]*\|foo[
    ]')
    ⇒0
    regexp(`foo
    ', `[cd][a-z]*\|foo[
    ]')
    ⇒0
    regexp(`f', `[cd][a-z]*\|foo[
    ]')
    ⇒-1
    foo
    ⇒foo
    changeword(`[cd][a-z]*\|foo[
    ]')
    ⇒
    dnl Even though `foo\n' matches, we forgot to allow `f'.
    foo
    ⇒foo
    changeword(`[cd][a-z]*\|fo*[
    ]?')
    ⇒
    dnl Now we can call `foo\n'.
    foo
    ⇒bar
changeword has another function. If the regular expression supplied contains any grouped subexpressions, then text outside the first of these is discarded before symbol lookup. So:
    ifdef(`changeword', `', `errprint(` skipping: no changeword support
    ')m4exit(`77')')dnl
    ifdef(`__unix__', ,
          `errprint(` skipping: syscmd does not have unix semantics
    ')m4exit(`77')')dnl
    changecom(`/*', `*/')dnl
    define(`foo', `bar')dnl
    changeword(`#\([_a-zA-Z0-9]*\)')
    ⇒
    #esyscmd(`echo foo \#foo')
    ⇒foo bar
    ⇒
m4 now requires a ‘#’ mark at the beginning of every macro invocation, so one can use m4 to preprocess plain text without losing various words like ‘divert’.
In m4, macro substitution is based on text, while in TeX, it is based on tokens. changeword can throw this difference into relief. For example, here is the same idea represented in TeX and m4. First, the TeX version:
    \def\a{\message{Hello}}
    \catcode`\@=0
    \catcode`\\=12
    @a
    @bye
    ⇒Hello
Then, the m4 version:
    ifdef(`changeword', `', `errprint(` skipping: no changeword support
    ')m4exit(`77')')dnl
    define(`a', `errprint(`Hello')')dnl
    changeword(`@\([_a-zA-Z0-9]*\)')
    ⇒
    @a
    ⇒errprint(Hello)
In the TeX example, the first line defines a macro a to print the message ‘Hello’. The second line defines ‘@’ to be usable instead of ‘\’ as an escape character. The third line defines ‘\’ to be a normal printing character, not an escape. The fourth line invokes the macro a. So, when TeX is run on this file, it displays the message ‘Hello’.
When the m4 example is passed through m4, it outputs ‘errprint(Hello)’. The reason for this is that TeX does lexical analysis of macro definition when the macro is defined. m4 just stores the text, postponing the lexical analysis until the macro is used.
You should note that using changeword will slow m4 down by a factor of about seven, once it is changed to something other than the default regular expression. You can invoke changeword with the empty string to restore the default word definition, and regain the parsing speed.
8.5 Saving text until end of input
It is possible to ‘save’ some text until the end of the normal input has been seen. The saved text is then read again by m4 once the normal input has been exhausted. This feature is normally used to initiate cleanup actions before normal exit, e.g., deleting temporary files.
To save input text, use the builtin m4wrap:
Builtin: m4wrap (string, …)
Stores string in a safe place, to be reread when end of input is reached. As a GNU extension, additional arguments are concatenated with a space to the string.
The expansion of m4wrap is void.
The macro m4wrap is recognized only with parameters.
    define(`cleanup', `This is the `cleanup' action.
    ')
    ⇒
    m4wrap(`cleanup')
    ⇒
    This is the first and last normal input line.
    ⇒This is the first and last normal input line.
    ^D
    ⇒This is the cleanup action.
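The GNU extension of passing several arguments to m4wrap can be sketched as follows; the wrapped words are arbitrary text chosen for illustration, and the behavior shown is that described for GNU m4 above, not something to expect from other implementations.

```m4
m4wrap(`wrapped', `text
')dnl
Normal input.
```

With GNU m4, the normal input line should be printed first, followed at end of input by ‘wrapped text’, the two arguments having been joined with a single space.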
The saved input is only reread when the end of normal input is seen, and not if m4exit is used to exit m4.
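A minimal sketch of that distinction, using made-up wrapped text: because the input below ends with an explicit m4exit, the text saved by m4wrap should never be reread.

```m4
m4wrap(`This wrapped text is never output.
')dnl
Normal input is processed as usual.
m4exit(`0')
```

Only the line ‘Normal input is processed as usual.’ is expected in the output. Removing the m4exit line should let the wrapped text appear after it.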
It is safe to call m4wrap from saved text, but then the order in which the saved text is reread is undefined. If m4wrap is not used recursively, the saved pieces of text are reread in the reverse of the order in which they were saved (LIFO: last in, first out). However, this behavior is likely to change in a future release, to match POSIX, so you should not depend on this order.
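One way to avoid depending on the order, sketched here with the hypothetical macros remove_temp and report_done, is to save all of the cleanup text in a single m4wrap call, so that there is only one saved piece and the sequence is fixed by the text itself:

```m4
define(`remove_temp', `removing temporary files
')dnl
define(`report_done', `done
')dnl
m4wrap(`remove_temp`'report_done')dnl
Normal input.
```

At end of input the saved text is rescanned once, so ‘removing temporary files’ should be printed before ‘done’, whether the implementation rereads separate m4wrap calls in LIFO or FIFO order.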
It is possible to emulate POSIX behavior even with older versions of GNU M4 by including the file ‘m4-1.4.16/examples/wrapfifo.m4’ from the distribution:
    $ m4 -I examples
    undivert(`wrapfifo.m4')dnl
    ⇒dnl Redefine m4wrap to have FIFO semantics.
    ⇒define(`_m4wrap_level', `0')dnl
    ⇒define(`m4wrap',
    ⇒`ifdef(`m4wrap'_m4wrap_level,
    ⇒       `define(`m4wrap'_m4wrap_level,
    ⇒               defn(`m4wrap'_m4wrap_level)`$1')',
    ⇒       `builtin(`m4wrap', `define(`_m4wrap_level',
    ⇒                                  incr(_m4wrap_level))dnl
    ⇒m4wrap'_m4wrap_level)dnl
    ⇒define(`m4wrap'_m4wrap_level, `$1')')')dnl
    include(`wrapfifo.m4')
    ⇒
    m4wrap(`a`'m4wrap(`c
    ', `d')')m4wrap(`b')
    ⇒
    ^D
    ⇒abc
It is likewise possible to emulate LIFO behavior without resorting to the GNU M4 extension of builtin, by including the file ‘m4-1.4.16/examples/wraplifo.m4’ from the distribution. (Unfortunately, both examples shown here share some subtle bugs. See if you can find and correct them; or see section Answers).
    $ m4 -I examples
    undivert(`wraplifo.m4')dnl
    ⇒dnl Redefine m4wrap to have LIFO semantics.
    ⇒define(`_m4wrap_level', `0')dnl
    ⇒define(`_m4wrap', defn(`m4wrap'))dnl
    ⇒define(`m4wrap',
    ⇒`ifdef(`m4wrap'_m4wrap_level,
    ⇒       `define(`m4wrap'_m4wrap_level,
    ⇒               `$1'defn(`m4wrap'_m4wrap_level))',
    ⇒       `_m4wrap(`define(`_m4wrap_level', incr(_m4wrap_level))dnl
    ⇒m4wrap'_m4wrap_level)dnl
    ⇒define(`m4wrap'_m4wrap_level, `$1')')')dnl
    include(`wraplifo.m4')
    ⇒
    m4wrap(`a`'m4wrap(`c
    ', `d')')m4wrap(`b')
    ⇒
    ^D
    ⇒bac
Here is an example of implementing a factorial function using m4wrap:
    define(`f', `ifelse(`$1', `0', `Answer: 0!=1
    ', eval(`$1>1'), `0', `Answer: $2$1=eval(`$2$1')
    ', `m4wrap(`f(decr(`$1'), `$2$1*')')')')
    ⇒
    f(`10')
    ⇒
    ^D
    ⇒Answer: 10*9*8*7*6*5*4*3*2*1=3628800
Invocations of m4wrap at the same recursion level are concatenated and rescanned as usual:
    define(`aa', `AA
    ')
    ⇒
    m4wrap(`a')m4wrap(`a')
    ⇒
    ^D
    ⇒AA
However, the transition between recursion levels behaves like an end of file condition between two input files.
    m4wrap(`m4wrap(`)')len(abc')
    ⇒
    ^D
    error-->m4:stdin:1: ERROR: end of file in argument list
This document was generated by root on March 13, 2013 using texi2html 1.82.