4.12 Regular expressions
Regular expressions can be used in cfengine in connection with
editfiles and processes to search for lines matching
certain expressions. A regular expression is a generalized wildcard. In
cfengine wildcards, you can use the character `?' to match any single
character and `*' to match any number of characters. Regular expressions
are more complicated than wildcards, but far more flexible.
NOTE: the special characters `*' and `?' used in wildcards do not have
the same meanings in regular expressions!
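The difference can be seen in a small sketch. It uses Python's fnmatch and
re modules purely as stand-ins for wildcard and regular expression matching;
cfengine itself does not use them, and the file names are hypothetical.

```python
import fnmatch
import re

# Hypothetical names, used only for illustration.
names = ["cfengine.conf", "ccc.conf", "xconf"]

# As a wildcard, '*' stands on its own for any run of characters.
wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, "c*.conf")]

# As a regular expression, '*' repeats the preceding object ('c' here)
# and '.' matches any single character.
regex_hits = [n for n in names if re.fullmatch(r"c*.conf", n)]

# As a wildcard, '?' is any single character; as a regular expression it
# makes the preceding object optional.
wildcard_q = fnmatch.fnmatchcase("cf", "c?")
regex_q = re.fullmatch(r"c?", "cf") is not None

print(wildcard_hits)  # ['cfengine.conf', 'ccc.conf']
print(regex_hits)     # ['ccc.conf', 'xconf']
print(wildcard_q, regex_q)  # True False
```

The same pattern text selects different names depending on whether it is
read as a wildcard or as a regular expression.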
Some regular expressions match only a single string. For example, every
string which contains no special characters is a regular expression
which matches only a string identical to itself. Thus the regular
expression `cfengine' would match only the string "cfengine", not
"Cfengine" or "cfengin" etc. Other regular expressions could match more
general strings. For instance, the regular expression `c*' matches
any number of c's (including none). Thus this expression would match the
empty string, "c", "cccc", "ccccccccc", but not "cccx".
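The two cases above can be checked with a short sketch. Python's re module
is assumed here only as a convenient regular expression engine, and the
helper name matches_whole is hypothetical; it requires the expression to
match the entire string.

```python
import re

def matches_whole(pattern, text):
    """Return True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

# A string with no special characters matches only itself.
literal = {s: matches_whole(r"cfengine", s)
           for s in ["cfengine", "Cfengine", "cfengin"]}

# 'c*' matches any number of c's, including none.
stars = {s: matches_whole(r"c*", s)
         for s in ["", "c", "cccc", "cccx"]}

print(literal)  # {'cfengine': True, 'Cfengine': False, 'cfengin': False}
print(stars)    # {'': True, 'c': True, 'cccc': True, 'cccx': False}
```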
Here is a list of regular expression special characters and operators.
- `\'
- The backslash character normally has a special purpose: either to
introduce a special command, or to tell the expression interpreter that
the next character is not to be treated as a special character.
The backslash character stands for itself only when protected by square
brackets `[\]' or quoted with a backslash itself `\\'.
- `\b'
- Matches a word boundary (the word-boundary operator).
- `\B'
- Matches within a word (the within-word operator).
- `\<'
- Match beginning of word.
- `\>'
- Match end of word.
- `\w'
- Match a character which can be part of a word.
- `\W'
- Match a character which cannot be part of a word.
- `any character'
- Matches itself.
- `.'
- Matches any single character.
- `*'
- Match zero or more instances of the previous object. e.g. `c*'.
If no object precedes it, it represents a literal asterisk.
- `+'
- Match one or more instances of the preceding object.
- `?'
- Match zero or one instance of the preceding object.
- `{ }'
- The interval (number of matches) operator. `{5}' would match exactly 5
instances of the previous object. `{6,}' would match at least
6 instances of the previous object. `{7,12}' would match at least
7 instances of, but no more than 12 instances of the preceding object.
Clearly the first number must not be greater than the second to make a
valid search expression.
- `|'
- The logical OR operator: matches either of the two regular expressions
it joins, e.g. `a|b' matches either a or b.
- `[list]'
- Defines a list of characters which are to be considered as a single
object (ORed). e.g. `[a-z]' matches any character in the range a to
z, `[abcd]' matches either a, b, c or d. Most characters are
ordinary inside a list, but there are some exceptions: `]' ends the
list unless it is the first item, `\' quotes the next character,
`[:' and `:]' define a character class operator (see below),
and `-' represents a range of characters unless it is the first
or last character in the list.
- `[^list]'
- Defines a list of characters which are NOT to be matched. i.e.
match any character except those in the list.
- `[:class:]'
- Defines a class of characters, using the ctype library. A class is
written inside a list, e.g. `[[:digit:]]'. The classes are:
alnum     an alphanumeric character
alpha     an alphabetic character
blank     a space or a TAB
cntrl     a control character
digit     a digit, 0-9
graph     same as print, but excluding the space character
lower     a lower case letter
print     a printable character (not a control character)
punct     a punctuation character (printable, but neither a space
          nor alphanumeric)
space     space, TAB, carriage return, line-feed, vertical tab and
          form-feed
upper     an upper case letter
xdigit    a hexadecimal digit: 0-9, a-f, A-F
- `( )'
- Groups together any number of operators.
- `\digit'
- Back-reference operator (refer to the GNU regex documentation).
- `^'
- Match start of a line.
- `$'
- Match the end of a line.
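Several of the operators in this list can be tried out with the following
sketch. It assumes Python's re module as a stand-in engine, which is not
the GNU regex library referred to above: the operators shown here behave
the same way in both, but Python does not provide `\<', `\>' or the
`[:class:]' names, so those are left out. The helper name whole and the
test strings are hypothetical.

```python
import re

def whole(pattern, text):
    """True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

# (pattern, subject, expected result of an entire-string match)
checks = [
    (r"a{5}", "aaaaa", True),       # exactly five instances
    (r"a{5}", "aaaa", False),
    (r"a{6,}", "aaaaaaa", True),    # at least six instances
    (r"a{7,12}", "a" * 13, False),  # no more than twelve
    (r"colou?r", "color", True),    # zero or one 'u'
    (r"cat|dog", "dog", True),      # logical OR
    (r"[a-z]+", "cfengine", True),  # one or more characters from a list
    (r"[^0-9]", "7", False),        # any character except a digit
    (r"(ab)+", "ababab", True),     # grouping
    (r"(a|b)\1", "aa", True),       # back-reference repeats the group
    (r"(a|b)\1", "ab", False),
]

for pattern, text, expected in checks:
    assert whole(pattern, text) == expected, (pattern, text)

# Word boundaries are positions, so they are tested with a partial search.
standalone = re.search(r"\bcf\b", "run cf now") is not None
embedded = re.search(r"\bcf\b", "runcfnow") is not None
print(standalone, embedded)  # True False
```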
Here are a few examples. Remember that some commands look for
a regular expression match of part of a string, while others
require a match of the entire string (see Reference manual).
^#         match a string beginning with the # symbol
^[^#]      match a string not beginning with the # symbol
^[A-Z].+   match a string beginning with an uppercase letter
           followed by at least one other character
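The distinction between matching part of a string and matching the entire
string can be illustrated with these three expressions. The sketch assumes
Python's re module, where re.search looks for a match of part of the string
and re.fullmatch requires the entire string to match; which behaviour a
given cfengine command uses is described in the Reference manual. The
sample lines are hypothetical.

```python
import re

# Hypothetical sample lines.
lines = ["# a comment", "Option yes", "x", "  # indented"]
patterns = [r"^#", r"^[^#]", r"^[A-Z].+"]

# Lines where the expression matches some part of the line.
partial = {p: [l for l in lines if re.search(p, l)] for p in patterns}

# Lines where the expression must account for the entire line.
entire = {p: [l for l in lines if re.fullmatch(p, l)] for p in patterns}

for p in patterns:
    print(p, partial[p], entire[p])
```

With a partial match, `^#' selects the comment line; when the entire string
must match, it selects only a line consisting of a single `#', so none of
the sample lines qualify.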