

4.12 Regular expressions

Regular expressions can be used in cfengine in connection with editfiles and processes to search for lines matching certain expressions. A regular expression is a generalized wildcard. In cfengine wildcards, you can use the characters `*' and `?' to match any number of characters or any single character, respectively. Regular expressions are more complicated than wildcards, but far more flexible.

NOTE: the special characters `*' and `?' used in wildcards do not have the same meanings in regular expressions!
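
The difference can be seen by applying the same pattern under both interpretations. The following is only an illustrative sketch: it assumes Python's fnmatch module as a stand-in for shell-style wildcards and Python's re module as a stand-in for the regular expression library. Neither is what cfengine itself uses, and the pattern `cf*' is a made-up example.

```python
import fnmatch
import re

pattern = "cf*"

# As a wildcard, '*' stands on its own for any run of characters.
wildcard_hit = fnmatch.fnmatchcase("cfengine", pattern)

# As a regular expression, '*' repeats the preceding object ('f'),
# so the pattern means "c followed by zero or more f's".
regex_hit = re.fullmatch(pattern, "cfengine") is not None
regex_hit_cff = re.fullmatch(pattern, "cff") is not None

print(wildcard_hit, regex_hit, regex_hit_cff)
```

Read as a wildcard, the pattern accepts "cfengine". Read as a regular expression, it rejects "cfengine" but accepts "cff".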

Some regular expressions match only a single string. For example, every string which contains no special characters is a regular expression which matches only a string identical to itself. Thus the regular expression `cfengine' would match only the string "cfengine", not "Cfengine" or "cfengin" etc. Other regular expressions could match more general strings. For instance, the regular expression `c*' matches any number of c's (including none). Thus this expression would match the empty string, "c", "cccc", "ccccccccc", but not "cccx".
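
The two cases above can be checked with a short sketch. It assumes Python's re module rather than cfengine's own matcher, and the helper name matches_whole is hypothetical. It tests whether an expression matches an entire string.

```python
import re

def matches_whole(regex, text):
    """Return True when the regular expression matches the entire string."""
    return re.fullmatch(regex, text) is not None

# A pattern with no special characters matches only a string identical to itself.
literal = {s: matches_whole("cfengine", s)
           for s in ["cfengine", "Cfengine", "cfengin"]}

# 'c*' matches any number of c's, including none, but nothing else.
repeated = {s: matches_whole("c*", s)
            for s in ["", "c", "cccc", "cccx"]}

print(literal)
print(repeated)
```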

Here is a list of regular expression special characters and operators.

`\'
The backslash character normally has a special purpose: either to introduce a special command, or to tell the expression interpreter that the next character is not to be treated as a special character. The backslash character stands for itself only when protected by square brackets [\] or quoted with a backslash itself `\\'.
`\b'
Match a word boundary.
`\B'
Match within a word, i.e. at a position which is not a word boundary.
`\<'
Match beginning of word.
`\>'
Match end of word.
`\w'
Match a character which can be part of a word.
`\W'
Match a character which cannot be part of a word.
`any character'
Matches itself.
`.'
Matches any single character.
`*'
Match zero or more instances of the previous object. e.g. `c*'. If no object precedes it, it represents a literal asterisk.
`+'
Match one or more instances of the preceding object.
`?'
Match zero or one instance of the preceding object.
`{ }'
Number of matches operator. `{5}' would match exactly 5 instances of the previous object. `{6,}' would match at least 6 instances of the previous object. `{7,12}' would match at least 7, but no more than 12, instances of the preceding object. Clearly the first number must not be greater than the second to make a valid search expression.
`|'
The logical OR operator: matches either of the two regular expressions it separates.
`[list]'
Defines a list of characters which are to be considered as a single object (ORed). e.g. `[a-z]' matches any character in the range a to z, `[abcd]' matches either a, b, c or d. Most characters are ordinary inside a list, but there are some exceptions: `]' ends the list unless it is the first item, `\' quotes the next character, `[:' and `:]' define a character class operator (see below), and `-' represents a range of characters unless it is the first or last character in the list.
`[^list]'
Defines a list of characters which are NOT to be matched. i.e. match any character except those in the list.
`[:class:]'
Defines a class of characters, using the ctype-library.
alnum
An alphanumeric character
alpha
An alphabetic character
blank
A space or a TAB
cntrl
A control character.
digit
0-9
graph
same as print, without space
lower
a lower case letter
print
printable characters (non control characters)
punct
neither control nor alphanumeric symbols
space
space, TAB, carriage return, line-feed, vertical tab and form-feed.
upper
upper case letter
xdigit
a hexadecimal digit: 0-9, a-f, A-F

`( )'
Groups together any number of operators.
`\digit'
Back-reference operator (refer to the GNU regex documentation).
`^'
Match start of a line.
`$'
Match the end of a line.
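
Several of the operators listed above can be tried out in a short sketch. It assumes Python's re module, which shares these particular operators with the syntax described here but does not support `\<', `\>' or the `[:class:]' character classes, so those are left out. The helper name whole and all of the test strings are made up for illustration.

```python
import re

def whole(regex, text):
    """Return True when the regular expression matches the entire string."""
    return re.fullmatch(regex, text) is not None

checks = {
    # '+' needs at least one instance of the preceding object.
    "one_or_more": whole("c+", "ccc") and not whole("c+", ""),
    # '?' allows zero or one instance.
    "zero_or_one": whole("cf?", "c") and whole("cf?", "cf") and not whole("cf?", "cff"),
    # '{2,3}' allows between 2 and 3 instances.
    "counted": whole("a{2,3}", "aa") and whole("a{2,3}", "aaa") and not whole("a{2,3}", "aaaa"),
    # '|' matches either alternative.
    "alternation": whole("cat|dog", "dog") and not whole("cat|dog", "cow"),
    # '[a-c]' matches one character from the range.
    "char_list": whole("[a-c]x", "bx") and not whole("[a-c]x", "dx"),
    # '[^#]' matches one character which is not '#'.
    "negated_list": whole("[^#]", "a") and not whole("[^#]", "#"),
    # '\1' must repeat exactly what the first group matched.
    "back_reference": whole(r"(ab)\1", "abab") and not whole(r"(ab)\1", "abba"),
    # '\b' matches only at the edge of a word.
    "word_boundary": re.search(r"\bcf", "run cfengine") is not None
                     and re.search(r"\bcf", "xcf") is None,
}

for name, ok in checks.items():
    print(name, ok)
```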

Here are a few examples. Remember that some commands look for a regular expression match of part of a string, while others require a match of the entire string (see Reference manual).

     
     ^#        match a string beginning with the # symbol
     ^[^#]     match a non-empty string not beginning with the # symbol
     ^[A-Z].+  match a string beginning with an uppercase letter
               followed by at least one other character
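
The distinction between a partial match and a match of the entire string can be sketched as follows. Python's re module is assumed here in place of cfengine's matcher: re.search stands for "match part of the string" and re.fullmatch for "match the entire string". The sample lines are invented for illustration.

```python
import re

lines = ["# a comment", "Host = example", "x", ""]

results = {}
for regex in ["^#", "^[^#]", "^[A-Z].+"]:
    # Lines in which the expression matches some part of the string.
    part = [s for s in lines if re.search(regex, s)]
    # Lines which the expression matches in their entirety.
    entire = [s for s in lines if re.fullmatch(regex, s)]
    results[regex] = (part, entire)
    print(regex, part, entire)
```

Under these assumptions, `^#' finds the comment line when a partial match is enough, but matches no line in its entirety, since only the one-character string "#" would qualify. Likewise `^[^#]' matches only the single-character line "x" in its entirety, and never matches the empty line.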