6.5.5 The matches Condition (Regular Expressions)

The condition expr matches pattern or expr matches (pattern) interprets pattern as a regular expression and tests whether the whole string expr matches it. Patterns are defined as follows:

pattern ::= alternative {‘|’ alternative}
The string must be identical with one of the alternatives.
alternative ::= {atom [‘*’ | ‘?’ | ‘+’]}
An alternative is a (possibly empty) sequence of atoms. An atom in a pattern corresponds to a character in a string. By using an optional postfix operator it is possible to specify for any atom how often it may be repeated within the string at that location: zero times or once (‘?’), at least once (‘+’), or arbitrarily often, including zero times (‘*’).

Normally, these operators are greedy, i.e. they try to match as much as possible. If you put a ‘?’ after a postfix operator, it will instead try to match as few characters as possible. This does not change whether the string matches, but it can change which part of the string is assigned to a variable in your pattern.
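The difference between greedy and non-greedy operators can be sketched with Python's re module. This is an analogy only, not the matcher described in this manual: the function name split_suffix and the sample word are hypothetical, and re.fullmatch is used on the assumption that it mirrors the rule that the whole string must match the pattern.

```python
import re

def split_suffix(surf, lazy):
    """Split surf into (stem, suffix) with a greedy or non-greedy '.*'.

    Analogy in Python's regex dialect; the named groups stand in for
    variables assigned inside a pattern.
    """
    star = ".*?" if lazy else ".*"
    # The whole string must match, as with the matches condition.
    m = re.fullmatch(f"(?P<stem>{star})(?P<suffix>(?:en)?)", surf)
    return m.group("stem"), m.group("suffix")

if __name__ == "__main__":
    print(split_suffix("unbrokenen", lazy=False))
    print(split_suffix("unbrokenen", lazy=True))
```

Both calls succeed on the same string; only the division between the two captured parts differs. The greedy form gives the entire string to the first part, while the non-greedy form leaves the final "en" for the optional suffix.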

atom ::= ‘(’ pattern ‘)’
A pattern may be grouped by parentheses.
atom ::= ‘[’ [‘^’] range {range} ‘]’
A character class. It represents exactly one character from one of the ranges. If the symbol ‘^’ is the first one in the class, the expression represents exactly one character that is not contained in one of the ranges.
atom ::= ‘.’
Represents any character.
atom ::= character
Represents the character itself.
range ::= character1 [‘-’ character2]
The range contains any character with a code at least as big as the code of character1 and not bigger than the code of character2. The code of character2 must be at least as big as the code of character1. If character2 is omitted, the range only contains character1.
character ::= Any character except ‘*?+[]^-.\|()’
To match one of the characters ‘*?+[]^-.|()’ literally, precede it with ‘\\’ (the pattern escape). To match the pattern escape itself, double it: ‘\\\\’.

You can divide the pattern into segments:

     $surf matches ("un|in|im|ir|il", ".*", "(en)?")

is the same as

     $surf matches ("(un|in|im|ir|il).*(en)?")
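The equivalence above implies that each segment behaves as if it were enclosed in parentheses before the segments are concatenated. A minimal sketch of that reading in Python's re module follows; the helper names join_segments and matches are hypothetical, and Python's regex dialect is only assumed to be close enough to illustrate the point.

```python
import re

def join_segments(*segments):
    """Concatenate segments, grouping each one so that a '|' inside a
    segment cannot extend into its neighbours."""
    return "".join(f"(?:{s})" for s in segments)

def matches(surf, *segments):
    """Return True if the whole string matches the joined segments."""
    return re.fullmatch(join_segments(*segments), surf) is not None

if __name__ == "__main__":
    segs = ("un|in|im|ir|il", ".*", "(en)?")
    print(matches("unbroken", *segs))
    # Plain concatenation without grouping is a different pattern:
    # its last alternative would be 'il.*(en)?'.
    print(re.fullmatch("un|in|im|ir|il.*(en)?", "unbroken") is not None)
```

The grouping matters because ‘|’ separates whole alternatives: without it, the first segment's alternatives would absorb the segments that follow.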

A section of the string can be stored in a variable by suffixing the respective pattern with ‘: variable_name’, as in

     $surf matches ("un|in|im|ir|il": $a, ".*")
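As a rough analogy, storing a section of the string in a variable resembles a named group in Python's re module. The sketch below assumes that correspondence; the function name capture_prefix is hypothetical, and the group name a simply echoes the variable $a in the example above.

```python
import re

def capture_prefix(surf):
    """Return the part of surf matched by the first segment, or None
    if the string as a whole does not match."""
    m = re.fullmatch(r"(?P<a>un|in|im|ir|il)(?:.*)", surf)
    # The captured value exists only when the match succeeds.
    return m.group("a") if m else None

if __name__ == "__main__":
    print(capture_prefix("impossible"))
    print(capture_prefix("possible"))
```

When the match fails there is no captured value at all, which parallels the rule that variables defined by pattern matching exist only where the match has succeeded.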

For backwards compatibility, you may also prefix the pattern with the variable name, as in

     $surf matches $a: "un|in|im|ir|il", ".*"

The variables defined by pattern matching are only defined in the statement sequence which is being executed if the pattern matching is successful. A matches condition may not have variable definitions in it if it is