Regular expression string matching. Search for pat in str and return the positions and substrings of any matches, or empty values if there are none.
The matched pattern pat can include any of the standard regex operators, including:
.
- Match any character
* + ? {}
- Repetition operators, representing
*
- Match zero or more times
+
- Match one or more times
?
- Match zero or one times
{}
- Match range operator, which is of the form {n} to match exactly n times, {m,} to match m or more times, {m,n} to match between m and n times.
[...] [^...]
- List operators, where for example [ab]c matches ac and bc
()
- Grouping operator
|
- Alternation operator. Match one of a choice of regular expressions. The alternatives must be delimited by the grouping operator () above
^ $
- Anchoring operator. ^ matches the start of the string str and $ the end
In addition the following escaped characters have special meaning. Note that it is recommended to quote pat in single quotes rather than double quotes, so that the escape sequences are not interpreted by Octave before being passed to regexp.
\b
- Match a word boundary
\B
- Match within a word
\w
- Matches any word character
\W
- Matches any non-word character
\<
- Matches the beginning of a word
\>
- Matches the end of a word
\s
- Matches any whitespace character
\S
- Matches any non-whitespace character
\d
- Matches any digit
\D
- Matches any non-digit
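The operators and escaped characters above can be tried out with a short sketch like the following. The input strings and variable names are made up for illustration, and the 'match' option used to select the matched text is one of the output selectors of regexp.

```octave
% Single-quoted patterns are passed to regexp unchanged.
str = 'foo 42 bar 7';

% \d+ : one or more digits; the first two default outputs are s and e
[s, e] = regexp (str, '\d+');                  % s = [5 12], e = [6 12]
digits = regexp (str, '\d+', 'match');         % {'42', '7'}

% List operator: a or b, followed by c
lst = regexp ('ac bc cc', '[ab]c', 'match');   % {'ac', 'bc'}

% Range operator: two or more consecutive a characters
rep = regexp ('a aa aaa', 'a{2,}', 'match');   % {'aa', 'aaa'}

% Alternation, delimited by the grouping operator
alt = regexp ('cat dog bird', '(cat|bird)', 'match');   % {'cat', 'bird'}

% Word boundary followed by t and any further word characters
wrd = regexp ('one two', '\bt\w*', 'match');   % {'two'}

% In double quotes the backslash has to be doubled to give the same pattern
same = strcmp ("\\d+", '\d+');                 % true
```

The last line illustrates why single quotes are recommended: a double-quoted pattern needs every backslash escaped before it reaches regexp.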
The outputs of regexp by default are in the order given below
- s
- The start indices of each of the matching substrings
- e
- The end indices of each matching substring
- te
- The extents of each of the matched tokens surrounded by (...) in pat.
- m
- A cell array of the text of each match.
- t
- A cell array of the text of each token matched.
- nm
- A structure containing the text of each matched named token, with the name being used as the fieldname. A named token is denoted as (?<name>...)
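As a minimal sketch of the default output order, the call below requests all six outputs for a pattern with two named tokens. The string, the pattern, and the token names name and val are hypothetical and chosen only for illustration.

```octave
str = 'key=value';
pat = '(?<name>\w+)=(?<val>\w+)';

[s, e, te, m, t, nm] = regexp (str, pat);

% s  : start index of the match             (1)
% e  : end index of the match               (9)
% te : one entry per match, holding a [start end] row for each token
% m  : cell array with the matched text     ({'key=value'})
% t  : one entry per match, holding the token texts (t{1} is {'key', 'value'})
% nm : structure with one field per named token
printf ('%s -> %s\n', nm.name, nm.val);     % prints "key -> value"
```

With a single match, each field of nm holds the token text directly as a string.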
Particular output arguments or the order of the output arguments can be selected by additional opts arguments. These are strings and the correspondence between the output arguments and the optional arguments is
'start' s
'end' e
'tokenExtents' te
'match' m
'tokens' t
'names' nm
A further optional argument is 'once', which limits the number of returned matches to the first match. Additional arguments are
- matchcase
- Make the matching case sensitive.
- ignorecase
- Make the matching case insensitive.
- stringanchors
- Match the anchor characters at the beginning and end of the string.
- lineanchors
- Match the anchor characters at the beginning and end of the line.
- dotall
- The character . matches the newline character.
- dotexceptnewline
- The character . matches all but the newline character.
- freespacing
- The pattern can include arbitrary whitespace and comments starting with #.
- literalspacing
- The pattern is taken literally.
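The following sketch shows how a few of these optional arguments might be combined. The test strings and variable names are assumptions made for the example, not part of regexp itself.

```octave
str = 'Hello hello HELLO';

% 'match' selects only the matched text; 'once' stops after the first match
% and returns it as a string rather than a cell array.
first = regexp (str, 'hello', 'match', 'once');         % 'hello'

% 'ignorecase' makes the matching case insensitive.
all_ci = regexp (str, 'hello', 'match', 'ignorecase');  % {'Hello', 'hello', 'HELLO'}

% Output selectors also set the order of the outputs.
[m, s] = regexp (str, 'hello', 'match', 'start');       % m = {'hello'}, s = 7

% 'lineanchors' lets ^ match at the start of each line, not only of the string.
two_lines = sprintf ('ab\ncd');
by_line   = regexp (two_lines, '^c', 'start', 'lineanchors');    % 4
by_string = regexp (two_lines, '^c', 'start', 'stringanchors');  % empty
```

Selecting outputs by name keeps a call readable when only one or two of the six outputs are needed.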