REGEXP Regular Expression Matching Function
Section: String Functions
Usage
Matches regular expressions in the provided string. This function is complicated, and compatibility with MATLAB's syntax is not perfect. The syntax for its use is
regexp('str','expr')
which returns a row vector containing the starting index of each substring
of str that matches the regular expression described by expr. The
second form of regexp returns six outputs in the following order:
[start stop tokenExtents match tokens names] = regexp('str','expr')
where the meaning of each of the outputs is defined below.
- start is a row vector containing the starting index of each substring that matches the regular expression.
- stop is a row vector containing the ending index of each substring that matches the regular expression.
- tokenExtents is a cell array containing the starting and ending indices of each substring that matches the tokens in the regular expression. A token is a captured part of the regular expression. If the 'once' mode is used, then this output is a double array.
- match is a cell array containing the text for each substring that matches the regular expression. In 'once' mode, this is a string.
- tokens is a cell array of cell arrays of strings that correspond to the tokens in the regular expression. In 'once' mode, this is a cell array of strings.
- names is a structure array containing the named tokens captured in a regular expression. Each named token is assigned a field in the resulting structure array, and each element of the array corresponds to a different match.
If you want only some of the outputs, you can use the following variant of regexp:
[o1 o2 ...] = regexp('str','expr', 'p1', 'p2', ...)
where p1 etc. are the names of the outputs (and the order we want
the outputs in). As a final variant, you can supply some mode
flags to regexp
[o1 o2 ...] = regexp('str','expr', 'p1', 'p2', ..., 'mode1', 'mode2')
where acceptable mode flags are:
- 'once' - only the first match is returned.
- 'matchcase' - letter case must match (selected by default for regexp)
- 'ignorecase' - letter case is ignored (selected by default for regexpi)
- 'dotall' - the '.' operator matches any character (default)
- 'dotexceptnewline' - the '.' operator does not match the newline character
- 'stringanchors' - the ^ and $ operators match at the beginning and end (respectively) of a string.
- 'lineanchors' - the ^ and $ operators match at the beginning and end (respectively) of a line.
- 'literalspacing' - the space characters and comment characters # are matched as literals, just like any other ordinary character (default).
- 'freespacing' - all spaces and comments are ignored in the regular expression. You must use '\ ' and '\#' to match spaces and comment characters, respectively.
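As an illustration of the output selectors and mode flags described above, the following is a minimal sketch rather than a definitive reference. It assumes that the selector names 'match' and 'start' are accepted as written in the output list, and it reuses the input string from this page's example. The variable names m, s, m1 and s2 are arbitrary.

```matlab
% Request only the matched text and the starting indices, in that order.
% For this input the six-output example on this page reports the matches
% down and town, starting at indices 7 and 12.
[m, s] = regexp('quick down town zoo', '(.)own', 'match', 'start')

% Same pattern with the 'once' flag: only the first match is returned,
% and (per the output list above) m1 should be a string, not a cell array.
m1 = regexp('quick down town zoo', '(.)own', 'match', 'once')

% 'ignorecase' lets the lowercase pattern match the uppercase text as well.
s2 = regexp('Down TOWN', '(.)own', 'start', 'ignorecase')
```

Listing the selectors in a different order should reorder the outputs accordingly, which is convenient when only one or two of the six outputs are needed.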
Note the following differences in behavior from MATLAB's regexp:
- If you have an old version of pcre installed, then named tokens must use the older (?P<name> syntax, instead of the new (?<name> syntax.
- The pcre library is pickier about named tokens and their appearance in expressions. So, for example, the regexp from the MATLAB manual '(?<first>\\w+)\\s+(?<last>\\w+)|(?<last>\\w+),\\s+(?<first>\\w+)' does not work correctly (as of this writing) because the same named tokens appear multiple times. The workaround is to assign different names to each token, and then collapse the results later.
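The workaround for repeated token names can be sketched as follows. This is only an illustration under stated assumptions: the token names first1, last1, last2 and first2 are hypothetical, the character class [A-Za-z]+ stands in for the word pattern of the MATLAB manual example, and the collapse step assumes that a named token which did not take part in the match comes back as an empty field. With an old pcre installation the (?P<name> form would be needed instead.

```matlab
% Each named token gets a unique (hypothetical) name, so no name repeats.
% Alternative 1 matches "First Last", alternative 2 matches "Last, First".
expr = '(?<first1>[A-Za-z]+) +(?<last1>[A-Za-z]+)|(?<last2>[A-Za-z]+), +(?<first2>[A-Za-z]+)';
str = 'Smith, John';

% Six-output form as documented above; only names is used afterwards.
[start, stop, tokenExtents, match, tokens, names] = regexp(str, expr);

% Collapse the results: only one alternative can match, so (assuming the
% other alternative's fields are empty) concatenation keeps the one that
% was actually captured.
first = [names.first1, names.first2]
last  = [names.last1, names.last2]
```

If the unmatched fields turn out not to be empty in a given installation, an explicit isempty test on each field would be the more defensive way to collapse them.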
Example
Some examples of using the regexp function
--> [start,stop,tokenExtents,match,tokens,named] = regexp('quick down town zoo','(.)own')
start =
7 12
stop =
10 15
tokenExtents =
[1 2 double array] [1 2 double array]
match =
[down] [town]
tokens =
[1 1 cell array] [1 1 cell array]
named =
[]
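The nested outputs in the listing above are shown only in summary form. A short sketch of how they might be unpacked, assuming standard cell indexing with braces, is given below. The variable names first_token and first_extent are arbitrary.

```matlab
[start, stop, tokenExtents, match, tokens, named] = regexp('quick down town zoo', '(.)own');

% tokens is a cell array of cell arrays: the first index selects the match,
% the second selects the token within that match. The (.) group of the
% first match captures the single character before 'own'.
first_token = tokens{1}{1}

% tokenExtents{1} holds the starting and ending index of that token as a
% 1 x 2 double array.
first_extent = tokenExtents{1}

% match is a flat cell array, so a single index is enough.
second_match = match{2}
```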
