REGEXP Regular Expression Matching Function
Section: String Functions
UsageMatches regular expressions in the provided string. This function is complicated, and compatibility with MATLABs syntax is not perfect. The syntax for its use is
which returns a row vector containing the starting index of each substring
str that matches the regular expression described by
second form of
regexp returns six outputs in the following order:
[start stop tokenExtents match tokens names] = regexp('str','expr')
where the meaning of each of the outputs is defined below.
startis a row vector containing the starting index of each substring that matches the regular expression.
stopis a row vector containing the ending index of each substring that matches the regular expression.
tokenExtentsis a cell array containing the starting and ending indices of each substring that matches the
tokensin the regular expression. A token is a captured part of the regular expression. If the
'once'mode is used, then this output is a
matchis a cell array containing the text for each substring that matches the regular expression. In
'once'mode, this is a string.
tokensis a cell array of cell arrays of strings that correspond to the tokens in the regular expression. In
'once'mode, this is a cell array of strings.
namedis a structure array containing the named tokens captured in a regular expression. Each named token is assigned a field in the resulting structure array, and each element of the array corresponds to a different match.
[o1 o2 ...] = regexp('str','expr', 'p1', 'p2', ...)
p1 etc. are the names of the outputs (and the order we want
the outputs in). As a final variant, you can supply some mode
[o1 o2 ...] = regexp('str','expr', p1, p2, ..., 'mode1', 'mode2')
mode flags are:
'once'- only the first match is returned.
'matchcase'- letter case must match (selected by default for
'ignorecase'- letter case is ignored (selected by default for
'.'operator matches any character (default)
'.'operator does not match the newline character
$operators match at the beginning and end (respectively) of a string.
$operators match at the beginning and end (respectively) of a line.
'literalspacing'- the space characters and comment characters
#are matched as literals, just like any other ordinary character (default).
'freespacing'- all spaces and comments are ignored in the regular expression. You must use '\ ' and '\#' to match spaces and comment characters, respectively.
- If you have an old version of
pcreinstalled, then named tokens must use the older
<?P<name>syntax, instead of the new
pcrelibrary is pickier about named tokens and their appearance in expressions. So, for example, the regexp from the MATLAB manual
'(?<first>\\w+)\\s+(?<last>\\w+)(?<last>\\w+),\\s+(?<first>\\w+)'| does not work correctly (as of this writing) because the same named tokens appear multiple times. The workaround is to assign different names to each token, and then collapse the results later.
ExampleSome examples of using the
--> [start,stop,tokenExtents,match,tokens,named] = regexp('quick down town zoo','(.)own') start = 7 12 stop = 10 15 tokenExtents = [1 2 double array] [1 2 double array] match = [down] [town] tokens = [1 1 cell array] [1 1 cell array] named =