POSIX Regexp Compilation
Before you can actually match a regular expression, you must compile it. This is not true compilation; it produces a special data structure, not machine instructions. But it is like ordinary compilation in that its purpose is to enable you to "execute" the pattern fast. (See Matching POSIX Regexps for how to use the compiled regular expression for matching.) There is a special data type for compiled regular expressions:
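Data Type: regex_t
This type of object holds a compiled regular expression.
It is actually a structure. It has just one field that your programs
should look at:
re_nsub
This field holds the number of parenthetical subexpressions in the
regular expression that was compiled.
There are several other fields, but only the functions in the library
should use them.
After you create a regex_t object, you can compile a regular
expression into it by calling regcomp.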
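A minimal sketch of the lifecycle just described, assuming extended syntax (REG_EXTENDED): the caller provides the regex_t storage (here an automatic variable), regcomp fills it in, the program reads only the public re_nsub field, and regfree (see Regexp Cleanup) releases the library's internal data. The helper name subexpression_count is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: return the number of parenthesized subexpressions
   in `pattern` (compiled as an extended regex), or -1 if regcomp fails. */
long subexpression_count(const char *pattern)
{
    regex_t re;                          /* storage provided by the caller */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    long n = (long) re.re_nsub;          /* the one field programs may read */
    regfree(&re);                        /* release the library's internal data */
    return n;
}
```

For example, subexpression_count("(a)(b(c))") gives 3, because nested groups are counted too.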
Function: int regcomp (regex_t *compiled, const char *pattern, int cflags)
The function regcomp "compiles" a regular expression into a
data structure that you can use with regexec to match against a
string. The compiled regular expression format is designed for
efficient matching. regcomp stores it into *compiled.
It's up to you to allocate an object of type regex_t and pass its
address to regcomp.
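Since allocating the regex_t is the caller's job, a program that keeps compiled patterns around may put them on the heap. The sketch below uses two hypothetical helpers, compile_new and free_compiled. It assumes, as glibc does, that a failed regcomp releases its own partial state, so only the struct itself is freed on that path.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a regex_t and compile `pattern` into it.
   Returns NULL on failure; *err receives regcomp's result, or REG_ESPACE
   if the struct itself could not be allocated. */
regex_t *compile_new(const char *pattern, int cflags, int *err)
{
    regex_t *re = malloc(sizeof *re);
    if (re == NULL) {
        *err = REG_ESPACE;
        return NULL;
    }
    *err = regcomp(re, pattern, cflags);
    if (*err != 0) {
        free(re);          /* assumption: regcomp already cleaned up after itself */
        return NULL;
    }
    return re;
}

/* Hypothetical helper: release both the compiled data and our allocation. */
void free_compiled(regex_t *re)
{
    if (re != NULL) {
        regfree(re);       /* library-owned storage inside the struct */
        free(re);          /* the struct we allocated */
    }
}
```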
The argument cflags lets you specify various options that control
the syntax and semantics of regular expressions. See Flags for POSIX
Regexps.
If you use the flag REG_NOSUB, then regcomp omits from
the compiled regular expression the information necessary to record
how subexpressions actually match. In this case, you might as well
pass 0 for the matchptr and nmatch arguments when
you call regexec.
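When only a yes/no answer is needed, as in a grep-like filter, REG_NOSUB fits well. The sketch below (hypothetical helper count_matching_lines, assuming extended syntax) compiles once and then calls regexec with nmatch 0 and a null matchptr for each string.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the `n` strings in `lines`
   contain a match for `pattern`. Returns -1 if the pattern is invalid. */
int count_matching_lines(const char *pattern, const char *lines[], size_t n)
{
    regex_t re;
    /* REG_NOSUB: we never ask where subexpressions matched. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        /* nmatch = 0 and matchptr = NULL, since no positions are recorded. */
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            count++;
    }
    regfree(&re);
    return count;
}
```

Compiling once outside the loop is the point of the separate compile step: the pattern is parsed a single time and reused for every string.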
If you don't use REG_NOSUB, then the compiled regular expression
does have the capacity to record how subexpressions match. Also,
regcomp tells you how many subexpressions pattern has, by
storing the number in compiled->re_nsub. You can use that
value to decide how long an array to allocate to hold information about
subexpression matches.
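A sketch of sizing the match array from re_nsub, assuming extended syntax: element 0 of the regmatch_t array describes the whole match, so re_nsub + 1 elements cover every subexpression. The helper extract_group and its buffer-copying behavior are hypothetical; an rm_so of -1 marks a subexpression that did not participate in the match.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the text matched by subexpression `k`
   (0 = whole match) into `buf`. Returns 0 on success, -1 otherwise. */
int extract_group(const char *pattern, const char *subject, size_t k,
                  char *buf, size_t bufsize)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* One slot for the whole match plus one per subexpression. */
    size_t nmatch = re.re_nsub + 1;
    regmatch_t *m = malloc(nmatch * sizeof *m);
    if (m == NULL) {
        regfree(&re);
        return -1;
    }

    int result = -1;
    if (k < nmatch && regexec(&re, subject, nmatch, m, 0) == 0
        && m[k].rm_so != -1) {
        size_t len = (size_t) (m[k].rm_eo - m[k].rm_so);
        if (len < bufsize) {             /* leave room for the null byte */
            memcpy(buf, subject + m[k].rm_so, len);
            buf[len] = '\0';
            result = 0;
        }
    }
    free(m);
    regfree(&re);
    return result;
}
```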
regcomp returns 0 if it succeeds in compiling the regular
expression; otherwise, it returns a nonzero error code (see the table
below). You can use regerror to produce an error message string
describing the reason for a nonzero value; see Regexp Cleanup.
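One way to turn a nonzero result into text is sketched below with a hypothetical helper, compile_error_message. It relies on the standard regerror behavior that a zero-size buffer makes regerror return the size needed (including the terminating null byte), so the message can be allocated exactly.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return NULL if `pattern` compiles, otherwise a
   malloc'd description of the error (the caller frees it). Note that a
   failed malloc also yields NULL in this simple sketch. */
char *compile_error_message(const char *pattern, int cflags)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);
    if (rc == 0) {
        regfree(&re);
        return NULL;
    }
    /* First call: ask only for the required buffer size. */
    size_t needed = regerror(rc, &re, NULL, 0);
    char *msg = malloc(needed);
    if (msg != NULL)
        regerror(rc, &re, msg, needed);   /* second call: fill the buffer */
    return msg;
}
```

The exact wording of the message is up to the implementation, so programs should not parse it.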
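Here are the possible nonzero values that regcomp can return:
REG_BADBR
There was an invalid \{...\} construct in the regular
expression. A valid \{...\} construct must contain either
a single number, or two numbers in increasing order separated by a
comma.
REG_BADPAT
There was a syntax error in the regular expression.
REG_BADRPT
A repetition operator such as ? or * appeared in a bad
position (with no preceding subexpression to act on).
REG_ECOLLATE
The regular expression referred to an invalid collating element (one
not defined in the current locale for string collation).
REG_ECTYPE
The regular expression referred to an invalid character class name.
REG_EESCAPE
The regular expression ended with \.
REG_ESUBREG
There was an invalid number in the \digit construct.
REG_EBRACK
There were unbalanced square brackets in the regular expression.
REG_EPAREN
An extended regular expression had unbalanced parentheses, or a basic
regular expression had unbalanced \( and \).
REG_EBRACE
The regular expression had unbalanced \{ and \}.
REG_ERANGE
One of the endpoints in a range expression was invalid.
REG_ESPACE
regcomp ran out of memory.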
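All of the codes in this table are macros defined in <regex.h>, so a program can switch on them directly. The hypothetical regcomp_error_name below maps a code to its symbolic name, which can be useful in log output alongside the regerror text. Which code a particular malformed pattern produces may differ between implementations, so this sketch only names codes and does not try to predict them.

```c
#include <regex.h>

/* Hypothetical helper: map a regcomp error code to its macro name. */
const char *regcomp_error_name(int code)
{
    switch (code) {
    case REG_BADBR:    return "REG_BADBR";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_BADRPT:   return "REG_BADRPT";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    default:           return "unknown";   /* not one of the codes listed above */
    }
}
```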