#include "regex.h"
int regcomp(regex_t *preg, const char *regex, int cflags);
The regcomp() function compiles the regex string pointed to by regex to an internal representation and stores the result in the pattern buffer structure pointed to by preg.
The cflags argument is the bitwise inclusive OR of zero or more of the following flags (defined in the header "regex.h"):
- REG_EXTENDED
- Use POSIX Extended Regular Expression (ERE) compatible syntax when compiling regex.
This flag is always set, since re4asm only supports ERE.
- REG_ICASE
- Ignore case. Subsequent searches with the regexec function using this pattern buffer will be case insensitive.
- REG_NOSUB
- Do not report submatches. Subsequent searches with the regexec function will only report whether a match was found or not and will not fill the match array.
- REG_NEWLINE
- Normally the newline character is treated as an ordinary character. When this flag is used, the newline character ('\n', ASCII code 10) is treated specially as follows:
- The match-any-character operator (dot "." outside a bracket expression) does not match a newline.
- A non-matching list ([^...]) not containing a newline does not match a newline.
- The match-beginning-of-line operator ^ matches the empty string immediately after a newline as well as the empty string at the beginning of the string (but see the REG_NOTBOL regexec() flag below).
- The match-end-of-line operator $ matches the empty string immediately before a newline as well as the empty string at the end of the string (but see the REG_NOTEOL regexec() flag below).
- REG_SETMINLEN
- Sets the minlen field in preg.
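The effect of the compile flags can be sketched with a small helper. This is an illustration only: `matches` is a hypothetical name, and the sketch includes the system POSIX header `<regex.h>` so that it builds without re4asm; with re4asm the include would be "regex.h". REG_SETMINLEN is left out because it is not a POSIX flag.

```c
#include <stddef.h>
#include <regex.h>  /* assumption: system POSIX header; with re4asm use "regex.h" */

/* Hypothetical helper: returns 1 on a match, 0 on no match,
 * and -1 if the pattern does not compile. */
int matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only the yes/no answer is needed, not the match position. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | cflags) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, `matches("hello", REG_ICASE, "Say HELLO")` reports a match, while `matches("a.b", REG_NEWLINE, "a\nb")` does not, because the dot no longer matches the newline.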
After a successful call to regcomp it is
possible to use the preg pattern buffer for
searching for matches in strings (see below).
The regex_t structure has the following fields that the application can read:
- size_t re_nsub
- Number of parenthesized subexpressions in regex.
- size_t minlen
- Minimum length of a match. The data in this field is valid only if the REG_SETMINLEN flag was passed to regcomp.
If string is shorter than minlen, it cannot contain a match and can safely be skipped.
The regcomp function returns zero if the compilation was successful, or one of the following (POSIX) error codes if there was an error:
- REG_BADPAT
- Invalid regexp.
- REG_ECOLLATE
- Invalid collating element referenced.
re4asm returns this whenever equivalence classes or multicharacter collating elements are used in bracket expressions (they are not supported).
- REG_ECTYPE
- Unknown character class name in [[:name:]].
- REG_EESCAPE
- The last character of regex was a backslash (\).
- REG_ESUBREG
- Invalid back reference; number in \digit invalid.
- REG_EBRACK
- [] imbalance.
- REG_EPAREN
- \(\) or () imbalance.
- REG_EBRACE
- \{\} or {} imbalance.
- REG_BADBR
- {} content invalid: not a number, more than two numbers, first larger than second, or number too large.
- REG_ERANGE
- Invalid character range, e.g. ending point is earlier in the collating order than the starting point.
- REG_ESPACE
- Out of memory, or an internal limit exceeded.
- REG_BADRPT
- Invalid use of repetition operators: two or more repetition operators have been chained in an undefined way.
The regcomp function also returns the following (non-POSIX) error codes if there was an error:
- REG_INTERNAL
- Internal error (aka "bug").
- REG_STATES
- NFA state limit exceeded. The current implementation of re4asm supports only patterns that have a character width of less than 33.
- REG_EMPTYPAT
- regex is the empty string or evaluates to an empty regular expression.
- REG_EMPTYSET
- A bracket expression [] evaluates to an empty set.
- REG_INVCHAR
- regex contains a non-ASCII character, i.e., a character with an ASCII code greater than 127.
- REG_UNIONOP
- Misplaced | operator.
- REG_NESTLEVEL
- Parenthesis () nesting level exceeded.
- REG_EMPTYPARENS
- Empty pair of parentheses ().
- REG_ANCHINPAREN
- Anchor (^ or $) within parentheses ().
- REG_ANCHINEXPR
- Anchor (^ or $) within regular expression, i.e., not first or last character of regular expression.
- REG_ANCHALONE
- Stand-alone anchor, i.e., ^ not followed by anything, or $ not preceded by anything.
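Checking the return value of regcomp might look like the following sketch. `try_compile` is a hypothetical helper, and the example again assumes the system POSIX header `<regex.h>` in place of re4asm's "regex.h". For that reason it names only POSIX error codes; the re4asm-specific codes listed above would be handled the same way.

```c
#include <stdio.h>
#include <regex.h>  /* assumption: system POSIX header; with re4asm use "regex.h" */

/* Hypothetical helper: compiles pattern, reports a failure on stderr,
 * and returns the code that regcomp returned (zero on success). */
int try_compile(const char *pattern)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    switch (rc) {
    case 0:
        regfree(&re);  /* release the pattern buffer once it is no longer needed */
        break;
    case REG_EPAREN:
        fprintf(stderr, "unbalanced parentheses in \"%s\"\n", pattern);
        break;
    case REG_EBRACK:
        fprintf(stderr, "unbalanced brackets in \"%s\"\n", pattern);
        break;
    default:
        fprintf(stderr, "regcomp failed with code %d for \"%s\"\n", rc, pattern);
        break;
    }
    return rc;
}
```

A caller would typically treat any nonzero result as "do not call regexec with this pattern buffer".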
#include "regex.h"
int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);
The regexec() function matches the null-terminated string against the compiled regexp preg, initialized by a previous call to the regcomp function. The eflags argument is a bitwise OR of zero or more of the following flags:
- REG_NOTBOL
- When this flag is used, the match-beginning-of-line operator ^ does not match the empty string at the beginning of string. If REG_NEWLINE was used when compiling preg, the empty string immediately after a newline character will still be matched.
- REG_NOTEOL
- When this flag is used, the match-end-of-line operator $ does not match the empty string at the end of string. If REG_NEWLINE was used when compiling preg, the empty string immediately before a newline character will still be matched.
These flags are useful when different portions of a string are passed to regexec and the beginning or end of the partial string should not be interpreted as the beginning or end of a line.
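As a sketch of the partial-string case, the hypothetical helper `match_chunk` below passes REG_NOTBOL whenever the chunk does not start at the beginning of a line. It assumes the system POSIX header `<regex.h>`; with re4asm the include would be "regex.h".

```c
#include <stddef.h>
#include <regex.h>  /* assumption: system POSIX header; with re4asm use "regex.h" */

/* Hypothetical helper: searches chunk for pattern. at_line_start tells
 * whether chunk begins at a real line start. Returns 1 on a match,
 * 0 on no match, and -1 if the pattern does not compile. */
int match_chunk(const char *pattern, const char *chunk, int at_line_start)
{
    regex_t re;
    int eflags = at_line_start ? 0 : REG_NOTBOL;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* With REG_NOTBOL, ^ no longer matches at the start of chunk. */
    rc = regexec(&re, chunk, 0, NULL, eflags);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, `match_chunk("^abc", "abcdef", 1)` finds a match, whereas the same call with `at_line_start` set to 0 does not.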
If REG_NOSUB was used when compiling preg, nmatch is zero, or pmatch is NULL, then the
pmatch argument is ignored.
Otherwise, the start and end of the entire match are stored in the first element of pmatch.
Since re4asm does not support submatch addressing, setting nmatch > 1 and passing more than one regmatch_t structure in pmatch serves no purpose.
The regmatch_t structure contains at least the following fields:
- regoff_t rm_so
- Offset from start of string to start of substring.
- regoff_t rm_eo
- Offset from start of string to the first character after the substring.
The length of a match can be computed by subtracting rm_so from rm_eo.
The regexec() function returns zero if a match was found,
otherwise it returns REG_NOMATCH to indicate no match.
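Reading the match position from a single regmatch_t element could be sketched as follows. `first_match` is a hypothetical helper, and the system POSIX header `<regex.h>` stands in for re4asm's "regex.h".

```c
#include <regex.h>  /* assumption: system POSIX header; with re4asm use "regex.h" */

/* Hypothetical helper: finds the first match of pattern in text.
 * On a match, stores its start offset and length and returns 1.
 * Returns 0 on no match and -1 if the pattern does not compile. */
int first_match(const char *pattern, const char *text, long *start, long *length)
{
    regex_t re;
    regmatch_t m[1];  /* one element: only the entire match is reported */
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;  /* REG_NOMATCH */

    *start = (long)m[0].rm_so;
    *length = (long)(m[0].rm_eo - m[0].rm_so);  /* rm_eo is one past the last character */
    return 1;
}
```

Called with the pattern `b+` and the text `aabbbcc`, the helper reports a start offset of 2 and a length of 3.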
#include "regex.h"
size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);
The regerror() function is used to turn the error codes that can be returned by both regcomp and regexec into error message strings.
regerror() is passed the error code, errcode, the pattern buffer, preg, a pointer to a character string buffer, errbuf, and the size of the string buffer, errbuf_size. It returns the size of the errbuf required to contain the null-terminated error message string. If both errbuf and errbuf_size are nonzero, errbuf is filled in with the first errbuf_size - 1 characters of the error message and a terminating null.
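Because regerror() returns the required buffer size, one common approach is to call it twice: once to learn the size and once to fill a buffer of that size. The sketch below illustrates this; `error_message` is a hypothetical helper, and the system POSIX header `<regex.h>` is assumed in place of re4asm's "regex.h".

```c
#include <stdlib.h>
#include <regex.h>  /* assumption: system POSIX header; with re4asm use "regex.h" */

/* Hypothetical helper: returns a malloc'ed copy of the message for errcode,
 * or NULL if no memory is available. The caller frees the result. */
char *error_message(int errcode, const regex_t *preg)
{
    /* First call: no buffer, only ask how many bytes are needed. */
    size_t needed = regerror(errcode, preg, NULL, 0);
    char *buf;

    if (needed == 0)
        return NULL;
    buf = malloc(needed);
    if (buf != NULL)
        regerror(errcode, preg, buf, needed);  /* second call fills the buffer */
    return buf;
}
```

A fixed-size buffer works as well; in that case the message is truncated to errbuf_size - 1 characters, as described above.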
#include "regex.h"
void regfree(regex_t *preg);
The regfree() function is usually used to free the memory allocated by regcomp.
In re4asm the size of regex_t is constant and neither regcomp() nor regexec() allocates memory for it.
It is therefore not necessary to call regfree(), which is a stub function that returns immediately and exists only for completeness and compatibility.