This is patch16 to PennMUSH 1.7.6. After applying this patch, you will
have version 1.7.6p16.

To apply this patch, save it to a file in your top-level MUSH directory,
and do the following:

  patch -p1 < 1.7.6-patch16
  make install

If you use GNU patch 2.2, you probably want the above to be
'patch -b -p1', not just 'patch -p1'.

Unix (or cygwin) users need not worry about failed hunks in
src/switchinc.c, hdrs/switches.h, hdrs/cmds.h, or hdrs/funs.h. These
files are automatically rebuilt on compile. On the off chance they
appear not to be, simply rm them and re-run make.

Then @shutdown and restart your MUSH.

- Alan/Javelin

In this patch:

Fixes:
  * PCRE updated to 4.5 [SW]

Prereq: 1.7.6p15
*** 1_7_6.184/Patchlevel Sun, 25 Jan 2004 20:27:09 -0600 dunemush (pennmush/5_Patchlevel 1.17.1.11.1.1.1.15 600)
--- 1_7_6.186(w)/Patchlevel Wed, 28 Apr 2004 10:29:39 -0500 dunemush (pennmush/5_Patchlevel 1.17.1.11.1.1.1.16 600)
***************
*** 1,2 ****
  Do not edit this file. It is maintained by the official PennMUSH patches.
! This is PennMUSH 1.7.6p15
--- 1,2 ----
  Do not edit this file. It is maintained by the official PennMUSH patches.
! This is PennMUSH 1.7.6p16
*** 1_7_6.184/CHANGES.176 Wed, 28 Jan 2004 12:10:46 -0600 dunemush (pennmush/g/17_CHANGES 1.10.1.6.1.2.1.2.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.3.1.1.1.1.1.9.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1 600)
--- 1_7_6.186(w)/CHANGES.176 Wed, 28 Apr 2004 10:29:59 -0500 dunemush (pennmush/g/17_CHANGES 1.10.1.6.1.2.1.2.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.3.1.1.1.1.1.9.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.1.1.1 600)
***************
*** 18,23 ****
--- 18,29 ----
  ==========================================================================
+ Version 1.7.6 patchlevel 16 April 28, 2004
+
+ Fixes:
+ * PCRE updated to 4.5 [SW]
+
+
  Version 1.7.6 patchlevel 15 January 25, 2004
  Fixes:
*** 1_7_6.184/game/txt/hlp/pennvOLD.hlp Sat, 24 Jan 2004 13:15:41 -0600 dunemush (pennmush/g/30_pennvOLD.h 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1 660)
--- 1_7_6.186(w)/game/txt/hlp/pennvOLD.hlp Wed, 28 Apr 2004 10:30:37 -0500 dunemush (pennmush/g/30_pennvOLD.h 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1 660)
***************
*** 4417,4423 ****
  For information on a specific patchlevel of one of the versions listed, type 'help p'. For example, 'help 1.7.2p3'
! 1.7.6: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
  1.7.5: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
  1.7.4: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
--- 4417,4423 ----
  For information on a specific patchlevel of one of the versions listed, type 'help p'. For example, 'help 1.7.2p3'
! 1.7.6: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
  1.7.5: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
  1.7.4: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
*** 1_7_6.184/game/txt/hlp/pennv176.hlp Wed, 28 Jan 2004 12:10:46 -0600 dunemush (pennmush/g/33_pennv176.h 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.2.1.1.1.1.1.1.1.2 660)
--- 1_7_6.186(w)/game/txt/hlp/pennv176.hlp Wed, 28 Apr 2004 10:30:37 -0500 dunemush (pennmush/g/33_pennv176.h 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.1.1.1.1.2.1.1.1.1.1.2.1.1.1.1.1.1.1.2.1.1.1.1 660)
***************
*** 1,4 ****
! & 1.7.6p15
  & changes
  This is a list of changes in this patchlevel which are probably of interest to players. More information about new commands and functions
--- 1,4 ----
! & 1.7.6p16
  & changes
  This is a list of changes in this patchlevel which are probably of interest to players. More information about new commands and functions
***************
*** 11,16 ****
--- 11,23 ----
  A list of the patchlevels associated with each release can be read in 'help patchlevels'.
+ Version 1.7.6 patchlevel 16 April 28, 2004
+
+ Fixes:
+ * PCRE updated to 4.5 [SW]
+
+
+ & 1.7.6p15
  Version 1.7.6 patchlevel 15 January 25, 2004
  Fixes:
*** 1_7_6.184/game/txt/hlp/pennfunc.hlp Sat, 31 May 2003 16:30:52 -0500 dunemush (pennmush/16_pennfunc.h 1.2.1.50.1.1.1.1.1.2.1.7.1.8.1.1.1.1.1.1.1.1.1.1.1.1.1.3.1.1.1.1.1.1.1.1.1.1.1.1.1.9.1.1.1.1.1.3.1.1.1.1.1.1.1.1.1.1.1.1.1.1 600)
--- 1_7_6.186(w)/game/txt/hlp/pennfunc.hlp Fri, 26 Mar 2004 16:25:04 -0600 dunemush (pennmush/16_pennfunc.h 1.2.1.50.1.1.1.1.1.2.1.7.1.8.1.1.1.1.1.1.1.1.1.1.1.1.1.3.1.1.1.1.1.1.1.1.1.1.1.1.1.9.1.1.1.1.1.3.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1 600)
***************
*** 2557,2564 ****
  If the specified register is -1, the substring is not copied into a register.
  Under regmatchi, case of the substring may be modified.
! For example, if is 'cookies=30', and is '(.+)=([0-9]*)'
! (parsed; note that escaping may be necessary), then the 0th substring matched is 'cookies=30', the 1st substring is 'cookies', and the 2nd substring is '30'. If is '0 3 5', then %q0 will become "cookies=30", %q3 will become "cookies", and %q5 will become "30".
--- 2557,2564 ----
  If the specified register is -1, the substring is not copied into a register.
  Under regmatchi, case of the substring may be modified.
! For example, in regmatch( cookies=30 , (.+)=(\[0-9\]*) )
! (note use of escaping for MUSH parser), then the 0th substring matched is 'cookies=30', the 1st substring is 'cookies', and the 2nd substring is '30'. If is '0 3 5', then %q0 will become "cookies=30", %q3 will become "cookies", and %q5 will become "30".
*** 1_7_6.184/hdrs/version.h Sun, 25 Jan 2004 20:27:09 -0600 dunemush (pennmush/c/47_version.h 1.32.1.2.1.7.1.9.1.1.1.17.1.2.1.14 660)
--- 1_7_6.186(w)/hdrs/version.h Wed, 28 Apr 2004 10:31:21 -0500 dunemush (pennmush/c/47_version.h 1.32.1.2.1.7.1.9.1.1.1.17.1.2.1.15 660)
***************
*** 1,2 ****
! #define VERSION "PennMUSH version 1.7.6 patchlevel 15 [01/25/2004]"
! #define SHORTVN "PennMUSH 1.7.6p15"
--- 1,2 ----
! #define VERSION "PennMUSH version 1.7.6 patchlevel 16 [04/28/2004]"
! #define SHORTVN "PennMUSH 1.7.6p16"
*** 1_7_6.184/src/pcre.c Thu, 17 Apr 2003 09:49:14 -0500 dunemush (pennmush/d/36_pcre.c 1.4.1.3.1.3.1.1.1.1.1.1.1.4 660)
--- 1_7_6.186(w)/src/pcre.c Wed, 28 Apr 2004 10:31:20 -0500 dunemush (pennmush/d/36_pcre.c 1.4.1.3.1.3.1.1.1.1.1.1.1.4.1.1 660)
***************
*** 42,59 ****
  #include
  #include
  #include
  #include "pcre.h"
  #include "confmagic.h"
! /* Bits of PCRE's config.h */
! #define LINK_SIZE 2
! #define MATCH_LIMIT 100000
  #define NEWLINE '\n'
  /* Bits of internal.h */
  /* This header contains definitions that are shared between the different modules, but which are not relevant to the outside. */
  /* PCRE keeps offsets in its compiled code as 2-byte quantities by default. These are used, for example, to link from the start of a subpattern to its
--- 42,64 ----
  #include
  #include
  #include
+ #include
  #include "pcre.h"
  #include "confmagic.h"
!
! /* Bits of PCRE's conf.h */
  #define NEWLINE '\n'
+ #define LINK_SIZE 2
+ #define MATCH_LIMIT 100000
+ #define NO_RECURSE
  /* Bits of internal.h */
  /* This header contains definitions that are shared between the different modules, but which are not relevant to the outside. */
+ #define PCRE_DEFINITION /* Win32 __declspec(export) trigger for .dll */
+ #define EXPORT
  /* PCRE keeps offsets in its compiled code as 2-byte quantities by default. These are used, for example, to link from the start of a subpattern to its
***************
*** 120,129 ****
  #define PUBLIC_OPTIONS \
  (PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \
  PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8| \
! PCRE_NO_AUTO_CAPTURE)
  #define PUBLIC_EXEC_OPTIONS \
! (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY)
  #define PUBLIC_STUDY_OPTIONS 0 /* None defined */
--- 125,134 ----
  #define PUBLIC_OPTIONS \
  (PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \
  PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8| \
! PCRE_NO_AUTO_CAPTURE|PCRE_NO_UTF8_CHECK)
  #define PUBLIC_EXEC_OPTIONS \
! (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NO_UTF8_CHECK)
  #define PUBLIC_STUDY_OPTIONS 0 /* None defined */
***************
*** 169,176 ****
  #define ESC_r '\r'
  #endif
! #ifndef ESC_t
! #define ESC_t '\t'
  #endif
  /* These are escaped items that aren't just an encoding of a particular data
--- 174,184 ----
  #define ESC_r '\r'
  #endif
! /* We can't officially use ESC_t because it is a POSIX reserved identifier
! (presumably because of all the others like size_t). */
!
! #ifndef ESC_tee
! #define ESC_tee '\t'
  #endif
  /* These are escaped items that aren't just an encoding of a particular data
***************
*** 275,314 ****
  class - the difference is relevant only when a UTF-8 character > 255 is encountered. */
! OP_XCLASS, /* 56 Extended class for handling UTF-8 chars within the class. This does both positive and negative. */
! OP_REF, /* 57 Match a back reference */
! OP_RECURSE, /* 58 Match a numbered subpattern (possibly recursive) */
! OP_CALLOUT, /* 59 Call out to external function if provided */
!
! OP_ALT, /* 60 Start of alternation */
! OP_KET, /* 61 End of group that doesn't have an unbounded repeat */
! OP_KETRMAX, /* 62 These two must remain together and in this */
! OP_KETRMIN, /* 63 order. They are for groups the repeat for ever. */
  /* The assertions must come before ONCE and COND */
! OP_ASSERT, /* 64 Positive lookahead */
! OP_ASSERT_NOT, /* 65 Negative lookahead */
! OP_ASSERTBACK, /* 66 Positive lookbehind */
! OP_ASSERTBACK_NOT, /* 67 Negative lookbehind */
! OP_REVERSE, /* 68 Move pointer back - used in lookbehind assertions */
  /* ONCE and COND must come after the assertions, with ONCE first, as there's a test for >= ONCE for a subpattern that isn't an assertion. */
! OP_ONCE, /* 69 Once matched, don't back up into the subpattern */
! OP_COND, /* 70 Conditional group */
! OP_CREF, /* 71 Used to hold an extraction string number (cond ref) */
! OP_BRAZERO, /* 72 These two must remain together and in this */
! OP_BRAMINZERO, /* 73 order. */
! OP_BRANUMBER, /* 74 Used for extracting brackets whose number is greater than can fit into an opcode. */
! OP_BRA /* 75 This and greater values are used for brackets that extract substrings up to a basic limit. After that, use is made of OP_BRANUMBER. */
  };
--- 283,322 ----
  class - the difference is relevant only when a UTF-8 character > 255 is encountered. */
! OP_XCLASS, /* 57 Extended class for handling UTF-8 chars within the class. This does both positive and negative. */
! OP_REF, /* 58 Match a back reference */
! OP_RECURSE, /* 59 Match a numbered subpattern (possibly recursive) */
! OP_CALLOUT, /* 60 Call out to external function if provided */
!
! OP_ALT, /* 61 Start of alternation */
! OP_KET, /* 62 End of group that doesn't have an unbounded repeat */
! OP_KETRMAX, /* 63 These two must remain together and in this */
! OP_KETRMIN, /* 64 order. They are for groups the repeat for ever. */
  /* The assertions must come before ONCE and COND */
! OP_ASSERT, /* 65 Positive lookahead */
! OP_ASSERT_NOT, /* 66 Negative lookahead */
! OP_ASSERTBACK, /* 67 Positive lookbehind */
! OP_ASSERTBACK_NOT, /* 68 Negative lookbehind */
! OP_REVERSE, /* 69 Move pointer back - used in lookbehind assertions */
  /* ONCE and COND must come after the assertions, with ONCE first, as there's a test for >= ONCE for a subpattern that isn't an assertion. */
! OP_ONCE, /* 70 Once matched, don't back up into the subpattern */
! OP_COND, /* 71 Conditional group */
! OP_CREF, /* 72 Used to hold an extraction string number (cond ref) */
! OP_BRAZERO, /* 73 These two must remain together and in this */
! OP_BRAMINZERO, /* 74 order. */
! OP_BRANUMBER, /* 75 Used for extracting brackets whose number is greater than can fit into an opcode. */
! OP_BRA /* 76 This and greater values are used for brackets that extract substrings up to a basic limit. After that, use is made of OP_BRANUMBER. */
  };
***************
*** 351,360 ****
  1, 1, 1, 1, 2, 1, 1, /* Any, Anybyte, \Z, \z, Opt, ^, $ */ \
  2, /* Chars - the minimum length */ \
  2, /* not */ \
! /* Positive single-char repeats */ \
! 2, 2, 2, 2, 2, 2, /* *, *?, +, +?, ?, ?? ** These are */ \
! 4, 4, 4, /* upto, minupto, exact ** minima */ \
! /* Negative single-char repeats */ \
  2, 2, 2, 2, 2, 2, /* NOT *, *?, +, +?, ?, ?? */ \
  4, 4, 4, /* NOT upto, minupto, exact */ \
  /* Positive type repeats */ \
--- 359,368 ----
  1, 1, 1, 1, 2, 1, 1, /* Any, Anybyte, \Z, \z, Opt, ^, $ */ \
  2, /* Chars - the minimum length */ \
  2, /* not */ \
! /* Positive single-char repeats ** These are */ \
! 2, 2, 2, 2, 2, 2, /* *, *?, +, +?, ?, ?? ** minima in */ \
! 4, 4, 4, /* upto, minupto, exact ** UTF-8 mode */ \
! /* Negative single-char repeats - only for chars < 256 */ \
  2, 2, 2, 2, 2, 2, /* NOT *, *?, +, +?, ?, ?? */ \
  4, 4, 4, /* NOT upto, minupto, exact */ \
  /* Positive type repeats */ \
***************
*** 446,451 ****
--- 454,460 ----
  #define ERR41 "unrecognized character after (?P"
  #define ERR42 "syntax error after (?P"
  #define ERR43 "two named groups have the same name"
+ #define ERR44 "invalid UTF-8 string"
  /* All character handling must be done as unsigned characters. Otherwise there are problems with top-bit-set characters and functions such as isspace().
***************
*** 509,515 ****
  call within the pattern. */
  typedef struct recursion_info {
! struct recursion_info *prev; /* Previous recursion record (or NULL) */
  int group_num; /* Number of group that was called */
  const uschar *after_call; /* "Return value": points after the call in the expr */
  const uschar *save_start; /* Old value of md->start_match */
--- 518,524 ----
  call within the pattern. */
  typedef struct recursion_info {
! struct recursion_info *prevrec; /* Previous recursion record (or NULL) */
  int group_num; /* Number of group that was called */
  const uschar *after_call; /* "Return value": points after the call in the expr */
  const uschar *save_start; /* Old value of md->start_match */
***************
*** 517,522 ****
--- 526,541 ----
  int saved_max; /* Number of saved offsets */
  } recursion_info;
+ /* When compiling in a mode that doesn't use recursive calls to match(),
+ a structure is used to remember local variables on the heap. It is defined in
+ pcre.c, close to the match() function, so that it is easy to keep it in step
+ with any changes of local variable. However, the pointer to the current frame
+ must be saved in some "static" place over a longjmp(). We declare the
+ structure here so that we can put a pointer in the match_data structure.
+ NOTE: This isn't used for a "normal" compilation of pcre. */
+
+ struct heapframe;
+
  /* Structure for passing "static" information around between the functions doing the matching, so that they are thread-safe. */
***************
*** 544,549 ****
--- 563,569 ----
  int start_offset; /* The start offset value */
  recursion_info *recursive; /* Linked list of recursion data */
  void *callout_data; /* To pass back to callouts */
+ struct heapframe *thisframe; /* Used only when compiling for no recursion */
  } match_data;
  /* Bit definitions for entries in the pcre_ctypes table. */
***************
*** 766,771 ****
--- 786,796 ----
  /* End of chartables.c */
  /* get.c */
+ /* This module contains some convenience functions for extracting substrings
+ from the subject string after a regex match has succeeded. The original idea
+ for these functions came from Scott Wimer . */
+
+
  /*************************************************
  * Copy captured string to given buffer *
  *************************************************/
***************
*** 809,814 ****
--- 834,841 ----
  return yield;
  }
+
+ /* End of get.c */
  /* maketables.c */
  /*************************************************
***************
*** 818,824 ****
  /* This function builds a set of character tables for use by PCRE and returns a pointer to them. They are build using the ctype functions, and consequently their contents will depend upon the current locale setting. When compiled as
! part of the library, the store is obtained via malloc(), but when compiled inside dftables, use malloc().
  Arguments: none
--- 845,851 ----
  /* This function builds a set of character tables for use by PCRE and returns a pointer to them. They are build using the ctype functions, and consequently their contents will depend upon the current locale setting. When compiled as
! part of the library, the store is obtained via pcre_malloc(), but when compiled inside dftables, use malloc().
  Arguments: none
***************
*** 831,837 ****
--- 858,868 ----
  unsigned char *yield, *p;
  int i;
+ #ifndef DFTABLES
  yield = (unsigned char *) malloc(tables_length);
+ #else
+ yield = (unsigned char *) malloc(tables_length);
+ #endif
  if (yield == NULL)
  return NULL;
***************
*** 899,904 ****
--- 930,941 ----
  x += ctype_xdigit;
  if (isalnum(i) || i == '_')
  x += ctype_word;
+
+ /* Note: strchr includes the terminating zero in the characters it considers.
+ In this instance, that is ok because we want binary zero to be flagged as a
+ meta-character, which in this sense is any character that terminates a run
+ of data characters. */
+
  if (strchr("*+?{^.$|()[", i) != 0)
  x += ctype_meta;
  *p++ = x;
***************
*** 909,915 ****
  /* End of maketables.c */
  /* study.c */
- /*************************************************
  * Set a bit and maybe its alternate case *
  *************************************************/
--- 946,951 ----
***************
*** 1124,1129 ****
--- 1160,1168 ----
  case OP_TYPEQUERY:
  case OP_TYPEMINQUERY:
  switch (tcode[1]) {
+ case OP_ANY:
+ return FALSE;
+
  case OP_NOT_DIGIT:
  for (c = 0; c < 32; c++)
  start_bits[c] |= ~cd->cbits[c + cbit_digit];
***************
*** 1161,1181 ****
  /* Character class where all the information is in a bit map: set the bits and either carry on or not, according to the repeat count. If it was a negative class, and we are operating with UTF-8 characters, any byte
! with the top-bit set is a potentially valid starter because it may start
! a character with a value > 255. (This is sub-optimal in that the
! character may be in the range 128-255, and those characters might be
! unwanted, but that's as far as we go for the moment.) */
  case OP_NCLASS:
! if (utf8)
! memset(start_bits + 16, 0xff, 16);
  /* Fall through */
  case OP_CLASS: {
  tcode++;
! for (c = 0; c < 32; c++)
! start_bits[c] |= tcode[c];
  tcode += 32;
  switch (*tcode) {
  case OP_CRSTAR:
--- 1200,1246 ----
  /* Character class where all the information is in a bit map: set the bits and either carry on or not, according to the repeat count. If it was a negative class, and we are operating with UTF-8 characters, any byte
! with a value >= 0xc4 is a potentially valid starter because it starts a
! character with a value > 255. */
  case OP_NCLASS:
! if (utf8) {
! start_bits[24] |= 0xf0; /* Bits for 0xc4 - 0xc8 */
! memset(start_bits + 25, 0xff, 7); /* Bits for 0xc9 - 0xff */
! }
  /* Fall through */
  case OP_CLASS: {
  tcode++;
!
! /* In UTF-8 mode, the bits in a bit map correspond to character
! values, not to byte values. However, the bit map we are constructing is
! for byte values. So we have to do a conversion for characters whose
! value is > 127. In fact, there are only two possible starting bytes for
! characters in the range 128 - 255. */
!
! if (utf8) {
! for (c = 0; c < 16; c++)
! start_bits[c] |= tcode[c];
! for (c = 128; c < 256; c++) {
! if ((tcode[c / 8] && (1 << (c & 7))) != 0) {
! int d = (c >> 6) | 0xc0; /* Set bit for this starter */
! start_bits[d / 8] |= (1 << (d & 7)); /* and then skip on to the */
! c = (c & 0xc0) + 0x40 - 1; /* next relevant character. */
! }
! }
! }
!
! /* In non-UTF-8 mode, the two bit maps are completely compatible. */
!
! else {
! for (c = 0; c < 32; c++)
! start_bits[c] |= tcode[c];
! }
!
! /* Advance past the bit map, and act on what follows */
!
  tcode += 32;
  switch (*tcode) {
  case OP_CRSTAR:
***************
*** 1230,1236 ****
  NULL on error or if no optimization possible */
! pcre_extra * pcre_study(const pcre * external_re, int options, const char **errorptr)
  {
  uschar start_bits[32];
--- 1295,1301 ----
  NULL on error or if no optimization possible */
! EXPORT pcre_extra * pcre_study(const pcre * external_re, int options, const char **errorptr)
  {
  uschar start_bits[32];
***************
*** 1281,1288 ****
  the pcre_fullinfo() function so that if it becomes variable in the future, we don't have to change that code. */
! extra = (pcre_extra *) (malloc)
! (sizeof(pcre_extra) + sizeof(pcre_study_data));
  if (extra == NULL) {
  *errorptr = "failed to get memory";
--- 1346,1352 ----
  the pcre_fullinfo() function so that if it becomes variable in the future, we don't have to change that code. */
! extra = (pcre_extra *) malloc(sizeof(pcre_extra) + sizeof(pcre_study_data));
  if (extra == NULL) {
  *errorptr = "failed to get memory";
***************
*** 1337,1343 ****
  /* Table of sizes for the fixed-length opcodes. It's defined in a macro so that the definition is next to the definition of the opcodes in internal.h. */
! static uschar OP_lengths[] = { OP_LENGTHS };
  /* Min and max values for the common repeats; for the maxima, 0 => infinity */
--- 1401,1407 ----
  /* Table of sizes for the fixed-length opcodes. It's defined in a macro so that the definition is next to the definition of the opcodes in internal.h. */
! static const uschar OP_lengths[] = { OP_LENGTHS };
  /* Min and max values for the common repeats; for the maxima, 0 => infinity */
***************
*** 1358,1372 ****
  0, 0, -ESC_Z, '[', '\\', ']', '^', '_', /* X - _ */
  '`', 7, -ESC_b, 0, -ESC_d, ESC_e, ESC_f, 0, /* ` - g */
  0, 0, 0, 0, 0, 0, ESC_n, 0, /* h - o */
! 0, 0, ESC_r, -ESC_s, ESC_t, 0, 0, -ESC_w, /* p - w */
  0, 0, -ESC_z /* x - z */
  };
  /* Tables of names of POSIX character classes and their lengths. The list is terminated by a zero length entry. The first three must be alpha, upper, lower, as this is assumed for handling case independence. */
! static const char *posix_names[] = {
  "alpha", "lower", "upper", "alnum", "ascii", "blank", "cntrl", "digit", "graph", "print", "punct", "space", "word", "xdigit"
--- 1422,1438 ----
  0, 0, -ESC_Z, '[', '\\', ']', '^', '_', /* X - _ */
  '`', 7, -ESC_b, 0, -ESC_d, ESC_e, ESC_f, 0, /* ` - g */
  0, 0, 0, 0, 0, 0, ESC_n, 0, /* h - o */
! 0, 0, ESC_r, -ESC_s, ESC_tee, 0, 0, -ESC_w, /* p - w */
  0, 0, -ESC_z /* x - z */
  };
+
+
  /* Tables of names of POSIX character classes and their lengths. The list is terminated by a zero length entry. The first three must be alpha, upper, lower, as this is assumed for handling case independence. */
! static const char *const posix_names[] = {
  "alpha", "lower", "upper", "alnum", "ascii", "blank", "cntrl", "digit", "graph", "print", "punct", "space", "word", "xdigit"
***************
*** 1397,1402 ****
--- 1463,1520 ----
  cbit_xdigit, -1, -1 /* xdigit */
  };
+ /* Table to identify digits and hex digits. This is used when compiling
+ patterns. Note that the tables in chartables are dependent on the locale, and
+ may mark arbitrary characters as digits - but the PCRE compiling code expects
+ to handle only 0-9, a-z, and A-Z as digits when compiling. That is why we have
+ a private table here. It costs 256 bytes, but it is a lot faster than doing
+ character value tests (at least in some simple cases I timed), and in some
+ applications one wants PCRE to compile efficiently as well as match
+ efficiently.
+
+ For convenience, we use the same bit definitions as in chartables:
+
+ 0x04 decimal digit
+ 0x08 hexadecimal digit
+
+ Then we can use ctype_digit and ctype_xdigit in the code. */
+
+ static const unsigned char digitab[] = {
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0- 7 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 8- 15 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 16- 23 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 24- 31 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* - ' */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* ( - / */
+ 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, /* 0 - 7 */
+ 0x0c, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 8 - ? */
+ 0x00, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x00, /* @ - G */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* H - O */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* P - W */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* X - _ */
+ 0x00, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x00, /* ` - g */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* h - o */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* p - w */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* x -127 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 128-135 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 136-143 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 144-151 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 152-159 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 160-167 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 168-175 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 176-183 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 184-191 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 192-199 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 200-207 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 208-215 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 216-223 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 224-231 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 232-239 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 240-247 */
+ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+ }; /* 248-255 */
+
+
  /* Definition to allow mutual recursion */
***************
*** 1407,1417 ****
  /* Structure for building a chain of data that actually lives on the stack, for holding the values of the subject pointer at the start of each subpattern, so as to detect when an empty string has been matched by a
! subpattern - to break infinite loops. */
  typedef struct eptrblock {
! struct eptrblock *prev;
! const uschar *saved_eptr;
  } eptrblock;
  /* Flag bits for the match() function */
--- 1525,1536 ----
  /* Structure for building a chain of data that actually lives on the stack, for holding the values of the subject pointer at the start of each subpattern, so as to detect when an empty string has been matched by a
! subpattern - to break infinite loops. When NO_RECURSE is set, these blocks
! are on the heap, not on the stack. */
  typedef struct eptrblock {
! struct eptrblock *epb_prev;
! const uschar *epb_saved_eptr;
  } eptrblock;
  /* Flag bits for the match() function */
***************
*** 1432,1455 ****
  *************************************************/
  /* PCRE is thread-clean and doesn't use any global variables in the normal
! sense. However, it calls memory allocation and free functions via the two indirections below, and it can optionally do callouts. These values can be changed by the caller, but are shared between all threads. However, when compiling for Virtual Pascal, things are done differently (see pcre.in). */
  int (*pcre_callout) (pcre_callout_block *) = NULL;
  /*************************************************
  * Macros and tables for character handling *
  *************************************************/
  #define GETCHAR(c, eptr) c = *eptr;
  #define GETCHARINC(c, eptr) c = *eptr++;
  #define GETCHARINCTEST(c, eptr) c = *eptr++;
  #define GETCHARLEN(c, eptr, len) c = *eptr;
  #define BACKCHAR(eptr)
  /*************************************************
  * Handle escapes *
  *************************************************/
--- 1551,1593 ----
  *************************************************/
  /* PCRE is thread-clean and doesn't use any global variables in the normal
! sense. However, it calls memory allocation and free functions via the four indirections below, and it can optionally do callouts. These values can be changed by the caller, but are shared between all threads. However, when compiling for Virtual Pascal, things are done differently (see pcre.in). */
+ #ifndef VPCOMPAT
+ #ifdef __cplusplus
+ extern "C" void *(*pcre_malloc) (size_t) = malloc;
+ extern "C" void (*pcre_free) (void *) = free;
+ extern "C" void *(*pcre_stack_malloc) (size_t) = malloc;
+ extern "C" void (*pcre_stack_free) (void *) = free;
+ extern "C" int (*pcre_callout) (pcre_callout_block *) = NULL;
+ #else
+ void *(*pcre_malloc) (size_t) = malloc;
+ void (*pcre_free) (void *) = free;
+ void *(*pcre_stack_malloc) (size_t) = malloc;
+ void (*pcre_stack_free) (void *) = free;
  int (*pcre_callout) (pcre_callout_block *) = NULL;
+ #endif
+ #endif
  /*************************************************
  * Macros and tables for character handling *
  *************************************************/
+ /* When UTF-8 encoding is being used, a character is no longer just a single
+ byte. The macros for character handling generate simple sequences when used in
+ byte-mode, and more complicated ones for UTF-8 characters. */
+
  #define GETCHAR(c, eptr) c = *eptr;
  #define GETCHARINC(c, eptr) c = *eptr++;
  #define GETCHARINCTEST(c, eptr) c = *eptr++;
  #define GETCHARLEN(c, eptr, len) c = *eptr;
  #define BACKCHAR(eptr)
+
  /*************************************************
  * Handle escapes *
  *************************************************/
***************
*** 1466,1472 ****
  bracount number of previous extracting brackets
  options the options bits
  isclass TRUE if inside a character class
- cd pointer to char tables block
  Returns: zero or positive => a data character
  negative => a special escape sequence
--- 1604,1609 ----
***************
*** 1475,1481 ****
  static int check_escape(const uschar ** ptrptr, const char **errorptr, int bracount,
! int options, BOOL isclass, compile_data * cd)
  {
  const uschar *ptr = *ptrptr;
  int c, i;
--- 1612,1618 ----
  static int check_escape(const uschar ** ptrptr, const char **errorptr, int bracount,
! int options, BOOL isclass)
  {
  const uschar *ptr = *ptrptr;
  int c, i;
***************
*** 1486,1502 ****
  if (c == 0)
  *errorptr = ERR1;
! /* Digits or letters may have special meaning; all others are literals. */
  else if (c < '0' || c > 'z') {
! }
!
! /* Do an initial lookup in a table. A non-zero result is something that can be
! returned immediately. Otherwise further processing may be required. */
!
  else if ((i = escapes[c - '0']) != 0)
  c = i;
  /* Escapes that need further processing, or are illegal. */
  else {
--- 1623,1638 ----
  if (c == 0)
  *errorptr = ERR1;
! /* Non-alphamerics are literals. For digits or letters, do an initial lookup in
! a table. A non-zero result is something that can be returned immediately.
! Otherwise further processing may be required. */
  else if (c < '0' || c > 'z') {
! } /* Not alphameric */
  else if ((i = escapes[c - '0']) != 0)
  c = i;
+
  /* Escapes that need further processing, or are illegal. */
  else {
***************
*** 1541,1547 ****
  if (!isclass) {
  oldptr = ptr;
  c -= '0';
! while ((cd->ctypes[ptr[1]] & ctype_digit) != 0)
  c = c * 10 + *(++ptr) - '0';
  if (c < 10 || c <= bracount) {
  c = -(ESC_REF + c);
--- 1677,1683 ----
  if (!isclass) {
  oldptr = ptr;
  c -= '0';
! while ((digitab[ptr[1]] & ctype_digit) != 0)
  c = c * 10 + *(++ptr) - '0';
  if (c < 10 || c <= bracount) {
  c = -(ESC_REF + c);
***************
*** 1565,1572 ****
  case '0':
  c -= '0';
! while (i++ < 2 && (cd->ctypes[ptr[1]] & ctype_digit) != 0 &&
! ptr[1] != '8' && ptr[1] != '9')
  c = c * 8 + *(++ptr) - '0';
  c &= 255; /* Take least significant 8 bits */
  break;
--- 1701,1707 ----
  case '0':
  c -= '0';
! while (i++ < 2 && ptr[1] >= '0' && ptr[1] <= '7')
  c = c * 8 + *(++ptr) - '0';
  c &= 255; /* Take least significant 8 bits */
  break;
***************
*** 1579,1588 ****
  /* Read just a single hex char */
  c = 0;
! while (i++ < 2 && (cd->ctypes[ptr[1]] & ctype_xdigit) != 0) {
! ptr++;
! c = c * 16 + cd->lcc[*ptr] -
! (((cd->ctypes[*ptr] & ctype_digit) != 0) ? '0' : 'W');
  }
  break;
--- 1714,1725 ----
  /* Read just a single hex char */
  c = 0;
! while (i++ < 2 && (digitab[ptr[1]] & ctype_xdigit) != 0) {
! int cc; /* Some compilers don't like ++ */
! cc = *(++ptr); /* in initializers */
! if (cc >= 'a')
! cc -= 32; /* Convert to upper case */
! c = c * 16 + cc - ((cc < 'A') ? '0' : ('A' - 10));
  }
  break;
***************
*** 1595,1604 ****
  return 0;
  }
! /* A letter is upper-cased; then the 0x40 bit is flipped */
  if (c >= 'a' && c <= 'z')
! c = cd->fcc[c];
  c ^= 0x40;
  break;
--- 1732,1743 ----
  return 0;
  }
! /* A letter is upper-cased; then the 0x40 bit is flipped. This coding
! is ASCII-specific, but then the whole concept of \cx is ASCII-specific.
! (However, an EBCDIC equivalent has now been added.) */
  if (c >= 'a' && c <= 'z')
! c -= 32;
  c ^= 0x40;
  break;
***************
*** 1636,1652 ****
  Arguments:
  p pointer to the first char after '{'
- cd pointer to char tables block
  Returns: TRUE or FALSE
  */
  static BOOL
! is_counted_repeat(const uschar * p, compile_data * cd)
  {
! if ((cd->ctypes[*p++] & ctype_digit) == 0)
  return FALSE;
! while ((cd->ctypes[*p] & ctype_digit) != 0)
  p++;
  if (*p == '}')
  return TRUE;
--- 1775,1790 ----
  Arguments:
  p pointer to the first char after '{'
  Returns: TRUE or FALSE
  */
  static BOOL
! is_counted_repeat(const uschar * p)
  {
! if ((digitab[*p++] & ctype_digit) == 0)
  return FALSE;
! while ((digitab[*p] & ctype_digit) != 0)
  p++;
  if (*p == '}')
  return TRUE;
***************
*** 1656,1665 ****
  if (*p == '}')
  return TRUE;
! if ((cd->ctypes[*p++] & ctype_digit) == 0)
  return FALSE;
! while ((cd->ctypes[*p] & ctype_digit) != 0)
  p++;
  return (*p == '}');
  }
--- 1794,1804 ----
  if (*p == '}')
  return TRUE;
! if ((digitab[*p++] & ctype_digit) == 0)
  return FALSE;
! while ((digitab[*p] & ctype_digit) != 0)
  p++;
+
  return (*p == '}');
  }
***************
*** 1679,1685 ****
  maxp pointer to int for max
  returned as -1 if no max
  errorptr points to pointer to error message
- cd pointer to character tables clock
  Returns: pointer to '}' on success;
  current ptr on error, with errorptr set
--- 1818,1823 ----
***************
*** 1687,1698 ****
  static const uschar *
  read_repeat_counts(const uschar * p, int *minp, int *maxp,
! const char **errorptr, compile_data * cd)
  {
  int min = 0;
  int max = -1;
! while ((cd->ctypes[*p] & ctype_digit) != 0)
  min = min * 10 + *p++ - '0';
  if (*p == '}')
--- 1825,1836 ----
  static const uschar *
  read_repeat_counts(const uschar * p, int *minp, int *maxp,
! const char **errorptr)
  {
  int min = 0;
  int max = -1;
! while ((digitab[*p] & ctype_digit) != 0)
  min = min * 10 + *p++ - '0';
  if (*p == '}')
***************
*** 1700,1706 ****
  else {
  if (*(++p) != '}') {
  max = 0;
! while ((cd->ctypes[*p] & ctype_digit) != 0)
  max = max * 10 + *p++ - '0';
  if (max < min) {
  *errorptr = ERR4;
--- 1838,1844 ----
  else {
  if (*(++p) != '}') {
  max = 0;
! while ((digitab[*p] & ctype_digit) != 0)
  max = max * 10 + *p++ - '0';
  if (max < min) {
  *errorptr = ERR4;
***************
*** 1968,1975 ****
  */
  static const uschar *
! find_bracket(const uschar * code, int number)
  {
  for (;;) {
  register int c = *code;
--- 2106,2114 ----
  */
  static const uschar *
! find_bracket(const uschar * code, BOOL utf8, int number)
  {
+ utf8 = utf8; /* Stop pedantic compilers complaining */
  for (;;) {
  register int c = *code;
***************
*** 1987,1996 ****
  } else {
  code += OP_lengths[c];
! /* In UTF-8 mode, opcodes that are followed by a character may be followed
! by a multi-byte character. The length in the table is a minimum, so we have
! to scan along to skip the extra characters. All opcodes are less than 128,
! so we can use relatively efficient code. */
  }
  }
--- 2126,2168 ----
  } else {
  code += OP_lengths[c];
! }
! }
! }
!
!
!
! /*************************************************
! * Scan compiled regex for recursion reference *
! *************************************************/
!
! /* This little function scans through a compiled pattern until it finds an
! instance of OP_RECURSE.
!
! Arguments:
! code points to start of expression
! utf8 TRUE in UTF-8 mode
!
! Returns: pointer to the opcode for OP_RECURSE, or NULL if not found
! */
!
! static const uschar *
! find_recurse(const uschar * code, BOOL utf8)
! {
! utf8 = utf8; /* Stop pedantic compilers complaining */
!
! for (;;) {
! register int c = *code;
! if (c == OP_END)
! return NULL;
! else if (c == OP_RECURSE)
! return code;
! else if (c == OP_CHARS)
! code += code[1] + OP_lengths[c];
! else if (c > OP_BRA) {
! code += OP_lengths[OP_BRA];
! } else {
!
code += OP_lengths[c]; } } *************** *** 2051,2062 **** switch (c) { /* Check for quantifiers after a class */ - case OP_CLASS: case OP_NCLASS: ccode = code + 33; - switch (*ccode) { case OP_CRSTAR: /* These could be empty; continue */ case OP_CRMINSTAR: --- 2223,2232 ---- *************** *** 2108,2113 **** --- 2278,2286 ---- case OP_ALT: return TRUE; + /* In UTF-8 mode, STAR, MINSTAR, QUERY, MINQUERY, UPTO, and MINUPTO may be + followed by a multibyte character */ + } } *************** *** 2213,2218 **** --- 2386,2426 ---- } + /************************************************* + * Adjust OP_RECURSE items in repeated group * + *************************************************/ + + /* OP_RECURSE items contain an offset from the start of the regex to the group + that is referenced. This means that groups can be replicated for fixed + repetition simply by copying (because the recursion is allowed to refer to + earlier groups that are outside the current group). However, when a group is + optional (i.e. the minimum quantifier is zero), OP_BRAZERO is inserted before + it, after it has been compiled. This means that any OP_RECURSE items within it + that refer to the group itself or any contained groups have to have their + offsets adjusted. That is the job of this function. Before it is called, the + partially compiled regex must be temporarily terminated with OP_END. + + Arguments: + group points to the start of the group + adjust the amount by which the group is to be moved + utf8 TRUE in UTF-8 mode + cd contains pointers to tables etc. 
+ + Returns: nothing + */ + + static void + adjust_recurse(uschar * group, int adjust, BOOL utf8, compile_data * cd) + { + uschar *ptr = group; + while ((ptr = (uschar *) find_recurse(ptr, utf8)) != NULL) { + int offset = GET(ptr, 1); + if (cd->start_code + offset >= group) + PUT(ptr, 1, offset + adjust); + ptr += 1 + LINK_SIZE; + } + } + /************************************************* *************** *** 2470,2488 **** posix_class *= 3; for (i = 0; i < 3; i++) { ! BOOL isblank = strncmp((char *) ptr, "blank", 5) == 0; int taboffset = posix_class_maps[posix_class + i]; if (taboffset < 0) break; if (local_negate) { for (c = 0; c < 32; c++) class[c] |= ~cbits[c + taboffset]; ! if (isblank) class[1] |= 0x3c; } else { for (c = 0; c < 32; c++) class[c] |= cbits[c + taboffset]; ! if (isblank) class[1] &= ~0x3c; } } --- 2678,2696 ---- posix_class *= 3; for (i = 0; i < 3; i++) { ! BOOL blankclass = strncmp((char *) ptr, "blank", 5) == 0; int taboffset = posix_class_maps[posix_class + i]; if (taboffset < 0) break; if (local_negate) { for (c = 0; c < 32; c++) class[c] |= ~cbits[c + taboffset]; ! if (blankclass) class[1] |= 0x3c; } else { for (c = 0; c < 32; c++) class[c] |= cbits[c + taboffset]; ! if (blankclass) class[1] &= ~0x3c; } } *************** *** 2501,2507 **** character in them, so set class_charcount bigger than one. */ if (c == '\\') { ! c = check_escape(&ptr, errorptr, *brackets, options, TRUE, cd); if (-c == ESC_b) c = '\b'; /* \b is backslash in a class */ --- 2709,2715 ---- character in them, so set class_charcount bigger than one. */ if (c == '\\') { ! c = check_escape(&ptr, errorptr, *brackets, options, TRUE); if (-c == ESC_b) c = '\b'; /* \b is backslash in a class */ *************** *** 2584,2590 **** if (d == '\\') { const uschar *oldptr = ptr; ! d = check_escape(&ptr, errorptr, *brackets, options, TRUE, cd); /* \b is backslash; any other special means the '-' was literal */ --- 2792,2798 ---- if (d == '\\') { const uschar *oldptr = ptr; ! 
d = check_escape(&ptr, errorptr, *brackets, options, TRUE); /* \b is backslash; any other special means the '-' was literal */ *************** *** 2632,2637 **** --- 2840,2847 ---- LONE_SINGLE_CHARACTER: + /* Handle a multibyte character */ + /* Handle a single-byte character */ { class[c / 8] |= (1 << (c & 7)); *************** *** 2716,2724 **** /* Various kinds of repeat */ case '{': ! if (!is_counted_repeat(ptr + 1, cd)) goto NORMAL_CHAR; ! ptr = read_repeat_counts(ptr + 1, &repeat_min, &repeat_max, errorptr, cd); if (*errorptr != NULL) goto FAILED; goto REPEAT; --- 2926,2934 ---- /* Various kinds of repeat */ case '{': ! if (!is_counted_repeat(ptr + 1)) goto NORMAL_CHAR; ! ptr = read_repeat_counts(ptr + 1, &repeat_min, &repeat_max, errorptr); if (*errorptr != NULL) goto FAILED; goto REPEAT; *************** *** 2997,3005 **** } /* If the maximum is 1 or unlimited, we just have to stick in the ! BRAZERO and do no more at this point. */ if (repeat_max <= 1) { memmove(previous + 1, previous, len); code++; *previous++ = OP_BRAZERO + repeat_type; --- 3207,3220 ---- } /* If the maximum is 1 or unlimited, we just have to stick in the ! BRAZERO and do no more at this point. However, we do need to adjust ! any OP_RECURSE calls inside the group that refer to the group itself or ! any internal group, because the offset is from the start of the whole ! regex. Temporarily terminate the pattern while doing this. */ if (repeat_max <= 1) { + *code = OP_END; + adjust_recurse(previous, 1, utf8, cd); memmove(previous + 1, previous, len); code++; *previous++ = OP_BRAZERO + repeat_type; *************** *** 3009,3019 **** in a nested fashion, sticking OP_BRAZERO before each set of brackets. The first one has to be handled carefully because it's the original copy, which has to be moved up. The remainder can be handled by code ! that is common with the non-zero minimum case below. We just have to ! adjust the value or repeat_max, since one less copy is required. 
 */ else { int offset; memmove(previous + 2 + LINK_SIZE, previous, len); code += 2 + LINK_SIZE; *previous++ = OP_BRAZERO + repeat_type; --- 3224,3237 ---- in a nested fashion, sticking OP_BRAZERO before each set of brackets. The first one has to be handled carefully because it's the original copy, which has to be moved up. The remainder can be handled by code ! that is common with the non-zero minimum case below. We have to ! adjust the value of repeat_max, since one less copy is required. Once ! again, we may have to adjust any OP_RECURSE calls inside the group. */ else { int offset; + *code = OP_END; + adjust_recurse(previous, 2 + LINK_SIZE, utf8, cd); memmove(previous + 2 + LINK_SIZE, previous, len); code += 2 + LINK_SIZE; *previous++ = OP_BRAZERO + repeat_type; *************** *** 3170,3178 **** ptr += 3; } ! /* Condition to test for a numbered subpattern match */ ! else if ((cd->ctypes[ptr[1]] & ctype_digit) != 0) { int condref; /* Don't amalgamate; some compilers */ condref = *(++ptr) - '0'; /* grumble at autoincrement in declaration */ while (*(++ptr) != ')') --- 3388,3398 ---- ptr += 3; } ! /* Condition to test for a numbered subpattern match. We know that ! if a digit follows ( then there will just be digits until ) because ! the syntax was checked in the first pass. */ ! else if ((digitab[ptr[1]] & ctype_digit) != 0) { int condref; /* Don't amalgamate; some compilers */ condref = *(++ptr) - '0'; /* grumble at autoincrement in declaration */ while (*(++ptr) != ')') *************** *** 3223,3229 **** *code++ = OP_CALLOUT; { int n = 0; ! while ((cd->ctypes[*(++ptr)] & ctype_digit) != 0) n = n * 10 + *ptr - '0'; if (n > 255) { *errorptr = ERR38; --- 3443,3449 ---- *code++ = OP_CALLOUT; { int n = 0; ! while ((digitab[*(++ptr)] & ctype_digit) != 0) n = n * 10 + *ptr - '0'; if (n > 255) { *errorptr = ERR38; *************** *** 3326,3333 **** { const uschar *called; recno = 0; ! ! 
while ((cd->ctypes[*ptr] & ctype_digit) != 0) recno = recno * 10 + *ptr++ - '0'; /* Come here from code above that handles a named recursion */ --- 3546,3552 ---- { const uschar *called; recno = 0; ! while ((digitab[*ptr] & ctype_digit) != 0) recno = recno * 10 + *ptr++ - '0'; /* Come here from code above that handles a named recursion */ *************** *** 3341,3347 **** *code = OP_END; called = (recno == 0) ? ! cd->start_code : find_bracket(cd->start_code, recno); if (called == NULL) { *errorptr = ERR15; --- 3560,3566 ---- *code = OP_END; called = (recno == 0) ? ! cd->start_code : find_bracket(cd->start_code, utf8, recno); if (called == NULL) { *errorptr = ERR15; *************** *** 3588,3594 **** case '\\': tempptr = ptr; ! c = check_escape(&ptr, errorptr, *brackets, options, FALSE, cd); /* Handle metacharacters introduced by \. For ones like \d, the ESC_ values are arranged to be the negation of the corresponding OP_values. For the --- 3807,3813 ---- case '\\': tempptr = ptr; ! c = check_escape(&ptr, errorptr, *brackets, options, FALSE); /* Handle metacharacters introduced by \. For ones like \d, the ESC_ values are arranged to be the negation of the corresponding OP_values. For the *************** *** 3682,3695 **** if (c == '\\') { tempptr = ptr; ! c = check_escape(&ptr, errorptr, *brackets, options, FALSE, cd); if (c < 0) { ptr = tempptr; break; } /* If a character is > 127 in UTF-8 mode, we have to turn it into ! two or more characters in the UTF-8 encoding. */ } --- 3901,3914 ---- if (c == '\\') { tempptr = ptr; ! c = check_escape(&ptr, errorptr, *brackets, options, FALSE); if (c < 0) { ptr = tempptr; break; } /* If a character is > 127 in UTF-8 mode, we have to turn it into ! two or more bytes in the UTF-8 encoding. 
*/ } *************** *** 4186,4191 **** --- 4405,4413 ---- + + + /************************************************* * Compile a Regular Expression * *************************************************/ *************** *** 4204,4210 **** with errorptr and erroroffset set */ ! pcre * pcre_compile(const char *pattern, int options, const char **errorptr, int *erroroffset, const unsigned char *tables) { --- 4426,4432 ---- with errorptr and erroroffset set */ ! EXPORT pcre * pcre_compile(const char *pattern, int options, const char **errorptr, int *erroroffset, const unsigned char *tables) { *************** *** 4322,4330 **** case '\\': { const uschar *save_ptr = ptr; ! c = ! check_escape(&ptr, errorptr, bracount, options, FALSE, ! &compile_block); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; if (c >= 0) { --- 4544,4550 ---- case '\\': { const uschar *save_ptr = ptr; ! c = check_escape(&ptr, errorptr, bracount, options, FALSE); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; if (c >= 0) { *************** *** 4355,4363 **** if (refnum > compile_block.top_backref) compile_block.top_backref = refnum; length += 2; /* For single back reference */ ! if (ptr[1] == '{' && is_counted_repeat(ptr + 2, &compile_block)) { ! ptr = ! read_repeat_counts(ptr + 2, &min, &max, errorptr, &compile_block); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; if ((min == 0 && (max == 1 || max == -1)) || (min == 1 && max == -1)) --- 4575,4582 ---- if (refnum > compile_block.top_backref) compile_block.top_backref = refnum; length += 2; /* For single back reference */ ! if (ptr[1] == '{' && is_counted_repeat(ptr + 2)) { ! ptr = read_repeat_counts(ptr + 2, &min, &max, errorptr); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; if ((min == 0 && (max == 1 || max == -1)) || (min == 1 && max == -1)) *************** *** 4386,4394 **** class, or back reference. */ case '{': ! if (!is_counted_repeat(ptr + 1, &compile_block)) goto NORMAL_CHAR; ! 
ptr = read_repeat_counts(ptr + 1, &min, &max, errorptr, &compile_block); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; --- 4605,4613 ---- class, or back reference. */ case '{': ! if (!is_counted_repeat(ptr + 1)) goto NORMAL_CHAR; ! ptr = read_repeat_counts(ptr + 1, &min, &max, errorptr); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; *************** *** 4443,4448 **** --- 4662,4668 ---- case '[': class_optcount = 0; + if (*(++ptr) == '^') ptr++; *************** *** 4463,4470 **** /* Outside \Q...\E, check for escapes */ if (*ptr == '\\') { ! int ch = check_escape(&ptr, errorptr, bracount, options, TRUE, ! &compile_block); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; --- 4683,4689 ---- /* Outside \Q...\E, check for escapes */ if (*ptr == '\\') { ! int ch = check_escape(&ptr, errorptr, bracount, options, TRUE); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; *************** *** 4521,4539 **** else { length += 33; ! /* A repeat needs either 1 or 5 bytes. */ ! if (*ptr != 0 && ptr[1] == '{' ! && is_counted_repeat(ptr + 2, &compile_block)) { ! ptr = ! read_repeat_counts(ptr + 2, &min, &max, errorptr, &compile_block); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; if ((min == 0 && (max == 1 || max == -1)) || (min == 1 && max == -1)) length++; else length += 5; ! if (ptr[1] == '?') ptr++; } } --- 4740,4760 ---- else { length += 33; ! /* A repeat needs either 1 or 5 bytes. If it is a possessive quantifier, ! we also need extra for wrapping the whole thing in a sub-pattern. */ ! if (*ptr != 0 && ptr[1] == '{' && is_counted_repeat(ptr + 2)) { ! ptr = read_repeat_counts(ptr + 2, &min, &max, errorptr); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; if ((min == 0 && (max == 1 || max == -1)) || (min == 1 && max == -1)) length++; else length += 5; ! if (ptr[1] == '+') { ! ptr++; ! length += 2 + 2 * LINK_SIZE; ! } else if (ptr[1] == '?') ptr++; } } *************** *** 4598,4604 **** case '9': ptr += 2; if (c != 'R') ! 
while ((compile_block.ctypes[*(++ptr)] & ctype_digit) != 0) ; if (*ptr != ')') { *errorptr = ERR29; goto PCRE_ERROR_RETURN; --- 4819,4825 ---- case '9': ptr += 2; if (c != 'R') ! while ((digitab[*(++ptr)] & ctype_digit) != 0) ; if (*ptr != ')') { *errorptr = ERR29; goto PCRE_ERROR_RETURN; *************** *** 4622,4628 **** case 'C': ptr += 2; ! while ((compile_block.ctypes[*(++ptr)] & ctype_digit) != 0) ; if (*ptr != ')') { *errorptr = ERR39; goto PCRE_ERROR_RETURN; --- 4843,4849 ---- case 'C': ptr += 2; ! while ((digitab[*(++ptr)] & ctype_digit) != 0) ; if (*ptr != ')') { *errorptr = ERR39; goto PCRE_ERROR_RETURN; *************** *** 4683,4692 **** if (ptr[3] == 'R' && ptr[4] == ')') { ptr += 4; length += 3; ! } else if ((compile_block.ctypes[ptr[3]] & ctype_digit) != 0) { ptr += 4; length += 3; ! while ((compile_block.ctypes[*ptr] & ctype_digit) != 0) ptr++; if (*ptr != ')') { *errorptr = ERR26; --- 4904,4913 ---- if (ptr[3] == 'R' && ptr[4] == ')') { ptr += 4; length += 3; ! } else if ((digitab[ptr[3]] & ctype_digit) != 0) { ptr += 4; length += 3; ! while ((digitab[*ptr] & ctype_digit) != 0) ptr++; if (*ptr != ')') { *errorptr = ERR26; *************** *** 4869,4876 **** /* Leave ptr at the final char; for read_repeat_counts this happens automatically; for the others we need an increment. */ ! if ((c = ptr[1]) == '{' && is_counted_repeat(ptr + 2, &compile_block)) { ! ptr = read_repeat_counts(ptr + 2, &min, &max, errorptr, &compile_block); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; } else if (c == '*') { --- 5090,5097 ---- /* Leave ptr at the final char; for read_repeat_counts this happens automatically; for the others we need an increment. */ ! if ((c = ptr[1]) == '{' && is_counted_repeat(ptr + 2)) { ! ptr = read_repeat_counts(ptr + 2, &min, &max, errorptr); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; } else if (c == '*') { *************** *** 4961,4968 **** if (c == '\\') { const uschar *saveptr = ptr; ! 
c = check_escape(&ptr, errorptr, bracount, options, FALSE, ! &compile_block); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; if (c < 0) { --- 5182,5188 ---- if (c == '\\') { const uschar *saveptr = ptr; ! c = check_escape(&ptr, errorptr, bracount, options, FALSE); if (*errorptr != NULL) goto PCRE_ERROR_RETURN; if (c < 0) { *************** *** 5013,5019 **** externally provided function. */ size = length + sizeof(real_pcre) + name_count * (max_name_size + 3); ! re = (real_pcre *) (malloc) (size); if (re == NULL) { *errorptr = ERR21; --- 5233,5239 ---- externally provided function. */ size = length + sizeof(real_pcre) + name_count * (max_name_size + 3); ! re = (real_pcre *) malloc(size); if (re == NULL) { *errorptr = ERR21; *************** *** 5075,5081 **** /* Failed to compile, or error while post-processing */ if (*errorptr != NULL) { ! (free) (re); PCRE_ERROR_RETURN: *erroroffset = ptr - (const uschar *) pattern; return NULL; --- 5295,5301 ---- /* Failed to compile, or error while post-processing */ if (*errorptr != NULL) { ! free(re); PCRE_ERROR_RETURN: *erroroffset = ptr - (const uschar *) pattern; return NULL; *************** *** 5121,5126 **** --- 5341,5349 ---- re->options |= PCRE_REQCHSET; } + /* Print out the compiled data for debugging */ + + return (pcre *) re; } *************** *** 5149,5154 **** --- 5372,5378 ---- { const uschar *p = md->start_subject + md->offset_vector[offset]; + /* Always fail if not enough characters left */ if (length > md->end_subject - eptr) *************** *** 5170,5175 **** --- 5394,5543 ---- } + + + /*************************************************************************** + **************************************************************************** + RECURSION IN THE match() FUNCTION + + The match() function is highly recursive. Some regular expressions can cause + it to recurse thousands of times. I was writing for Unix, so I just let it + call itself recursively. 
This uses the stack for saving everything that has + to be saved for a recursive call. On Unix, the stack can be large, and this + works fine. + + It turns out that on non-Unix systems there are problems with programs that + use a lot of stack. (This despite the fact that every last chip has oodles + of memory these days, and techniques for extending the stack have been known + for decades.) So.... + + There is a fudge, triggered by defining NO_RECURSE, which avoids recursive + calls by keeping local variables that need to be preserved in blocks of memory + obtained from malloc instead of on the stack. Macros are used to + achieve this so that the actual code doesn't look very different to what it + always used to. + **************************************************************************** + ***************************************************************************/ + + + /* These versions of the macros use the stack, as normal */ + + #ifndef NO_RECURSE + #define REGISTER register + #define RMATCH(rx,ra,rb,rc,rd,re,rf,rg) rx = match(ra,rb,rc,rd,re,rf,rg) + #define RRETURN(ra) return ra + #else + + + /* These versions of the macros manage a private stack on the heap. Note + that the rd argument of RMATCH isn't actually used. It's the md argument of + match(), which never actually changes.
*/ + + #define REGISTER + + #define RMATCH(rx,ra,rb,rc,rd,re,rf,rg)\ + {\ + heapframe *newframe = (pcre_stack_malloc)(sizeof(heapframe));\ + if (setjmp(frame->Xwhere) == 0)\ + {\ + newframe->Xeptr = ra;\ + newframe->Xecode = rb;\ + newframe->Xoffset_top = rc;\ + newframe->Xims = re;\ + newframe->Xeptrb = rf;\ + newframe->Xflags = rg;\ + newframe->Xprevframe = frame;\ + frame = newframe;\ + DPRINTF(("restarting from line %d\n", __LINE__));\ + goto HEAP_RECURSE;\ + }\ + else\ + {\ + DPRINTF(("longjumped back to line %d\n", __LINE__));\ + frame = md->thisframe;\ + rx = frame->Xresult;\ + }\ + } + + #define RRETURN(ra)\ + {\ + heapframe *newframe = frame;\ + frame = newframe->Xprevframe;\ + (pcre_stack_free)(newframe);\ + if (frame != NULL)\ + {\ + frame->Xresult = ra;\ + md->thisframe = frame;\ + longjmp(frame->Xwhere, 1);\ + }\ + return ra;\ + } + + + /* Structure for remembering the local variables in a private frame */ + + typedef struct heapframe { + struct heapframe *Xprevframe; + + /* Function arguments that may change */ + + const uschar *Xeptr; + const uschar *Xecode; + int Xoffset_top; + long int Xims; + eptrblock *Xeptrb; + int Xflags; + + /* Function local variables */ + + const uschar *Xcallpat; + const uschar *Xcharptr; + const uschar *Xdata; + const uschar *Xlastptr; + const uschar *Xnext; + const uschar *Xpp; + const uschar *Xprev; + const uschar *Xsaved_eptr; + + recursion_info Xnew_recursive; + + BOOL Xcur_is_word; + BOOL Xcondition; + BOOL Xminimize; + BOOL Xprev_is_word; + + unsigned long int Xoriginal_ims; + + int Xctype; + int Xfc; + int Xfi; + int Xlength; + int Xmax; + int Xmin; + int Xnumber; + int Xoffset; + int Xop; + int Xsave_capture_last; + int Xsave_offset1, Xsave_offset2, Xsave_offset3; + int Xstacksave[REC_STACK_SAVE_MAX]; + + eptrblock Xnewptrb; + + /* Place to pass back result, and where to jump back to */ + + int Xresult; + jmp_buf Xwhere; + + } heapframe; + + #endif + + + 
/*************************************************************************** + ***************************************************************************/ + + + /************************************************* * Match from current position * *************************************************/ *************** *** 5205,5240 **** */ static int ! match(register const uschar * eptr, register const uschar * ecode, int offset_top, match_data * md, unsigned long int ims, eptrblock * eptrb, int flags) { ! unsigned long int original_ims = ims; /* Save for resetting on ')' */ ! register int rrc; eptrblock newptrb; if (md->match_call_count++ >= md->match_limit) ! return PCRE_ERROR_MATCHLIMIT; /* At the start of a bracketed group, add the current subject pointer to the stack of such pointers, to be re-instated at the end of the group when we hit the closing ket. When match() is called in other circumstances, we don't add to ! the stack. */ if ((flags & match_isgroup) != 0) { ! newptrb.prev = eptrb; ! newptrb.saved_eptr = eptr; eptrb = &newptrb; } /* Now start processing the operations. */ for (;;) { ! int op = (int) *ecode; ! int min, max, ctype; ! register int i; ! register int c; ! BOOL minimize = FALSE; /* Opening capturing bracket. If there is space in the offset vector, save the current subject position in the working slot at the top of the vector. We --- 5573,5727 ---- */ static int ! match(REGISTER const uschar * eptr, REGISTER const uschar * ecode, int offset_top, match_data * md, unsigned long int ims, eptrblock * eptrb, int flags) { ! /* These variables do not need to be preserved over recursion in this function, ! so they can be ordinary variables in all cases. Mark them with "register" ! because they are used a lot in loops. */ ! ! register int rrc; /* Returns from recursive calls */ ! register int i; /* Used for loops not involving calls to RMATCH() */ ! register int c; /* Character values not kept over RMATCH() calls */ ! ! 
/* When recursion is not being used, all "local" variables that have to be ! preserved over calls to RMATCH() are part of a "frame" which is obtained from ! heap storage. Set up the top-level frame here; others are obtained from the ! heap whenever RMATCH() does a "recursion". See the macro definitions above. */ ! ! #ifdef NO_RECURSE ! heapframe *frame = (pcre_stack_malloc) (sizeof(heapframe)); ! frame->Xprevframe = NULL; /* Marks the top level */ ! ! /* Copy in the original argument variables */ ! ! frame->Xeptr = eptr; ! frame->Xecode = ecode; ! frame->Xoffset_top = offset_top; ! frame->Xims = ims; ! frame->Xeptrb = eptrb; ! frame->Xflags = flags; ! ! /* This is where control jumps back to to effect "recursion" */ ! ! HEAP_RECURSE: ! ! /* Macros make the argument variables come from the current frame */ ! ! #define eptr frame->Xeptr ! #define ecode frame->Xecode ! #define offset_top frame->Xoffset_top ! #define ims frame->Xims ! #define eptrb frame->Xeptrb ! #define flags frame->Xflags ! ! /* Ditto for the local variables */ ! ! #define callpat frame->Xcallpat ! #define charptr frame->Xcharptr ! #define data frame->Xdata ! #define lastptr frame->Xlastptr ! #define next frame->Xnext ! #define pp frame->Xpp ! #define prev frame->Xprev ! #define saved_eptr frame->Xsaved_eptr ! ! #define new_recursive frame->Xnew_recursive ! ! #define cur_is_word frame->Xcur_is_word ! #define condition frame->Xcondition ! #define minimize frame->Xminimize ! #define prev_is_word frame->Xprev_is_word ! ! #define original_ims frame->Xoriginal_ims ! ! #define ctype frame->Xctype ! #define fc frame->Xfc ! #define fi frame->Xfi ! #define length frame->Xlength ! #define max frame->Xmax ! #define min frame->Xmin ! #define number frame->Xnumber ! #define offset frame->Xoffset ! #define op frame->Xop ! #define save_capture_last frame->Xsave_capture_last ! #define save_offset1 frame->Xsave_offset1 ! #define save_offset2 frame->Xsave_offset2 ! #define save_offset3 frame->Xsave_offset3 ! 
#define stacksave frame->Xstacksave ! ! #define newptrb frame->Xnewptrb ! ! /* When recursion is being used, local variables are allocated on the stack and ! get preserved during recursion in the normal way. In this environment, fi and ! i, and fc and c, can be the same variables. */ ! ! #else ! #define fi i ! #define fc c ! ! const uschar *callpat; /* Many of these variables are used only in */ ! const uschar *charptr; /* small blocks of the code. My normal */ ! const uschar *data; /* style of coding would have declared */ ! const uschar *lastptr; /* them within each of those blocks. */ ! const uschar *next; /* However, in order to accommodate the */ ! const uschar *pp; /* version of this code that uses an */ ! const uschar *prev; /* external "stack" implemented on the */ ! const uschar *saved_eptr; /* heap, it is easier to declare them */ ! /* all here, so the declarations can */ ! recursion_info new_recursive; /* be cut out in a block. The only */ ! /* declarations within blocks below are */ ! BOOL cur_is_word; /* for variables that do not have to */ ! BOOL condition; /* be preserved over a recursive call */ ! BOOL minimize; /* to RMATCH(). */ ! BOOL prev_is_word; ! ! unsigned long int original_ims; ! ! int ctype; ! int length; ! int max; ! int min; ! int number; ! int offset; ! int op; ! int save_capture_last; ! int save_offset1, save_offset2, save_offset3; ! int stacksave[REC_STACK_SAVE_MAX]; ! eptrblock newptrb; + #endif + + + /* OK, now we can get on with the real code of the function. Recursion is + specified by the macros RMATCH and RRETURN. When NO_RECURSE is *not* defined, + these just turn into a recursive call to match() and a "return", respectively. + However, RMATCH isn't like a function call because it's quite a complicated + macro. It has to be used in one particular way. This shouldn't, however, impact + performance when true recursion is being used. */ if (md->match_call_count++ >= md->match_limit) ! RRETURN(PCRE_ERROR_MATCHLIMIT); ! ! 
original_ims = ims; /* Save for resetting on ')' */ /* At the start of a bracketed group, add the current subject pointer to the stack of such pointers, to be re-instated at the end of the group when we hit the closing ket. When match() is called in other circumstances, we don't add to ! this stack. */ if ((flags & match_isgroup) != 0) { ! newptrb.epb_prev = eptrb; ! newptrb.epb_saved_eptr = eptr; eptrb = &newptrb; } /* Now start processing the operations. */ for (;;) { ! op = *ecode; ! minimize = FALSE; /* Opening capturing bracket. If there is space in the offset vector, save the current subject position in the working slot at the top of the vector. We *************** *** 5251,5258 **** here; that is handled in the code for KET. */ if (op > OP_BRA) { ! int offset; ! int number = op - OP_BRA; /* For extended extraction brackets (large number), we have to fish out the number from a dummy opcode at the start. */ --- 5738,5744 ---- here; that is handled in the code for KET. */ if (op > OP_BRA) { ! number = op - OP_BRA; /* For extended extraction brackets (large number), we have to fish out the number from a dummy opcode at the start. */ *************** *** 5262,5280 **** offset = number << 1; if (offset < md->offset_max) { ! int save_offset1 = md->offset_vector[offset]; ! int save_offset2 = md->offset_vector[offset + 1]; ! int save_offset3 = md->offset_vector[md->offset_end - number]; ! int save_capture_last = md->capture_last; DPRINTF(("saving %d %d %d\n", save_offset1, save_offset2, save_offset3)); md->offset_vector[md->offset_end - number] = eptr - md->start_subject; do { ! if ((rrc = match(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, ! eptrb, match_isgroup)) != MATCH_NOMATCH) ! return rrc; md->capture_last = save_capture_last; ecode += GET(ecode, 1); } --- 5748,5767 ---- offset = number << 1; if (offset < md->offset_max) { ! save_offset1 = md->offset_vector[offset]; ! save_offset2 = md->offset_vector[offset + 1]; ! 
save_offset3 = md->offset_vector[md->offset_end - number]; ! save_capture_last = md->capture_last; DPRINTF(("saving %d %d %d\n", save_offset1, save_offset2, save_offset3)); md->offset_vector[md->offset_end - number] = eptr - md->start_subject; do { ! RMATCH(rrc, eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! match_isgroup); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); md->capture_last = save_capture_last; ecode += GET(ecode, 1); } *************** *** 5286,5292 **** md->offset_vector[offset + 1] = save_offset2; md->offset_vector[md->offset_end - number] = save_offset3; ! return MATCH_NOMATCH; } /* Insufficient room for saving captured contents */ --- 5773,5779 ---- md->offset_vector[offset + 1] = save_offset2; md->offset_vector[md->offset_end - number] = save_offset3; ! RRETURN(MATCH_NOMATCH); } /* Insufficient room for saving captured contents */ *************** *** 5301,5315 **** case OP_BRA: /* Non-capturing bracket: optimized */ DPRINTF(("start bracket 0\n")); do { ! if ((rrc = ! match(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! match_isgroup)) != MATCH_NOMATCH) ! return rrc; ecode += GET(ecode, 1); } while (*ecode == OP_ALT); DPRINTF(("bracket 0 failed\n")); ! return MATCH_NOMATCH; /* Conditional group: compilation checked that there are no more than two branches. If the condition is false, skipping the first branch takes us --- 5788,5802 ---- case OP_BRA: /* Non-capturing bracket: optimized */ DPRINTF(("start bracket 0\n")); do { ! RMATCH(rrc, eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! match_isgroup); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ecode += GET(ecode, 1); } while (*ecode == OP_ALT); DPRINTF(("bracket 0 failed\n")); ! RRETURN(MATCH_NOMATCH); /* Conditional group: compilation checked that there are no more than two branches. 
If the condition is false, skipping the first branch takes us *************** *** 5318,5348 **** case OP_COND: if (ecode[LINK_SIZE + 1] == OP_CREF) { /* Condition extract or recurse test */ ! int offset = GET2(ecode, LINK_SIZE + 2) << 1; /* Doubled ref number */ ! BOOL condition = (offset == CREF_RECURSE * 2) ? (md->recursive != NULL) : (offset < offset_top && md->offset_vector[offset] >= 0); ! return match(eptr, ecode + (condition ? ! (LINK_SIZE + 4) : (LINK_SIZE + 1 + ! GET(ecode, 1))), ! offset_top, md, ims, eptrb, match_isgroup); } /* The condition is an assertion. Call match() to evaluate it - setting the final argument TRUE causes it to stop at the end of an assertion. */ else { ! if ((rrc = match(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, NULL, ! match_condassert | match_isgroup)) == MATCH_MATCH) { ecode += 1 + LINK_SIZE + GET(ecode, LINK_SIZE + 2); while (*ecode == OP_ALT) ecode += GET(ecode, 1); ! } else if (rrc != MATCH_NOMATCH) ! return rrc; ! else ecode += GET(ecode, 1); ! return match(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! match_isgroup); } /* Control never reaches here */ --- 5805,5838 ---- case OP_COND: if (ecode[LINK_SIZE + 1] == OP_CREF) { /* Condition extract or recurse test */ ! offset = GET2(ecode, LINK_SIZE + 2) << 1; /* Doubled ref number */ ! condition = (offset == CREF_RECURSE * 2) ? (md->recursive != NULL) : (offset < offset_top && md->offset_vector[offset] >= 0); ! RMATCH(rrc, eptr, ecode + (condition ? ! (LINK_SIZE + 4) : (LINK_SIZE + 1 + ! GET(ecode, 1))), ! offset_top, md, ims, eptrb, match_isgroup); ! RRETURN(rrc); } /* The condition is an assertion. Call match() to evaluate it - setting the final argument TRUE causes it to stop at the end of an assertion. */ else { ! RMATCH(rrc, eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, NULL, ! match_condassert | match_isgroup); ! if (rrc == MATCH_MATCH) { ecode += 1 + LINK_SIZE + GET(ecode, LINK_SIZE + 2); while (*ecode == OP_ALT) ecode += GET(ecode, 1); ! 
} else if (rrc != MATCH_NOMATCH) { ! RRETURN(rrc); /* Need braces because of following else */ ! } else ecode += GET(ecode, 1); ! RMATCH(rrc, eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! match_isgroup); ! RRETURN(rrc); } /* Control never reaches here */ *************** *** 5361,5367 **** if (md->recursive != NULL && md->recursive->group_num == 0) { recursion_info *rec = md->recursive; DPRINTF(("Hit the end in a (?0) recursion\n")); ! md->recursive = rec->prev; memmove(md->offset_vector, rec->offset_save, rec->saved_max * sizeof(int)); md->start_match = rec->save_start; --- 5851,5857 ---- if (md->recursive != NULL && md->recursive->group_num == 0) { recursion_info *rec = md->recursive; DPRINTF(("Hit the end in a (?0) recursion\n")); ! md->recursive = rec->prevrec; memmove(md->offset_vector, rec->offset_save, rec->saved_max * sizeof(int)); md->start_match = rec->save_start; *************** *** 5374,5383 **** string - backtracking will then try other alternatives, if any. */ if (md->notempty && eptr == md->start_match) ! return MATCH_NOMATCH; md->end_match_ptr = eptr; /* Record where we ended */ md->end_offset_top = offset_top; /* and how many extracts were taken */ ! return MATCH_MATCH; /* Change option settings */ --- 5864,5873 ---- string - backtracking will then try other alternatives, if any. */ if (md->notempty && eptr == md->start_match) ! RRETURN(MATCH_NOMATCH); md->end_match_ptr = eptr; /* Record where we ended */ md->end_offset_top = offset_top; /* and how many extracts were taken */ ! RRETURN(MATCH_MATCH); /* Change option settings */ *************** *** 5396,5416 **** case OP_ASSERT: case OP_ASSERTBACK: do { ! if ((rrc = match(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, NULL, ! match_isgroup)) == MATCH_MATCH) break; if (rrc != MATCH_NOMATCH) ! return rrc; ecode += GET(ecode, 1); } while (*ecode == OP_ALT); if (*ecode == OP_KET) ! return MATCH_NOMATCH; /* If checking an assertion for a condition, return MATCH_MATCH. 
*/ if ((flags & match_condassert) != 0) ! return MATCH_MATCH; /* Continue from after the assertion, updating the offsets high water mark, since extracts may have been taken during the assertion. */ --- 5886,5907 ---- case OP_ASSERT: case OP_ASSERTBACK: do { ! RMATCH(rrc, eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, NULL, ! match_isgroup); ! if (rrc == MATCH_MATCH) break; if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ecode += GET(ecode, 1); } while (*ecode == OP_ALT); if (*ecode == OP_KET) ! RRETURN(MATCH_NOMATCH); /* If checking an assertion for a condition, return MATCH_MATCH. */ if ((flags & match_condassert) != 0) ! RRETURN(MATCH_MATCH); /* Continue from after the assertion, updating the offsets high water mark, since extracts may have been taken during the assertion. */ *************** *** 5427,5443 **** case OP_ASSERT_NOT: case OP_ASSERTBACK_NOT: do { ! if ((rrc = match(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, NULL, ! match_isgroup)) == MATCH_MATCH) ! return MATCH_NOMATCH; if (rrc != MATCH_NOMATCH) ! return rrc; ecode += GET(ecode, 1); } while (*ecode == OP_ALT); if ((flags & match_condassert) != 0) ! return MATCH_MATCH; ecode += 1 + LINK_SIZE; continue; --- 5918,5935 ---- case OP_ASSERT_NOT: case OP_ASSERTBACK_NOT: do { ! RMATCH(rrc, eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, NULL, ! match_isgroup); ! if (rrc == MATCH_MATCH) ! RRETURN(MATCH_NOMATCH); if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ecode += GET(ecode, 1); } while (*ecode == OP_ALT); if ((flags & match_condassert) != 0) ! RRETURN(MATCH_MATCH); ecode += 1 + LINK_SIZE; continue; *************** *** 5448,5457 **** back a number of characters, not bytes. */ case OP_REVERSE: - eptr -= GET(ecode, 1); ! if (eptr < md->start_subject) ! return MATCH_NOMATCH; ecode += 1 + LINK_SIZE; break; --- 5940,5956 ---- back a number of characters, not bytes. */ case OP_REVERSE: ! /* No UTF-8 support, or not in UTF-8 mode: count is byte count */ ! ! { ! eptr -= GET(ecode, 1); ! 
if (eptr < md->start_subject) ! RRETURN(MATCH_NOMATCH); ! } ! ! /* Skip to next op code */ ! ecode += 1 + LINK_SIZE; break; *************** *** 5473,5489 **** cb.capture_last = md->capture_last; cb.callout_data = md->callout_data; if ((rrc = (*pcre_callout) (&cb)) > 0) ! return MATCH_NOMATCH; if (rrc < 0) ! return rrc; } ecode += 2; break; /* Recursion either matches the current regex, or some subexpression. The offset data is the offset to the starting bracket from the start of the ! whole pattern. However, it is possible that a BRAZERO was inserted before ! this bracket after we took the offset - we just skip it if encountered. If there are any capturing brackets started but not finished, we have to save their starting points and reinstate them after the recursion. However, --- 5972,5987 ---- cb.capture_last = md->capture_last; cb.callout_data = md->callout_data; if ((rrc = (*pcre_callout) (&cb)) > 0) ! RRETURN(MATCH_NOMATCH); if (rrc < 0) ! RRETURN(rrc); } ecode += 2; break; /* Recursion either matches the current regex, or some subexpression. The offset data is the offset to the starting bracket from the start of the ! whole pattern. (This is so that it works from duplicated subpatterns.) If there are any capturing brackets started but not finished, we have to save their starting points and reinstate them after the recursion. However, *************** *** 5502,5514 **** case OP_RECURSE: { ! int stacksave[REC_STACK_SAVE_MAX]; ! recursion_info new_recursive; ! const uschar *callpat = md->start_code + GET(ecode, 1); ! ! if (*callpat == OP_BRAZERO) ! callpat++; ! new_recursive.group_num = *callpat - OP_BRA; /* For extended extraction brackets (large number), we have to fish out --- 6000,6006 ---- case OP_RECURSE: { ! callpat = md->start_code + GET(ecode, 1); new_recursive.group_num = *callpat - OP_BRA; /* For extended extraction brackets (large number), we have to fish out *************** *** 5519,5525 **** /* Add to "recursing stack" */ ! 
new_recursive.prev = md->recursive; md->recursive = &new_recursive; /* Find where to continue from afterwards */ --- 6011,6017 ---- /* Add to "recursing stack" */ ! new_recursive.prevrec = md->recursive; md->recursive = &new_recursive; /* Find where to continue from afterwards */ *************** *** 5534,5542 **** new_recursive.offset_save = stacksave; else { new_recursive.offset_save = ! (int *) (malloc) (new_recursive.saved_max * sizeof(int)); if (new_recursive.offset_save == NULL) ! return PCRE_ERROR_NOMEMORY; } memcpy(new_recursive.offset_save, md->offset_vector, --- 6026,6034 ---- new_recursive.offset_save = stacksave; else { new_recursive.offset_save = ! (int *) malloc(new_recursive.saved_max * sizeof(int)); if (new_recursive.offset_save == NULL) ! RRETURN(PCRE_ERROR_NOMEMORY); } memcpy(new_recursive.offset_save, md->offset_vector, *************** *** 5549,5562 **** DPRINTF(("Recursing into group %d\n", new_recursive.group_num)); do { ! if ((rrc = match(eptr, callpat + 1 + LINK_SIZE, offset_top, md, ims, ! eptrb, match_isgroup)) == MATCH_MATCH) { ! md->recursive = new_recursive.prev; if (new_recursive.offset_save != stacksave) ! (free) (new_recursive.offset_save); ! return MATCH_MATCH; } else if (rrc != MATCH_NOMATCH) ! return rrc; md->recursive = &new_recursive; memcpy(md->offset_vector, new_recursive.offset_save, --- 6041,6055 ---- DPRINTF(("Recursing into group %d\n", new_recursive.group_num)); do { ! RMATCH(rrc, eptr, callpat + 1 + LINK_SIZE, offset_top, md, ims, ! eptrb, match_isgroup); ! if (rrc == MATCH_MATCH) { ! md->recursive = new_recursive.prevrec; if (new_recursive.offset_save != stacksave) ! free(new_recursive.offset_save); ! RRETURN(MATCH_MATCH); } else if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); md->recursive = &new_recursive; memcpy(md->offset_vector, new_recursive.offset_save, *************** *** 5566,5575 **** while (*callpat == OP_ALT); DPRINTF(("Recursion didn't match\n")); ! 
md->recursive = new_recursive.prev; if (new_recursive.offset_save != stacksave) ! (free) (new_recursive.offset_save); ! return MATCH_NOMATCH; } /* Control never reaches here */ --- 6059,6068 ---- while (*callpat == OP_ALT); DPRINTF(("Recursion didn't match\n")); ! md->recursive = new_recursive.prevrec; if (new_recursive.offset_save != stacksave) ! free(new_recursive.offset_save); ! RRETURN(MATCH_NOMATCH); } /* Control never reaches here */ *************** *** 5582,5596 **** case OP_ONCE: { ! const uschar *prev = ecode; ! const uschar *saved_eptr = eptr; do { ! if ((rrc = match(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, ! eptrb, match_isgroup)) == MATCH_MATCH) break; if (rrc != MATCH_NOMATCH) ! return rrc; ecode += GET(ecode, 1); } while (*ecode == OP_ALT); --- 6075,6090 ---- case OP_ONCE: { ! prev = ecode; ! saved_eptr = eptr; do { ! RMATCH(rrc, eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, ! eptrb, match_isgroup); ! if (rrc == MATCH_MATCH) break; if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ecode += GET(ecode, 1); } while (*ecode == OP_ALT); *************** *** 5598,5604 **** /* If hit the end of the group (which could be repeated), fail */ if (*ecode != OP_ONCE && *ecode != OP_ALT) ! return MATCH_NOMATCH; /* Continue as from after the assertion, updating the offsets high water mark, since extracts may have been taken. */ --- 6092,6098 ---- /* If hit the end of the group (which could be repeated), fail */ if (*ecode != OP_ONCE && *ecode != OP_ALT) ! RRETURN(MATCH_NOMATCH); /* Continue as from after the assertion, updating the offsets high water mark, since extracts may have been taken. */ *************** *** 5632,5655 **** } if (*ecode == OP_KETRMIN) { ! if ((rrc = match(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, ! eptrb, 0)) != MATCH_NOMATCH) ! return rrc; ! if ((rrc = match(eptr, prev, offset_top, md, ims, eptrb, ! match_isgroup)) != MATCH_NOMATCH) ! return rrc; } else { /* OP_KETRMAX */ ! if ((rrc = match(eptr, prev, offset_top, md, ims, eptrb, ! 
match_isgroup)) != MATCH_NOMATCH) ! return rrc; ! if ((rrc = ! match(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! 0)) != MATCH_NOMATCH) ! return rrc; } } ! return MATCH_NOMATCH; /* An alternation is the end of a branch; scan along to find the end of the bracketed group and go to there. */ --- 6126,6150 ---- } if (*ecode == OP_KETRMIN) { ! RMATCH(rrc, eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! RMATCH(rrc, eptr, prev, offset_top, md, ims, eptrb, match_isgroup); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); } else { /* OP_KETRMAX */ ! RMATCH(rrc, eptr, prev, offset_top, md, ims, eptrb, match_isgroup); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! RMATCH(rrc, eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); } } ! RRETURN(MATCH_NOMATCH); /* An alternation is the end of a branch; scan along to find the end of the bracketed group and go to there. */ *************** *** 5668,5677 **** case OP_BRAZERO: { ! const uschar *next = ecode + 1; ! if ((rrc = match(eptr, next, offset_top, md, ims, eptrb, match_isgroup)) ! != MATCH_NOMATCH) ! return rrc; do next += GET(next, 1); while (*next == OP_ALT); --- 6163,6172 ---- case OP_BRAZERO: { ! next = ecode + 1; ! RMATCH(rrc, eptr, next, offset_top, md, ims, eptrb, match_isgroup); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); do next += GET(next, 1); while (*next == OP_ALT); *************** *** 5681,5693 **** case OP_BRAMINZERO: { ! const uschar *next = ecode + 1; do next += GET(next, 1); while (*next == OP_ALT); ! if ((rrc = match(eptr, next + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! match_isgroup)) != MATCH_NOMATCH) ! return rrc; ecode++; } break; --- 6176,6189 ---- case OP_BRAMINZERO: { ! next = ecode + 1; do next += GET(next, 1); while (*next == OP_ALT); ! RMATCH(rrc, eptr, next + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! match_isgroup); ! if (rrc != MATCH_NOMATCH) ! 
RRETURN(rrc); ecode++; } break; *************** *** 5701,5717 **** case OP_KETRMIN: case OP_KETRMAX: { ! const uschar *prev = ecode - GET(ecode, 1); ! const uschar *saved_eptr = eptrb->saved_eptr; ! eptrb = eptrb->prev; /* Back up the stack of bracket start pointers */ if (*prev == OP_ASSERT || *prev == OP_ASSERT_NOT || *prev == OP_ASSERTBACK || *prev == OP_ASSERTBACK_NOT || *prev == OP_ONCE) { md->end_match_ptr = eptr; /* For ONCE */ md->end_offset_top = offset_top; ! return MATCH_MATCH; } /* In all other cases except a conditional group we have to check the --- 6197,6215 ---- case OP_KETRMIN: case OP_KETRMAX: { ! prev = ecode - GET(ecode, 1); ! saved_eptr = eptrb->epb_saved_eptr; ! ! /* Back up the stack of bracket start pointers. */ ! eptrb = eptrb->epb_prev; if (*prev == OP_ASSERT || *prev == OP_ASSERT_NOT || *prev == OP_ASSERTBACK || *prev == OP_ASSERTBACK_NOT || *prev == OP_ONCE) { md->end_match_ptr = eptr; /* For ONCE */ md->end_offset_top = offset_top; ! RRETURN(MATCH_MATCH); } /* In all other cases except a conditional group we have to check the *************** *** 5719,5726 **** extraction by setting the offsets and bumping the high water mark. */ if (*prev != OP_COND) { ! int offset; ! int number = *prev - OP_BRA; /* For extended extraction brackets (large number), we have to fish out the number from a dummy opcode at the start. */ --- 6217,6223 ---- extraction by setting the offsets and bumping the high water mark. */ if (*prev != OP_COND) { ! number = *prev - OP_BRA; /* For extended extraction brackets (large number), we have to fish out the number from a dummy opcode at the start. */ *************** *** 5752,5758 **** if (md->recursive != NULL && md->recursive->group_num == number) { recursion_info *rec = md->recursive; DPRINTF(("Recursion (%d) succeeded - continuing\n", number)); ! 
md->recursive = rec->prev; md->start_match = rec->save_start; memcpy(md->offset_vector, rec->offset_save, rec->saved_max * sizeof(int)); --- 6249,6255 ---- if (md->recursive != NULL && md->recursive->group_num == number) { recursion_info *rec = md->recursive; DPRINTF(("Recursion (%d) succeeded - continuing\n", number)); ! md->recursive = rec->prevrec; md->start_match = rec->save_start; memcpy(md->offset_vector, rec->offset_save, rec->saved_max * sizeof(int)); *************** *** 5784,5817 **** preceding bracket, in the appropriate order. */ if (*ecode == OP_KETRMIN) { ! if ((rrc = ! match(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! 0)) != MATCH_NOMATCH) ! return rrc; ! if ((rrc = match(eptr, prev, offset_top, md, ims, eptrb, ! match_isgroup)) != MATCH_NOMATCH) ! return rrc; } else { /* OP_KETRMAX */ ! if ((rrc = match(eptr, prev, offset_top, md, ims, eptrb, ! match_isgroup)) != MATCH_NOMATCH) ! return rrc; ! if ((rrc = ! match(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! 0)) != MATCH_NOMATCH) ! return rrc; } } ! return MATCH_NOMATCH; /* Start of subject unless notbol, or after internal newline if multiline */ case OP_CIRC: if (md->notbol && eptr == md->start_subject) ! return MATCH_NOMATCH; if ((ims & PCRE_MULTILINE) != 0) { if (eptr != md->start_subject && eptr[-1] != NEWLINE) ! return MATCH_NOMATCH; ecode++; break; } --- 6281,6315 ---- preceding bracket, in the appropriate order. */ if (*ecode == OP_KETRMIN) { ! RMATCH(rrc, eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! RMATCH(rrc, eptr, prev, offset_top, md, ims, eptrb, match_isgroup); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); } else { /* OP_KETRMAX */ ! RMATCH(rrc, eptr, prev, offset_top, md, ims, eptrb, match_isgroup); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! RMATCH(rrc, eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, eptrb, ! 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); } } ! ! 
RRETURN(MATCH_NOMATCH); /* Start of subject unless notbol, or after internal newline if multiline */ case OP_CIRC: if (md->notbol && eptr == md->start_subject) ! RRETURN(MATCH_NOMATCH); if ((ims & PCRE_MULTILINE) != 0) { if (eptr != md->start_subject && eptr[-1] != NEWLINE) ! RRETURN(MATCH_NOMATCH); ecode++; break; } *************** *** 5821,5827 **** case OP_SOD: if (eptr != md->start_subject) ! return MATCH_NOMATCH; ecode++; break; --- 6319,6325 ---- case OP_SOD: if (eptr != md->start_subject) ! RRETURN(MATCH_NOMATCH); ecode++; break; *************** *** 5829,5835 **** case OP_SOM: if (eptr != md->start_subject + md->start_offset) ! return MATCH_NOMATCH; ecode++; break; --- 6327,6333 ---- case OP_SOM: if (eptr != md->start_subject + md->start_offset) ! RRETURN(MATCH_NOMATCH); ecode++; break; *************** *** 5840,5859 **** if ((ims & PCRE_MULTILINE) != 0) { if (eptr < md->end_subject) { if (*eptr != NEWLINE) ! return MATCH_NOMATCH; } else { if (md->noteol) ! return MATCH_NOMATCH; } ecode++; break; } else { if (md->noteol) ! return MATCH_NOMATCH; if (!md->endonly) { if (eptr < md->end_subject - 1 || (eptr == md->end_subject - 1 && *eptr != NEWLINE)) ! return MATCH_NOMATCH; ecode++; break; } --- 6338,6357 ---- if ((ims & PCRE_MULTILINE) != 0) { if (eptr < md->end_subject) { if (*eptr != NEWLINE) ! RRETURN(MATCH_NOMATCH); } else { if (md->noteol) ! RRETURN(MATCH_NOMATCH); } ecode++; break; } else { if (md->noteol) ! RRETURN(MATCH_NOMATCH); if (!md->endonly) { if (eptr < md->end_subject - 1 || (eptr == md->end_subject - 1 && *eptr != NEWLINE)) ! RRETURN(MATCH_NOMATCH); ecode++; break; } *************** *** 5864,5870 **** case OP_EOD: if (eptr < md->end_subject) ! return MATCH_NOMATCH; ecode++; break; --- 6362,6368 ---- case OP_EOD: if (eptr < md->end_subject) ! RRETURN(MATCH_NOMATCH); ecode++; break; *************** *** 5873,5879 **** case OP_EODN: if (eptr < md->end_subject - 1 || (eptr == md->end_subject - 1 && *eptr != NEWLINE)) ! 
return MATCH_NOMATCH; ecode++; break; --- 6371,6377 ---- case OP_EODN: if (eptr < md->end_subject - 1 || (eptr == md->end_subject - 1 && *eptr != NEWLINE)) ! RRETURN(MATCH_NOMATCH); ecode++; break; *************** *** 5882,5888 **** case OP_NOT_WORD_BOUNDARY: case OP_WORD_BOUNDARY: { - BOOL prev_is_word, cur_is_word; /* Find out if the previous and current characters are "word" characters. It takes a bit more work in UTF-8 mode. Characters > 255 are assumed to --- 6380,6385 ---- *************** *** 5902,5908 **** if ((*ecode++ == OP_WORD_BOUNDARY) ? cur_is_word == prev_is_word : cur_is_word != prev_is_word) ! return MATCH_NOMATCH; } break; --- 6399,6405 ---- if ((*ecode++ == OP_WORD_BOUNDARY) ? cur_is_word == prev_is_word : cur_is_word != prev_is_word) ! RRETURN(MATCH_NOMATCH); } break; *************** *** 5911,5919 **** case OP_ANY: if ((ims & PCRE_DOTALL) == 0 && eptr < md->end_subject && *eptr == NEWLINE) ! return MATCH_NOMATCH; if (eptr++ >= md->end_subject) ! return MATCH_NOMATCH; ecode++; break; --- 6408,6416 ---- case OP_ANY: if ((ims & PCRE_DOTALL) == 0 && eptr < md->end_subject && *eptr == NEWLINE) ! RRETURN(MATCH_NOMATCH); if (eptr++ >= md->end_subject) ! RRETURN(MATCH_NOMATCH); ecode++; break; *************** *** 5922,5982 **** case OP_ANYBYTE: if (eptr++ >= md->end_subject) ! return MATCH_NOMATCH; ecode++; break; case OP_NOT_DIGIT: if (eptr >= md->end_subject) ! return MATCH_NOMATCH; GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_digit) != 0) ! return MATCH_NOMATCH; ecode++; break; case OP_DIGIT: if (eptr >= md->end_subject) ! return MATCH_NOMATCH; GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_digit) == 0) ! return MATCH_NOMATCH; ecode++; break; case OP_NOT_WHITESPACE: if (eptr >= md->end_subject) ! return MATCH_NOMATCH; GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_space) != 0) ! return MATCH_NOMATCH; ecode++; break; case OP_WHITESPACE: if (eptr >= md->end_subject) ! 
return MATCH_NOMATCH; GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_space) == 0) ! return MATCH_NOMATCH; ecode++; break; case OP_NOT_WORDCHAR: if (eptr >= md->end_subject) ! return MATCH_NOMATCH; GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_word) != 0) ! return MATCH_NOMATCH; ecode++; break; case OP_WORDCHAR: if (eptr >= md->end_subject) ! return MATCH_NOMATCH; GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_word) == 0) ! return MATCH_NOMATCH; ecode++; break; --- 6419,6479 ---- case OP_ANYBYTE: if (eptr++ >= md->end_subject) ! RRETURN(MATCH_NOMATCH); ecode++; break; case OP_NOT_DIGIT: if (eptr >= md->end_subject) ! RRETURN(MATCH_NOMATCH); GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_digit) != 0) ! RRETURN(MATCH_NOMATCH); ecode++; break; case OP_DIGIT: if (eptr >= md->end_subject) ! RRETURN(MATCH_NOMATCH); GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_digit) == 0) ! RRETURN(MATCH_NOMATCH); ecode++; break; case OP_NOT_WHITESPACE: if (eptr >= md->end_subject) ! RRETURN(MATCH_NOMATCH); GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_space) != 0) ! RRETURN(MATCH_NOMATCH); ecode++; break; case OP_WHITESPACE: if (eptr >= md->end_subject) ! RRETURN(MATCH_NOMATCH); GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_space) == 0) ! RRETURN(MATCH_NOMATCH); ecode++; break; case OP_NOT_WORDCHAR: if (eptr >= md->end_subject) ! RRETURN(MATCH_NOMATCH); GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_word) != 0) ! RRETURN(MATCH_NOMATCH); ecode++; break; case OP_WORDCHAR: if (eptr >= md->end_subject) ! RRETURN(MATCH_NOMATCH); GETCHARINCTEST(c, eptr); if ((md->ctypes[c] & ctype_word) == 0) ! RRETURN(MATCH_NOMATCH); ecode++; break; *************** *** 5990,5997 **** case OP_REF: { ! int length; ! int offset = GET2(ecode, 1) << 1; /* Doubled ref number */ ecode += 3; /* Advance past item */ /* If the reference is unset, set the length to be longer than the amount --- 6487,6493 ---- case OP_REF: { ! 
offset = GET2(ecode, 1) << 1; /* Doubled ref number */ ecode += 3; /* Advance past item */ /* If the reference is unset, set the length to be longer than the amount *************** *** 6032,6038 **** default: /* No repeat follows */ if (!match_ref(offset, eptr, length, md, ims)) ! return MATCH_NOMATCH; eptr += length; continue; /* With the main loop */ } --- 6528,6534 ---- default: /* No repeat follows */ if (!match_ref(offset, eptr, length, md, ims)) ! RRETURN(MATCH_NOMATCH); eptr += length; continue; /* With the main loop */ } *************** *** 6049,6055 **** for (i = 1; i <= min; i++) { if (!match_ref(offset, eptr, length, md, ims)) ! return MATCH_NOMATCH; eptr += length; } --- 6545,6551 ---- for (i = 1; i <= min; i++) { if (!match_ref(offset, eptr, length, md, ims)) ! RRETURN(MATCH_NOMATCH); eptr += length; } *************** *** 6062,6073 **** /* If minimizing, keep trying and advancing the pointer */ if (minimize) { ! for (i = min;; i++) { ! if ((rrc = match(eptr, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; ! if (i >= max || !match_ref(offset, eptr, length, md, ims)) ! return MATCH_NOMATCH; eptr += length; } /* Control never gets here */ --- 6558,6569 ---- /* If minimizing, keep trying and advancing the pointer */ if (minimize) { ! for (fi = min;; fi++) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! if (fi >= max || !match_ref(offset, eptr, length, md, ims)) ! RRETURN(MATCH_NOMATCH); eptr += length; } /* Control never gets here */ *************** *** 6076,6094 **** /* If maximizing, find the longest string and work backwards */ else { ! const uschar *pp = eptr; for (i = min; i < max; i++) { if (!match_ref(offset, eptr, length, md, ims)) break; eptr += length; } while (eptr >= pp) { ! if ((rrc = match(eptr, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; eptr -= length; } ! 
return MATCH_NOMATCH; } } /* Control never gets here */ --- 6572,6590 ---- /* If maximizing, find the longest string and work backwards */ else { ! pp = eptr; for (i = min; i < max; i++) { if (!match_ref(offset, eptr, length, md, ims)) break; eptr += length; } while (eptr >= pp) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); eptr -= length; } ! RRETURN(MATCH_NOMATCH); } } /* Control never gets here */ *************** *** 6107,6113 **** case OP_NCLASS: case OP_CLASS: { ! const uschar *data = ecode + 1; /* Save for matching */ ecode += 33; /* Advance past the item */ switch (*ecode) { --- 6603,6609 ---- case OP_NCLASS: case OP_CLASS: { ! data = ecode + 1; /* Save for matching */ ecode += 33; /* Advance past the item */ switch (*ecode) { *************** *** 6146,6155 **** { for (i = 1; i <= min; i++) { if (eptr >= md->end_subject) ! return MATCH_NOMATCH; c = *eptr++; if ((data[c / 8] & (1 << (c & 7))) == 0) ! return MATCH_NOMATCH; } } --- 6642,6651 ---- { for (i = 1; i <= min; i++) { if (eptr >= md->end_subject) ! RRETURN(MATCH_NOMATCH); c = *eptr++; if ((data[c / 8] & (1 << (c & 7))) == 0) ! RRETURN(MATCH_NOMATCH); } } *************** *** 6165,6179 **** if (minimize) { /* Not UTF-8 mode */ { ! for (i = min;; i++) { ! if ((rrc = match(eptr, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; ! if (i >= max || eptr >= md->end_subject) ! return MATCH_NOMATCH; c = *eptr++; if ((data[c / 8] & (1 << (c & 7))) == 0) ! return MATCH_NOMATCH; } } /* Control never gets here */ --- 6661,6675 ---- if (minimize) { /* Not UTF-8 mode */ { ! for (fi = min;; fi++) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! if (fi >= max || eptr >= md->end_subject) ! RRETURN(MATCH_NOMATCH); c = *eptr++; if ((data[c / 8] & (1 << (c & 7))) == 0) ! 
RRETURN(MATCH_NOMATCH); } } /* Control never gets here */ *************** *** 6182,6188 **** /* If maximizing, find the longest possible run, then work backwards. */ else { ! const uschar *pp = eptr; /* Not UTF-8 mode */ { --- 6678,6684 ---- /* If maximizing, find the longest possible run, then work backwards. */ else { ! pp = eptr; /* Not UTF-8 mode */ { *************** *** 6195,6231 **** eptr++; } while (eptr >= pp) { ! if ((rrc = match(eptr--, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; } } ! return MATCH_NOMATCH; } } /* Control never gets here */ /* Match a run of characters */ case OP_CHARS: { ! register int length = ecode[1]; ecode += 2; ! if (length > md->end_subject - eptr) ! return MATCH_NOMATCH; if ((ims & PCRE_CASELESS) != 0) { ! while (length-- > 0) if (md->lcc[*ecode++] != md->lcc[*eptr++]) ! return MATCH_NOMATCH; } else { ! while (length-- > 0) if (*ecode++ != *eptr++) ! return MATCH_NOMATCH; } } break; --- 6691,6731 ---- eptr++; } while (eptr >= pp) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! eptr--; ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); } } ! RRETURN(MATCH_NOMATCH); } } /* Control never gets here */ + /* Match an extended character class. This opcode is encountered only + in UTF-8 mode, because that's the only time it is compiled. */ + /* Match a run of characters */ case OP_CHARS: { ! register int slen = ecode[1]; ecode += 2; ! if (slen > md->end_subject - eptr) ! RRETURN(MATCH_NOMATCH); if ((ims & PCRE_CASELESS) != 0) { ! while (slen-- > 0) if (md->lcc[*ecode++] != md->lcc[*eptr++]) ! RRETURN(MATCH_NOMATCH); } else { ! while (slen-- > 0) if (*ecode++ != *eptr++) ! RRETURN(MATCH_NOMATCH); } } break; *************** *** 6267,6277 **** /* When not in UTF-8 mode, load a single-byte character. */ { if (min > md->end_subject - eptr) ! return MATCH_NOMATCH; ! c = *ecode++; } ! /* The value of c at this point is always less than 256, though we may or may not be in UTF-8 mode. 
The code is duplicated for the caseless and caseful cases, for speed, since matching characters is likely to be quite common. First, ensure the minimum number of matches are present. If min = --- 6767,6777 ---- /* When not in UTF-8 mode, load a single-byte character. */ { if (min > md->end_subject - eptr) ! RRETURN(MATCH_NOMATCH); ! fc = *ecode++; } ! /* The value of fc at this point is always less than 256, though we may or may not be in UTF-8 mode. The code is duplicated for the caseless and caseful cases, for speed, since matching characters is likely to be quite common. First, ensure the minimum number of matches are present. If min = *************** *** 6280,6316 **** matching character if failing, up to the maximum. Alternatively, if maximizing, find the maximum number of characters and work backwards. */ ! DPRINTF(("matching %c{%d,%d} against subject %.*s\n", c, min, max, max, eptr)); if ((ims & PCRE_CASELESS) != 0) { ! c = md->lcc[c]; for (i = 1; i <= min; i++) ! if (c != md->lcc[*eptr++]) ! return MATCH_NOMATCH; if (min == max) continue; if (minimize) { ! for (i = min;; i++) { ! if ((rrc = match(eptr, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; ! if (i >= max || eptr >= md->end_subject || c != md->lcc[*eptr++]) ! return MATCH_NOMATCH; } /* Control never gets here */ } else { ! const uschar *pp = eptr; for (i = min; i < max; i++) { ! if (eptr >= md->end_subject || c != md->lcc[*eptr]) break; eptr++; } ! while (eptr >= pp) ! if ((rrc = match(eptr--, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; ! return MATCH_NOMATCH; } /* Control never gets here */ } --- 6780,6818 ---- matching character if failing, up to the maximum. Alternatively, if maximizing, find the maximum number of characters and work backwards. */ ! DPRINTF(("matching %c{%d,%d} against subject %.*s\n", fc, min, max, max, eptr)); if ((ims & PCRE_CASELESS) != 0) { ! fc = md->lcc[fc]; for (i = 1; i <= min; i++) ! if (fc != md->lcc[*eptr++]) ! 
RRETURN(MATCH_NOMATCH); if (min == max) continue; if (minimize) { ! for (fi = min;; fi++) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! if (fi >= max || eptr >= md->end_subject || fc != md->lcc[*eptr++]) ! RRETURN(MATCH_NOMATCH); } /* Control never gets here */ } else { ! pp = eptr; for (i = min; i < max; i++) { ! if (eptr >= md->end_subject || fc != md->lcc[*eptr]) break; eptr++; } ! while (eptr >= pp) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! eptr--; ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! } ! RRETURN(MATCH_NOMATCH); } /* Control never gets here */ } *************** *** 6319,6349 **** else { for (i = 1; i <= min; i++) ! if (c != *eptr++) ! return MATCH_NOMATCH; if (min == max) continue; if (minimize) { ! for (i = min;; i++) { ! if ((rrc = match(eptr, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; ! if (i >= max || eptr >= md->end_subject || c != *eptr++) ! return MATCH_NOMATCH; } /* Control never gets here */ } else { ! const uschar *pp = eptr; for (i = min; i < max; i++) { ! if (eptr >= md->end_subject || c != *eptr) break; eptr++; } ! while (eptr >= pp) ! if ((rrc = match(eptr--, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; ! return MATCH_NOMATCH; } } /* Control never gets here */ --- 6821,6853 ---- else { for (i = 1; i <= min; i++) ! if (fc != *eptr++) ! RRETURN(MATCH_NOMATCH); if (min == max) continue; if (minimize) { ! for (fi = min;; fi++) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! if (fi >= max || eptr >= md->end_subject || fc != *eptr++) ! RRETURN(MATCH_NOMATCH); } /* Control never gets here */ } else { ! pp = eptr; for (i = min; i < max; i++) { ! if (eptr >= md->end_subject || fc != *eptr) break; eptr++; } ! while (eptr >= pp) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! eptr--; ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! } ! 
RRETURN(MATCH_NOMATCH); } } /* Control never gets here */ *************** *** 6353,6368 **** case OP_NOT: if (eptr >= md->end_subject) ! return MATCH_NOMATCH; ecode++; GETCHARINCTEST(c, eptr); if ((ims & PCRE_CASELESS) != 0) { c = md->lcc[c]; if (md->lcc[*ecode++] == c) ! return MATCH_NOMATCH; } else { if (*ecode++ == c) ! return MATCH_NOMATCH; } break; --- 6857,6872 ---- case OP_NOT: if (eptr >= md->end_subject) ! RRETURN(MATCH_NOMATCH); ecode++; GETCHARINCTEST(c, eptr); if ((ims & PCRE_CASELESS) != 0) { c = md->lcc[c]; if (md->lcc[*ecode++] == c) ! RRETURN(MATCH_NOMATCH); } else { if (*ecode++ == c) ! RRETURN(MATCH_NOMATCH); } break; *************** *** 6405,6412 **** REPEATNOTCHAR: if (min > md->end_subject - eptr) ! return MATCH_NOMATCH; ! c = *ecode++; /* The code is duplicated for the caseless and caseful cases, for speed, since matching characters is likely to be quite common. First, ensure the --- 6909,6916 ---- REPEATNOTCHAR: if (min > md->end_subject - eptr) ! RRETURN(MATCH_NOMATCH); ! fc = *ecode++; /* The code is duplicated for the caseless and caseful cases, for speed, since matching characters is likely to be quite common. First, ensure the *************** *** 6416,6432 **** maximum. Alternatively, if maximizing, find the maximum number of characters and work backwards. */ ! DPRINTF(("negative matching %c{%d,%d} against subject %.*s\n", c, min, max, max, eptr)); if ((ims & PCRE_CASELESS) != 0) { ! c = md->lcc[c]; /* Not UTF-8 mode */ { for (i = 1; i <= min; i++) ! if (c == md->lcc[*eptr++]) ! return MATCH_NOMATCH; } if (min == max) --- 6920,6937 ---- maximum. Alternatively, if maximizing, find the maximum number of characters and work backwards. */ ! DPRINTF(("negative matching %c{%d,%d} against subject %.*s\n", fc, min, max, max, eptr)); if ((ims & PCRE_CASELESS) != 0) { ! fc = md->lcc[fc]; ! /* Not UTF-8 mode */ { for (i = 1; i <= min; i++) ! if (fc == md->lcc[*eptr++]) ! 
RRETURN(MATCH_NOMATCH); } if (min == max) *************** *** 6435,6446 **** if (minimize) { /* Not UTF-8 mode */ { ! for (i = min;; i++) { ! if ((rrc = match(eptr, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; ! if (i >= max || eptr >= md->end_subject || c == md->lcc[*eptr++]) ! return MATCH_NOMATCH; } } /* Control never gets here */ --- 6940,6952 ---- if (minimize) { /* Not UTF-8 mode */ { ! for (fi = min;; fi++) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! if (fi >= max || eptr >= md->end_subject ! || fc == md->lcc[*eptr++]) ! RRETURN(MATCH_NOMATCH); } } /* Control never gets here */ *************** *** 6449,6472 **** /* Maximize case */ else { ! const uschar *pp = eptr; /* Not UTF-8 mode */ { for (i = min; i < max; i++) { ! if (eptr >= md->end_subject || c == md->lcc[*eptr]) break; eptr++; } while (eptr >= pp) { ! if ((rrc = match(eptr, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; eptr--; } } ! return MATCH_NOMATCH; } /* Control never gets here */ } --- 6955,6978 ---- /* Maximize case */ else { ! pp = eptr; /* Not UTF-8 mode */ { for (i = min; i < max; i++) { ! if (eptr >= md->end_subject || fc == md->lcc[*eptr]) break; eptr++; } while (eptr >= pp) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); eptr--; } } ! RRETURN(MATCH_NOMATCH); } /* Control never gets here */ } *************** *** 6477,6484 **** /* Not UTF-8 mode */ { for (i = 1; i <= min; i++) ! if (c == *eptr++) ! return MATCH_NOMATCH; } if (min == max) --- 6983,6990 ---- /* Not UTF-8 mode */ { for (i = 1; i <= min; i++) ! if (fc == *eptr++) ! RRETURN(MATCH_NOMATCH); } if (min == max) *************** *** 6487,6498 **** if (minimize) { /* Not UTF-8 mode */ { ! for (i = min;; i++) { ! if ((rrc = match(eptr, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; ! 
if (i >= max || eptr >= md->end_subject || c == *eptr++) ! return MATCH_NOMATCH; } } /* Control never gets here */ --- 6993,7004 ---- if (minimize) { /* Not UTF-8 mode */ { ! for (fi = min;; fi++) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); ! if (fi >= max || eptr >= md->end_subject || fc == *eptr++) ! RRETURN(MATCH_NOMATCH); } } /* Control never gets here */ *************** *** 6501,6524 **** /* Maximize case */ else { ! const uschar *pp = eptr; /* Not UTF-8 mode */ { for (i = min; i < max; i++) { ! if (eptr >= md->end_subject || c == *eptr) break; eptr++; } while (eptr >= pp) { ! if ((rrc = match(eptr, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; eptr--; } } ! return MATCH_NOMATCH; } } /* Control never gets here */ --- 7007,7030 ---- /* Maximize case */ else { ! pp = eptr; /* Not UTF-8 mode */ { for (i = min; i < max; i++) { ! if (eptr >= md->end_subject || fc == *eptr) break; eptr++; } while (eptr >= pp) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); eptr--; } } ! RRETURN(MATCH_NOMATCH); } } /* Control never gets here */ *************** *** 6569,6575 **** is tidier. */ if (min > md->end_subject - eptr) ! return MATCH_NOMATCH; if (min > 0) { /* Code for the non-UTF-8 case for minimum matching */ --- 7075,7081 ---- is tidier. */ if (min > md->end_subject - eptr) ! RRETURN(MATCH_NOMATCH); if (min > 0) { /* Code for the non-UTF-8 case for minimum matching */ *************** *** 6579,6585 **** if ((ims & PCRE_DOTALL) == 0) { for (i = 1; i <= min; i++) if (*eptr++ == NEWLINE) ! return MATCH_NOMATCH; } else eptr += min; break; --- 7085,7091 ---- if ((ims & PCRE_DOTALL) == 0) { for (i = 1; i <= min; i++) if (*eptr++ == NEWLINE) ! RRETURN(MATCH_NOMATCH); } else eptr += min; break; *************** *** 6591,6627 **** case OP_NOT_DIGIT: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_digit) != 0) ! 
return MATCH_NOMATCH; break; case OP_DIGIT: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_digit) == 0) ! return MATCH_NOMATCH; break; case OP_NOT_WHITESPACE: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_space) != 0) ! return MATCH_NOMATCH; break; case OP_WHITESPACE: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_space) == 0) ! return MATCH_NOMATCH; break; case OP_NOT_WORDCHAR: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_word) != 0) ! return MATCH_NOMATCH; break; case OP_WORDCHAR: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_word) == 0) ! return MATCH_NOMATCH; break; } } --- 7097,7133 ---- case OP_NOT_DIGIT: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_digit) != 0) ! RRETURN(MATCH_NOMATCH); break; case OP_DIGIT: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_digit) == 0) ! RRETURN(MATCH_NOMATCH); break; case OP_NOT_WHITESPACE: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_space) != 0) ! RRETURN(MATCH_NOMATCH); break; case OP_WHITESPACE: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_space) == 0) ! RRETURN(MATCH_NOMATCH); break; case OP_NOT_WORDCHAR: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_word) != 0) ! RRETURN(MATCH_NOMATCH); break; case OP_WORDCHAR: for (i = 1; i <= min; i++) if ((md->ctypes[*eptr++] & ctype_word) == 0) ! RRETURN(MATCH_NOMATCH); break; } } *************** *** 6637,6653 **** if (minimize) { /* Not UTF-8 mode */ { ! for (i = min;; i++) { ! if ((rrc = match(eptr, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; ! if (i >= max || eptr >= md->end_subject) ! return MATCH_NOMATCH; c = *eptr++; switch (ctype) { case OP_ANY: if ((ims & PCRE_DOTALL) == 0 && c == NEWLINE) ! return MATCH_NOMATCH; break; case OP_ANYBYTE: --- 7143,7159 ---- if (minimize) { /* Not UTF-8 mode */ { ! for (fi = min;; fi++) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! if (rrc != MATCH_NOMATCH) ! 
RRETURN(rrc); ! if (fi >= max || eptr >= md->end_subject) ! RRETURN(MATCH_NOMATCH); c = *eptr++; switch (ctype) { case OP_ANY: if ((ims & PCRE_DOTALL) == 0 && c == NEWLINE) ! RRETURN(MATCH_NOMATCH); break; case OP_ANYBYTE: *************** *** 6655,6686 **** case OP_NOT_DIGIT: if ((md->ctypes[c] & ctype_digit) != 0) ! return MATCH_NOMATCH; break; case OP_DIGIT: if ((md->ctypes[c] & ctype_digit) == 0) ! return MATCH_NOMATCH; break; case OP_NOT_WHITESPACE: if ((md->ctypes[c] & ctype_space) != 0) ! return MATCH_NOMATCH; break; case OP_WHITESPACE: if ((md->ctypes[c] & ctype_space) == 0) ! return MATCH_NOMATCH; break; case OP_NOT_WORDCHAR: if ((md->ctypes[c] & ctype_word) != 0) ! return MATCH_NOMATCH; break; case OP_WORDCHAR: if ((md->ctypes[c] & ctype_word) == 0) ! return MATCH_NOMATCH; break; } } --- 7161,7192 ---- case OP_NOT_DIGIT: if ((md->ctypes[c] & ctype_digit) != 0) ! RRETURN(MATCH_NOMATCH); break; case OP_DIGIT: if ((md->ctypes[c] & ctype_digit) == 0) ! RRETURN(MATCH_NOMATCH); break; case OP_NOT_WHITESPACE: if ((md->ctypes[c] & ctype_space) != 0) ! RRETURN(MATCH_NOMATCH); break; case OP_WHITESPACE: if ((md->ctypes[c] & ctype_space) == 0) ! RRETURN(MATCH_NOMATCH); break; case OP_NOT_WORDCHAR: if ((md->ctypes[c] & ctype_word) != 0) ! RRETURN(MATCH_NOMATCH); break; case OP_WORDCHAR: if ((md->ctypes[c] & ctype_word) == 0) ! RRETURN(MATCH_NOMATCH); break; } } *************** *** 6693,6699 **** UTF-8 stuff separate. */ else { ! const uschar *pp = eptr; /* Not UTF-8 mode */ --- 7199,7205 ---- UTF-8 stuff separate. */ else { ! pp = eptr; /* Not UTF-8 mode */ *************** *** 6775,6789 **** /* eptr is now past the end of the maximum run */ while (eptr >= pp) { ! if ((rrc = match(eptr--, ecode, offset_top, md, ims, eptrb, 0)) != ! MATCH_NOMATCH) ! return rrc; } } /* Get here if we can't make it match with any permitted repetitions */ ! 
return MATCH_NOMATCH; } /* Control never gets here */ --- 7281,7296 ---- /* eptr is now past the end of the maximum run */ while (eptr >= pp) { ! RMATCH(rrc, eptr, ecode, offset_top, md, ims, eptrb, 0); ! eptr--; ! if (rrc != MATCH_NOMATCH) ! RRETURN(rrc); } } /* Get here if we can't make it match with any permitted repetitions */ ! RRETURN(MATCH_NOMATCH); } /* Control never gets here */ *************** *** 6794,6800 **** default: DPRINTF(("Unknown opcode %d\n", *ecode)); ! return PCRE_ERROR_UNKNOWN_NODE; } /* Do not stick any code in here without much thought; it is assumed --- 7301,7307 ---- default: DPRINTF(("Unknown opcode %d\n", *ecode)); ! RRETURN(PCRE_ERROR_UNKNOWN_NODE); } /* Do not stick any code in here without much thought; it is assumed *************** *** 6806,6811 **** --- 7313,7375 ---- } + /*************************************************************************** + **************************************************************************** + RECURSION IN THE match() FUNCTION + + Undefine all the macros that were defined above to handle this. 
*/ + + #ifdef NO_RECURSE + #undef eptr + #undef ecode + #undef offset_top + #undef ims + #undef eptrb + #undef flags + + #undef callpat + #undef charptr + #undef data + #undef lastptr + #undef next + #undef pp + #undef prev + #undef saved_eptr + + #undef new_recursive + + #undef cur_is_word + #undef condition + #undef minimize + #undef prev_is_word + + #undef original_ims + + #undef ctype + #undef length + #undef max + #undef min + #undef number + #undef offset + #undef op + #undef save_capture_last + #undef save_offset1 + #undef save_offset2 + #undef save_offset3 + #undef stacksave + + #undef newptrb + + #endif + + /* These two are defined as macros in both cases */ + + #undef fc + #undef fi + + /*************************************************************************** + ***************************************************************************/ + /************************************************* *************** *** 6832,6838 **** < -1 => some kind of unexpected problem */ ! int pcre_exec(const pcre * external_re, const pcre_extra * extra_data, const char *subject, int length, int start_offset, int options, int *offsets, int offsetcount) --- 7396,7402 ---- < -1 => some kind of unexpected problem */ ! EXPORT int pcre_exec(const pcre * external_re, const pcre_extra * extra_data, const char *subject, int length, int start_offset, int options, int *offsets, int offsetcount) *************** *** 6872,6878 **** if (extra_data != NULL) { register unsigned int flags = extra_data->flags; if ((flags & PCRE_EXTRA_STUDY_DATA) != 0) ! study = extra_data->study_data; if ((flags & PCRE_EXTRA_MATCH_LIMIT) != 0) match_block.match_limit = extra_data->match_limit; if ((flags & PCRE_EXTRA_CALLOUT_DATA) != 0) --- 7436,7442 ---- if (extra_data != NULL) { register unsigned int flags = extra_data->flags; if ((flags & PCRE_EXTRA_STUDY_DATA) != 0) ! 
study = (const pcre_study_data *) extra_data->study_data; if ((flags & PCRE_EXTRA_MATCH_LIMIT) != 0) match_block.match_limit = extra_data->match_limit; if ((flags & PCRE_EXTRA_CALLOUT_DATA) != 0) *************** *** 6907,6912 **** --- 7471,7477 ---- match_block.lcc = re->tables + lcc_offset; match_block.ctypes = re->tables + ctypes_offset; + /* The ims options can vary during the matching as a result of the presence of (?ims) items in the pattern. They are kept in a local variable so that restoring at the exit of a group is easy. */ *************** *** 6922,6928 **** if (re->top_backref > 0 && re->top_backref >= ocount / 3) { ocount = re->top_backref * 3 + 3; ! match_block.offset_vector = (int *) (malloc) (ocount * sizeof(int)); if (match_block.offset_vector