summaryrefslogtreecommitdiffstats
path: root/regex.c
Commit message (Collapse)AuthorAgeFilesLines
* Printing of regular expression objects implemented.Kaz Kylheku2014-10-041-1/+151
| | | | | | | | * regex.c (regex_print): New static function. (regex_obj_ops): Registered regex_print. (print_class_char, paren_print_rec, print_rec): New static functions. * dep.mk: Regenerated.
* Keep regex source code in regex objects, in anticipationKaz Kylheku2014-10-041-2/+13
| | | | | | | | | of pretty-printing. Fix object construction bugs. * regex.c (struct regex): New member, source. (regex_mark): Ensure source is visited by garbage collector. (regex_compile): Store regex_sexp in source. Fix violations of section 3.2 of HACKING document.
* Using unified COBJ representation for both regex kinds,Kaz Kylheku2014-10-021-29/+42
| | | | | | | | | | | | | | | | | | | | | | | | rather than the list-based notation for derivative-based regexes, and an encapsulated COBJ for NFA-based regexes. * lib.c (compiled_regex_s): Variable removed. (obj_init): Initialization of compiled_regex_s removed. * lib.h (compiled_regex_s): Declaration removed. * regex.c (struct regex, regex_t): New type. (regex_destroy): Object is now a regex_t, not nfa_t. (regex_mark): New function. (regex_obj_ops): Register regex_mark operation. (reg_nullable, reg_derivative): Remove cases that handles compiled_regex_s. (regex_compile): Output of dv_compile_regex becomes a cobj nwo. Output of nfa_compile_regex must be embedded in regex_t structure. (regexp): Drop the check for compiles_regex_s. (regex_nfa): Function removed. (regex_run, regex_machine_init): Use cobj_handle to retrieve regex_t * pointer and dispatch appropriate code based on regex->kind.
* GC correctness fixes: make sure we pin down objects for which we borrowKaz Kylheku2014-08-251-1/+8
| | | | | | | | | | | | | | | low level C pointers, while we execute code that can cons memory. * lib.c (list_str): Protect the str argument. (int_str): Likewise. * regex.c (search_regex): protect the haystack string, while using the h pointer to its data, since regex_run can use the derivative-based engine which conses. * stream.c (vformat_str): Protect str argument, since put_char might conceivably cons. (vformat): Protect fmtstr.
* * Makefile, arith.c, arith.h, combi.c, combi.h, configure, debug.c,Kaz Kylheku2014-07-231-16/+16
| | | | | | | | debug.h, eval.c, eval.h, filter.c, filter.h, gc.c, gc.h, hash.c, hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, stream.c, stream.h, syslog.c, syslog.h, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Synchronize license header with LICENSE.
* * eval.c (eval_init): register range_regex and tok_whereKaz Kylheku2014-06-261-0/+13
| | | | | | | | | | | | | | as intrinsics. * lib.c (tok_where): New function. * lib.h (tok_where): Declared. * regex.c (range_regex): New function. * regex.h (range_regex): Declared. * txr.1: Documented tok-where and range-regex.
* * eval.c, gc.c, rand.c, regex.c, signal.c: Remove inclusion of unneededKaz Kylheku2014-04-131-1/+0
| | | | headers.
* * parser.l (regex_parse, lisp_parse): Fix neglected handling ofKaz Kylheku2014-03-141-1/+1
| | | | | | | | optional arguments. This problem can cause the symbol : to be planted as the std_error stream, resulting in an error loop that blows the stack. * regex.c (regex_compile): Likewise.
* Issue: match_regex and search_regex were continuing to feed charactersKaz Kylheku2014-03-091-20/+44
| | | | | | | | | | | | | | | | | | | | to the regex machine even when there is no transition available. This was due to the broken return value protocol of regex_machine_feed. For instance for the regex / +/ (one or more spaces), after matching some spaces, it would report REGM_INCOMPLETE for additional non-space characters, never reporting REGM_FAIL. * regex.c (regm_result_t): Block comment added, documenting protocol. (regex_machine_feed): Return REGM_FAIL if there are no transitions for the given character, even a partial match has been recorded. This is a signal to stop feeding more characters. At that point, the function can be called with a null character to distinguish the three cases: fail, partial or full match. (search_regex): Now when the search loop gets a REGM_FAIL, it can no longer assume that nothing was matched and the search must restart at the next position. Upon the REGM_FAIL signal, it is necesary to seal the search by feeding in the 0 character. Only if that returns REGM_FAIL is it a no match situation. Otherwise it is actually a match!
* Replacing uses of the eq function which are used only as C booleans,Kaz Kylheku2014-02-221-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | with just using the == operator. Removing cobj_equal_op since it's indistinguishable from eq. Streamlining missingp and null_or_missing_p. * eval.c (transform_op): eq to ==. (c_var_ops): cobj_equal_op to eq. * filter.c (trie_compress, trie_lookup_feed_char, filter_string_tree, html_hex_continue, html_dec_continue): eq to ==. * hash.c (hash_iter_ops): cobj_equal to eq. * lib.c (countq, getplist, getplist_f, search_str_tree, posq): eq to ==. (cobj_equal_op): Function removed. * lib.h (cobj_equal_op): Declaration removed. (missingp): Becomes a simple macro that yields a C boolean instead of t/nil val, because it's only used that way. (null_or_missing_p): Becomes inline function returning int. * match.c (v_output): eq to ==. * rand.c (random_state_ops): cobj_equal_op to eq. * regex.c (char_set_obj_ops, regex_obj_ops): cobj_equal_op to eq. (reg_derivative): Silly if3 expression replaced by null. (regexp): Redundant if2 expression wrapped around eq removed. * stream.c (null_ops, stdio_ops, tail_ops, pipe_ops, string_in_ops, byte_in_ops, string_out_ops, strlist_out_ops, dir_ops, cat_stream_ops): cobj_equal_op to eq. * syslog.c (syslog_strm_ops): cobj_equal_op to eq.
* The C function nullp is being renamed to null, and the rarelyKaz Kylheku2014-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | used global variable null which holds a symbol becomes null_s. A new macro called nilp is added that more efficiently checks whether an object is nil, producing a C boolean value rather than t or nil. Most of the uses of nullp in the codebase just become the more streamlined nilp. * debug.c (show_bindings): nullp to nilp * eval.c (lookup_var, lookup_var_l, lookup_fun, lookup_sym_lisp1, do_eval, expand_qquote, expand_quasi, expand_op): nullp to nilp. (op_modplace): nullp to null. (eval_init): Update registration of null and not from C function nullp to null. * filter.c (trie_compress, html_hex_continue): nullp to nil. (filter_string_tree): null to null_s. * hash.c (hash_next): nullp to nilp. * lib.c (null): Variable renamed to null_s. (code2type): null to null_s. (lazy_flatten_scan, chainv, lazy_str, lazy_str_force_upto, obj_print, obj_pprint): nullp to nilp. (obj_init): null to null_s; nullp to null. * lib.h (null): declaration changed to null_s. (nullp): Inline function renamed to null. (nilp): New macro. * match.c (do_match_line): nullp to nilp. * rand.c (make_random_state): Likewise. * regex.c (compile_regex): Likewise.
* * arith.c (lognot): Conform to new scheme for defaulting optional args.Kaz Kylheku2014-02-051-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * eval.c (apply): Unconditionally use colon_k for missing optional args, for intrinsic functions. (eval_intrinsic, rangev, rangev_star, errno_wrap): Conform to new scheme for defaulting optional args. (reg_fun_mark): Function removed. (eval_init): Switch reduce_left and reduce_right back to reg_fun registration. * hash.c (gethash_n): Conform to new scheme for defaulting optional arguments. * lib.c (sub_list, replace_list, remove_if, keep_if, remove_if_lazy, keep_if_lazy, tree_find, count_if, some_satisfy, all_satisfy, none_satisfy, search_str, match_str, match_str_tree, sub_str, replace_str, cat_str, tok_str, intern, rehome_sym, sub_vec, replace_vec, lazy_str, sort, multi_sort, find, find_if, set_diff, obj_print, obj_pprint): Conform to new scheme for defaulting optional arguments. (func_f0, func_f1, func_f2, func_f3, func_f4, func_n0, func_n1, func_n2, func_n3, func_n4, func_n5, func_n6, func_n7, func_f0v, func_f1v, func_f2v, func_f3v, func_f4v, func_n0v, func_n1v, func_n2v, func_n3v, func_n4v, func_n5v, func_n6v, func_n7v): Remove references to removed mark_missing_args member of struct func. (func_set_mark_missing): Function removed. (generic_funcall): Unconditionally use colon_k for missing optional args, for intrinsic functions. * lib.h (struct func): mark_missing_args member removed. (func_set_mark_missing): Declaration removed. (default_arg, default_bool_arg): New inline functions. * rand.c (random): Left argument is not optional. (rnd): Conform to new scheme for defaulting optional arguments. * regex.c (search_regex, match_regex): Conform to new scheme for defaulting optional arguments. * stream.c (unget_char, unget_byte, put_string, put_char, put_byte, put_line): Conform to new scheme for defaulting optional arguments. * syslog.c (openlog_wrap): Conform to new scheme for defaulting optional arguments. * txr.1: Remove the specification that nil is a sentinel value in default arguments, where necessary. Use consistent syntax for specifying variable parts in argument lists. A few errors and omissions addressed.
* * regex.c (match_regex_right): Fix not returning value.Kaz Kylheku2014-01-291-0/+2
|
* * regex.c (match_regex_right): Fix semantics of second argumentKaz Kylheku2014-01-271-5/+6
| | | | | | | | to something more useful. * regex.h (match_regex_right): Change name of parameter. * txr.1: Documented match-regex-right.
* * regex.c (match_regex_right): New function.Kaz Kylheku2014-01-261-0/+20
| | | | | | * regex.h (match_regex_right): Declared. * eval.c (eval_init): Register match_regex_right as instrinsic.
* Changes to the list collection mechanism to improveKaz Kylheku2014-01-221-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the extension of list operations over vectors and strings. * eval.c (do_eval_args, bindings_helper, op_each, subst_vars, supplement_op_syms, mapcarv, mappendv): Switch from list_collect_* macros to functions. * lib.c (copy_list): Switch from list_collect* macros to functions. Use list_collect_nconc for the final terminator. Doing a copy there with list_collect_append was actually wasteful, and now that list_collect_append calls copy_list in places, it triggered runaway recursion. (make_like): Bugfix: list_vector was used instead of vector_list. (to_seq, list_collect, list_collect_nconc, list_collect_append): New functions. (append2, appendv, nappend2, sub_list, replace_list, ldiff, remq, remql, remqual, remove_if, keep_if, proper_plist_to_alist, improper_plist_to_alist, split_str, split_str_set, tok_str, list_str, chain, andf, orf, lis_vector, mapcar, mapcon, mappend, merge, set_diff, env): Switch from list_collect* macros to functions. (replace_str, replace_vec): Allow single item replacement sequence. * lib.h (to_seq): Declared. (list_collect, list_collect_nconc, list_collect_append): Macros removed, replaced by function declarations of the same name. These functions return the new ptail since they cannot assign to it, requiring all uses to be updated to do the assignment of the returned value. (list_collect_decl): Use val rather than obj_t *. * match.c (vars_to_bindings, h_coll, subst_vars, extract_vars, extract_bindings, do_output_line, do_output, v_gather, v_collect): Switch from list_collect* macros to functions. * parser.y (o_elems_transform): Likewise. * regex.c (dv_compile_regex, regsub): Likewise. * txr.c (txr_main): Likewise.
* Bugfix in regex char ranges affecting ranges whose upper endKaz Kylheku2014-01-131-4/+7
| | | | | | | | | | | | | corresponds to the high bit of a bitmap cell: for instance the character \x7f when the cell size is 32 bits. * regex.c (BITCELL_ALL1): Unused macro removed. (BITCELL_BIT): New macro to replace occurrences of a repeated expression. (CHAR_SET_INDEX, CHAR_SET_BIT): Updated to use BITCELL_BIT. (L0_fill_range): Bugfix: the mask1 calculation was producing all-zero in the condition bt1 == BITCELL_BIT; it should produce an all-ones mask.
* First cut at signal handling support.Kaz Kylheku2013-12-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Makefile (OBJS-y): Include signal.o if have_posix_sigs is "y". * configure (have_posix_sigs): New variable, set by detecting POSIX signal stuff. * dep.mk: Regenerated. * arith.c, debug.c, eval.c, filter.c, hash.c, match.c, parser.y, parser.l, rand.c, regex.c, syslog.c, txr.c, utf8.c: Include new signal.h header, now required by unwind, and the <signal.h> system header. * eval.c (exit_wrap): New function. (eval_init): New functions registered as intrinsics: exit_wrap, set_sig_handler, get_sig_handler, sig_check. * gc.c (release): Unused functions removed. * gc.h (release): Declaration removed. * lib.c (init): Call sig_init. * stream.c (set_putc, se_getc, se_fflush): New static functions. (stdio_put_char_callback, stdio_get_char_callback, stdio_put_byte, stdio_flush, stdio_get_byte): Use new functions to enable signals when blocked on I/O. (tail_strategy): Allow signals across sleep. (pipev_close): Allow signals across waitpid. (se_pclose): New static function. (pipe_close): Use new function to enable signals across pclose. * unwind.c (uw_unwind_to_exit_point): use extended_longjmp instead of longjmp. * unwind.h (struct uw_block, struct uw_catch): jb member changes from jmp_buf to extended_jmp_buf. (uw_block_begin, uw_simple_catch_begin, uw_catch_begin): Use extended_setjmp instead of setjmp. * signal.c: New file. * signal.h: New file.
* Bumping copyrights to 2014 and expressing them as year ranges.Kaz Kylheku2013-12-101-1/+1
| | | | Fixing some errors in copyright comments.
* * eval.c (eval_init): Update registration of regex-compileKaz Kylheku2013-12-061-3/+3
| | | | | | | | | | | | | | | | | | | to reflect that it has two arguments now. * parser.y (grammar): Update calls to regex_compile to pass two arguments. Since we don't expect regex_compile to parse, we specify the error stream as nil. (spec): The "secret syntax" for a regex is simplified not to include the slashes. This provides better diagnostics for unterminated syntax and requires less string processing to generate. Also, the form returned doesn't have the regex symbol consed onto it, which parse_regex just throws away. * regex.c (regex_compile): Now takes a stream argument. * regex.h (regex_compile): Declaration updated. * txr.1: Updated
* * regex.c (regex_compile): Handle string input.Kaz Kylheku2013-12-051-1/+5
| | | | | | | * regex.h (regex_compile): Don't call argument regex_sexp, since it can be a string. * txr.1: Updated.
* * regex.c (regex_space_chars): Variable removed.Kaz Kylheku2012-04-201-22/+16
| | | | | | | | | (char_set_addr_str): New function. (char_set_compile): Use char_set_addr_str to add spaces to set. (init_special_char_sets): Use char_set_addr_str to add spaces to set. Bugfix: word_cs, cword_cs wrongly initialized. (regex_init): Removed reference to regex_space_chars.
* * parser.y (regtoken): New nonterminal symbol.Kaz Kylheku2012-04-201-1/+30
| | | | | | | | | | | | | | | | (regterm): REGTOKEN production factored out to regtoken. (regclass): Reverted prior commmit's changes. (regclassterm): Reverted prior commit, removing REGTOKEN production for character classes, and introduced a regtoken production. So now the keyword symbols are part of the character class abstract syntax. (regtoken): New production rule. * regex.c (regex_space_chars): Converted to internal linkage. (char_set_compile): Handle token keywords in character class abstract syntax. * regex.h (regex_space_chars): External declaration removed.
* First cut at implementing \s, \d, \w, \S, \D and \W regex tokens.Kaz Kylheku2012-04-191-3/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * lib.c (init): Call regex_init. * parser.l: return new REGTOKEN kind. * parser.y (REGTOKEN): New token type. (REGTERM): Translate REGTERM to keyword. (regclass): Restructured to handle inherited nodes as lists. (regclassterm): Produce $$ as list. Add handling for REGTOKEN occurring inside character class by expanding it. This might not be the best approach. (yybadtoken): Handle REGTOKEN in switch. * regex.c (struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set): New bitfield member, stat. (char_set_create): New parameter for indicating static char set. (char_set_destroy): Do not free a static char set. (char_set_compile): Pass zero to new parameter of char_set_create. (spaces): New static array. (space_cs, digit_cs, word_cs, cspace_cs, cdigit_cs, cword_cs): New static pointers to char_set_t. (init_special_char_sets, nfa_compile_given_set): New static function. (nfa_compile_regex, dv_compile_regex): Handle new character set token keywords. (space_k, digit_k, word_char_k, cspace_k, cdigit_k, cword_char_k, regex_space_chars): New variables. (regex_init): New function. * regex.h (space_k, digit_k, word_char_k, cspace_k, cdigit_k, cword_char_k, regex_space_chars, regex_init): Declared.
* Improve the regex Lisp syntax by allowing strings to specifyKaz Kylheku2012-04-121-4/+12
| | | | | | | | | | | character compounds. I.e. the syntax "foo" is equivalent to the cumbersome canonical form (compound #\f #\o #\o). * regex.c (nfa_compile_regex, dv_compile_regex): Use chrp function instead of typeof. Handle stringp case by forming a compound out of the characters and recursing. Check for some bad objects in the regex that would never come out of our regex parser but could occur in a "hand crafted" syntax tree.
* * eval.c (eval_init): Expose regex-compile and regexp as intrinsics.Kaz Kylheku2012-04-101-0/+5
| | | | | | | | | | | | | * lib.c (obj_init): Change spelling of nongreedy operator and put it into the user package so that it is available for use with regex-compile. * regex.c (match_regex, search_regex): Bugfix: optional start position argument argument not defaulting to zero. * txr.1: Documented regex-compile and regexp. * txr.vim: Highlighting regex-compile and regexp.
* Changing type function to not blow up on nil, which makes a lot of codeKaz Kylheku2012-03-171-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | simpler. A pseudo type code is introduced called NIL with value 0. * lib.h (enum type): New enumeration value, NIL. (type): Function accepts object nil and maps it to code NIL. * eval.c (dwim_loc, op_dwim): test for nil obj and goto hack is gone, just handle NIL in the switch. * gc.c (make_obj, mark): Handle new NIL type code in switch. * hash.c (equal_hash): Handle NIL in the switch instead of nil test. * lib.c (code2type): Map new NIL type code to null. (typeof, typecheck): Code simplified. (class_check, car): Move nil test into switch. (eql, equal, consp, bignump, stringp, lazy_stringp, symbolp, functionp, vectorp, cobjp): Simplified. (length, sub, ref, refset, replace, obj_print, obj_pprint): Handle NIL in switch instead of nil test. goto hack removed from refset. * match.c (do_match_line, do_output_line): switch condition simplified. * regex.c (regexp): Simplified. (regex_nfa): Assert condition simplified.
* * regex.c (regsub): the replacement argumentKaz Kylheku2012-03-131-1/+4
| | | | | | can now be a function of one argument which maps the original piece of text matched by the regex to a replacement text.
* Bug #35718. Workaround good enough to get some code working.Kaz Kylheku2012-03-041-1/+1
| | | | | | | | | | | | * eval.c (cons_find): New function. (expand_op): Use cons_find rather than tree_find to look for rest_gensym. * regex.c (regsub): Rearranged arguments so that the string is last. This is better for partial evaluaton via the op operator. * regex.h (regsub): Updated declaration.
* * eval.c (eval_init): New intrinsic function, regsub.Kaz Kylheku2012-03-041-0/+28
| | | | | | | | * regex.c (regsub): New function. * regex.h (regsub): Declared. * txr.1: Doc stub added.
* * arith.c: Updated copyright year.Kaz Kylheku2012-02-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * arith.h: Likewise. * debug.c: Added copyright header. * debug.h: Updated copyright year. * eval.c: Likewise. * eval.h: Likewise. * filter.c: Likewise. * filter.h: Likewise. * gc.c: Likewise. * gc.h: Likewise. * hash.c: Likewise. * hash.h: Likewise. * lib.c: Likewise. * lib.h: Likewise. * match.c: Likewise. * match.h: Likewise. * parser.h: Likewise. * regex.c: Likewise. * regex.h: Likewise. * stream.c: Likewise. * stream.h: Likewise. * txr.c: Likewise, and e-mail address. * txr.h: Updated copyright year. * unwind.c: Likewise. * unwind.h: Likewise.
* We don't include headers in headers in this project.Kaz Kylheku2011-10-301-0/+1
| | | | | | | | * parser.h: Do not include <stdio.h> * regex.c: Include <limits.h> * regex.h: Do not include <limits.h>
* Improved support for broken unicode.Kaz Kylheku2011-10-101-1/+38
| | | | | | | | | | | | | | | | | | | | | | | | | Regex support for extra-large character sets not compiled in if wchar_t is not wide enough for it. The utf-8 properly throws exceptions when encountering characters that it cannot represent, instead of silently ignoring the situation and continuing with incorrectly computed data. * regex.c (FULL_UNICODE): New macro. (CHAR_SET_L3, CHAR_SET_L2_LO, CHAR_SET_L2_HI): Only defined if full unicde is available. (CHSET_XLARGE, cset_L3_t, struct xlarge_char_set, L2_full, L3_fill_range, L3_contains): Ditto. (unon char_set): Member x1 present only under FULL_UNICODE. (char_set_destroy, char_set_add, char_set_add_range, char_set_contains): CHSET_XLARGE cases only available on FULL_UNICODE. (char_set_compile): Default cst variable to CHSET_LARGE. * utf8.c (FULL_UNICODE): New macro. (conversion_error): New function. (utf8_from_uc): Throw error if not FULL_UNICODE and character is outside the BMP. (utf8_decode): Likewise.
* * LICENSE, Makefile, configure, filter.c, filter.h, gc.c, gc.h, hash.c,Kaz Kylheku2011-10-041-1/+1
| | | | | | hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated e-mail address.
* * LICENSE, Makefile, configure, gc.c, gc.h, hash.c, hash.h, lib.c,Kaz Kylheku2011-09-231-1/+1
| | | | | | lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated copyright year.
* Bump copyrights to 2010.Kaz Kylheku2010-10-051-1/+1
|
* Fix inaccurate comment.Kaz Kylheku2010-01-261-4/+4
|
* Optimization in derivative-based regex engine.Kaz Kylheku2010-01-261-1/+54
| | | | | | | | Exponential memory consumption behavior was observed when matching the input aaaaaa.... against the regex a?a?a?a?....aaaa.... The fix is to eliminate common subexpressions from the derivative for the or operator.
* * regex.c (reg_derivative_list, reg_derivative): RecognitionKaz Kylheku2010-01-181-6/+29
| | | | | | | | | of cases to reduce consing. In reg_derivative_list, we avoid consing the full or expression if either branch is t, and also save a cons when the first element has a null derivative. In reg_derivative the oneplus and zeroplus cases are split, since zeroplus can re-use the input expression, when it's just a one-character match, deriving nil.
* Adjust semantics of non-greedy operator R%S, to avoid the brokenKaz Kylheku2010-01-181-3/+9
| | | | | | | | case whereby R%S matches nothing at all when S is not empty but equivalent to empty, or more generally when S is nullable. A much nicer definition is ``the intersection of R* and the set of all strings that do not contain a non-empty substring that matches S, followed by S''.
* Implemented non-greedy operator.Kaz Kylheku2010-01-151-1/+20
|
* * regex.c (reg_derivative_list): Bugfix: wrong algebra,Kaz Kylheku2010-01-151-1/+1
| | | | taking a double derivative of the first item.
* * regex.c (reg_derivative): Bugfix: remove invalidKaz Kylheku2010-01-141-9/+1
| | | | algebraic reductions in the derivative for the operator.
* Dynamically determine which regex implementation to use:Kaz Kylheku2010-01-131-2/+30
| | | | | | | NFA or derivatives. The default behavior is NFA, with derivatives used if the regular expression contains uses of complement or intersection. The --dv-regex option forces derivatives always.
* Impelement derivative-based regular expressions.Kaz Kylheku2010-01-131-248/+557
|
* Remove incorrect implementation of extendedKaz Kylheku2010-01-061-273/+32
| | | | | regex operations (complement, intersection). The syntax extensions documentation are retained.
* Implemented the regular expression ~ and & operators.Kaz Kylheku2010-01-051-32/+273
| | | | | | | | | | | | | | | This turns out to be easy to do in NFA land. The complement of an NFA has exactly the same number and configuration of states and transitions, except that the states have an inverted meaning; and furthermore, failed character transitions are routed to an extra state (which in this impelmentation is permanently allocated and shared by all regexes). The regex & is implemented trivially using DeMorgan's. Also, bugfix: regular expressions like A|B|C are allowed now by the syntax, rather than constituting syntax error. Previously, this would have been entered as (A|B)|C.
* All COBJ operations have default implementations now;Kaz Kylheku2009-12-081-6/+5
| | | | | | no null pointer check over struct cobj_ops operations. New typechecking function for COBJ objects.
* Eliminate the void * disease. Generic pointers are of mem_t *Kaz Kylheku2009-12-041-1/+1
| | | | | from now on, which is compatible with unsigned char *. No implicit conversion to or from this type, in C or C++.
* Code cleanup. All private functions static. Private stuffKaz Kylheku2009-11-281-36/+136
| | | | in regex module not exposed in header. Etc.