summaryrefslogtreecommitdiffstats
path: root/regex.h
Commit message (Collapse)AuthorAgeFilesLines
* New function rra.Kaz Kylheku2016-10-031-0/+2
| | | | | | | | | * regex.c (range_regex_all, regex_range_all): New functions. (regex_init): Register rra intrinsic function. * regex.c (range_regex_all, regex_range_all): Declared. * txr.1: Documented rra.
* New rr function.Kaz Kylheku2016-10-031-0/+1
| | | | | | | | | | * regex.c (regex_range_search): New function. (regex_init): Register regex_range_search as rr intrinsic. * regex.h (regex_range_search): Declared. * txr.1: Documented rr, and added reference to it in description of regex-range.
* Synchronize license comments with LICENSE.Kaz Kylheku2016-10-011-16/+17
| | | | | | | | | | | | | | | | | | | | * Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/except.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Revert to verbatim 2-Clause BSD.
* New function: regex-source.Kaz Kylheku2016-09-251-0/+1
| | | | | | | | | * regex.c (regex_source): New function. (regex_init): regex-source intrinsic registered. * regex.h (regex_source): Declared. * txr.1: Documented.
* New regex functions: m^$, m^, m$, and others.Kaz Kylheku2016-09-231-0/+9
| | | | | | | | | | | | | | | | | | | * regex.c (do_match_full, do_match_full_offs, do_match_left, do_match_left_offs, do_match_right, do_match_right_offs): New static functions. (regex_match_full_fun, regex_match_right_fun, regex_match_full, regex_match_left, regex_match_right, regex_range_full, regex_range_left, regex_range_right): New functions. (regex_init): Register f^$, f^, f$, m^$, m^, m$, r^$, r^ and r$ intrinsics. * regex.h (regex_match_full_fun, regex_match_right_fun, regex_match_full, regex_match_left, regex_match_right, regex_range_full, regex_range_left, regex_range_right): Declared. * txr.1: Documented new functions.
* Fix match-regex not conforming to documentation.Kaz Kylheku2016-09-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The documentation says that match-regex returns the length. Actually, it returns the position after the last character matched. This makes a difference when the match doesn't begin at character zero. The actual behavior is that of the match_regex C function which has behaved that way since the dawn of TXR, and internals depend on it behaving that way. So the internal function is being retained, and a new function is being registered as the match-regex intrinsic. The choice of binding for match-regex is subject to the compatibility option. The behavior of match-regst is also being fixed since its return value is incorrect due to this issue. Since its return value makes no sense at all (does not represent the matched text), it is not subject to the compatibility option; it is just fixed to conform with the documentation. * regex.c (match_regex_len): New function. (match_regst): Keep using match_regex, but use its return value properly. This simplifies the range extraction code, which is why match_regex works that way in the first place. (regex_init): Register match-regex to match_regex_len, unless compatibility <= 150 is requested; then register to match_regex. * regex.h (match_regex_len): Declared. * txr.1: Compatibility notes added.
* New --free-all option for freeing memory on exit.Kaz Kylheku2016-06-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although we are garbage-collected, being able to clean up on shutdown is nevertheless useful for uncovering leaks. Leaks can occur, for instance, due to neglect to free out-of-heap satellite data from objects that are reclaimed by gc. This feature is long overdue. * arith.c, arith.h (arith_free_all): New function. * gc.c, gc.h (gc_free_all): New function. * lib.c (init): Remove program name parameter and redundant initialization of progname globl variable. * lib.h (progname): Superfluous declaration removed. This is already declared in txr.h. (init): Declaration updated. * regex.c (char_set_destroy): Do not check the static allocation flag here; just destroy the object. Do check for a null pointer, though. (char_set_cobj_destroy): This cobj destructor now checks the static flag of the char set object and avoids freeing it. Thus our char set singletons are left alone by gc, but our global freeing function takes care of them. (wide_cs): New static variable moved out of wide_display_char_p to static scope. (regex_free_all): New function. * regex.h (regex_free_all): Declared. * txr.c (progname): const qualifier and initializer removed. (main): Ensure progname is always dynamically allocated, even in the argv[0] == 0 case. Do not pass progname to init; it doesn't take that argument any more. (free_all): New static function. (txr_main): Implement --free-all option. * txr.h (progname): Declaration updated.
* read-until-match can optionally keep matched text.Kaz Kylheku2016-04-201-1/+1
| | | | | | | | | | | | | | | | | | | | * regex.c (read_until_match): New argument, include_match. Three times repeated termination code refactored into block reached by forward goto. (regex_init): Registration of read-until-match updated. * regex.h (read_until_match): Declaration updated. * stream.c (struct record_adapter_base): New member, include_match. (record_adapter_get_line): Pass match to read_until_match as new argument. (record_adapater): New argument, include_match. (stream_init): Update registration of record-adapter. * stream.h (record_adapter): Declaration updated. * txr.1: Updated.
* Record-delimiting stream adapter.Kaz Kylheku2016-01-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | * regex.c (read_until_match): New function. (regex_init): Registered read-until-match intrinsic. * regex.h (read_until_match): Declared. * stream.c (struct delegate_base): New struct type. (delegate_base_mark, delegate_put_string, delegate_put_char, delegate_put_byte, delegate_get_char, delegate_get_byte, delegate_unget_char, delegate_unget_byte, delegate_close, delegate_flush, delegate_seek, delegate_truncate, delegate_get_prop, delegate_set_prop, delegate_get_error, delegate_get_error_str, delegate_clear_error, make_delegate_stream): New static functions. (struct record_adapter_base): New struct type. (record_adapter_base_mark, record_adapter_mark_op, record_adapter_get_line): New static functions. (record_adapter_ops): New static structure. (record_adapter): New function. (stream_init): Registered record-adapter intrinsic. * stream.h (record_adapter): Declared. * txr.1: Documented read-until-match and record-adapter.
* Copyright year bump.Kaz Kylheku2015-12-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | * LICENSE, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/cadr.tl, share/txr/stdlib/except.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Add 2016 copyright. * linenoise/LICENSE, linenoise/linenoise.c, linenoise/linenoise.h: Bump one principal author's copyright from 2014 to 2015. The code is based on a snapshot of 2015 upstream work.
* Count East Asian Wide and Full Fidth chars as two columns.Kaz Kylheku2015-08-101-1/+1
| | | | | | | | | | | | | | | | | * regex.c (create_wide_cs): New static function. (wide_display_char_p): New function. * regex.h (wide_display_char_p): Declared. * stream.c (put_string, put_char): Use wide_display_char_p to determine whether an extra column need be counted. Also bugfix: iswprint evidently cannot be relied to work over the entire Unicode range, at least not in the C locale. Glibc's version and is reporting valid Japanese characters as unprintable on Ubuntu. As a hack we instead check for control characters and invert the result: control chars are unprintable. * tests/009/json.expected: Updated.
* String-returning wrappers for some regex matching functions.Kaz Kylheku2015-02-201-0/+3
| | | | | | | | | | | * eval.c (eval_init): Register search-regst, match-regst and match-regst-right intrinsics. * regex.c (search_regst, match_regst, match_regst_right): New functions. * regex.h (search_regst, match_regst, match_regst_right): Declared. * txr.1: Documented new variants.
* Update copyright notices from 2014 to 2015.Kaz Kylheku2015-02-011-1/+1
| | | | | | | | | | | * arith.c, arith.h, combi.c, combi.h, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, gc.c, gc.h, hash.c, hash.h, lib.c, lib.h, match.c, match.h, parser.h, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, stream.c, stream.h, sysif.c, sysif.h, syslog.c, syslog.h, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Update. * LICENSE, METALICENSE: Likewise.
* * eval.c (eval_init): Register chr_isblank and chr_isunisp asKaz Kylheku2014-10-111-0/+2
| | | | | | | | | | | | | | | | intrinsics. * lib.c (chr_isblank, chr_isunisp): New functions. * lib.h (chr_isblank, chr_isunisp): Declared. * regex.h (spaces): Declaration for existing variable added. * txr.1: Documented chr-isblank and chr-isunisp. * genvim.txr: Add missing sysif.c. * txr.vim: Regenerated.
* * Makefile, arith.c, arith.h, combi.c, combi.h, configure, debug.c,Kaz Kylheku2014-07-231-16/+16
| | | | | | | | debug.h, eval.c, eval.h, filter.c, filter.h, gc.c, gc.h, hash.c, hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, stream.c, stream.h, syslog.c, syslog.h, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Synchronize license header with LICENSE.
* * eval.c (eval_init): register range_regex and tok_whereKaz Kylheku2014-06-261-0/+1
| | | | | | | | | | | | | | as intrinsics. * lib.c (tok_where): New function. * lib.h (tok_where): Declared. * regex.c (range_regex): New function. * regex.h (range_regex): Declared. * txr.1: Documented tok-where and range-regex.
* * regex.c (match_regex_right): Fix semantics of second argumentKaz Kylheku2014-01-271-1/+1
| | | | | | | | to something more useful. * regex.h (match_regex_right): Change name of parameter. * txr.1: Documented match-regex-right.
* * regex.c (match_regex_right): New function.Kaz Kylheku2014-01-261-0/+1
| | | | | | * regex.h (match_regex_right): Declared. * eval.c (eval_init): Register match_regex_right as instrinsic.
* Bumping copyrights to 2014 and expressing them as year ranges.Kaz Kylheku2013-12-101-1/+1
| | | | Fixing some errors in copyright comments.
* * eval.c (eval_init): Update registration of regex-compileKaz Kylheku2013-12-061-1/+1
| | | | | | | | | | | | | | | | | | | to reflect that it has two arguments now. * parser.y (grammar): Update calls to regex_compile to pass two arguments. Since we don't expect regex_compile to parse, we specify the error stream as nil. (spec): The "secret syntax" for a regex is simplified not to include the slashes. This provides better diagnostics for unterminated syntax and requires less string processing to generate. Also, the form returned doesn't have the regex symbol consed onto it, which parse_regex just throws away. * regex.c (regex_compile): Now takes a stream argument. * regex.h (regex_compile): Declaration updated. * txr.1: Updated
* * regex.c (regex_compile): Handle string input.Kaz Kylheku2013-12-051-1/+1
| | | | | | | * regex.h (regex_compile): Don't call argument regex_sexp, since it can be a string. * txr.1: Updated.
* * parser.y (regtoken): New nonterminal symbol.Kaz Kylheku2012-04-201-1/+0
| | | | | | | | | | | | | | | | (regterm): REGTOKEN production factored out to regtoken. (regclass): Reverted prior commmit's changes. (regclassterm): Reverted prior commit, removing REGTOKEN production for character classes, and introduced a regtoken production. So now the keyword symbols are part of the character class abstract syntax. (regtoken): New production rule. * regex.c (regex_space_chars): Converted to internal linkage. (char_set_compile): Handle token keywords in character class abstract syntax. * regex.h (regex_space_chars): External declaration removed.
* First cut at implementing \s, \d, \w, \S, \D and \W regex tokens.Kaz Kylheku2012-04-191-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * lib.c (init): Call regex_init. * parser.l: return new REGTOKEN kind. * parser.y (REGTOKEN): New token type. (REGTERM): Translate REGTERM to keyword. (regclass): Restructured to handle inherited nodes as lists. (regclassterm): Produce $$ as list. Add handling for REGTOKEN occurring inside character class by expanding it. This might not be the best approach. (yybadtoken): Handle REGTOKEN in switch. * regex.c (struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set): New bitfield member, stat. (char_set_create): New parameter for indicating static char set. (char_set_destroy): Do not free a static char set. (char_set_compile): Pass zero to new parameter of char_set_create. (spaces): New static array. (space_cs, digit_cs, word_cs, cspace_cs, cdigit_cs, cword_cs): New static pointers to char_set_t. (init_special_char_sets, nfa_compile_given_set): New static function. (nfa_compile_regex, dv_compile_regex): Handle new character set token keywords. (space_k, digit_k, word_char_k, cspace_k, cdigit_k, cword_char_k, regex_space_chars): New variables. (regex_init): New function. * regex.h (space_k, digit_k, word_char_k, cspace_k, cdigit_k, cword_char_k, regex_space_chars, regex_init): Declared.
* Bug #35718. Workaround good enough to get some code working.Kaz Kylheku2012-03-041-1/+1
| | | | | | | | | | | | * eval.c (cons_find): New function. (expand_op): Use cons_find rather than tree_find to look for rest_gensym. * regex.c (regsub): Rearranged arguments so that the string is last. This is better for partial evaluaton via the op operator. * regex.h (regsub): Updated declaration.
* * eval.c (eval_init): New intrinsic function, regsub.Kaz Kylheku2012-03-041-0/+1
| | | | | | | | * regex.c (regsub): New function. * regex.h (regsub): Declared. * txr.1: Doc stub added.
* * arith.c: Updated copyright year.Kaz Kylheku2012-02-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * arith.h: Likewise. * debug.c: Added copyright header. * debug.h: Updated copyright year. * eval.c: Likewise. * eval.h: Likewise. * filter.c: Likewise. * filter.h: Likewise. * gc.c: Likewise. * gc.h: Likewise. * hash.c: Likewise. * hash.h: Likewise. * lib.c: Likewise. * lib.h: Likewise. * match.c: Likewise. * match.h: Likewise. * parser.h: Likewise. * regex.c: Likewise. * regex.h: Likewise. * stream.c: Likewise. * stream.h: Likewise. * txr.c: Likewise, and e-mail address. * txr.h: Updated copyright year. * unwind.c: Likewise. * unwind.h: Likewise.
* We don't include headers in headers in this project.Kaz Kylheku2011-10-301-2/+0
| | | | | | | | * parser.h: Do not include <stdio.h> * regex.c: Include <limits.h> * regex.h: Do not include <limits.h>
* * LICENSE, Makefile, configure, filter.c, filter.h, gc.c, gc.h, hash.c,Kaz Kylheku2011-10-041-1/+1
| | | | | | hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated e-mail address.
* * LICENSE, Makefile, configure, gc.c, gc.h, hash.c, hash.h, lib.c,Kaz Kylheku2011-09-231-1/+1
| | | | | | lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated copyright year.
* Bump copyrights to 2010.Kaz Kylheku2010-10-051-1/+1
|
* Impelement derivative-based regular expressions.Kaz Kylheku2010-01-131-22/+0
|
* Code cleanup. All private functions static. Private stuffKaz Kylheku2009-11-281-115/+1
| | | | in regex module not exposed in header. Etc.
* Changes to make the code portable to C++ compilers, whichKaz Kylheku2009-11-241-5/+5
| | | | can be taken advantage of for better diagnostics.
* Improving portability. It is no longer assumed that pointersKaz Kylheku2009-11-231-4/+4
| | | | | | | | can be converted to a type long and vice versa. The configure script tries to detect the appropriate type to use. Also, some run-time checking is performed in the streams module to detect which conversions specifier strings to use for printing numbers.
* Changing ``obj_t *'' occurences to a ``val'' typedef. (Ideally,Kaz Kylheku2009-11-201-7/+6
| | | | | we wouldn't have to declare object variables at all, so why use an obtuse syntax to do so?)
* Fixes for compliance to C89.Kaz Kylheku2009-11-171-10/+10
|
* Regular expression module updated to do unicode character sets.Kaz Kylheku2009-11-121-12/+57
| | | | | | | | | | | Most of the changes are in the area of representing sets. Also, a bug was found in the compilation of regex character sets: ranges straddling two adjacent blocks of 32 characters were not being added to the character set. However, ranges falling within a single 32 block, or spanning three or more such blocks, worked properly. This bug is not tickled by common ranges such as A-Z, or 0-9, which land within a 32 block.
* Big conversion to wide characters and UTF-8 support.Kaz Kylheku2009-11-111-1/+1
| | | | | | | | | This is incomplete. There are too many dependencies on wide character support from the C stream I/O library, and implicit use of some encoding which may not be UTF-8. The regex code does not handle wide characters properly. Character type is still int in some places, rather than wchar_t. Test suite passes though.
* Version 019txr-019Kaz Kylheku2009-11-031-3/+3
| | | | | | Regexps can be bound to variables. New freeform directive.
* Got regex working over lazy strings from freeform.Kaz Kylheku2009-11-021-2/+5
| | | | Bugfixes.
* Start of implementation for freestyle matching.Kaz Kylheku2009-11-021-4/+17
| | | | | | | | | | | Lazy strings implemented, incompletely. Changed string function to implicitly strdup; non-strdup version changed to string_own. Fixed wrong uses of strdup rather than chk_strdup. Functions added to regex module to provide regex matching as a state machine to which characters are fed.
* Trivial change allows regexps to be bound to variables,Kaz Kylheku2009-10-301-0/+1
| | | | | and used for matching. This Just Works because of the way match_line treats variables.
* txr-011 2009-09-25txr-011Kaz Kylheku2017-07-311-0/+107