summaryrefslogtreecommitdiffstats
path: root/regex.c
Commit message (Collapse)AuthorAgeFilesLines
* Copyright year bump 2023.Kaz Kylheku2023-01-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * LICENSE, LICENSE-CYG, METALICENSE, Makefile, alloca.h, args.c, args.h, arith.c, arith.h, autoload.c, autoload.h, buf.c, buf.h, cadr.c, cadr.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, gzio.c, gzio.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lex.yy.c.shipped, lib.c, lib.h, linenoise/linenoise.c, linenoise/linenoise.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, psquare.h, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, socket.c, socket.h, stdlib/arith-each.tl, stdlib/asm.tl, stdlib/awk.tl, stdlib/build.tl, stdlib/cadr.tl, stdlib/compiler.tl, stdlib/constfun.tl, stdlib/conv.tl, stdlib/copy-file.tl, stdlib/debugger.tl, stdlib/defset.tl, stdlib/doloop.tl, stdlib/each-prod.tl, stdlib/error.tl, stdlib/except.tl, stdlib/ffi.tl, stdlib/getopts.tl, stdlib/getput.tl, stdlib/hash.tl, stdlib/ifa.tl, stdlib/keyparams.tl, stdlib/match.tl, stdlib/op.tl, stdlib/optimize.tl, stdlib/package.tl, stdlib/param.tl, stdlib/path-test.tl, stdlib/pic.tl, stdlib/place.tl, stdlib/pmac.tl, stdlib/quips.tl, stdlib/save-exe.tl, stdlib/socket.tl, stdlib/stream-wrap.tl, stdlib/struct.tl, stdlib/tagbody.tl, stdlib/termios.tl, stdlib/trace.tl, stdlib/txr-case.tl, stdlib/type.tl, stdlib/vm-param.tl, stdlib/with-resources.tl, stdlib/with-stream.tl, stdlib/yield.tl, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, time.c, time.h, tree.c, tree.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h, win/cleansvg.txr, y.tab.c.shipped: Copyright year bumped to 2023.
* regsub: allow string in place of regex.Kaz Kylheku2022-07-271-1/+12
| | | | | | * regex.c (regsub): Use search_str if regex is a string. * txr.1: Documented.
* regsub: avoid consing list.Kaz Kylheku2022-07-271-10/+10
| | | | | | * regex.c (regsub): Accumulate output string directly using string_extend, rather than accumulating a list of pieces which are catenated with cat_str.
* Fix some issues found by ubsan.Kaz Kylheku2022-02-141-1/+2
| | | | | | | | | | | | | | | | * lib.h (num_fast): Under HAVE_UBSAN, avoid left shifting a negative value, which is diagnosed by ubsan (by default), but is otherwise a documented GCC extension that behaves sanely in the manner one would expect on two's complement machines. * mpi/mpi.c (s_mp_mul_2d): Fix instance of undefined shift equal to the width of the type. This occurs in the case when d == 0. In that case, we try to calculate a 32 or 64 bit full mask by shifting 1 to the left that many times and then subtracting 1. However, the entire code for the fractional shift is not necessary in that case and may be skipped. I'm also removing the unnecesary s_mp_clamp call. If the number is clamped on entry into the function, it stays clamped.
* Few adjustments to no-implicit-conversion patch.Kaz Kylheku2022-02-141-2/+2
| | | | | | | | | | | | | | | | | | | | * lib.h (c_u): New inline function: unsafe conversion to ucnum, analogous to c_n for cnum. * hash.c (equal_hash, hash_iter_init): Use UINT_PTR_MAX instead of convert(ucnum, -1). (eql_hash): mp_hash returns unsigned long, so shouldn't require a cast to go to the uint_ptr_t. The types are of the same size, or at worst it is a widening. Also replace convert(ucnum, -1) by UINT_PTR_MAX here. Combining the TAG_CHR and TAG_NUM cases, and using c_u, which is more efficient since c_chr and c_num are non-inlined functions which redundantly check type. We no longer need a self variable in this function. (eq_hash): Same TAG_CHR and TAG_NUM changes as eql_hash. * regex.c (char_set_add): Reformat change to avoid line break across assignment.
* Fix various instances of implicit conversions.Paul A. Patience2022-02-141-1/+2
| | | | | | | | | | | | | | | | | | | | | The implicit conversions were discovered with Clang's UBSan (with the -fsanitizer=implicit-conversion option). * gc.c (sweep_one): Convert only the inverted REACHABLE, since block->t.type is already of the right type. * hash.c (eql_hash, eq_hash, hash_iter_init, us_hash_iter_init): Explicitly convert to ucnum. * linenoise/linenoise.c (enable_raw_mode): Explicitly convert the inverted flag sets to tcflag_t. * mpi/mpi.c (mp_set_uintptr): Explicitly convert to uint_ptr_t. * regex.c (char_set_add): Explicitly convert to bitcell_t. * struct.c (struct_inst_hash): Correct type of hash from cnum to ucnum.
* Copyright year bump 2022.Kaz Kylheku2022-01-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | *LICENSE, LICENSE-CYG, METALICENSE, Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lex.yy.c.shipped, lib.c, lib.h, linenoise/linenoise.c, linenoise/linenoise.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, psquare.h, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, socket.c, socket.h, stdlib/arith-each.tl, stdlib/asm.tl, stdlib/awk.tl, stdlib/build.tl, stdlib/cadr.tl, stdlib/compiler.tl, stdlib/constfun.tl, stdlib/conv.tl, stdlib/copy-file.tl, stdlib/debugger.tl, stdlib/defset.tl, stdlib/doloop.tl, stdlib/each-prod.tl, stdlib/error.tl, stdlib/except.tl, stdlib/ffi.tl, stdlib/getopts.tl, stdlib/getput.tl, stdlib/hash.tl, stdlib/ifa.tl, stdlib/keyparams.tl, stdlib/match.tl, stdlib/op.tl, stdlib/optimize.tl, stdlib/package.tl, stdlib/param.tl, stdlib/path-test.tl, stdlib/pic.tl, stdlib/place.tl, stdlib/pmac.tl, stdlib/quips.tl, stdlib/save-exe.tl, stdlib/socket.tl, stdlib/stream-wrap.tl, stdlib/struct.tl, stdlib/tagbody.tl, stdlib/termios.tl, stdlib/trace.tl, stdlib/txr-case.tl, stdlib/type.tl, stdlib/vm-param.tl, stdlib/with-resources.tl, stdlib/with-stream.tl, stdlib/yield.tl, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, time.c, time.h, tree.c, tree.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h, win/cleansvg.txr, y.tab.c.shipped: Copyright year bumped to 2022.
* Eliminate declaration-after-statement everywhere.Kaz Kylheku2021-12-291-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of -ansi doesn't by itself diagnose instances of some constructs we don't want in the project, like mixed declarations and statements. * configure (diag_flags): Add -Werror=declaration-after-statement. This is C only, so filter it out for C++. Also add -Werror=vla. * HACKING: Update inaccurate statements about what dialect we are using. TXR isn't pure C90: some GCC extensions are used. We even use long long if the configure script detects it as working, and some C99 library features. * buf.c (replace_buf, buf_list): Fix by reordering. * eval.c (op_dohash, op_load_time_lit): Fix by reordering. * ffi.c (ffi_simple_release): Fix by reordering. (align_sw_get): Fix empty macro to expand to dummy declaration so a semicolon after it isn't interpreted as a statement. On platforms with alignment, remove a semicolon from the macro so that it requires one. (ffi_i8_put, ffi_u8_put): Fix by reordering. * gc.c (gc_init): Fix with extra braces. * hash.c (hash_init): Fix by reordering. * lib.c (list_collect_revappend, sub_iter, replace_str, replace_vec, mapcar_listout, mappend, mapdo, window_map_list, subst): Fix by reordering. (gensym, find, rfind, pos, rpos, in, search_common): Fix by renaming optional argument and using declaration instead of assignment. * linenoise/linenoise.c (edit_in_editor): Fix by reordering. * parser.c (is_balanced_line): Fix by reordering. * regex.c (nfa_count_one, print_rec): Fix by reordering. * signal.c (sig_mask): Fix by reordering. * stream.c (get_string): Fix by renaming optional argument and using declaration instead of assignment. * struct.c (lookup_static_slot_desc): Fix by turning mutated variable into block local. (umethod_args_fun): Fix by reordering. (get_special_slot): Fix by new scope via braces. * sysif.c (usleep_wrap): Fix by new scope via braces. (setrlimit_wrap): Fix by new scope via braces. * time.c (time_string_meth, time_parse_meth): Fix by reordering. * tree.c (tr_do_delete_spec): Fix by new scope via braces. * unwind.h (uw_block_beg): New macro which doesn't define RESULTVAR but expects it to refers to an existing one. (uw_block_begin): Replace do while (0) with enum trick so that we have a declaration that requires a semicolon, rather than a statement, allowing declarations to follow. (uw_match_env_begin): Now opens a scope and features the same enum trick as in uw_block_begin. This fixes a declaration-follows-statement issue in the v_output function in match.c. (uw_match_env_end): Closes scope opened by uw_match_env_begin. * unwind.c (revive_cont): Fix by introducing variable, and using new uw_block_beg macro. * vm.c (vm_execute_closure): Fix using combination of local variable and reordering.
* string-extend: third optional argument.Kaz Kylheku2021-09-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A Boolean optional argument to string-extend indicates whether this is likely the last call to string-extend, so memory can be trimmed accordingly. * eval.c (eval_init): Update string-extend registration. * filter.c (trie_filter_string): Pass nil for new argument of string_extend. * lib.c (str_seq, replace_str, lazy_str_force, lazy_str_force_upto): Pass nil for new argument of string_extend. (rem_impl, remove_if, separate): Pass t for new argument of string_extend on last iteration, nil otherwise. (string_extend): Implement new third argument, defaulted to nil. Switch from chk_grow_vec to the more specific chk_wrealloc, which simplifies the code. * lib.h (string_extend): Declaration updated. * parser.y (litchars): Pass t as last argument of string_extend since we know syntactically that these reductions finalize the string. (restlitchar): Pass nil as the last argument of string_extend, since we know syntactically that it isn't the last. * regex.c (scan_until_common): Pass nil for new argument of string_extend. * txr.1: Documented.
* license: reformat to fit 80 columns.Kaz Kylheku2021-08-161-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, socket.c, socket.h, stdlib/asm.tl, stdlib/awk.tl, stdlib/build.tl, stdlib/compiler.tl, stdlib/constfun.tl, stdlib/conv.tl, stdlib/copy-file.tl, stdlib/debugger.tl, stdlib/defset.tl, stdlib/doloop.tl, stdlib/each-prod.tl, stdlib/error.tl, stdlib/except.tl, stdlib/ffi.tl, stdlib/getopts.tl, stdlib/getput.tl, stdlib/hash.tl, stdlib/ifa.tl, stdlib/keyparams.tl, stdlib/match.tl, stdlib/op.tl, stdlib/optimize.tl, stdlib/package.tl, stdlib/param.tl, stdlib/path-test.tl, stdlib/pic.tl, stdlib/place.tl, stdlib/pmac.tl, stdlib/quips.tl, stdlib/save-exe.tl, stdlib/socket.tl, stdlib/stream-wrap.tl, stdlib/struct.tl, stdlib/tagbody.tl, stdlib/termios.tl, stdlib/trace.tl, stdlib/txr-case.tl, stdlib/type.tl, stdlib/vm-param.tl, stdlib/with-resources.tl, stdlib/with-stream.tl, stdlib/yield.tl, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, time.c, time.h, tree.c, tree.h, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h: License reformatted. * lex.yy.c.shipped, y.tab.c.shipped, y.tab.h.shipped: Updated.
* compat: fix glaringly broken init-time handling.Kaz Kylheku2021-07-211-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are doing numerous compat_ver checks in various init functions, to enact alternative symbol registrations. Only problem is, compat_ver is always zero during initialization; it is not set until the -C option is processed in txr_main. Registrations must be fixed up after initialization; that's what the compat_fixup mechanism is for. This is an long-standing problem which affects compatibility operation going back over 150 versions. * arith.c (arith_init): Move compat logic to arith_compat_fixup. (arith_compat_fixup): New function. * arith.h (arith_compat_fixup): Declared. * eval.c (eval_init): Move compat logic to eval_compat_fixup. * ffi.c (ffi_init): Move compat logic to ffi_compat_fixup. (ffi_compat_fixup): New function. * ffi.h (ffi_compat_fixup): Declared. * regex.c (regex_init): Move compat logic to regex_compat_fixup. (regex_compat_fixup): New function. * regex.h (regex_compat_fixup): Declared. * stream.c (stream_init): Move compat logic to stream_compat_fixup. (stream_compat_fixup): New function. * stream.h (stream_compat_fixup): Declared. * struct.c (struct_init): Move compat logic to struct_compat_fixup. (struct_compat_fixup): New function. * struct.h (stream_compat_fixup): Declared. * lib.c (compat_fixup): Call arith_compat_fixup, ffi_compat_fixup, regex_compat_fixup, stream_compat_fixup and struct_compat_fixup.
* type: disallow structs using built-in type names.Kaz Kylheku2021-07-081-15/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a big commit motivated by the need to clean up the situation with built-in type symbols, COBJ objects and structs. The struct type system allows struct types to be defined for symbols like regex or str, which are used by built-in or cobj types. This is a bad thing. What is worse, structure instances are COBJ types which identify their type using the COBJ class symbol mechanism. There are places in the C implementation which assume that when a COBJ has a certain class symbol, it is of a certain expected type, which is totally different from and incompatible form a struct instance. User code can define a structure object which will fool that code. There are multiple things going on in this patch. The major theme is that the COBJ representation is changing. Instead of a class symbol, COBJ instances now carry a "struct cobj_class *" pointer. This pointer is obtained by registration via the cobj_register function. All modules must register their class symbols to obtain these class handles, which are then used in cobj() calls for instantiation. The CPTR type was identical to COBJ until now, except for the type tag. This is changing; CPTR objects will keep the old representation with the class symbol. commit 20fdfc6008297001491308849c17498c006fe7b4 Author: Kaz Kylheku <kaz@kylheku.com> Date: Thu Jul 8 19:17:39 2021 -0700 * ffi.h (carray_cls): Declared. * hash.h (hash_cls): Declared. (hash_early_init): Declared. * lib.h (struct cobj_class): New struct. (struct cobj): cls member changing to struct cobj_class *. (struct cptr): New struct, same as previous struct cobj. (union obj): New member cp of type struct cptr, for CPTR. (builtin_type): Declared. (class_check): Declaration moved closer to COBJ-related functions and updated. (cobj_register, cobj_register_super, cobj_class_exists): New functions declared. (cobjclassp, cobj_handle, cobj_ops): Declarations updated. * parser.h (parser_cls): Declared. * rand.h (random_state_cls): Declared. * regex.h (regex_cls): Declared. * stream.h (stream_cls, stdio_stream_cls): Declared. * struct.h (struct_cls): Declared. * tree.h (tree_cls, tree_iter_cls): Declared. * vm.h (vm_desc_cls): Declared. * buf.c (buf_strm, make_buf_stream): Pass stream_cls functions instead of stream_s class symbol. * chksum.c (sha256_ctx_cls, md5_ctx_cls): New static class handles. (sha256_begin, sha256_hash, sha256_end, md5_begin, md5_hash, md5_end): Pass class handles to instead of class symbols. (chksum_init): Initialize class handle variables. * ffi.c (ffi_type_cls, ffi_call_desc_cls, ffi_closure_cls, union_cls): New static class handles. (carray_cls): New global variable. (ffi_type_struct_checked, ffi_type_print_op, ffi_closure_struct_checked, ffi_closure_print_op, make_ffi_type_builtin, make_ffi_type_pointer, make_ffi_type_struct, make_ffi_type_union, make_ffi_type_array, make_ffi_type_enum, ffi_call_desc_checked, ffi_call_desc_print_op, ffi_make_call_desc, ffi_make_closure, carray_struct_checked, carray_print_op, make_carray, cptr_getobj, cptr_out, uni_struct_checked, make_union_common): Pass class handles instead of class symbols. (ffi_init): Initialize class handle variables. * filter.c (regex_from_trie): Use hash_cls class handle instead of hash_s. * gc.c (mark_obj): Split COBJ and CPTR cases since the representation is different. * hash.c (hash_cls, hash_iter_cls): New class handles. (make_similar_hash, copy_hash, gethash_c, gethash_e, remhash, clearhash, hash_count, get_hash_userdata, set_hash_userdata, hashp, hash_iter_init, hash_begin, hash_next, hash_peek, hash_reset, hash_reset, hash_uni, hash_diff, hash_symdiff, hash_isec): Pass class handles instead of class symbols. (hash_early_init): New function. (hash_init): Set the class symbols in the class handles that were created in hash_early_init at a time when these symbols did not exist. * lib.c (nelem): New macro. (cobj_class): New static array. (cobj_ptr): New static pointer. (cobj_hash): New static hash. (seq_iter_cls): New static class handle. (builtin_type_p): New function. (typeof): Struct instances now all carry the same symbol, struct, as their COBJ class symbol. To get their type, we must call struct_type_name. (subtypep): Rearrangement of two cases: let's make the reflexive case first. Adjust code for different location of COBJ class symbol. (seq_iter_init_with_info, seq_begin, seq_next, seq_reset, iter_begin, iter_more, iter_item, iter_step, iter_reset, make_like, list_collect, do_generic_funcall): Use class handles instead of class symbols. (class_check, cobj, cobjclassp, cobj_handle, cobj_ops): Take class handle argument instead of class symbol. (cobj_register, cobj_register_super, cobj_class_exists): New functions. (cobj_populate_hash): New static function. (cobj_print_op): Adjust for different location of class (cptr_print_op, cptr_typed, cptr_type, cptr_handle, cptr_get): cptr functions now refer to obj->cp rather than obj->co. (copy, length, sub, ref, refset, replace, dwim_set, dwim_del, obj_print): Use class handles for various COBJ types rather than class symbols. (obj_init): gc-protect cobj_hash. Initialize seq_iter_cls class symbol and cobj_hash. Populate cobj_hash as the last initialization step. (init): Call hash_early_init immediately after gc_init. diff --git a/lib.c b/lib.c * match.c (do_match_line): Refer to regex_cls class handle instead of regex_s.. * parser.c (parser_cls): New global class handle. (parse, parser_get_impl, lisp_parse_impl, txr_parse, parser_errors): Use class handles instead of class symbols. (parse_init): Initialize parser_cls. * rand.c (random_state_cls): New global class handle. (make_state, random_state_p, make_random_state, random_state_get_vec, random_fixnum, random_float, random): Use class handles instead of class symbols. (rand_init): Initialize random_state_cls. * regex.c (regex_cls): New global class handle. (chset_cls): New static class handle. (reg_compile_csets, reg_derivative, regex_compile, regexp, regex_source, regex_print, regex_run, regex_machine_init): Use class handles instead of class symbols. (regex_init): Initialize regex_cls and chset_cls. * socket.c (make_dgram_sock_stream): Use stream_cls class symbol instead of stream_s. * stream.c (stream_cls, stdio_stream_cls): New class handles. (make_null_stream, stdio_get_fd, make_stdio_stream_common, stream_fd, sock_family, sock_type, sock_peer, sock_set_peer, make_dir_stream, make_string_input_stream, make_string_byte_input_stream, make_strlist_input_stream, make_string_output_stream, make_strlist_output_stream, get_list_from_stream, make_catenated_stream, make_delegate_stream, make_delegate_stream, stream_set_prop, stream_get_prop, close_stream, get_error, get_error_str, clear_error, get_line, get_char, get_byte, get_bytes, unget_char, unget_byte, put_buf, fill_buf, fill_buf_adjust, get_line_as_buf, format, put_string, put_char, put_byte, flush_stream, seek_stream, truncate_stream, get_indent_mode, test_set_indent_mode, test_neq_set_indent_mode, set_indent_mode, get_indent, set_indent, inc_indent, width_check, force_break, set_max_length, set_max_depth): Use class handle instead of symbol. (stream_init): Initialize stream_cls and stdio_stream_cls. * struct.c (struct_type_cls, struct_cls): New class handles. (struct_init): Initialize struct_type_cls and struct_cls. (struct_handle): Static function moved to avoid forward declaration. (stype_handle): Refer to struct_type_cls class handle instead of struct_type_s symbol. Handle instance objects in addition to types. (make_struct_type): Throw error if a built-in type is being defined as a struct type. Refer to class handle instead of class symbol. (find_struct_type, allocate_struct, make_struct_impl, make_lazy_struct, copy_struct): Refer to class handle instead of class symbol. * strudel.c (make_struct_delegate_stream): Refer to stream_cls class handle instead of stream_s symbol. * sysif.c (dir_cls): New class handle. (poll_wrap): Use typep instead of subtypep, eliminating access to class symbol. (opendir_wrap, closedir_wrap, readdir_wrap): Use class handles instead of class symbols. (sysif_init): Initialize dir_cls. * syslog.c (make_syslog_stream): Refer to stream_cls class handle instead of stream_s symbol. * tree.c (tree_cls, tree_iter_cls): New class handles. (tree_insert_node, tree_lookup_node, tree_delete_node, tree_root, tree_equal_op, tree, copy_search_tree, make_similar_tree, treep, tree_begin, copy_tree_iter, replace_tree_iter, tree_reset, tree_next, tree_peek, tree_clear): Use class handle instead of class symbol. (tree_init): Initialize tree_cls and tree_iter_cls. * unwind.c (sys_cont_cls): New static class handle. (revive_cont, capture_cont): Use class handle instead of class symbol. (uw_late_init): Initialize sys_cont_cls. * vm.c (vm_desc_cls): New global class handle. (vm_closure_cls): New static class handle. (vm_desc_struct, vm_make_desc, vm_closure_struct, vm_make_closure, vm_copy_closure): Use class handle instead of class symbol. (vm_init): Initialize vm_desc_cls and vm_closure_cls.
* regex-compile: argument defaulting issue.Kaz Kylheku2021-07-051-1/+1
| | | | | | | | | | | | | * regex.c (regex_compile): In the string case when regex_parse needs to be called, redundant argument defaulting is being performed on error_stream, defaulting it to nil. The regex_parse function doesn't like that; it no longer accepts nil as a facsimile of missing. The right thing to do here is nothing: just pass error_stream to regex_parse as-is and let it do the defaulting. And that is how the code was initially until commit a8b0e36b1760e51a8a3a25d4e22a325e407ef3d4 in March of 2014, which introduced the redundant defaulting. Reported by Paul A. Patience
* regex: bugfix: print/read consistency of n-ary ops.Kaz Kylheku2021-06-271-2/+4
| | | | | | | | | | | | | | | | | | The regex-compile function accepts syntax in which certain operators like (or ...) can be n-ary. This representation is converted to the strictly binary form that is understood by the compiler internals (and which regex-parse outputs), as well as the regex printer. Unfrotunately, regex-compile is attaching the original form with the n-ary operators to the regex object, and the printer does not reat this; it renders the syntax with portions missing. It needs the binary form. * regex.c (regex_compile): Capture the intermediate result from calling reg_nary_to_bin, and use that as regex_source. The pass that through the remaining two optimization passes to obtain regex_sexp which is compiled.
* regex: exposing optimization pass a regex-optimizeKaz Kylheku2021-06-271-2/+7
| | | | | | | | | | | | * regex.c (regex_optimize): New static function, capturing the three optimization passes. (regex_compile): Code moved into regex_optimize. (regex_init): Remove sys:reg-optimize function. Register regex-optimize. * txr.1: Documented. * stdlib/doc-syms.tl: Updated.
* c_str now takes a self argument.Kaz Kylheku2021-06-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding a self parameter to c_str so that when a non-string occurs, the error is reported against a function. Legend: A - Pass existing self to c_str. B - Define self and pass to c_str and possibly other functions. C - Take new self parameter and pass to c_str and possibly other functions. D - Pass existing self to c_str and/or other functions. E - Define self and pass to other functions, not c_str. X - Pass nil to c_str. * buf.c (buf_strm_put_string, buf_str): B. * chksum.c (sha256_str, md5_str): C. (sha256_hash, md5_hash): D. * eval.c (load): D. * ffi.c (ffi_varray_dynsize, ffi_str_put, ffi_wstr_put, ffi_bstr_put): A. (ffi_char_array_put, ffi_wchar_array_put): C. (ffi_bchar_array_put): A. (ffi_array_put, ffi_array_out, ffi_varray_put): D. * ftw.c (ftw_wrap): A. * glob.c (glob_wrap): A. * lib.c (copy_str, length_str, coded_length,split_str_set, list_str, cmp_str, num_str, out_json_str, out_json_rec, display_width): B. (upcase_str, downcase_str, string_extend, search_str, do_match_str, do_rmatch_str, sub_str, replace_str, cat_str_append, split_str_keep, trim_str, int_str, chr_str, span_str, compl_span_str, break_str, length_str_gt, length_str_ge, length_str_lt, length_str_le, find, rfind, pos, rpos, mismatch, rmismatch): A. (c_str): Add self parameter and use in type mismatch diagnostic. If the parameter is nil, use "internal error". (flo_str): B, and correction to "flot-str" typo. (out_lazy_str, out_quasi_str, obj_print_impl): D. * lib.h (c_str): Declaration updated. * match.c (dump_var): X. (v_load): D. * parser.c (open_txr_file): C. (load_rcfile): E. (find_matching_syms, provide_atom): X. (hist_save, repl): B. * parser.h (open_txr_file): Declaration updated. * parser.y (chrlit): X. * regex.c (search_regex): A. * socket.c (getaddrinfo_wrap, sockaddr_pack): A. (dgram_put_string): B. (open_sockfd): D. (sock_connect): E. * stream.c (stdio_put_string, tail_strategy, vformat_str, open_directory, open_file, open_tail, remove_path, rename_path, tmpfile_wrap, mkdtemp_wrap, mkstemp_wrap): B. (do_parse_mode, parse_mode, make_string_byte_input_stream): B. (normalize_mode, normalize_mode_no_bin): E. (string_out_put_string, formatv, put_string, open_fileno, open_subprocess, open_command, base_name, dir_name, short_suffix, long_suffix): A. (run): D. (win_escape_cmd, win_escape_arg): X. * stream.h (parse_mode, normalize_mode, normalize_mode_no_bin): Declarations updated. * sysif.c (mkdir_wrap, do_utimes, dlopen_wrap, dlsym_wrap, dlvsym_wrap): A. (do_stat, do_lstat): C. (mkdir_nothrow_exists, ensure_dir): E. (chdir_wrap, rmdir_wrap, mkfifo_wrap, chmod_wrap, symlink_wrap, link_wrap, readlink_wrap, exec_wrap, getenv_wrap, setenv_wrap, unsetenv_wrap, getpwnam_wrap, getgrnam_wrap, crypt_wrap, fnmatch_wrap, realpath_wrap, opendir_wrap): B. (stat_impl): statfn pointer-to-function argument now takes self parameter. When calling it, we pass name. * syslog.c (openlog_wrap, syslog_wrapv): A. * time.c (time_string_local, time_string_utc, time_string_meth, time_parse_meth): A. (strptime_wrap): B. * txr.c (txr_main): D. * y.tab.c.shipped: Updated.
* regex: regsub wrongly destructive.Kaz Kylheku2021-04-131-3/+4
| | | | | * regex.c (regsub): When the regex argument is actually a function, we must copy the string, because replace_str is destructive.
* Copyright year bump 2021.Kaz Kylheku2021-01-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * METALICENSE: 2020 copyrights bumped to 2021. Added note about SHA-256 routines from Colin Percival. * LICENSE, LICENSE-CYG, Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lex.yy.c.shipped, lib.c, lib.h, linenoise/linenoise.c, linenoise/linenoise.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/asm.tl, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/compiler.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/copy-file.tl, share/txr/stdlib/debugger.tl, share/txr/stdlib/defset.tl, share/txr/stdlib/doloop.tl, share/txr/stdlib/each-prod.tl, share/txr/stdlib/error.tl, share/txr/stdlib/except.tl, share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl, share/txr/stdlib/op.tl, share/txr/stdlib/package.tl, share/txr/stdlib/param.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/pmac.tl, share/txr/stdlib/quips.tl, share/txr/stdlib/save-exe.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/trace.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/vm-param.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, time.c, time.h, tree.c, tree.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h, win/cleansvg.txr, y.tab.c.shipped: Copyright year bumped to 2021.
* New functions trim-left and trim-right.Kaz Kylheku2020-10-051-0/+28
| | | | | | | | | * regex.c (trim_left, trim_right): New static functions. (regex_init): New intrinsics registered. * tests/015/trim.tl, tests/015/trim.expected: New files. * txr.1: Documented.
* c_num: now takes self argument.Kaz Kylheku2020-06-291-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The c_num and c_unum functions now take a self argument for identifying the calling function. This requires changes in a large number of places. In a few places, additional functions acquire a self argument. The ffi module has the most extensive example of this. Some functions mention their name in a larger string, or have scattered literals giving their name; with the introduction of the self local variable, these are replaced by references to self. In the following changelog, the notation TS stands for "take self argument", meaning that the functions acquires a new "val self" argument. The notation DS means "define self": the functions in question defines a self variable, which they pass down. The notation PS means that the functions pass down an existing self variable to functions that now require it. * args.h (args_count): TS. * arith.c (c_unum, c_num): TS. (toint, exptv): DS. * buf.c (buf_check_len, buf_check_alloc_size, buf_check_index, buf_do_set_len, replace_buf, buf_put_buf, buf_put_i8, buf_put_u8, buf_put_char, buf_put_uchar, buf_get_bytes, buf_get_i8, buf_get_u8, buf_get_cptr, buf_strm_get_byte_callback, buf_strm_unget_byte, buf_swap32, str_buf, buf_int, buf_uint, int_buf, uint_buf): PS. (make_duplicate_buf, buf_shrink, sub_buf, buf_print, buf_pprint): DS. * chskum.c (sha256_stream_impl, sha256_buf, crc32_buf, md5_stream_impl, md5_buf): TS. (chksum_ensure_buf, sha256_stream, sha256, sha256_hash, md5_stream, md5, md5_hash): PS. (crc32_stream): DS. * combi.c (perm_while_fun, perm_gen_fun_common, perm_str_gen_fun, rperm_gen_fun, comb_vec_gen_fun, comb_str_gen_fun, rcomb_vec_gen_fun, rcomb_str_gen_fun): DS. * diff.c (dbg_clear, dbg_set, dbg_restore): DS. * eval.c (do_eval, gather_free_refs, maprodv, maprendv, maprodo, do_args_apf, do_args_ipf): DS. (op_dwim, me_op, map_common): PS. (prod_common): TS. * ffi.c (struct txr_ffi_type): release member TS. (make_ffi_type_pointer): PS and release argument TS. (ffi_varray_dynsize, ffi_array_in, ffi_array_put_common, ffi_array_get_common, ffi_varray_in, ffi_varray_null_term): PS. (ffi_simple_release, ffi_ptr_in_release, ffi_struct_release, ffi_wchar_array_get, ffi_array_release_common, ffi_array_release, ffi_varray_release): TS. (ffi_float_put, double_put, ffi_be_i16_put, ffi_be_u16_put, ffi_le_i16_put, ffi_le_u16_put, ffi_be_i32_put, ffi_be_u32_put, ffi_le_i32_put, ffi_sbit_put, ffi_ubit_put, ffi_buf_d_put, make_ffi_type_array, make_ffi_type_enum, ffi_type_compile, make_ffi_type_desc, ffi_make_call_desc, ffi_call_wrap, ffi_closure_dispatch_save, ffi_put_into, ffi_in, ffi_get, ffi_put, carray_set_length, carray_blank, carray_buf, carray_buf_sync, carray_cptr, carray_refset, carray_sub, carray_replace, carray_uint, carray_int): PS. (carray_vec, carray_list): DS. * filter.c (url_encode, url_decode, base64_stream_enc_impl): DS. * ftw.c (ftw_callback, ftw_wrap): DS. * gc.c (mark_obj, gc_set_delta): DS. * glob.c (glob_wrap): DS. * hash.c (equal_hash, eql_hash, eq_hash, do_make_hash, hash_equal, set_hash_traversal_limit, gen_hash_seed): DS. * itypes.c (c_i8, c_u8, c_i16, c_u16, c_i32, c_u32, c_i64, c_u64, c_short, c_ushort, c_int, c_uint, c_long, c_ulong): PS. * lib.c (seq_iter_rewind): TS and becomes internal. (seq_iter_init_with_info, seq_setpos, replace_str, less, replace_vec, diff, isec, obj_print_impl): PS. (nthcdr, equal, mkstring, mkustring, upcase_str, downcase_str, search_str, sub_str, cat_str, scat2, scat3, fmt_join, split_str_keep, split_str_set, trim_str, int_str, chr_int, chr_str, chr_str_set, vector, vecref, vecref_l, list_vec, copy_vec, sub_vec, cat_vec, lazy_str_put, lazy_str_gt, length_str_ge, length_str_lt, length_str_le, cptr_size_hint, cptr_int, out_lazy_str, out_quasi_str, time_string_local_time, time_string_utc, time_fields_local_time, time_fields_utc, time_struct_local, time_struct_utc, make_time, time_meth, time_parse_meth): DS. (init_str, cat_str_init, cat_str_measure, cat_str_append, vscat, time_fields_to_tm, time_struct_to_tm, make_time_impl): TS. * lib.h (seq_iter_rewind): Declaration removed. (c_num, c_unum, init_str): Declarations updated. * match.c (LOG_MISMATCH, LOG_MATCH): PS. (h_skip, h_coll, do_output_line, do_output, v_skip, v_fuzz, v_collect): DS. * parser.c (parser, circ_backpatch, report_security_problem, hist_save, repl, lino_fileno, lino_getch, lineno_getl, lineno_gets, lineno_open): DS. (parser_set_lineno, lisp_parse_impl): PS. * parser.l (YY_INPUT): PS. * rand.c (make_random_state): PS. * regex.c (print_rec): DS. (search_regex): PS. * signal.c (kill_wrap, raise_wrap, get_sig_handler, getitimer_wrap, setitimer_wrap): DS. * socket.c (addrinfo_in, sockaddr_pack, fd_timeout, to_connect, open_sockfd, sock_mark_connected, sock_timeout): TS. (getaddrinfo_wrap, dgram_set_sock_peer, sock_bind, sock_connect, sock_listen, sock_accept, sock_shutdown, sock_send_timeout, sock_recv_timeout, socketpair_wrap): DS. * stream.c (generic_fill_buf, errno_to_string, stdio_truncate, string_out_put_string, open_fileno, open_command, base_name, dir-name): DS. (unget_byte, put_buf, fill_buf, fill_buf_adjust, get_line_as_buf, formatv, put_byte, test_set_indent_mode, test_neq_set_indent_mode, set_indent_mode, set_indent, inc_indent, set_max_length, set_max_depth, open_subprocess, run ): PS. (fds_subst, fds_swizzle): TS. * struct.c (make_struct_type, super, umethod_args_fun): PS. (method_args_fun): DS. * strudel.c (strudel_put_buf, strudel_fill_buf): DS. * sysif.c (errno_wrap, exit_wrap, usleep_wrap, mkdir_wrap, ensure_dir, makedev_wrap, minor_wrap, major_wrap, mknod_wrap, mkfifo_wrap, wait_wrap, wifexited, wexitstatus, wifsignaled, wtermsig, wcoredump, wifstopped, wstopsig, wifcontinued, dup_wrap, close_wrap, exit_star_wrap, umask_wrap, setuid_wrap, seteuid_wrap, setgid_wrap, setegid_wrap, simulate_setuid_setgid, getpwuid_wrap, fnmatch_wrap, dlopen_wrap): DS. (chmod_wrap, do_chown, flock_pack, do_utimes, poll_wrap, setgroups_wrap, setresuid_wrap, setresgid_wrap, getgrgid_wrap): PS. (c_time): TS. * sysif.h (c_time): Declaration updated. * syslog.c (openlog_wrap, syslog_wrap): DS. * termios.c (termios_pack): TS. (tcgetattr_wrap, tcsetattr_wrap, tcsendbreak_wrap, tcdrain_wrap, tcflush_wrap, tcflow_rap, encode_speeds, decode_speeds): DS. * txr.c (compato, array_dim, gc_delta): DS. * unwind.c (uw_find_frames_by_mask): DS. * vm.c (vm_make_desc): PS. (vm_make_closure, vm_swtch): DS.
* regex: duplicate character range crash fix.Kaz Kylheku2020-04-171-6/+12
| | | | | | | | | | * regex.c (L1_fill_range, L2_fill_range, L3_fill_range): Bug: when the arguments ch0 and ch1 indicate that a block is to be filled entirely, we assume that the pointer is either null or else a pointer to an allocated block, either of which we can free and replace with a -1 pointer to indicate a full block. However the pointer may already be -1, in which case we wrongly pass that to free(). We must check for it.
* unicode: wide character upkeep 3.Kaz Kylheku2020-04-171-0/+7
| | | | | | * regex.c (create_wide_cs): Add some Emoji ranges from Plane 1, loosely following the Unicode 13.0 data given in https://en.wikipedia.org/wiki/Emoji.
* unicode: wide character upkeep 2.Kaz Kylheku2020-04-171-2/+2
| | | | | | * regex.c (create_wide_cs): Extend over the entire supplementary ideographic plane U+2XXXX and the tertiary one: U+3XXXX.
* unicode: character width upkeep.Kaz Kylheku2020-04-171-5/+4
| | | | | | | | | | | | | | | | | Updating the regex for matching code points corresponding to wide and full width characters, with regard to the old 1998 document: http://www.unicode.org/reports/tr11-2/ More to follow. I neglected to comment where the original data came from, and neglected to comment it. In some cases it has more coverage than the 1998 document; in some cases less. * regex.c (create_wide_cs): Extending the 1100-115F range to 11F9 to cover all of Korean Hangeul. Replace two occurrences of 3000-303E with one 3000-303F. Merge 3250-32FE with 3300-4DB5, and extend to 4DBF. Add private use range E000-E757.
* internals: rename misnamed curry_* functions.Kaz Kyheku2020-03-171-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The various curry_xx_yy functions perform partial application, not currying. The curry prefix is being renamed to pa (partially apply). * lib.c (remq_lazy, remql_lazy, remqual_lazy, tree_Find): Updated. (do_curry_12_1, do_curry_12_1_v, do_curry_12_2, do_curry_123_1, do_curry_123_23, do_curry_123_2, do_curry_123_3, do_curry_1234_1, do_curry_1234_34): Renamed to do_pa_12_1, do_pa_12_1_v, do_pa_12_2, do_pa_123_1, do_pa_123_23, do_pa_123_2, do_pa_123_3, do_pa_1234_1, do_pa_1234_34. (curry_12_1, curry_12_1_v, curry_12_2, curry_123_1, curry_123_23, curry_123_2, curry_123_3, curry_1234_1, curry_1234_34): Renamed to pa_12_1, pa_12_1_v, pa_12_2, pa_123_1, pa_123_23, pa_123_2, pa_123_3, pa_1234_1, pa_1234_34. (transposev, do_juxt): Updated. * lib.h: Declarations renamed. * eval.c (subst_vars, qquote_init, expand_catch, weavev): Updated. * filter.c (get_filter, build_filter_from_list, filter_string_tree, filter_init): Updated. * match.c (tx_subst_vars, do_txeval, v_freeform, v_bind, v_throw, v_deffilter, v_assert, h_assert): Updated. * parser.y (gather_clause): Updated. * regex.c (regex_range_full_fun, regex_range_left_fun, regex_range_right_fun, regex_range_search_fun): Updated. * stream.c (open_files, open_files_star): Updated. * txr.c (txr_main): Updated. * unwind.c (me_defex): Updated.
* Copyright year bump 2020.Kaz Kylheku2019-12-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * LICENSE, LICENSE-CYG, METALICENSE, Makefile, alloca.h, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, chksum.c, chksum.h, chksums/crc32.c, chksums/crc32.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c, lib.h, linenoise/linenoise.c, linenoise/linenoise.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/asm.tl, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/compiler.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/debugger.tl, share/txr/stdlib/defset.tl, share/txr/stdlib/doloop.tl, share/txr/stdlib/error.tl, share/txr/stdlib/except.tl, share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl, share/txr/stdlib/op.tl, share/txr/stdlib/package.tl, share/txr/stdlib/param.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/pmac.tl, share/txr/stdlib/save-exe.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/trace.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/vm-param.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, tree.c, tree.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h, win/cleansvg.txr: Extended copyright notices to 2020.
* safety: fix type tests that code can subvert.Kaz Kylheku2019-09-301-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes numerous instances of a safety hole which involves the type of a COBJ object being tested to be of a given class using logic that can be subverted by the definition of a like-named struct. Specifically logic like (typeof(obj) == hash_s) is broken, because if a struct type called hash is defined, then the test will yield true for instances of that struct type. Those instances can then be passed into code that only works on COBJ hashes, and relies on this test to reject invalid objects. * ffi.c (make_carray): Replace fragile test with strong one, using new cobjclassp function. * hash.c (hashp): Likewise. * lib.c (class_check): The expression used here for the type test moves into the new function cobjclassp and so is replaced by a call to that function. (cobjclassp): New function. * lib.h (cobjclassp): Declared. * rand.c (random_state_p): Replace fragile test using cobjclassp. * regex.c (char_set_compile): Replace fragile typeof tests for character type with is_chr. (reg_derivative, regexp): Replace fragile test with cobjclassp. * struct.c (struct_type_p): Replace fragile test with cobjclassp.
* Replace lt(x, zero) pattern.Kaz Kylheku2019-06-151-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | This slight inefficiency occurs in some 37 places in the code. In most places we replace lt(x, zero) with minusp(x). In a few places, !plusp(x) is used and surrounding logic is simplified. In one case, the silly pattern lt(x, zero) ? t : nil is replaced with just minusp(x). * buf.c (sub_buf, replace_buf): Replace lt. * combi.c (perm, rperm, comb, rcomb): Likewise. * eval.c (do_format_field): Likewise. * lib.c (listref, sub_list, replace_list, split_func, split_star_func, match_str, lazy_sub-str, sub_str, replace_str, sub_vec, replace_vec): Likewise. * match.c (weird_merge): Likewise. * regex.c (match_regex, match_regex_right_old, match_regex_right, regex_prefix_match, regex_range_left, regex_range_right): Likewise.
* scan-until-match, count-until-match: new functions.Kaz Kylheku2019-02-161-7/+35
| | | | | | | | | | | | | * regex.c (scan_until_common): New static function, made from read_until_match. (read_until_match): Now just wrapper for scan_until_common. (scan_until_match, count_until_match): New functions. (regex_init): Registered new intrinsics scan-until-match and count-until-match. * regex.h (read_until_match, scan_until_match): Declared. * txr.1: Documented.
* Copyright year bump 2019.Kaz Kylheku2019-01-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * LICENSE, LICENSE-CYG, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/asm.tl, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/compiler.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/doloop.tl, share/txr/stdlib/error.tl, share/txr/stdlib/except.tl, share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl, share/txr/stdlib/op.tl, share/txr/stdlib/package.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/pmac.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/trace.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/vm-param.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, vm.c, vm.h, vmop.h, win/cleansvg.txr: Extended Copyright line to 2018.
* Eliminate ALLOCA_H.Kaz Kylheku2018-12-311-1/+1
| | | | | | | | | | | | | * configure: Instead of generating a definition of ALLOCA_H, generate the variable HAVE_ALLOCA_<name> with a value of 1, where <name> is one of stdlib, alloca or malloc. * alloca.h: New header. * args.c, eval.c, ffi.c ffi.c, ftw.c, hash.c, lib.c, match.c, parser.c, parser.y, regex.c, socket.c, stream.c, struct.c, sysif.c, syslog.c, termios.c, unwind.c, vm.c: Include "alloca.h" instead of ALLOCA_H.
* Drastically reduce inclusion of <dirent.h>.Kaz Kylheku2018-12-111-1/+0
| | | | | | | | | | | | | | | | | | | The <dirent.h> header is included all over the place because it is needed by a single declaration in stream.h. That declaration is for a function that is only called within stream.c, so we make it internal. Now only stream.c has to include <dirent.h>. * buf.c, debug.c, eval.c, ffi.c, filter.c, gc.c, gencadr.txr, hash.c, lib.c, lisplib.c, match.c, parser.c, regex.c, socket.c, struct.c, strudel.c, sysif.c, syslog.c, termios.c, txr.c, unwind.c, vm.c: Remove #include <dirent.h>. * cadr.c: Regenerated. * stream.c (make_dir_stream): Make external function static. * stream.h (make_dir_stream): Declaration updated.
* regex: relocate unlikely case after other tests.Kaz Kylheku2018-11-121-3/+3
| | | | | | * regex.c (reg_derivative): When classifying the regex's operator, don't check for vanishingly unlikely internal error cases first; that should be done at the end.
* Better identify functions that misuse COBJ-s and hashes.Kaz Kylheku2018-11-071-10/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In this patch, the cobj_handle, cobj_ops and variants of gethash get an additional argument to identify the caller. Many functions are updated to pass this down. * buf.c (buf_strm): Pass self name to cobj_handle. * eval.c (env_fbind, env_vbind, rt_defvarl, me_case): Pass self name to gethash_c or gethash_e. (load): Pass self name to read_eval_stream and read_compiled_file. (reg_symacro): Pass situation-identifying string to gethash_c. * ffi.c (ffi_type_struct_checked, ffi_closure_struct_checked, ffi_call_desc_checked, uni_struct_checked): Take self name parameter, and pass down to cobj_handle. (ffi_get_type, ffi_get_lisp_type): Take self name and pass down to ffi_type_struct_checked. (union_get_ptr): Take self name and pass to uni_struct_checked. (ffi_union_in, ffi_union_put): Pass self name to union_get_ptr. (ffi_type_compile): Pass self name to ffi_get_lisp_type. (ffi_make_call_desc): Pass self name to ffi_type_struct_checked, ffi_get_type and ffi_call_desc_checked. (ffi_make_closure): Pass self name to ffi_call_desc_checked. (ffi_closure_get_fptr): Take self name, pass to ffi_closure_struct_checked. (ffi_typedef, ffi_size, ffi_alignof, ffi_offsetof, ffi_arraysize, ffi_elemsize, ffi_elemtype, ffi_put_into, ffi_put, ffi_in, ffi_get, ffi_out, make_carray): Pass self name to ffi_closure_struct_checked. (carray_struct_checked): Take self name, pass to cobj_handle. (carray_set_length, carray_dup, carray_own, carray_free, carray_type, length_carray, copy_carray, carray_ptr, buf_carray, vec_carray, list_carray, carray_ref, carray_refset, carray_sub, carray_replace, carray_get_common, carray_put_common, unum_carray, num_carray, put_carray, fill_carray): Pass self name to carray_struct_checked. (carray_blank, carray_buf, carray_cptr): Pass self name ffi_type_struct_checked. (carray_pun): Pass self name to carray_struct_checked and ffi_type_struct_checked. (make_union): Pass self name to ffi_type_struct_checked. (union_members, union_get, union_put, union_in, union_out): Pass self name to uni_struct_checked. (make_zstruct, zero_fill, put_obj, get_obj, fill_obj): Pass self-name to ffi_type_struct_checked. * ffi.h (ffi_closure_get_fptr, union_get_ptr): Declarations updated. * filter.c (trie_add): Pass self-name to gethash_l. * hash.c (make_similar_hash, copy_hash, hash_count, get_hash_userdata, set_hash_userdata, hash_begin, hash_next, hash_uni, hash_diff, hash_isec): Pass self name to cobj_handle. (gethash_c, gethash_e): Take self name parameter and pass down to cobj_handle. (gethash_f): Take self parameter and pass down to gethash_e. (gethash, inhash, gethash_n, sethash, pushhash, remhash, clearhash, hash_update_1): Pass self name to gethash_e or gethash_c. * hash.h (gethash_c, gethash_e, gethash_f): Declarations updated. (gethash_l): Take self name, and pass down to gethash_c. * lib.c (class_check): Take self name parameter and use in type mismatch diagnostic. (use_sym, unuse_sym, symbol_needs_prefix, find_symbol, intern, unintern, intern_fallback, unique, in, sel, obj_print_impl, populate_obj_hash, obj_hash_merge): Pass self name to gethash_f or gethash_l. (symbol_visible, obj_init): Pass situation-identifying string to gethash_e. (cobj_handle, cobj_ops): Take self name parameter and pass down to class_check. * lib.h (class_check, cobj_handle, cobj_ops): Declarations updated. * match.c (v_load): Pass self name to read_compiled_file and read_eval_stream. * parser.c (get_parser_impl): Take self name and pass to cobj_handle. (ensure_parser): Pass situation-identifying string to gethash_c. (parser_circ_def): Pass self-name to gethash_c. (lisp_parser_impl): Pass self name to get_parser_impl and class_check. (lisp_parse, nread, iread): Pass self-name to lisp_parser_impl. (read_file_common): Take self name parameter and pass down to get_parser_impl. (read_eval_stream, read_compiled_file): Take self name and pass down to read_file_common. (load_rcfile): Pass situation-identifying string to read_eval_streem. (get_visible_syms): Pass situation-identifying string to gethash_c. (parser_errors, parser_eof): Pass self name to cobj_handle. * parser.h (read_eval_stream, read_compiled_file): Declarations updated. * parser.y (rlset): Pass self name to gethash_c. * rand.c (make_random_state, random_state_get_vec,l random_fixnum, random_float): Pass self name to cobj_handle. * regex.c (regex_source, regex_print, regex_run): Pass self-name to cobj_handle. (regex_machine_init): Take self name param and pass to cobj_handle. (search_regex, match_regex, match_regex_right, regex_prefix_match, read_until_match): Pass self-name to regex_machine_init. * stream.c (stdio_get_fd): Pass self name to cobj_handle. (generic_get_line): Get COBJ operations via unsafe, diret object access rather than cobj_ops. (set_mode_props): Get object handle via unsafe, direct object access. (stream_fd, sock_family, sock_type, sock_peer, set_sock_peer, get_string_from_stream, get_list_from_stream, stream_set_prop, stream_get_prop, close_stream, get_error, get_error_str, clear_error, get_line, get_char, get_byte, unget_char, unget_byte, put_buf, fill_buf, put_string, put_char, put_byte, flush_stream, seek_stream, truncate_stream, get_indent_mode, test_set_indent_mode, set_indent_mode, get_indent, set_indent, inc_indent, width_check, force_break, get_set_ctx, get_ctx): Pass self name to cobj_ops. (make_delegate_stream): Take self name parameter, pass down to cobj_ops. (record_adapter): Pass self name down to make_delegate_stream. (format): Pass self name to class_check. * struct.c (stype_handle): Pass self name to cobj_handle. (make_struct_type): Pass self name to class_check. * txr.c (read_eval_stream_noerr): Take self name parameter, pass to read_eval_stream. (txr_main): Pass istuation-identifying string to read_compiled_file and read_eval_stream_noerr. * unwind.c (revive_cont): Pass self-name to cobj_handle. * vm.c (vm_desc_struct): Take self name parameter, pass to cobj_handle. (vm_desc_nlevels, vm_desc_nregs, vm_desc_bytecode, vm_desc_datavec, vm_desc_symvec, vm_execute_toplevel, vm_execute_closure, vm_closure_entry): Pass self name to vm_desc_struct. (vm_closure_struct): Take self name parameter, pass to cobj_handle.
* regex: fix double free corruption bug.Kaz Kylheku2018-04-041-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unfortunately, the nfa_state_free function doesn't check the static flag on a character set and just calls chr_set_destroy. So when one of the static character sets is planted into the NFA graph, when that graph is garbage-collected, it blows away the static character set. Then when that happens twice for the same set, boom! We make an alteration to make the destruction more defensive. Callers of char_set_destroy are no longer saddled with the responsibility of honoring the static flag buried in the object. Instead, that function itself check the static flag. An argument is provided to force the deletion in spite of the static flag; that is needed for the global cleanup of the static states. (Only occurs if txr is run with --free-all and cleanly exited.) * regex.c (char_set_destroy): Take extra argument, force. If the set is marked static, then do nothing, unless force is nonzero. (char_set_cobj_destroy): Don't check the static flag, just call char_set_destroy, force zero. (nfa_state_free): Add force zero argument to char_set_destroy call. The double free bug is thereby fixed here; static sets are protected. (regex_free_all): Force all the char_set_destroy calls here.
* regex: read/print bug: escaped double quote.Kaz Kylheku2018-04-041-2/+2
| | | | | | | | | | | | | | | | | | | Because the regex printer wrongly uses out_str_char (for the sake of borrowing its semicolon-notation processing) when a regex prints, all characters that require escaping in a string literal get escaped, which includes the " character. Unfortunately the \" sequence which results is rejected by the regex parser. * lib.c (out_str_char): Kludge: add extra argument to distinguish regex use versus string use, and treat the double quote accordingly. (out_str_readable): Give 0 arg to new param of out_str_char. * lib.h (out_str_char): Declaration updated. * regex.c (print_class_char, print_rec): Pass 1 to new param of out_str_char.
* Copyright year bump 2018.Kaz Kylheku2018-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * LICENSE, LICENSE-CYG, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, buf.c, buf.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, ffi.c, ffi.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, itypes.c, itypes.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, protsym.c, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/doloop.tl, share/txr/stdlib/error.tl, share/txr/stdlib/except.tl, share/txr/stdlib/ffi.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/keyparams.tl, share/txr/stdlib/op.tl, share/txr/stdlib/package.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/pmac.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/stream-wrap.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, strudel.c, strudel.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, win/cleansvg.txr: Extended Copyright line to 2018.
* regex: bugfix: squash duplicates in move set.Kaz Kylheku2017-09-131-2/+1
| | | | | | | | * regex.c (nfa_move_closure): The move set calculation is wrongly assuming that all of the states are new and not testing their visited color. This could result in the same state being added twice. Though harmless, it wastefully inflates the set size.
* regex: factor out repeated visit-coloring pattern.Kaz Kylheku2017-09-131-13/+15
| | | | | | | * regex.c (nfa_test_set_visited): New inline function. (nfa_map_states, nfa_thread_epsilons, nfa_closure, nfa_move_closure): Use function instead of coding pattern which tests the state and sets the visited member.
* regex: re-introduce nfa_accept states.Kaz Kylheku2017-09-131-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The nfa_accept state label is re-introduced. This state type has the same representation as nfa_empty; essentially, this replaces the flag. This makes the state type smaller, and we don't have to access the flag to tell if a state is an acceptance state. * regex.c (nfa_kind_t): New enum label, nfa_accept. (struct nfa_state_empty): Member accept removed. (nfa_accept_state_p): Macro tests only for nfa_accept type. (nfa_empty_state_p): New macro. (nfa_state_accept): Set type of new state to nfa_accept; do not set accept flag. (nfa_state_empty): Do not set accept flag. (nfa_state_empty_convert): Do not clear accept flag. (nfa_map_states): Handle nfa_accept in switch, in the same case as nfa_empty. (nfa_thread_epsilons): Don't test for accept state in nfa_empty case; it would be always false now. Add nfa_accept case to switch which only arranges for a traversal of the two transitions. (Though these are expected to be null at the stage of the graph when this function is applied). (nfa_fold_accept): Switch type to nfa_accept rather than setting accept flag. (nfa_closure, nfa_move_closure): Use new macro for testing whether a state is empty.
* regex: retain unoptimized form for printing.Kaz Kylheku2017-09-121-5/+1
| | | | | | | regex.c (regex_compile): Take the source code to be the original code, rather than the version with AST-level optimizations and expansions related to the nongreedy operator.
* regex: bug printing #/abc(def|ghi)/Kaz Kylheku2017-09-121-1/+1
| | | | | | | | | | | | This was broken by the July 16 commit "regex: don't print superfluous parens around classes", 2411f779f47c441659720ad0ddcabf91df1d2529. * regex.c (print_rec): If an (or ...) appears as a compound element, it must be rendered in parentheses; or_s must be handled here just like and_s. Prior to the faulty commit, this was implicitly true because the logic was inverted and wasn't ruling out or_s.
* regex: accept-folding optimization.Kaz Kylheku2017-09-121-0/+23
| | | | | | | | | | | | In this optimization, we identify places in the NFA graph where empty states transition to accept states. We eliminate these transitions and turn the empty states into accept states. Accept states which thereby become unreachable from the start state are pruned away. * regex.c (nfa_fold_accept): New static function. (nfa_optimize): Add a pass which applies nfa_fold_accept to all states.
* regex: eliminate nfa_accept state type.Kaz Kylheku2017-09-121-20/+26
| | | | | | | | | | | | | | | | | | | | | | | | | Acceptance states are instead represented by adding an accept flag to nfa_empty states. This will support an optimization which eliminates accept states that are referenced only by empty states. * regex.c (nfa_kind_t): Enum member nfa_accept removed. (struct nfa_state_accept): Renamed to nfa_state_any. (struct nfa_state_empty): New member, accept. (union nfa_state): Member a changes from struct nfa_state_accept to struct nfa_state_any. (nfa_accept_state_p): New macro. (nfa_state_accept): Now makes an nfe_empty type state, with no transitions out and the accept flag set. (nfa_state_empty): Initialize accept flag to zero. (nfa_state_empty_convert): Set the accept flag to zero. (nfa_state_merge): Use new macro in assertion. (nfa_map_states): Remove nfa_accept switch case. (nfa_thread_epsilons): We must not eliminate an empty state which is an acceptance state, even if it meets the other conditions. (nfa_closure): Use a local variable to eliminate repetition of set[i] expression. Test for accept state using new macro. (nfa_move_closure): Test for accept state using new macro.
* regex: epsilon-threading optimization on NFA graph.Kaz Kylheku2017-09-121-1/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is something done for the sake of faster NFA simulation. I think it's glossed over in regex literature because it makes no difference in a NFA to DFA transformation. Fewer states in the graph means less work in the nfa_move_closure function at regex run time. In the NFA graph, there can occur empty states which are useless. These are states which hold only one epsilon transition. Those states can be replaced by whatever state their epsilon transition points to. The terminology is inspired by "jump threading" in code optimization, whereby if a branch takes place to an instruction which holds an unconditional branch, the original branch can be retargetted to go to the ultimate target. I.e. we turn: x e S0 ---> S1 ----> S2 where e denotes an epsilon transition, and x represents any kind of transition to: x S0 ---> S2 We can also eliminate empty states which have no transition; those can be replaced by a null pointer. I suspect these do not actually occur. * regex.c (nfa_thread_epsilons, nfa_noop, nfa_optimize): New static functions. (regex_compile): Invoke nfa_optimize on the results of nfa_compile_regex.
* regex: elide needless increments of visited counter.Kaz Kylheku2017-09-121-5/+5
| | | | | | | | | | | | | * regex.c (nfa_run): Start with the existing value of the visited counter and pre-increment it when calling nfa_closure and nfa_move_closure. Thus it is incremented only as many times as those functions are called: one fewer than before. (regex_machine_reset): regm->n.visited is already incremented, since 1 was added to the prior value; there is no need to increment it again. (nfa_handle_wraparound): More prudent approach here. If we get within 8 increments of wraparound, we fast forward to UINT_MAX and then 0.
* regex: bugfix: incorrect use of nfa_move_closure.Kaz Kylheku2017-09-121-1/+1
| | | | | | | | | | | | | | This goes back to November 2, 2009 commit, "Start of implementation for freestyle matching", 6191fbb2ca7a9ac339dd3994bdea8273ceb0a24d It is exposed if we perform an epsilon threading optimization on the NFA graph. * regex.c (regex_machine_feed): We must pass the new, incremented value of the visited counter, to nfa_move_closure. States have already been tagged with the old value before the call to regex_machine_feed, so we risk failing to visit some states and include them in the closure.
* regex: new function, regex-prefix-match.Kaz Kylheku2017-09-111-0/+48
| | | | | | | | | | | | | | | | | This new function allows a program to determine whether a given string is the prefix of any of the strings denoted by a regular expression; or, in alternative words, whether a given string is the prefix of a possibly longer string which matches a regular expression. * regex.c (regex_machine_infer_init_state): New static function. (regex_prefix_match): New function. (regex_init): regex-prefix-match intrinsic registered. regex.h (regex_prefix_match): Declared. * txr.1: Documented.
* regex: remove nfa_reject representation.Kaz Kylheku2017-09-111-40/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reject states are no longer explicitly represented in the NFA graph. Any transition that previously would have led to a reject state is just a null pointer now: no transition. This just means we have to check one place for a null pointer. This change fixes a flaw in nfa_closure: the function was propagating nfa_reject states to the closure set. When the input set consisted only of reject states (or rather exactly one reject state in that special case), it would add it to the output set and return 1, making it seem as if the regex is one that can potentially match something. * regex.c (nfa_kind_t): Enum member nfa_reject removed. (nfa_state_reject): Static function removed. (nfa_compile_regex): Compile a t regex (match nothing) to a NFA graph with no transition at all into any start state; effectively an empty graph with no state nodes. (nfa_map_states): Remove nfa_reject case from switch. Also, stylistic change; state pointers are tested as Booleans elsewhere rather than by comparison to null pointer. (nfa_count_states, nfa_handle_wraparound): Handle null state pointer. (nfa_move_closure): Test the mov transition for null; anything can potentially now have a null transition, not only epsilon nodes. (nfa_run, regex_machine_reset): Handle nfa.start being null indicating an empty NFA graph. In that case we don't have to calculate the closure; we just have an empty set of states and set nfa.nclos to zero. The visited disambiguation counter is irrelevant since there are no states to visit. (regex_machine_feed): Don't try to propagate visited counter stored in regex machine to empty NFA graph which has no states.
* regex: don't print superfluous parens around classes.Kaz Kylheku2017-07-161-2/+2
| | | | | | | | * regex.c (print_rec): Switching from negative tests to positive: print parens around elements of a compound which have a lower precedence. The previously incorrectly omitted set and cset remain unmentioned, so under this inversion of logic, they print without parentheses.