2012-01-26 Kaz Kylheku

	Version 55

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2012-01-26 Kaz Kylheku

	* lib.c (replace_list): Always convert the input items to a list, even in the trivial case that an empty list is being replaced. Allow a string to be the replacement (split into a list of characters).
	(replace_str): Bugfix in assignment from vector; wrong index used over source vector.
	(split_str): If the splitting set is empty, just split the string into characters instead of getting into an infinite loop.
	(replace_vec): Allow replacement source to be a string.

2012-01-26 Kaz Kylheku

	* arith.c (plus, minus): Better wording in error messages.
	* eval.c (dwim_loc): Assignments to string indices and ranges supported. New arguments for this purpose.
	(op_modplace): Use new dwim_loc interface for returned value.
	(op_dwim): Support assignment to string ranges.
	(eval_init): replace_str registered.
	* lib.c (string_extend): If the argument is a number, let it specify the amount by which to extend the string.
	(replace_str): New function.
	* lib.h (replace_str): Declared.
	* txr.1: Updated.
	* txr.vim: Updated.

2012-01-26 Kaz Kylheku

	* lib.c (listref, listref_l): Negative indices must have semantics consistent with vecref and ranges.

2012-01-26 Kaz Kylheku

	* lib.c (cat_str): Throw an error, instead of silently returning nil, if one of the list elements is not a character or string.

2012-01-26 Kaz Kylheku

	* txr.1: More discussion of ranges.

2012-01-26 Kaz Kylheku

	* match.c (format_field): Removed useless use of cat_str (no longer needed because str is already a string). The purpose was to reduce characters to strings.
	(subst_vars): Some of the new logic in format_field must be replicated in the case when format_field is not called because the variable has no modifiers. Lists must be converted to a space-separated string. Bugfix here: occurrence of pat and modifiers is not mutually exclusive.
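
The listref entry above requires negative list indices to follow the same convention as vecref and ranges, where -1 names the last element. As a rough sketch only, the normalization step such an accessor needs can be written in C as follows. The names normalize_index, struct node and list_ref, and the choice to return NULL for an out-of-range index, are hypothetical; TXR's actual functions operate on Lisp objects and may handle out-of-range indices differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map an index that may count from the end
   (-1 names the last element) onto a 0-based position.
   Returns -1 if the result falls outside [0, len). */
static long normalize_index(long idx, long len)
{
    assert(len >= 0);
    if (idx < 0)
        idx += len;              /* -1 -> len - 1, -len -> 0 */
    if (idx < 0 || idx >= len)
        return -1;               /* out of range at either end */
    return idx;
}

/* Hypothetical singly linked list node standing in for a cons cell. */
struct node {
    int value;
    const struct node *next;
};

/* Count the nodes in a NULL-terminated list. */
static long list_length(const struct node *n)
{
    long len = 0;
    for (; n != NULL; n = n->next)
        len++;
    return len;
}

/* Return the node at idx (negative values count from the end),
   or NULL if idx is out of range. */
static const struct node *list_ref(const struct node *head, long idx)
{
    long pos = normalize_index(idx, list_length(head));
    if (pos < 0)
        return NULL;
    while (pos-- > 0)
        head = head->next;
    return head;
}
```

Routing every accessor through one normalization function is one way to keep list, vector and string indexing consistent, which is the goal the entry states.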

2012-01-26 Kaz Kylheku

	* eval.c (dwim_loc, dwim_op): Eliminated redundant re-evaluation of range arguments. They are already evaluated, since the cons expression is evaluated as part of the dwim arglist. Replaced some open code with function calls to the new listref and listref_l functions.
	(tostring, tostringp): Made extern and moved to lib.c.
	* lib.c (listref, listref_l): New functions.
	(tostring, tostringp): Moved here from eval.c.
	* lib.h (listref, listref_l, tostring, tostringp): Declared.
	* match.c (format_field): Handle index and range references.
	* txr.1: Documented new output variable syntax.

2012-01-25 Kaz Kylheku

	* eval.c (dwim_loc): Takes full responsibility for assigning to list and array ranges.
	(op_modplace): Pass extra arguments to dwim_loc so it can do the job for ranges. If dwim_loc returns 0, it means that it did everything.
	(op_dwim): Support list and array ranges.
	* txr.1: Documented.

2012-01-25 Kaz Kylheku

	* arith.c (zerop): Misspelling in error message.
	* lib.c (sub_list, replace_list, sub_vec, replace_vec): Allow the value t to specify one element past the end, so that t t refers to the zero-length sequence just past the end of the array or list. Also, fixed out-of-bounds memmoves in replace_vec.

2012-01-25 Kaz Kylheku

	* eval.c (eval_init): New functions registered.
	* lib.c (sub_list, replace_list, vectorp): New functions.
	(sub_vec): Allow negative indices from end of array.
	(replace_vec): New function.
	* lib.h (sub_list, replace_list, vectorp, replace_vec): Declared.
	* parser.l (DOTDOT): Scan .. as new token.
	* parser.y (DOTDOT): New token.
	(expr): New syntax with DOTDOT.
	(yybadtoken): Handle DOTDOT.
	* txr.vim: Added new functions. Also added missing append* and dwim.
	* txr.1: Updated.

2012-01-25 Kaz Kylheku

	* txr.vim (txr_chr): Fix for highlighting named characters like #\newline.

2012-01-25 Kaz Kylheku

	* eval.c (dwim_s): New symbol variable.
	(dwim_loc, op_dwim): New static functions.
	(op_modplace): Support assignment to dwim forms with the help of dwim_loc.
	(expand_place): Handle dwim places.
	(eval_init): Initialize dwim_s. Register dwim operator in op_table.
	* eval.h (dwim_s): Declared.
	* lib.c (chr_str, chr_str_set): Allow negative indices to index backwards from end of string.
	(vecref, vecref_l): Allow negative indices to index from rear of array.
	(obj_print, obj_pprint): Render (dwim ...) forms as [...].
	* parser.l: Produce new METABKT token type for @[, and '[', ']' tokens.
	* parser.y (METABKT): New token. %type declaration for '['.
	(list): Support square-bracket style of list, translated into dwim form.
	(meta_expr): Support @[...] variant.
	(yybadtoken): Handle METABKT in switch.
	* txr.1: Documented [...] syntax and dwim operator.
	* txr.vim: Updated.

2012-01-21 Kaz Kylheku

	Version 54

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2012-01-21 Kaz Kylheku

	* debug.c (help): Added missing help for w command.
	(debug): In backtrace, show the renaming pairs for unbound variables (ub_p_a_pairs) if they are present.
	* debug.h (debug_begin): Renamed to debug_frame.
	* eval.c (eval): Wrap debug_begin/debug_end around function dispatch, so TXR Lisp functions are included in backtraces.
	* match.c (h_fun): Follow rename of debug_begin to debug_frame. Pass in evaluated args, not the original ones.
	(v_fun): Likewise.
	* unwind.c (uw_push_debug): bindings argument renamed to env. Bugfix: args argument was being assigned to ub_p_a_pairs.
	* unwind.h (struct uw_debug): Member bindings renamed to env.
	(uw_push_debug): Declaration updated.

2012-01-21 Kaz Kylheku

	* debug.c (last_command): Do not initialize with lit(); this is not a constant expression in C.
	(debug): Handle the situation here.

2012-01-21 Kaz Kylheku

	* debug.c (help): Filled in.
	(debug): Some commands changed due to duplicates.

2012-01-21 Kaz Kylheku

	* match.c (v_fun): Removing all debugging instrumentation.
	(match_files): Moving debug_check out of directive case so it covers all forms handled by loop. All this makes the n command in the debugger work better: it no longer skips over function calls or horizontal material.

2012-01-21 Kaz Kylheku

	Improved debugging. Debug nesting depth counter maintained and used for next/step/finish stepping.

	* Makefile (OBJS): debug.o moved to OBJS-y or OBJS-.
	(OBJS-y, OBJS-): New variables.
	$(PROG): Depends on OBJS-y also.
	clean: clean $(OBJS-y).
	depend: include $(OBJS-y) in dependency generation.
	* configure: Underscores and dashes are interchangeable in configure variables.
	(yaccname_given, yacc_given): Default value is y, not yes.
	(debug_support): New config variable.
	(CONFIG_DEBUG_SUPPORT): New config.h symbol.
	* debug.c (debug_depth): New global variable.
	(debug_block_s): New symbol variable.
	(next_depth): New static variable.
	(debug): Renamed some commands. Introduced separate next, step and finish.
	(debug_init): debug_block_s initialized.
	* debug.h (debug_depth, debug_block_s): Declared.
	(debug_enter, debug_leave, debug_return): New macros.
	(debug_check, debug_init): Conditionally defined depending on whether this is a debug build.
	* dep.mk: Regenerated.
	* eval.c (eval): Instrumented with debug_enter, debug_leave, debug_return.
	* match.c (match_line, v_fun, match_files): Likewise.
	* txr.c (txr_main): Bail if -d or --debug is used in a build that lacks debug support.

2012-01-19 Kaz Kylheku

	* debug.c (last_command): Initialize to empty string rather than nil, otherwise hitting enter tries to repeat the nil command.
	(show_bindings): New function. Prints all levels of bindings.
	(debug): Flip the corresponding print flags after printing the current form or data, so they are not printed for every prompt. On EOF from standard input, substitute the q command. If enter is hit and there is no last command, just re-print the prompt. The v command uses show_bindings to dump the environment.
	* eval.c (eval): When calling debug_check, pass the env objects, rather than the bindings they contain.

2012-01-19 Kaz Kylheku

	* lib.c (car_l, cdr_l): Bugfix: do not call the lazy cons force function if it is already nil, and set it to nil afterward.

2012-01-12 Kaz Kylheku

	* eval.c (eval_init): Make lazy_appendv function available as append*.
	* txr.1: Documented.

2012-01-11 Kaz Kylheku

	Before releasing 53, there is this.

	* eval.c (c_var_mark): Bugfix: we cannot use cptr_get from within the garbage collector because of its type check. Bugfix: synchronize the shadow binding with the variable's current contents so we don't hang on to a stale object.

2012-01-11 Kaz Kylheku

	Version 53

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2012-01-11 Kaz Kylheku

	TXR Lisp regression in C global variables.

	* eval.c (struct c_var): New struct type.
	(lookup_var, lookup_var_l): cptr type bindings now point to a struct c_var, which has to be handled properly here.
	(c_var_mark): New static function.
	(c_var_ops): New static struct.
	(reg_var): Register variables using struct c_var to provide a pointer to the location and a cached cons that can be returned as a binding.

2012-01-11 Kaz Kylheku

	* eval.c (each_s, each_star_s, collect_each_s, collect_each_star_s): New symbol variables.
	(op_each): New static function.
	(expand): Handle the four new operators.
	(eval_init): Intern new symbols, register new operators.
	* txr.1: Documented each, each*, collect-each and collect-each*.
	* txr.vim: Updated.

2012-01-11 Kaz Kylheku

	* eval.c (eval_init): list_str registered.
	* lib.c (list_str): New function.
	* lib.h (list_str): Declared.
	* txr.1: Doc stub section created.
	* txr.vim: Updated.

2012-01-10 Kaz Kylheku

	* eval.c (generate): Bugfix: do not call gen_fun before testing while_pred.

2012-01-10 Kaz Kylheku

	* eval.c (tostring, tostringp): New static functions.
	(eval_init): New functions registered.
	* txr.1: Stub sections created.
	* txr.vim: Updated.

2012-01-10 Kaz Kylheku

	Spate of new features having to do with lazy processing.

	* eval.c (prog1_s, gen_s, generate_s, delay_s, promise_s): New symbol variables.
	(eval_prog1, op_prog1, expand_gen, expand_delay): New static functions.
	(expand): Handle gen and delay.
	(lazy_mapcar_func, lazy_mapcar, lazy_mapcarv_func, lazy_mapcarv, lazy_mappendv): New static functions.
	(rangev_func, rangev, generate_func, generate, repeat_infinite_func, repeat_times_func, repeatv, force): New static functions.
	(eval_init): New operators and functions interned. lazy-flatten renamed to flatten*.
	* lib.c (null_f): New global variable.
	(ltail, lazy_appendv): New functions.
	(lazy_appendv_func): New static function.
	(obj_init): null_f protected and initialized.
	* lib.h (null_f, ltail, lazy_appendv): Declared.
	* txr.1: Documented.
	* txr.vim: Updated.

2012-01-09 Kaz Kylheku

	Non-broken way to achieve intent of previous commit.

	* eval.c (subst_vars): Do not evaluate modifiers as an argument list locally. Pass form-evaluating function to format_field.
	* match.c (format_field): Modified to accept new argument, a one-argument function for reducing a form to a value. Error checking for invalid modifiers made stricter.
	(subst_vars): Do not evaluate modifiers as an argument list. Pass form-evaluating function to format_field.
	* match.h (format_field): Declaration updated.

2012-01-09 Kaz Kylheku

	* eval.c (subst_vars): Evaluate the modifiers, so expressions can be used.
	* match.c (subst_vars): Likewise, but using txeval.

2012-01-07 Kaz Kylheku

	Version 52

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated. Wrong December dates fixed.

2012-01-06 Kaz Kylheku

	* match.c (fuzz_s): New symbol variable.
	(v_fuzz): New static function.
	(syms_init): fuzz_s initialized.
	(dir_tables_init): v_fuzz entered into v_directive_table.
	* txr.1: Documented @(fuzz).

2012-01-06 Kaz Kylheku

	* match.c (v_gather): Implemented until/last clause.
	* parser.y (gather_parts, additional_gather_parts): New nonterminals.
	(gather_clause): Syntax refactored for until/last clause.
	* txr.1: Updated.

2012-01-02 Kaz Kylheku

	* eval.c (eval_init): Fix regression introduced in 2011-12-29 commit. We can't use mod_s, because the module which sets up that variable is not yet initialized.

2012-01-01 Kaz Kylheku

	Make C globals in TXR Lisp properly assignable, so that, for instance, assigning *stdout* really overwrites the underlying C variable.

	* eval.c (lookup_var): Handle new kind of toplevel binding. If the hash value is a cptr, it points to a val storage location.
	(lookup_var_l): New function.
	(op_modplace): Get location of variable using lookup_var_l rather than assuming there is a cons-based binding.
	(reg_var): Argument changed to val * pointer. Register the variable as a cptr referencing the location.
	(eval_init): reg_var calls pass address of each global.
	* eval.h (lookup_var_l): Declared.

2012-01-01 Kaz Kylheku

	* eval.c (eval_init): New gensym function registered.
	* lib.c (gensym_counter): New variable.
	(gensymv): New function.
	(obj_init): Initialize gensym_counter.
	* lib.h (gensym_counter, gensymv): Declared.

2011-12-30 Kaz Kylheku

	* match.c (counter_k): New keyword symbol variable.
	(do_output_line): Process new :counter argument of rep.
	(do_output): Ditto, for repeat.
	(syms_init): Intern new keyword symbol.
	* match.h (counter_k): Declared.
	* parser.l (REPEAT, REP): Lexical syntax changed to allow arguments.
	* parser.y (repeat_rep_helper): Takes extra argument, representing the repeat/rep args. This is inserted into the second position of the output list.
	(repeat_clause, rep_elem): Extract repeat/rep arguments and pass to repeat_rep_helper.
	(yybadtoken): Do not put quotes around the word "number".
	* txr.1: Updated.

2011-12-29 Kaz Kylheku

	New functionality: mod and modlast directives in repeat and rep.

	* eval.c (eval_init): Use new symbol variable mod_s instead of calling intern.
	* match.c (mod_s, modlast_s): Symbol variables defined.
	(do_output_line): mod and modlast directives implemented under rep.
	(do_output): Likewise under repeat.
	(syms_init): Initialize new symbol variables.
	* match.h (mod_s, modlast_s): Declared.
	* parser.l (MOD, MODLAST): Parse new token types.
	* parser.y (MOD, MODLAST): New tokens.
	(repeat_parts_opt, rep_parts_opt): New syntax.
	(repeat_rep_helper): Handle mod and modlast syntax.
	* txr.1: Updated.
	* txr.vim: Updated.

2011-12-29 Kaz Kylheku

	* parser.y (repeat_rep_helper): Bugfix. Circular lists were being created here when clauses of the same kind appear multiple times. The problem is that append2 no longer copies the second list, which the code was relying on it to do.

2011-12-29 Kaz Kylheku

	* txr.1: Useless sentence under reduce-left and reduce-right removed. Missing Description headings added.

2011-12-28 Kaz Kylheku

	* genman.txr: Updated for recent man page changes.

2011-12-28 Kaz Kylheku

	Version 51

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2011-12-28 Kaz Kylheku

	* txr.1: Capitalize TXR where it makes sense. Introductory text rewritten.

2011-12-28 Kaz Kylheku

	* match.c (LOG_MATCH): Use < in format directive instead of -.
	* rand.c (random): Add back missing declaration.

2011-12-28 Kaz Kylheku

	* parser.y (quasi_item): Switch from var to o_var. This fixes cases like `@a@(foo)@b` where foo was being translated to (foo) rather than (sys:expr foo).

2011-12-27 Kaz Kylheku

	* mpi-patches/shrink-mpi-int (mpi_int): Fixed terrible bug in this patch, resulting in an insufficient bit field width for representing the allocation size of the MPI integer on 32 bit platforms.

2011-12-27 Kaz Kylheku

	* rand.c (make_state): Use ANSI C syntax for prototyped function of no arguments. This snuck through due to working with a C++ compiler.
	(random): Fixed unused variable warning that happens on 32-bit-pointer platforms.

2011-12-25 Kaz Kylheku

	* txr.1: Formatting fixes.

2011-12-25 Kaz Kylheku

	* dep.mk: Overdue update.

2011-12-25 Kaz Kylheku

	* match.c (v_next): Change flatten to lazy_flatten in the correct place. In the previous commit I did it in the code that handles the obsolescent :var syntax.

2011-12-25 Kaz Kylheku

	* eval.c (eval_init): New function interned.
	* lib.c (lazy_flatten_scan, lazy_flatten_func): New static functions.
	(lazy_flatten): New function.
	* lib.h (lazy_flatten): Declared.
	* match.c (v_next): Use lazy_flatten instead of flatten for processing a :list source. This means that @(next :list ...) can be used to process infinite lazy lists.
	* txr.1: Documented lazy-flatten.

2011-12-23 Kaz Kylheku

	* rand.c (rand32): Moved.
	(make_random_state): After initializing, retrieve eight random numbers to clear pathological initial behavior leading to duplicate values.

2011-12-23 Kaz Kylheku

	* arith.c (highest_bit): Changing to external linkage.
	* arith.h (highest_bit): Declared.
	* rand.c (random): Rewrote using a different algorithm which ensures even distribution, and avoids doing a bignum mod operation.

2011-12-23 Kaz Kylheku

	Version 50

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2011-12-23 Kaz Kylheku

	* lib.c (memql): New function.
	(some_satisfy): Return the first non-nil result, rather than t.
	(all_satisfy): Return the value of the last item, if all items are processed.
	* lib.h (memql): Declared.
	* txr.1: Documented memq, memql, memqual, tree-find, some, all, none, eq, eql and equal.

2011-12-22 Kaz Kylheku

	* txr.1: Documented copy-list, reverse, nreverse, ldiff and flatten.

2011-12-22 Kaz Kylheku

	* txr.1: Documented reduce-left and reduce-right.

2011-12-22 Kaz Kylheku

	Bug #35010

	* match.c (extract_bindings): Make sure there are no duplicate variables among the extracted bindings. This is needed because of the other changes.
	(do_output_line, do_output): In handling the rep/repeat directives, append the original bindings to the set of bindings extracted for the variables that occur in the clause, so that Lisp code can see all of the variables.

2011-12-22 Kaz Kylheku

	* stream.c (vformat): If width is specified for ~s or ~a, and the object is not a string or number, then print it to a string and treat it as a string, adjusting it within the field. Also, do not simply abort on an unknown format directive but throw a proper exception.

2011-12-22 Kaz Kylheku

	* stream.c (vformat): Left-adjusted field is now specified using < rather than '-'. The +, space and leading 0 are specified on the precision, not the width.

2011-12-22 Kaz Kylheku

	* rand.c (random): Fix for 64 bit fixnums: stick two random numbers together. Otherwise, for fixnum moduli, we get only a 32 bit number no matter what the modulus is.

2011-12-22 Kaz Kylheku

	* stream.c (vformat): Combine ~a and ~s cases, so numbers and strings are printed the same way under ~s and ~a. The two differ only in how other kinds of objects are printed.

2011-12-22 Kaz Kylheku

	Bug #35026

	* stream.c (format_num): New argument: sign character. Rewrote to handle precision, width, zero padding and leading sign similarly to printf.
	(vformat): New syntax accepted: a space or + before the width specifies that a positive sign is to be explicitly written as a space or + character. Pass one more argument to format_num calls. Bugfix: go back to vf_init state after processing ~~.

2011-12-22 Kaz Kylheku

	Bug #35136 and cleanup.

	* arith.c (plus, minus, mul, gt, ge, lt, le, exptmod, gcd): Remove trailing abort; we already marked uw_throwf as noreturn. This hack should not be needed in functions where the last statement is a throw.
	(trunc, expt): Repeated error case handled in one place. Temp variable used to avoid two calls to mp_clear. Call to abort removed.
	(mod): Repeated error case handled in one place. Plugged memory leak by moving throw past mp_clear calls.
	Call to abort removed.
	(isqrt): Repeated error case handled in one place.

2011-12-21 Kaz Kylheku

	* txr.vim: Fixes to char literal syntax.

2011-12-21 Kaz Kylheku

	* mpi-patches/bit-search-optimizations (s_highest_bit): It will take days to completely wipe the egg off my face. I forgot to fix this code for unsigned integers before pasting it into MPI.

2011-12-21 Kaz Kylheku

	* arith.c (normalize): Linkage changed to extern.
	* arith.h (normalize): Declared.
	* rand.c (random): Bugfix: normalize the bignum before returning it.
	* txr.1: Doc stubs for PRNG functionality.

2011-12-21 Kaz Kylheku

	* rand.c: Added comment about source of algorithm.

2011-12-21 Kaz Kylheku

	* rand.c (random): Bugfix: not building up sufficiently large bignums. Work properly when mp_digit is smaller than 32 bits.

2011-12-21 Kaz Kylheku

	* Makefile (OBJS): New object file, rand.o.
	* eval.c: Includes rand.h header.
	(eval_init): New variable and functions from rand module registered.
	* lib.c: Includes rand.h header.
	(init): Call rand_init.
	* rand.c: New file.
	* rand.h: New file.

2011-12-21 Kaz Kylheku

	Bug #35139: better fix.

	* parser.y (YYEOF): If YYEOF is not defined, define it as zero.
	(yybadtoken): Undo previous changes: do not test for zero.

2011-12-21 Kaz Kylheku

	Bug #35139

	* parser.y (yybadtoken): The current token (yychar) is 0 on byacc rather than YYEOF or YYEMPTY, so we have to handle that.

2011-12-21 Kaz Kylheku

	* Makefile (distclean): Use rm -rf on mpi directory.

2011-12-20 Kaz Kylheku

	Test case for bug #35137

	* tests/007/except-2.expected: New file.
	* tests/007/except-2.txr: New file.

2011-12-20 Kaz Kylheku

	* eval.c (eval_init): New function registered.
	* lib.c (cat_vec): New function.
	* lib.h (cat_vec): Declared.
	* txr.1: Documentation stub.

2011-12-20 Kaz Kylheku

	Bug #35137

	* unwind.c (uw_unwind_to_exit_point): When jumping to a catch frame, do not mark it invisible.
	* unwind.h (uw_catch): Flip the matches to nil so that this catch frame can no longer be identified as an unwind point by uw_throw, and thus will not be re-entered for the purposes of handling an exception. It remains visible for the purposes of running the cleanup code.
	(uw_unwind): Prior to executing cleanup forms, flip the visibility to 0. This means that the frame will no longer be re-entered for any reason.

2011-12-20 Kaz Kylheku

	Streamlining exception handling macros a little bit.

	* eval.c (op_unwind_protect): Use uw_simple_catch_begin, and remove the uw_catch (exsym, exvals) clause. Put explicit braces around the unwind code even though it is only one statement.
	* match.c (do_txeval): Got rid of empty uw_unwind clause. This is not needed any longer.
	(v_try): Got rid of explicit uw_do_unwind calls.
	* unwind.h (uw_simple_catch_begin): New macro.
	(uw_do_unwind): Macro removed.
	(uw_catch): Added goto uw_unwind_label at the front. This way, if the previous clause falls through, control goes to the unwind logic.
	(uw_unwind): Got rid of initial break. Previous clause should fall through to unwind logic, whether it is the main clause, or one of the catches.
	(uw_catch_end): Default case aborts, because we don't expect this.

2011-12-20 Kaz Kylheku

	Critical regression. Hash lookup was crashing on some platforms due to negative hash values being reduced modulo table size to a negative array index.

	* hash.c (equal_hash, eql_hash): Ensure that value returned is in the range [0,NUM_MAX].
	(hash_obj): Unused function removed.
	(cobj_hash_op): Use hashing similar to eql hash for other kinds of references.
	(hash_eql, hash_equal): Removed bogus % NUM_MAX reduction.
	* hash.h (hash_obj): Declaration removed.

2011-12-20 Kaz Kylheku

	* eval.c (eval_init): New functions registered as intrinsics.
	* lib.c (copy_vec, sub_vec): New functions.
	* lib.h (copy_vec, sub_vec): Declared.
	* txr.1: Stub sections created.

2011-12-19 Kaz Kylheku

	Version 049

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2011-12-19 Kaz Kylheku

	* eval.c (subst_vars, op_quasi_list, expand_quasi): New static functions.
	(expand): New case for quasiliterals.
	(eval_init): Register quasiliteral as special operator.
	* match.c (format_field): Linkage changed to external.
	* match.h (format_field): Declared. Declarations rearranged.

2011-12-18 Kaz Kylheku

	* eval.c (bindings_helper): Fix format arguments.
	(eval_init): Registered new functions: symbol-function, func-get-form, func-get-env, functionp, interp-fun-p.
	* lib.c (nappend2, getplist_f, improper_plist_to_alist): tail variable renamed to avoid clash in macro.
	(func_get_form, func_get_env, interp_fun_p): New functions.
	* lib.h (func_get_form, func_get_env, interp_fun_p): Declared.
	(list_collect): Fix macro to handle the case instead of throwing an error.
	* match.c (vars_to_bindings, extract_bindings): tail variable renamed to avoid clash in macro.
	* txr.1: Documentation stubs.

2011-12-16 Kaz Kylheku

	* hash.c (equal_hash): Eliminating displacement from character hashes. Simplifying some code.
	(eql_hash): Handle fixnums, characters and literals specially, rather than hashing all value types the same way. The shift applicable for object pointers causes adjacent integers to clash.

2011-12-16 Kaz Kylheku

	* eval.c (expand_vars): Bugfix: use expand_forms rather than expand on a list of forms.

2011-12-16 Kaz Kylheku

	* txr.vim: iskeyword updated.

2011-12-15 Kaz Kylheku

	* lib.c (appendv): Bugfix: test was the wrong way around.
	(vector_list): Wrong zero used, resulting in vector(nil) being called.

2011-12-15 Kaz Kylheku

	* eval.c (eval_init): not added as synonym for null.
	* lib.c (copy_list): Use list_collect_append rather than list_collect_terminate.
	(append2, appendv): Simplified using new list_collect_append.
	(nappend2): Simplified using new list_collect_nconc.
	* lib.h (list_collect): Added check for accidental usage of list_collect after list_append, since PTAIL has different semantics.
	(list_collect_nconc, list_collect_append): Semantics fixed so that append collecting works more like the Common Lisp append function, allowing trailing atoms or a lone atom. The meaning of PTAIL is changed, however. Now PTAIL actually tracks the head of the most recently appended segment. Each append operation has to first traverse the previously added piece to get to the end.
	(list_collect_terminate): Macro removed.
	* match.c (v_gather): Removed useless use of list_collect_terminate.
	* parser.y: Some headers added that are needed by list_collect.
	* txr.1: Documented append, list, atom, null, not, consp, make-lazy-cons, lcons-fun, listp, proper-listp, length-list, mapcar, mappend, and apply.

2011-12-14 Kaz Kylheku

	@# comments are becoming obsolescent. @# comments can now be used. Within nested forms, Lisp-compatible ; comments are supported.

	* parser.l: Support @# and ; comments.
	* txr.1: Documentation updated.
	* txr.vim: Updated.

2011-12-14 Kaz Kylheku

	* lib.c (car, cdr): Set the lazy cons function to nil after calling it.
	(rplacd): Do not set the lazy cons function to nil.
	* txr.1: Documented a bunch of functions.

2011-12-14 Kaz Kylheku

	* eval.c (eval_init): Removed registration for vec_get_fill. Renamed vec_set_fill to vec-set-length.
	* hash.c (equal_hash): vec_fill to vec_length name change.
	(hash_grow, make_hash): No need to call vec_set_length.
	* lib.c (equal, vecref, vec_push, length_vec, list_vector, obj_print, obj_pprint): vec_fill to vec_length name change.
	(vector): Argument now represents actual length, not just allocated size.
	(vec_get_fill): Function removed; did exactly the same thing as length_vec.
	(vec_set_fill): Function renamed to vec_set_length.
	(vector_list): Allocate a 0 length vector initially.
	* lib.h (enum vecindex): Member changes name from vec_fill to vec_length.
	(vector): Parameter name changed.
	(vec_set_fill): Redeclared.
	(vec_get_fill): Declaration removed.
	* txr.1: Doc stubs updated.

2011-12-14 Kaz Kylheku

	* lib.c (car, cdr): Semantics fix for lazy conses. Ignore the return value of the lazy cons function: do not return nil if the function returns nil. This useless behavior was a source of inconvenience in lazy cons programming, requiring the lazy function to return non-nil in addition to installing the car and cdr fields.

2011-12-14 Kaz Kylheku

	* arith.c (abso): Bugfix: was broken for fixnums.

2011-12-14 Kaz Kylheku

	* txr.vim: Highlight hash prefix and quote.

2011-12-14 Kaz Kylheku

	* eval.c (op_dohash): Establish anonymous block.
	* txr.1: Finished documenting special operators.

2011-12-14 Kaz Kylheku

	* genman.txr: Fix empty NAME section.

2011-12-14 Kaz Kylheku

	* arith.c (minus): Allow difference between characters.

2011-12-14 Kaz Kylheku

	* arith.c (plus, minus, gt, lt, ge, le): Handle character operands.
	* eval.c (eval_init): New functions interned.
	* lib.c (num_chr, chr_num): New functions.
	* lib.h (num_chr, chr_num): Declared.
	* txr.1: Documentation stubs.

2011-12-13 Kaz Kylheku

	Version 048

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2011-12-13 Kaz Kylheku

	* arith.c (exptmod, gcd): New functions.
	* eval.c (eval_init): New functions registered as intrinsics.
	* lib.h (exptmod, gcd): Declared.
	* txr.1: Documentation stubs added.

2011-12-13 Kaz Kylheku

	* arith.c (evenp, oddp): New functions.
	* eval.c (eval_init): New functions registered as intrinsics.
	* lib.h (evenp, oddp): Declared.
	* txr.1: Documentation stub updated.

2011-12-13 Kaz Kylheku

	* arith.c (highest_bit): Linkage changed to static.
	(abso, isqrt): New functions.
	(isqrt_fixnum): New static function.
	* eval.c (eval_init): Registered abs, sqrt and numberp intrinsics.
	* lib.c (numberp): New function.
	* lib.h (numberp, abso, isqrt): Declared.
	* mpi-patches/series: New patch added.
	* mpi-patches/faster-square-root: New patch added.
	* txr.1: Documentation stubs for new functions.

2011-12-13 Kaz Kylheku

	* arith.c (expt): Fix broken bignum x fixnum combination.

2011-12-13 Kaz Kylheku

	* Makefile (repatch): New phony target.
	(distclean): Remove mpi directory.

2011-12-13 Kaz Kylheku

	Patch to shrink mpi-int to three words on 32 bit platforms, so that obj_t stays four pointers wide.

	* mpi-patches/series: New patch added.
	* mpi-patches/shrink-mpi-int: New file.

2011-12-12 Kaz Kylheku

	* mpi-patches/bit-search-optimizations (s_highest_bit): Added static storage class specifier.
	* mpi-patches/fix-mult-bug (s_mp_sqr): More braindamage found in MPI. This function performs additions and multiplications of mp_digit values, expecting an mp_word precision result without casting. This function is needed for exponentiation.

2011-12-12 Kaz Kylheku

	Get rid of some loops in MPI where it is searching for the highest bit, replacing them with an adaptation of the bit searching function used in arith.c.

	* mpi-patches/series: Patch added.
	* mpi-patches/bit-search-optimizations: New file.

2011-12-12 Kaz Kylheku

	* arith.c (expt): New function.
	* eval.c (eval_init): Registering new intrinsic functions: reduce-left, reduce-right and expt.
	* lib.c (minusv): Return one instead of num(1).
	(exptv, reduce_right): New functions.
	* lib.h (expt, exptv, reduce_right): Declared.
	* txr.1: Blank sections for new functions.

2011-12-12 Kaz Kylheku

	* mpi-patches/fix-mult-bug: One more flaw discovered in s_mp_mul_d and added to patch. This one caused malloc corruption and crashes, because the incorrect arithmetic causes the function to think that the multiplication will not need another digit, but then there is a carry out which does spill into a new digit.
	* mpi-patches/series: Arg! Somehow the patch fix-bad-shift went missing from the series file, even though the patch itself is in the GIT repository.

2011-12-06 Kaz Kylheku

	Version 047

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2011-12-11 Kaz Kylheku

	* arith.c (zerop, gt, lt, ge, le): Functions from lib.c reimplemented with bignum support.
	* eval.c (eval_init): Added bignump and zerop as intrinsic functions. Renamed numberp to fixnump.
	* lib.c (zerop, gt, lt, ge, le): Functions removed.
	(numeq): Unused function removed.
	* lib.h (numeq): Declaration removed.
	* txr.1: Sections for zerop and bignump created. Changed reference to numberp to fixnump.

2011-12-11 Kaz Kylheku

	* arith.c (plus, mul): Plugged mpi_int memory leaks.
	(trunc): Plugged memory leaks. Straightened out semantics with negative modulus. (Residue comes out negative.)

2011-12-11 Kaz Kylheku

	* arith.c (trunc): Error messages prefixed with trunc:.
	(mod): New function, reimplementation of removed mod from lib.c.
	* lib.c (mod): Function removed.

2011-12-11 Kaz Kylheku

	Bignum division implemented. More portability bugs found in MPI: code like 1 << n, where n exceeds the width of the type int.

	* arith.c (trunc): New function, reimplementation of removed trunc from lib.c.
	* lib.c (trunc): Removed.
	* mpi-patches/fix-bad-shifts: New file.

2011-12-11 Kaz Kylheku

	* arith.c (ABS): New macro.
	(plus, minus): Bugfix: must not pass signed values to mp_add_d and mp_sub_d functions.
	(mul): Must not pass signed value to mp_mul_d. Also, fixed type check on wrong argument in the (TAG_PTR, TAG_NUM) case.

2011-12-11 Kaz Kylheku

	Removing this crutch; it's not that useful.

	* arith.txr: File removed.

2011-12-11 Kaz Kylheku

	* arith.c: Regenerated.
	* arith.txr (normalize): Bugfix: was not turning +/- NUM_MAX bignums into fixnums.

2011-12-11 Kaz Kylheku

	* arith.c: Regenerated.
	* arith.txr (highest_bit): Missing #else added, fixing SIZEOF_PTR == 4 case.

2011-12-11 Kaz Kylheku

	* arith.c: Regenerated.
	* arith.txr (highest_bit): Oops, half the logic for the 64 bit case was missing due to a cut and paste mistake.

2011-12-11 Kaz Kylheku

	* arith.c: Regenerated.
	* arith.txr (highest_bit): New function.
	(mul): Use highest_bit instead of shift-based algorithm.

2011-12-10 Kaz Kylheku

	* txr.vim (txr_atat): New match. The @@ sequence is recognized properly and highlighted.

2011-12-10 Kaz Kylheku

	Bignum support in multiplication function.

	* arith.c: Regenerated.
	* arith.txr (CNUM_BIT): New constant.
	(bignum, bignum_dbl_ipt): New static functions.
	(@{add-fname}): Use bignum function.
	(mul): New function, rewrite of mul from lib.c.
	* lib.c (mul): Function removed.
	* mpi-patches/add-mp-set-intptr (mp_set_intptr): Revised patch. Local variable v should be int_ptr_t, not unsigned long. Also, the mp_set interface doesn't set the sign; it's an unsigned interface. We must do that ourselves.
	* mpi-patches/fix-mult-bug: The main multiplication function is also broken in the same way, requiring the cast.
	* mpi-patches/mpi-set-double-intptr: Fixed use of wrong type for local variable v.

2011-12-10 Kaz Kylheku

	* mpi-patches/mpi-set-mpi-word: Bugfix and refresh.
	* mpi-patches/mpi-set-double-intptr: New file.
	* mpi-patches/series (mpi-set-double-intptr): Patch added.

2011-12-10 Kaz Kylheku

	* configure: Add to config.h the type double_intptr_t, which is twice the size of intptr_t. It may not be available, so there is a HAVE_ macro to detect it.

2011-12-10 Kaz Kylheku

	* eval.c (eval_init): New functions added as intrinsics.
	* hash.c (hash_eql, hash_equal): New external functions.
	* hash.h (hash_eql, hash_equal): Declared.
	* txr.1: Sections added.

2011-12-10 Kaz Kylheku

	* mpi-patches/add-mp-hash: Rewrote mp_hash to hash only enough low-order bit material from the bignum to fill an unsigned long. We don't need to walk the entire bignum. If the low-order digit of the bignum is at least as large as an unsigned long, we just take that as the hash; otherwise we take enough of the digits to fill an unsigned long. For negative numbers, we just invert the bits of the hash.
	* mpi-patches/add-mpi-toradix-with-case: Refreshed.
	* mpi-patches/fix-mult-bug: Refreshed.
2011-12-10 Kaz Kylheku

	* lib.c (mulv): Recognize cases to eliminate a wasteful mul call with an initial element of 1.

2011-12-10 Kaz Kylheku

	* lib.c (plusv): Recognize cases to eliminate a wasteful plus call with an initial element of zero.

2011-12-10 Kaz Kylheku

	* arith.c: File is now generated using TXR.
	(NOOP): New macro.
	(plus): Use NOOP macro.
	(minus, neg): Functions moved here from lib.c and rewritten for bignum support.
	* lib.c (minus, neg): Functions removed.
	* arith.txr: New file.

2011-12-09 Kaz Kylheku

	* configure: Fix patching without quilt.

2011-12-09 Kaz Kylheku

	Build and pass test suite on Cygwin.

	* configure (longlong, ulonglong, superlong, usuperlong): Initialize these variables so that if the detection tests fail, the script does not access unbound variables. Avoid adding junk like .bss.* into config.h.
	* mpi-patches/config-types: Fixed wrong use of nonexistent SIZEOF_LONG_T.

2011-12-09 Kaz Kylheku

	(Applies to previous commit.)

	* mpi-patches/config-types: Added missing definitions of MP_DIGIT_SIZE in two cases.

2011-12-09 Kaz Kylheku

	Bignum support, here we go! Bignums, based on Michael Fromberger's MPI library, are integrated into the input syntax, stream output, equality testing, the garbage collector, and hashing. The plus operation handles transitions between fixnums and bignums. Other operations are still fixnum-only.

	* Makefile (CFLAGS): Add mpi directory to include file search.
	(OBJS): Include new arith.o module and all of MPI_OBJS.
	(MPI_OBJS, MPI_OBJS_BASE): New variables.
	* configure (mpi_version, have_quilt, have_patch): New variables. Script detects whether patch and quilt are available. Unpacks mpi library, applies patches. Detects 128 bit integer type. Records more information in config.h about the sizes of types.
	* dep.mk: Updated.
	* depend.txr: Make work with paths that have directory components.
	* eval.c (eval_init): Rename of nump to fixnump.
	* gc.c (finalize, mark_obj): Handle BGNUM case.
	* hash.c (hash_c_str): Changed to return unsigned long instead of long.
	(equal_hash): Handle BGNUM case.
	(eql_hash): Handle bignums with equal-hash, but other objects as eq.
	* lib.c (num_s): Variable renamed to fixnum_s.
	(bignum_s): New symbol variable.
	(code2type): Follow rename of num_s. Handle BGNUM case.
	(typeof): Follow rename of num_s.
	(eql): Handle bignums using equal, and other types using eq.
	(equal): Handle BGNUM case.
	(chk_calloc): New function.
	(c_num): Wording change in error message: is not a fixnum.
	(nump): Renamed to fixnump.
	(bignump): New function.
	(plus): Function removed, reimplemented in arith.c.
	(int_str): Handle integers which are too large for wcstol using bignum conversion. Base 0 is no longer passed to wcstol but converted to 10, because the special semantics for 0 would be inconsistent for bignums.
	(obj_init): Follow rename of num_s. Initialize bignum_s.
	(obj_print, obj_pprint): Handle BGNUM.
	(init): Call arith_init.
	* lib.h: Includes "mpi.h", as an exception to the project rule against headers including headers.
	(enum type): New enumeration member, BGNUM.
	(struct bignum): New struct type.
	(union obj): New member bn.
	(mp): New inline function.
	(num_s): Redeclared as fixnum_s.
	(bignum_s, chk_calloc, bignump): Declared.
	(nump): Redeclared as fixnump.
	* match.c (h_var, h_line, h_skip, h_coll, h_fun, format_field, v_skip, v_freeform, v_collect, v_match_files): Follow nump to fixnump rename.
	* parser.l (NUM): New token type. Split up the parsing of identifiers and numbers once again. But since every number is lexically also an identifier, we put the number rule first. The action for making numbers handles bignums. It produces object numbers, not C numbers (change in yystype union).
	* parser.y (%union): num changes type from cnum to val.
	* stream.c (vformat): Handle bignums in numeric conversions.
	* arith.c: New file.
	* arith.h: New file.
	* mpi-1.8.6.tar.gz: New file.
	* mpi-patches/add-mp-hash: New file.
	* mpi-patches/add-mp-set-intptr: New file.
	* mpi-patches/add-mpi-toradix-with-case: New file.
	* mpi-patches/config-types: New file.
	* mpi-patches/export-mp-eq: New file.
	* mpi-patches/fix-mult-bug: New file.
	* mpi-patches/fix-warnings: New file.
	* mpi-patches/series: New file.
	* mpi-patches/use-txr-allocator: New file.

2011-12-08 Kaz Kylheku

	C++ maintenance.

	* eval.c (and_s, or_s): Redundant variables removed.
	* match.h (do_s): extern storage class specifier added.

2011-12-07 Kaz Kylheku

	* eval.c (op_defun): Transform a function body by inserting a named block around it, thereby imitating a Common Lisp feature.
	(op_for): Establish an anonymous block around the loop body, test form and increment forms.
	* txr.1: Documented named block in defun. Documented for and for*.

2011-12-07 Kaz Kylheku

	* txr.vim: Updated with all operators and functions.

2011-12-07 Kaz Kylheku

	* txr.1: flip operator documented. Bad syntax for pop fixed. Blank section for list-vector function added.

2011-12-07 Kaz Kylheku

	* eval.c (op_modplace): If the operator is push, then reverse the arguments. We want (push item list) for compatibility with CL.
	(expand): Bugfix: some of the cases were constructing new forms using unexpanded pieces from the original form. Added separate case for push, which handles the reversed arguments.

2011-12-07 Kaz Kylheku

	* debug.c (debug): Fix regression: repeating the last command by hitting Enter stopped working. This was broken by recent bugfixes in the string splitting functions, which introduced a semantics change.
	* eval.c (flip_s, vecref_s): New symbol variables.
	(op_modplace): New places (vecref ...) and (flip ...). Bugfix: dec operator was incrementing.
	(expand_place): Handle vecref and flip. Bugfix: pop has no third argument and so is now handled by the same case as flip. Bugfix: if a modify form has no third argument, then do not resynthesize it with a nil third argument.
	(eval_init): Initialize new symbol variables. Register new flip operator.
	Register new list_vector function as intrinsic.
	* lib.c (rplacd): When modifying the cdr field of a lazy cons, reset the lazy function to nil! This is needed by user-defined lazy conses, and it makes sense to do it this way rather than put in some explicit interface.
	(list_vector): New function.
	* lib.h (list_vector): Declared.

2011-12-07 Kaz Kylheku

	* eval.c (lookup_var, lookup_fun): Reversing assoc arguments.
	(eval_init): New intrinsics.
	* hash.c (struct hash): assoc_fun parameters reversed.
	(gethash, gethash_f, gethash_n): Likewise.
	* lib.c (assoc, assq): Reversing parameters.
	(find_package, acons_new, acons_new_l, aconsq_new): Reversing arguments to assoc and assq.
	* lib.h (assoc, assq): Declarations updated.
	* match.c (dest_set, dest_bind, h_var, h_coll, h_parallel, h_fun, subst_vars, do_txeval, v_next, v_parallel, v_gather, v_collect, v_flatten, v_cat, v_output, v_filter, f_fun, match_funcall): Reversing arguments to assoc.
	* unwind.c (uw_get_func, uw_exception_subtype_p, uw_register_subtype): Reversing arguments to assoc.
	* txr.1: Blank sections created for new functions.

2011-12-07 Kaz Kylheku

	* txr.1: Blank sections created for character functions.

2011-12-07 Kaz Kylheku

	* eval.c (eval_init): New functions registered as intrinsics.
	* lib.c (chr_toupper, chr_tolower): New functions.
	* lib.h (chr_toupper, chr_tolower): New functions declared.

2011-12-07 Kaz Kylheku

	* parser.l: In the CHRLIT state, return a nonblank character as an IDENT token. This allows for character literals like #\$.

2011-12-07 Kaz Kylheku

	* eval.c (eval_init): New character functions registered.
	* lib.c (c_num): Generalized to convert characters to numbers also. This allows functions like gt and lt to work with characters.
	(chr_isalnum, chr_isalpha, chr_isascii, chr_iscntrl, chr_isdigit, chr_isgraph, chr_islower, chr_isprint, chr_ispunct, chr_isspace, chr_isupper, chr_isxdigit): New functions added.
	* lib.h (chr_isalnum, chr_isalpha, chr_isascii, chr_iscntrl, chr_isdigit, chr_isgraph, chr_islower, chr_isprint, chr_ispunct, chr_isspace, chr_isupper, chr_isxdigit): New functions declared.
	(c_true): New macro.

2011-12-07 Kaz Kylheku

	* eval.c (progn_s): New symbol variable.
	(op_progn): New static function.
	(eval_init): Initialize new variable, register progn operator.
	* txr.1: progn documented.

2011-12-06 Kaz Kylheku

	Version 046

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2011-12-06 Kaz Kylheku

	* stream.c (find_char): New function.
	(string_in_get_line): Followed up on TODO. Fixed broken function. Now get_line on a string stream properly returns characters up to, but not including, the next newline character, and also consumes the newline character. Other cases are also handled properly: the stream already being at EOF, or the last line not being newline-terminated.

2011-12-06 Kaz Kylheku

	* eval.c (op_unwind_protect): Fixed uninitialized variable warning.
	(eval_init): New functions registered: typeof and vector functions, as well as length_list.
	* lib.c (length): Function renamed to length_list, because it is list-specific.
	(length_vec, size_vec, vector_list): New functions.
	(length): New function, generic over lists, vectors and strings.
	* lib.h (length_list, length_vec, size_vec, vector_list): Declared.
	* match.c (h_var, h_fun, robust_length, v_deffilter, v_fun): Use length_list instead of length.
	* parser.l: Introduced # token.
	* parser.y (vector): New nonterminal.
	(expr): vector is a kind of expr.
	(chrlist): Bugfix: single-character syntax was not working; for instance #\x to denote the character x.
	(lit_char_helper): Use length_list instead of length.
	* stream.c (string_in_get_line): Bugfix: this was using the wrong length function: length was being applied to a string. The genericity of length makes that correct now, but it is changed to length_str anyway.
	* txr.1: Blank sections created for functions. Vector syntax documented.

2011-12-06 Kaz Kylheku

	* configure: Forgot to handle octal numbers in the processing of conftest.syms. Removed useless eval.

2011-12-05 Kaz Kylheku

	Version 045

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2011-12-05 Kaz Kylheku

	* eval.c (op_cond): Fixed behavior for singleton clauses.
	(eval_init): Use existing function objects car_f, cdr_f, eq_f, eql_f and equal_f. Added identity to function table.
	* lib.h (eql_f): Missing declaration added.
	* txr.1: Documented cond, and, if, or, defun, inc, dec, set, push and pop.

2011-12-04 Kaz Kylheku

	* parser.y (force_regular_quotes): Function removed.
	(list): Prior commit reversed.
	* txr.1: Prior commit reversed.
	* RELNOTES: No semantics clarification in quasiquote; bugfixes only.

2011-12-04 Kaz Kylheku

	* eval.c (op_qquote_error, op_unquote_error): New static functions.
	(expand_qquote): Bugfix: missing case added to handle directly quoted quasiquote.
	(eval_init): Error-catching pseudo-operators registered in op_table.
	* parser.y (force_regular_quotes): New function.
	(list): Quotes within unquotes and splices are regular.
	* txr.1: Clarified new rules. Removed description of ,'form and ,*'form special syntax.

2011-12-03 Kaz Kylheku

	Expose lazy lists in TXR Lisp.

	* eval.c (eval_init): New intrinsic functions.
	* lib.c (rplaca, rplacd, lcons_fun): New functions.
	(make_lazycons): Renamed to make_lazy_cons, relocated and turned into an external function.
	(lazy_stream_func, lazy_stream_cons): Follow rename of make_lazycons.
	* lib.h (rplaca, rplacd, make_lazy_cons, lcons_fun): Declared.
	* txr.1: Stub sections created.

2011-12-03 Kaz Kylheku

	* eval.c (uw_protect_s, return_s, return_from_s): New symbol variables.
	(op_unwind_protect, op_block, op_return, op_return_from): New static functions.
	(expand): Removed case for call, if, and, and or.
	These operators evaluate all their arguments, so the code walker can treat them as function calls. Added case for block and return-from.
	(eval_init): New symbols interned. New operator functions registered in op_table.
	* txr.1: Blank sections added.

2011-12-03 Kaz Kylheku

	* lib.c (split_str, split_str_set): Bugfix: access beyond the end of the input string.

2011-12-03 Kaz Kylheku

	* eval.c (eval_init): String and character functions exposed as intrinsics.
	* txr.1: Blank sections created.

2011-12-02 Kaz Kylheku

	* txr.1: Added stub sections for new functions.

2011-12-02 Kaz Kylheku

	* eval.c: Symbol-related intrinsic functions and variables made available.
	* lib.h (sym_name): Dangling declaration removed.

2011-12-02 Kaz Kylheku

	* parser.y (list): unquote and splice actions look inside the argument form. If an unquote or splice is applied to a quoted form, its quote becomes a regular quote. This behavior is necessary to make ,',form work in nested quotes; otherwise the ' is a quasiquote which captures the comma in ,form, reducing ,',form to ,form.
	* txr.1: Documented this special behavior.

2011-12-02 Kaz Kylheku

	* eval.c (expand_qquote): Bugfix: removed bogus recognition and processing of regular quote form. This broke nested backquote processing, and quasiquote forms containing non-quasi-quotes like '(a 'b ,c).

2011-12-02 Kaz Kylheku

	Version 044

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2011-12-01 Kaz Kylheku

	* txr.1: Started Lisp documentation. Updated description of symbol syntax.

2011-12-01 Kaz Kylheku

	* lib.c (int_str): Return nil rather than 0 if no digits are extracted at all.

2011-12-01 Kaz Kylheku

	* match.c (h_skip, h_coll, v_skip, v_collect): Evaluate the arguments.
	(do_txeval): Optimization: short-circuit out if the expression is nil, without establishing the exception handler.
2011-12-01 Kaz Kylheku

	* match.c (v_skip): Bugfix: Nov 12 commit caused regression: skip min/max arguments not working!
	* RELNOTES: Updated.

2011-12-01 Kaz Kylheku

	Dropping the silly cons return value from txeval. Two interfaces are provided to the function. One throws on an unbound variable; the other evaluates unbound variables to the symbol noval_s (used in exception handling).

	* match.c (do_txeval): New static function.
	(txeval): Functionality moved to do_txeval.
	(txeval_allow_ub): New static function.
	(vars_to_bindings, h_fun, v_freeform, v_next, v_merge, v_bind, v_set, v_cat, v_output, v_deffilter, v_fun): No need to use cdr to get the value from txeval.
	(v_throw): Use txeval_allow_ub, since unbound variables are allowed in throw.
	(v_try): Detect unbound arguments by checking for noval_s rather than nil. No need to use cdr.

2011-12-01 Kaz Kylheku

	* match.c (eval_form): Function renamed to txeval so it is not confused with the Lisp evaluation functions.
	(vars_to_bindings, h_fun, v_freeform, v_next, v_merge, v_bind, v_set, v_cat, v_output, v_throw, v_deffilter, v_fun): Updated.

2011-11-30 Kaz Kylheku

	* lib.h (or2): Restore macro version of or2, because we need the sequencing! Making it an inline function broke the tests. But we can't have multiple evaluation either, so it's going to use a temporary lexical variable.
	(uses_or2): Macro which declares the lexical variable needed by or2.
	* debug.c (debug): Add uses_or2.
	* eval.c (eval_intrinsic, op_modplace): Likewise.
	* lib.c (lazy_str, lazy_str_force_upto, lazy_str_get_trailing_list): Likewise.
	* match.c (h_parallel, v_freeform, v_parallel, v_output): Likewise.
	* parser.y (unquotes_occur): Likewise.
	* stream.c (format): Likewise.

2011-11-30 Kaz Kylheku

	Removing useless hash table.

	* parser.h (ln_to_forms_hash): Declaration removed.
	* parser.l (ln_to_forms_hash): Variable removed.
	(parse_init): Initialization and protection of ln_to_forms_hash removed.
	* parser.y (rl): Update of ln_to_forms_hash removed.
	* txr.1:

2011-11-30 Kaz Kylheku

	* configure (extra_debugging): New variable. EXTRA_DEBUGGING conditionally generated in config.h.
	* gc.c (break_obj): New static variable.
	(mark_obj): Debugging feature: if the object is the one stored in break_obj and not yet reached, then call breakpt.
	(deheap): New debugging function for viewing regions of the heaps.
	* lib.c (breakpt): New function.
	* lib.h (breakpt): Declared.

2011-11-30 Kaz Kylheku

	* hash.c (hash_process_weak): Fix regression caused by a mistake in the 2010-01-26 commit, prior to release 033. When processing a table with weak values, this function was mistakenly testing the keys rather than values for reachability. I noticed this when a test case that should run in constant memory showed unwarranted accumulation of memory.

2011-11-30 Kaz Kylheku

	* eval.c (op_modplace): Bugfix: conflation of new value and increment value. Separate new value and increment value, and check number of arguments.
	* lib.h (or2): Turned into inline function due to multiple argument evaluation.

2011-11-30 Kaz Kylheku

	* txr.vim: New operators added.

2011-11-29 Kaz Kylheku

	* eval.c (bindings_helper): Fix uninitialized variable.

2011-11-29 Kaz Kylheku

	* eval.c (dohash_s): New symbol variable.
	(op_dohash): New static function.
	(expand): New case for dohash_s. Bugfix for do_s: expand was used rather than expand_forms.
	(eval_init): dohash_s initialized and entered into op_table.

2011-11-29 Kaz Kylheku

	* eval.c (eval_init): hashp and maphash functions registered.
	* hash.c (maphash): New function.
	* hash.h (maphash): Declared.

2011-11-29 Kaz Kylheku

	* eval.c (expand_vars): Bugfix: was not handling vars of the form var, only (var initform).

2011-11-29 Kaz Kylheku

	Support assignment to (car ...) and (cdr ...).

	* eval.c (car_s, cdr_s): New symbol variables.
	(op_modplace): Cases for car and cdr added.
	(expand_place): Likewise. Calls abort should the cases fall through, rather than returning 42.
	(expand): Bugfix: for and for* case not propagating source location info. Bugfix: expansion for do added.
	(eval_init): car_s and cdr_s initialized and used in place of previous intern calls.
	* parser.y (elem): Removed wrong logic for expanding the do form. It was expanding only the first argument.

2011-11-28 Kaz Kylheku

	* eval.c (let_star_s, for_s, for_star_s): New symbols.
	(env_replace_vbind, bindings_helper): New static functions.
	(op_let): Refactored to allow for let* form. Code for setting up bindings moved into bindings_helper, shared by for loop.
	(op_for, expand_vars): New static functions.
	(expand): Bugfix: let case was neglecting to walk the var initialization forms. This is done via expand_vars now. let_star_s added to this case to handle let* and let at the same time. New case added for for and for*.
	(eval_init): let_star_s, for_s, and for_star_s initialized, and entered into op_table.

2011-11-28 Kaz Kylheku

	* eval.c (eval_init): More functions.
	* txr.vim: More highlighting.

2011-11-28 Kaz Kylheku

	Adding stream functions to Lisp evaluator.

	* eval.c (op_let): Bugfix: was not evaluating var init forms.
	(reg_var): New static function.
	(eval_init): Registered numerous stream functions and the three standard streams.
	* lib.c (obj_print, obj_pprint): Modified to return a value.
	(init): eval_init called after stream_init, because eval needs the three standard streams prepared.
	* lib.h (obj_print, obj_pprint): Declarations updated.
	* stream.c (format): Support t as a shorthand for standard output.
	(formatv, open_directory, open_file, open_pipe): New functions.
	(w_opendir): New static function.
	* stream.h (formatv, open_directory, open_file, open_pipe): Declared.
	* txr.vim: Set iskeyword so that keywords can contain special characters. Set b:current_syntax to "lisp".
	(txl_keyword): New keyword category, populated with TXR Lisp keywords defined as a separate category.
	(txr_list): Contains txl_keyword.
	(txr_meta): Contains txl_keyword and txr_list.
2011-11-28 Kaz Kylheku

	mapcar, mappend and apply functions. fun operator.

	* eval.c (apply_s): New symbol variable.
	(apply): Handle functions specified as symbols. Use symbol from context form in error reporting.
	(apply_intrinsic): New function.
	(interp_fun): Bugfix: removed evaluation of arguments, since arguments are already evaluated.
	(op_call): Simplified by not having to handle symbols, since apply does.
	(op_fun): New function.
	(expand): Handle special form fun.
	(mapcarv, mappendv): New functions.
	(eval_init): Initialize apply_s. Register op_fun function in op_table. Register mapcar, mappend and apply functions.

2011-11-28 Kaz Kylheku

	Added evaluation support for quote and quasiquote with unquotes. New functions list, append and eval. Code walking framework for expanding quasiquotes. quotes right now.

	* eval.c (let_s, lambda_s, call_s, cond_s, if_s, and_s, or_s, defvar_s, defun_s, list_s, append_s): New symbol variables.
	(eval_intrinsic, op_quote, expand_forms, expand_cond_pairs, expand_place, expand_qquote): New static functions.
	(expand): New external function.
	(eval_init): Initialize new symbol variables. Use newly defined symbol variables to register functions. Also, new functions: quote, append, list and eval.
	* eval.h (expand): Declared.
	* lib.c (appendv): New function.
	(obj_init): quote and splice operator symbols moved into system package.
	(obj_print, obj_pprint): Support for printing quotes and splices.
	* lib.h (appendv): Declared.
	* match.c (do_s): New symbol variable.
	(syms_init): New variable initialized.
	(dir_tables_init): New variable used instead of intern.
	* match.h (do_s): Declared.
	* parser.y (elem): @(do) form recognized and its argument passed through the new expander.
	(o_elem, quasi_item): Pass list through expander.
	(list): Use choose_quote to decide whether to put regular quote or quasiquote on quoted list.
	(meta_expr): Fixed abstract syntax so the expression is a single argument of the sys:expr, rather than multiple arguments.
	(unquotes_occur, choose_quote): New static functions.

2011-11-26 Kaz Kylheku

	* parser.y (expr): Set source location info on elements.
	(strlit): Set location info.

2011-11-26 Kaz Kylheku

	* match.c (subst_vars): Handle expr_s, so that Lisp expressions can be interpolated into quasiliterals.
	(extract_vars): Avoid recursing into expressions marked with expr_s.
	(do_output_line): Handle expr_s so that Lisp expressions can be interpolated into output.
	* parser.y (o_elem, quasi_items): Handle list expressions, annotated with expr_s.

2011-11-26 Kaz Kylheku

	Task #11436

	Lisp interpreter added.

	* gc.c (finalize, mark_obj): Handle ENV objects.
	* hash.c (struct hash): acons_new_l_fun function pointer: order of arguments changed.
	(equal_hash): Handle ENV.
	(make_hash, gethash_l): Use cobj_handle for type safety. Follow change in acons_new_l.
	(gethash, gethash_f, remhash, hash_count, hash_get_userdata, hash_set_userdata, hash_next): Use cobj_handle.
	(gethash_n): New function.
	* hash.h (gethash_n): Declared.
	* lib.c (env_s): New symbol variable.
	(code2type, equal): Handle ENV.
	(plusv, minusv, mul, mulv, trunc, mod, gtv, ltv, gev, lev, maxv, minv, int_str): New functions.
	(rehome_sym): New static function.
	(func_f0, func_f1, func_f2, func_f3, func_f4, func_n0, func_n1, func_n2, func_n3, func_n4): Initialize new fields of struct func.
	(func_f0v, func_f1v, func_f2v, func_f3v, func_f4v, func_n0v, func_n1v, func_n2v, func_n3v, func_n4v, func_interp): New functions.
	(apply): Function removed: sanely re-implemented in new eval.c file.
	(funcall, funcall1, funcall2, funcall3, funcall4): Handle variadic and interpreted functions.
	(acons, acons_new, acons_new_l, aconsq_new, aconsq_new_l): Reordered arguments for compatibility with Common Lisp acons.
	(obj_init): Special hack to prepare hash_s symbol, which is needed for type checking inside the hash table functions invoked by make_package, at a time when the symbol is not yet interned. Initialize new env_s variable.
	(obj_print, obj_pprint): Handle ENV. Fix confusing rendering of function type.
	(init): Call new function eval_init.
	* lib.h (enum type): New enumeration member ENV.
	(struct func): functype member changed to bitfield. New bitfield members minparam and variadic. New members in f union: f0v, f1v, f2v, f3v, f4v, n0v, n1v, n2v, n3v, n4v.
	(struct env): New type.
	(union obj): New member e of type struct env.
	(env_s): Variable declared.
	(plusv, minusv, mul, mulv, trunc, mod, gtv, ltv, gev, lev, maxv, minv, int_str): New functions declared.
	(func_f0v, func_f1v, func_f2v, func_f3v, func_f4v, func_n0v, func_n1v, func_n2v, func_n3v, func_n4v, func_interp): Likewise.
	(apply): Declaration removed, and re-introduced in eval.h.
	(acons, acons_new, acons_new_l, aconsq_new, aconsq_new_l): Declarations updated to new argument order.
	* match.c (bindable): static function moved to eval.c, where it becomes external.
	(h_var, h_coll, h_parallel, h_fun, v_parallel, v_gather, v_collect, v_merge, v_fun): Follows argument order change in acons functions.
	(subst_vars): Print atoms other than strings.
	(eval_form): Support @(...) syntax for evaluating Lisp forms.
	(v_do, h_do): New functions.
	(dir_tables_init): Insert v_do and h_do into tables.
	* parser.l: Token syntax for numbers and symbols merged. Symbols in a nested context can consist of various additional characters. Useless code removed from action for '('/METAPAR.
	* stream.c (format): Bugfix in type checking, in the case that the stream argument is nil and defaults to a string stream.
	* txr.vim: Updated for new token syntax. Fixed uses of unescaped + operator.
	* unwind.c (uw_set_func)
	* unwind.h (numeric_assert, range_bug_unless): Missing whitespace in message added.
	* Makefile (OBJS): eval.o added.
	* dep.mk: Updated.
	* eval.c: New file.
	* eval.h: New file.

2011-11-24 Kaz Kylheku

	* lib.c (getplist_f): New function.
	* lib.h (getplist_f): Declared.
	* match.c (v_collect, h_coll): Use getplist_f to distinguish the case that :vars is explicitly specified as (). In this case, no bindings escape from the collect.
	* tests/008/soundex.txr: This test case broke due to using :vars () and yet counting on the variable to exist.
	* RELNOTES: Updated.

2011-11-24 Kaz Kylheku

	* match.c (match_funcall): Set source location info for generated function call.

2011-11-24 Kaz Kylheku

	* parser.y (texts, elem): Fixed incorrect use of rl rather than rlcp. Added forgotten rlcp on result of optimize_text.
	* RELNOTES: Updated.

2011-11-20 Kaz Kylheku

	Version 043

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.
	* RELNOTES: Updated.

2011-11-23 Kaz Kylheku

	* txr.c (remove_hash_bang_line): Recognize multiple syntax possibilities. A hash bang could be buried in a (text ...) compound, or it could just be a string (thanks to the text form optimization).

2011-11-23 Kaz Kylheku

	Optimization: if all the elements of (text ...) are strings, then replace the (text ...) by the catenation of those strings.

	* parser.y (optimize_text): New function.
	(elem): Use optimize_text.

2011-11-23 Kaz Kylheku

	* lib.c (plus, minus): Fixed wrong assertion which would incorrectly fire for inputs that do not overflow.
	* match.c (search_form): Fixed incorrect loop test which could lead to nonterminating behavior.
	* RELNOTES: Updated.

2011-11-23 Kaz Kylheku

	Semantics change. If a variable is followed by a mixture of text and regular expressions, that whole mixture is considered to follow the variable and is used for matching. The earlier semantics change whereby a single unescaped space denotes the regular expression / +/ broke the simple case @a word. It caused the @a to be followed not by the text " word" but by just the regular expression element. With this change, @a word means that a is followed by the regex / +/ and "word".

	* match.c (text_s): New symbol variable.
	(h_text): New function.
	(syms_init): Initialize new symbol variable.
	(dir_tables_init): Hook h_text into horizontal directives table.
	* match.h (text_s): Declared.
	* parser.y (text, texts): New nonterminals.
	(elem): TEXT, SPACE and regex are now handled under the texts grammar production. All texts are run together and produce an item which looks like (text items ...).
	* txr.1, RELNOTES: Updated.
	* txr.c (remove_hash_bang_line): Updated to find #! buried in (text ...) syntax.

2011-11-22 Kaz Kylheku

	* configure: Fix environ test case for C++.

2011-11-22 Kaz Kylheku

	* match.c (search_form): Bugfix: we must search to one character position after the end of the line; otherwise we can never match @(eol).
	(h_eol): Bugfix: return the line length, not t.
	* txr.1: Warn users about @var@(bind ...) pitfall.
	* RELNOTES: Updated.

2011-11-20 Kaz Kylheku

	Version 042

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.

2011-11-20 Kaz Kylheku

	* parser.y (char_from_name): const on wchar_t *.

2011-11-20 Kaz Kylheku

	Bug #34630

	* parser.y (repeat_clause, rep_elem): Allow empty body.
	(yybadtoken): Handle unexpected newline with different message.
	* RELNOTES: Updated.

2011-11-20 Kaz Kylheku

	Relaxing :vars in collect/coll a little bit.

	* match.c (h_coll, v_collect): Only throw an error about missing required variables if the collect iteration collected some new variables. This allows strict collects with :vars to have some cases which explicitly match and skip unwanted material, without binding variables. Also, print all missing variables in the diagnostic.
	* txr.1: Mention this special exception.
	* RELNOTES: Updated.

2011-11-19 Kaz Kylheku

	* Makefile (tests/008/soundex.ok): New test case.
	(TXR_ARGS): Specified for new test case.
	* tests/008/soundex.expected: New file.
	* tests/008/soundex.txr: New file.

2011-11-19 Kaz Kylheku

	* RELNOTES: New file.

2011-11-19 Kaz Kylheku

	Bug #34866

	* match.c (h_skip): Bugfix.
	Return the length of the line if the skip is to the end of line, not the value t.
	* txr.1: Clarify that @var@(skip)text is useless.

2011-11-19 Kaz Kylheku

	* match.c (v_deffilter): Even better. Just evaluate the arguments individually. Now @(deffilter a b ..) is possible where these evaluate to suitable lists of strings.
	* txr.1: Documented.

2011-11-19 Kaz Kylheku

	deffilter grows in power: it can take quasistrings.

	* lib.c (cdr_f): New global variable.
	(funcall1, funcall2, funcall3, funcall4): Fix unterminated arguments in uw_throwf call by using uw_throw instead.
	(do_or): New static function.
	(orf): New function.
	(obj_init): gc_protect and initialize cdr_f.
	* lib.h (cdr_f, orf): Declared.
	* match.c (v_deffilter): Treat the table as forms to be evaluated which must reduce to strings, rather than literal strings.
	* txr.1: Documented.

2011-11-19 Kaz Kylheku

	* parser.y (yybadtoken): Use ~a to print bad character rather than #\ notation.

2011-11-18 Kaz Kylheku

	* parser.y: Regression: IDENT, '{' and '}' must be on the same precedence level and right-associative. Without this, consecutive braced variables don't work, etc.

2011-11-18 Kaz Kylheku

	* gc.c (mark_mem_region): Use the Valgrind API only to mark the type field as accessible, not the whole object that we are checking. Marking the whole object accessible hides uninitialized field bugs!
	* lib.c: And found a bug already: lazy_str was not completely initializing all of the object fields (ls.prefix, ls.list) before invoking memory-allocating operations, making it possible for the garbage collector to encounter uninitialized object areas.

2011-11-18 Kaz Kylheku

	Added a JSON parsing test case. This flushed out a bug which crashed the garbage collector (uninitialized fields in function objects).

	* Makefile: Defined TXR_ARGS and TXR_OPTS for new test case.
	* hash.c (hash_begin): Construction of cobj modified to obey the correct procedure described in HACKING.
	* lib.c (func_n3, func_n4): These functions neglected to initialize the env member of the function structure.
	* tests/009/json.expected: New file.
	* tests/009/json.txr: New file.
	* tests/009/webapp.json: New file.

2011-11-17 Kaz Kylheku

	Task #11598.

	* match.c (resolve_k): New keyword symbol variable.
	(h_parallel, v_parallel): Implement :resolve keyword in @(some) directive.
	(syms_init): New symbol variable initialized.
	* parser.l: Allow (some) to have argument material.
	* parser.y (some_clause, elem): SOME syntax adjusted.
	* txr.1: Documented new :resolve keyword in @(some).

2011-11-17 Kaz Kylheku

	Adding quote and unquote read syntax to list forms, resembling Lisp. The difference is that splice is spelled ,* because @ already means something, and that there is only one quote operator. None of this does anything; it is only syntax.

	* lib.c (quote_s, qquote_s, unquote_s, splice_s): New variables.
	(obj_init): New variables initialized.
	* lib.h (quote_s, qquote_s, unquote_s, splice_s): Declared.
	* parser.l: Added recognition rules.
	* parser.y (SPLICE): New symbolic token.
	(list): Added new syntax for quote and splicing.

2011-11-17 Kaz Kylheku

	* match.c (h_fun, v_fun): Bugfix! copy_list should be used for copying the bindings, not copy_alist. Otherwise functions cannot destructively update a binding, which is useless. We want a function not to manipulate the binding list, but to be able to manipulate the contents of bindings.
	(match_files_ctx): Declaration moved ahead of match_line.
	(v_fun): Forward declaration added.
	(match_line): Allow vertical functions to be called from a horizontal context, in a limited way.
	* txr.1: Mention the possibility of a call from a horizontal context falling back on a vertical function.

2011-11-17 Kaz Kylheku

	* parser.y: Bugfix: precedence of { } must be low, close to that of IDENT, otherwise @{var}@(foo) doesn't parse.

2011-11-16 Kaz Kylheku

	Allow directives after a variable to be a kind of negative match.
	* match.c (search_form): Bugfix: return correct match extent.
	* parser.y: Adjusting associativity and precedence of directives, IDENT, and grouping tokens once again. This is so that a var followed by a directive will turn into one elem, rather than the var being reduced to an elem first.
	* txr.1: Revised documentation to more clearly define the concept of a negative match, broken into subsections. Some sections belonging to syntax were moved to an appropriate location. Subsections added to description of form syntax. Added explanation of directive-driven syntax.

2011-11-16 Kaz Kylheku

	Variable matches can span over function calls. Function calls following variables have searching semantics.

	* match.c (ml_specline_pos, search_form): New static functions.
	(h_var): Handle functions and regexes in a common way.
	* parser.y: Adjusted precedence of IDENT and ( so that @var@(func) is parsed into a single var element.
	* txr.1: Documented.

2011-11-15 Kaz Kylheku

	* txr.vim: Update for new character constant syntax.

2011-11-15 Kaz Kylheku

	* match.c (h_var): When manipulating specline, propagate the source location info.
	(v_skip): Don't use specline for trace messages, because it may be nil. Use the skip spec.
	* parser.h (rl): Declared.
	(rlcp): New inline function.
	* parser.y (rl): Static declaration removed. Function becomes extern.
	(clause): Propagate location info from clause to clause list backbone.
	(collect_clause, COLL): Bugfix: car/cdr mixup in location info.
	(elem): Use rlcp function to abbreviate code.
	(o_elems_opt, o_elems_opt2, o_elem): Set location info.

2011-11-15 Kaz Kylheku

	Changing read syntax for character literals, because we are going to need the single quote in the Lisp way for suppressing evaluation, eventually. I'm going with a Scheme-compatible syntax for character literals. It has a richer repertoire of standard character names than Common Lisp, and has a x convention for coding characters in hex.
	* parser.h (end_of_char): New function declared.
	* parser.l (grammar): Implement rules for #\ syntax, involving new HASH_BACKSLASH token.
	(end_of_regex): Enhancement: added check that end_of_regex is called in correct state, like the one in end_of_char.
	(end_of_char): New function.
	* parser.y (repeat_rep_helper, o_elems_transform, define_transform, lit_char_helper): Functions changed to static.
	(rl): Function moved down, past the grammar section.
	(HASH_BACKSLASH): New terminal symbol.
	(chrlit): Grammar redesigned.
	(char_from_name): New function.
	* txr.1: Character syntax documented.

2011-11-14 Kaz Kylheku

	Bugfix: horizontal directives were being treated as vertical, and the trailing material was silently ignored. For instance @(bind a 1)@(bind b 2). This was going to v_bind, which does not check for the trailing material and does not call decline_s. The result was that b was not bound. Correct behavior is to process these binds in match_line.

	* match.c (match_line): Check if a directive IS found in the vertical table, and if so report a different error message. The fallback case is that there is no such function or directive.
	(v_next): Do not check for obsolete syntax any more. This case will not occur any more due to the following changes.
	(match_files): Do not defer opening the file if the data starts with an incorrectly written next directive. Do not look up and process a vertical directive or function call if it is followed by more material in the same line. Thus vertical directives can no longer receive trailing material. This fixes the bug of horizontal directives being treated as vertical.

2011-11-13 Kaz Kylheku

	* debug.c (debug): Eliminated duplicate code. Implemented better way of printing character context.

2011-11-13 Kaz Kylheku

	Adding a debugger. This is an experimental prototype.

	* Makefile (OBJS): New object file debug.o.
	* dep.mk: Updated.
	* match.c (h_fun): Use debug_begin and debug_end macros to set up a debug frame for backtracing.
	(match_line, match_files): Call debug_check to give debugger a chance to instrument call.
	(v_fun): Use debug_begin and debug_end macros to set up a debug frame for backtracing. Call debug_check to give debugger a chance to instrument call.
	* stream.c (struct strm_ops): New function pointer, flush.
	(stdio_maybe_write_error): Wrong word in error message corrected.
	(stdio_flush): New static function.
	(stdio_ops, pipe_ops): New function entered into tables.
	(flush_stream): New function.
	* stream.h (flush_stream): Declared.
	* txr.c (help): New options documented.
	(main): Call to debug_init added. New debug options parsed and opt_debugger set accordingly.
	* unwind.c (uw_push_debug, uw_current_frame): New functions.
	* unwind.h (uw_frtype): New enumeration member UW_DBG.
	(struct uw_debug): New frame variant.
	(union uw_frame): New member, db.
	(uw_push_debug, uw_current_frame): Declared.
	* debug.c: New file.
	* debug.h: New file.

2011-11-13 Kaz Kylheku

	Fix regression in earlier commit: "Eliminate line numbers from the abstract syntax tree representation of the TXR query."

	* match.c (match_funcall): Remove spurious object being added to the front of a form where a line number used to be.

2011-11-13 Kaz Kylheku

	* match.c: Removed
	* txr.c: Likewise.

2011-11-13 Kaz Kylheku

	Bug #34813

	* match.c (v_freeform): Fail if the data is null, to avoid a false positive match as an empty line, followed by the type error of incrementing a nonexistent data line number.

2011-11-12 Kaz Kylheku

	* parser.y: Correctly record line number info for regex.

2011-11-12 Kaz Kylheku

	Improved line number reporting in errors and debug traces.

	* match.c (debugf): Function removed.
	(dest_bind, v_output, v_eof): Use debuglf instead of debugf, and sem_error instead of uw_throwf.
	(match_files): Likewise, and file_err is called with form.

2011-11-12 Kaz Kylheku

	Eliminate line numbers from the abstract syntax tree representation of the TXR query.
	* match.c (debuglf, sem_error, file_err, eval_form): Line number argument replaced with the form to which the situation pertains. Location information is pulled from the hash table entry associated with the form.
	(dest_set, dest_bind, eval_form, vars_to_bindings): Context argument renamed since it isn't a line number.
	(struct match_line_ctx): spec_lineno member removed.
	(ml_all, ml_bindings_specline): lineno parameter removed.
	(LOG_MISMATCH, LOG_MATCH, h_var, h_skip, h_coll, h_parallel, match_line): Pass elem to debuglf as context instead of line number.
	(h_trailer, h_eol): Define elem for LOG_MISMATCH and LOG_MATCH macros.
	(h_fun): Pass elem variable to debuglf instead of line number. Body stored as a simple cons cell once again (no line number).
	(do_output_line): Line number parameter removed. Pass specline to sem_error instead of line number.
	(do_output): Adjusted for one less parameter in do_output_line.
	(mf_from_ml): Pass one less parameter to ml_all. Conversion of specline to spec is just a wrapping into a nested list, with no line number.
	(spec_bind): Line number variable parameter removed from macro. Definition simplified.
	(v_skip): Pass specline to debuglf instead of spec_linenum, which is no longer computed.
	(v_trailer): Use new definition of specline. Pass first_spec to sem_error instead of spec_linenum. Computation of ff_specline no longer has to skip line number.
	(v_freeform, v_block, v_accept_fail, v_next, v_parallel, v_gather, v_collect, v_merge, v_bind, hv_trampoline, v_cat, v_output, v_try, v_defex, v_throw, v_deffilter, v_filter, match_funcall): Use new definition of specline. Pass first_spec to sem_error instead of spec_linenum.
	(v_forget_local): Specline computed differently since there is no line number to skip.
	(h_define): Back to simplified representation of function with no extra cell for line number.
	(v_define, v_fun): Pass first_spec to sem_error instead of spec_linenum.
	Back to simplified representation of function with no extra cell for line number.
	(match_files): first_spec_item computed differently. Pass first_spec to sem_error instead of spec_linenum.
	* parser.h (source_loc): Declared.
	* parser.l (source_loc): New function.
	* parser.y (grammar): Removed line numbers from abstract syntax tree. A few more places needed the annotation of forms with location info, and a couple of cases where the info needed to be propagated were identified. Use extra cons cell as output of until_last to propagate the line number from the symbol to the use.
	* txr.c (remove_hash_bang_line): No longer has to look past line number.

2011-11-12 Kaz Kylheku

	Infrastructure for storing line number information outside of the code, in hash tables.

	* filter.c (make_trie, trie_add): Update to three-argument make_hash.
	* hash.c (struct hash): New members, hash_fun, assoc_fun, acons_new_l_fun.
	(ll_hash): Renamed to equal_hash.
	(eql_hash): New static function.
	(cobj_hash_op): Follows ll_hash rename.
	(hash_grow): Use new function indirection to call hashing function.
	(make_hash): New argument to specify type of hashing. Initialize new members of struct hash.
	(gethash_l, gethash, remhash): Use function indirection for hashing and chain search and update.
	(pushhash): New function.
	* hash.h (make_hash): Declaration updated with new parameter.
	(pushhash): Declared.
	* lib.c (eql_f): New global variable.
	(eql, assq, aconsq_new, aconsq_new_l): New functions.
	(make_package): Updated to new three-argument make_hash.
	(obj_init): gc-protect and initialize new variable eql_f.
	* lib.h (eql, assq, aconsq_new, aconsq_new_l): Declared.
	* match.c (dir_tables_init): Updated to three-argument make_hash.
	* parser.h (form_to_ln_hash, ln_to_forms_hash): Global variables declared.
	* parser.l (form_to_ln_hash, ln_to_forms_hash): New global variables.
	(grammar): Set yylval.lineno for tokens that are classified to that type in parser.y.
	(parse_init): Initialize and gc-protect new global variables.
	* parser.y (rl): New static helper function.
	(%union): New member, lineno.
	(ALL, SOME, NONE, MAYBE, CASES, CHOOSE, GATHER, AND, OR, END, COLLECT, UNTIL, COLL, OUTPUT, REPEAT, REP, SINGLE, FIRST, LAST, EMPTY, DEFINE, TRY, CATCH, FINALLY, ERRTOK, '('): Reclassified as lineno type. In the grammar, these keywords can thus provide a stable line number from the lexer.
	(grammar): Numerous rules updated to add constructs to the line number hash tables via the rl helper.
	* dep.mk: Updated.
	* Makefile (depend): Use the installed, stable txr in the system path to update dependencies rather than the locally built ./txr, to prevent the problem that txr is broken because of out-of-date dependencies, and thus cannot regenerate dependencies.

2011-11-10 Kaz Kylheku

	Bug #34799: errors in horizontal functions reported against the caller's line number.

	* match.c (ml_bindings_specline): Extended with extra argument.
	(h_coll): Pass nil for new argument of ml_bindings_specline.
	(h_fun): Extract line number from stored function. Pass line number to ml_bindings_specline.
	(h_define, v_define): Store function as a cons cell containing the line number and body.

2011-11-10 Kaz Kylheku

	* txr.1: Document -l/--lisp-bindings.

2011-11-10 Kaz Kylheku

	* match.c (opt_nobindings, opt_arraydims): Global variables moved from parser.l.
	(opt_lisp_bindings): New variable.
	(dump_bindings): Dump Lisp syntax bindings on standard output if opt_lisp_bindings is set.
	(v_cat): Do not complain about trailing material; this is not compatible with horizontal cat.
	* parser.l (opt_nobindings, opt_arraydims): Moved to match.c.
	* txr.c (txr_main): New options, --lisp-bindings and the equivalent -l.
	* txr.h: opt_lisp_bindings declared.
2011-11-10 Kaz Kylheku

	Task #11583

	* match.c (dir_tables_init): Mapping flatten_s, forget_s, local_s, merge_s, set_s, cat_s and filter_s to hv_trampoline function, thereby making all these directives work in horizontal contexts in one fell swoop.

2011-11-10 Kaz Kylheku

	Task #11583

	More generic approach.

	* match.c (h_bind): Function removed.
	(hv_trampoline): New function.
	(dir_tables_init): hv_trampoline installed in h_directive_table instead of h_bind.

2011-11-10 Kaz Kylheku

	* parser.l: Fixed wrong error message.

2011-11-10 Kaz Kylheku

	* match.c (v_fun): Bugfix: if there is material after the function call, decline it; it is a horizontal context.
	* txr.1: Discussion and examples of calls that are in a horizontal context.

2011-11-09 Kaz Kylheku

	* txr.1: Documented horizontal function definitions and calls.

2011-11-09 Kaz Kylheku

	Task #11583

	@(bind) in horizontal mode.

	* match.c (mf_from_ml, h_bind): New functions.
	(dir_tables_init): h_bind entered into table.

2011-11-09 Kaz Kylheku

	* match.c (h_fun, v_fun): Slightly more informative tracing from failed function calls.

2011-11-09 Kaz Kylheku

	* txr.vim: Missing coll keyword added.

2011-11-08 Kaz Kylheku

	Task #11431.

	First cut at horizontal match functions.

	* match.c (h_fun): New function.
	(match_line): Rearranged not to do hash lookup if the directive is a regex or list. If hash lookup fails, try it as a horizontal function.
	(h_define): New function. Handles horizontal function syntax embedded in line.
	(v_define): Handle the horizontal function syntax occurring on a line by itself. The function info is now stored as a cons cell whose car is the vertical function and cdr the horizontal one.
	(v_fun): Adjust to new function storage convention.
	(dir_tables_init): h_define entered in table.
	* parser.y: Added syntax for horizontal define.

2011-11-06 Kaz Kylheku

	* txr.vim: Make sure whitespace is recognized after @.

2011-11-06 Kaz Kylheku

	Task #11581 & bugfix.

	* match.c (noval_s): New symbol variable.
	(vars_to_bindings): Use a default value of noval_s to indicate a required variable, rather than nil, which would not allow an optional variable with a default value of nil.
	(h_coll, v_collect): Check default value against noval_s, rather than nil.
	(v_gather): Support :vars keyword.
	(syms_init): Initialize new symbol variable.
	* txr.1: Documented gather's :vars parameter.

2011-11-06 Kaz Kylheku

	Task #11581

	* match.c (gather_s): New keyword variable.
	(v_gather): New function.
	(syms_init): gather_s initialized.
	(dir_tables_init): v_gather entered into table.
	* match.h (gather_s): Declared.
	* parser.l: GATHER token scanning added.
	* parser.y: GATHER token added. gather_clause nonterminal added.
	* txr.1: New directive documented.
	* txr.vim: gather keyword introduced.

2011-11-05 Kaz Kylheku

	* lib.c (env): Fixed inappropriate cut-and-pasted error messages. Check for failure of GetEnvironmentStringsW, and ensure FreeEnvironmentStringsW is called.

2011-11-05 Kaz Kylheku

	* match.c (dir_tables_init): Bugfix: horizontal @(some) directive not included in dispatch table.

2011-11-05 Kaz Kylheku

	* configure: Bugfixes. Before the compiler tests, we must remove the conftest executable, to make sure that the next test will try to re-make it. The configure script runs fast enough that the new conftest.c does not always have a timestamp which is newer than the previous conftest executable.

2011-11-05 Kaz Kylheku

	Task #11442.

	Make it work on MinGW.

	* configure: Test for environ and GetEnvironmentStrings.
	* lib.c: Conditionally include .
	(env): Implemented for POSIX and Windows with #ifdefs.

2011-11-05 Kaz Kylheku

	Task #11442.

	Access to environment variables.

	* lib.c (env_list): New static variable.
	(env): New function.
	(match): Declaration of nonexistent function removed.
	(obj_init): New variable gc-protected.
	* lib.h (env): Declared.
	* match.c (env_k): New symbol variable.
	(v_next): Implemented :env.
	* txr.1: @(next :env) described.
2011-11-04 Kaz Kylheku

	* hash.c (ll_hash): Added a break in the case that handles pointer hashing of identity-equal objects. Without this, if the pointer size is not 4 or 8, we fall through to the next case.

2011-11-04 Kaz Kylheku

	* txr.c (help): Change year from 2009 to 2011.

2011-11-03 Kaz Kylheku

	* tests/008/students.txr: Use disciplined collect with :vars.

2011-11-03 Kaz Kylheku

	* tests/008/students.txr: Regexes removed.

2011-11-02 Kaz Kylheku

	* txr.vim: Added missing keywords.

2011-11-01 Kaz Kylheku

	* genman.txr: Use filter for mapping month digits to names. Added comment about where to find the right man2html.

2011-11-01 Kaz Kylheku

	* txr.vim: Added installation instructions.

2011-11-01 Kaz Kylheku

	Syntax highlighting for Vim.

	* txr.vim: New file.

2011-10-30 Kaz Kylheku

	Version 041

	Bugfixes:
	Runaway recursion in @(block) directive, introduced in 040.
	Fixed bug in matching list variable against text, at the same time clarifying semantics to longest-match.
	Fixed potential excessive memory use caused by refactoring in 040.

	Features:
	New :append keyword in @(output) to append instead of overwriting.
	Variable contents can be treated as input sources using :string and :list keywords in @(next).
	Variables can be treated as output destinations using :into keyword in @(output).
	New @(set) directive for destructive assignment to a variable.
	New filters: :upcase and :downcase.
	@(bind) can now compare left and right objects through filters.
	Filters can now be chained into compound filters.
	Pattern matching functions can be used as filters.
	Shorthand notation in @(deffilter) when multiple strings map to the same replacement string.
	@(cat) directive changes syntax.
	Error handling improvements in parser: no more reams and reams of errors.

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.

2011-10-30 Kaz Kylheku

	We don't include headers in headers in this project.
	* parser.h: Do not include
	* regex.c: Include
	* regex.h: Do not include

2011-10-30 Kaz Kylheku

	Bug #34691

	Changing the parameter passing convention for vertical directives. They take one parameter which is a pointer, rather than a copy of the structure. They do not have to perform a structure assignment when returning next_spec_k.

	* match.c (v_match_func): Typedef updated to new function signature.
	(v_skip, v_trailer, v_freeform, v_block, v_accept, v_accept, v_next, v_parallel, v_collect, v_flatten, v_forget, v_forget, v_merge, v_bind, v_set, v_cat, v_output, v_define, v_try, v_defex, v_throw, v_deffilter, v_filter, v_eof, v_fun): Refactored.
	(match_files): Updated dispatch logic to new style calls.
	(match_funcall): Updated to new way of calling v_fun.

2011-10-29 Kaz Kylheku

	* HACKING: Grammar fixes. Expanded on lazy strings a little bit. Added something about mem_t *, and a few extra words here and there, including a blurb about a Valgrind debugging caveat.

2011-10-27 Kaz Kylheku

	Bug #34657

	* txr.1: Added explanations about the differences between empty streams and empty lines, and to watch out when passing empty strings to @(next :string ...).

2011-10-26 Kaz Kylheku

	Bugfix: prepared_error_message variable needs to be gc-protected.

	* parser.h (parse_init): Declared.
	* parser.l (parse_init): New function.
	* txr.c (main): Call parse_init.
	(txr_main): No need to gc-protect yyin_stream since parse_init does it.

2011-10-26 Kaz Kylheku

	Parse error handling improvements.

	* parser.l (prepared_error_message): New static variable.
	(yyerror): Emit and clear prepared error message.
	(yyerrprepf): New static function.
	(yybadtoken): Function moved into parser.y.
	(grammar): For irrecoverable lexical errors, stash error message with yyerrprepf and return the special error token ERRTOK to generate a syntax error. I could find no other interface to the parser to make it cleanly exit.
	* parser.y (ERRTOK): New terminal symbol, does not appear anywhere in the grammar.
	(spec): Bail after 8 errors, recover to nearest newline, and use yyerrok to clear error situation.
	(YYEOF): Provided by Bison, conditionally defined for other yacc-s.
	(yybadtoken): Function moved from parser.l. Checks for the next token being YYEMPTY or YYEOF, and also handles ERRTOK.
	* stream.c (vformat_to_string): New function.
	(format): If stream is nil, format to string and return it.
	* stream.h (vformat_to_string): Declared.

2011-10-26 Kaz Kylheku

	* match.c (v_cat): Bugfix: unterminated variable argument list.
	* tests/001/query-3.txr: Updated to new cat syntax.

2011-10-26 Kaz Kylheku

	Fixed lame @(cat) directive, without obsolescence phase.

	* match.c (v_cat): Rewritten.
	* txr.1: Documented.

2011-10-25 Kaz Kylheku

	* configure: Put in set -u to trap unbound variables, and fixed resulting errors that were found.

2011-10-25 Kaz Kylheku

	* match.c (filter_s): New symbol variable.
	(v_filter): New function.
	(syms_init): New symbol variable initialized.
	(dir_tables_init): New function entered into table.
	* txr.1: Documented new filter directive.

2011-10-25 Kaz Kylheku

	* dep.mk: Regenerated.

2011-10-25 Kaz Kylheku

	Shorthand for filters which map multiple texts to a common replacement text.

	* filter.c (build_filter_from_list): Allow tuples to denote multiple keys mapping to the same value.
	* lib.c (do_curry_123_2, do_curry_123_1): New static functions.
	(curry_123_2, curry_123_1): New functions.
	* lib.h (curry_123_2, curry_123_1): New functions declared.
	* match.c (v_deffilter): Allow tuples of strings rather than just pairs.
	* txr.1: Updated.

2011-10-25 Kaz Kylheku

	* parser.y: Remove mention of nonexistent terminal \\ from %right associativity clause.

2011-10-25 Kaz Kylheku

	* filter.c (fun_k): New keyword variable.
	(function_filter): Use :fun keyword symbol instead of fun.
	(filter_init): New keyword variable initialized.
	* filter.h (upcase_k, downcase_k, fun_k): Declared.
	* txr.1: Updated.
2011-10-25 Kaz Kylheku

	* match.c (v_bind): Use sem_error to throw errors with line number info.

2011-10-24 Kaz Kylheku

	Bugs #34641, #34629.

	* lib.c (search_str_tree): If multiple strings from the needle tree match within the haystack string, then take the leftmost match. If there are multiple matches at the same leftmost position, take the longest one.

2011-10-24 Kaz Kylheku

	* filter.c (function_filter): New function.
	(get_filter): Handle (fun ...) syntax.
	* match.c (v_bind): Establish dynamic environment frame around dest_bind, and stash the bindings there so filters can have access to the bindings.
	(v_output): Likewise, around do_output calls.
	(v_fun): New function.
	(match_files): Function handling broken out into v_fun.
	(match_funcall): New function.
	* match.h (match_funcall): Declared.
	* unwind.c (uw_push_env): Initialize match_context.
	(uw_get_match_context, uw_set_match_context): New functions.
	* unwind.h (struct uw_dynamic_env): New member, match_context.
	(uw_get_match_context, uw_set_match_context): Declared.
	* txr.1: Documented function filters.

2011-10-24 Kaz Kylheku

	Turning attention to some plumbing.

	* unwind.c (uw_env_stack): New static variable.
	(uw_unwind_to_exit_point): Maintain correct uw_env_stack during unwinding.
	(uw_find_env): Just retrieve the env stack pointer; no search.
	(uw_push_env): Store a pointer to the previous environmental frame and just initialize the bindings to nil. No need to cons up a copy of the bindings from the previous frame.
	(uw_get_func): Perform a search through the environment stack.
	* unwind.h (struct uw_dynamic_env): New member, up_env.

2011-10-23 Kaz Kylheku

	* tests/007/except-1.txr: Use next :list instead of piping from echo command. As a result, this test case should run on MinGW.

2011-10-23 Kaz Kylheku

	* match.c (list_k, string_k): New keyword symbol variables.
	(v_next): Implement :list and :string keywords.
	(syms_init): New keyword variables initialized. NOTE: the :var keyword is deprecated.
	* txr.1: Documented :list and :string.

2011-10-23 Kaz Kylheku

	* match.c (h_skip): Bugfix: bad argument list in debugf call.

2011-10-22 Kaz Kylheku

	Task #11474

	* filter.c (filter_equal): Takes two filters instead of one.
	(lfilt_k, rfilt_k): New keyword variables.
	(filter_init): New keyword variables initialized.
	* filter.h (filter_equal): Declaration updated.
	(lfilt_k, rfilt_k): Declared.
	* lib.c (funcall4): New function.
	(do_curry_1234_34): New static function.
	(curry_1234_34): New function.
	(do_swap_12_21): New static function.
	(swap_12_21): New function.
	* lib.h (funcall4, curry_1234_34, swap_12_21): Declared.
	* match.c (dest_bind): Use the function argument swapping combinator when calling tree_find such that the value being searched is on the left and pattern material is on the right.
	(v_bind): Implemented :lfilt and :rfilt.
	* txr.1: Documented :lfilt and :rfilt.

2011-10-22 Kaz Kylheku

	* filter.c (get_filter_trie): Function renamed to get_filter. A filter is not necessarily a trie.
	(string_filter, compound_filter): New functions.
	(get_filter): Recognize compound filters and return a function which implements them.
	* filter.h (get_filter_trie): Declaration renamed.
	* match.c (format_field, v_bind, v_output): Follow get_filter_trie rename. Error message text updated.
	* txr.1: Describe compound filters.

2011-10-22 Kaz Kylheku

	Task #11474

	* filter.c (filter_equal): New function.
	(upcase_k, downcase_k): New keyword variables.
	(filter_init): New keyword variables initialized, and new upcase and downcase filters registered.
	* filter.h (filter_equal): Declared.
	* lib.c (tree_find): Takes new argument, the equality test function.
	(upcase_str, downcase_str): New functions.
	(do_curry_123_23): New static function.
	(curry_123_23): New function.
	* lib.h (tree_find): Declaration updated.
	(upcase_str, downcase_str, curry_123_23): Declared.
	* match.c (dest_bind): Updated to take equality function. Uses it and passes it down to tree_find.
	(v_bind): Filter feature implemented.
	(h_var, v_try): Add equal_f to dest_bind argument list.
	* txr.1: Updated to describe new filters and bind arguments.

2011-10-21 Kaz Kylheku

	* match.c (v_collect, v_coll): Establish empty list bindings for all :vars in the event that the collect turns up nothing.
	* txr.1: Document behavior.

2011-10-21 Kaz Kylheku

	* match.c (v_collect): Regression bugfix. Make it work like the comment says: until/last clause has visibility to uncollated bindings from collect.

2011-10-21 Kaz Kylheku

	Implementing @(set) directive for assigning to variables destructively.

	* match.c (dest_set, v_set): New static functions.
	(dir_tables_init): Add v_set to vertical directives hash table.
	* txr.1: Documented.

2011-10-21 Kaz Kylheku

	* match.c (v_output): When appending output to a variable, flatten the previous contents so we can append to a single string, or to a deeply nested list, etc.
	* txr.1: Documented these new extensions to next and output.

2011-10-21 Kaz Kylheku

	New features. String list output streams in the stream library allow output to be captured as a list of strings representing lines (in contrast to string streams which capture a single string). The output directive can output to a variable, and next can scan over a variable.

	* lib.c (span_str, compl_span_str, break_str): New functions.
	* lib.h (span_str, compl_span_str, break_str): New functions declared.
	* match.c (into_k, var_k): New keyword variables.
	(mf_file_data): New static function.
	(v_next): Refactored argument handling. Added support for :var keyword.
	(v_output): Added support for :into keyword.
	* stream.c (strlist_mark, strlist_out_put_string, strlist_out_put_char): New static functions.
	(strlist_out_ops): New static struct.
	(make_strlist_output_stream, get_list_from_stream): New functions.
	* stream.h (make_strlist_output_stream, get_list_from_stream): New functions declared.

2011-10-21 Kaz Kylheku

	* lib.c (proper_plist_to_alist, improper_plist_to_alist): New functions.
	* lib.h (proper_plist_to_alist, improper_plist_to_alist): New functions declared.
	* match.c (append_k): New keyword symbol variable.
	(complex_open): New append argument.
	(v_output): Streamlined parsing of keywords. Support :append keyword.
	* txr.1: Output directive's keyword documentation revised.

2011-10-20 Kaz Kylheku

	Bug #34609

	* match.c (v_block): Regression induced by rabid refactoring. Block must apply remaining directives to data, excluding itself, otherwise runaway recursion takes the place of correct behavior.

2011-10-19 Kaz Kylheku

	Version 040

	Single unescaped space behaves like @/ +/ regex.
	Ported to native Windows via MinGW.
	Bugfixes for Cygwin and more robust handling of errors arising from Windows not having proper Unicode support (16 bit wide characters only).
	Nasty GC bug fixed for all platforms, exposed by gcc 4.5.2, x86_64.
	[Internal] The huge functions match_line and match_files have been broken up into functions dispatched by hash table lookup on directive symbols.
	[Internal] Hashing of some objects improved.

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.

2011-10-19 Kaz Kylheku

	Task #11425

	* match.c (repeat_spec_k): New symbol variable.
	(h_match_func): New typedef.
	(elem_bind): New macro.
	(h_var, h_skip, h_coll, h_parallel, h_trailer, h_eol): New functions.
	(match_line): Remaining directives moved to functions.
	(syms_init): New symbol variable initialized.
	(dir_tables_init): New functions entered into hash table.

2011-10-19 Kaz Kylheku

	Task #11425

	Refactoring match_line to make it easier to break up into subfunctions, similarly to what was done with match_files.

	* match.c (match_line_ctx): New struct type.
	(ml_all, ml_specline, ml_bindings_specline): New functions.
	(LOG_MISMATCH, LOG_MATCH): Macros moved outside of function, updated to refer to structure members rather than local variables.
	(match_line): Takes only one argument now. All recursive calls updated.
	(v_freeform): Call to match_line updated.
	(match_files): Likewise.

2011-10-19 Kaz Kylheku

	Task #11425

	* match.c (v_accept_fail, v_next, v_parallel, v_collect, v_flatten, v_forget_local, v_merge, v_bind, v_cat, v_output, v_try, v_define, v_defex, v_throw, v_deffilter, v_eof): New functions.
	(match_files): Remaining directives moved to functions.
	(dir_tables_init): New functions entered into hash table.

2011-10-19 Kaz Kylheku

	* hash.c (ll_hash): Hashing of pointers should take alignment into account; otherwise only values divisible by the alignment occur. This patch takes into consideration that val values are pointers to object descriptors in a heap which are four words wide, and so most likely aligned to 16 byte boundaries (32 bit systems) or 32 byte boundaries (64 bit). We need to shift.

2011-10-18 Kaz Kylheku

	Task #11425

	* match.c (v_block): New function.
	(match_files): Block directive moved to function.
	(dir_tables_init): v_block entered into table.

2011-10-18 Kaz Kylheku

	Task #11425

	* match.c (spec_bind): New macro.
	(v_freeform): New function.
	(match_files): Freeform logic moved to function.
	(dir_tables_init): v_freeform entered into table.

2011-10-18 Kaz Kylheku

	Task #11425

	* match.c (same_data_k): Symbol variable renamed to next_spec_k.
	(v_skip): Restructured not to return next_spec_k when there are no more specs, but rather thread directly to what match_files will do anyway, namely return the bindings and data position.
	(v_trailer): New function.
	(match_files): Trailer logic moved to function.
	(syms_init): Follows renaming of variable.
	(dir_tables_init): GC bugfix: global hash tables were not gc-protected, the same mistake previously made in filter.c.

2011-10-17 Kaz Kylheku

	Task #11425

	Vertical skip directive moved into function dispatched via hash table. Test suite passes.

	* lib.c (cptr_s): New symbol variable.
	(cptr_equal_op): New static function.
	(cptr_equal_op, cptr, cptr_get): New functions.
	(cptr_ops): New static structure.
	(obj_init): New variable initialized.
	* lib.h (cptr_s, cptr, cptr_get): Declared.
	* match.c (decline_k, same_data_k): New symbol variables.
	(v_match_func): New typedef.
	(v_skip): New function.
	(match_files): Check symbol in v_directive_table and dispatch the associated function if an entry exists. Skip directive handling moved to v_skip function.
	(syms_init): Initialize new symbol variables.
	(dir_tables_init): Enter v_skip into v_directive_table under skip_s symbol.

2011-10-16 Kaz Kylheku

	Quick-and-dirty port to MinGW.

	* configure: Test for presence of added. Conditionally generates HAVE_SYS_WAIT variable in config.h.
	* stream.c: Include conditionally.
	(pipe_close): Do not test termination status with WIFEXITED, etc. if there is no header.

2011-10-16 Kaz Kylheku

	* configure: Reduced post-configure advice to just point to the INSTALL guide.
	* INSTALL: New file.

2011-10-16 Kaz Kylheku

	* filter.c (trie_filter_string): Fix warning about uninitialized variable (not a bug, but the compiler cannot prove that).

2011-10-15 Kaz Kylheku

	Task #11425.

	Refactoring match_files to make it easier to break up into subfunctions. Arguments are packaged into a structure, so that the subfunctions won't all need big argument lists.

	* match.c (h_directive_table, v_directive_table): New variables.
	(match_files_ctx): New structure.
	(mf_all, mf_args, mf_data, mf_spec, mf_spec_bindings): New functions.
	(match_files): Takes only one argument now, the context structure. The data_lineno variable is a dynamic number. Recursive calls to match_files are handled by creating contexts as appropriate with the helper functions. The old local variable data is now part of the context.
	(syms_init, dir_tables_init): New functions.
	(match_init): Just calls syms_init and dir_tables_init.

2011-10-15 Kaz Kylheku

	Fixed broken GC on x86_64 (Ubuntu 11, gcc 4.5.2). The issue is that, due to aggressive function inlining in the gc module, the mark_mem_region function is not a real subroutine.
	The address of its local variable &gc_stack_top ended up excluding the machine context saved by setjmp in the parent function. That is, the saved context buffer was not between the computed stack top and bottom, so registers were not being scanned for references to values. I added a little abstraction for the machine context in the process of fixing this.

	* gc.c (struct mach_context, mach_context_t): New type.
	(save_context): New macro.
	(mark): Takes two new arguments: a pointer to the stack top, and the machine context. It scans the machine context explicitly rather than relying on it being on the stack between the top and bottom. This context is in fact the only object within the garbage collector's part of the activation chain that needs to be scanned.
	(gc): Use new abstraction to save machine context. A local variable is used to derive the stack top here. The stack top is the top of the stack above the activation frames in the garbage collector itself. The gc has nothing on its stack that should be scanned, except for the machine context, which is now handled explicitly.

2011-10-15 Kaz Kylheku

	* configure: POSIX portability. Use = instead of == in test expressions. This was revealed by Ubuntu's dash.

2011-10-13 Kaz Kylheku

	* parser.y (elem): Amending previous change. A single space should only denote a run of spaces, not mixtures of spaces and tabs. We have to be careful with tabs because they can be semantically different from spaces (e.g. a file with tab-delimited fields which can be blank, empty, or have leading or trailing spaces).
	* txr.1: Updated.

2011-10-13 Kaz Kylheku

	* Makefile (%.ok: %.txr): Use unified diff for showing differences between expected and actual test output.
	* parser.l (yybadtoken): Handle new terminal symbol, SPACE.
	New rule for producing SPACE token out of an extent of tabs and spaces.
	* parser.y (SPACE): New terminal symbol.
	(o_var): New nonterminal. I noticed that the var rule was being used for output elements, and the var rule refers to elem rather than o_elem.
	A new o_var rule is a simplified duplicate of var.
	(elem): Handle SPACE token. Transform to regex if it is a single space, otherwise to literal text.
	(o_elem): Handle SPACE token in output.
	* tests/001/query-2.txr: This query depends on matching single spaces and so needs to use escapes.
	* tests/001/query-4.txr, tests/001/query-4.expected: New test case, based on query-2.txr. It produces the same output, but is simpler thanks to the new semantics of space.
	* txr.1: Documented.

2011-10-12 Kaz Kylheku

	Bug #34538

	* lib.h (wli): This macro now does the pointer displacement by 1.
	(auto_str, static_str): #if/#else/#endif gone. These functions just add the type tag. The + 1 logic was incorrect; it should have been + sizeof(wchar_t). But even that was not right, because other code, such as the string_out_put_char function, expects a wchli_t * to point to the first character.

2011-10-10 Kaz Kylheku

	Improved support for broken Unicode. Regex support for extra-large character sets is not compiled in if wchar_t is not wide enough for it. The UTF-8 code now properly throws exceptions when encountering characters that it cannot represent, instead of silently ignoring the situation and continuing with incorrectly computed data.

	* regex.c (FULL_UNICODE): New macro.
	(CHAR_SET_L3, CHAR_SET_L2_LO, CHAR_SET_L2_HI): Only defined if full Unicode is available.
	(CHSET_XLARGE, cset_L3_t, struct xlarge_char_set, L2_full, L3_fill_range, L3_contains): Ditto.
	(union char_set): Member x1 present only under FULL_UNICODE.
	(char_set_destroy, char_set_add, char_set_add_range, char_set_contains): CHSET_XLARGE cases only available under FULL_UNICODE.
	(char_set_compile): Default cst variable to CHSET_LARGE.
	* utf8.c (FULL_UNICODE): New macro.
	(conversion_error): New function.
	(utf8_from_uc): Throw error if not FULL_UNICODE and character is outside the BMP.
	(utf8_decode): Likewise.

2011-10-09 Kaz Kylheku

	* HACKING: Documented portability hacks for narrow wchar_t.

2011-10-09 Kaz Kylheku

	Version 039

	Ported to Cygwin.
	Horizontal modes for @(trailer) and @(skip).
	New :greedy keyword in skip, which can be given instead of a maximum distance to give it longest-match semantics.
	@(collect) and @(coll) support a new clause, @(last).
	The :times keyword in @(collect) and @(coll), introduced in the previous release, now has a different meaning. The keywords :mintimes and :maxtimes are added, and :maxtimes behaves like :times did previously.
	There is a :vars keyword in @(collect) and @(coll) to give some control over what bindings are collected, as well as error checking for missing bindings and defaulting behavior.
	New @(eol) directive for explicitly matching the end of the input or end of a line.
	New lexical syntax: @(...) and @abc are allowed within expressions. This produces a special structural syntax with no assigned meaning yet.
	Awful bug fixed in function calling: if a function was called with multiple unbound variables, and bindings were produced for them, only one variable was being propagated to the calling environment.
	Bugfixes to binding environment handling in the face of the @(local)/@(forget) directive.
	Fix for the issue of unbound variables being silently ignored in some contexts, like quasiliterals. An exception is now thrown.
	Bugfix for an issue with consecutive variables in output.
	Bugfix for an issue with horizontal @(cases) not collecting bindings.
	Bugfix for an issue with @(until) inside @(coll) not seeing bindings from the main clause.

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.

2011-10-09 Kaz Kylheku

	One more swing at this with the axe.

	* lib.h (wini, wref): New macros.
	* stream.c (string_out_put_char): Rewritten with macros to eliminate preprocessor #if test.

2011-10-09 Kaz Kylheku

	* lib.h (wli, lit_noex): We need null characters on both ends so that this hack is correct for null (empty) strings. When recovering the wchar_t pointer from a null literal object, we will increment unconditionally, since it always points to a null character.
	We end up skipping past null terminator #1, but safely landing on #2.

2011-10-09 Kaz Kylheku

	Following up on the previous commit's TODO.

	* filter.c (struct filter_pair): wchar_t becomes wchli_t.
	* lib.h (wchli_t): New type: an incomplete structure type, so that a pointer to this type is incompatible with anything else.
	(wli): Macro produces const wchli_t * pointer instead of const wchar_t *.
	(auto_str, static_str): Accept a const wchli_t * instead of const wchar_t *, making it impossible to misuse these functions by passing in a literal that is not wrapped in wli.
	* stream.c (string_out_put_char): These type changes showed this hack to have a bug. Confronted with the need to cast from const wchar_t * to const wchli_t *, it's obvious that the conversion has to be done properly, with the + 1 on one kind of platform but not the other.
	* txr.c (version): Type changed to const wchli_t.
	* txr.h (version): Declaration updated.

2011-10-09 Kaz Kylheku

	Ported to Cygwin.

	TODO: there should be some type safety with the new wli macro, so that if it is forgotten, there will be a diagnostic.

	* configure (lit_align): New configuration variable and configuration test. Generates LIT_ALIGN in config.h. Fixed the integer-holds-pointer test for the different output from the nm program on Cygwin. The arrays become common symbols marked C, which do not show an offset attribute, only a size: one column fewer.
	* filter.c (to_html_table, from_html_table): Wrap wide string literals with the wli macro. From now on, this must be done for all literals and array initializers that are going to be directly converted to type-tagged vals.
	* lib.h (wli): New macro.
	(auto_str, static_str, litptr, lit_noex): Handle wide literals on platforms where they are aligned to only two bytes, so that the two low-order bits of the pointer are not both available. We can still add our binary 11 type tag, but then, when recovering the pointer to the data, we may have to fix up the pointer.
	* parser.l: Another portability issue here.
	Flex generates a scanner which has #include in the middle, after the source file's own #includes, which can introduce macros. On Cygwin, there is some hygiene problem whereby our "noreturn" macro causes the header to generate bad syntax and fail to compile. Stupid Cygwin and even stupider flex! The workaround is to include at the top in the flex source.
	* stream.c (string_out_put_char): This is one more place to which the string literal handling hack spreads.
	* txr.c (version): Wrap string in wli.

2011-10-09 Kaz Kylheku

	* dep.mk: Regenerated. Too easy to neglect this file.

2011-10-09 Kaz Kylheku

	* match.c (vars_to_bindings): Regression fix: a recent commit caused a test failure. An empty list was not treated as a valid collect variable list.

2011-10-09 Kaz Kylheku

	* configure: Fixed indentation.

2011-10-08 Kaz Kylheku

	* txr.1: Removed references to obsolete @(next) variant.

2011-10-08 Kaz Kylheku

	* match.c (vars_to_bindings): New function.
	(match_line): Keyword argument :vars implemented for coll.
	* txr.1: Documented :vars.

2011-10-08 Kaz Kylheku

	* match.c (vars_k): New symbol variable.
	(match_files): Implemented :vars in collect.
	(match_init): New symbol variable initialized.

2011-10-08 Kaz Kylheku

	* txr.1: Augmented the example of @/.*/ being used to skip to the end of the line with one using @(skip), which is now better style, since it avoids reaching for regexes.

2011-10-08 Kaz Kylheku

	* match.c (match_line): Skip directive bugfix. If skip is the last item on the line, it must match the whole line by returning success.

2011-10-08 Kaz Kylheku

	* match.c (mintimes_k, maxtimes_k): New keyword variables.
	(match_line): Implemented :mintimes and :maxtimes, changing the semantics of :times.
	(match_files): Likewise.
	(match_init): New keyword variables initialized.
	* txr.1: Updated.

2011-10-08 Kaz Kylheku

	* HACKING: Formatting.

2011-10-07 Kaz Kylheku

	* match.c (match_files): Fixed spectacular bug in function calling, dating back to before October 2009, when txr was put into git.
	Basically, unbound variables were not handled correctly after the function returned, due to the increment step being wrongly written as ``piter = cdr(aiter)'' in the for loop that processes the ub_p_a_pairs. Evil cut-and-paste!

2011-10-07 Kaz Kylheku

	* match.c (greedy_k): New keyword symbol variable.
	(match_line): Greedy skip implemented.
	(match_files): Likewise.
	(match_init): New keyword symbol variable initialized.
	* txr.1: Updated.

2011-10-07 Kaz Kylheku

	* lib.c (eol_s): New symbol variable.
	(obj_init): New variable initialized.
	* lib.h (eol_s): Declared.
	* match.c (match_line): Implemented horizontal skip and the new eol directive.
	(match_lines): Vertical skip defers to horizontal skip if there is trailing material.
	* txr.1: Updated.

2011-10-07 Kaz Kylheku

	* lib.c (flatten_helper): Function removed.
	(flatten): Recurse directly, using func_n1.

2011-10-07 Kaz Kylheku

	* txr.1: Fixed wrong word.

2011-10-06 Kaz Kylheku

	Extending syntax to allow for @VAR and @(...) forms inside nested lists. This is in anticipation of future features.

	* lib.c (expr_s): New symbol variable.
	(obj_init): expr_s initialized.
	* lib.h (expr_s): Declared.
	* match.c (dest_bind): Now takes linenum. Tests for the meta-syntax denoted by the system symbols var_s and expr_s, and throws an error.
	(eval_form): Similar error checks added. Also, hack: do not add file and line number to an exception which begins with a '(' character; just re-throw it. This suppresses duplicate line number addition when this throw occurs across some nestings.
	(match_files): Updated calls to dest_bind.
	* parser.l (yybadtoken): Handle new token kinds, METAVAR and METAPAR.
	(grammar): Refactoring among patterns: TOK broken into SYM and NUM, NTOK introduced, unused NUM_END removed. Rule for @( producing METAPAR in nested state.
	* parser.y (METAVAR, METAPAR): New tokens.
	(meta_expr): New nonterminal.
	(expr): meta_expr and META_VAR productions handled.

2011-10-06 Kaz Kylheku

	Renaming the currying combinators according to a new scheme.
	* lib.c (bind2): Function renamed to curry_12_2.
	(bind2other): Function renamed to curry_12_1.
	(do_bind_2, do_bind2other): Helpers renamed likewise.
	(tree_find): Follows rename of bind2.
	* match.c (match_files): deffilter code follows bind2 rename to curry_12_2.

2011-10-06 Kaz Kylheku

	* lib.c (funcall3, curry_123_2): New functions.
	(do_curry_123_2): New static function.
	* lib.h (funcall3, curry_123_2): Declared.
	* match.c (subst_vars): Bugfix: throw error on unbound variable instead of ignoring the situation. This bug caused unbound variables in quasiliterals to be silently ignored.
	(eval_form): Function changed to a three-argument form, so that it takes a line number for reporting errors. Restructured to catch the new unbound variable exception from subst_vars, and re-throw it with a line number. Also, it now throws an exception instead of returning nil if it itself detects an unbound variable. Uses of eval_form no longer have to test the return value for nil, but can simply assume it worked.
	(match_lines): Currying calls to eval_form updated to use curry_123_2. Test of eval return value eliminated. In function calls, eval isn't used for reducing symbol arguments to values, because it now throws in the unbound case, and it's not worth setting up a catch for this. Instead, assoc is used directly.

2011-10-05 Kaz Kylheku

	* match.c (match_files): In function calls, the deletion of the unbound variable from the argument list can be done with a destructive operation, since that list is a copy.

2011-10-04 Kaz Kylheku

	* LICENSE, Makefile, configure, filter.c, filter.h, gc.c, gc.h, hash.c, hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated e-mail address.

2011-10-04 Kaz Kylheku

	* match.c (match_line, match_files): Another correction to how bindings are handled in collect/coll. New bindings from the main clause and last clause must override old bindings.
	This is done by some additional set difference operations based on symbol identity. Otherwise it is possible to end up with multiple bindings for the same symbol, which is untidy. If the collect clause scrubs a variable with forget and re-binds it, then combining that environment with the previous bindings will create a duplicate. Also fixed a serious bug with the bindings from the last clause: the append was wrongly put into the loop that processes the collected lists.

2011-10-04 Kaz Kylheku

	* lib.c (acons): New function.
	(set_diff): Optimize the common case: list1 and list2 are the same, or list2 is a substructure of list1. Situations in which this won't be the case for variable bindings are rare.
	* lib.h (acons): Declared.
	* match.c (match_line): Use acons rather than acons_new when binding variables that we know are new (the symbol is unbound). When computing the set difference over bindings, use cons cell equality rather than symbol equality. Symbol equality is wrong because a binding can be removed, and then a new binding can be introduced using the same symbol. This must be treated as a different binding.

2011-10-04 Kaz Kylheku

	Bugfixes to the semantics of binding environments, which were broken in the face of deletions (local, forget). For some stupid reason, I had written a destructive routine for removing elements from an association list, and used it as the basis for the local and forget directives.

	* lib.c (eq_f, car_f): New variables.
	(identity_tramp, equal_tramp): Obsolete functions removed.
	(apply): Broken function disabled at run time.
	(funcall, funcall1, funcall2): Throw a meaningful error instead of aborting.
	(alist_remove_test): New static function.
	(alist_remove, alist_remove1): Rewritten to be functional rather than destructive.
	(alist_nremove, alist_nremove1): Destructive functions, using the previous implementations of alist_remove and alist_remove1.
	(do_sort): Recurses directly rather than via sort. That was probably why this helper was introduced!
	(find, set_diff): New functions.
	(obj_init): gc-protect new variables eq_f and car_f, and initialize them. Initializations for equal_f and identity_f changed to use equal and identity directly, without the obsolete wrappers.
	* lib.h (eq_f, car_f, alist_nremove, alist_nremove1, find, set_diff): Declared.
	* match.c (match_line): Use set_diff to determine which bindings are new, rather than ldiff and ldiff-like logic, which breaks when the new bindings do not share structure with the old.
	(match_files): Likewise.

2011-10-03 Kaz Kylheku

	* txr.1: Started documenting the forgotten merge directive.

2011-10-03 Kaz Kylheku

	Implemented new last clause for collect and coll. Bugfix in cases inside coll: was not collecting bindings. Bugfix for until inside coll: was not seeing bindings from the main clause.

	* lib.c (ldiff): New function.
	* lib.h (ldiff): Declared.
	* match.c (match_line): Implemented last clause. Fixed cases handling by moving the misplaced termination check.
	(match_files): Implemented last clause.
	* parser.y (until_last): New nonterminal symbol.
	(collect_clause): Refactored syntax to support until and last.
	(elem): Likewise.
	* txr.1: Updated.

2011-10-02 Kaz Kylheku

	* parser.y (rep_elem): Bugfix: forgotten o_elems_transform on the syntax tree of the o_elems constituent, leading to problems with consecutive variables in a @(rep).

2011-10-02 Kaz Kylheku

	* match.c (match_line): Handle trailer_s directive.
	(match_files): Remove check against trailer_s not having trailing material. If it doesn't, it's a vertical directive processed here; otherwise leave it alone so that match_line processes it.

2011-10-02 Kaz Kylheku

	Compiles as C++ again.

	* lib.h (cons_set): New macro.
	* match.c (match_line, match_files): In collect clause handlers, move variable declarations above goto, and initialize with cons_set, instead of declaring and initializing with cons_bind.
	This eliminates the stupid C++ error that goto skips a variable initialization (which happens even when it can be trivially proven that the variable has no next use at the goto site!)

2011-10-01 Kaz Kylheku

	Version 038

	New eof directive.
	Fixes to the skip directive so that it works very well with eof.
	Consecutive variable matching semantics improved; concept of double variable match introduced for an unbound variable followed by a regex variable.
	Directives collect and coll have keyword arguments for more control over their behavior.
	Parallel directives (all, some, none, ...) are available in horizontal mode.
	New choose directive for selecting one of numerous alternatives.
	GC bugfix in new filtering code.
	The code has an issue compiling with GNU C++ instead of C, which is something that is supported by this project. Not a release-blocking issue. Not easy to fix without restructuring some code.

	* txr.c (version): Bumped.
	* txr.1: Bumped version and set date.
	* configure (txr_ver): Bumped.

2011-10-01 Kaz Kylheku

	Maintaining the ability to compile as C++ (except for two issues that will need another commit).

	* filter.c: Include "gc.h" for prototype of protect.
	(struct filter_pair): Use const wchar_t *, so we can assign literals.
	(html_hex_continue): Ditto.
	* lib.c (and): Function renamed to andf, since and is a C++ operator.
	* lib.h (and): Declaration renamed.
	* match.c (match_files): Use of and updated to andf.

2011-10-01 Kaz Kylheku

	* HACKING: Clarified that --vg-debug is also needed to turn on the Valgrind support at run time, in addition to building it in.

2011-10-01 Kaz Kylheku

	New test case, covering some filtering from HTML/XML.

	* Makefile: Defined TXR_ARGS for new test case.
	* tests/008/students.expected: New file.
	* tests/008/students.txr: New file.
	* tests/008/students.xml: New file.

2011-10-01 Kaz Kylheku

	* filter.c (filters, filter_init): Serious gc bug fixed: neglected to inform the garbage collector about the filters global variable. Ouch!

2011-10-01 Kaz Kylheku

	New test case under tests/008.
	* Makefile: Made previous TXR_ARGS for 008 specific to the tokenizing test case, and introduced separate TXR_ARGS for this test case.
	* tests/008/configfile: New file.
	* tests/008/configfile.expected: New file.
	* tests/008/configfile.txr: New file.

2011-10-01 Kaz Kylheku

	Tokenizing test case, exercising @(coll :gap 0) and horizontal @(choose :shortest ...).

	* Makefile: Defined TXR_ARGS for tests/008 directory.
	* tests/008/data: New file.
	* tests/008/tokenize.expected: New file.
	* tests/008/tokenize.txr: New file.

2011-10-01 Kaz Kylheku

	New test case, covering exception handling across nested function invocations.

	* Makefile (TEST): Test targets marked as .PHONY, because they are.
	* tests/007/except-1.expected: New file.
	* tests/007/except-1.txr: New file.

2011-10-01 Kaz Kylheku

	* parser.y (all_clause, some_clause, none_clause, maybe_clause, cases_clause, choose_clause, elem): Regression bug fix: bad list calls in parser, lacking nao terminator.

2011-10-01 Kaz Kylheku

	Regression bug fix: longest-match variables broken by the 2011-09-28 commit which introduced the double var match.

	* match.c (match_line): Handle case where modifier is t.
	* parser.y (var_op): Produce modifier as (t) rather than t.

2011-10-01 Kaz Kylheku

	* txr.1: Documented choose and horizontal mode for parallel constructs.

2011-10-01 Kaz Kylheku

	New directive: choose.

	* match.c (choose_s, longest_k, shortest_k): New variables.
	(match_line, match_files): Introduced choose directive.
	(match_init): Initialize new variables.
	* match.h (choose_s): Declared.
	* parser.l (yybadtoken): Handle CHOOSE.
	(CHOOSE): Clause added for returning this token.
	* parser.y: Added #include "match.h".
	(CHOOSE): New token symbol.
	(choose_clause): New nonterminal symbol.
	(clause): choose_clause added.
	(all_clause, some_clause, none_clause, maybe_clause, cases_clause): Abstract syntax tree tweaked.
	(choose_clause): New syntax.
	(elem): Abstract syntax trees tweaked for many clauses. New CHOOSE clauses.
	(out_clause): New error case for choose_clause.

2011-09-30 Kaz Kylheku

	* HACKING: Updated with debugging hints.

2011-09-29 Kaz Kylheku

	* txr.1: Clarified consecutive variables and documented double variable match.

2011-09-29 Kaz Kylheku

	* parser.l: Implemented backslash continuations in the SPECIAL state, in regexes and in string literals.
	* txr.1: Documented.

2011-09-29 Kaz Kylheku

	* match.c (match_line): Implemented horizontal all, some, none, maybe and cases directives.
	(match_files): Recognize the horizontal versions of these directives by the presence of the extra symbol t, and do not process them. Also, bugfix in the all directive: not resetting the all_match flag when short-circuiting out.
	* parser.y (clause_parts_h, additional_parts_h): New nonterminals.
	(elem): New clauses added.

2011-09-29 Kaz Kylheku

	* match.c (chars_k): New variable.
	(match_line): Keyword arguments in coll implemented.
	(match_init): chars_k variable initialized.
	* parser.l (COLL): Lexical syntax changed to allow for argument material.
	* parser.y (elem): Coll syntax rewritten for arguments.
	* txr.1: Updated.

2011-09-28 Kaz Kylheku

	* match.c (mingap_k, maxgap_k, gap_k, times_k, lines_k): New symbol variables.
	(match_lines): Keyword arguments in collect implemented.
	(match_init): New function.
	* match.h (match_init): Declared.
	* parser.l (COLLECT): Lexical syntax changed for COLLECT to allow for argument material.
	* parser.y (%union): obj renamed to val.
	(exprs_opt): New nonterminal.
	(collect_clause): Rewritten for arguments.
	* txr.c (main): Call to match_init introduced.

2011-09-28 Kaz Kylheku

	* match.c (match_line): Bugfix in double var. Do not prepend the next_pat to the specline if it is nil.

2011-09-28 Kaz Kylheku

	* match.c (match_line): Logic restructured to allow for regex variables which also have nested variables. Previously this code was assuming that the cases were mutually exclusive, and the parser happened to work that way.
	Also, added support for a "double var" match, which occurs when an unbound variable is followed by a regex variable. This case should be allowed because it makes sense. It's similar to a variable followed by a regex, except that the regex is also a variable binding.
	* parser.y (o_elems_transform): New function.
	(o_elems_opt, o_elems_opt2, quasilit): Transform o_elems with new function. This is needed because subst_vars doesn't deal with the nested var syntax for consecutive variables.
	(var): New syntax case '{' IDENT exprs '}' elem. This allows consecutive variables to be nested in all cases.

2011-09-27 Kaz Kylheku

	* parser.y ('{', '}'): Nope, still not right. These must have exactly the same precedence as IDENT for this to work right, of course.

2011-09-27 Kaz Kylheku

	* parser.y ('{', '}'): Bugfix: precedence of these terminals was causing @foo@foo to be parsed differently from @foo@{foo}. We need consecutive variables to be specially folded in the syntax under a single var_s node.

2011-09-27 Kaz Kylheku

	* match.c (match_files): One more fix to this, argh. The test for !data should be done after matching, before incrementing to the next line. Then it is a true bottom-of-the-loop test. This commit allows

	  @(skip)
	  @first_line
	  @(skip nil 3)
	  @(eof)

	to correctly match the first line of the input, not the fourth one from the bottom, since the second skip has an unbounded range.

2011-09-27 Kaz Kylheku

	* match.c (match_files): Another bugfix to skip. If a hard skip tries to go beyond EOF, then the query must fail. However, a skip to exactly EOF is fine. That is, data can hit nil at the same time as the right number of skip iterations is performed.

2011-09-27 Kaz Kylheku

	* match.c (match_files): Bugfix in skip directive. We should try the match at least once even if there is no data after a hard skip, so that the query has an opportunity to do an explicit match for no data, as with @(endp).
	This commit makes possible queries like:

	  @fourth_line_from_bottom
	  @(skip 1 3)
	  @(eof)

	This query depends on @(skip 1 3) not failing when it runs out of data, because @(eof) checks for this.

2011-09-27 Kaz Kylheku

	* lib.c (eof_s): New symbol variable.
	(obj_init): New variable initialized.
	* lib.h (eof_s): Declared.
	* match.c (match_files): New @(eof) directive explicitly matches end of data.
	* txr.1: Updated.

2011-09-26 Kaz Kylheku

	Version 037

	Short-circuiting behavior for @(all) and @(none).
	Obsolete forms of @(next) and @(output) syntax are gone.
	New filtering feature for substitutions in output. Filtering to and from HTML built in, plus user-defined filtering with deffilter.
	Bugfixes: wrong error message in throw; lack of support for escaping backslashes in literals and regexes.

	* txr.c (version): Bumped to 037.
	* txr.1: Set version to 037 and bumped date.
	* configure: Bumped txr_ver to 037.

2011-09-26 Kaz Kylheku

	Support &#NNNN; decimal escapes also.

	* filter.c (html_hex_continue): Bail with nil if no digits are collected. The &#x; syntax is not translated to anything.
	(html_dec_continue): New function.
	(html_hex_handler): Function renamed to html_numeric_handler.
	(filter_init): Change function-based trie node over to html_numeric_handler.

2011-09-26 Kaz Kylheku

	Support &#xNNNN; hex escapes in HTML. Bugfix in field formatting. chr function inlined.

	* filter.c (trie_value_at, trie_lookup_feed_char): Handle function case.
	(build_filter): New parameter, compress_p.
	(html_hex_continue, html_hex_handler): New functions.
	(filter_init): Add a function-based node to the from_html trie.
	* lib.c (chr): Function removed.
	(functionp): New function.
	* lib.h (chr): Declaration replaced with inline function.
	(functionp): Declared.
	* match.c (format_field): Bugfix: failed to apply filter that came in as an argument.

2011-09-26 Kaz Kylheku

	Bugfixes: Consistent escaping in various literals. A double backslash encodes a single backslash. The output clause can be empty.
	* parser.l (char_esc): Backslash handled. Use internal_error rather than abort.
	(REGCHAR, LITCHAR): Backslash added to lexical syntax.
	* parser.y (output_clause): Allow empty output clause.

2011-09-26 Kaz Kylheku

	New feature: @(deffilter).
	Bugfix in @(throw) when a non-symbol is thrown: the exception message referred to the symbol throw rather than the erroneous object.

	* filter.c (build_filter_from_list, register_filter): New functions.
	* filter.h (register_filter): New function declared.
	* lib.c (deffilter_s): New variable defined.
	(chain): Function changed from a single list argument to a variable argument list, to make it simpler to use.
	(do_and, and): New functions.
	(obj_init): deffilter_s initialization added.
	* lib.h (deffilter_s, and): New declarations.
	(chain): Declaration updated to new function signature.
	(eq): Changed from macro to inline function.
	* match.c (do_output_line): Simplified expression involving chain.
	(do_output): Likewise.
	(match_files): Bugfix in error handling of throw. Implementation of deffilter.
	* txr.1: Documented deffilter.

2011-09-26 Kaz Kylheku

	Trie compression. Hash table iteration. Bugfix in typeof.

	* filter.c (trie_compress): New function.
	(trie_value_at, trie_lookup_feed_char, filter_string): Handle cons cell nodes in trie.
	(build_filter): Call trie_compress.
	* gc.c (cobj_destroy_op): Function renamed to cobj_destroy_stub_op, since it doesn't do anything.
	(cobj_destroy_free_op): New function.
	* hash.c (struct hash_iter): New type.
	(hash_destroy): Function removed.
	(hash_ops): Reference to hash_destroy replaced with cobj_destroy_free_op.
	(hash_count, hash_iter_mark, hash_begin, hash_next): New functions.
	(hash_iter_ops): New static structure.
	* hash.h (hash_count, hash_begin, hash_next): New functions declared.
	* lib.c (hash_iter_s): New symbol variable.
	(typeof): Bugfix: TAG_LIT type tag not handled.
	(vecref): New function.
	(obj_init): Initialize hash_iter_s.
	* lib.h (cobj_destroy_op): Declaration renamed.
	(cobj_destroy_free_op, vecref): New functions declared.
	(hash_iter_s): New variable declared.
	* stream.c (string_in_ops, byte_in_ops): cobj_destroy_op renamed to cobj_destroy_stub_op.

2011-09-25 Kaz Kylheku

	Filtering from HTML implemented.

	* filter.c (from_html_k): New variable.
	(from_html_table): New static array.
	(filter_init): Intern new symbol. Instantiate new filter and store it in filters hash.

2011-09-25 Kaz Kylheku

	Filtering feature for variable substitution in output.

	* filter.c, filter.h: New files.
	* Makefile (OBJS): filter.o added.
	* gc.c (mark_obj): Mark new alloc field of string objects.
	* hash.c (struct hash): New member, userdata.
	(hash_mark): Mark new userdata member of hash.
	(make_hash): Initialize userdata.
	(get_hash_userdata, set_hash_userdata, hashp): New functions.
	* hash.h (get_hash_userdata, set_hash_userdata, hashp): New functions declared.
	* lib.c (getplist, string_extend, cobjp): New functions.
	(string_own, string, string_utf8): Initialize new alloc field to nil.
	(mkstring, mkustring): Initialize new alloc field to actual size.
	(length_str): When length is computed and cached, also compute and cache alloc.
	(init): Call filter_init.
	* lib.h (struct string): New member, alloc.
	(num_fast): Macro converted to inline function.
	(getplist, string_extend, cobjp): New functions declared.
	* match.c (match_line): Follows change of modifier s-exp syntax.
	(format_field): New parameter, filter. New modifier syntax parsed. Filter retrieved and applied.
	(subst_vars): New parameter, filter. Filter is either applied in this function or passed to format_field, as needed.
	(eval_form): Pass nil to new parameter of subst_vars.
	(do_output_line): New parameter, filter. Passed down to subst_vars.
	(do_output): New parameter, filter. Passed down to do_output_line.
	(match_files): Pass nil filter to subst_vars in cat directive. Output directive refactored to parse keywords, extract the filter and pass it down to do_output.
* parser.y (regex): Generate (sys:regex regex syntax ...) instead of (regex syntax ...). (elem, expr): Updated w.r.t. regex syntax change. (var): Cases '{' IDENT regex '}' and '{' IDENT NUMBER '}' are removed. New syntax '{' IDENT exprs '}' to handle these more generally and allow for keywords. * txr.1: Updated. 2011-09-23 Kaz Kylheku Numeric constants become real constants. Vector code cleanup. * lib.h (zero, one, two, negone, maxint, minint): Extern declarations removed, macros introduced for these identifiers. * lib.c (zero, one, two, negone, maxint, minint): File scope definitions removed. (vector): Use vec_alloc and vec_fill enums instead of constants. (obj_init): Remove references to removed definitions. 2011-09-23 Kaz Kylheku * LICENSE, Makefile, configure, gc.c, gc.h, hash.c, hash.h, lib.c, lib.h, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Updated copyright year. 2011-09-23 Kaz Kylheku * match.c, parser.y: Support for old output syntax removed. Leading :nothrow with trailing material is an error now. * txr.1: Updated. Made note of errors in pipes being asynchronous. 2011-09-23 Kaz Kylheku * tests/002/query-1.txr: Old next syntax rewritten to new. 2011-09-23 Kaz Kylheku * match.c (match_files): Some cleanup in preparation for new features. Support for obsolescent @(next) syntax is gone. 2011-09-23 Kaz Kylheku Semantics tweak: short-circuiting behavior for @(all) and @(none). * match.c (match_files): Added a couple of break statements. * txr.1: Updated. 2011-09-22 Kaz Kylheku Version 036 Extension to @(skip). * txr.c (version): Bumped to 036. * txr.1: Set version to 036. * configure: Bumped txr_ver to 036. 2011-09-22 Kaz Kylheku Useful second argument in skip directive for skipping a minimum number of lines. * match.c (match_files): New behavior in skip_s case. * txr.1: Documented.
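The 2011-09-23 entry turns zero, one, two, negone, maxint and minint from file-scope objects into macros. The following is a minimal sketch of that idea. It assumes a tagged representation in which small integers carry a low-bit type tag. The names TAG_SHIFT, TAG_MASK, TAG_NUM, NUM_CONST, is_num and c_num_sketch are illustrative stand-ins and are not copied from lib.h. Each constant expands to an expression instead of a load from a global, so no run-time initialization in obj_init is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; TXR's real definitions in lib.h may differ. */
typedef intptr_t cnum;           /* integer wide enough to hold a pointer */
typedef struct obj obj_t;
typedef obj_t *val;

#define TAG_SHIFT 2
#define TAG_MASK (((cnum) 1 << TAG_SHIFT) - 1)
#define TAG_NUM ((cnum) 1)

/* Encode a small integer as a tagged value.  Multiplying instead of
   shifting keeps negative inputs such as -1 well defined in C. */
#define NUM_CONST(n) ((val) ((cnum) (n) * ((cnum) 1 << TAG_SHIFT) + TAG_NUM))

/* The constants are plain macros: no global objects, no initialization. */
#define zero   NUM_CONST(0)
#define one    NUM_CONST(1)
#define two    NUM_CONST(2)
#define negone NUM_CONST(-1)

/* Does this value carry the integer tag? */
static int is_num(val v)
{
  return ((cnum) v & TAG_MASK) == TAG_NUM;
}

/* Decode a tagged integer; exact division is portable for negatives. */
static cnum c_num_sketch(val v)
{
  return ((cnum) v - TAG_NUM) / ((cnum) 1 << TAG_SHIFT);
}
```

Decoding uses division rather than a right shift because right-shifting a negative signed value is implementation-defined in C.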
2010-10-05 Kaz Kylheku Version 035 Fixes garbage-collection-related problem affecting @(freeform) that was revealed by "make tests" on x86-64 system, gcc 4.3.2. Fixes show-stopper stupidity, again: a query that matches the end of data terminates successfully rather than failing. This is because version 032 did not properly address the breakage introduced in the 2010-01-21 change to match.c involving the dataline variable. The "fix" only prevented the program from aborting in that situation. * txr.c (version): Bumped to 035. * txr.1: Set version to 035. * configure: Bumped txr_ver to 035. 2010-10-04 Kaz Kylheku * match.c (match_files): Bugfix. A (sub)query that runs out of data lines to match must fail. Extra data lines relative to the spec are tolerated; extra spec lines unmet by data aren't. 2010-10-03 Kaz Kylheku * lib.h (lazy_string): Fix incorrect comment. * lib.c (split_str, split_str_set): It is necessary to protect input parameters against GC, because we cache their internal pointers, after which we no longer refer to the objects themselves. Moreover, we perform object allocation, and then keep using the internal pointers. 2010-09-30 Kaz Kylheku * txr.1: Fix formatting problem. 2010-03-01 Kaz Kylheku * txr.1: Fix inaccuracies: files are not read into memory all at once, and a query doesn't execute if it had errors. 2010-02-28 Kaz Kylheku Version 034 Patched up broken @(freeform) directive. * txr.c (version): Bumped to 034. * txr.1: Set version to 034. * configure: Bumped txr_ver to 034. 2010-02-28 Kaz Kylheku New testcases for freeform. * tests/006/data: New UTF-8 file. * tests/006/freeform-1.txr: Likewise. * tests/006/freeform-1.expected: Likewise. * tests/006/freeform-2.txr: Likewise. * tests/006/freeform-2.expected: Likewise. * Makefile (TXR_ARGS): New target-specific assignment to set data for test case set 006. 2010-02-27 Kaz Kylheku * lib.c (length_str_gt, length_str_ge, length_str_lt, length_str_le): Added missing support for literal string type. 
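The 2010-10-04 entry states the matching rule that version 035 restores. Extra data lines are tolerated. Spec lines that remain unmet when the data runs out make the (sub)query fail. The sketch below illustrates only that rule. It uses strcmp as a stand-in for TXR's real pattern matching, and match_lines_sketch is a hypothetical name, not the match_files code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: each spec line must be matched, in order, by a
   data line.  strcmp stands in for real pattern matching. */
static int match_lines_sketch(const char *const *spec, size_t nspec,
                              const char *const *data, size_t ndata)
{
  size_t i;

  for (i = 0; i < nspec; i++) {
    if (i >= ndata)
      return 0;                 /* data ran out first: the query fails */
    if (strcmp(spec[i], data[i]) != 0)
      return 0;                 /* this line does not match */
  }
  return 1;                     /* extra data lines are tolerated */
}
```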
2010-02-27 Kaz Kylheku * lib.c (search_str): Bugfix for empty haystack case: checks for end of string must use postincrement on the index, otherwise the access goes past the null terminator. 2010-02-27 Kaz Kylheku * match.c (match_lines): Bugfix in freeform directive. If the virtual line is partially matched, the remainder of the line is folded back into list form. In this case, the data line number must be incremented. Otherwise the calling context may conclude that no progress was made, and skip a line of input. That is, the unmatched part of the input is a new line, even if there had originally been no line break at that point. 2010-02-27 Kaz Kylheku * lib.h (split_str_sep): Declared. * lib.c (split_str_sep): New function. (split_str): Semantics changed; the second argument is not a set of separator characters (like in split_str_sep) but rather a separator string. Fixed bug: if the input string is empty, the output list is empty. This caused infinite looping behavior in @(freeform). 2010-02-24 Kaz Kylheku * lib.c (init_str): Bugfix: copy only len characters, not len + 1, so that we don't clobber the null terminator in the target string, or try to read past the end of the source data. This affects the @(freeform) directive. 2010-01-26 Kaz Kylheku Version 033 Addressed exponential memory behaviors in derivative-based regex matching. * txr.c (version): Bumped to 033. * txr.1: Set version to 033. * configure: Bumped txr_ver to 033. 2010-01-26 Kaz Kylheku * hash.c (hash_process_weak): There is no point in fixing up the type codes of spuriously reached nodes; reached objects will not be removed by weak processing and so it's better to just detect those situations and short-circuit. 2010-01-26 Kaz Kylheku Optimization in derivative-based regex engine. Exponential memory consumption behavior was observed when matching the input aaaaaa.... against the regex a?a?a?a?....aaaa.... The fix is to eliminate common subexpressions from the derivative for the or operator.
* lib.c (memqual, mapcon): New functions. * lib.h (memqual, mapcon): Declared. * regex.c (flatten_or, unflatten_or, unique_first, reduce_or): New functions. (reg_derivative): Apply reduce_or to the constructed disjunction. 2010-01-25 Kaz Kylheku Fixing weak hash tables. * gc.c, gc.h (REACHABLE, FREE): Moved to header. * hash.c (hash_mark): Fix broken list push code. (hash_process_weak): Defend against conservative garbage collector. We cannot trust that the conses which make up the chain backbone and hash entry pairs are unmarked, because the hash vectors might be reached through spurious references. 2010-01-25 Kaz Kylheku Version 032 Fix showstopper stupidity. * match.c (match_files): Fix incorrect change involving dataline variable made on 2010-01-21; failure to check for end of data. * txr.c (version): Bumped to 032. * txr.1: Set version to 032. * configure: Bumped txr_ver to 032. 2010-01-25 Kaz Kylheku Version 031 Addresses some spurious object retention problems related to the GC's conservative scan of the stack. * txr.c (version): Bumped to 031. * txr.1: Set version to 031. * configure: Bumped txr_ver to 031. 2010-01-25 Kaz Kylheku * match.c (match_files): Workaround for GC issue discovered on Red Hat EL 4 with gcc 3.4.3. In the collect loop, set car(success) to nil. Somehow the generated code hangs on to the last matching position for a regex, preventing GC. 2010-01-24 Kaz Kylheku * stream.c (vformat_num): Fix bad width calculation. 2010-01-21 Kaz Kylheku Fix for unbounded memory growth problem reproduced with GCC 4.4.1 on 32-bit x86 Fedora. This happens because the lazy list variable ``data'' in the match_files function is optimized to a register, but a stale value of that variable persists in the backing storage. * gc.h (gc_hint): New macro. (gc_hint_func): Declared. * gc.c (gc_hint_func): New function. * match.c (match_files): Use gc_hint on the data lazy list.
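The 2010-01-21 entry adds a gc_hint macro so that the conservative stack scan sees the current value of the data variable instead of a stale copy. One plausible way to write such a hint is sketched below. This is an assumption, not the actual gc.h and gc.c code, and gc_hint_sink is a hypothetical name. Passing the variable's address out of the function, here by storing it in a volatile object, lets the address escape. The compiler must then keep the variable's memory slot up to date around later calls, such as allocations that may trigger GC, rather than holding the live value only in a register.

```c
#include <assert.h>

typedef struct obj obj_t;      /* stand-in for TXR's object type */
typedef obj_t *val;

/* Hypothetical sink: writing the address to a volatile object is an
   observable side effect that the compiler may not optimize away. */
static val *volatile gc_hint_sink;

static void gc_hint_func(val *var)
{
  gc_hint_sink = var;
}

/* Make var's address escape, so that its current value is kept in
   memory where a conservative stack scan can find it. */
#define gc_hint(var) gc_hint_func(&(var))
```

In TXR the hint function lives in a separate translation unit (gc.c), which by itself keeps the compiler from seeing through the call. The volatile store serves the same purpose in this single-file sketch.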
2010-01-21 Kaz Kylheku * match.c (match_files): Reduce the scope, and bogus use, of the dataline variable. 2010-01-19 Kaz Kylheku Version 030 Fixed grammar conflicts. R1~R2 syntax supported in regexes. * txr.c (version): Bumped to 030. * txr.1: Set version to 030. * configure: Bumped txr_ver to 030. 2010-01-19 Kaz Kylheku * parser.y (regex): Get rid of empty '/' '/' production again. (regexpr): Re-introduce empty production; this time using %prec LOW trick to give this interpretation the lowest possible precedence. Thus expressions like /&/ work again. (regbranch): New production to allow R1~R2 to be valid. * txr.1: Documented. 2010-01-19 Kaz Kylheku * parser.l (grammar): The ^ character is no longer considered a special regex token, just a regular character. * parser.y (LOW): New phony terminal symbol, used as placeholder for lowest precedence. (grammar): Fixed numerous conflicts in regex section by refactoring. The regex nonterminal no longer has an empty derivation. A regex character class no longer has an empty derivation; this is handled by special rules. Ambiguity around ^ is resolved; this is parsed as a regular character and specially recognized. Ambiguity between catenation of terms and postfix operators resolved in favor of shift by giving catenation low precedence using %prec LOW. 2010-01-18 Kaz Kylheku Version 029 Performance optimizations of derivative-based regexes. New syntax: [] and [^]. Saner semantics for % operator. * txr.c (version): Bumped to 029. * txr.1: Set version to 029. * configure: Bumped txr_ver to 029. 2010-01-18 Kaz Kylheku * regex.c (reg_derivative_list, reg_derivative): Recognition of cases to reduce consing. In reg_derivative_list, we avoid consing the full or expression if either branch is t, and also save a cons when the first element has a null derivative. In reg_derivative the oneplus and zeroplus cases are split, since zeroplus can re-use the input expression, when it's just a one-character match, deriving nil.
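Several entries here refer to the derivative-based regex engine (reg_nullable, reg_derivative). The sketch below shows the underlying technique, Brzozowski derivatives, in self-contained C. It is not TXR's regex.c, and the node kinds and function names are hypothetical. A match feeds each input character through deriv and then asks whether the residual expression is nullable. Complement and intersection, which are awkward for an NFA, fall out directly. The or_re and cat_re constructors drop empty-set branches to limit allocation, in the same spirit as the consing reductions described above. The sketch performs no common-subexpression elimination, so expression size can still grow on inputs like the a?a?a?... regex in the 2010-01-26 entry.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical regex AST for illustrating Brzozowski derivatives. */
enum kind { EMPTYSET, EPSILON, CHR, CAT, OR, AND, STAR, COMPL };

struct re {
  enum kind k;
  char c;              /* literal character, for CHR */
  struct re *l, *r;    /* operands (r unused for STAR and COMPL) */
};

/* Allocate a node; nodes are never freed in this sketch. */
static struct re *mk(enum kind k, char c, struct re *l, struct re *r)
{
  struct re *n = malloc(sizeof *n);
  if (n == NULL)
    abort();
  n->k = k;
  n->c = c;
  n->l = l;
  n->r = r;
  return n;
}

/* Smart constructors: drop branches that can never match. */
static struct re *or_re(struct re *a, struct re *b)
{
  if (a->k == EMPTYSET)
    return b;
  if (b->k == EMPTYSET)
    return a;
  return mk(OR, 0, a, b);
}

static struct re *cat_re(struct re *a, struct re *b)
{
  return a->k == EMPTYSET ? a : mk(CAT, 0, a, b);
}

/* Does e match the empty string? */
static int nullable(const struct re *e)
{
  switch (e->k) {
  case EMPTYSET: case CHR: return 0;
  case EPSILON: case STAR: return 1;
  case CAT: case AND: return nullable(e->l) && nullable(e->r);
  case OR: return nullable(e->l) || nullable(e->r);
  case COMPL: return !nullable(e->l);
  }
  return 0;
}

/* Derivative of e by ch: it matches s exactly when e matches ch s. */
static struct re *deriv(struct re *e, char ch)
{
  switch (e->k) {
  case EMPTYSET: case EPSILON:
    return mk(EMPTYSET, 0, NULL, NULL);
  case CHR:
    return mk(e->c == ch ? EPSILON : EMPTYSET, 0, NULL, NULL);
  case CAT: {
    struct re *d = cat_re(deriv(e->l, ch), e->r);
    return nullable(e->l) ? or_re(d, deriv(e->r, ch)) : d;
  }
  case OR:
    return or_re(deriv(e->l, ch), deriv(e->r, ch));
  case AND:
    return mk(AND, 0, deriv(e->l, ch), deriv(e->r, ch));
  case STAR:
    return cat_re(deriv(e->l, ch), e);
  case COMPL:
    return mk(COMPL, 0, deriv(e->l, ch), NULL);
  }
  return mk(EMPTYSET, 0, NULL, NULL);
}

/* Whole-string match: differentiate by each character, then test. */
static int re_match(struct re *e, const char *s)
{
  for (; *s != '\0'; s++)
    e = deriv(e, *s);
  return nullable(e);
}
```

For example, with c a CHR node, mk(COMPL, 0, c, NULL) matches every string except "c", including the empty string.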
2010-01-18 Kaz Kylheku Adjust semantics of non-greedy operator R%S, to avoid the broken case whereby R%S matches nothing at all when S is not empty but equivalent to empty, or more generally when S is nullable. A much nicer definition is ``the intersection of R* and the set of all strings that do not contain a non-empty substring that matches S, followed by S''. * regex.c (dv_compile_regex): Adjust syntactic sugar for the % operator, taking advantage of the reg_nullable function to keep the simpler syntactic sugar for cases where S is not nullable. * txr.1: Document accordingly. 2010-01-17 Kaz Kylheku * parser.y (regterm, regclass): Relocate handling of empty [] into regterm, via empty derivation. 2010-01-16 Kaz Kylheku Regex syntactic tweaks: support the [] syntax to match no character and [^] as its complement, which is synonymous with the wildcard dot. * parser.y (regterm): Added new productions. * txr.1: Documented. 2010-01-16 Kaz Kylheku Version 028. Code cleanup. New additional regex implementation using regex derivatives, providing new operators: regex complement, intersection, non-greedy match. Regex syntax bugfixes. * txr.c (version): Bumped to 028. * txr.1: Bumped version to 028. * configure: Bumped txr_ver to 028. * match.c (dest_bind): Remove spurious syntax. 2010-01-15 Kaz Kylheku * txr.1: Get rid of parens from regex operator descriptions. Correct wrong text: all operators can take an empty regex. Clarify escaping rules within a character class. Describe Kleene and non-greedy behavior more accurately. 2010-01-15 Kaz Kylheku * genman.txr, txr.1: Encode version differently; extract from text during HTML conversion. 2010-01-15 Kaz Kylheku Automate the maintenance of the HTML-ized man page. * Makefile (txr-manpage.html): New target, generated from txr.1 man page. * genman.txr: New txr query to transform the output of man2html. 2010-01-15 Kaz Kylheku Implemented non-greedy operator. * lib.c (nongreedy_s): New symbol global. (obj_init): New symbol interned.
* lib.h (nongreedy_s): Declared. * parser.l (grammar): Support % as a regex operator. * parser.y (grammar): Define '%' as an operator on the same precedence level as '*'. (regterm): Add the % expression as a term. (regchar): Recognize % as an ordinary character in a character class. Also, bugfix: recognize & and ~ similarly. * regex.c (dv_compile_regex): Implement % as syntactic sugar via an algebraic transformation to a more complex expression. (regex_requires_dv): A regex containing the % operator requires derivatives. * txr.1: Documented %; moved exotic regex notes to end of document. 2010-01-15 Kaz Kylheku * regex.c (reg_derivative_list): Bugfix: wrong algebra, taking a double derivative of the first item. 2010-01-15 Kaz Kylheku * txr.1: Fix accidental edit garbage. 2010-01-14 Kaz Kylheku * txr.1: Fix accidental .b, which should have been .B. Revised description of regex operators. Added section on intersection and complement, which may not be familiar to regex users. 2010-01-14 Kaz Kylheku * regex.c (reg_derivative): Bugfix: remove invalid algebraic reductions in the derivative for the or operator. 2010-01-13 Kaz Kylheku Bugfix: allow unescaped / to be used in regex character classes. To do this, we no longer make the lexer look for the terminating slash which ends the regex syntax. Instead, this is driven by the parser, which calls a special function in the lexer to indicate that the regex parsing is done. * parser.h (end_of_regex): New function declared. * parser.l (REGCLASS): Unused start condition removed. (grammar): A slash character in the REGEX start condition is now simply returned as an operator token; no popping of the state stack takes place. The scanner stays in REGEX mode. (end_of_regex): New function. * parser.y (regex): Call end_of_regex when a regex is successfully scanned through to the terminating slash, or if a syntax error occurs. (regchar): Can derive a / terminal now, thus including it in a regex character class.
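The 2010-01-18 entry defines R%S in words. Written as a set equation, with Σ the alphabet, Σ⁺ the non-empty strings, juxtaposition meaning concatenation and an overline meaning complement, that definition reads as follows. This is a transcription of the stated definition, not necessarily the exact expression that dv_compile_regex builds.

```latex
R\%S \;=\; \Bigl( R^{*} \;\cap\; \overline{\Sigma^{*}\,\bigl(S \cap \Sigma^{+}\bigr)\,\Sigma^{*}} \Bigr)\, S
```

Intersecting S with Σ⁺ in the excluded-substring part is what avoids the old broken case. If S is nullable, excluding every string that contains a match of S would also exclude every string, because every string contains the empty string.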
2010-01-13 Kaz Kylheku * parser.y (precedence): Bugfix: character classes like [^*] were being treated as a non-complemented set of two characters. 2010-01-13 Kaz Kylheku Dynamically determine which regex implementation to use: NFA or derivatives. The default behavior is NFA, with derivatives used if the regular expression contains uses of complement or intersection. The --dv-regex option always forces derivatives. * regex.c (opt_derivative_regex): Default value is 0 now. (regex_requires_dv): New function. (regex_compile): If the regex_requires_dv function reports true, or if the opt_derivative_regex flag is true, handle the regex with the derivative-based implementation. * txr.c (txr_main): Implemented --dv-regex option to set the opt_derivative_regex flag. 2010-01-13 Kaz Kylheku * lib.h (c_num): Remove redundant declaration. 2010-01-13 Kaz Kylheku Implement derivative-based regular expressions. * lib.c (chset_s, compiled_regex_s): New symbol globals. (obj_init): New symbols interned. * lib.h (chset_s, compiled_regex_s): Declared. * match.c (match_line, match_files): Use regexp predicate function instead of typeof for detecting regex. * parser.y (regexpr, regbranch, regterm): Minor syntactic refactoring. * regex.h (union nfa_state, nfa_state_t, struct nfa, enum nfam_result, nfa_machine_t, nfa_compile_regex, nfa_free, nfa_run, nfa_machine_reset, nfa_machine_init, nfa_machine_cleanup, nfa_machine_feed, nfa_machine_match_span, regex_nfa): Declarations for internal material removed from header, some moved into regex.c. * regex.c: Includes txr.h now to get declaration of new option global. (union nfa_state, nfa_state_t, struct nfa, nfa_compile_regex, nfa_free, nfa_run, regex_nfa): Declarations moved from regex.h. (enum nfam_result, nfa_machine_reset, nfa_machine_init, nfa_machine_cleanup, nfa_machine_feed, nfa_machine_match_span): Renamed from nfam_* and nfa_machine_* to regm_* and regex_machine_*. Functions made static.
Regex machine is now polymorphic: the machine is instantiated based on whether the regex is NFA or derivative type, and the behavior of the functions is type-dependent. (nfa_machine_t): Renamed to regex_machine_t, now typedef name for union regex_machine. (struct dv_machine, union regex_machine): New types. (struct nfa_machine): New member is_nfa. A few members rearranged, so that the members common to the union are at the start of the structure. (opt_derivative_regex): New global added. (char_set_compile, char_set_cobj_destroy): New functions. (char_set_cobj_ops): New static structure. (nfa_compile_set): Refactored to use char_set_compile; made static. (nfa_compile_list): New function. (nfa_compile_regex): Refactored to follow new syntax from parser.y; made static. (nfa_free, nfa_run, regex_nfa): Made static. (dv_compile_regex, reg_nullable_list, reg_nullable, reg_derivative_list, reg_derivative, dv_run): New functions. (regex_compile): Can compile either kind of regex now. (search_regex, match_regex): Decoupled from dependency on NFA implementation. * txr.h (opt_derivative_regex): Declared. * dep.mk: Regenerated. 2010-01-06 Kaz Kylheku Remove incorrect implementation of extended regex operations (complement, intersection). The documentation of the syntax extensions is retained. * regex.c (struct any_char_set, struct small_char_set, struct displaced_char_set): refs field removed. (nfa_kind_t): Removed enum members nfa_super_accept, nfa_reject, nfa_compl_empty, nfa_compl_wild, nfa_compl_single, nfa_compl_set. (nfa_super_accept_state, nfa_is_accept_state): Removed. (char_set_create, char_set_destroy): Reverted. (char_set_clone): Removed. (nfa_state_empty_convert, nfa_state_merge): Reverted. (nfa_compl_state, nfa_compl): Removed. (nfa_compile_regex, nfa_all_states, nfa_closure, nfa_move): Reverted. 2010-01-06 Kaz Kylheku Some fine-tuning in regex grammar. * parser.y (regex): Empty regex handled by allowing regex to generate empty, rather than a special-case production for '/' '/'.
Thus empty subexpressions are possible. (regbranch, regterm): Complement is handled in regbranch, so that it has lower precedence than aggregation. 2010-01-05 Kaz Kylheku Implemented the regular expression ~ and & operators. This turns out to be easy to do in NFA land. The complement of an NFA has exactly the same number and configuration of states and transitions, except that the states have an inverted meaning; and furthermore, failed character transitions are routed to an extra state (which in this implementation is permanently allocated and shared by all regexes). The regex & is implemented trivially using De Morgan's laws. Also, bugfix: regular expressions like A|B|C are now allowed by the syntax, rather than constituting a syntax error. Previously, this would have been entered as (A|B)|C. * lib.c (comp_s, and_s): New symbol globals. (obj_init): New symbols interned. * lib.h (comp_s, and_s): Declared. * parser.l (grammar): Provide new ~ and & tokens in REGEX state. * parser.y (regexpr): Constituents of '|' are regexprs, rather than regbranches (see bugfix note above). The '&' operator is added. (regterm): The '~' operator is added. * regex.c (struct any_char_set, struct small_char_set, struct displaced_char_set): refs field added. (nfa_kind_t): New enum members nfa_super_accept, nfa_reject, nfa_compl_empty, nfa_compl_wild, nfa_compl_single, nfa_compl_set. (nfa_super_accept_state): New static structure. (nfa_is_accept_state): New inline function. (char_set_create): Initialize reference count to 1. (char_set_destroy): Decrement refcount, free if zero. (char_set_clone): New function. (nfa_state_empty_convert, nfa_state_merge): Handle nfa_reject state, the complement of nfa_accept. (nfa_compile_regex): Handle new operators. (nfa_all_states, nfa_closure): Handle new state types.
(nfa_move): Handle new types according to special rules: the new complemented states that have character transitions have a next move to the super-accept state if they do not match the input character. * txr.1: Documented new regex operators. 2009-12-17 Kaz Kylheku * lib.c (make_package, find_package): Eliminate declaration in the middle of a statement block. * lib.h (TAG_MASK): Becomes type cnum rather than long. (nao): Based off 1 rather than -1 to avoid a left shift of a negative number. 2009-12-09 Kaz Kylheku * parser.l (YYINPUT): Fix signed/unsigned comparison. 2009-12-09 Kaz Kylheku * hash.c (sethash): New function. * hash.h (sethash): Declared. * lib.c (cobj_handle): New function. * lib.h (cobj_handle): Declared. 2009-12-08 Kaz Kylheku All COBJ operations have default implementations now; no null pointer checks on struct cobj_ops operations. New typechecking function for COBJ objects. * gc.c (finalize): Assume function pointer destroy is not null. (cobj_destroy_op): New function. (mark_obj): Assume function pointer mark is not null. (cobj_mark_op): New function. * hash.c (ll_hash): Assume function pointer hash is not null. (cobj_hash_op): New function. (hash_equal): Function removed. (hash_ops): Replaced hash_equal with cobj_equal_op. * lib.c (class_check, cobj_equal_op): New functions. * lib.h (cobj_equal_op, cobj_destroy_op, cobj_mark_op, cobj_hash_op): Declarations added. (system_package, user_package, class_check): Declarations added. * regex.c (regex_equal): Function removed. (regex_obj_ops): regex_equal replaced with cobj_equal_op. * stream.c (common_equal): Function removed. (stdio_ops, pipe_ops, string_in_ops, byte_in_ops, string_out_ops, dir_ops): common_equal replaced with cobj_equal_op, and all previously null function pointers populated with default functions. 2009-12-05 Kaz Kylheku More void * to mem_t * conversion.
* stream.c (stdio_put_char_callback, stdio_get_char_callback, stdio_put_string, stdio_put_char, stdio_snarf_line, stdio_get_char): Convert void * to mem_t *. * utf8.c (utf8_encode, utf8_decode): Convert void * to mem_t *. * utf8.h (utf8_encode, utf8_decode): Update declarations. 2009-12-04 Kaz Kylheku Eliminate the void * disease. Generic pointers are of type mem_t * from now on, which is compatible with unsigned char *. No implicit conversion to or from this type, in C or C++. * hash.c (make_hash): Convert void * to mem_t *. * lib.c (oom_realloc, chk_malloc, chk_realloc, vec_set_fill, cobj, init): Convert to using mem_t *. * lib.h (mem_t): New typedef. (struct cobj): Convert void * to mem_t *. (oom_realloc, chk_malloc, chk_realloc, init): Declarations updated. * regex.c (regex_compile): Convert void * to mem_t *. * stream.c (snarf_line, string_out_put_string, make_stdio_stream, make_pipe_stream, make_string_input_stream, make_string_byte_input_stream, make_string_output_stream, get_string_from_stream): Convert void * to mem_t *. * txr.c (oom_realloc_handler): Convert void * to mem_t *. 2009-12-03 Kaz Kylheku * gc.c (heap_min_bound, heap_max_bound): New static globals. (more): Update heap_min_bound and heap_max_bound. (in_heap): Do early rejection tests on the pointer. If it's not aligned, or it's completely outside of the bounding box of the heap area, short-circuit to false. 2009-12-03 Kaz Kylheku Version 027. Code cleanup. gc-related bugfix. Improved file copying semantics of make install, and adherence to the DESTDIR convention. * txr.c (version): Bumped to 027. * txr.1: Bumped version to 027. * configure: Bumped txr_ver to 027. 2009-12-03 Kaz Kylheku * Makefile (CFLAGS): Better test for g++, when removing warning options not appropriate for g++. Sometimes g++ may be called something that doesn't end in g++, like g++4. 2009-12-03 Kaz Kylheku * parser.l (YY_NO_UNPUT): Removed superfluous #define.
This is not needed because suppressing generation of unput is requested via the %option. In scanners generated by the legacy version of flex, 2.5.4, which is still widely in use, this redundancy leads to a duplicate #define YY_NO_UNPUT and a compiler warning. 2009-12-03 Kaz Kylheku Fix for failing test suite on a MIPS machine, due to gc failing to mark a local variable in txr_main. * txr.c (txr_main): Changed from internal linkage to external. This prevents gcc -O2 from inlining txr_main into main. We need separate stack frames for main and txr_main, in order to be sure that when walking the stack to the bottom-of-stack pointer, we visit all locals in main. This is the whole reason why there is a separate txr_main. 2009-12-02 Kaz Kylheku * Makefile (tests): Don't depend on the executable. Otherwise, during make install-tests, if it doesn't exist in the install directory, a gcc compile command gets deposited into the generated run.sh script. (install-tests): Fixes to make this work when using a separate build directory. Split the cpio -p job into a cpio -i piping into cpio -o. 2009-12-02 Kaz Kylheku * Makefile (install-tests): New target. Provides a way to make the test cases part of the installation, and a generated script to run the commands on the installation host. 2009-12-02 Kaz Kylheku Fix annoyances with dependency generation, such as picking up local files that are not in the project. * Makefile (depend): Rule passes object file names as arguments to depend.txr script. * depend.txr: Changed to take names of object files from command line, rather than scanning the directory for all .c files. Switched to new-style next directives, using quasiliterals. * dep.mk: Regenerated. 2009-11-28 Kaz Kylheku * Makefile (CFLAGS): If the compiler matches the pattern %g++, then remove some C-front-end-specific warnings from CFLAGS, which the g++ front end will complain about. 2009-11-28 Kaz Kylheku * Makefile (CFLAGS): Add -Dlint to CFLAGS when compiling y.tab.o.
This suppresses some warnings from a byacc-generated parser, and gets rid of a useless static sccsid array. May help with a Bison-generated parser also. 2009-11-28 Kaz Kylheku * parser.l: Use flex options to suppress generation of the unused functions yyunput and yyinput, thus getting rid of some compiler diagnostics. 2009-11-28 Kaz Kylheku Code cleanup. All private functions static. Private stuff in regex module not exposed in header. Etc. * configure (diag_flags): Add -Wmissing-prototypes and -Wstrict-prototypes. * gc.c (more): Turn into prototyped definition with (void). * gc.h (unmark): Declared. * hash.c (hash_equal, hash_destroy, hash_mark, hash_grow): Private functions defined static. * lib.c (flatten_helper, do_bind2, do_bind2other): Likewise. * lib.h (make_package, merge, d): Declared. * match.c (dump_shell_string, dump_byte_string, dump_var, dump_bindings, depth, weird_merge, bindable, dest_bind, match_line, format_field, subst_vars, eval_form, complex_open, complex_snarf, complex_stream, robust_length, bind_car, bind_cdr, extract_vars, extract_bindings, do_output_line, do_output, match_files): Private functions defined static. (map_leaf_lists, complex_close): Unused functions removed. * parser.h (yyerror): Declared. * regex.c (bitcell_t, BITCELL_ALL1, CHAR_SET_SIZE, chset_type_t, cset_L0_t, cset_L1_t, cset_L2_t, cset_L3_t, struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set, union char_set, nfa_kind_t, struct nfa_state_accept, struct nfa_state_empty, struct nfa_state_single, struct nfa_state_set, struct nfa_state, struct nfa_machine): Definitions moved here from regex.h file.
(L0_fill_range, L0_contains, L1_full, L1_fill_range, L1_contains, L1_free, L2_full, L2_fill_range, L2_contains, L2_free, L3_fill_range, L3_contains, L3_free, char_set_create, char_set_destroy, char_set_compl, char_set_add, char_set_add_range, char_set_contains, nfa_state_accept, nfa_state_empty, nfa_state_single, nfa_state_wild, nfa_state_free, nfa_state_shallow_free, nfa_state_set, nfa_state_empty_convert, nfa_state_merge, nfa_make, nfa_combine, nfa_compile_set, nfa_all_states, nfa_closure, nfa_move): Private functions defined static. * regex.h (bitcell_t, BITCELL_ALL1, CHAR_SET_SIZE, chset_type_t, cset_L0_t, cset_L1_t, cset_L2_t, cset_L3_t, struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set, union char_set, nfa_kind_t, struct nfa_state_accept, struct nfa_state_empty, struct nfa_state_single, struct nfa_state_set, struct nfa_state, struct nfa_machine): Definitions removed. (char_set_create, char_set_destroy, char_set_compl, char_set_add, char_set_add_range, char_set_contains, nfa_state_accept, nfa_state_empty, nfa_state_single, nfa_state_wild, nfa_state_set, nfa_state_free, nfa_state_shallow_free, nfa_state_merge): Extern declarations removed. * stream.c (stdio_stream_print, stdio_stream_destroy, stdio_stream_mark, stdio_get_char, stdio_get_byte, string_in_stream_mark, vformat_str): Private functions defined static. * txr.c (oom_realloc_handler, help, hint, remove_hash_bang_line): Likewise. * unwind.c (uw_unwind_to_exit_point): Likewise. 2009-11-28 Kaz Kylheku * configure: Workaround in banner code for coreutils printf %.*s bug. 2009-11-27 Kaz Kylheku Switching to the DESTDIR convention for install. The make install step does some things more correctly now, without relying on the install program. * configure: Help text doesn't refer to ``Makefile variables'' but ``make variables'', or ``variables in config.make''. The install_prefix variable becomes DESTDIR now in config.make.
* Makefile (INSTALL): New rule body macro. (install): Uses of mkdir -p and cp switched to a call to the INSTALL macro. 2009-11-26 Kaz Kylheku Version 026. Fixed wchar_t build problem in parser.y. Improved configure script to auto-detect yacc program. Txr works with either Berkeley yacc (byacc) or Bison. Fixed two uninitialized memory bugs. Valgrind API is now used to integrate GC memory manager with valgrind. The symbols nothrow and args in the next directive are now keyword symbols, written :nothrow and :args. (Breaks backward compatibility; sorry!) * txr.c (version): Bumped to 026. * txr.1: Bumped version to 026. * configure: Bumped txr_ver to 026. 2009-11-26 Kaz Kylheku Not all systems have a yacc alias for the yacc program. txr is known to work with two yacc implementations: GNU Bison and Berkeley yacc. Let's add some auto-detection for yacc. * Makefile: Use "include" rather than "-include" for including config.make, so that make fails if the file does not exist. (conftest.yacc): New target. Just outputs the value of the variable expansion of $(YACC). * configure (yaccname): New variable. (gen_config_make): New function. Steps added to test for existence of various yaccs. 2009-11-25 Kaz Kylheku * gc.c (mark_mem_region): Bugfix: do not mess with the valgrind accessibility of the heap object if valgrind debugging is not enabled. 2009-11-25 Kaz Kylheku * parser.y (grammar): Fixes for bison 2.4.1. Remove superfluous action in chrlit. Include for abort. 2009-11-25 Kaz Kylheku Refinements to Valgrind support. * gc.c (mark_mem_region): If a pointer from the stack is valid for the heap, it may point to a free object, which is marked inaccessible. We must grant the garbage collector access to the object. If the object is free, close off access. This is not 100% correct, because if the object is accessible but undefined, then we end up flipping it to defined. (sweep): Before sweeping each heap, mark the entire block as defined.
This is necessary because sweep accesses blocks, which may be free, and thus inaccessible. Then, during the sweep, any block which is already free must be marked inaccessible again. This means that the remaining blocks that are reachable become defined. Here that is okay, because gc has marked all those blocks. If any of them had uninitialized members, that would have been caught by valgrind during the marking phase, if not sooner. 2009-11-25 Kaz Kylheku More Valgrind support. New option --vg-debug which turns on Valgrind protection of free blocks. This works independently of --gc-debug. * gc.c (opt_vg_debug): New conditionally defined global variable. (more): Mark entire heap of free blocks inaccessible, if vg debugging is enabled. (make_obj): If vg debugging is enabled, mark returned block as accessible, but undefined, and take care to grant self temporary access while manipulating the free list. (finalize): Removed old debugging logic of not freeing strings and vectors during gc debug. If the null pointers are ever a problem during debugging, they can be checked inside obj_print, and turned into # notation. (sweep): Switch to FIFO free block recycling if vg debugging is enabled, just like when gc debugging is enabled. Mark freed blocks as inaccessible, taking care to grant self temporary access while manipulating the free list. * txr.c (txr_main): Parse the --vg-debug option. * txr.h (opt_vg_debug): Conditionally declared. 2009-11-25 Kaz Kylheku Fix a build breakage that may happen on some platforms. The parser.y file includes "utf8.h", which uses the type wint_t. It also includes "lib.h" which uses "wchar_t". But it fails to include any headers which define these types. The generated y.tab.c picks up wchar_t by the Bison-inserted inclusion of , so that's how we got that. But wint_t does not come from any of the headers---if they are standard-conforming. * parser.y: Add inclusion of and . 2009-11-25 Kaz Kylheku More valgrind integration.
	Vector objects keep displaced pointers to vector data; they point to element 0, which is actually the third element of the vector's storage. If an object is only referenced by interior pointers, Valgrind reports it as possibly leaked. This change conditionally adds a pointer to the true start of the vector, if Valgrind support is enabled.

	* lib.h (struct vec): vec_true_start, new member.
	* lib.c (vector, vec_set_fill): Maintain vec_true_start.

2009-11-25 Kaz Kylheku

	First stab at Valgrind integration. First goal: eliminate false positives when gc is accessing uninitialized parts of the stack.

	* configure (valgrind): New variable. Defaults to false (do not build valgrind support). New check for whether the valgrind API is actually available if --valgrind is selected.
	(HAVE_VALGRIND): Conditionally added to config.h.
	* gc.c: Conditionally include valgrind memcheck.h header.
	(mark_mem_region): After pulling out a value from the stack, mark that copy as defined memory using VALGRIND_MAKE_MEM_DEFINED.
	(mark): Removed check for a registered root variable pointer being null; this cannot happen, unless someone registers a null pointer, or the stack is trashed. The comment about a possible null was misleading.

2009-11-24 Kaz Kylheku

	Fix uninitialized memory locations.

	* hash.c (make_hash): Uninitialized h->count member.
	* lib.c (mkustring): Initialize the null terminator of the preallocated string buffer, because the caller does not do so (e.g. see lit_char_helper in parser.y). The calling module is responsible for initializing all API-accessible parts of the string, but the null terminator belongs to the string implementation.

2009-11-24 Kaz Kylheku

	Switching to keyword symbols for :args and :nothrow.

	* lib.c (args_s, nothrow_s): Renamed to args_k and nothrow_k.
	(flattn_s): Renamed to flatten_s.
	(obj_init): args_k and nothrow_k interned in keyword package.
	* lib.h (args_s, nothrow_s, flattn_s): Declarations updated.
	* match.c (match_files): Follow name changes.
	* tests/004/query1.txr: Changed nothrow to :nothrow.
	* txr.1: Documentation updated.

2009-11-24 Kaz Kylheku

	/Now/ this can be released as 025.

	* utf8.c (utf8_from_uc): Fix bug introduced several commits ago (porting to C++). Caught by regression test suite. Found using git bisect.

2009-11-24 Kaz Kylheku

	Version 025

	External changes:

	Flattening an empty list produces an empty list, not (()), which is a list containing an empty list. Tightened up semantics of bind, merge and other forms. Fixed false positives in binding. More fixes for parser bugs that led to garbage error messages. (Still no regression test cases for error cases, oops.) Fixed crash in regexp function. Symbol packages added. Keyword symbols (symbols in keyword package) introduced. Clarified semantics that t, nil and keywords evaluate to themselves. Fixed bugs in the system for building in a separate directory. The configuration script now tests the compiler for sanity, and runs compiler-based tests to detect which integer type to use for casting an obj_t * value to a number, and what specifiers to use for inline functions.

	Internal changes:

	Macros replaced with inline functions. Uses of obj_t * replaced with val typedef everywhere. Exceptions occurring during early initialization no longer lead to an infinite recursion due to streams not working. The long type is no longer used; a configured typedef replaces it. The configure script now spits out a "config.h" header that is widely included. Symbol globals renamed to _s naming scheme. Code made portable to C++. A new configure flag --ccname makes it easier to switch compilers.

	* txr.c (version): Bumped to 025.
	* txr.1: Bumped version to 025.
	* configure: Bumped txr_ver to 025.

2009-11-24 Kaz Kylheku

	Auto-detect what specifiers to use for inline functions. Allow compiler command to be set independently of full path for easier compiler switching.

	* Makefile (conftest.o): Target removed.
	What this rule did is already covered by an implicit rule, and nowhere else in the Makefile are there rules for .c -> .o.
	(conftest2): New target, for two-translation-unit config test program.
	(INLINE_FLAGS): Removed.
	* configure (ccname, inline): New variables.
	(inline_flags): Variable removed. INLINE_FLAGS not generated any more in config.make. Added test for what inline specifiers to use, which is turned into #define INLINE ... in the config.h header.
	* lib.h (tag, is_ptr, is_num, is_chr, is_lit, type, auto_str, static_str, litptr): Changed from inline to INLINE.

2009-11-24 Kaz Kylheku

	Changes to make the code portable to C++ compilers, which can be taken advantage of for better diagnostics.

	* gc.c (more, mark_obj, sweep, unmark): Obey stricter C++ rules with regard to enumerations.
	(make_obj): Avoid using C++ keyword "try".
	* lib.c: Removed duplicate definitions of objects, found by C++.
	(chk_malloc, chk_realloc): Casts needed when converting from void *.
	(list): Discovered and fixed lack of va_end.
	(trim_str, acons_new_l): Avoid use of C++ keyword "new".
	(make_sym): Follow rename of struct member.
	* lib.h (struct sym): Renamed val member to value.
	(null): Added missing declaration.
	* match.c (enum fpip_close, struct fpip): Moved enum out of struct and gave it a name.
	* regex.c (L0_full): Cast added in signed/unsigned comparison.
	(L1_fill_range, L2_fill_range, L3_fill_range, char_set_create): Don't mark static blank structures const; const objects would need initializers in C++.
	(char_set_compl, char_set_destroy, char_set_contains, nfa_compile_set): Avoid using the C++ keyword "compl".
	* regex.h (struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set): Renamed compl member to comp.
	* utf8.c (utf8_from_uc, utf8_decode): Obey stricter C++ rules with regard to enumerations.

2009-11-24 Kaz Kylheku

	Fixed broken yyerrorf. It was still taking char *, and passing that as an object to vformat, resulting in # output.
	* parser.h (yybadtoken): Declaration updated.
	* parser.l (yybadtoken): Redefined to take val argument. The tok stays as int; this is closely coupled to yacc, so why bother with num().
	* parser.y (grammar): Fix occurrences of yybadtoken to pass proper literal objects using the lit macro, or nil in the one case when there is no context.

2009-11-24 Kaz Kylheku

	Renaming global variables that denote symbols, such that they have a _s suffix.

	* lib.c (cons_t, str_t, chr_t, num_t, sym_t, pkg_t, fun_t, vec_t, stream_t, hash_t, lcons_t, lstr_t, cobj_t, var, regex, set, cset, wild, oneplus, zeroplus, optional, compound, or, quasi, skip, trailer, block, next, freeform, fail, accept, all, some, none, maybe, cases, collect, until, coll, define, output, single, frst, lst, empty, repeat, rep, flattn, forget, local, mrge, bind, cat, args, try, catch, finally, nothrow, throw, defex, error, type_error, internal_err, numeric_err, range_err, query_error, file_error, process_error): Symbol globals renamed to cons_s, str_s, chr_s, num_s, sym_s, pkg_s, fun_s, vec_s, stream_s, hash_s, lcons_s, lstr_s, cobj_s, var_s, regex_s, set_s, cset_s, wild_s, oneplus_s, zeroplus_s, optional_s, compound_s, or_s, quasi_s, skip_s, trailer_s, block_s, next_s, freeform_s, fail_s, accept_s, all_s, some_s, none_s, maybe_s, cases_s, collect_s, until_s, coll_s, define_s, output_s, single_s, first_s, last_s, empty_s, repeat_s, rep_s, flattn_s, forget_s, local_s, merge_s, bind_s, cat_s, args_s, try_s, catch_s, finally_s, nothrow_s, throw_s, defex_s, error_s, type_error_s, internal_error_s, numeric_error_s, range_error_s, query_error_s, file_error_s, process_error_s.
	(code2type, typeof, make_package, intern, obj_init): Symbol references follow rename.
	* lib.h (cons_t, str_t, chr_t, num_t, sym_t, pkg_t, fun_t, vec_t, stream_t, hash_t, lcons_t, lstr_t, cobj_t, var, regex, set, cset, wild, oneplus, zeroplus, optional, compound, or, quasi, skip, trailer, block, next, freeform, fail, accept, all, some, none, maybe, cases, collect, until, coll, define, output, single, frst, lst, empty, repeat, rep, flattn, forget, local, mrge, bind, cat, args, try, catch, finally, nothrow, throw, defex, error, type_error, internal_err, numeric_err, range_err, query_error, file_error, process_error): Declarations updated.
	* hash.c (make_hash): Symbol references follow rename.
	* match.c (sem_error, file_err, dump_var, match_line, subst_vars, eval_form, complex_stream, extract_vars, do_output_line, do_output, match_files): Likewise.
	* parser.y (grammar, repeat_rep_helper, define_transform): Likewise.
	* regex.c (nfa_compile_set, nfa_compile_regex, regex_compile, regexp, regex_nfa): Likewise.
	* stream.c (stdio_maybe_read_error, stdio_maybe_write_error, stdio_close, pipe_close, make_stdio_stream, make_pipe_stream, make_string_input_stream, make_string_byte_input_stream, make_string_output_stream, get_string_from_stream, make_dir_stream, close_stream, get_line, get_char, get_byte, vformat, format, put_string, put_char): Likewise.
	* txr.c (txr_main): Likewise.
	* unwind.c (uw_throw, uw_errorf, type_mismatch, uw_register_subtype, uw_init): Likewise.
	* unwind.h (internal_error, numeric_assert, range_bug_unless): Likewise.

2009-11-23 Kaz Kylheku

	* configure (platform_flags, remove_flags): New config variables.
	* Makefile (CFLAGS): Take into account new flags.

2009-11-23 Kaz Kylheku

	Follow up on 64-bit compilation warnings.

	* lib.c (chr, chrp): Do not convert directly between wchar_t and the pointer type; go through a cnum intermediate value.
	* stream.c (vformat): Fix bad cast from pointer to int; this was missed in the conversion to cnum because it should have been a cast to long originally.
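The 2009-11-23 entry above routes the wchar_t-to-pointer conversion in chr through a cnum intermediate instead of converting directly. The following is a minimal sketch of what that looks like in the two-bit tagged representation described in the Version 021 notes. It assumes intptr_t as a stand-in for the configured cnum type. The tag values, the NUM_MAX/NUM_MIN definitions and the helper bodies are illustrative, not txr's actual lib.h code.

```c
#include <stdint.h>
#include <wchar.h>

/* Illustrative stand-ins: intptr_t plays the role of the configured cnum. */
typedef intptr_t cnum;
typedef struct obj { int type; } obj_t;  /* real objects are >= 4-byte aligned */
typedef obj_t *val;

/* Assumed tag assignments in the two low bits of a val. */
#define TAG_MASK ((cnum) 3)
#define TAG_PTR  0
#define TAG_NUM  1
#define TAG_CHR  2

/* Range of integers that still fit once two bits go to the tag. */
#define NUM_MAX (INTPTR_MAX / 4)
#define NUM_MIN (INTPTR_MIN / 4)

static int tag(val v)
{
  return (int) ((cnum) v & TAG_MASK);
}

/* Multiply rather than shift, so negative values stay well-defined. */
static val num(cnum n)
{
  return (val) (n * 4 + TAG_NUM);
}

static cnum c_num(val v)
{
  return ((cnum) v - TAG_NUM) / 4;   /* exact division: no rounding */
}

/* wchar_t is never converted straight to a pointer; it goes through cnum. */
static val chr(wchar_t c)
{
  return (val) ((cnum) c * 4 + TAG_CHR);
}

static wchar_t c_chr(val v)
{
  return (wchar_t) (((cnum) v - TAG_CHR) / 4);
}
```

The sketch relies on heap objects being at least 4-byte aligned, so a genuine pointer always carries the TAG_PTR bits. NUM_MIN and NUM_MAX are the extremes that num can encode without overflowing cnum.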
2009-11-23 Kaz Kylheku

	* Makefile (conftest.o): Revert change that took CFLAGS from this target.

2009-11-23 Kaz Kylheku

	* configure: Don't rely on higher precision arithmetic from the build machine's shell. POSIX only requires shell arithmetic to use signed long. So rather than computing the INT_PTR_MAX constant in the shell, we generate a constant C expression that computes it.

2009-11-23 Kaz Kylheku

	Reporting of compile errors during configuration for easier configure debugging.

	* Makefile (conftest): Pass all of the CFLAGS when building conftest. This way bad compiler options are caught right in the basic compiler sanity test.
	* configure: Compiler output is redirected to the temporary error file conftest.err, which is dumped if there is a failure. The closing message is improved: the user should not blindly trust the success of the configuration but check its sanity.

2009-11-23 Kaz Kylheku

	* configure: Bugfix in parsing configuration variables which contain the = character.
	* Makefile (conftest.o): Pass full CFLAGS to configuration test builds. If some flags don't work with the compiler, this should be caught.

2009-11-23 Kaz Kylheku

	* Makefile (CFLAGS): Added -I. so the current directory is first in the include search path. This is needed for finding generated header files when building in a separate directory.

2009-11-23 Kaz Kylheku

	* lib.c (chk_malloc, chk_realloc): Fix diagnosable conversion, caught by gcc 4.1.1.

2009-11-23 Kaz Kylheku

	* configure (cross): Print out value of $cross in --help.
	* depend.txr: Add "config.h" to list of headers that are not prefixed with $(top_srcdir).
	* dep.mk: Regenerated.

2009-11-23 Kaz Kylheku

	Improving portability. It is no longer assumed that pointers can be converted to type long and vice versa. The configure script tries to detect the appropriate type to use. Also, some run-time checking is performed in the streams module to detect which conversion specifiers to use for printing numbers.
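The configure entry above replaces shell arithmetic with a generated C expression for INT_PTR_MAX, and the portability entry that follows selects a pointer-sized integer type. The sketch below shows one way to write the largest value of a signed type so that no intermediate step overflows. It is not the expression configure actually emits. Here long stands in for whatever type configure selects, and the SIGNED_MAX and SIGNED_MIN names are hypothetical.

```c
#include <limits.h>

/* Largest value of a signed integer type with no padding bits, as a constant
   expression that never overflows: (2^(w-2) - 1) * 2 + 1 == 2^(w-1) - 1
   for a w-bit type. */
#define SIGNED_MAX(type) \
  ((type) ((((type) 1 << (sizeof (type) * CHAR_BIT - 2)) - 1) * 2 + 1))

/* Two's complement minimum, derived without overflow. */
#define SIGNED_MIN(type) (-SIGNED_MAX(type) - 1)

/* Hypothetical config.h output for a platform where configure picked long
   as the integer type wide enough to hold a pointer. */
#define INT_PTR_MAX SIGNED_MAX(long)
#define INT_PTR_MIN SIGNED_MIN(long)
```

Because the expression uses sizeof and casts, it works in ordinary C code but cannot appear in #if preprocessor directives. SIGNED_MIN assumes a two's complement representation.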
	* Makefile (conftest, conftest.o, conftest.syms): New targets. Used by the configure script.
	* configure (intptr, nm): New configuration variables. Generating config.make is no longer the last step; compiler tests are performed after config.make is set up, so that rules in the Makefile can be used for doing the compiling. (This is the cleanest way to do it, since the paths to the tools may contain Make variable expansion syntax.) New steps are added to try to detect whether the compiler has a wider integer type than the C89 long, and which of the available types (including, potentially, the extra wide type) is suitable for holding a pointer. Results are generated into a header config.h.
	* dep.mk: Regenerated.
	* lib.h (NUM_MAX, NUM_MIN): Now derived from INT_PTR_MAX and INT_PTR_MIN macros, which come from config.h.
	(cnum): New typedef name.
	(cobj ops, tag, auto_str, static_str, litptr, lit_noex): Changed long to cnum.
	(num, c_num): Declarations updated.
	* lib.c (equal, length, num, c_num, plus, minus, neg, search_str, cat_str, vector, vec_set_fill, obj_print, obj_pprint): Changed long to cnum.
	* gc.c (mark_obj): Changed long to cnum.
	* hash.c (struct hash, ll_hash, hash_mark, hash_grow, hash_process_weak): Changed long to cnum.
	* match.c (complex_open, do_output_line, do_output, match_files): Changed long to cnum.
	* parser.h (lineno): Declaration updated.
	* parser.l (lineno): Redefined as cnum.
	(grammar): Changed long to cnum.
	* parser.y (%union/yystype): num member changed to cnum. Inclusion of config.h added.
	* regex.c (nfa_run, nfa_machine_match_span, search_regex): Changed long to cnum.
	* regex.h (struct nfa_machine): Members last_accept_pos and count changed to cnum.
	(nfa_run, nfa_machine_match_span): Declarations updated.
	* stream.c (struct fmt): New type.
	(fmt_tab): New static array.
	(num_fmt): New static pointer.
	(detect_format_string): New function.
	(vformat): Changed long to cnum. Formatting of numbers uses num_fmt.
	(stream_init): Call detect_format_string.
	* txr.c, unwind.c, utf8.c: Include config.h.
	* unwind.h (internal_error): Local declaration of num updated.

2009-11-21 Kaz Kylheku

	Introducing symbol packages. Internal symbols are now in a system package instead of being hacked with the $ prefix. Keyword symbols are provided. In the matcher, evaluation is tightened up. Keywords, nil and t are not bindable, and errors are thrown if attempts are made to bind them. Destructuring in dest_bind is strict in the number of items. String streams are used to print bindings whose values are not strings or characters. Numerous bugfixes.

	* lib.h (enum type, type_t): New member: PKG.
	(struct sym): New member: package.
	(struct package): New type.
	(union obj, obj_t): New member pk.
	(interned_syms): Declaration removed.
	(keyword_package, pkg_t): Declared.
	(intern, acons_new_l): Declarations updated.
	(find_package, symbol_package, keywordp): Declared.
	* lib.c (interned_syms): Definition removed.
	(packages, pkg_t, system_package, keyword_package, user_package): New global variables.
	(code2type, equal, obj_pprint): Handle PKG case.
	(symbol_package, make_package, find_package, keywordp): New functions.
	(make_sym): Initialize package field of symbol.
	(intern): Takes package argument. Rewritten using packages, which use hash tables to store symbols.
	(acons_new_l): Takes extra pointer argument to return an extra value.
	(obj_init): Updated to handle packages. The order of some initializations had to change. The way nil is added as a symbol is quite different, and a special hack for the symbol t is used. Most symbols go into the user_package, but symbols that were previously namespaced with $ go to the system package.
	(obj_print): SYM case now considers the package of a symbol. Symbols in the user package are printed as before. Symbols with no package are printed using #: notation; keywords with : notation; and all others with their package prefix. PKG case is handled.
	* gc.c (finalize): Handle PKG case.
	(mark_obj): For SYM, mark the new package member. Handle PKG case.
	* hash.h (gethash_l): Declaration updated.
	* hash.c (ll_hash): Handle PKG case.
	(gethash_l): Extra argument added to distinguish new addition from existing find.
	* match.c (dump_var): Now dumps any object, by printing it to a string with a string stream.
	(bindable): New function.
	(dest_bind): Tightened up to distinguish bindable symbols from non-bindable ones. Symbols that stand for themselves, including nil, can only match themselves. Destructuring matches have to match in the number of elements: dot notation can be used to match superfluous elements.
	(eval_form): Tightened up to recognize bindable symbols.
	(match_files): Various directives honor non-bindable symbols (cat, merge, flatten).
	* parser.l (yybadtoken): Handle KEYWORD case.
	(grammar): TOK can start with a colon. Such tokens are returned as the KEYWORD terminal, with a lexeme that no longer has the colon.
	* parser.y (KEYWORD): New token.
	(grammar): Calls to intern given extra parameter. In the expr rule, KEYWORD turned into symbol in keyword package.
	* regex.c (regexp): Bugfix: dereferencing non-pointer.
	* stream.c (vformat): Bugfixes in state machine: handling of prefix digits; printing of numbers in ~s.
	* txr.c (txr_main): Intern calls updated.
	* txr.1: Updated with information about nil, t and keywords. More details about destructuring matching in bind.

2009-11-20 Kaz Kylheku

	* unwind.c (uw_throw): If streams are not yet initialized, the exception is unhandled because it occurred too early in initialization. Use a C stream to print an error message and abort. Using the nil stream variable would just cause a recursion bomb.

2009-11-20 Kaz Kylheku

	* lib.c (intern): Symbol interning to hash tables.
	(obj_init): interned_syms must be created as a hash table. Rearranged the order of some initializations so the vector code called by hash works.

2009-11-20 Kaz Kylheku

	* lib.c (dest_bind): Fix breakage from two commits ago; was falling through to unsuccessful return in the consp case.
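The 2009-11-21 and 2009-11-20 entries above move interning into hash tables and then into per-package tables. In that scheme, keywords live in their own package and keywordp tests a symbol's home package. The following is a minimal sketch of that structure. It assumes plain C strings for names, a fixed-size chained table with a djb2 hash, and hypothetical package names. txr itself stores symbols in its own hash table objects, so none of this is its actual code.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64   /* illustrative fixed table size */

struct package;

struct sym {
  char *name;
  struct package *package;   /* home package, like the new sym member */
  struct sym *next;          /* next symbol in the same hash bucket */
};

struct package {
  const char *name;
  struct sym *buckets[NBUCKETS];
};

static unsigned long hash_name(const char *s)
{
  unsigned long h = 5381;                 /* djb2 string hash */
  while (*s != '\0')
    h = h * 33 + (unsigned char) *s++;
  return h;
}

/* Return the symbol called name in pkg, creating it on first use.
   Returns NULL only if memory runs out. */
static struct sym *intern(const char *name, struct package *pkg)
{
  unsigned long b = hash_name(name) % NBUCKETS;
  struct sym *s;

  for (s = pkg->buckets[b]; s != NULL; s = s->next)
    if (strcmp(s->name, name) == 0)
      return s;                           /* already interned */

  s = malloc(sizeof *s);
  if (s == NULL)
    return NULL;
  s->name = malloc(strlen(name) + 1);
  if (s->name == NULL) {
    free(s);
    return NULL;
  }
  strcpy(s->name, name);
  s->package = pkg;
  s->next = pkg->buckets[b];              /* push onto bucket chain */
  pkg->buckets[b] = s;
  return s;
}

static struct package keyword_package = { "keyword", { NULL } };
static struct package user_package = { "user", { NULL } };

static int keywordp(const struct sym *s)
{
  return s != NULL && s->package == &keyword_package;
}
```

Interning the same name twice in one package yields the same pointer, so symbol identity reduces to pointer comparison. The same name interned in two packages yields two distinct symbols.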
2009-11-20 Kaz Kylheku

	* parser.y (grammar): Fix error actions that do not assign a value to $$.

2009-11-20 Kaz Kylheku

	* match.c (dest_bind): Extended to handle more general forms by using eval_form rather than direct symbol binding lookups. False positive return fixed.
	(match_line): Fixed merge to use eval_form rather than direct symbol binding.

2009-11-20 Kaz Kylheku

	* lib.c (flatten): Semantics change. The flatten function should not map nil -> (nil), but nil -> nil.

2009-11-20 Kaz Kylheku

	Changing ``obj_t *'' occurrences to a ``val'' typedef. (Ideally, we wouldn't have to declare object variables at all, so why use an obtuse syntax to do so?)

	* lib.h (val): New typedef name. Used throughout.
	* gc.c, gc.h, hash.c, hash.h, lib.c, match.c, match.h, parser.h, parser.l, parser.y, regex.c, regex.h, stream.c, stream.h, txr.c, unwind.c, unwind.h: Replace obj_t * with val almost everywhere. Low-level gc functions that work with arrays of obj_t still use obj_t *, since pointer arithmetic on a val doesn't make sense. In macros we use obj_t *, to reduce the chances of clashing with some local variable called val.

2009-11-19 Kaz Kylheku

	* txr.1: Fixed mangled formatting of exception handling example.

2009-11-19 Kaz Kylheku

	Get rid of macros in favor of safer inline functions. The recent auto_str("byte str") error could have been caught at compile time.

	* Makefile (CFLAGS): Include expansion of INLINE_FLAGS.
	* configure (inline_flags): New variable.
	(INLINE_FLAGS): New variable generated in config.make.
	* lib.h (tag, is_ptr, is_num, is_chr, is_lit, type, auto_str, static_str, litptr): Function-like macros converted to functions.

2009-11-19 Kaz Kylheku

	Version 024

	Fixed show-stopper breakage in parse error diagnostic function. Fixed bug introduced back in 015: collects that don't yield any variable bindings were wrongly treated as failed.

	* txr.c (version): Bumped to 024.
	* txr.1: Bumped version to 024.

2009-11-19 Kaz Kylheku

	Use unsigned char * as allocator return value.
	* lib.c (chk_malloc, chk_realloc): Return unsigned char *.
	* lib.h (chk_malloc, chk_realloc): Declarations updated.
	* utf8.c (utf8_dup_to_uc): Remove cast to unsigned char *.

2009-11-18 Kaz Kylheku

	Following up on diagnostics obtained by running the code through a C++ compiler. Idea: allocator functions return char * instead of void *, like malloc did in classic pre-ANSI C. That way we are forced to use a cast except when the target pointer is char * already.

	* lib.c (progname): Duplicate definition of global removed.
	(equal): Some default: cases added to switch statements.
	(chk_malloc): Returns char *.
	(chk_realloc): Returns char *, but takes void * on the way in. That way we get C++-like behavior.
	(chk_strdup): Oops, this returned void * instead of wchar_t *. C++ catches the boo-boo.
	(stringp): Added default: case to switch.
	(vec_set_fill): Cast return value of chk_realloc.
	* lib.h (chk_malloc, chk_realloc, chk_strdup): Declarations updated.
	* parser.h (lineno): extern qualifier added to prevent duplicate definitions.
	* regex.c (nfa_free, nfa_run, nfa_machine_init, regex_compile): Cast return value of chk_malloc.
	* stream.c (snarf_line, get_string_from_stream): Cast return value of chk_realloc.

2009-11-18 Kaz Kylheku

	* match.c (match_line, match_files): Fix broken behavior of collect that doesn't match anything. It is not a failed match, as the documentation makes perfectly clear. Collect/coll were introduced in txr-006 and had the proper non-failing semantics. However, in txr-015, during code restructuring, a bug crept in. When changing to a different debugging function, for some reason I added the nil returns.

2009-11-18 Kaz Kylheku

	* parser.l (yyerror): Total breakage: can't take auto_str of char * string.
	(yyerrorf): Total breakage: arguments of wrong types. Detected by vformat as garbage.

2009-11-18 Kaz Kylheku

	* txr.1: Clarified handling of UTF-8, now that it's precise and portable.

2009-11-18 Kaz Kylheku

	Version 023

	Minor bugfix. Code cleanup. Portability.
	Completely removed dependency on C99 wide character stream functions, and on character encoding support from glibc. All UTF-8 encoding and decoding is done by the program itself. Removed the use of all GNU extensions and C99 syntax. txr now requires a C90 compiler, and POSIX 1003.1 and 1003.2.

	* txr.c (version): Bumped to 023.
	* txr.1: Bumped version to 023.

2009-11-17 Kaz Kylheku

	More removal of C99 wide character I/O, and tightening up of standard conformance.

	* configure (lang_flags): Specify -D_POSIX_C_SOURCE=2 to obtain POSIX 1003.1 and POSIX 1003.2 functions from the headers, without GNU extensions. Specify -std=c89 to get C89 conformance from gcc.
	* match.c (dump_byte_string): New function.
	(dump_shell_string): Retargeted to object streams.
	(dump_var, dump_bindings): Retargeted to object streams. Changed back to using a byte string for the array index prefixes, to avoid using the wide-character swprintf.
	* parser.l (grammar): Eliminate wcsdup uses in favor of chk_strdup. Not only is wcsdup a GNU extension, it doesn't have the OOM check.
	* stream.c: Added a header to define WIFEXITED and others.
	* txr.c: Added include of . Removed ,
	(main): Removed setlocale call. Not needed any more, since wide stream and string I/O is no longer used from the C library.

2009-11-17 Kaz Kylheku

	Removing use of C99 wide character I/O.

	* stream.c (BROKEN_POPEN_GETWC): Macro removed. Workaround no longer needed since the program does not call getwc.
	(struct stdio_handle): #ifdef text removed. New member added: utf8 decoder.
	(stdio_maybe_read_error, stdio_maybe_write_error): Treat null handle as an exception rather than nil return. No need to check ferror in stdio_maybe_write_error, since there is no need to distinguish an end-of-file situation from error.
	(stdio_put_char_callback, stdio_get_char_callback): New functions.
	(stdio_put_string, stdio_put_char): Retargeted to utf8 encoder. Null handle treated as separate kind of error.
	(snarf_line, stdio_get_line, stdio_get_char): Retargeted to utf8 decoder.
	(pipe_close): #ifdef text removed.
	(make_stdio_stream): utf8 decoder initialized.
	(make_pipe_stream): utf8 decoder initialized. #ifdef text removed.

2009-11-17 Kaz Kylheku

	Warning fixes.

	* hash.c (hash_ops): Add missing initializer.
	* match.c (complex_open): Add missing initializer to ret variable.
	* regex.c (regex_obj_ops): Add missing initializer.
	* stream.c (stdio_ops, pipe_ops, string_in_ops, byte_in_ops, string_out_ops, dir_ops): Likewise.

2009-11-17 Kaz Kylheku

	* lib.c (chrp): Fix broken is_chr(num) call.

2009-11-17 Kaz Kylheku

	* regex.c (nfa_all_states, nfa_closure): visited parameter should be unsigned.

2009-11-17 Kaz Kylheku

	Fixes for compliance with C89.

	* lib.c (init): Do not define variable after statements.
	* match.c (match_files): Likewise.
	* regex.h (struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set): Do not use enum bit-fields, which are a GCC extension.
	* unwind.h (enum uw_frtype, uw_frtype_t): Combine into one declaration, eliminating forward enum reference, which is a GCC extension.
	(uw_block_begin): Add dummy typedef to macro so that it requires a following semicolon. Without this, if the macro use is followed by a semicolon, that semicolon looks like a null statement. A subsequent declaration thus follows a statement and is not conforming to C89. Also added an opening do.
	(uw_block_end): Add while(0) to match do in uw_block_begin.
	(uw_env_begin, uw_env_end): Add do/while(0) to macro pair, so uw_env_end requires a semicolon.
	(uw_catch_begin, uw_catch_end): Likewise.

2009-11-17 Kaz Kylheku

	Version 022

	Fix for bug 28033: crash in string output stream. (Used by exception handling.) New kernel object type introduced which allows C string literals to be used as first-class objects. Use of printf-like C formatting eliminated from the code base. The dependency on C99 wide character I/O is now minimized.
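The Version 022 notes above introduce literal objects. The 2009-11-16 and 2009-11-15 entries that follow describe them as the 11 tag bit pattern on a pointer to a static string, plus a lit macro that forwards to lit_noex so that its argument is macro-expanded first. The sketch below shows that two-level macro pattern. It assumes wide string literals are at least 4-byte aligned, which holds where wchar_t is 32 bits. The macro and helper bodies are illustrative, not copied from lib.h.

```c
#include <stdint.h>
#include <wchar.h>

typedef struct obj obj_t;   /* opaque; a literal never points at a real obj */
typedef obj_t *val;

#define TAG_MASK ((uintptr_t) 3)
#define TAG_LIT  ((uintptr_t) 3)   /* the "11" bit pattern */

/* lit_noex pastes L onto its argument as written, so a macro name passed to
   it is not expanded first. lit adds one level of indirection, so its
   argument is fully expanded before it reaches lit_noex. */
#define lit_noex(str) ((val) ((uintptr_t) (L ## str) | TAG_LIT))
#define lit(str) lit_noex(str)

static int is_lit(val v)
{
  return ((uintptr_t) v & TAG_MASK) == TAG_LIT;
}

/* Strip the tag bits to recover the static wide string. */
static const wchar_t *litptr(val v)
{
  return (const wchar_t *) ((uintptr_t) v & ~TAG_MASK);
}

#define GREETING "hello"   /* hypothetical macro used to show expansion */
```

Passing GREETING to lit_noex directly would paste L onto the identifier and produce LGREETING, which does not compile. The extra level in lit avoids that.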
	* txr.c (version): Bumped to 022.
	* txr.1: Bumped version to 022.

2009-11-16 Kaz Kylheku

	* Makefile (rebuild): New target. Tired of doing make clean; make.

2009-11-16 Kaz Kylheku

	Big round of changes to switch the code base to use the stream abstraction instead of directly using C standard I/O, to eliminate most uses of C formatted I/O, and to fix numerous bugs, such as variadic argument lists which lack a terminating ``nao'' sentinel.

	Bug 28033 is addressed by this patch, since streams no longer provide printf-compatible formatting. The native formatter is extended with some additional capabilities to take over. The work on literal objects is expanded and they are now used throughout the code base. Fixed bad realloc in string output stream: reallocating by number of wide chars rather than bytes.

	* gc.c (sweep): Debugging code switched from fprintf to format.
	* lib.c (type_check, type_check2, car, cdr, car_l, cdr_l, list, num, chrp, apply, cobj_print_op, dump): Retargeted, with help of new literals, to new functions that take string objects, rather than raw C strings.
	(obj_print, obj_pprint): Revamped with support for LIT type. Retargeted to not use C style I/O functions in streams.
	* lib.h (lit): Macro retargeted to another macro so that it expands its argument.
	(lit_noex): New macro, like lit, but does not macro-expand argument.
	(auto_str): New macro.
	(static_str): New macro.
	* match.c (debugf, debuglf, sem_error, file_err): Converted from C string to string object.
	(dest_bind, match_line, LOG_MISMATCH, LOG_MATCH, match_files): Retargeted to new interfaces that take string objects rather than raw C strings.
	(complex_stream): New function.
	(do_output_line, do_output, extract): Retargeted from C streams to object streams.
	* parser.h (yyerrorf): Declaration updated.
	* parser.l (yyerror): Call new yyerrorf interface, using auto_str macro to dress up C string as a temporary object.
	(yyerrorf): Changed from C strings to object strings.
	(yybadtoken, grammar): Retargeted to new yyerrorf.
	* stream.c (strm_ops): put_string and put_char function pointers changed to take object strings rather than C strings. vcformat and vformat virtuals removed. C formatting is not supported, and vformat is handled above the stream switch level in one place for all streams.
	(common_vformat, stdio_vcformat, string_out_vcformat, cformat, put_cstring, put_cchar): Functions removed.
	(stdio_stream_print, stdio_stream_destroy, stdio_maybe_write_error, stdio_put_string, stdio_put_char, stdio_close, pipe_close, string_out_put_char, make_pipe_stream, make_string_input_stream, make_string_output_stream, make_dir_stream, close_stream, get_line, put_line, get_char, put_char, put_string): Retargeted to new string object interfaces.
	(stdio_ops, pipe_ops): stdio_vcformat and common_vcformat initializers removed.
	(string_out_ops): string_out_vcformat and common_vcformat initializers removed.
	(string_in_ops, byte_in_ops, dir_ops): Two null initializers removed.
	(string_out_put_string): Converted to object string interface. Unnecessary chk_realloc call suppressed.
	(get_string_from_stream): Fixed bad call to realloc with incorrect size.
	(vformat_num, vformat_str): New functions, helpers for vformat.
	(vformat): Rewritten; it is now the formatting engine.
	(format, put_string, put_char): Interface converted from C string to object string.
	* stream.h (vformat, format): Declarations updated.
	(vcformat, cformat, put_cstring, put_cchar): Declarations removed.
	* txr.c (oom_realloc_handler, help, txr_main): Retargeted to object streams and strings.
	* unwind.c (uw_throw, type_mismatch, uw_register_subtype): Retargeted to new string object interfaces.
	(uw_throwf, uw_errorf): Interface changed from C string to object string.
	(uw_throwcf, uw_errorcf): Functions removed.
	* unwind.h (uw_throwf, uw_errorf, type_mismatch): Declarations updated.
	(uw_throwcf, uw_errorcf): Declarations removed.
	(internal_error): Macro interface changed and retargeted to object strings. Also, num hygiene problem worked around with local extern declaration.
	(numeric_assert, range_bug_unless): Retargeted to object strings.
	* utf8.c (utf8_to, utf8_dup_from_uc, utf8_dup_from, utf8_dup_to_uc): Casts of chk_malloc return value added.

2009-11-15 Kaz Kylheku

	Use the 11 tag bit pattern to denote a new type: LIT. This is a pointer to a C static string, intended for literals. We can now treat literal strings as lightweight objects.

	* lib.h (TAG_MASK): Ensure the constant expr has long type.
	(TAG_LIT): New macro.
	(enum type, type_t): New enum member, LIT.
	* gc.c (finalize, mark_obj): Handle LIT type.
	* hash.c (ll_hash): Likewise.
	* lib.c (code2type, equal, stringp, length_str, c_str, obj_print): Likewise.
	(obj_init): Intern symbols using literal strings.
	(type): Parentheses added to macro expansion.
	(is_lit, lit, litptr): New macros.

2009-11-15 Kaz Kylheku

	* lib.c (chr): Take wchar_t argument, not int. Dropped range check.
	(c_chr): Return wchar_t, not int.
	* lib.h (chr, c_chr): Declarations updated.

2009-11-15 Kaz Kylheku

	Version 021.

	Text is represented using wide characters now. Queries and data are parsed as UTF-8, so extended characters can be used directly. Numeric character escapes can go up to \x10FFFF. (More limited on platforms where wchar_t is 16 bits.) Regular expressions support extended characters, directly or through escapes. Regex character set matches can use the full Unicode range. New test case 005 exercises some of these features over Japanese text. A failed exit status from a pipe, and errors when closing a file, are now exceptions. Bug fixed in regex character classes. Fixed off-by-one error in lazy string implementation, which broke some uses of the @(freeform) directive. Fixed all instances of gc bug 28086: objects being prematurely reclaimed. This showed up when compiling for profiling (gcc -pg). The --cc argument of the configure script works properly now.
	Numbers and characters are unboxed types now, encoded directly in the (obj_t *) value. The lowest two bits of (obj_t *) are a tag distinguishing characters, integers and pointers. The program performs better from not having to cons memory when operating on numbers and characters.

	Discovered a bug in glibc: the getwc function segfaults when applied to a stream returned by popen. Worked around this bug. The bug is filed as 10958 in glibc bugzilla.

	Internals: Hash tables implemented. Hash tables support weak keys and values.

	* configure, hash.c, lib.c, stream.c, utf8.c: Removed trailing whitespace from some lines.
	* txr.c (version): Bumped to 021. Removed trailing whitespace.
	* txr.1: Bumped version to 021.

2009-11-14 Kaz Kylheku

	Provide both char * and unsigned char * interfaces in UTF-8 module. Fix unsigned and plain char * mixing.

	* utf8.c (utf8_from_uc, utf8_to_uc, utf8_dup_from_uc, utf8_dup_to_uc): New functions.
	(utf8_from): Fix type of backtrack pointer to unsigned char *.
	* utf8.h (utf8_from_uc, utf8_to_uc, utf8_dup_from_uc, utf8_dup_to_uc): Declared.
	* lib.c (string_utf8): Changed to take char * argument.
	* lib.h (string_utf8): Declaration updated.

2009-11-14 Kaz Kylheku

	* Makefile (depend): Marked phony and $(PROG) prerequisite dropped.
	(clean, distclean, tests, install): Marked phony.
	* dep.mk: Regenerated.

2009-11-14 Kaz Kylheku

	* configure (cc): Compute variable properly.

2009-11-14 Kaz Kylheku

	Fixes for bug 28086. When constructing a cobj whose associated C structure contains obj_t * references, we should initialize that C structure after allocating the cobj. If we initialize the structure first, it may end up having the /only/ references to the objects. In that case, the objects are invisible to the garbage collector. The subsequent allocation of the cobj itself may then invoke gc, which will turn these objects into dust. The result is a cobj which contains a handle structure that contains references to free objects.
	The fix is to allocate the handle structure, then the cobj which is associated with that handle, and then initialize the handle, at which point it is okay if the handle has the only references to some objects. Care must be taken not to let a cobj escape with a partially initialized handle structure, and not to trigger gc between allocating the cobj and initializing the fields.

	* hash.c (make_hash): Fix cobj construction order.
	* stream.c (make_stdio_stream): Fix cobj construction order.
	(make_pipe_stream): Fix cobj construction order. Also noticed and fixed a bug: the h->descr field was not being initialized in the currently enabled BROKEN_POPEN_GETWC variant of the code.

2009-11-13 Kaz Kylheku

	New test case which does some UTF-8 scanning, Unicode regexes, and @(freeform).

	* tests/005/data: New UTF-8 file.
	* tests/005/query-1.txr: Likewise.
	* tests/005/query-1.expected: Likewise.
	* Makefile (TXR_ARGS): New target-specific assignment to set data for test case set 005.

2009-11-13 Kaz Kylheku

	* lib.c (symbolp): Bugfix: function crashed on NUM argument.
	(lazy_str): Fix for gc correctness: the object from make_obj must be completely initialized before any gc-triggering operation is invoked; otherwise the garbage collector will be traversing an object whose fields contain old garbage.
	(lazy_str_force_upto): Off-by-one error. Forcing the object up to index position N means forcing it up to length N+1. This bug can make a lazy string look much shorter than it really is.

2009-11-13 Kaz Kylheku

	Allow -c scripts to not have a trailing newline. Test suite exercises -c now.

	* txr.c (txr_main): If the script specified with -c is not terminated by a newline, just add a newline. On the shell command line, it's a nuisance to have to add the extra line before closing the quote. It's also awkward in scripting, because the shell (or at least Bash 3.0) does not produce a final terminating newline in command substitution syntax like -c "$(cat file)".
	The last newline in the file is trimmed, and has to be added explicitly in the script itself, which is wrong when the file is empty.
	* Makefile (TXR_SCRIPT_ON_CMDLINE): New target-specific variable, arbitrarily set for test 002.
	(%.ok: %.txr): Rule updated to honor the TXR_SCRIPT_ON_CMDLINE variable, passing the script body to txr using -c rather than as a file argument.
	* txr.1: Document -c change.

2009-11-13 Kaz Kylheku

	The previous commit broke UTF-8 lexing by changing the get_char semantics on the input stream to wide character input. Also, reading a query from the command line (-c) must read bytes from a UTF-8 encoding of the string. We introduce a new get_byte function which can extract bytes from streams which provide it.

	* parser.l (YY_INPUT): Call get_byte instead of get_char.
	* stream.c (struct strm_ops): New function pointer, get_byte.
	(stdio_get_byte): New function.
	(stdio_ops, pipe_ops): Add new function.
	(string_in_ops, string_out_ops, dir_ops): Null pointer added.
	(struct byte_input): New struct type.
	(byte_in_get_byte): New function.
	(byte_in_ops): New structure.
	(make_string_byte_input_stream, get_byte): New functions.
	* stream.h (make_string_byte_input_stream, get_byte): Declared.
	* txr.c (txr_main): Make a byte input stream from the command line spec, rather than a string input stream.

2009-11-12 Kaz Kylheku

	Continuing wchar_t conversion. Making sure all stdio calls use wide character functions so that there is no illicit mixing. (But the goal is to replace this usage with txr streams.)

	* lib.c (list, cobj_print_op, obj_print, obj_pprint): Use wide string literals and I/O functions.
	* match.c (debuglcf): Converted to wchar_t.
	(dump_bindings, match_line, match_lines, extract): Use wide string literals and I/O functions.
	* parser.h (yyerrorf): Declaration updated.
	* parser.l (yyerror): Use wide-string yyerrorf. Users of yyerror continue to pass multibyte strings.
	(yyerrorf): Converted to wchar_t.
	(yybadtoken, grammar): Use wide string literals to call yyerrorf.
	* stream.c (struct strm_ops): vcformat changed to wchar_t.
	(stdio_vcformat, string_out_vcformat, vcformat, cformat): Likewise.
	* stream.h (vformat, vcformat, cformat): Declarations updated.
	* txr.c (oom_realloc_handler, help, hint, txr_main): Use wide string literals and I/O functions.
	* unwind.c (uw_throwcf, uw_errorcf): Converted to wchar_t.
	* unwind.h (uw_throwcf, uw_errorcf): Declarations updated.
	(internal_error, numeric_assert, range_bug_unless): Macros fixed to use wide string literals.

2009-11-12 Kaz Kylheku

	* utf8.c (utf8_from): Fix total breakage. It was writing out incomplete wide characters on internal state transitions while traversing a single multi-byte character. Also, improved handling of bad bytes close to EOF: if EOF occurs in a multi-byte character, it will backtrack and skip one bad byte, etc.
	(utf8_encode, utf8_decoder_init, utf8_decode): New functions.
	* utf8.h (enum utf8_state): New enum.
	(struct utf8_decoder, utf8_decoder_t): New struct.
	(utf8_encode, utf8_decoder_init, utf8_decode): Declared.

2009-11-12 Kaz Kylheku

	Documenting extended characters in the man page. Cleaned up some more issues related to extended characters.

	* parser.l (grammar): Added error actions for invalid UTF-8 bytes.
	* stream.c (BROKEN_POPEN_GETWC): New macro. Enables a workaround for a glibc bug whereby getwc blows up when applied to a FILE * stream returned from a popen call.
	(struct strm_ops): put_char function takes wchar_t.
	(common_format): Use wchar_t rather than int.
	(stdio_put_string): fputws returns -1, not EOF.
	(stdio_put_char, put_cchar): Character argument changed to wchar_t. Output done with putwc instead of putc.
	(snarf_line, stdio_get_char): Use getwc to read from the stream.
	(pipe_close, make_pipe_stream): Implement workaround for glibc bug.
	* stream.h (put_cchar): Declaration updated.
	* txr.1: Added notes about international characters.
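The utf8_from fix described in the 2009-11-12 entry above (on a malformed or truncated multi-byte sequence, backtrack and skip one bad byte) can be illustrated with a small buffer-based C sketch. It is not txr's utf8_decoder/utf8_decode interface; the function name utf8_decode_buf and the uint32_t output type are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not txr's utf8_decode: decodes src[0..n) into dst
   (which must have room for n code points) and returns the number of code
   points written. A malformed, overlong or truncated sequence causes only
   its first byte to be skipped; decoding resumes at the following byte. */
size_t utf8_decode_buf(const unsigned char *src, size_t n, uint32_t *dst)
{
    size_t i = 0, out = 0;

    while (i < n) {
        unsigned char b = src[i];
        uint32_t cp, min;
        size_t j;
        int need, k;

        if (b < 0x80) {                   /* plain ASCII byte */
            dst[out++] = b;
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {  /* lead byte of a 2-byte sequence */
            cp = b & 0x1F; need = 1; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {  /* lead byte of a 3-byte sequence */
            cp = b & 0x0F; need = 2; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {  /* lead byte of a 4-byte sequence */
            cp = b & 0x07; need = 3; min = 0x10000;
        } else {                          /* stray continuation or invalid lead */
            i++;
            continue;
        }

        /* Gather continuation bytes; stop early at end of input or at a
           byte that is not of the form 10xxxxxx. */
        for (k = 0, j = i + 1; k < need && j < n && (src[j] & 0xC0) == 0x80; k++, j++)
            cp = (cp << 6) | (src[j] & 0x3F);

        if (k == need && cp >= min && cp <= 0x10FFFF &&
            (cp < 0xD800 || cp > 0xDFFF)) {
            dst[out++] = cp;
            i = j;                        /* consume the whole sequence */
        } else {
            i++;                          /* backtrack: skip only the lead byte */
        }
    }
    return out;
}
```

Keeping cp, need and k in a struct between calls would turn this into an incremental decoder fed one byte at a time; the backtracking rule then requires buffering the bytes of the pending sequence, which is the kind of state a decoder object has to carry.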
2009-11-12 Kaz Kylheku

	Regular expression module updated to do Unicode character sets. Most of the changes are in the area of representing sets.

	Also, a bug was found in the compilation of regex character sets: ranges straddling two adjacent blocks of 32 characters were not being added to the character set. However, ranges falling within a single 32-character block, or spanning three or more such blocks, worked properly. This bug is not tickled by common ranges such as A-Z or 0-9, which land within a single 32-character block.

	* regex.h (BITCELL_LIT): Macro removed.
	(CHAR_SET_SIZE): Macro does not depend on UCHAR_MAX anymore, but hard-codes a set size of 256. UCHAR_MAX means nothing to us anymore since we are using wchar_t. The number 256 is simply an arbitrarily chosen size for representing the small character sets (or the leaves of the radix tree for representing large sets).
	(chset_type_t): New enum typedef.
	(cset_L0_t, cset_L1_t, cset_L2_t, cset_L3_t): New array typedefs.
	(struct char_set): Replaced by union char_set.
	(struct any_char_set, struct small_char_set, struct displaced_char_set, struct large_char_set, struct xlarge_char_set): New struct types.
	(char_set_clear): Declaration removed.
	(char_set_create, char_set_destroy): Declared.
	(char_set_add, char_set_add_range, char_set_contains, nfa_state_single, nfa_state_set, nfa_machine_feed): Declarations updated for wchar_t.
	(struct nfa_state_single): Member ch changed to wchar_t.
	* regex.c (char_set_clear): Function removed.
	(CHAR_SET_L0, CHAR_SET_L1, CHAR_SET_L2, CHAR_SET_L3, CHAR_SET_L2_L0, CHAR_SET_L2_HI, CHAR_SET_L1_L0, CHAR_SET_L1_HI, CHAR_SET_L0_L0, CHAR_SET_L0_HI): New macros.
	(L0_full, L0_fill_range, L0_contains, L1_full, L1_fill_range, L1_contains, L1_free, L2_full, L2_fill_range, L2_contains, L2_free, L3_fill_range, L3_contains, char_set_create, char_set_destroy): New functions.
	(char_set_compl): Works using a flag rather than by actually computing a complemented set.
	Also, it is no longer a toggle (and was never used that way).
	(char_set_add, char_set_add_range, char_set_contains): Polymorphic over the different set types.
	(nfa_state_single, nfa_move, nfa_run, nfa_machine_feed): Converted to wchar_t.
	(nfa_state_free): Use char_set_destroy to free set.
	(nfa_state_set): Does not construct the set internally but takes it as a parameter.
	(nfa_compile_set): Rewritten to perform two passes over the s-expression representing the list of characters and ranges making up the set. The first pass determines what representation will be used for the set. The second pass stuffs the characters and ranges into the set.

2009-11-11 Kaz Kylheku

	* txr.c (main): Call setlocale to set LC_CTYPE to en_US.UTF-8, so that the C library streams do the encoding. Once the program is weaned from C library wide character stream I/O, this can go away.

2009-11-11 Kaz Kylheku

	Big conversion to wide characters and UTF-8 support. This is incomplete. There are too many dependencies on wide character support from the C stream I/O library. The regex code does not handle wide characters properly. The character type is still int in some places, rather than wchar_t. The test suite passes, though.

	* hash.c (hash_str): Converted to wchar_t.
	* lib.c (progname, type_check, type_check2, type_check3, car, cdr, car_l, cdr_l, equal, chk_strdup, string_own, string, mkstring, mkustring, init_str, length_str, c_str, search_str, sub_str, cat_str, split_str, trim_str, chrp, apply, lazy_str, lazy_str_get_trailing_list, cobj, obj_init, obj_print, obj_pprint, init): Converted to wchar_t.
	(vector): Cast of chk_malloc return value added.
	(string_utf8): New function.
	* lib.h (struct string): Member str changed to wchar_t *.
	(progname, chk_strdup, string_own, string, init_str, c_str, init): Declarations updated.
	(string_utf8): Declared.
	* match.c (debugf, debuglf, sem_error, file_err, dump_shell_string, dump_var, dump_bindings, dest_bind, match_line, do_output_line, do_output, match_files): Converted to wchar_t.
	* parser.h (spec_file): Declaration updated.
	* parser.l (yyerrorf, char_esc, num_esc): Converted to wchar_t.
	(ASC, ASCN, U, U2, U3, U4, UANY, UNANN, UONLY): New named regexes, used for lexing UTF-8.
	(grammar): Converted to wchar_t and UTF-8 handling.
	* parser.y (%union/yystype): lexeme member changed to wchar_t *, chr member changed to wchar_t.
	* regex.c (nfa_run): Input string is wchar_t *.
	(search_regex): String from haystack is wchar_t *.
	* regex.h (nfa_run): Declaration updated.
	* stream.c (struct strm_ops, common_vformat, stdio_stream_print, stdio_maybe_read_error, stdio_maybe_write_error, stdio_put_string, stdio_put_char, snarf_line, stdio_get_line, stdio_close, pipe_close, struct string_output, string_out_put_string, string_out_put_char, string_out_vcformat, dir_get_line, make_string_output_stream, get_string_from_stream, make_dir_stream, get_line, get_char, vformat, vcformat, format, cformat, put_string, put_cstring, put_char, put_cchar, stream_init): Converted to wchar_t.
	* stream.h (vformat, format, put_cstring): Declarations updated.
	* txr.c (version, progname, spec_file, oom_realloc_handler, help, hint, remove_hash_bang_line, main, txr_main): Converted to wchar_t.
	* txr.h (version, progname): Declarations updated.
	* unwind.c (uw_throw, uw_throwf, uw_errorf, type_mismatch, uw_register_subtype): Converted to wchar_t.
	* unwind.h (uw_throwf, uw_errorf, type_mismatch): Declarations updated.
	* utf8.c, utf8.h: New files.

2009-11-10 Kaz Kylheku

	* hash.c (hash_grow): Rewritten to avoid resizing the vector in place, and thus having to pull all conses into a big list. TODO: avoid recomputing the hash function over the keys. We could enhance cons cells with two more fields without using additional storage.
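The hash_grow rewrite above (allocate a new bucket vector and relink the existing chains into it, rather than resizing in place) can be sketched in C together with the growth rule from the 2009-11-06 "Add hash table growth" entry (double the size when the load factor reaches two). This is a minimal sketch under stated assumptions: string keys, a plain struct node chain instead of txr's cons cells, and the hypothetical names table_init, table_grow and table_get_l (the last loosely modeled on gethash_l). As the TODO notes, the hash is recomputed for every key during regrowth.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical chained hash table; not txr's hash.c data structures. */
struct node { char *key; int val; struct node *next; };
struct table { struct node **buckets; size_t nbuckets, count; };

static size_t hash_str(const char *s)
{
    size_t h = 5381;                     /* djb2-style string hash */
    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h;
}

int table_init(struct table *t, size_t nbuckets)
{
    t->buckets = calloc(nbuckets, sizeof *t->buckets);
    t->nbuckets = nbuckets;
    t->count = 0;
    return t->buckets != NULL;
}

/* Allocate a bucket vector twice as large and relink every existing node
   into it; no node is copied or freed. The hash is recomputed per key. */
static void table_grow(struct table *t)
{
    size_t newn = t->nbuckets * 2, i;
    struct node **nb = calloc(newn, sizeof *nb);

    if (nb == NULL)
        return;                          /* keep the old, fuller table */
    for (i = 0; i < t->nbuckets; i++) {
        struct node *n = t->buckets[i];
        while (n != NULL) {
            struct node *next = n->next;
            size_t b = hash_str(n->key) % newn;
            n->next = nb[b];
            nb[b] = n;
            n = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = newn;
}

/* Return a pointer to the value stored under key, inserting the key with
   value 0 if it is absent. Returns NULL only if allocation fails. */
int *table_get_l(struct table *t, const char *key)
{
    size_t b = hash_str(key) % t->nbuckets, len;
    struct node *n;

    for (n = t->buckets[b]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return &n->val;

    len = strlen(key) + 1;
    if ((n = malloc(sizeof *n)) == NULL || (n->key = malloc(len)) == NULL) {
        free(n);
        return NULL;
    }
    memcpy(n->key, key, len);
    n->val = 0;
    n->next = t->buckets[b];
    t->buckets[b] = n;

    if (++t->count >= 2 * t->nbuckets)   /* load factor has reached two */
        table_grow(t);
    return &n->val;                      /* nodes are relinked, never moved */
}
```

Because regrowth only relinks nodes, a pointer returned by table_get_l stays valid across a resize, which is what lets the function grow the table after inserting and still return the new entry.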
2009-11-06 Kaz Kylheku

	Changing representation of objects to allow for unboxed characters. Now numbers and characters fit into a cell. We lose one more bit from the range of numbers.

	* lib.h (TAG_SHIFT, TAG_MASK, TAG_NUM, TAG_PTR, NUM_MASK, NUM_MIN, is_ptr, is_num): Macros updated.
	(is_chr, tag): New macros.
	(struct chr): Removed.
	(union obj): Updated.
	* lib.c (typeof, equal, chr, chrp, c_chr, obj_print): Updated.
	* hash.c (ll_hash): Characters aren't pointers any longer; use abstract accessor.

2009-11-06 Kaz Kylheku

	Add hash removal.

	* hash.c (remhash): New function.
	* hash.h (remhash): Declared.

2009-11-06 Kaz Kylheku

	Add hash table growth.

	* hash.c (hash_grow): New function.
	(l_gethash): Renamed to gethash_l. Increment count; if the load factor gets to two, call hash_grow to double the size.
	* hash.h (l_gethash): Declaration changed to gethash_l.

2009-11-06 Kaz Kylheku

	Changing representation of objects to allow the NUM type to be unboxed. If the lowest bit of the obj_t * pointer is 1, then the remaining bits are a number. A lot of assumptions are made:
	- the long type can be converted to and from a pointer;
	- two's complement representation;
	- the behavior of the << and >> operators when the sign bit is involved.

	* lib.h (TAG_SHIFT, TAG_MASK, TAG_NUM, TAG_PTR, NUM_MASK, NUM_MIN, is_ptr, is_num, type): New macros.
	(struct num): Removed.
	(nao): Redefined, so that it doesn't have the numeric tag.
	* lib.c (typeof, type_check2, type_check3, car, car_l, cdr, cdr_l, equal, consp, atom, listp, num, c_num, nump, plus, minus, stringp, lazy_stringp, obj_print, obj_pprint): Fixed these functions to use the new number representation, and not to dereference the obj_t * pointer if it is actually a number.
	(obj_init): Adjusted values of maxint and minint.
	* gc.c (mark_obj, gc_is_reachable): Avoid dereferencing numbers.
	* hash.c (ll_hash): Likewise.
	* match.c (match_line, do_output_line): Likewise.

2009-11-06 Kaz Kylheku

	First cut at hash tables.
	One known problem is allocation during gc, due to use of boxed numbers for vector access.

	* gc.c (gc): Disable gc when doing garbage collection, in case something tries to allocate memory during gc, triggering a recursive gc, which would be very bad. Also, call the new function, hash_process_weak, in between the mark and sweep phases.
	(gc_is_reachable): New function.
	* gc.h (gc_is_reachable): Declared.
	* lib.c (hash_t): New symbol global.
	(acons_new_l): New function.
	(obj_init): New symbol interned.
	* lib.h (hash_t, acons_new_l): Declared.
	* hash.c, hash.h: New files.
	* Makefile: New target, hash.o.
	* dep.mk: Regenerated.

2009-11-06 Kaz Kylheku

	Throw an exception on a stream error during close or I/O operations. This is needed for pipes that terminate abnormally or return a failed termination status. Pipe and stdio streams have an extra description field so they are printed in a readable way.

	* lib.c (process_error): New global defined.
	(obj_init): New symbol interned.
	(lazy_stream_func): Pass t to close_stream, so an exception is thrown if the close fails.
	(lazy_stream_cons): Ditto.
	* lib.h (process_error): Declared.
	* match.c (complex_snarf): Pass new descr argument to make_stdio_stream and make_pipe_stream.
	* stream.c (strm_ops): New argument on close function pointer.
	(common_destroy): Close without throwing an exception. For objects being finalized, we don't care whether the close works; the program has shown that it doesn't care about the stream by letting it become unreachable, so we don't bother the program by throwing an exception.
	(stdio_handle): New struct.
	(stdio_stream_print, stdio_stream_destroy, stdio_stream_mark, stdio_maybe_read_error, stdio_maybe_write_error): New functions.
	(stdio_put_string, stdio_put_char, stdio_get_line, stdio_get_char, stdio_vcformat, stdio_close): Updated to new handle format, and throw errors now.
	(stdio_ops, pipe_ops): Redirected to new functions stdio_stream_print, stdio_stream_destroy and stdio_stream_mark.
	(pipe_close): Updated to new handle format. Parses status from pclose and throws exceptions appropriate to the situation.
	(dir_close): Takes extra argument.
	(make_stdio_stream, make_pipe_stream): New argument added.
	(make_string_output_stream): Some casts added.
	(close_stream): Pass new argument down to virtual function.
	(stream_init): Pass new argument to make_stdio_stream when creating streams for stdin, stdout and stderr.
	* stream.h (make_stdio_stream, make_pipe_stream, close_stream): Declarations updated.
	* txr.c (txr_main): Pass new argument to make_stdio_stream.
	* unwind.c (uw_init): Register process_error.

2009-11-01 Kaz Kylheku

	Version 020

	Improved documentation.
	Building via configure script.
	Cross-compiling support.
	Support for building in a separate build directory.
	Internal bugfixes.
	Portability bugs fixed; works on x86-64 GNU/Linux.

2009-11-01 Kaz Kylheku

	Bug ID 27898: Directory order dependencies in test case. Converted some directories to text files.

	* tests/002/proc/*/task: Directories removed.
	* tests/002/proc/*/tasks: Files created.
	* tests/002/query-1.txr: Query updated.
	* tests/002/query-1.expected: Expected output updated.

2009-11-01 Kaz Kylheku

	Bug ID 27895: Calls to protect have an argument list terminated by the integer constant 0 rather than a proper null pointer constant.

	* lib.c (obj_init): Properly terminate argument list of protect call.
	* stream.c (stream_init): Likewise.
	* unwind.c (unwind_init): Likewise.
	* txr.c (txr_main): Two-argument protect calls rewritten using prot1.

2009-11-01 Kaz Kylheku

	Bug ID 27899: Garbage collection problem: the method of locating the stack bottom is unreliable due to the unpredictable allocation order of local variables. The addresses of the stack_bottom_0 and stack_bottom_1 variables do not necessarily bracket the others, which means that some local variables in main can be out of the reach of the garbage collector: our stack bottom is wrongly in the middle of the frame.
	* lib.c (init): Removed one of the stack bottom parameters, so there is only one. This is passed straight down to gc_init. Also noticed that the oom_realloc variable was not being set from the oom parameter.
	* lib.h (init): Declaration updated.
	* txr.c (txr_main): New static function.
	(main): Calls init, and then txr_main. The idea is that txr_main should get a fresh stack frame, so that the stack_bottom variable in main is outside of that stack frame.

2009-10-22 Kaz Kylheku

	* lib.c (equal): Fix broken LSTR and FUN cases.

2009-10-22 Kaz Kylheku

	Got "make tests" working in a separate build directory, with .out files going to the local tests/ tree.

	* Makefile (depend): Refer to depend.txr and dep.mk using $top_srcdir; no need for symlinks. Changed a few more ./txr references to use $(PROG).
	(TESTS): Path munging to generate targets with local paths.
	(%.ok): Fixed diff logic to compare the .expected file in $(top_srcdir) with the local .out file.
	* configure: Don't generate symlinks for tests and dep.mk.

2009-10-22 Kaz Kylheku

	Got "make install" working.

	* Makefile (install): New target.
	* configure (mandir, bindir): New variables.

2009-10-22 Kaz Kylheku

	Got the build to work in a separate build directory.

	* Makefile (CFLAGS): Added -I flag to point header inclusion to the source directory.
	(PROG): New variable to hold program name.
	(VPATH): Variable set, as a quick and dirty way to get GNU make to find the prerequisites back in the source directory.
	* configure: Added steps to symlink the tests directory and dep.mk.
	* depend.txr: Modified to generate the dependencies with correct references to the top_srcdir, with the exception of locally generated headers.
	* dep.mk: Regenerated.

2009-10-22 Kaz Kylheku

	Build configuration via configure script, with cross-compiling support. (Tested by cross-compiling txr on an x86 GNU/Linux system to run on a MIPS-based GNU/Linux system.)

	* configure: New script.
	* Makefile (OPT_FLAGS, LANG_FLAGS, DIAG_FLAGS, DBG_FLAGS, LEX_DBG_FLAGS, TXR_DBG_OPTS, LEXLIB): Variables removed; these are now generated in config.make by configure.
	(config.make): New target to print a friendlier diagnostic if the build is not configured.
	(distclean): New target to do clean, plus remove config.make.

2009-10-22 Kaz Kylheku

	* parser.l (YY_INPUT): Replace tabs with spaces (how did they sneak in?). Fix possible use of uninitialized ch.

2009-10-21 Kaz Kylheku

	* txr.1: Fixed misleading wording (separation versus termination). Added Introduction headings to some major sections. Improved exception handling intro.

2009-10-21 Kaz Kylheku

	Version 019

	Regexps can be bound to variables.
	New freeform directive.

	* txr.c (version): Bump.
	* txr.1: Bump version and date.
	* lib.c, match.c, regex.c, regex.h, stream.c: Trailing whitespace removed from lines.

2009-10-21 Kaz Kylheku

	* txr.1: Documented freeform.

2009-10-21 Kaz Kylheku

	Change the freeform line catenation semantics to termination rather than separation.

	* lib.h (lazy_str): Declaration updated.
	* lib.c (lazy_str): Tack terminator onto initial prefix string. Parameter renamed. Also, terminator string cached in the object.
	(lazy_str_force, lazy_str_force_upto): Terminate, rather than separate.
	* match.c (match_files): sep variable renamed to term.

2009-10-21 Kaz Kylheku

	* gc.c (mark_obj): Bugfix: recurse over recently added member, opts, in the lazy_string structure.

2009-10-20 Kaz Kylheku

	Got regex working over lazy strings from freeform. Bugfixes.

	* lib.c (length_str): Fixed recursion to wrong length function.
	(lazy_str_force): March down list properly. Update lazy string's limit value.
	* match.c (match_line): Convert to lazy-string-aware style, i.e., avoid triggering a force of the whole string.
	(match_files): Bugfix in argument processing of freeform directive.
	* regex.h (nfam_result_t): New typedef.
	(nfa_machine_reset): New function declaration.
	(nfa_machine_feed): Updated declaration.
	* regex.c (nfa_machine_init): Refactor to use nfa_machine_reset.
	(nfa_machine_feed): Return nfam_result_t rather than just int.
	(search_regex, match_regex): Refactor to work well with lazy strings.

2009-10-20 Kaz Kylheku

	Implement custom separator and limit in freeform.

	* lib.h (lazy_string): New struct member, opts.
	(lazy_str): Declaration updated.
	* lib.c (lazy_str): New constructor parameters to set the separator string and numeric line limit.
	(lazy_str_force, lazy_str_upto): Honor the line limit, and use the separator string if provided.
	* match.c (match_files): Process the arguments for the freeform directive.

2009-10-20 Kaz Kylheku

	* lib.c (sub_str): Avoid invoking c_str, which forces the lazy string.

2009-10-20 Kaz Kylheku

	Start of implementation for freestyle matching. Lazy strings implemented, incompletely. Changed the string function to strdup implicitly; the non-strdup version was renamed to string_own. Fixed wrong uses of strdup rather than chk_strdup. Functions added to regex module to provide regex matching as a state machine to which characters are fed.

	* lib.h (type_t): New enum member LSTR, for lazy strings.
	(lstr_t, freestyle, type_check3, string_own): Declared.
	(string): Parameter changed to const char *.
	(lazy_stringp, split_str, lazy_str, lazy_str_force_upto, lazy_str_force, lazy_str_get_trailing_list, length_str_gt, length_str_ge, length_str_lt, length_str_le): Declared.
	* lib.c (lstr_t, freestyle): New symbol globals.
	(code2type, obj_print, obj_pprint, equal): Extended to handle LSTR.
	(type_check3): New function.
	(string_own): New function; does what string used to do.
	(string): Duplicates the string with strdup, so callers don't have to.
	(mkstring, copy_str, trim_str): Use string_own.
	(stringp): A lazy string is a kind of string.
	(lazy_stringp): New function.
	(length_str, c_str, search_str, sub_str, chr_str, chr_str_set): Handle lazy strings.
	(split_str): New function.
	(lazy_str, lazy_str_force_upto, lazy_str_force, lazy_str_get_trailing_list, length_str_gt, length_str_ge, length_str_lt, length_str_le): New functions.
	(obj_init): New symbols interned. Eliminated strdup calls.
	* gc.c (finalize, mark_obj): Changed to handle LSTR type. Eliminated default case from switch so we get a gcc diagnostic if a case is not handled.
	* match.c (match_files): Eliminated strdup calls. Added freeform directive.
	* parser.y (grammar): Changed string calls to string_own.
	* stream.c (stdio_get_line, get_string_from_stream): Changed string calls to string_own.
	(dir_get_line): Eliminated chk_strdup call.
	* txr.c (remove_hash_bang_line, main): Eliminated strdup calls.
	* regex.h (nfam_result): New union.
	(nfa_machine, nfa_machine_t): New struct and typedef.
	(nfa_machine_init, nfa_machine_cleanup, nfa_machine_feed, nfa_machine_match_span): New functions declared.
	* regex.c (nfa_machine_init, nfa_machine_cleanup, nfa_machine_feed, nfa_machine_match_span): New functions defined.

2009-10-18 Kaz Kylheku

	Trivial change allows regexps to be bound to variables and used for matching. This Just Works because of the way match_line treats variables.

	* match.c (eval_form): Check for a regexp form and return it as a value representing itself.
	* regex.c (regexp): New function.
	* regex.h (regexp): Declared.

2009-10-17 Kaz Kylheku

	* deps.mk: Updated.

2009-10-17 Kaz Kylheku

	Version 018

	Bugfixes: mistakes in debugging calls; an infinite loop in collect; the skip directive not advancing the match by the proper number of lines.

	* match.c (debuglcf): Cosmetic fix.
	(match_files): After recognizing nothrow in the file spec, replace it by a string. A few places expect first(files) to be a string. The skip directive must return whatever return value it obtained from the nested match_files call, and not substitute the current line number, so that the caller can proceed past the correct number of lines that were matched.
	Fixed an obj_t * being passed to a %s printf specifier in a debug printf. The collect directive must make progress even if the nested spec makes no progress (returns successfully, but with the original line number).
	* txr.c (version): Bump.
	* txr.1: Bump version and date.
	* txr/tests/004/query-1.txr: New test case.
	* tests/004/query-1.expected: Expected result for new test case.

2009-10-17 Kaz Kylheku

	Version 017

	Bugfix in exception subtype definition (defex).
	Tail recursion in the marking function of the garbage collector.
	-f option for specifying a query file, allowing more options to follow; useful in hash-bang scripting and other situations.

	* txr.c (version): Bump to 016.
	* txr.1: Bump version to 016.

2009-10-17 Kaz Kylheku

	* txr.1: Documented defex.
	* unwind.c (uw_register_subtype): Bugfix: if the subtype exists already, we must not delete it and create a new entry, but destructively point its entry to its assigned supertype. An exception is thrown, rather than an abort, for attempts to make t a subtype of something other than itself. An attempt to make something other than nil a subtype of nil is diagnosed. Attempts to redefine the relationship between two types if they are already connected by one; this covers circularity and other cases, while still allowing a relaxed order of definition.

2009-10-17 Kaz Kylheku

	* gc.c (mark_obj_tail): New macro.
	(mark_obj): Optimized with manual tail recursion. The function will no longer generate long call stacks for long lists. Descending to the car field of a cons is still recursive, but ``car-heavy'' trees are rare.

2009-10-16 Kaz Kylheku

	Resurrect -f option, with a different meaning. We need "-f query-file" so that hash-bang scripts can be written which can pass options to txr.

	* txr.c (help, main): Implement and document -f. Also bugfix: do not throw file open errors as exceptions of type error, because these cause an abort, potentially leading to a core dump. They are now thrown as file_error.
	* txr.1: Documented -f.
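The 2009-10-17 mark_obj change above replaces the recursive call on a cons cell's cdr with a jump back to the top of the function, so marking a long list uses constant C stack depth while the car is still marked recursively. A minimal sketch of that technique follows, under the assumption of a hypothetical two-kind object (atom or cons) rather than txr's obj_t; the mark_obj_tail macro here only mimics the role the entry gives it.

```c
#include <stddef.h>

/* Hypothetical object: either an atom or a cons cell. */
struct obj {
    int marked;
    int is_cons;
    struct obj *car, *cdr;
};

/* Re-enter mark_obj on a new object without growing the C stack. */
#define mark_obj_tail(o) do { obj = (o); goto tail_call; } while (0)

void mark_obj(struct obj *obj)
{
tail_call:
    if (obj == NULL || obj->marked)
        return;                    /* nothing to do, or already visited (handles cycles) */
    obj->marked = 1;

    if (obj->is_cons) {
        mark_obj(obj->car);        /* car: still an ordinary recursive call */
        mark_obj_tail(obj->cdr);   /* cdr: manual tail call, constant stack depth */
    }
}
```

A left-nested, car-heavy structure would still recurse once per level, which is the trade-off the entry accepts on the grounds that such trees are rare.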
2009-10-16 Kaz Kylheku

	Implemented @(next arg) for treating the command line as an input source.

	* txr.1: Updated, and fixed a few unrelated mistakes.
	* lib.c (dir): Removed unused symbol global.
	(args): New symbol global.
	* lib.h (dir): Declaration removed.
	(args): Declared.
	* match.c (match_files): Implemented @(next arg). Had to hack laziness into the file opening logic in match_files. If the function is entered with a spec whose first directive is @(next), then it defers opening the first file in the list of files (since it will be immediately abandoned in favor of another input source). This prevents an error in the situation when the arguments do not name files, and there is a @(next args) directive to process them as an input source.

2009-10-16 Kaz Kylheku

	Version 016

	Catch clauses with parameters.
	Directive for throwing exceptions: throw.
	Directive for defining exception types: defex.
	-f option renamed to -c.

	* txr.c (version): Bump to 016.
	* txr.1: Bump version to 016.

2009-10-16 Kaz Kylheku

	* txr.c (help, main): Changed -f argument to -c. This is consistent with the -c argument of the shell; -f looks like awk's -f option, which specifies a file, not a literal script body.
	* txr.1: Updated.

2009-10-15 Kaz Kylheku

	* txr.1: Grammar, spelling.

2009-10-15 Kaz Kylheku

	* parser.y (clauses_opt): Long-overdue nonterminal added.
	(define_clause): Simplified with clauses_opt.
	(try_clause): Error handling improved.
	(catch_clauses_opt): Catch and finally clauses can be empty. Error cases added.
	* txr.1: Updated.

2009-10-15 Kaz Kylheku

	* match.c (match_files): Use alist_remove1 for a one-element removal.

2009-10-15 Kaz Kylheku

	* unwind.c (uw_throw): Add program prefix before unhandled exception text. Print it in the standard notation if it's not a string literal.
	* match.c (sem_error, file_err): Don't stick program prefix into exception text.
2009-10-15 Kaz Kylheku

	* unwind.c (uw_exception_subtype_p, uw_init): Slight change in representation for exception subtypes, saving one node in the list.

2009-10-15 Kaz Kylheku

	New throw and defex directives, catches with arguments.

	* lib.c (defex, throw): New symbol globals.
	(obj_init): Symbols interned.
	* lib.h (defex, throw): Declared.
	* match.c (match_files): Implemented throw and defex. Argument handling in catches.
	* unwind.c (uw_register_subtype): Returns right argument, so we can cleverly use it with reduce_left.
	* unwind.h (uw_register_subtype): Declaration updated.
	* txr.1: Updated.

2009-10-14 Kaz Kylheku

	Version 015

	Code restructuring.
	Corruption bugfix in gc-debugging code.
	The nil symbol is implemented more properly.
	Semantics change: collect is treated as a failed match if it does not collect anything.
	Bugfix in function argument reconciliation: must only be done for unbound parameters.
	New @(local) directive (synonym of forget) for expressing local variables in functions.
	Quasi-literals: backquote-delimited literals that contain interpolated variables. Useful in next, output, bind and function calls.
	Hygiene: some implementation-inserted syntax tree elements are now in their own namespace so they can't clash with user-defined constructs.
	Rewritten streams implementation.
	Exception handling: try/catch/finally. Exceptions used internally and externally. File errors are mapped to exceptions now.
	Hash-bang (#!) scripting supported.
	New -f parameter, allowing the entire query to be specified as an argument rather than from a file or stdin.

	* txr.c (version): Bump to 014.
	* txr.1: Bump version to 014. More documentation about exceptions.

2009-10-14 Kaz Kylheku

	Support for hash-bang execution, and for embedding the query in a command line argument.

	* txr.c (remove_hash_bang_line): New function.
	(main): Added -f option. Initialize and gc-protect yyin_stream, and use it in all places where yyin was previously set up.
	Diagnose when -a, -D and -f are wrongly clumped with other options. Remove the first line of the query if it starts with #!.
	* parser.h (yyin): Declaration removed.
	(yyin_stream): Declared.
	* parser.l (YY_INPUT): Macro defined.
	(yyin_stream): New global.
	* stream.c (string_in_get_line, string_in_get_char): Bugfix: wrong length function used.
	(string_in_ops): Bugfix: wrong get_char function wired in.
	(get_char): New function.
	* stream.h (get_char): Declared.
	* txr.1: -f option documented.

2009-10-14 Kaz Kylheku

	* lib.c (obj_print, obj_pprint): Print # syntax if an object has a bad type code; do not just return without printing anything.

2009-10-14 Kaz Kylheku

	Code cleanup and documentation.

	* txr.1: Start documenting quasi-literals, exception handling and nothrow in next and output.
	* parser.y (catch_clauses_opt): Add missing empty production, so that a try block doesn't have to have a finally clause.
	* lib.h (or2, or3, or4): New macros.
	* match.c (match_files): Allow output and next forms which just have one argument that is nothrow, as documented.
	* stream.c (common_vformat, string_out_vcformat, make_string_output_stream, make_dir_stream, close_stream, get_line, vformat, vcformat, format, cformat, put_string, put_cstring, put_char): Switch to new-style type assertions.

2009-10-13 Kaz Kylheku

	New syntax for next and output directives, taking advantage of quasi-literals. Non-throwing behavior can be specified in both using nothrow. The old syntax is supported, and has the old semantics (non-throwing). Hence, the test cases pass again without modification. File open errors thrown as file_error type.

	* lib.c (nothrow, file_error): New symbol globals.
	(obj_init): New symbols interned.
	* lib.h (nothrow, file_error): Declared.
	* match.c (file_err): New function.
	(eval_form): Bugfix: if the input is nil, or an atom other than a symbol, return the value hoisted into a cons. A nil return strictly means an unbound variable.
	(match_files): Support new syntax for next and output. Throw open errors as file_err.
	* parser.l (grammar): Change how OUTPUT is returned to the style similar to DEFINE, so interior forms can be parsed.
	* parser.y (grammar): Fix up output_clause with new syntax.
	* unwind.c (uw_throw): Do not abort on unhandled file_error, but terminate with a failed status. (uw_init): Register file_error as a subtype of error exception.

2009-10-13 Kaz Kylheku

	First cut at working try/catch/finally implementation.
	* lib.c (try, catch, finally): New symbol globals. (obj_init): New symbols interned.
	* lib.h (try, catch, finally): Declared.
	* parser.y (TRY, CATCH, FINALLY): New tokens. (try_clause, catch_clauses_opt): New nonterminal grammar symbols.
	* parser.l (yybadtoken): TRY, CATCH and FINALLY handled. (grammar): New cases for try, catch and finally.
	* unwind.h (struct uw_catch): New member called visible. (uw_continue): New parameter added. (uw_exception_subtype_p): Declared. (uw_catch_begin): Macro rewritten to use switch logic around setjmp. (uw_do_unwind, uw_catch, uw_unwind): New macros. (uw_catch_end): Rewritten to close switch, and automatically continue the unwinding if the block is entered as an unwind.
	* unwind.c (uw_unwind_to_exit_point): Exception catching frames made invisible via new flag prior to control passing to them. longjmp code 2 introduced for distinguishing a catch from an unwind. Visibility flag is checked and invisible frames are skipped. (uw_push_catch): cont member of the unwind frame initialized to zero. (exception_subtype_p): Renamed to uw_exception_subtype_p, changed to extern. Fixed wrong order of arguments to assoc. (uw_throw): Honor visibility flag: do not consider invisible catch frames. (uw_register_subtype): sup/sub mixup bugfix. (uw_continue): Takes extra argument: the continuation frame that (re)establishes the exit point for the unwinding.
	This allows nested unwinding action to take place in a finally, and then to continue to the original exit point.
	* match.c (match_files): Handling for try directive added.

2009-10-13 Kaz Kylheku

	* parser.l (yybadtoken): Bugfix: added missing LITCHAR case.
	* unwind.h (internal_error): Fixed broken macro.
	* match.c (match_line, match_files): sem_error bugfix: used %a instead of ~a. (match_files): Wrap block handler in compound statement; otherwise the macro expansion declares a variable in the middle of a statement, which is a gcc extension to C90 (or a C99 feature, but we aren't using C99).

2009-10-08 Kaz Kylheku

	Exception handling for query errors. Verbose logging decoupled from yyerror functions. Superior object-oriented formatting used for cleaner code.
	* lib.c (query_error): New symbol global. (obj_init): New symbol interned.
	* lib.h (query_error): Declared.
	* match.c (output_produced): Variable changed to external linkage. (debugf, debuglf, debuglcf, sem_error): New static functions. (dest_bind, match_line, match_files): Retargeted away from the yyerrorf and yyerrorlf functions to use debugf, debuglf, debuglcf for logging and sem_error for throwing query errors as exceptions.
	* parser.h (spec_file_str): New global declared.
	* parser.l (yyerror): Calls yyerrorf instead of yyerrorlf; lets yyerrorf increment error count. (yyerrorf): Loses level argument. (yyerrorlf): Function removed. (yybadtoken): Retargeted from yyerrorlf to yyerrorf. (grammar): yyerrorf call fixed up.
	* txr.c (spec_file_str): New global defined. (main): Protects new global against gc, and initializes it.
	* unwind.c (uw_throw): If an unhandled exception is of type query_error, it results in an exit rather than abort. The false string is conditionally printed. (uw_init): Register query_error as subtype of error.

2009-10-08 Kaz Kylheku

	Exception handling framework implemented.
	* lib.c (cobj_t, error, type_error, internal_err, numeric_err, range_err): New symbol globals.
	(prog_string): New string global. (code2type): New static function. (typeof): Rewritten using code2type. (type_check, type_check2): New static functions. (car, cdr, list, plus, minus, length_str, chr_p, chr_str, chr_str_set, apply, funcall, funcall1, funcall2, vec_get_fill, vecref_l, lazy_stream_cons): Checks and assertions rewritten using new functions and macros. (obj_init): prog_string protected from gc. New symbols interned. (init): uw_init() call moved after obj_init() because it needs stable symbols.
	* lib.h (cobj_t, error, type_error, internal_err, numeric_err, range_err, prog_string, type_check, type_check2): Declared.
	* match.c (dump_var, complex_snarf, complex_close): abort calls rewritten to use exception handling.
	* regex.c (nfa_all_states, nfa_closure, nfa_move): Likewise.
	* stream.c (string_out_vcformat): Bugfix: fill index not updated. (make_string_output_stream): Bugfix: initial buffer not null terminated. (get_string_from_stream): New function.
	* stream.h (get_string_from_stream): Declared.
	* txr.c (main): Some error prints turned to throws.
	* unwind.c (unwind_to_exit_point): Supports UW_CATCH frames, whose finalization logic has to be invoked during unwinding, and as target exit points. (uw_init): Installs exception symbols into subtyping hierarchy. (uw_push_catch, exception_subtype_p, uw_throw, uw_throwf, uw_errorf, uw_throwcf, uw_errorcf, type_mismatch, uw_register_subtype, uw_continue): New functions. (exception_subtypes): New static global.
	* unwind.h (noreturn): New macro, conditionally defined on __GNUC__. (enum uw_frtype): New member, UW_CATCH. (struct uw_catch): New struct type. (union uw_frame): New member, ca. (uw_push_catch, exception_subtype_p, uw_throw, uw_throwf, uw_errorf, uw_throwcf, uw_errorcf, type_mismatch, uw_register_subtype, uw_continue): New functions declared. (uw_catch_begin, uw_catch_end, internal_error, type_assert, bug_unless, numeric_assert, range_bug_unless): New macros.
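The 2009-10-13 and 2009-10-08 entries describe uw_catch_begin as switch logic around setjmp, with longjmp code 2 distinguishing a catch from an unwind. A minimal sketch of that dispatch pattern, assuming a single global catch frame; the entry codes and the names throw_sym, unwind_through and run are hypothetical and are not TXR's unwind.h macros:

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical entry codes; TXR's own macros differ. */
enum { ENTRY_NORMAL = 0, ENTRY_UNWIND = 1, ENTRY_CATCH = 2 };

static jmp_buf catch_frame;           /* the single active catch frame */
static const char *thrown_sym;        /* exception symbol being delivered */

static void throw_sym(const char *sym)
{
  thrown_sym = sym;
  longjmp(catch_frame, ENTRY_CATCH);  /* deliver an exception to the catch */
}

static void unwind_through(void)
{
  longjmp(catch_frame, ENTRY_UNWIND); /* pass through, e.g. to run a finally */
}

/* mode 0: body completes; mode 1: body throws; mode 2: body unwinds.
   Returns the entry code of the switch branch that produced the result. */
static int run(int mode)
{
  volatile int result = -1;           /* lives across setjmp, so volatile */

  thrown_sym = NULL;
  switch (setjmp(catch_frame)) {
  case ENTRY_NORMAL:                  /* protected body */
    if (mode == 1)
      throw_sym("file_error");
    else if (mode == 2)
      unwind_through();
    result = ENTRY_NORMAL;
    break;
  case ENTRY_CATCH:                   /* catch clause: the exception is known */
    result = (thrown_sym != NULL) ? ENTRY_CATCH : -1;
    break;
  case ENTRY_UNWIND:                  /* cleanup only; a real frame would keep unwinding */
    result = ENTRY_UNWIND;
    break;
  }
  return result;
}
```

With this sketch, run(0) yields 0, run(1) yields 2 and run(2) yields 1. Using setjmp as the entire controlling expression of the switch is one of the contexts in which C permits it, which is what makes the switch-around-setjmp style workable.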

2009-10-07 Kaz Kylheku

	Rewritten streams implementation.
	* stream.h, stream.c: New files.
	* Makefile (OBJS): New object file stream.o.
	* dep.mk: Dependencies updated.
	* gc.c (finalize): STREAM case removed. Call destroy only if not null. (mark_obj): STREAM case removed.
	* lib.c (push, pop): New functions. (equal): STREAM case removed. (sub_str): Allow from parameter to be nil, defaulting to zero. (stdio_line_read, stdio_line_write, stdio_close, stdio_line_stream, pipe_close, pipe_line_stream, dirent_read, dirent_close, dirent_stream, stream_get, stream_pushback, stream_put, stream_close): Functions removed. (stream_ops dirent_stream_ops, stdio_line_stream_ops, struct stream_ops, pipe_line_stream_op): Static structs removed. (lazy_stream_func, lazy_stream_cons): Retargeted to new streams. (cobj_print_op): Likewise. (init): Disables and restores GC, instead of doing it in obj_init. (obj_print): Retargeted to new streams. (obj_pprint): New function. (obj_init): Does not manipulate gc_state any more, moved to init. Call to stream_init added. (d, snarf): Retargeted to new streams. (snarf_line): Removed, now appears in stream.c, retargeted to new streams.
	* lib.h (enum type): STREAM removed. (struct stream, struct stream_ops): Removed. (struct cobj_ops): Retargeted to new streams. (union obj): sm member removed. (push, pop, obj_pprint): Declared. (stdio_line_stream, pipe_line_stream, dirent_stream, stream_get, stream_pushback, stream_put, stream_close, snarf_line): Removed. (cobj_print_op, dump, snarf): Modified.
	* match.c (dump_bindings, complex_snarf): Retargeted to new streams.
	* txr.c (main): format used to dump bindings and specs in verbose mode.

2009-10-07 Kaz Kylheku

	Implemented quasi-literals: string literals which may contain variables to be interpolated. Also, took care of a hygiene problem with respect to some parser-generated forms, which must be invisible to the user.
	* Makefile (LEX_DB_FLAGS): New variable; helpful in generating a lexical analyzer with debug tracing.
	* parser.l (nesting, closechar): Static variables removed. (char_esc): Add \` escape for quasi-literals. (stack): New %option, to generate a scanner which has a start condition stack. (QSILIT): New start condition. (grammar): Refactored to use start condition stacks. Quasi-literal lexical analysis added.
	* parser.y (lit_char_helper): New function, for factoring out some common logic between string literals and quasi-literals. (quasilit, quasi_item, quasi_items): New grammar symbols and production rules. (strlit): Rule shortened with new helper function. Bugfix: error case assigns nil to $$. (chrlist): Bugfix: error case assigns nil to $$. (LITCHAR): Added to %prec table to fix shift-reduce problem. (expr): Production now can generate a quasilit.
	* lib.c (quasi): New symbol global. (obj_init): Intern quasi as "$quasi", so the user can make a function called quasi. Also, var and regex are now interned with the names "$var" and "$regex" for the same reason.
	* lib.h (quasi): Declared.
	* match.c (eval_form): Rewritten with recursive processing to handle deeply embedded variables, as well as quasi-strings. (subst_vars): Handles quasi-strings. (match_files): Function calls now use eval_form for function argument evaluation, except, of course, that an argument which is a symbol may be unbound.

2009-10-06 Kaz Kylheku

	* match.c (match_files): No error message for merging to a symbol which is already bound; the existing behavior is to destructively update the binding, which is useful, and so the error is pointless.

2009-10-06 Kaz Kylheku

	Introduce local as a synonym for forget. It does exactly the same thing; a previous binding is forgotten. This spelling is nicer for functions.
	* lib.h (local): Declared.
	* lib.c (local): Defined. (obj_init): New symbol interned.
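The 2009-10-07 streams rewrite, together with the 2009-10-14 fixes to string_in_get_char and string_in_ops, points to a design in which each stream carries a table of operation pointers and a generic get_char dispatches through it. A hypothetical miniature of that pattern follows; the struct layouts and function names are illustrative assumptions, not copied from stream.c:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of an ops-table stream; not TXR's stream.c. */
struct stream;

struct stream_ops {
  int (*get_char)(struct stream *);   /* next character, or -1 at end */
};

struct stream {
  const struct stream_ops *ops;       /* per-kind operations */
  const char *str;                    /* string input: source text */
  size_t len;                         /* length of str */
  size_t pos;                         /* index of next character */
};

static int string_in_get_char(struct stream *s)
{
  if (s->pos >= s->len)               /* bound by the source's own length */
    return -1;
  return (unsigned char) s->str[s->pos++];
}

static const struct stream_ops string_in_ops = { string_in_get_char };

static void make_string_input_stream(struct stream *s, const char *str)
{
  s->ops = &string_in_ops;
  s->str = str;
  s->len = strlen(str);
  s->pos = 0;
}

/* Generic entry point: dispatches through the ops table, so callers
   need not know which kind of stream they hold. */
static int get_char(struct stream *s)
{
  return s->ops->get_char(s);
}
```

Because callers only see the generic entry point, wiring the wrong function into an ops table (the kind of slip the string_in_ops bugfix mentions) silently routes every call on that stream kind to the wrong implementation.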

2009-10-06 Kaz Kylheku

	Bugfix: function parameter reconciliation (after the function call completes) must only consider the unbound parameters. Otherwise false mismatches result if the function destructively manipulated some bindings of bound parameters. E.g. @(define foo (a)) is called as @(foo "bar") and internally it rebinds bound parameter a to "baz". This situation is not a mismatch. The rebinding is thrown away.
	* match.c (match_files): When processing a function call, keep an alist which associates arguments and unbound parameters. Then, after the function call, process the alist, rather than the full parameter list.

2009-10-06 Kaz Kylheku

	Semantics change: collect fails if it does not collect anything. Non-failing behavior can be obtained by wrapping with @(maybe) (but no such workaround for coll yet).
	* match.c (match_line): Return nil if coll collected nothing. (match_files): Return nil if collect collected nothing.

2009-10-06 Kaz Kylheku

	Bugfix: nil must be on the list of interned symbols.
	* lib.c (sym_name): Function removed. This was like symbol_name but did not accept nil. (intern): Use symbol_name instead of sym_name, allowing nil to be on the list of interned symbols. (obj_init): Add nil to interned_syms list. (nil_string): Changed from "NIL" to "nil".
	* match.c (dest_bind): Treat nil as a value, not a symbol. (match_files): Treat nil as a value when it's a function argument.

2009-10-06 Kaz Kylheku

	* gc.c (more): Bugfix: free_tail was incorrectly calculated, thereby destroying the validity of the FIFO recycling algorithm used when GC debugging is enabled. This showed up as mysterious assertions and crashes. (mark_obj): Do not abort if a free object is marked. (mark_mem_region): Renamed bottom and top variables to low and high. The naming was confusingly inverted relative to that in the caller. (sweep): Abort if somehow a block is free and marked reachable.
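The gc.c entry above mentions a FIFO recycling scheme for freed objects when GC debugging is enabled, and a free_tail variable whose miscalculation broke it. One common way to build a FIFO free list is a head pointer plus a pointer to the last link field; the sketch below assumes that representation and uses hypothetical names (struct cell, free_cell, alloc_cell), not the actual gc.c code:

```c
#include <stddef.h>

/* Hypothetical FIFO free list: freed cells are appended at the tail and
   allocation takes from the head, so a freed cell is reused as late as
   possible. */
struct cell {
  struct cell *next;
  int payload;
};

static struct cell *free_head = NULL;
static struct cell **free_tail = &free_head;  /* address of the last next field */

static void free_cell(struct cell *c)
{
  c->next = NULL;
  *free_tail = c;           /* append at the tail */
  free_tail = &c->next;
}

static struct cell *alloc_cell(void)
{
  struct cell *c = free_head;
  if (c == NULL)
    return NULL;            /* nothing to recycle */
  free_head = c->next;
  if (free_head == NULL)
    free_tail = &free_head; /* list became empty: tail must be reset */
  return c;
}
```

Delaying reuse gives a dangling reference more time to touch a freed object and trip an assertion, which is presumably why the scheme is reserved for debugging. Forgetting to reset the tail when the list empties leaves it pointing into a cell that has already been handed out, one way a miscalculated tail can corrupt the list.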

2009-10-06 Kaz Kylheku

	* match.c (match_files): Fixed nonexistent symbol warning for merge directive (complained about wrong symbol).

2009-10-05 Kaz Kylheku

	Refactoring matching code.
	* lib.h (cobj_ops): New function pointer, mark.
	* gc.c (mark_obj): For a COBJ type, call the mark function if the pointer is non-null. (gc_mark): New public function, wrapper that calls the private mark_obj. Implementations of mark for COBJ objects will need to call this.
	* gc.h (gc_mark): Declared.
	* regex.c (regex_obj_ops): Explicitly initialize mark function pointer to null.

2009-10-05 Kaz Kylheku

	Code restructuring.
	* Makefile (match.o): New object file. (depend): New rule for generating dep.mk, using txr. (lib.o, lex.yy.o, regex.o, y.tab.o, unwind.o, txr.o, match.o, gc.o): Dependency rules removed.
	* dep.mk: New make include file; captures dependencies. Generated by new depend rule in Makefile, using txr.
	* depend.txr: Txr query to generate dependencies.
	* extract.y: File renamed to parser.y. (output_produced): Variable removed, moved into new file match.c. (dump_shell_string, dump_shell_string, dump_var, dump_bindings, depth, weird_merge, map_leaf_lists, dest_bind, eval_form, match_line, format_field, subst_vars, complex_open, complex_open_failed, complex_close, complex_snarf, robust_length, bind_car, bind_cdr, extract_vars, extract_bindings, do_output_line, do_output, match_files, extract): Functions removed, added to match.c. (struct fpip): Definition removed, added to match.c. (, , , , , "gc.h", "unwind.h"): Unneeded headers removed.
	* match.c: New file.
	* extract.l: Renamed to parser.l.
	* extract.h: Renamed to parser.h. (opt_loglevel, opt_nobindings, opt_arraydims, version, progname): Declarations moved to txr.h. (extract): Declaration moved to match.h.
	* txr.h, match.h: New headers.
	* gc.h (opt_gc_debug): Moved to txr.h.

2009-10-03 Kaz Kylheku

	Version 014
	New cases directive. New define directive: user-defined dynamically scoped functions.
	String literals in bind and function calls. EOF in the middle of a line handled properly.
	* extract.l (version): Bump to 014.
	* txr.1: Bump version to 014.

2009-10-02 Kaz Kylheku

	New cases directive.
	* extract.l (yybadtoken): Add case for CASES. (grammar): Tokenize cases directive.
	* extract.y (CASES): New token kind. (cases_clause): New grammar symbol. (grammar): Implement new grammar cases. (match_files): Implement semantics for cases.
	* lib.c (cases): New global. (obj_init): Intern cases symbol.
	* lib.h (cases): Declared.
	* txr.1: Documented.

2009-10-02 Kaz Kylheku

	Support for string and character literals.
	* extract.l (char_esc): Support \' and \" escapes. (STRLIT, CHRLIT): New flex start conditions. (grammar): New rules for tokenizing string literals.
	* extract.y (LITCHAR): New token kind. (strlit, chrlit, litchars): New grammar symbols. (grammar): Implement string literal parsing. (dump_var): Support character objects, treating them as one-character strings. (eval_form): New function. (match_files): In bind directive, allow the right hand side to be an arbitrary object.
	* lib.c (mkustring, init_str): New functions. (cat_str): Allow characters in the mix, treating them as one-character strings.
	* lib.h (mkustring, init_str): Declared. (chrp, chr_str, chr_str_set): New functions.
	* txr.1: Documented.

2009-10-02 Kaz Kylheku

	Support for query-defined functions.
	* extract.l (yybadtoken): New DEFINE case. (NESTED): New flex start condition. This allows for different lexing rules in nested lists, so even though, for instance, @(collect) is a special token, @((collect)) isn't. (grammar): Refactored with NESTED. Tokenize define directive.
	* extract.y (define_transform): New function. (DEFINE): New token kind. (define_clause): New grammar symbol. (match_files): Implement define semantics, and function calls.
	* lib.c (define): New global.
	* lib.h (define): Declared. (proper_listp, alist_remove1, copy_cons, copy_alist): New functions. (obj_init): Intern define symbol.
	(init): Call new function uw_init.
	* unwind.c (toplevel_env): New static structure. (uw_unwind_to_exit_point): Support new UW_ENV frame type. (uw_init, uw_find_env, uw_push_env, uw_get_func, uw_set_func): New functions.
	* unwind.h (UW_ENV): New enumeration member in uw_frtype. (uw_dynamic_env): New struct. (uw_block_begin, uw_block_end): Renamed some variables. (uw_env_begin, uw_env_end): New macros.
	* txr.1: Documented.

2009-10-02 Kaz Kylheku

	Misc. bugfixes and improvements.
	* extract.l (grammar): Newline in a directive no longer an error. Why not allow it?
	* extract.y (grammar): Productions for catching empty bodies in some constructs now end with END newl, rather than just END, so parsing can continue sanely. (match_lines): In diagnostics, don't say "ignored" about material which causes an error that fails the query!
	* lib.c (mkstring): Initialize length since we know it! (c_str): Take a symbol as an arg, so we don't have to keep writing c_str(symbol_name(sym)). (obj_print): Use isprint rather than iscntrl to decide whether to print a character as an escape. (snarf_line): Properly handle EOF in the middle of line.

2009-09-29 Kaz Kylheku

	Version 013
	Some minor garbage collection issues fixed. Infinite looping bug fixed. New @(trailer) directive.
	* extract.y (match_files): Implemented trailer directive.
	* extract.l (version): Bump to 013.
	* lib.h (trailer): Declaration added.
	* lib.c (trailer): External definition added. (obj_init): Initialize trailer with interned symbol.
	* txr.1: Documented @(trailer) and bumped version to 013.

2009-09-29 Kaz Kylheku

	Looping bug fixed. Certain directives could cause an infinite loop if the query had run out of data.
	* extract.y (match_files): The semantics of the first_file_parsed argument changes a little bit. Previously, if nil was passed, a new lazy stream would be opened for the first file.
	But this is ambiguous because nil also means the empty list; sometimes when we recurse into match_files, the data has run out and this argument is thus nil. Now, that argument must be the symbol t in order to mean ``open the first file''. If the argument is nil, it unambiguously means ``we are at the end of the current file; don't open anything''. (extract): The initial call to match_files now passes the symbol t for the first_file_parsed argument.

2009-09-29 Kaz Kylheku

	Fixing some gc issues. The test cases were found to bomb with an assertion when run with --gc-debug enabled, due to a garbage-collected object still being used. This was due to the way the main function was structured. Also, the stack ``top'' terminology in the gc was stupidly wrong. Leaf function frames are at the stack top, and main is near the bottom. I was thinking of the ``top caller''.
	* Makefile (TXR_DBG_OPTS): New variable. Tests are now run with --gc-debug, which makes them slower, but gives a much greater chance of trapping gc problems.
	* extract.l (main): Two variables are now used for determining the stack bottom. We don't know in which order the compiler places local variables into a stack frame. (This is a separate question from that of the direction of stack growth.) The call to the init function is now done right away. The argument processing section of main does some processing with GC objects, but the init function was being called afterward, before the list of interned symbols was protected from garbage collection! So with --gc-debug turned on, parts of the interned symbol list were being garbage collected (since the variable had not yet been added to the set of root pointers, which is done in the init function). Also, the use of an unknown --long-option is diagnosed properly now.
	* gc.c (gc_stack_top): Renamed to gc_stack_bottom, and converted from extern to static. (mark): Follows rename of gc_stack_top to gc_stack_bottom.
	(sweep): Eliminated the freed variable for counting freed objects, and the associated debug message, which was not useful. Commented why the free list is managed differently when dbg is turned on. (gc_init): New function.
	* gc.h (gc_stack_top): Declaration removed. (gc_init): Declaration added.
	* lib.c (min): New macro. (init): Takes two additional arguments which are used to determine the stack bottom. The function first determines whether the stack grows up or down. Then it takes the greater or smaller of the two potential stack bottom pointers, based on that. The result is passed to gc_init.
	* lib.h (init): Declaration updated.

2009-09-28 Kaz Kylheku

	Version 012
	Semantics change of @(until) in @(collect) and @(coll). Minor fixes.
	* extract.y (match_line, match_files): The until clauses continue to be processed after the main clauses of the collect or coll (to see the bindings), but are processed before the collection occurs, so that the until will veto the bindings of the last iteration. Moreover, the data position stays where it is when this happens, and no arrangement is made to match the until material again.
	* txr.1: Tried to document the change.

2009-09-27 Kaz Kylheku

	* txr.1: Following a proofread, fixed various escaping problems and instances of missing text.

2009-09-26 Kaz Kylheku

	* lib.c (equal): Bugfixes: wrong fallthrough of FUN case. VEC case must return nil, not break.

2009-09-26 Kaz Kylheku

	Preparation for some sorting support.
	* extract.y (merge): Renamed to weird_merge. (map_leaf_lists): New function. (match_files): Follow weird_merge rename.
	* lib.c (all_satisfy, none_satisfy, string_lt, do_bind2other, bind2other, merge, do_sort, sort): New functions.
	* lib.h (all_satisfy, none_satisfy, string_lt, bind2other, sort): Declared.

2009-09-25 Kaz Kylheku

	Version 011
	New @(maybe) clause optionally matches (does not fail if none of its clauses match anything).
	New blocks feature: allows a query or subquery to be abruptly terminated by invoking an exit to a named or anonymous block. @(collect) and @(skip) have implicit anonymous blocks now. The @(skip) directive takes a numeric argument now, which limits how many lines are searched.
	* Makefile, extract.l, extract.y, extract.h, gc.c, gc.h, lib.c, lib.h, regex.c, regex.h, txr.1, unwind.c, unwind.h: Copyright notice and license text updated or added, and version bumped up to 011.
	* tests/001/query-1.txr, tests/001/query-2.txr, tests/001/query-3.txr, tests/002/query-1.txr: Assigned to public domain.

2009-09-25 Kaz Kylheku

	New features:
	- named blocks;
	- maybe clause;
	- optional iteration bound on skip.
	* extract.y: includes added: "unwind.h", . (MAYBE, OR): New grammar tokens. (maybe_clause): New nonterminal grammar symbol. (expr): A NUMBER can be an expression now, so that @(skip 42) is valid syntax. (match_files): Support for numeric argument in skip directive to bound the search to a maximum number of lines. Anonymous block established around skip. New directives implemented: maybe, block, accept and fail. Anonymous block established around collect.
	* txr.1: Documentation updated with new features.
	* Makefile: New object file unwind.o, and associated rules.
	* extract.l (yybadtoken): New cases for MAYBE and OR. (grammar): Likewise.
	* lib.c (block, fail, accept): New symbol variables. (obj_init): New symbols interned.
	* lib.h (block, fail, accept): Declared. (if2, if3): Macros fixed so the test expression is not compared to nil, but implicitly tested as boolean.
	* unwind.c, unwind.h: New source files.

2009-09-24 Kaz Kylheku

	Stability fixes.
	* extract.y (match_files): Replaced invalid string("-") with string(chk_strdup("-")); the former caused a non-malloced string to be freed at gc finalization time.
	* regex.c (nfa_state_shallow_free): New function: does not free satellite objects, just the structure itself.
	(nfa_combine): Use nfa_state_shallow_free instead of nfa_state_free, because the merged state inherits ownership of objects from the state being spliced out. (nfa_state_set): Fix lack of initialization of s.visited member of the state structure.

2009-09-24 Kaz Kylheku

	Version 010
	A file spec can start with $, which means read a directory. Data sources are not read into memory all at once, but on demand, which can reduce memory use for many queries. Regular expressions are now compiled once, when the query is parsed. Character escapes are now supported in regular expressions, and as a special syntax.
	* extract.l (version): Bumped to 010. (grammar): 8 and 9 are not octal digits; handle all regex backslash escaping in lexical grammar.
	* extract.y (grammar): Get rid of backslash handling from regex grammar. Lexer returns a REGCHAR for every escaped item. In situations where an operator character is implicitly literal, like * in a character class, we use the grammar to include that alongside REGCHAR. Bugfixes: the character ], when not closing a class, is not a syntax error but stands for itself; the character - stands for itself outside of a character class; the | character is literal in a character class.
	* txr.1: Updated version. Documented character escapes.

2009-09-24 Kaz Kylheku

	Lazy stream list improvement: no extra NIL element caused by end-of-file. Requires push-back support in streams. To avoid introducing a new structure member into streams, we extend the semantics of the label member, and rename it to label_pushback.
	* lib.c (stdio_line_stream, pipe_line_stream, dirent_stream): Follow rename of struct stream member; assert that label is an atom. (stream_get): Check pushback stack first and get item from there. (stream_pushback): New function. (lazy_stream_func): Pull one more item from the stream and use /that/ to decide whether to continue the lazy stream. The extra item is pushed back, if valid. (lazy_stream_cons): Simplified: no hack involving regular cons.
	Starts the induction by peeking into the stream. If something is there, it is pushed back, and a lazy cons is constructed which will fetch it. (obj_print): Made aware of the pushback, which must be skipped to get to the terminating label.
	* lib.h (struct stream): Member renamed from label to label_pushback. (stream_pushback): New function declaration.

2009-09-23 Kaz Kylheku

	Escape syntax in regexes, and text. The standard seven character escapes are supported, namely \a, \b, \t, \n, \v, \f, and \r, as well as hex and octal escapes, plus the code \e for ASCII ESC.
	* extract.l (char_esc, num_esc): New functions. (grammar): New lex cases.
	* lib.c (obj_print): Support all character escapes in printing. Bugfix: backslash printed as two backslashes, not one.

2009-09-23 Kaz Kylheku

	* tests/002/query-1.txr: Modified to use $ to scan thread subdirectories.
	* tests/002/query-1.expected: Updated.

2009-09-23 Kaz Kylheku

	New COBJ type for wrapping arbitrary C objects into the Lisp-like framework. Compiled regexes are objects now. Regexes in a query are now compiled just once.
	* extract.y (grammar): Regexes compiled while parsing. (match_line): Modified with respect to the abstract syntax tree change, and the interface changes in the match_regex and search_regex functions.
	* gc.c (mark_obj, finalize): Handle marking and finalization of COBJ objects.
	* lib.c (typeof, equal, obj_print): Handle COBJ. (cobj, cobj_print_op): New functions.
	* lib.h (type_t): New enum element, COBJ. (struct cobj, struct cobj_ops): New types. (union obj): New member, co. (cobj, cobj_print_op): New functions declared.
	* regex.c (regex_equal, regex_destroy, regex_compile, regex_nfa): New functions. (regex_obj_ops): New static struct. (search_regex, match_regex): Interface change. Regex arguments are now compiled regexes. Functions won't handle raw regexes.
	* regex.h (regex_compile, regex_nfa): New functions declared.

2009-09-23 Kaz Kylheku

	New feature: file specs that start with $ read directories.
	Reading from an ``ls'' pipe is too slow. Streams and lazy conses implemented. Lazy conses allow us to treat a file or other kind of stream exactly as if it were a list. We can use car and cdr, etc. But only the parts of the list that we actually touch are instantiated on the fly by reading from the underlying stream.
	* extract.l: inclusion of added.
	* extract.l: inclusion of added.
	* extract.y (fpip_closedir): New enumeration in struct fpip, and fpip_noclose removed. (complex_open): Check for leading $, use opendir. (complex_open_failed): New function. (complex_close): Handle fpip_closedir case. Not closing stdin and stdout is handled by explicit comparison now. (complex_snarf): New function; constructs a stream of a suitable type over the object returned from complex_open, and wraps it in a lazy list. (match_files): Use complex_snarf instead of snarf to get a lazy list.
	* gc.c: Handle LCONS and STREAM cases.
	* lib.c (stream_t, lcons_t): New variables holding symbols. (typeof, equal, obj_print): Handle LCONS and STREAM. (car, cdr, car_l, cdr_l, consp, atom, listp): Rewritten to handle LCONS. (chk_strdup, stdio_line_read, stdio_line_write, stdio_close, stdio_line_stream, pipe_close, pipe_line_stream, dirent_read, dirent_close, dirent_stream, stream_get, stream_put, stream_close, make_lazycons, lazy_stream_func, lazy_stream_cons): New functions. (stdio_line_stream_ops, pipe_line_stream_ops, dirent_stream_ops): New static structs. (obj_init): Intern new symbols lstream, lcons, and dir.
	* lib.h (type_t): New enum members STREAM and LCONS. (struct stream, struct stream_ops, struct lazy_cons): New types. (union obj): New members sm and lc. (chk_strdup, stdio_line_stream, pipe_line_stream, dirent_stream, stream_get, stream_put, stream_close, lazy_stream_cons): New function declarations.
	* regex.c: inclusion of added.

2009-09-23 Kaz Kylheku

	Version 009
	User-friendly error messages from parser. Fixed -q option.
	* extract.l (version): Bumped to 009.
	* txr.1: Updated version.
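The 2009-09-23 and 2009-09-24 entries describe lazy conses over streams, and a refinement in which lazy_stream_cons peeks one item ahead and pushes it back, so that end of file produces no extra NIL element. A minimal sketch of that peek-and-push-back idea, assuming a NULL-terminated array of strings stands in for a file and a single pushback slot replaces the pushback stack; all types and helpers here (struct src, src_get, src_pushback, struct lcons, lcdr) are hypothetical, and only the lazy_stream_cons name is borrowed from the log:

```c
#include <stdlib.h>

/* Hypothetical line source with a one-slot pushback. */
struct src {
  const char **items;   /* NULL-terminated array standing in for a file */
  size_t next;          /* index of the next unread item */
  const char *pushed;   /* pushed-back item, or NULL */
};

static const char *src_get(struct src *s)
{
  if (s->pushed) {      /* pushback is consulted first */
    const char *p = s->pushed;
    s->pushed = NULL;
    return p;
  }
  return s->items[s->next] ? s->items[s->next++] : NULL;
}

static void src_pushback(struct src *s, const char *item)
{
  s->pushed = item;
}

struct lcons {
  const char *car;
  struct lcons *cdr;
  int forced;           /* has cdr been computed yet? */
  struct src *src;
};

static struct lcons *make_lcons(struct src *s)
{
  struct lcons *c = malloc(sizeof *c);
  if (!c)
    abort();
  c->car = src_get(s);  /* fetches the item that was just pushed back */
  c->cdr = NULL;
  c->forced = 0;
  c->src = s;
  return c;
}

/* Peek one item: if one exists, push it back and build a lazy cell that
   will fetch it; otherwise return NULL, so no extra empty element appears. */
static struct lcons *lazy_stream_cons(struct src *s)
{
  const char *item = src_get(s);
  if (!item)
    return NULL;
  src_pushback(s, item);
  return make_lcons(s);
}

/* The tail is instantiated only when first requested. */
static struct lcons *lcdr(struct lcons *c)
{
  if (!c->forced) {
    c->cdr = lazy_stream_cons(c->src);
    c->forced = 1;
  }
  return c->cdr;
}
```

Creating the list reads only one item from the source; each later cell is materialized when lcdr first asks for it, and an exhausted source yields a NULL tail rather than a trailing empty element.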

2009-09-22 Kaz Kylheku

	* Makefile (LIBLEX): New variable. Refer to lex library as -lfl, using a variable that can be overridden.

2009-09-22 Kaz Kylheku

	* extract.h (yybadtoken): New function declaration.
	* extract.l (yybadtoken): New function. (main): Fixed -q option.
	* extract.y (grammar): Lots of new error productions, some phrase rules refactored, resulting in much more user-friendly error diagnosis.
	* txr.1: -q option semantics clarified.