summaryrefslogtreecommitdiffstats
path: root/parser.l
Commit message (Collapse)AuthorAgeFilesLines
...
* parser: diagnose syntax like 0.1.2 and .1.1.Kaz Kylheku2017-04-021-3/+3
| | | | | | | | | Currently (list .1.1) yields (0.1 0.1). This is evading the rule for catching cramped floating-point literals. * parser.l (grammar): Carefully weaken the pattern match in the relevant rule for catching cramped floating-point literals, so it matches these cases.
* Bugfix: .1 treated as dot if preceded by space.Kaz Kylheku2017-04-021-4/+4
| | | | | | | | | | | | Some recent work in supporting .slot syntax (uref dot) broke the treatment of floating point literals. This is because part of the trick is that a uref dot is recognized with leading whitespace as part of the token. But that of course means it steals the match for some floating-point tokens; oops! * parser.l (grammar): All rules for floating-point tokens which can match a leading decimal point now munch optional whitespace first.
* Package prefix handling on directive symbols.Kaz Kylheku2017-03-271-30/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The directives which are involved in special phrase structure syntax like @(collect), @(end), @(and) and many others have always been a hack, recognized specially in the lexical analyzer and handled in the parser. The identifiers were not treated via the normal Lisp interning mechanism. In this patch, we try to make the illusion more complete and functional. Going forward, these symbols are understood as being interned in the usr package. As a special relaxation, keyword symbols may be used in their place, so that @(:end) is the same as @(end) and @(:collect) is the same as @(collect). Suppose that @(collect) is scanned, but the collect symbol interned in the current package isn't usr:collect, or keyword:collect. Then this is an error. Further, package prefixes may be used. The syntax @(abc:collect) is still valid and is still recognized as the head of the @(collect) phrase structure syntax. However, if abc:collect isn't the same symbol as either usr:collect or :collect, then an error is triggered. * parser.l (grammar): Recognize optional package prefixes on directive phrase structure identifiers. (directive_tok): Extract package prefix and symbol from lexeme. Implement the above described checks for all the cases. * txr.1: Added description of this under the Packages and Symbols section.
* Lexer refactoring: special syntax tokens.Kaz Kylheku2017-03-271-90/+43
| | | | | | * parser.l (directive_tok): New static function. (grammar): Replace repeated code with calls to directive_tok.
* uref: the a.b.c syntax extended to .a.b.cKaz Kylheku2017-03-061-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now it is possible to use a leading dot on the referencing dot syntax. This is the is the "unbound reference dot". It expands to the uref macro, which denotes an unbound-reference: it produces a function which takes an object as the argument, and curries the reference implied by the remaining arguments. * eval.c (uref_s): New global symbol variable. (eval_init): Intern uref symbol and init uref_s. * eval.h (uref_s): Declared. * lib.c (simple_qref_args_p): A qref expression is now also not simple if it contains an embedded uref, meaning that it cannot be rendered into the dot notation without ambiguity. (obj_print_impl): Support printing (uref a b c) as .a.b.c. * lisplib.c (struct_set_entries): Add uref to the list of autoload triggers for struct.tl. * parser.l (DOTDOT): Consume any leading whitespace as part of recognizing the DOTDOT token. Otherwise the new rule for UREFDOT, which matches (mandatory) leading space will take precedence, causing " .." to be scanned wrong. (UREFDOT): Rule for new kind of dot token, which is preceded by mandatory whitespace, and isn't consing dot (which has mandatory trailing whitespace too, matched by an earlier rule). * parser.y (UREFDOT): New token type. (i_dot_expr, n_dot_expr): New grammar rules. (list): Handle a leading dot on the first element of a list as a special case. Things are done this way because trying to work a UREFDOT into the grammar otherwise causes intractable conflicts. (i_expr): The ^, ' and , punctuators are now followed by an i_dot_expr, so that the expression can be an unbound dot. (n_expr): Same change as in i_expr, but using n_dot_expr. Plus new UREFDOT n_expr production. * share/txr/stdlib/struct.tl (uref): New macro. * txr.1: Documented.
* parser: diagnose run-on symbols.Kaz Kylheku2017-02-011-0/+14
| | | | | | | | | | * parser.l (grammar): Add rules which capture two symbols glued together, and diagnose as bad token. Of course a legitimate symbol token can be divided into two that are glued together. This rule is placed after the legitimate symbol matching rule, so that if a token can be interpreted as a single symbol token or as two, the first interpretation is taken.
* parser: diagnose more kinds of junk after float.Kaz Kylheku2017-02-011-1/+2
| | | | | | | | | * parser.l (grammar): Add a rule that if a floating-point (of the type that ends in decimal digits with an optional exponent) is immediately followed by a period which is not followed by another period (range syntax), it is trailing junk. For instance 1.0.3 or .2.$, or 1.0. followed by no other input.
* Bump copyright year to 2017.Kaz Kylheku2017-01-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | * LICENSE, LICENSE-CYG, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, signal.c, signal.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/except.tl, share/txr/stdlib/getopts.tl, share/txr/stdlib/getput.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/package.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/tagbody.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl: Add 2017 to all copyright headers and strings.
* Fix some C style casts to use casting macros.Kaz Kylheku2016-12-071-2/+2
| | | | | | | | | | | | | | | | | | | | This is uncovered by compiling with g++ using -Wold-style-cast. * mpi/mpi.c (mp_get_intptr): Use convert macro. Also in one of the rules producing REGCHAR. * parser.l (num_esc): Likewise. * struct.c (static_slot_set, static_slot_ens_rec, get_equal_method): Use coerce macro for int to pointer conversion. * sysif.c (setgroups_wrap): Use convert macro. * termios.c (termios_unpack, termios_pack): Likewise. * txr.c (sysroot_init): Likewise.
* Removes stray debug printf from lexer.Kaz Kylheku2016-12-041-1/+0
| | | | | | * parser.l: A stray printf was committed in November 2015. The spurious output only occurs when certain invalid floating-point syntax is encountered.
* Harden processing of character escapes.Kaz Kylheku2016-12-021-4/+8
| | | | | | | | | | | | | | | | | | Weakness uncovered by fuzzing with AFL (fast) 2.30b. The failing test case is regex syntax like [\1111111...111abc], where the bad character escape allows an invalid, negatively valued character object to escape out of the parser into the system leading to an an out-of-bounds array access in the char set code in the regex compiler. * parser.l (num_esc): Make sure that an out-of-range character is mapped to zero. Set up a default value of zero for the return variable. If the character token has too many digits, don't pass them through strtol at all, which will produce a garbage value. Then in the final range check, actually replace the value with zero if it is out of range: issuing a diagnostic is not enough.
* Support #: reading for uninterned symbols.Kaz Kylheku2016-11-071-4/+4
| | | | | | | | | | | | | | * parser.l (BTKEY, NTKEY): Renamed to BTKWUN and NTKWUN ("keyword and uninterned") respectively. Include an optional match for the # character. (BTOK, NTOK): Refer to BTKEY and NTKEY respectively * parser.y (sym_helper): Implement uninterned symbols by detecting when the package name string is "#" and handling specially. * txr.1: Documented package prefixes and uninterned symbols.
* New #; syntax for erasing following object.Kaz Kylheku2016-11-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * parser.c (parser_circ_ref): Don't generate the circular reference if circular suppression is in effect. * parser.h (struct parser): New member, circ_suppress. We use this for suppressing the generation of circular #n# references in erased objects. * parser.l (grammar): Scan #; producing HASH_SEMI token. * parser.y (HASH_SEMI): New token. (hash_semis_n_expr, hash_semis_i_expr, ignored_i_exprs, ignored_n_exprs): New nonterminals, needed for supporting the use of #; in front of top-level forms. (spec): Use hash_semis_n_expr and hash_semis_i_expr instead of n_expr and i_expr. (r_expr): Support object erasure within nested syntax. (yybadtoken): Handle H_SEMI token. (parse): Initialize new circ_suppress member of parser struct to zero. * txr.1: Documented. * genvim.txr (txr_ign_par, txr_ign_bkt, txr_ign_par_interior, txr_ign_bkt_interior): New regions for colorizing erased objects (partial support). (txr_list, txr_bracket, txr_mlist, txr_mbrackets): Include erased objects by including regions txr_ign_par and txr_ign_bkt. * txr.vim, tl.vim: Regenerated.
* Adding notation for cycles and shared structure.Kaz Kylheku2016-10-181-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit implements the parse-side support for handling a notation that exists in ANSI Common Lisp for specifying objects with cycles and shared substructure. * parser.h (struct parser): New members, circ_ref_hash and circ_count. (circref_s, parser_resolve_circ, parser_circ_def, parser_circ_ref): Declared. * parser.c (circref_s): New symbol variable. (parser_mark): Visit the new circ_ref_hash member of the parser structure. (parser_common_init): Initialize new members circ_ref_hash and circ_count of parser structure. (patch_ref, circ_backpatch): New static functions. (parser_resolve_circ, parser_circ_def, parser_circ_ref): New functions. (circref): New static function. (parse_init): Initialize circref_s as sys:circref symbol. Register sys:circref function. * parser.l (grammar): Scan #<num>= and #<num># notation as tokens, extracting their numeric value. * parser.y (HASH_N_EQUALS, HASH_N_HASH): New token types. (i_expr, n_expr): Adding phrases for hash-equalsign and hash-hash syntax. (yybadtoken): Handle new token types in switch. (parse_once): Call parser_resolve_circ after parsing to rewrite any remaining #<num># references in the structure to the objects they denote. (parse): Reset new struct parse members to initial state. Call parser_resolve_circ after parsing to rewrite any remaining #<num># references.
* Synchronize license comments with LICENSE.Kaz Kylheku2016-10-011-16/+17
| | | | | | | | | | | | | | | | | | | | * Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, ftw.c, ftw.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/awk.tl, share/txr/stdlib/build.tl, share/txr/stdlib/cadr.tl, share/txr/stdlib/conv.tl, share/txr/stdlib/except.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/socket.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/termios.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, socket.c, socket.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, termios.c, termios.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Revert to verbatim 2-Clause BSD.
* Allow whitespace between @ and ; in comments.Kaz Kylheku2016-05-231-2/+2
| | | | | | | * parser.l (grammar): Recognize {WS}* between @ and ; (or the legacy #) in comments. * txr.1: Documentation updated.
* Handle non-UTF-8 byte in regex scanned from string.Kaz Kylheku2016-04-211-0/+6
| | | | | | | | The current behavior is that there is no lex rule for this, so such a byte gets echoed. parser.l (grammar): Add fallback rule to match one byte in SREGEX state and turn it into 0xDCxx character.
* Better job of diagnosing out-of-range char escapes.Kaz Kylheku2016-04-211-2/+9
| | | | | | * parser.l (num_esc): Check for converted value being out of the range of wchar_t or beyond 0x10FFFF, whichever is less.
* Bugfix: allow newline in regex parsing from string.Kaz Kylheku2016-04-181-1/+7
| | | | | | | * parser.l (grammar): The newline character is incorrectly handled by the same rule under the SREGEX and REGEX states. In the SREGEX state, just return it as a REGCHAR, not forgetting to increment the line number.
* Trailing whitespace.Kaz Kylheku2016-04-181-1/+1
| | | | * parser.l: Remove trailing whitespace.
* Revamp bad character messages in lexer.Kaz Kylheku2016-04-011-4/+15
| | | | | | | | * parser.l (grammar): Drop colon from unrecognized escape message. "bad character in directive" handles various cases to avoid printing junk to the terminal. Basic message harmonizes with the one in the yybadtoken function in the parser. Non-UTF-8 byte printed as TXR hex integer literal.
* gc bug: prepared_msg field of struct parser.Kaz Kylheku2016-03-071-1/+2
| | | | | | | * parser.l (yyerrprepf): Replace wrong bare assignment to parser->prepared_msg with proper set macro which handles the mutation of a mature generation object such that it points to a baby object.
* New :mandatory keyword in until/last clauses.Kaz Kylheku2016-01-151-5/+4
| | | | | | | | | | | | | | | | | | | * match.c (mandatory_k): New keyword variable. (h_coll, v_gather, v_collect): Implement :mandatory logic. (syms_init): Initialize mandatory_k. * parser.l (grammar): The UNTIL and LAST tokens must be matched similarly to collect, without consuming the closing parenthesis, allowing a list of items to be parsed between the symbol and the closure, in the NESTED state. * parser.y (gather_clause, collect_clause, elem, repeat_parts_opt, rep_parts_opt): Adjust to new until/last syntax. In the matching productions, the abstract syntax changes to incorporate the options. In the output productions, we throw an error if options are present. * txr.1: Documented :mandatory for collect, coll and gather.
* Copyright year bump.Kaz Kylheku2015-12-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | * LICENSE, METALICENSE, Makefile, args.c, args.h, arith.c, arith.h, cadr.c, cadr.h, combi.c, combi.h, configure, debug.c, debug.h, eval.c, eval.h, filter.c, filter.h, gc.c, gc.h, glob.c, glob.h, hash.c, hash.h, jmp.S, lib.c, lib.h, lisplib.c, lisplib.h, match.c, match.h, parser.c, parser.h, parser.l, parser.y, rand.c, rand.h, regex.c, regex.h, share/txr/stdlib/cadr.tl, share/txr/stdlib/except.tl, share/txr/stdlib/hash.tl, share/txr/stdlib/ifa.tl, share/txr/stdlib/path-test.tl, share/txr/stdlib/place.tl, share/txr/stdlib/struct.tl, share/txr/stdlib/txr-case.tl, share/txr/stdlib/type.tl, share/txr/stdlib/with-resources.tl, share/txr/stdlib/with-stream.tl, share/txr/stdlib/yield.tl, signal.c, signal.h, stream.c, stream.h, struct.c, struct.h, sysif.c, sysif.h, syslog.c, syslog.h, txr.1, txr.c, txr.h, unwind.c, unwind.h, utf8.c, utf8.h: Add 2016 copyright. * linenoise/LICENSE, linenoise/linenoise.c, linenoise/linenoise.h: Bump one principal author's copyright from 2014 to 2015. The code is based on a snapshot of 2015 upstream work.
* Implementing *print-base* and ~d format directive.Kaz Kylheku2015-11-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * debug.c (show_bindings): Use ~d for level, so as not to be influenced by *print-base*. (debug): Use ~d for line numbers. * lib.c (gensym): Use ~d conversion specifier for formatting gensym counter into symbol name. * match.c (LOG_MISMATCH, LOG_MATCH): Use ~d for line number references. (h_skip, h_coll, h_fun, h_chr, match_line_completely, v_skip, v_fuzz, v_gather, v_collect, v_output, v_filter, v_fun, v_assert, v_load, v_line, h_assert, open_data_source): Use ~d for line refs, number of iterations, errno values. * parser.c (repl): Use ~d for prompt line numbers, numbered variables and the expr-<n> string in error messages. * parser.l (yyerrorf, source_loc_str): Use ~d for line numbers. * stream.c (print_base_s): New symbol variable. (formatv): Implement *print-base*. (stdio_maybe_read_error, stdio_maybe_error, stdio_close, pipe_close, open_directory, open_file, open_fileno, open_tail, open_process, run, remove_path): Use ~d for errno values. (stream_init): Initialize print_base_s and register *print-base* special variable. sysif.c (mkdir_wrap, ensure_dir, getcwd_wrap, mknod_wrap, chmod_wrap, symlink_wrap, link_wrap, readlink_wrap, excec_wrap, stat_impl, pipe_wrap, poll_wrap, getgroups_wrap, setuid_wrap, seteuid_wrap, setgid_wrap): Use ~d for errno values and system function results. * txr.1: Documented *print-base* and ~d conversion specifier.
* New iread function.Kaz Kylheku2015-11-071-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The read function no longer works like it used to on an interactive terminal because of the support for .. and . syntax on a top-level expression. The iread function is provided which uses a modified syntax that doesn't support these operators on a top-level expression. The parser thus doesn't look one token ahead, and so iread can return immediately. * eval.c (eval_init): Register iread intrinsic function. * parser.c (prime_parser): Only push back the recently seen token when priming for a regular Lisp read. Handle the prime_interactive method by preparing a SECRET_ESCAPE_I token. (lisp_parse_impl): New static function, formed from previous lisp_parse. Takes a boolean argument indicating interactive mode. (prime_parser_post): New function. (lisp_parse): Now a wrapper for lisp_parse_impl which passes a nil to indicate noninteractive read. (iread): New function. * parser.h (enum prime_parser): New member, prime_interactive. (scrub_scanner, iread, prime_parser_post): Declared. * parser.l (prime_scanner): Handle the prime_interactive case the same way as prime_lisp. (scrub_scanner): New function. * parser.y (SECRET_ESCAPE_I): New token type. (i_expr): New nonterminal symbol. Like n_expr, but doesn't support dot or dotdot operators, except in nested subexpressions. (spec): Handle SECRET_ESCAPE_I by way of i_expr. (sym_helper): Before freeing the token lexeme, call scrub_scanner. If the token is registered as the scanner's most recently seen token, the scanner must forget that registration, because it is no longer valid. (parse): Call prime_parser_post. * txr.1: Documented iread.
* New range type, distinct from cons cell.Kaz Kylheku2015-11-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * eval.c (eval_init): Register intrinsic functions rcons, rangep from and to. (eval_init): Register rangep intrinsic. * gc.c (mark_obj): Traverse RNG objects. (finalize): Handle RNG in switch. * hash.c (equal_hash, eql_hash): Hashing for for RNG objects. * lib.c (range_s, rcons_s): New symbol variables. (code2type): Handle RNG type. (eql, equal): Equality for ranges. (less_tab_init): Table extended to cover RNG. (less): Semantics defined for ranges. (rcons, rangep, from, to): New functions. (obj_init): range_s and rcons_s variables initialized. (obj_print_impl): Produce #R notation for ranges. (generic_funcall, dwim_set): Recognize range objects for indexing * lib.h (enum type): New enum member, RNG. MAXTYPE redefined to RNG value. (TYPE_SHIFT): Increased to 5 since there are now 16 type codes. (struct range): New struct type. (union obj): New member rn, of type struct range. (range_s, rcons_s, rcons, rangep, from, to): Declared. (range_bind): New macro. * parser.l (grammar): New rule for recognizing the #R sequence as HASH_R token. * parser.y (HASH_R): New terminal symbol. (range): New nonterminal symbol. (n_expr): Derives the new range symbol. The n_expr DOTDOT n_expr rule produces rcons expression rather than const. * match.c (format_field): Recognize rcons syntax in fields which is now what ranges translate to. Also recognize range object. * tests/013/maze.tl (neigh): Fix code which destructures range as a cons. That can't be done any more. * txr.1: Document ranges.
* Better diagnostic for cramped floating literals.Kaz Kylheku2015-10-071-2/+7
| | | | | | * parser.l: Different text needed for ).1 and a.1 cases, because the insertion of a zero cannot fix the latter. Might as well make the messages more detailed.
* syntax: be tolerant of carriage returns.Kaz Kylheku2015-09-161-15/+16
| | | | | | | | | | | | | | This is needed for multi-line mode with CR line breaks. It also makes TXR tolerant when code is ported among systems with different line endings. * parser.l (NL): New lex named pattern, matching three possible line terminators: CR, NL or CR-NL. (grammar): In places where \n was previously matched, use {NL}. In a few places where \n is in a character class, add \r. In one place (comment matching), the the pattern . which implicitly doesn't match newlines had to be replaced with [^\r\n].
* Parse errors lose program prefix and parens.Kaz Kylheku2015-09-061-2/+7
| | | | | | | | | | | * parser.l (yyerrorf): Don't print the program prefix and parenthes, except if compatibility to 114 or older is requested. The main motivation for this is the repl, where the program prefix is not informative. The new format is also a de facto standard which is compatible with other parsers. Vim understands it directly. * txr.1: Documented.
* One-liner to allow @{obj.slot} in quasiliterals.Kaz Kylheku2015-09-021-1/+1
| | | | | | | | * parser.l (grammar): Recognize '.' token in BRACED state also. * genvim.txr: @{obj.slot ...} syntax highlighting support. Include txr_dot and txr_dotdot in txr_bracevar region.
* Introducing structs.Kaz Kylheku2015-09-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * args.c (args_cat_zap): New function. * args.h: (args_cat_zap): Declared. * eval.c (struct_lit_s): New symbol variable. (eval_init): Initialize struct_lit_s. * eval.h (struct_lit_s): Declared. * gc.c (finalize): If a symbol has a struct slot hash attached to it, we must free it when the symbol is reclaimed. * lib.c (make_sym): Initialize symbol's slot_cache pointer to null. (copy): Copy structure objects. (init): Call struct_init to initialize struct module. * lib.h (SLOT_CACHE_SIZE): New preprocessor symbol (slot_cache_line_t, slot_cache_t): New typedefs. (struct sym): New member, slot_cache. * lisplib.c (struct_set_entries, struct_instantiate): New static functions. (liplib_init): Register new functions in dl_table. parser.y (HASH_S): New terminal symbol. (struct): New grammar rule. (n_expr): Derive struct. (yybadtoken): Map HASH_S to #S string. parser.l (grammar): Recognize #S and return HASH_S token. share/txr/stdlib/place.tl (slot): New defplace. share/txr/stdlib/struct.tl: New file. struct.c: New file. struct.h: New file. * Makefile (OBJS): Adding struct.o.
* Allow slashes in regex passed to regex-parse.Kaz Kylheku2015-08-151-16/+15
| | | | | | | | | | | * parser.l (SREGEX): New start state, for stand-alone regex parsing. (grammar): All REGEX state rules are active in the SREGEX state also. The rule for the / character returns a REGCHAR if in the SREGEX state, so it is treated as an ordinary character. * txr.1: Updated regex-parse documentation about the treatment of the slash. Also added notes about double escaping when a string literal is passed to regex-parse.
* Floating-point constant tightening.Kaz Kylheku2015-08-121-8/+8
| | | | | | | | | | | | | * parser.l (grammar): Change order of rule which recognizes FLODOT with a one-character trailing context other than a dot, and the rule which diagnoses trailing junk. The issue is that this order gives the wrong interpretation to 123.E, treating it as 123. followed by E rather than trailing junk, like in the case of 123.0E or 123.B. * txr.1: Adding the valid example 1.E5. Removing references to dot as consing dot. Fixed documentation which says that 1.E is 1 followed by a consing dot and E. The wrong behavior in fact produced 1.0 followed by E. No consing dot semantics.
* Use new pushback token priming for single regex parse.Kaz Kylheku2015-08-121-7/+10
| | | | | | | | | | | | | | | | | | | | | | | * parser.h (enum prime_parser): New enum. (prime_parser, prime_scanner, parse): Declarations updated with new argument. * parser.c (prime_parser): New argument of enum prime_parser type Select appropriate secret token for regex and Lisp case. Pass prime selector down to prime_scanner. (regex_parse): Do not prepend secret escape to string. Do not use parse_once function; instead do the parser init and cleanup here and use the parse function. (lisp_parse): Pass new argument to parse, configuring the parser to be primed for Lisp parsing. * parser.l (grammar): Rule producing SECRET_ESCAPE_R removed. (prime_scanner): New argument. Pop the scanner state down to INITIAL. Then unconditionally switch to appopriate state based on priming configuration. * parser.y (parse): New argument for priming selection, passed down to prime parser.
* Crafting a better parser-priming hack.Kaz Kylheku2015-08-121-19/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The method of inserting a character sequence which generates a SECRET_TOKEN_E token is being replaced with a purely token based method. Because we don't manipulate the input stream, the lexer is not involved. We don't have to flush its state and deal with the carry-over of the yy_hold_char. This comes about because recent changes expose a weakness in the old scheme. Now that a top-level expression can have the form expr.expr, it means that the Yacc parser reads one token ahead, to see whether there is a dot or something else. This lookahead token is discarded. We must re-create it when we call yyparse again. This re-creation is done by creating a custom yylex function, which can maintain pushback tokens. We can prime this array of pushback tokens to generate the SECRET_TOKEN_E, as well as to re-inject the lookahead symbol that was thrown away by the previous yyparse. To know which lookahead symbol to re-inject is simple: the scanner just keeps a copy of the most recent token that it returns to the parser. When the parser returns, that token must be the lookahead one. The tokens we keep now in the parser structure are subject to garbage collection, and so we must mark them. Since the YYSTYPE union has no type field, a new API is opened up into the garbage collector to help implement a conservative GC technique. * gc.c (gc_is_heap_obj): New function. * gc.h (gc_is_heap_obj): Declared. * match.c: Include y.tab.h. This is now needed by any module that needs to instantiate a parser_t structure, because members of type YYSTYPE occur in the structure. (parser.h can still be included without y.tab.h, but only an incomplete declaration for the parser strucure is then given, and a few functions are not declared.) * parser.c (yy_tok_mark): New static function. (parser_mark): Mark the recent token and the pushback tokens. (parser_common_init): Initialize the recent token, the pushback tokens, and the pushback stack index. (pushback_token): New static function. (prime_parser): hold_byte argument removed. Body considerably simplified. The catenated stream trick is no longer required. All we do here is set up two pushback tokens and prime the scanner, if necessary, so it is in the right start state for Lisp. * parser.l (YY_DECL): Take over definition of scanning function, renaming to yylex_impl, so we can implement yylex. (grammar): Rule which produces SECRET_ESCAPE_E token removed. (reset_scanner): Function removed. (yylex): New function. * parser.h (struct parser): Now only forward-declared unless y.tab.h has been included. New members, recent_tok, tok_pushback and tok_idx. (yyset_hold_char): Declared. (reset_scanner): Declaration removed. (yylex): Declared (if y.tab.h included). (prime_parser): Declaration updated. (prime_scanner): Declared. * Makefile: express new dependency on existence of y.tab.h of txr.o, match.o and parser.o.
* Diagnose ambiguous floats like (a b).4 and x.y.5Kaz Kylheku2015-08-101-0/+30
| | | | | | | | These look like integers involved in qref dot syntax. * parser.l (DOTFLO): New pattern definition. (grammar): New rules for detecting cramped floating literals.
* Dot with no whitespace generates qref syntax.Kaz Kylheku2015-08-101-0/+11
| | | | | | | | | | | | | | | | | | | | | | | a.b.(expr ...).c -> (qref a b (expr ...) c) Consing dot requires whitespace. * eval.c (qref_s): New symbol global variable. (eval_init): Initialize qref_s. * eval.h (qref_s): Declared. * parser.l (REQWS): New pattern definition, required whitespace. (grammar): New rules to scan CONSDOT (space required on both sides) and LAMBDOT (space required after). * parser.y (CONSDOT, LAMBDOT): New token types. (list): (. n_expr) rule replaced with LAMBDOT and CONSDOT. (r_exprs): r_exprs . n_expr consing dot rule replaced with CONSDOT. (n_expr): New n_expr . n_expr rule introduced here for producing qref expressions. (yybadtoken): Handle CONSDOT and LAMBDOT. * txr.1: Documented qref dot.
* Handle abc: token syntax.Kaz Kylheku2015-08-101-2/+2
| | | | | | * parser.l (BTREG, NTREG): Allow an empty string symbol name with a nonempty package name. Without this, abc: parses as abc :.
* * eval.c (force): Default the new second argument of source_loc_str.Kaz Kylheku2015-08-041-4/+4
| | | | | | | | | | | | | | | | | | | (eval_error): Derive location of error from the last_form_evaled, if form doesn't have it. (eval_init): Re-register source-loc-str as binary with an optional arg. * match.c (debuglf, sem_error, file_err, typed_error): Default new argument of source_loc_str. * parser.h (source_loc_str): Declaration updated. * parser.l (source_loc_str): Take second argument which specifies alternative value if the source loc info is not found. * unwind.c (uw_throw): Simplify code thanks to source_loc_str default argument. * txr.1: Document new argument of source-loc-str.
* * parser.l (grammar): Do not allow unescaped newline inKaz Kylheku2015-07-231-1/+15
| | | | | | | | | | | word list literals and word list quasiliterals, except in <= 109 compatibility mode. An escaped newline in these literals, together with surrounding whitespace, now produces a single space, except in <= 109 compatibility mode. * txr.1: Documented new rules for WLL's and QLL's, and added compatibility notes.
* Bugfix: lexer loses unmatched "hold char" between top-level forms.Kaz Kylheku2015-07-101-1/+5
| | | | | | | | | | | | | | | | | | Test case: file containing 4(prinl 3). Scanner consumes 4 and (. The ( is lost when the scanner is reset for the next call to yyparse, resulting in jut prinl being read and interpreted as a variable. * parser.c (prime_parser): If present, append hold byte to priming string. Takes parser_t * instead of parser, and returns void now. * parser.l (reset_scanner): Now returns int value, the value of the scanner's yy_hold_char variable which is nonzero when the scanner is hanging on to an unmatched byte of input. * parser.h (reset_scanner, prime_parser): Declarations updated. * parser.y (parse): Pass hold byte returned by reset_scanner to prime_parser.
* Parser cleanup: embed scanner in parser.Kaz Kylheku2015-07-091-0/+8
| | | | | | | | | | | | | | | | | | | | | | | * parser.c (parser_destroy): New GC finalizer static function. (parser_ops): Register parser_destroy. (parser_common_init): New function, shared by parse and parse_once. Initializes embedded scanner. (parser_cleanup): New function, shared by parse_once and parser_destroy. (parser): Use parser_common_init. * parser.h (parser_t): New member, yyscan. (reset_scanner, parser_common_init): Declared. * parser.l (reset_scanner): New function. * parser.y (parse_once): Use parser_common_init, and thus perform only a few initializations. Do not define scanner as a local variable. (parse): Call reset_scanner instead of yylex_init since the scanner is being reused, and for the same reason do not call yylex_destroy. GC will do that now.
* Bugfix: allow @1 in brace variables.Kaz Kylheku2015-06-071-1/+1
| | | | | | | * parser.l (grammar): Scan a METANUM token in the BRACED state also. This allows us to correctly reference op arguments in a quasiliteral, as in `foo @{@1 [1..2] ","} bar`.
* Support trailing semicolon after hex/octal characters.Kaz Kylheku2015-07-021-2/+9
| | | | | | | | | | | | | | | | * parser.l (%option): Remove nounput option since we need yyunput. (grammar): Rule for matching hex and octal escape in SPECIAL state recognizes optional semicolon. In 109 compatibility, this is pushed back into the stream, otherwise consumed. * txr.1: Updated documentation, including compat notes. * genvim.txr (txr_char): Include optional semicolon in match. Corrected some errors where 8 and 9 were being included as matches for octal digits. (txr_error): Default match for \x or \o not followed by digits.
* Third round of quasiliteral-related fixes.Kaz Kylheku2015-06-261-0/+6
| | | | | | * parser.l (char_esc): Recognize \@ escape. (grammar): Add a rule for a \@ escape in quasiliterals, and quasi word list literals.
* Second round of quasiliteral-related fixes.Kaz Kylheku2015-06-261-1/+6
| | | | | | | | | * parser.l: Only shift to QSPECIAL state when @ is followed by a trailing context consisting of certain characters. Not every kind of Lisp object syntax can be introduced with @ in a quasiliteral. Adding a rule to produce an error when @ appears that is not followed by an allowed character.
* First round of quasiliteral-related fixes.Kaz Kylheku2015-06-261-7/+8
| | | | | | | | | | | | | | | | | | | | | * parser.l: Do not try to recognize floating-point literals in QSPECIAL state; that is not possible because @134.3 in a quasiliteral parses as a METANUM followed by ".3". On the other hand, recognize METANUM literals in QSPECIAL state, so that @@123 scans. Recognize @ as a token in QSPECIAL state, so @@abc will scan. When transitioning from QSILIT and QWLIT states to QSPECIAL upon scanning @, return a @ token, which is now parsed in the grammar. * parser.y (quasi_meta_helper): New static function. (q_var): Do not handle SYMTOK any more, only the braced variable syntax. SYMTOK is handled as a n_expr. Braced vars are handled with explicit '@' token, which is now produced by the scanner when it shifts from QSILIT to QSPECIAL. (quasi_item): No longer necessary to recognize various forms here such as quotes and splices. Just recognize a n_expr, preceded by '@'.
* Do not allow unrecognized escapes in regex.Kaz Kylheku2015-04-191-11/+14
| | | | | | | | | | | | * parser.l (REGOP): New regex alias for matching all regex special characters. (grammar): Several rules for regex special characters merged together. New rule introduced to match a special character after a backslash, making it literal. The old rule which makes literal any character after a backslash now throws an error, unless version 105 comaptibility is selected. * txr.1: Documented this behavior change.
* Allow quasiquotes in braces and quasiliterals, and quotes in braces.Kaz Kylheku2015-04-151-6/+1
| | | | | | | | | * parser.l: Consolidate rules for recognizing quote, unquote, and quasiquote. An effect of this is that quasiquotes can now occur in braces and in string quasiliterals. * parser.y (quasi_item): Support quotes and quasiquotes as quasi items: that is to say, i.e. objects denoted by @ in a quasiliteral.