diff options
-rw-r--r-- | COPYING | 128 | ||||
-rw-r--r-- | Makefile | 36 | ||||
-rw-r--r-- | README | 25 | ||||
-rw-r--r-- | awk.h | 255 | ||||
-rw-r--r-- | awk.tab.c | 1696 | ||||
-rw-r--r-- | awk.y | 898 | ||||
-rw-r--r-- | awk1.c | 723 | ||||
-rw-r--r-- | awk2.c | 1129 | ||||
-rw-r--r-- | awk3.c | 900 | ||||
-rw-r--r-- | debug.c | 485 | ||||
-rw-r--r-- | obstack.c | 157 | ||||
-rw-r--r-- | obstack.h | 204 | ||||
-rw-r--r-- | regex.c | 1689 | ||||
-rw-r--r-- | regex.h | 213 |
14 files changed, 8538 insertions, 0 deletions
diff --git a/COPYING b/COPYING new file mode 100644 index 00000000..d0f89166 --- /dev/null +++ b/COPYING @@ -0,0 +1,128 @@ + + GAWK GENERAL PUBLIC LICENSE + (Clarified 20 March 1987) + + Copyright (C) 1986 Richard M. Stallman + Everyone is permitted to copy and distribute verbatim copies + of this license, but changing it is not allowed. + + The license agreements of most software companies keep you at the +mercy of those companies. By contrast, our general public license is +intended to give everyone the right to share GAWK. To make sure that +you get the rights we want you to have, we need to make restrictions +that forbid anyone to deny you these rights or to ask you to surrender +the rights. Hence this license agreement. + + Specifically, we want to make sure that you have the right to give +away copies of GAWK, that you receive source code or else can get it +if you want it, that you can change GAWK or use pieces of it in new +free programs, and that you know you can do these things. + + To make sure that everyone has such rights, we have to forbid you to +deprive anyone else of these rights. For example, if you distribute +copies of GAWK, you must give the recipients all the rights that you +have. You must make sure that they, too, receive or can get the +source code. And you must tell them their rights. + + Also, for our own protection, we must make certain that everyone +finds out that there is no warranty for GAWK. If GAWK is modified by +someone else and passed on, we want its recipients to know that what +they have is not what we distributed, so that any problems introduced +by others will not reflect on our reputation. + + Therefore we (Richard Stallman and the Free Software Foundation, +Inc.) make the following terms which say what you must do to be +allowed to distribute or change GAWK. + + + COPYING POLICIES + + 1. You may copy and distribute verbatim copies of GAWK source code as +you receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy a valid copyright notice "Copyright +(C) 1986 Free Software Foundation, Inc." (or with the year updated if +that is appropriate); keep intact the notices on all files that refer +to this License Agreement and to the absence of any warranty; and give +any other recipients of the GAWK program a copy of this License +Agreement along with the program. You may charge a distribution fee +for the physical act of transferring a copy. + + 2. You may modify your copy or copies of GAWK or any portion of it, +and copy and distribute such modifications under the terms of +Paragraph 1 above, provided that you also do the following: + + a) cause the modified files to carry prominent notices stating + that you changed the files and the date of any change; and + + b) cause the whole of any work that you distribute or publish, + that in whole or in part contains or is a derivative of GAWK or any + part thereof, to be licensed at no charge to all third parties on + terms identical to those contained in this License Agreement + (except that you may choose to grant more extensive warranty + protection to third parties, at your option). + + c) You may charge a distribution fee for the physical act of + transferring a copy, and you may at your option offer warranty + protection in exchange for a fee. + + 3. You may copy and distribute GAWK or any portion of it in +compiled, executable or object code form under the terms of Paragraphs +1 and 2 above provided that you do the following: + + a) cause each such copy to be accompanied by the + corresponding machine-readable source code, which must + be distributed under the terms of Paragraphs 1 and 2 above; or, + + b) cause each such copy to be accompanied by a + written offer, with no time limit, to give any third party + free (except for a nominal shipping charge) a machine readable + copy of the corresponding source code, to be distributed + under the terms of Paragraphs 1 and 2 above; or, + + c) in the case of a recipient of GAWK in compiled, executable + or object code form (without the corresponding source code) you + shall cause copies you distribute to be accompanied by a copy + of the written offer of source code which you received along + with the copy you received. + + 4. You may not copy, sublicense, distribute or transfer GAWK +except as expressly provided under this License Agreement. Any attempt +otherwise to copy, sublicense, distribute or transfer GAWK is void and +your rights to use the program under this License agreement shall be +automatically terminated. However, parties who have received computer +software programs from you with this License Agreement will not have +their licenses terminated so long as such parties remain in full compliance. + + 5. If you wish to incorporate parts of GAWK into other free +programs whose distribution conditions are different, write to the Free +Software Foundation at 1000 Mass Ave, Cambridge, MA 02138. We have not yet +worked out a simple rule that can be stated here, but we will often permit +this. We will be guided by the two goals of preserving the free status of +all derivatives of our free software and of promoting the sharing and reuse +of software. + +In other words, go ahead and share GAWK, but don't try to stop +anyone else from sharing it farther. Help stamp out software hoarding! + + NO WARRANTY + + BECAUSE GAWK IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY NO +WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW. EXCEPT +WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC, +RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE GAWK "AS IS" WITHOUT +WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND +PERFORMANCE OF GAWK IS WITH YOU. SHOULD GAWK PROVE DEFECTIVE, YOU +ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M. +STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY +WHO MAY MODIFY AND REDISTRIBUTE GAWK AS PERMITTED ABOVE, BE LIABLE TO +YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR OTHER +SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR +INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA +BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR A +FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) GAWK, EVEN +IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES, OR FOR +ANY CLAIM BY ANY OTHER PARTY. diff --git a/Makefile b/Makefile new file mode 100644 index 00000000..e0f050cb --- /dev/null +++ b/Makefile @@ -0,0 +1,36 @@ +LIB=. +#CFLAGS=-O -pg -I$(LIB) -DFAST +CFLAGS=-g -I$(LIB) +LIBOBJS= $(LIB)/obstack.o $(LIB)/regex.o +OBJS = awk1.o awk2.o awk3.o awk.tab.o debug.o +gawk: $(OBJS) $(LIBOBJS) + $(CC) -o gawk $(CFLAGS) $(OBJS) $(LIBOBJS) -lm +$(OBJS): awk.h +awk.tab.c: awk.y + bison -v awk.y +# yacc -v awk.y +# mv y.tab.c awk.tab.c +lint: + lint -I$(LIB) awk.tab.c awk1.c awk2.c awk3.c +clean: + rm -f gawk *.o core awk.output awk.tab.c + +# We don't distribute shar files, but they're useful for mailing. +SHARS = COPYING Makefile README awk.h awk.tab.c awk.y awk1.c awk2.c awk3.c\ + debug.c obstack.h obstack.c regex.h regex.c + +awk.shar: $(SHARS) + shar -f awk -c $(SHARS) + +awk.tar: $(SHARS) + tar cvf awk.tar $(SHARS) + +awk.tar.Z: awk.tar + -@mv awk.tar.Z awk.tar.old.Z + compress < awk.tar > awk.tar.Z + +# This command probably won't be useful to the rest of the world, but makes +# life much easier for me. +dist: awk.tar awk.tar.Z + rcp awk.tar prep:/u2/emacs/awk.tar + rcp awk.tar.Z prep:/u2/emacs/awk.tar.Z @@ -0,0 +1,25 @@ +This is the Beta-test distribution of gawk. (Probably around version 1.01 +or so.) + +Please send all +bug-reports, comments, cries for help, etc, to hack@prep.ai.mit.edu +AKA mit-eddie!prep!hack During odd hours I can sometimes be reached at +(617) 253-8975, which is an MIT phone in the middle of the corridor, so don't +be suprised if someone wierd answers, or if the person on the other end has +never heard of me. (Direct them to the microvax about 10feet to their left.) + +Gawk requires some berkeleyisms, like alloca(), bcopy(), index(), etc. I +believe we have a portable version of alloca() (part of GNUemacs), and +probably the other stuff as well. Send me mail if you need anything. + +For real speed, you should change the Makefile to compile -O -DFAST and +disable the debugger. (-DFAST replaces some function calls with macros, and +disables a lot of debugging stuff.) + +If you don't have bison, modify the makefile to call yacc instead +(The proper commands should be already in the makefile; just un-comment them.) +If you have neither bison nor yacc, use the awk.tab.c file here. It was +generated with bison, and should have no AT+T code in it. (Note that +modifying awk.y without bison or yacc will be difficult, at best. You might +want to get a copy of bison from us too.) + @@ -0,0 +1,255 @@ +/* + * awk.h -- Definitions for gawk. + * + * Copyright (C) 1986 Free Software Foundation + * Written by Paul Rubin, August 1986 + * + */ + +/* +GAWK is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY. No author or distributor accepts responsibility to anyone +for the consequences of using it or for whether it serves any +particular purpose or works at all, unless he says so in writing. +Refer to the GAWK General Public License for full details. + +Everyone is granted permission to copy, modify and redistribute GAWK, +but only under the conditions described in the GAWK General Public +License. A copy of this license is supposed to have been given to you +along with GAWK so you can know your rights and responsibilities. It +should be in a file named COPYING. Among other things, the copyright +notice and this notice must be preserved on all copies. + +In other words, go ahead and share GAWK, but don't try to stop +anyone else from sharing it farther. Help stamp out software hoarding! +*/ + +#define AWKNUM float + +#include <ctype.h> +#define is_identchar(c) (isalnum(c) || (c) == '_') + +#include "obstack.h" +#define obstack_chunk_alloc malloc +#define obstack_chunk_free free +char *malloc(),*realloc(); +void free(); + +typedef enum { + /* illegal entry == 0 */ + Node_illegal, /* 0 */ + + /* binary operators lnode and rnode are the expressions to work on */ + Node_times, /* 1 */ + Node_quotient, /* 2 */ + Node_mod, /* 3 */ + Node_plus, /* 4 */ + Node_minus, /* 5 */ + Node_cond_pair, /* 6: conditional pair (see Node_line_range) jfw */ + Node_subscript, /* 7 */ + Node_concat, /* 8 */ + + /* unary operators subnode is the expression to work on */ + Node_preincrement, /* 9 */ + Node_predecrement, /* 10 */ + Node_postincrement, /* 11 */ + Node_postdecrement, /* 12 */ + Node_unary_minus, /* 13 */ + Node_field_spec, /* 14 */ + + /* assignments lnode is the var to assign to, rnode is the exp */ + Node_assign, /* 15 */ + Node_assign_times, /* 16 */ + Node_assign_quotient, /* 17 */ + Node_assign_mod, /* 18 */ + Node_assign_plus, /* 19 */ + Node_assign_minus, /* 20 */ + + /* boolean binaries lnode and rnode are expressions */ + Node_and, /* 21 */ + Node_or, /* 22 */ + + /* binary relationals compares lnode and rnode */ + Node_equal, /* 23 */ + Node_notequal, /* 24 */ + Node_less, /* 25 */ + Node_greater, /* 26 */ + Node_leq, /* 27 */ + Node_geq, /* 28 */ + + /* unary relationals works on subnode */ + Node_not, /* 29 */ + + /* match ops (binary) work on lnode and rnode ??? */ + Node_match, /* 30 */ + Node_nomatch, /* 31 */ + + /* data items */ + Node_string, /* 32 has stlen, stptr, and stref */ + Node_temp_string, /* 33 has stlen, stptr, and stref */ + Node_number, /* 34 has numbr */ + + /* program structures */ + Node_rule_list, /* 35 lnode is a rule, rnode is rest of list */ + Node_rule_node, /* 36 lnode is an conditional, rnode is statement */ + Node_statement_list, /* 37 lnode is a statement, rnode is more list */ + Node_if_branches, /* 38 lnode is to run on true, rnode on false */ + Node_expression_list, /* 39 lnode is an exp, rnode is more list */ + + /* keywords */ + Node_K_BEGIN, /* 40 no stuff */ + Node_K_END, /* 41 ditto */ + Node_K_if, /* 42 lnode is conditonal, rnode is if_branches */ + Node_K_while, /* 43 lnode is condtional, rnode is stuff to run */ + Node_K_for, /* 44 lnode is for_struct, rnode is stuff to run */ + Node_K_arrayfor, /* 45 lnode is for_struct, rnode is stuff to run */ + Node_K_break, /* 46 no subs */ + Node_K_continue, /* 47 no stuff */ + Node_K_print, /* 48 lnode is exp_list, rnode is redirect */ + Node_K_printf, /* 49 lnode is exp_list, rnode is redirect */ + Node_K_next, /* 59 no subs */ + Node_K_exit, /* 51 subnode is return value, or NULL */ + + /* I/O redirection for print statements */ + Node_redirect_output, /* 52 subnode is where to redirect */ + Node_redirect_append, /* 53 subnode is where to redirect */ + Node_redirect_pipe, /* 54 subnode is where to redirect */ + + /* Variables */ + Node_var, /* 55 rnode is value, lnode is array stuff */ + Node_var_array, /* 56 array is ptr to elements, asize num of eles */ + + /* Builtins subnode is explist to work on, proc is func to call */ + Node_builtin, /* 57 */ + + /* pattern: conditional ',' conditional ; lnode of Node_line_range is + * the two conditionals (Node_cond_pair), other word (rnode place) is + * a flag indicating whether or not this range has been entered. + * (jfw@eddie.mit.edu) + */ + Node_line_range, /* 58 */ +} NODETYPE; + +typedef struct exp_node { + NODETYPE type; + union { + struct { + struct exp_node *lptr; + union { + struct exp_node *rptr; + struct exp_node *(* pptr)(); + struct re_pattern_buffer *preg; + struct for_loop_header *hd; + struct ahash **av; + int r_ent; /* range entered (jfw) */ + } r; + } nodep; + struct { + struct exp_node **ap; + int as; + } ar; + struct { + char *sp; + short slen,sref; + } str; + AWKNUM fltnum; + } sub; +} NODE; + +#define lnode sub.nodep.lptr +#define rnode sub.nodep.r.rptr + +#define subnode lnode +#define proc sub.nodep.r.pptr + +#define reexp lnode +#define rereg sub.nodep.r.preg + +#define forsub lnode +#define forloop sub.nodep.r.hd + +#define array sub.ar.ap +#define arrsiz sub.ar.as + +#define stptr sub.str.sp +#define stlen sub.str.slen +#define stref sub.str.sref + +#define numbr sub.fltnum + +#define var_value lnode +#define var_array sub.nodep.r.av + +#define condpair lnode +#define triggered sub.nodep.r.r_ent + +NODE *newnode(), *dupnode(); +NODE *node(), *snode(), *make_number(), *make_string(); +NODE *mkrangenode(); /* to remove the temptation to use sub.nodep.r.rptr + * as a boolean flag, or to call node() with a 0 and + * hope that it will store correctly as an int. (jfw) + */ +NODE *tmp_string(),*tmp_number(); +NODE *variable(), *append_right(); + +NODE *tree_eval(); + +struct re_pattern_buffer *make_regexp(); + +extern NODE *Nnull_string; + +#ifdef FAST +double atof(); +NODE *strforce(); +#define force_number(x) ((x)->type==Node_number ? (x)->numbr : atof((x)->stptr)) +#define force_string(x) ((x)->type==Node_number ? (strforce(x)) : (x)) +#define tmp_node(ty) (global_tmp=(NODE *)obstack_alloc(&temp_strings,sizeof(NODE)),global_tmp->type=ty) +#define tmp_number(n) (tmp_node(Node_number),global_tmp->numbr=(n),global_tmp) +/* #define tmp_string(s,len) (tmp_node(Node_temp_string),global_tmp->stref=1,global_tmp->stlen=len,global_tmp->stptr=(char *)obstack_alloc(&temp_strings,len+1),bcopy(s,global_tmp->stptr,len),global_tmp->stptr[len]='\0',global_tmp) */ +NODE *global_tmp; +#else +AWKNUM force_number(); +NODE *force_string(); +#endif + +NODE *expression_value; + +#define HASHSIZE 101 + +typedef struct hashnode HASHNODE; +struct hashnode { + HASHNODE *next; + char *name; + int length; + NODE *value; +} *variables[HASHSIZE]; + + +typedef struct ahash AHASH; +struct ahash { + AHASH *next; + NODE *name, + *symbol, + *value; +}; + + +typedef struct for_loop_header { + NODE *init; + NODE *cond; + NODE *incr; +} FOR_LOOP_HEADER; + +FOR_LOOP_HEADER *make_for_loop(); + +#define ADD_ONE_REFERENCE(s) ++(s)->stref +#define FREE_ONE_REFERENCE(s) {\ + if(s==Nnull_string) {\ + fprintf(stderr,"Free_Nnull_string %d",(s)->stref);\ + }\ + if (--(s)->stref == 0) {\ + free((char *)((s)->stptr));\ + free((char *)s);\ + }\ +} +/* #define FREE_ONE_REFERENCE(s) {if (--(s)->stref == 0) {printf("FREE %x\n",s);free((s)->stptr);free(s);}} */ diff --git a/awk.tab.c b/awk.tab.c new file mode 100644 index 00000000..f0f43df4 --- /dev/null +++ b/awk.tab.c @@ -0,0 +1,1696 @@ + +/* A Bison parser, made from awk.y */ + +#define NAME 258 +#define REGEXP 259 +#define YSTRING 260 +#define ERROR 261 +#define INCDEC 262 +#define NUMBER 263 +#define ASSIGNOP 264 +#define RELOP 265 +#define MATCHOP 266 +#define NEWLINE 267 +#define REDIRECT_OP 268 +#define CONCAT_OP 269 +#define LEX_BEGIN 270 +#define LEX_END 271 +#define LEX_IF 272 +#define LEX_ELSE 273 +#define LEX_WHILE 274 +#define LEX_FOR 275 +#define LEX_BREAK 276 +#define LEX_CONTINUE 277 +#define LEX_PRINT 278 +#define LEX_PRINTF 279 +#define LEX_NEXT 280 +#define LEX_EXIT 281 +#define LEX_IN 282 +#define LEX_AND 283 +#define LEX_OR 284 +#define INCREMENT 285 +#define DECREMENT 286 +#define LEX_BUILTIN 287 +#define UNARY 288 + +#line 27 "awk.y" + +#define YYDEBUG 12 + +#include <stdio.h> +#include "awk.h" + + static int yylex (); + + + /* + * The following variable is used for a very sickening thing. + * The awk language uses white space as the string concatenation + * operator, but having a white space token that would have to appear + * everywhere in all the grammar rules would be unbearable. + * It turns out we can return CONCAT_OP exactly when there really + * is one, just from knowing what kinds of other tokens it can appear + * between (namely, constants, variables, or close parentheses). + * This is because concatenation has the lowest priority of all + * operators. want_concat_token is used to remember that something + * that could be the left side of a concat has just been returned. + * + * If anyone knows a cleaner way to do this (don't look at the Un*x + * code to find one, though), please suggest it. + */ + static int want_concat_token; + + /* Two more horrible kludges. The same comment applies to these two too */ + static int want_regexp; /* lexical scanning kludge */ + static int want_redirect; /* similarly */ + int lineno = 1; /* JF for error msgs */ + +/* During parsing of a gawk program, the pointer to the next character + is in this variable. */ + char *lexptr; /* JF moved it up here */ + char *lexptr_begin; /* JF for error msgs */ + +#line 64 "awk.y" +typedef union { + long lval; + AWKNUM fval; + NODE *nodeval; + NODETYPE nodetypeval; + char *sval; + NODE *(*ptrval)(); +} YYSTYPE; + +#ifndef YYLTYPE +typedef + struct yyltype + { + int timestamp; + int first_line; + int first_column; + int last_line; + int last_column; + char *text; + } + yyltype; + +#define YYLTYPE yyltype +#endif + +#define YYACCEPT return(0) +#define YYABORT return(1) +#define YYERROR return(1) +#include <stdio.h> + +#ifndef __STDC__ +#define const +#endif + + + +#define YYFINAL 200 +#define YYFLAG -32768 +#define YYNTBASE 49 + +#define YYTRANSLATE(x) (yytranslate[x]) + +static const char yytranslate[] = { 0, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 40, 2, 2, 48, 37, 2, 2, 41, + 42, 35, 33, 39, 34, 2, 36, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 45, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 46, 2, 47, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 43, 2, 44, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 1, 2, 3, 4, 5, + 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, + 26, 27, 28, 29, 30, 31, 32, 38 +}; + +static const short yyrline[] = { 0, + 105, 110, 112, 117, 122, 124, 126, 131, 134, 136, + 138, 140, 142, 151, 153, 158, 160, 164, 166, 171, + 173, 178, 180, 182, 186, 189, 193, 196, 197, 198, + 199, 201, 204, 206, 208, 210, 212, 214, 217, 220, + 222, 228, 230, 236, 239, 244, 246, 248, 250, 255, + 259, 265, 267, 271, 276, 282, 284, 288, 291, 293, + 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, + 319, 323, 325, 327, 329, 331, 334, 336, 340, 342, + 344, 346, 348, 350, 352, 354, 356, 358, 360, 364, + 366, 368, 370, 372, 375, 379, 382, 384 +}; + +static const char * yytname[] = { 0, +"error","$illegal.","NAME","REGEXP","YSTRING","ERROR","INCDEC","NUMBER","ASSIGNOP","RELOP", +"MATCHOP","NEWLINE","REDIRECT_OP","CONCAT_OP","LEX_BEGIN","LEX_END","LEX_IF","LEX_ELSE","LEX_WHILE","LEX_FOR", +"LEX_BREAK","LEX_CONTINUE","LEX_PRINT","LEX_PRINTF","LEX_NEXT","LEX_EXIT","LEX_IN","LEX_AND","LEX_OR","INCREMENT", +"DECREMENT","LEX_BUILTIN","'+'","'-'","'*'","'/'","'%'","UNARY","','","'!'", +"'('","')'","'{'","'}'","';'","'['","']'","'$'","start" +}; + +static const short yyr1[] = { 0, + 49, 50, 50, 51, 52, 52, 52, 53, 53, 53, + 53, 53, 53, 54, 53, 55, 53, 53, 53, 56, + 56, 57, 57, 57, 58, 58, 59, 59, 59, 59, + 59, 60, 60, 60, 60, 60, 60, 60, 60, 61, + 60, 62, 60, 63, 60, 60, 60, 60, 60, 64, + 64, 65, 65, 66, 66, 67, 67, 68, 68, 68, + 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, + 69, 69, 69, 69, 69, 69, 69, 69, 70, 70, + 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, + 70, 70, 70, 70, 70, 71, 71, 71 +}; + +static const short yyr2[] = { 0, + 2, 1, 2, 4, 0, 1, 3, 1, 1, 2, + 3, 3, 3, 0, 4, 0, 6, 3, 1, 0, + 4, 0, 1, 2, 2, 2, 0, 1, 1, 2, + 2, 5, 1, 6, 10, 9, 9, 2, 2, 0, + 5, 0, 5, 0, 7, 2, 2, 5, 2, 6, + 9, 0, 2, 0, 2, 0, 1, 0, 1, 3, + 4, 1, 3, 2, 2, 2, 2, 2, 1, 1, + 1, 3, 3, 3, 3, 3, 3, 3, 4, 1, + 3, 2, 2, 2, 2, 2, 1, 1, 1, 3, + 3, 3, 3, 3, 3, 1, 4, 2 +}; + +static const short yydefact[] = { 52, + 5, 96, 71, 70, 53, 8, 9, 0, 0, 62, + 0, 14, 0, 0, 0, 5, 2, 20, 6, 19, + 69, 0, 65, 66, 58, 0, 64, 0, 10, 0, + 19, 89, 88, 0, 0, 80, 0, 0, 98, 87, + 3, 27, 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 67, 68, 0, 0, 59, + 0, 0, 13, 63, 83, 84, 58, 82, 0, 0, + 0, 0, 0, 0, 0, 85, 86, 29, 28, 22, + 52, 11, 12, 7, 18, 16, 77, 75, 76, 72, + 73, 74, 78, 97, 0, 61, 15, 0, 81, 95, + 93, 94, 90, 91, 92, 31, 30, 0, 0, 0, + 0, 0, 40, 42, 0, 0, 27, 0, 23, 33, + 0, 4, 0, 60, 79, 0, 0, 56, 52, 52, + 38, 39, 58, 58, 58, 46, 0, 47, 22, 21, + 24, 49, 0, 0, 0, 96, 0, 57, 25, 26, + 54, 0, 54, 0, 0, 17, 27, 27, 0, 0, + 0, 0, 44, 0, 0, 27, 0, 0, 0, 56, + 0, 55, 41, 54, 43, 48, 32, 50, 34, 0, + 0, 56, 0, 27, 27, 27, 0, 45, 0, 0, + 0, 27, 51, 37, 36, 0, 35, 0, 0, 0 +}; + +static const short yydefgoto[] = { 198, + 16, 17, 18, 19, 28, 123, 43, 118, 131, 80, + 119, 133, 135, 174, 120, 1, 162, 147, 59, 121, + 39, 21 +}; + +static const short yypact[] = {-32768, + 262, -29,-32768,-32768,-32768,-32768,-32768, 11, 11, 6, + 313,-32768, 326, 326, 345, 142,-32768, 17, 243, 399, + 31, 313,-32768,-32768, 313, 313,-32768, 61,-32768, 35, + 129,-32768,-32768, 11, 11, 39, 313, 313,-32768, 157, +-32768, 99, 81, 326, 326, 326, 313, 64, 313, 313, + 313, 313, 313, 313, 313,-32768,-32768, 72, -20, 254, + 87, 105,-32768,-32768,-32768,-32768, 313,-32768, 369, 313, + 313, 313, 313, 313, 313,-32768,-32768,-32768,-32768, 172, +-32768,-32768, 107, 89, 254,-32768, 336, 164, 164,-32768, +-32768,-32768, 254,-32768, 313,-32768,-32768, 46,-32768, 336, + 164, 164,-32768,-32768,-32768,-32768,-32768, 111, 114, 115, + -6, -6,-32768, 120, -6, 71, 99, 202,-32768,-32768, + -7, 156, 166, 254,-32768, 326, 326, 360,-32768,-32768, +-32768,-32768, 313, 313, 313,-32768, 313,-32768, 172,-32768, +-32768,-32768, 143, 109, 125, -5, 140, 254, 156, 156, + 3, 53, 3, 381, 232,-32768, 99, 99, 181, 292, + 313, -6,-32768, -6, -6, 99, 172, 172, 206, 313, + -25, 254,-32768, 198,-32768,-32768, 132, 194,-32768, 174, + 175, 313, -6, 99, 99, 99, 176,-32768, 172, 172, + 172, 99,-32768,-32768,-32768, 172,-32768, 214, 229,-32768 +}; + +static const short yypgoto[] = {-32768, +-32768, 215,-32768, -12,-32768,-32768,-32768, 91, -34, -82, + -100,-32768,-32768,-32768,-32768, -73, -95, -159, -36, -1, +-32768, 304 +}; + + +#define YYLAST 436 + + +static const short yytable[] = { 20, + 29, 30, 44, 45, 129, 129, 49, 122, 159, 27, + 181, 20, 31, 2, 20, 161, 22, 141, 95, 182, + 58, 96, 187, 60, 61, 50, 51, 52, 53, 54, + 98, 82, 83, 84, 139, 68, 69, 130, 130, 55, + 22, 95, 20, 20, 20, 85, 25, 87, 88, 89, + 90, 91, 92, 93, 141, 149, 150, 164, 15, 42, + 56, 57, 44, 45, 62, 60, 178, 179, 100, 101, + 102, 103, 104, 105, 167, 168, 63, 132, 183, 67, + 136, 138, 129, 177, 95, 49, 142, 125, 193, 194, + 195, 95, 81, 124, 163, 197, 151, 152, 153, 86, + 49, 189, 190, 191, 50, 51, 52, 53, 54, 196, + 78, 137, 79, 144, 145, 130, 44, 45, 94, 50, + 51, 52, 53, 54, 20, 20, 148, 173, 64, 175, + 176, 60, 60, 60, 44, 154, 44, 45, 47, 48, + 97, -1, 49, 106, 2, 107, 3, 171, 188, 4, + 157, 126, 44, 45, 127, 128, 6, 7, 20, 172, + 134, 50, 51, 52, 53, 54, 158, 5, 148, 143, + 64, 8, 9, 10, 2, 11, 3, 12, 156, 4, + 148, 13, 14, 106, 160, 107, 76, 77, 108, 15, + 109, 110, 111, 112, 113, 114, 115, 116, 52, 53, + 54, 8, 9, 10, 2, 11, 3, 169, 180, 4, + 161, 184, 26, 199, 117, 185, 186, 192, 108, 15, + 109, 110, 111, 112, 113, 114, 115, 116, 200, 155, + 41, 8, 9, 10, 2, 11, 3, 0, 0, 4, + 0, 0, 26, 0, 117, 140, 0, 0, 108, 15, + 109, 110, 111, 112, 113, 114, 115, 116, 0, 0, + 0, 8, 9, 10, 2, 11, 3, 49, 0, 4, + 44, 45, 26, 5, 117, 166, 6, 7, 0, 15, + 0, 46, 0, 0, 0, 0, 50, 51, 52, 53, + 54, 8, 9, 10, 2, 11, 3, 12, 0, 4, + 0, 13, 14, 0, 0, 0, 6, 7, 0, 15, + 0, 23, 24, 0, 0, 2, 0, 3, 40, 0, + 4, 8, 9, 10, 0, 11, 0, 12, 2, 0, + 3, 13, 14, 4, 0, 0, 170, 65, 66, 15, + 6, 7, 8, 9, 10, 0, 11, 2, 0, 32, + 0, 0, 33, 26, 0, 8, 9, 10, 0, 11, + 15, 12, 146, 0, 3, 13, 14, 4, 50, 51, + 52, 53, 54, 15, 34, 35, 36, 0, 37, 0, + 0, 0, 49, 0, 0, 38, 0, 0, 0, 8, + 9, 10, 15, 11, 49, 0, 0, 0, 0, 0, + 26, 50, 51, 52, 53, 54, 0, 15, 47, 48, + 99, 0, 49, 50, 51, 52, 53, 54, 0, 0, + 0, 0, 165, 0, 0, 0, 0, 0, 0, 0, + 0, 50, 51, 52, 53, 54 +}; + +static const short yycheck[] = { 1, + 13, 14, 28, 29, 12, 12, 14, 81, 14, 11, + 170, 13, 14, 3, 16, 13, 46, 118, 39, 45, + 22, 42, 182, 25, 26, 33, 34, 35, 36, 37, + 67, 44, 45, 46, 117, 37, 38, 45, 45, 9, + 46, 39, 44, 45, 46, 47, 41, 49, 50, 51, + 52, 53, 54, 55, 155, 129, 130, 153, 48, 43, + 30, 31, 28, 29, 4, 67, 167, 168, 70, 71, + 72, 73, 74, 75, 157, 158, 42, 112, 174, 41, + 115, 116, 12, 166, 39, 14, 121, 42, 189, 190, + 191, 39, 12, 95, 42, 196, 133, 134, 135, 36, + 14, 184, 185, 186, 33, 34, 35, 36, 37, 192, + 12, 41, 14, 126, 127, 45, 28, 29, 47, 33, + 34, 35, 36, 37, 126, 127, 128, 162, 42, 164, + 165, 133, 134, 135, 28, 137, 28, 29, 10, 11, + 36, 0, 14, 12, 3, 14, 5, 160, 183, 8, + 42, 41, 28, 29, 41, 41, 15, 16, 160, 161, + 41, 33, 34, 35, 36, 37, 42, 12, 170, 4, + 42, 30, 31, 32, 3, 34, 5, 36, 36, 8, + 182, 40, 41, 12, 45, 14, 30, 31, 17, 48, + 19, 20, 21, 22, 23, 24, 25, 26, 35, 36, + 37, 30, 31, 32, 3, 34, 5, 27, 3, 8, + 13, 18, 41, 0, 43, 42, 42, 42, 17, 48, + 19, 20, 21, 22, 23, 24, 25, 26, 0, 139, + 16, 30, 31, 32, 3, 34, 5, -1, -1, 8, + -1, -1, 41, -1, 43, 44, -1, -1, 17, 48, + 19, 20, 21, 22, 23, 24, 25, 26, -1, -1, + -1, 30, 31, 32, 3, 34, 5, 14, -1, 8, + 28, 29, 41, 12, 43, 44, 15, 16, -1, 48, + -1, 39, -1, -1, -1, -1, 33, 34, 35, 36, + 37, 30, 31, 32, 3, 34, 5, 36, -1, 8, + -1, 40, 41, -1, -1, -1, 15, 16, -1, 48, + -1, 8, 9, -1, -1, 3, -1, 5, 15, -1, + 8, 30, 31, 32, -1, 34, -1, 36, 3, -1, + 5, 40, 41, 8, -1, -1, 45, 34, 35, 48, + 15, 16, 30, 31, 32, -1, 34, 3, -1, 5, + -1, -1, 8, 41, -1, 30, 31, 32, -1, 34, + 48, 36, 3, -1, 5, 40, 41, 8, 33, 34, + 35, 36, 37, 48, 30, 31, 32, -1, 34, -1, + -1, -1, 14, -1, -1, 41, -1, -1, -1, 30, + 31, 32, 48, 34, 14, -1, -1, -1, -1, -1, + 41, 33, 34, 35, 36, 37, -1, 48, 10, 11, + 42, -1, 14, 33, 34, 35, 36, 37, -1, -1, + -1, -1, 42, -1, -1, -1, -1, -1, -1, -1, + -1, 33, 34, 35, 36, 37 +}; +#define YYPURE 1 + +#line 2 "bison.simple" + +/* Skeleton output parser for bison, + copyright (C) 1984 Bob Corbett and Richard Stallman + + Permission is granted to anyone to make or distribute verbatim copies of this program + provided that the copyright notice and this permission notice are preserved; + and provided that the recipient is not asked to waive or limit his right to + redistribute copies as permitted by this permission notice; + and provided that anyone possessing an executable copy + is granted access to copy the source code, in machine-readable form, + in some reasonable manner. + + Permission is granted to distribute derived works or enhanced versions of + this program under the above conditions with the additional condition + that the entire derivative or enhanced work + must be covered by a permission notice identical to this one. + + Anything distributed as part of a package containing portions derived + from this program, which cannot in current practice perform its function usefully + in the absense of what was derived directly from this program, + is to be considered as forming, together with the latter, + a single work derived from this program, + which must be entirely covered by a permission notice identical to this one + in order for distribution of the package to be permitted. + + In other words, you are welcome to use, share and improve this program. + You are forbidden to forbid anyone else to use, share and improve + what you give them. Help stamp out software-hoarding! */ + +/* This is the parser code that is written into each bison parser + when the %semantic_parser declaration is not specified in the grammar. + It was written by Richard Stallman by simplifying the hairy parser + used when %semantic_parser is specified. */ + +/* Note: there must be only one dollar sign in this file. + It is replaced by the list of actions, each action + as one case of the switch. */ + +#define yyerrok (yyerrstatus = 0) +#define yyclearin (yychar = YYEMPTY) +#define YYEMPTY -2 +#define YYEOF 0 +#define YYFAIL goto yyerrlab; + +#define YYTERROR 1 + +#ifndef YYIMPURE +#define YYLEX yylex() +#endif + +#ifndef YYPURE +#define YYLEX yylex(&yylval, &yylloc) +#endif + +/* If nonreentrant, generate the variables here */ + +#ifndef YYIMPURE + +int yychar; /* the lookahead symbol */ +YYSTYPE yylval; /* the semantic value of the */ + /* lookahead symbol */ + +YYLTYPE yylloc; /* location data for the lookahead */ + /* symbol */ + +int yydebug = 0; /* nonzero means print parse trace */ + +#endif /* YYIMPURE */ + + +/* YYMAXDEPTH indicates the initial size of the parser's stacks */ + +#ifndef YYMAXDEPTH +#define YYMAXDEPTH 200 +#endif + +/* YYMAXLIMIT is the maximum size the stacks can grow to + (effective only if the built-in stack extension method is used). */ + +#ifndef YYMAXLIMIT +#define YYMAXLIMIT 10000 +#endif + + +#line 87 "bison.simple" +int +yyparse() +{ + register int yystate; + register int yyn; + register short *yyssp; + register YYSTYPE *yyvsp; + YYLTYPE *yylsp; + int yyerrstatus; /* number of tokens to shift before error messages enabled */ + int yychar1; /* lookahead token as an internal (translated) token number */ + + short yyssa[YYMAXDEPTH]; /* the state stack */ + YYSTYPE yyvsa[YYMAXDEPTH]; /* the semantic value stack */ + YYLTYPE yylsa[YYMAXDEPTH]; /* the location stack */ + + short *yyss = yyssa; /* refer to the stacks thru separate pointers */ + YYSTYPE *yyvs = yyvsa; /* to allow yyoverflow to reallocate them elsewhere */ + YYLTYPE *yyls = yylsa; + + int yymaxdepth = YYMAXDEPTH; + +#ifndef YYPURE + + int yychar; + YYSTYPE yylval; + YYLTYPE yylloc; + + extern int yydebug; + +#endif + + + YYSTYPE yyval; /* the variable used to return */ + /* semantic values from the action */ + /* routines */ + + int yylen; + + if (yydebug) + fprintf(stderr, "Starting parse\n"); + + yystate = 0; + yyerrstatus = 0; + yychar = YYEMPTY; /* Cause a token to be read. */ + + /* Initialize stack pointers. + Waste one element of value and location stack + so that they stay on the same level as the state stack. */ + + yyssp = yyss - 1; + yyvsp = yyvs; + yylsp = yyls; + +/* Push a new state, which is found in yystate . */ +/* In all cases, when you get here, the value and location stacks + have just been pushed. so pushing a state here evens the stacks. */ +yynewstate: + + *++yyssp = yystate; + + if (yyssp >= yyss + yymaxdepth - 1) + { + /* Give user a chance to reallocate the stack */ + /* Use copies of these so that the &'s don't force the real ones into memory. */ + YYSTYPE *yyvs1 = yyvs; + YYLTYPE *yyls1 = yyls; + short *yyss1 = yyss; + + /* Get the current used size of the three stacks, in elements. */ + int size = yyssp - yyss + 1; + +#ifdef yyoverflow + /* Each stack pointer address is followed by the size of + the data in use in that stack, in bytes. */ + yyoverflow("parser stack overflow", + &yyss1, size * sizeof (*yyssp), + &yyvs1, size * sizeof (*yyvsp), + &yyls1, size * sizeof (*yylsp), + &yymaxdepth); + + yyss = yyss1; yyvs = yyvs1; yyls = yyls1; +#else /* no yyoverflow */ + /* Extend the stack our own way. */ + if (yymaxdepth >= YYMAXLIMIT) + yyerror("parser stack overflow"); + yymaxdepth *= 2; + if (yymaxdepth > YYMAXLIMIT) + yymaxdepth = YYMAXLIMIT; + yyss = (short *) alloca (yymaxdepth * sizeof (*yyssp)); + bcopy ((char *)yyss1, (char *)yyss, size * sizeof (*yyssp)); + yyls = (YYLTYPE *) alloca (yymaxdepth * sizeof (*yylsp)); + bcopy ((char *)yyls1, (char *)yyls, size * sizeof (*yylsp)); + yyvs = (YYSTYPE *) alloca (yymaxdepth * sizeof (*yyvsp)); + bcopy ((char *)yyvs1, (char *)yyvs, size * sizeof (*yyvsp)); +#endif /* no yyoverflow */ + + yyssp = yyss + size - 1; + yylsp = yyls + size - 1; + yyvsp = yyvs + size - 1; + + if (yydebug) + fprintf(stderr, "Stack size increased to %d\n", yymaxdepth); + + if (yyssp >= yyss + yymaxdepth - 1) + YYERROR; + } + + if (yydebug) + fprintf(stderr, "Entering state %d\n", yystate); + +/* Do appropriate processing given the current state. */ +/* Read a lookahead token if we need one and don't already have one. */ +yyresume: + + /* First try to decide what to do without reference to lookahead token. */ + + yyn = yypact[yystate]; + if (yyn == YYFLAG) + goto yydefault; + + /* Not known => get a lookahead token if don't already have one. */ + + /* yychar is either YYEMPTY or YYEOF + or a valid token in external form. */ + + if (yychar == YYEMPTY) + { + yychar = YYLEX; + } + + /* Convert token to internal form (in yychar1) for indexing tables with */ + + if (yychar <= 0) /* This means end of input. */ + { + yychar1 = 0; + yychar = YYEOF; /* Don't call YYLEX any more */ + + if (yydebug) + fprintf(stderr, "Now at end of input.\n"); + } + else + { + yychar1 = YYTRANSLATE(yychar); + + if (yydebug) + fprintf(stderr, "Parsing next token; it is %d (%s)\n", yychar, yytname[yychar1]); + } + + yyn += yychar1; + if (yyn < 0 || yyn > YYLAST || yycheck[yyn] != yychar1) + goto yydefault; + + yyn = yytable[yyn]; + + /* yyn is what to do for this token type in this state. + Negative => reduce, -yyn is rule number. + Positive => shift, yyn is new state. + New state is final state => don't bother to shift, + just return success. + 0, or most negative number => error. */ + + if (yyn < 0) + { + if (yyn == YYFLAG) + goto yyerrlab; + yyn = -yyn; + goto yyreduce; + } + else if (yyn == 0) + goto yyerrlab; + + if (yyn == YYFINAL) + YYACCEPT; + + /* Shift the lookahead token. */ + + if (yydebug) + fprintf(stderr, "Shifting token %d (%s), ", yychar, yytname[yychar1]); + + /* Discard the token being shifted unless it is eof. */ + if (yychar != YYEOF) + yychar = YYEMPTY; + + *++yyvsp = yylval; + *++yylsp = yylloc; + + /* count tokens shifted since error; after three, turn off error status. */ + if (yyerrstatus) yyerrstatus--; + + yystate = yyn; + goto yynewstate; + +/* Do the default action for the current state. */ +yydefault: + + yyn = yydefact[yystate]; + if (yyn == 0) + goto yyerrlab; + +/* Do a reduction. yyn is the number of a rule to reduce with. */ +yyreduce: + yylen = yyr2[yyn]; + yyval = yyvsp[1-yylen]; /* implement default value of the action */ + + if (yydebug) + { + if (yylen == 1) + fprintf (stderr, "Reducing 1 value via line %d, ", + yyrline[yyn]); + else + fprintf (stderr, "Reducing %d values via line %d, ", + yylen, yyrline[yyn]); + } + + + switch (yyn) { + +case 1: +#line 106 "awk.y" +{ expression_value = yyvsp[0].nodeval; ; + break;} +case 2: +#line 111 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_rule_list,(NODE *) NULL); ; + break;} +case 3: +#line 114 "awk.y" +{ yyval.nodeval = append_right (yyvsp[-1].nodeval, node(yyvsp[0].nodeval, Node_rule_list,(NODE *) NULL)); ; + break;} +case 4: +#line 118 "awk.y" +{ yyval.nodeval = node (yyvsp[-3].nodeval, Node_rule_node, yyvsp[-2].nodeval); ; + break;} +case 5: +#line 123 "awk.y" +{ yyval.nodeval = NULL; ; + break;} +case 6: +#line 125 "awk.y" +{ yyval.nodeval = yyvsp[0].nodeval; ; + break;} +case 7: +#line 127 "awk.y" +{ yyval.nodeval = mkrangenode ( node(yyvsp[-2].nodeval, Node_cond_pair, yyvsp[0].nodeval) ); ; + break;} +case 8: +#line 133 "awk.y" +{ yyval.nodeval = node ((NODE *)NULL, Node_K_BEGIN,(NODE *) NULL); ; + break;} +case 9: +#line 135 "awk.y" +{ yyval.nodeval = node ((NODE *)NULL, Node_K_END,(NODE *) NULL); ; + break;} +case 10: +#line 137 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_not,(NODE *) NULL); ; + break;} +case 11: +#line 139 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_and, yyvsp[0].nodeval); ; + break;} +case 12: +#line 141 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_or, yyvsp[0].nodeval); ; + break;} +case 13: +#line 143 "awk.y" +{ + yyval.nodeval = yyvsp[-1].nodeval; + want_concat_token = 0; + ; + break;} +case 14: +#line 152 "awk.y" +{ ++want_regexp; ; + break;} +case 15: +#line 154 "awk.y" +{ want_regexp = 0; + yyval.nodeval = node (node (make_number ((AWKNUM)0), Node_field_spec, (NODE *)NULL), + Node_match, (NODE *)make_regexp (yyvsp[-1].sval)); + ; + break;} +case 16: +#line 159 "awk.y" +{ ++want_regexp; ; + break;} +case 17: +#line 161 "awk.y" +{ want_regexp = 0; + yyval.nodeval = node (yyvsp[-5].nodeval, yyvsp[-4].nodetypeval, (NODE *)make_regexp(yyvsp[-1].sval)); + ; + break;} +case 18: +#line 165 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, yyvsp[-1].nodetypeval, yyvsp[0].nodeval); ; + break;} +case 19: +#line 167 "awk.y" +{ yyval.nodeval = yyvsp[0].nodeval; ; + break;} +case 20: +#line 172 "awk.y" +{ yyval.nodeval = NULL; ; + break;} +case 21: +#line 174 "awk.y" +{ yyval.nodeval = yyvsp[-1].nodeval; ; + break;} +case 22: +#line 179 "awk.y" +{ yyval.nodeval = NULL; ; + break;} +case 23: +#line 181 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_statement_list, (NODE *)NULL); ; + break;} +case 24: +#line 183 "awk.y" +{ yyval.nodeval = append_right(yyvsp[-1].nodeval, node( yyvsp[0].nodeval, Node_statement_list, (NODE *)NULL)); ; + break;} +case 25: +#line 188 "awk.y" +{ yyval.nodetypeval = Node_illegal; ; + break;} +case 26: +#line 190 "awk.y" +{ yyval.nodetypeval = Node_illegal; ; + break;} +case 27: +#line 195 "awk.y" +{ yyval.nodetypeval = Node_illegal; ; + break;} +case 32: +#line 203 "awk.y" +{ yyval.nodeval = yyvsp[-2].nodeval; ; + break;} +case 33: +#line 205 "awk.y" +{ yyval.nodeval = yyvsp[0].nodeval; ; + break;} +case 34: +#line 207 "awk.y" +{ yyval.nodeval = node (yyvsp[-3].nodeval, Node_K_while, yyvsp[0].nodeval); ; + break;} +case 35: +#line 209 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_K_for, (NODE *)make_for_loop (yyvsp[-7].nodeval, yyvsp[-5].nodeval, yyvsp[-3].nodeval)); ; + break;} +case 36: +#line 211 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_K_for, (NODE *)make_for_loop (yyvsp[-6].nodeval, (NODE *)NULL, yyvsp[-3].nodeval)); ; + break;} +case 37: +#line 213 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_K_arrayfor, (NODE *)make_for_loop(variable(yyvsp[-6].sval), (NODE *)NULL, variable(yyvsp[-3].sval))); ; + break;} +case 38: +#line 216 "awk.y" +{ yyval.nodeval = node ((NODE *)NULL, Node_K_break, (NODE *)NULL); ; + break;} +case 39: +#line 219 "awk.y" +{ yyval.nodeval = node ((NODE *)NULL, Node_K_continue, (NODE *)NULL); ; + break;} +case 40: +#line 221 "awk.y" +{ ++want_redirect; ; + break;} +case 41: +#line 223 "awk.y" +{ + want_redirect = 0; + /* $4->lnode = NULL; */ + yyval.nodeval = node (yyvsp[-2].nodeval, Node_K_print, yyvsp[-1].nodeval); + ; + break;} +case 42: +#line 229 "awk.y" +{ ++want_redirect; ; + break;} +case 43: +#line 231 "awk.y" +{ + want_redirect = 0; + /* $4->lnode = NULL; */ + yyval.nodeval = node (yyvsp[-2].nodeval, Node_K_printf, yyvsp[-1].nodeval); + ; + break;} +case 44: +#line 237 "awk.y" +{ ++want_redirect; + want_concat_token = 0; ; + break;} +case 45: +#line 240 "awk.y" +{ + want_redirect = 0; + yyval.nodeval = node (yyvsp[-4].nodeval, Node_K_printf, yyvsp[-1].nodeval); + ; + break;} +case 46: +#line 245 "awk.y" +{ yyval.nodeval = node ((NODE *)NULL, Node_K_next, (NODE *)NULL); ; + break;} +case 47: +#line 247 "awk.y" +{ yyval.nodeval = node ((NODE *)NULL, Node_K_exit, (NODE *)NULL); ; + break;} +case 48: +#line 249 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_K_exit, (NODE *)NULL); ; + break;} +case 49: +#line 251 "awk.y" +{ yyval.nodeval = yyvsp[-1].nodeval; ; + break;} +case 50: +#line 257 "awk.y" +{ yyval.nodeval = node (yyvsp[-3].nodeval, Node_K_if, + node (yyvsp[0].nodeval, Node_if_branches, (NODE *)NULL)); ; + break;} +case 51: +#line 261 "awk.y" +{ yyval.nodeval = node (yyvsp[-6].nodeval, Node_K_if, + node (yyvsp[-3].nodeval, Node_if_branches, yyvsp[0].nodeval)); ; + break;} +case 53: +#line 268 "awk.y" +{ yyval.nodetypeval = Node_illegal; ; + break;} +case 54: +#line 273 "awk.y" +{ yyval.nodeval = NULL; /* node (NULL, Node_redirect_nil, NULL); */ ; + break;} +case 55: +#line 277 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, yyvsp[-1].nodetypeval, (NODE *)NULL); ; + break;} +case 56: +#line 283 "awk.y" +{ yyval.nodeval = NULL; /* node(NULL, Node_builtin, NULL); */ ; + break;} +case 57: +#line 285 "awk.y" +{ yyval.nodeval = yyvsp[0].nodeval; ; + break;} +case 58: +#line 290 "awk.y" +{ yyval.nodeval = NULL; ; + break;} +case 59: +#line 292 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_expression_list, (NODE *)NULL); ; + break;} +case 60: +#line 294 "awk.y" +{ yyval.nodeval = append_right(yyvsp[-2].nodeval, node( yyvsp[0].nodeval, Node_expression_list, (NODE *)NULL)); ; + break;} +case 61: +#line 300 "awk.y" +{ yyval.nodeval = snode (yyvsp[-1].nodeval, Node_builtin, yyvsp[-3].ptrval); ; + break;} +case 62: +#line 302 "awk.y" +{ yyval.nodeval = snode ((NODE *)NULL, Node_builtin, yyvsp[0].ptrval); ; + break;} +case 63: +#line 304 "awk.y" +{ yyval.nodeval = yyvsp[-1].nodeval; ; + break;} +case 64: +#line 306 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_unary_minus, (NODE *)NULL); ; + break;} +case 65: +#line 308 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_preincrement, (NODE *)NULL); ; + break;} +case 66: +#line 310 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_predecrement, (NODE *)NULL); ; + break;} +case 67: +#line 312 "awk.y" +{ yyval.nodeval = node (yyvsp[-1].nodeval, Node_postincrement, (NODE *)NULL); ; + break;} +case 68: +#line 314 "awk.y" +{ yyval.nodeval = node (yyvsp[-1].nodeval, Node_postdecrement, (NODE *)NULL); ; + break;} +case 69: +#line 316 "awk.y" +{ yyval.nodeval = yyvsp[0].nodeval; ; + break;} +case 70: +#line 318 "awk.y" +{ yyval.nodeval = make_number (yyvsp[0].fval); ; + break;} +case 71: +#line 320 "awk.y" +{ yyval.nodeval = make_string (yyvsp[0].sval, -1); ; + break;} +case 72: +#line 324 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_times, yyvsp[0].nodeval); ; + break;} +case 73: +#line 326 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_quotient, yyvsp[0].nodeval); ; + break;} +case 74: +#line 328 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_mod, yyvsp[0].nodeval); ; + break;} +case 75: +#line 330 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_plus, yyvsp[0].nodeval); ; + break;} +case 76: +#line 332 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_minus, yyvsp[0].nodeval); ; + break;} +case 77: +#line 335 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_concat, yyvsp[0].nodeval); ; + break;} +case 78: +#line 337 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, yyvsp[-1].nodetypeval, yyvsp[0].nodeval); ; + break;} +case 79: +#line 341 "awk.y" +{ yyval.nodeval = snode (yyvsp[-1].nodeval, Node_builtin, yyvsp[-3].ptrval); ; + break;} +case 80: +#line 343 "awk.y" +{ yyval.nodeval = snode ((NODE *)NULL, Node_builtin, yyvsp[0].ptrval); ; + break;} +case 81: +#line 345 "awk.y" +{ yyval.nodeval = yyvsp[-1].nodeval; ; + break;} +case 82: +#line 347 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_unary_minus, (NODE *)NULL); ; + break;} +case 83: +#line 349 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_preincrement, (NODE *)NULL); ; + break;} +case 84: +#line 351 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_predecrement, (NODE *)NULL); ; + break;} +case 85: +#line 353 "awk.y" +{ yyval.nodeval = node (yyvsp[-1].nodeval, Node_postincrement, (NODE *)NULL); ; + break;} +case 86: +#line 355 "awk.y" +{ yyval.nodeval = node (yyvsp[-1].nodeval, Node_postdecrement, (NODE *)NULL); ; + break;} +case 87: +#line 357 "awk.y" +{ yyval.nodeval = yyvsp[0].nodeval; ; + break;} +case 88: +#line 359 "awk.y" +{ yyval.nodeval = make_number (yyvsp[0].fval); ; + break;} +case 89: +#line 361 "awk.y" +{ yyval.nodeval = make_string (yyvsp[0].sval, -1); ; + break;} +case 90: +#line 365 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_times, yyvsp[0].nodeval); ; + break;} +case 91: +#line 367 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_quotient, yyvsp[0].nodeval); ; + break;} +case 92: +#line 369 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_mod, yyvsp[0].nodeval); ; + break;} +case 93: +#line 371 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_plus, yyvsp[0].nodeval); ; + break;} +case 94: +#line 373 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_minus, yyvsp[0].nodeval); ; + break;} +case 95: +#line 376 "awk.y" +{ yyval.nodeval = node (yyvsp[-2].nodeval, Node_concat, yyvsp[0].nodeval); ; + break;} +case 96: +#line 381 "awk.y" +{ yyval.nodeval = variable (yyvsp[0].sval); ; + break;} +case 97: +#line 383 "awk.y" +{ yyval.nodeval = node (variable(yyvsp[-3].sval), Node_subscript, yyvsp[-1].nodeval); ; + break;} +case 98: +#line 385 "awk.y" +{ yyval.nodeval = node (yyvsp[0].nodeval, Node_field_spec, (NODE *)NULL); ; + break;} +} + /* the action file gets copied in in place of this dollarsign */ +#line 303 "bison.simple" + + yyvsp -= yylen; + yylsp -= yylen; + yyssp -= yylen; + + if (yydebug) + { + short *ssp1 = yyss - 1; + fprintf (stderr, "state stack now", yyssp-yyss); + while (ssp1 != yyssp) + fprintf (stderr, " %d", *++ssp1); + fprintf (stderr, "\n"); + } + + *++yyvsp = yyval; + + yylsp++; + if (yylen == 0) + { + yylsp->first_line = yylloc.first_line; + yylsp->first_column = yylloc.first_column; + yylsp->last_line = (yylsp-1)->last_line; + yylsp->last_column = (yylsp-1)->last_column; + yylsp->text = 0; + } + else + { + yylsp->last_line = (yylsp+yylen-1)->last_line; + yylsp->last_column = (yylsp+yylen-1)->last_column; + } + + /* Now "shift" the result of the reduction. + Determine what state that goes to, + based on the state we popped back to + and the rule number reduced by. */ + + yyn = yyr1[yyn]; + + yystate = yypgoto[yyn - YYNTBASE] + *yyssp; + if (yystate >= 0 && yystate <= YYLAST && yycheck[yystate] == *yyssp) + yystate = yytable[yystate]; + else + yystate = yydefgoto[yyn - YYNTBASE]; + + goto yynewstate; + +yyerrlab: /* here on detecting error */ + + if (! yyerrstatus) + /* If not already recovering from an error, report this error. */ + { + yyerror("parse error"); + } + + if (yyerrstatus == 3) + { + /* if just tried and failed to reuse lookahead token after an error, discard it. */ + + /* return failure if at end of input */ + if (yychar == YYEOF) + YYERROR; + + if (yydebug) + fprintf(stderr, "Discarding token %d (%s).\n", yychar, yytname[yychar1]); + + yychar = YYEMPTY; + } + + /* Else will try to reuse lookahead token + after shifting the error token. */ + + yyerrstatus = 3; /* Each real token shifted decrements this */ + + goto yyerrhandle; + +yyerrdefault: /* current state does not do anything special for the error token. */ + +#if 0 + /* This is wrong; only states that explicitly want error tokens + should shift them. */ + yyn = yydefact[yystate]; /* If its default is to accept any token, ok. Otherwise pop it.*/ + if (yyn) goto yydefault; +#endif + +yyerrpop: /* pop the current state because it cannot handle the error token */ + + if (yyssp == yyss) YYERROR; + yyvsp--; + yylsp--; + yystate = *--yyssp; + + if (yydebug) + { + short *ssp1 = yyss - 1; + fprintf (stderr, "Error: state stack now", yyssp-yyss); + while (ssp1 != yyssp) + fprintf (stderr, " %d", *++ssp1); + fprintf (stderr, "\n"); + } + +yyerrhandle: + + yyn = yypact[yystate]; + if (yyn == YYFLAG) + goto yyerrdefault; + + yyn += YYTERROR; + if (yyn < 0 || yyn > YYLAST || yycheck[yyn] != YYTERROR) + goto yyerrdefault; + + yyn = yytable[yyn]; + if (yyn < 0) + { + if (yyn == YYFLAG) + goto yyerrpop; + yyn = -yyn; + goto yyreduce; + } + else if (yyn == 0) + goto yyerrpop; + + if (yyn == YYFINAL) + YYACCEPT; + + if (yydebug) + fprintf(stderr, "Shifting error token, "); + + *++yyvsp = yylval; + *++yylsp = yylloc; + + yystate = yyn; + goto yynewstate; +} +#line 388 "awk.y" + + + +struct token { + char *operator; + NODETYPE value; + int class; + NODE *(*ptr)(); +}; + +#define NULL 0 + +NODE *do_exp(), *do_getline(), *do_index(), *do_length(), + *do_sqrt(), *do_log(), *do_sprintf(), *do_substr(), + *do_split(), *do_int(); + + /* Special functions for debugging */ +#ifndef FAST +NODE *do_prvars(), *do_bp(); +#endif + +/* Tokentab is sorted ascii ascending order, so it can be binary searched. */ +/* (later. Right now its just sort of linear search (SLOW!!) */ + +static struct token tokentab[] = { + {"BEGIN", Node_illegal, LEX_BEGIN, 0}, + {"END", Node_illegal, LEX_END, 0}, +#ifndef FAST + {"bp", Node_builtin, LEX_BUILTIN, do_bp}, +#endif + {"break", Node_K_break, LEX_BREAK, 0}, + {"continue", Node_K_continue, LEX_CONTINUE, 0}, + {"else", Node_illegal, LEX_ELSE, 0}, + {"exit", Node_K_exit, LEX_EXIT, 0}, + {"exp", Node_builtin, LEX_BUILTIN, do_exp}, + {"for", Node_K_for, LEX_FOR, 0}, + {"getline", Node_builtin, LEX_BUILTIN, do_getline}, + {"if", Node_K_if, LEX_IF, 0}, + {"in", Node_illegal, LEX_IN, 0}, + {"index", Node_builtin, LEX_BUILTIN, do_index}, + {"int", Node_builtin, LEX_BUILTIN, do_int}, + {"length", Node_builtin, LEX_BUILTIN, do_length}, + {"log", Node_builtin, LEX_BUILTIN, do_log}, + {"next", Node_K_next, LEX_NEXT, 0}, + {"print", Node_K_print, LEX_PRINT, 0}, + {"printf", Node_K_printf, LEX_PRINTF, 0}, +#ifndef FAST + {"prvars", Node_builtin, LEX_BUILTIN, do_prvars}, +#endif + {"split", Node_builtin, LEX_BUILTIN, do_split}, + {"sprintf", Node_builtin, LEX_BUILTIN, do_sprintf}, + {"sqrt", Node_builtin, LEX_BUILTIN, do_sqrt}, + {"substr", Node_builtin, LEX_BUILTIN, do_substr}, + {"while", Node_K_while, LEX_WHILE, 0}, + {NULL, Node_illegal, ERROR, 0} +}; + +/* Read one token, getting characters through lexptr. */ + +static int +yylex () +{ + register int c; + register int namelen; + register char *tokstart; + register struct token *toktab; + double atof(); /* JF know what happens if you forget this? */ + + + static did_newline = 0; /* JF the grammar insists that actions end + with newlines. This was easier than hacking + the grammar. */ + int do_concat; + + int seen_e = 0; /* These are for numbers */ + int seen_point = 0; + + retry: + + if(!lexptr) + return 0; + + if (want_regexp) { + want_regexp = 0; + /* there is a potential bug if a regexp is followed by an equal sign: + "/foo/=bar" would result in assign_quotient being returned as the + next token. Nothing is done about it since it is not valid awk, + but maybe something should be done anyway. */ + + tokstart = lexptr; + while (c = *lexptr++) { + switch (c) { + case '\\': + if (*lexptr++ == '\0') { + yyerror ("unterminated regexp ends with \\"); + return ERROR; + } + break; + case '/': /* end of the regexp */ + lexptr--; + yylval.sval = tokstart; + return REGEXP; + case '\n': + case '\0': + yyerror ("unterminated regexp"); + return ERROR; + } + } + } + do_concat=want_concat_token; + want_concat_token=0; + + if(*lexptr=='\0') { + lexptr=0; + return NEWLINE; + } + + /* if lexptr is at white space between two terminal tokens or parens, + it is a concatenation operator. */ + if(do_concat && (*lexptr==' ' || *lexptr=='\t')) { + while (*lexptr == ' ' || *lexptr == '\t') + lexptr++; + if (isalnum(*lexptr) || *lexptr == '\"' || *lexptr == '(' + || *lexptr == '.' || *lexptr == '$') /* the '.' is for decimal pt */ + return CONCAT_OP; + } + + while (*lexptr == ' ' || *lexptr == '\t') + lexptr++; + + tokstart = lexptr; /* JF */ + + switch (c = *lexptr++) { + case 0: + return 0; + + case '\n': + lineno++; + return NEWLINE; + + case '#': /* it's a comment */ + while (*lexptr != '\n' && *lexptr != '\0') + lexptr++; + goto retry; + + case '\\': + if(*lexptr=='\n') { + lexptr++; + goto retry; + } else break; + case ')': + case ']': + ++want_concat_token; + /* fall through */ + case '(': /* JF these were above, but I don't see why they should turn on concat. . . &*/ + case '[': + + case '{': + case ',': /* JF */ + case '$': + case ';': + /* set node type to ILLEGAL because the action should set it to + the right thing */ + yylval.nodetypeval = Node_illegal; + return c; + + case '*': + if(*lexptr=='=') { + yylval.nodetypeval=Node_assign_times; + lexptr++; + return ASSIGNOP; + } + yylval.nodetypeval=Node_illegal; + return c; + + case '/': + if(*lexptr=='=') { + yylval.nodetypeval=Node_assign_quotient; + lexptr++; + return ASSIGNOP; + } + yylval.nodetypeval=Node_illegal; + return c; + + case '%': + if(*lexptr=='=') { + yylval.nodetypeval=Node_assign_mod; + lexptr++; + return ASSIGNOP; + } + yylval.nodetypeval=Node_illegal; + return c; + + case '+': + if(*lexptr=='=') { + yylval.nodetypeval=Node_assign_plus; + lexptr++; + return ASSIGNOP; + } + if(*lexptr=='+') { + yylval.nodetypeval=Node_illegal; + lexptr++; + return INCREMENT; + } + yylval.nodetypeval=Node_illegal; + return c; + + case '!': + if(*lexptr=='=') { + yylval.nodetypeval=Node_notequal; + lexptr++; + return RELOP; + } + if(*lexptr=='~') { + yylval.nodetypeval=Node_nomatch; + lexptr++; + return MATCHOP; + } + yylval.nodetypeval=Node_illegal; + return c; + + case '<': + if(*lexptr=='=') { + yylval.nodetypeval=Node_leq; + lexptr++; + return RELOP; + } + yylval.nodetypeval=Node_less; + return RELOP; + + case '=': + if(*lexptr=='=') { + yylval.nodetypeval=Node_equal; + lexptr++; + return RELOP; + } + yylval.nodetypeval=Node_assign; + return ASSIGNOP; + + case '>': + if(want_redirect) { + if (*lexptr == '>') { + yylval.nodetypeval = Node_redirect_append; + lexptr++; + } else + yylval.nodetypeval = Node_redirect_output; + return REDIRECT_OP; + } + if(*lexptr=='=') { + yylval.nodetypeval=Node_geq; + lexptr++; + return RELOP; + } + yylval.nodetypeval=Node_greater; + return RELOP; + + case '~': + yylval.nodetypeval=Node_match; + return MATCHOP; + + case '}': /* JF added did newline stuff. Easier than hacking the grammar */ + if(did_newline) { + did_newline=0; + return c; + } + did_newline++; + --lexptr; + return NEWLINE; + + case '"': + while (*lexptr != '\0') { + switch (*lexptr++) { + case '\\': + if (*lexptr++ != '\0') + break; + /* fall through */ + case '\n': + yyerror ("unterminated string"); + return ERROR; + case '\"': + yylval.sval = tokstart + 1; /* JF Skip the doublequote */ + ++want_concat_token; + return YSTRING; + } + } + return ERROR; /* JF this was one level up, wrong? */ + + case '-': + if(*lexptr=='=') { + yylval.nodetypeval=Node_assign_minus; + lexptr++; + return ASSIGNOP; + } + if(*lexptr=='-') { + yylval.nodetypeval=Node_illegal; + lexptr++; + return DECREMENT; + } + /* JF I think space tab comma and newline are the legal places for + a UMINUS. Have I missed any? */ + if((!isdigit(*lexptr) && *lexptr!='.') || (lexptr>lexptr_begin+1 && + !index(" \t,\n",lexptr[-2]))) { + /* set node type to ILLEGAL because the action should set it to + the right thing */ + yylval.nodetypeval = Node_illegal; + return c; + } + /* FALL through into number code */ + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + case '.': + /* It's a number */ + if(c=='-') namelen=1; + else namelen=0; + for (; (c = tokstart[namelen]) != '\0'; namelen++) { + switch (c) { + case '.': + if (seen_point) + goto got_number; + ++seen_point; + break; + case 'e': + case 'E': + if (seen_e) + goto got_number; + ++seen_e; + if (tokstart[namelen+1] == '-' || tokstart[namelen+1] == '+') + namelen++; + break; + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + break; + default: + goto got_number; + } + } + +got_number: + lexptr = tokstart + namelen; + yylval.fval = atof(tokstart); + ++want_concat_token; + return NUMBER; + + case '&': + if(*lexptr=='&') { + yylval.nodetypeval=Node_and; + lexptr++; + return LEX_AND; + } + return ERROR; + + case '|': + if(want_redirect) { + lexptr++; + yylval.nodetypeval = Node_redirect_pipe; + return REDIRECT_OP; + } + if(*lexptr=='|') { + yylval.nodetypeval=Node_or; + lexptr++; + return LEX_OR; + } + return ERROR; + } + + if (!isalpha(c)) { + yyerror ("Invalid char '%c' in expression\n", c); + return ERROR; + } + + /* its some type of name-type-thing. Find its length */ + for (namelen = 0; is_identchar(tokstart[namelen]); namelen++) + ; + + + /* See if it is a special token. */ + for (toktab = tokentab; toktab->operator != NULL; toktab++) { + if(*tokstart==toktab->operator[0] && + !strncmp(tokstart,toktab->operator,namelen) && + toktab->operator[namelen]=='\0') { + lexptr=tokstart+namelen; + if(toktab->class == LEX_BUILTIN) + yylval.ptrval = toktab->ptr; + else + yylval.nodetypeval = toktab->value; + return toktab->class; + } + } + + /* It's a name. See how long it is. */ + yylval.sval = tokstart; + lexptr = tokstart+namelen; + ++want_concat_token; + return NAME; +} + +/*VARARGS1*/ +yyerror (mesg,a1,a2,a3,a4,a5,a6,a7,a8) + char *mesg; +{ + register char *ptr,*beg; + + /* Find the current line in the input file */ + if(!lexptr) { + beg="(END OF FILE)"; + ptr=beg+13; + } else { + if (*lexptr == '\n' && lexptr!=lexptr_begin) + --lexptr; + for (beg = lexptr;beg!=lexptr_begin && *beg != '\n';--beg) + ; + for (ptr = lexptr;*ptr && *ptr != '\n';ptr++) /*jfw: NL isn't guaranteed*/ + ; + if(beg!=lexptr_begin) + beg++; + } + fprintf (stderr, "Error near line %d, '%.*s'\n",lineno, ptr-beg, beg); + /* figure out line number, etc. later */ + fprintf (stderr, mesg, a1, a2, a3, a4, a5, a6, a7, a8); + fprintf (stderr,"\n"); + exit (1); +} + +/* Parse a C escape sequence. STRING_PTR points to a variable + containing a pointer to the string to parse. That pointer + is updated past the characters we use. The value of the + escape sequence is returned. + + A negative value means the sequence \ newline was seen, + which is supposed to be equivalent to nothing at all. + + If \ is followed by a null character, we return a negative + value and leave the string pointer pointing at the null character. + + If \ is followed by 000, we return 0 and leave the string pointer + after the zeros. A value of 0 does not mean end of string. */ + +static int +parse_escape (string_ptr) + char **string_ptr; +{ + register int c = *(*string_ptr)++; + switch (c) + { + case 'a': + return '\a'; + case 'b': + return '\b'; + case 'e': + return 033; + case 'f': + return '\f'; + case 'n': + return '\n'; + case 'r': + return '\r'; + case 't': + return '\t'; + case 'v': + return '\v'; + case '\n': + return -2; + case 0: + (*string_ptr)--; + return 0; + case '^': + c = *(*string_ptr)++; + if (c == '\\') + c = parse_escape (string_ptr); + if (c == '?') + return 0177; + return (c & 0200) | (c & 037); + + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + { + register int i = c - '0'; + register int count = 0; + while (++count < 3) + { + if ((c = *(*string_ptr)++) >= '0' && c <= '7') + { + i *= 8; + i += c - '0'; + } + else + { + (*string_ptr)--; + break; + } + } + return i; + } + default: + return c; + } +} @@ -0,0 +1,898 @@ + +/* + * gawk -- GNU version of awk + * Copyright (C) 1986 Free Software Foundation + * Written by Paul Rubin, August 1986 + * + */ + +/* +GAWK is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY. No author or distributor accepts responsibility to anyone +for the consequences of using it or for whether it serves any +particular purpose or works at all, unless he says so in writing. +Refer to the GAWK General Public License for full details. + +Everyone is granted permission to copy, modify and redistribute GAWK, +but only under the conditions described in the GAWK General Public +License. A copy of this license is supposed to have been given to you +along with GAWK so you can know your rights and responsibilities. It +should be in a file named COPYING. Among other things, the copyright +notice and this notice must be preserved on all copies. + +In other words, go ahead and share GAWK, but don't try to stop +anyone else from sharing it farther. Help stamp out software hoarding! +*/ + +%{ +#define YYDEBUG 12 + +#include <stdio.h> +#include "awk.h" + + static int yylex (); + + + /* + * The following variable is used for a very sickening thing. + * The awk language uses white space as the string concatenation + * operator, but having a white space token that would have to appear + * everywhere in all the grammar rules would be unbearable. + * It turns out we can return CONCAT_OP exactly when there really + * is one, just from knowing what kinds of other tokens it can appear + * between (namely, constants, variables, or close parentheses). + * This is because concatenation has the lowest priority of all + * operators. want_concat_token is used to remember that something + * that could be the left side of a concat has just been returned. + * + * If anyone knows a cleaner way to do this (don't look at the Un*x + * code to find one, though), please suggest it. + */ + static int want_concat_token; + + /* Two more horrible kludges. The same comment applies to these two too */ + static int want_regexp; /* lexical scanning kludge */ + static int want_redirect; /* similarly */ + int lineno = 1; /* JF for error msgs */ + +/* During parsing of a gawk program, the pointer to the next character + is in this variable. */ + char *lexptr; /* JF moved it up here */ + char *lexptr_begin; /* JF for error msgs */ +%} + +%union { + long lval; + AWKNUM fval; + NODE *nodeval; + NODETYPE nodetypeval; + char *sval; + NODE *(*ptrval)(); +} + +%type <nodeval> exp start program rule pattern conditional +%type <nodeval> action variable redirection expression_list +%type <nodeval> statements statement if_statement +%type <nodeval> opt_exp v_exp +%type <nodetypeval> whitespace + +%token <sval> NAME REGEXP YSTRING +%token <lval> ERROR INCDEC +%token <fval> NUMBER +%token <nodetypeval> ASSIGNOP RELOP MATCHOP NEWLINE REDIRECT_OP CONCAT_OP +%token <nodetypeval> LEX_BEGIN LEX_END LEX_IF LEX_ELSE +%token <nodetypeval> LEX_WHILE LEX_FOR LEX_BREAK LEX_CONTINUE +%token <nodetypeval> LEX_PRINT LEX_PRINTF LEX_NEXT LEX_EXIT +%token LEX_IN +%token <lval> LEX_AND LEX_OR INCREMENT DECREMENT +%token <ptrval> LEX_BUILTIN + +/* these are just yylval numbers */ +/* %token <lval> CHAR JF this isn't used anymore */ + +/* Lowest to highest */ +%left LEX_OR +%left LEX_AND +%right ASSIGNOP +%left CONCAT_OP +%left '+' '-' +%left '*' '/' '%' +%right UNARY +%nonassoc MATCHOP RELOP + +%% + +start : optional_newlines program + { expression_value = $2; } + ; + + +program : rule + { $$ = node ($1, Node_rule_list,(NODE *) NULL); } + | program rule + /* cons the rule onto the tail of list */ + { $$ = append_right ($1, node($2, Node_rule_list,(NODE *) NULL)); } + ; + +rule : pattern action NEWLINE optional_newlines + { $$ = node ($1, Node_rule_node, $2); } + ; + + +pattern : /* empty */ + { $$ = NULL; } + | conditional + { $$ = $1; } + | conditional ',' conditional + { $$ = mkrangenode ( node($1, Node_cond_pair, $3) ); } /*jfw*/ + ; + + +conditional : + LEX_BEGIN + { $$ = node ((NODE *)NULL, Node_K_BEGIN,(NODE *) NULL); } + | LEX_END + { $$ = node ((NODE *)NULL, Node_K_END,(NODE *) NULL); } + | '!' conditional %prec UNARY + { $$ = node ($2, Node_not,(NODE *) NULL); } + | conditional LEX_AND conditional + { $$ = node ($1, Node_and, $3); } + | conditional LEX_OR conditional + { $$ = node ($1, Node_or, $3); } + | '(' conditional ')' + { + $$ = $2; + want_concat_token = 0; + } + + /* In these rules, want_regexp tells yylex that the next thing + is a regexp so it should read up to the closing slash. */ + + | '/' + { ++want_regexp; } + REGEXP '/' + { want_regexp = 0; + $$ = node (node (make_number ((AWKNUM)0), Node_field_spec, (NODE *)NULL), + Node_match, (NODE *)make_regexp ($3)); + } + | exp MATCHOP '/' + { ++want_regexp; } + REGEXP '/' + { want_regexp = 0; + $$ = node ($1, $2, (NODE *)make_regexp($5)); + } + | exp RELOP exp + { $$ = node ($1, $2, $3); } + | exp /* JF */ + { $$ = $1; } + ; + + +action : /* empty */ + { $$ = NULL; } + | '{' whitespace statements '}' + { $$ = $3; } + ; + + +statements : /* EMPTY */ + { $$ = NULL; } + | statement + { $$ = node ($1, Node_statement_list, (NODE *)NULL); } + | statements statement + { $$ = append_right($1, node( $2, Node_statement_list, (NODE *)NULL)); } + ; + +statement_term : + NEWLINE optional_newlines + { $<nodetypeval>$ = Node_illegal; } + | ';' optional_newlines + { $<nodetypeval>$ = Node_illegal; } + ; + +whitespace : + /* blank */ + { $$ = Node_illegal; } + | CONCAT_OP + | NEWLINE + | whitespace CONCAT_OP + | whitespace NEWLINE + ; +statement : + '{' whitespace statements '}' whitespace + { $$ = $3; } + | if_statement + { $$ = $1; } + | LEX_WHILE '(' conditional ')' whitespace statement + { $$ = node ($3, Node_K_while, $6); } + | LEX_FOR '(' opt_exp ';' conditional ';' opt_exp ')' whitespace statement + { $$ = node ($10, Node_K_for, (NODE *)make_for_loop ($3, $5, $7)); } + | LEX_FOR '(' opt_exp ';' ';' opt_exp ')' whitespace statement + { $$ = node ($9, Node_K_for, (NODE *)make_for_loop ($3, (NODE *)NULL, $6)); } + | LEX_FOR '(' NAME CONCAT_OP LEX_IN NAME ')' whitespace statement + { $$ = node ($9, Node_K_arrayfor, (NODE *)make_for_loop(variable($3), (NODE *)NULL, variable($6))); } + | LEX_BREAK statement_term + /* for break, maybe we'll have to remember where to break to */ + { $$ = node ((NODE *)NULL, Node_K_break, (NODE *)NULL); } + | LEX_CONTINUE statement_term + /* similarly */ + { $$ = node ((NODE *)NULL, Node_K_continue, (NODE *)NULL); } + | LEX_PRINT + { ++want_redirect; } + expression_list redirection statement_term + { + want_redirect = 0; + /* $4->lnode = NULL; */ + $$ = node ($3, Node_K_print, $4); + } + | LEX_PRINTF + { ++want_redirect; } + expression_list redirection statement_term + { + want_redirect = 0; + /* $4->lnode = NULL; */ + $$ = node ($3, Node_K_printf, $4); + } + | LEX_PRINTF '(' expression_list ')' + { ++want_redirect; + want_concat_token = 0; } + redirection statement_term + { + want_redirect = 0; + $$ = node ($3, Node_K_printf, $6); + } + | LEX_NEXT statement_term + { $$ = node ((NODE *)NULL, Node_K_next, (NODE *)NULL); } + | LEX_EXIT statement_term + { $$ = node ((NODE *)NULL, Node_K_exit, (NODE *)NULL); } + | LEX_EXIT '(' exp ')' statement_term + { $$ = node ($3, Node_K_exit, (NODE *)NULL); } + | exp statement_term + { $$ = $1; } + ; + + +if_statement: + LEX_IF '(' conditional ')' whitespace statement + { $$ = node ($3, Node_K_if, + node ($6, Node_if_branches, (NODE *)NULL)); } + | LEX_IF '(' conditional ')' whitespace statement + LEX_ELSE whitespace statement + { $$ = node ($3, Node_K_if, + node ($6, Node_if_branches, $9)); } + ; + +optional_newlines : + /* empty */ + | optional_newlines NEWLINE + { $<nodetypeval>$ = Node_illegal; } + ; + +redirection : + /* empty */ + { $$ = NULL; /* node (NULL, Node_redirect_nil, NULL); */ } + /* | REDIRECT_OP NAME + { $$ = node ($2, $1, NULL); } */ + | REDIRECT_OP exp + { $$ = node ($2, $1, (NODE *)NULL); } + ; + + +/* optional expression, as in for loop */ +opt_exp : + { $$ = NULL; /* node(NULL, Node_builtin, NULL); */ } + | exp + { $$ = $1; } + ; + +expression_list : + /* empty */ + { $$ = NULL; } + | exp + { $$ = node ($1, Node_expression_list, (NODE *)NULL); } + | expression_list ',' exp + { $$ = append_right($1, node( $3, Node_expression_list, (NODE *)NULL)); } + ; + + +/* Expressions, not including the comma operator. */ +exp : LEX_BUILTIN '(' expression_list ')' + { $$ = snode ($3, Node_builtin, $1); } + | LEX_BUILTIN + { $$ = snode ((NODE *)NULL, Node_builtin, $1); } + | '(' exp ')' + { $$ = $2; } + | '-' exp %prec UNARY + { $$ = node ($2, Node_unary_minus, (NODE *)NULL); } + | INCREMENT variable %prec UNARY + { $$ = node ($2, Node_preincrement, (NODE *)NULL); } + | DECREMENT variable %prec UNARY + { $$ = node ($2, Node_predecrement, (NODE *)NULL); } + | variable INCREMENT %prec UNARY + { $$ = node ($1, Node_postincrement, (NODE *)NULL); } + | variable DECREMENT %prec UNARY + { $$ = node ($1, Node_postdecrement, (NODE *)NULL); } + | variable + { $$ = $1; } /* JF was variable($1) */ + | NUMBER + { $$ = make_number ($1); } + | YSTRING + { $$ = make_string ($1, -1); } + +/* Binary operators in order of decreasing precedence. */ + | exp '*' exp + { $$ = node ($1, Node_times, $3); } + | exp '/' exp + { $$ = node ($1, Node_quotient, $3); } + | exp '%' exp + { $$ = node ($1, Node_mod, $3); } + | exp '+' exp + { $$ = node ($1, Node_plus, $3); } + | exp '-' exp + { $$ = node ($1, Node_minus, $3); } + /* Empty operator. See yylex for disgusting details. */ + | exp CONCAT_OP exp + { $$ = node ($1, Node_concat, $3); } + | variable ASSIGNOP exp + { $$ = node ($1, $2, $3); } + ; + +v_exp : LEX_BUILTIN '(' expression_list ')' + { $$ = snode ($3, Node_builtin, $1); } + | LEX_BUILTIN + { $$ = snode ((NODE *)NULL, Node_builtin, $1); } + | '(' exp ')' + { $$ = $2; } + | '-' exp %prec UNARY + { $$ = node ($2, Node_unary_minus, (NODE *)NULL); } + | INCREMENT variable %prec UNARY + { $$ = node ($2, Node_preincrement, (NODE *)NULL); } + | DECREMENT variable %prec UNARY + { $$ = node ($2, Node_predecrement, (NODE *)NULL); } + | variable INCREMENT %prec UNARY + { $$ = node ($1, Node_postincrement, (NODE *)NULL); } + | variable DECREMENT %prec UNARY + { $$ = node ($1, Node_postdecrement, (NODE *)NULL); } + | variable + { $$ = $1; } /* JF was variable($1) */ + | NUMBER + { $$ = make_number ($1); } + | YSTRING + { $$ = make_string ($1, -1); } + +/* Binary operators in order of decreasing precedence. */ + | v_exp '*' exp + { $$ = node ($1, Node_times, $3); } + | v_exp '/' exp + { $$ = node ($1, Node_quotient, $3); } + | v_exp '%' exp + { $$ = node ($1, Node_mod, $3); } + | v_exp '+' exp + { $$ = node ($1, Node_plus, $3); } + | v_exp '-' exp + { $$ = node ($1, Node_minus, $3); } + /* Empty operator. See yylex for disgusting details. */ + | v_exp CONCAT_OP exp + { $$ = node ($1, Node_concat, $3); } + ; + +variable : + NAME + { $$ = variable ($1); } + | NAME '[' exp ']' + { $$ = node (variable($1), Node_subscript, $3); } + | '$' v_exp %prec UNARY + { $$ = node ($2, Node_field_spec, (NODE *)NULL); } + ; + +%% + + +struct token { + char *operator; + NODETYPE value; + int class; + NODE *(*ptr)(); +}; + +#define NULL 0 + +NODE *do_exp(), *do_getline(), *do_index(), *do_length(), + *do_sqrt(), *do_log(), *do_sprintf(), *do_substr(), + *do_split(), *do_int(); + + /* Special functions for debugging */ +#ifndef FAST +NODE *do_prvars(), *do_bp(); +#endif + +/* Tokentab is sorted ascii ascending order, so it can be binary searched. */ +/* (later. Right now its just sort of linear search (SLOW!!) */ + +static struct token tokentab[] = { + {"BEGIN", Node_illegal, LEX_BEGIN, 0}, + {"END", Node_illegal, LEX_END, 0}, +#ifndef FAST + {"bp", Node_builtin, LEX_BUILTIN, do_bp}, +#endif + {"break", Node_K_break, LEX_BREAK, 0}, + {"continue", Node_K_continue, LEX_CONTINUE, 0}, + {"else", Node_illegal, LEX_ELSE, 0}, + {"exit", Node_K_exit, LEX_EXIT, 0}, + {"exp", Node_builtin, LEX_BUILTIN, do_exp}, + {"for", Node_K_for, LEX_FOR, 0}, + {"getline", Node_builtin, LEX_BUILTIN, do_getline}, + {"if", Node_K_if, LEX_IF, 0}, + {"in", Node_illegal, LEX_IN, 0}, + {"index", Node_builtin, LEX_BUILTIN, do_index}, + {"int", Node_builtin, LEX_BUILTIN, do_int}, + {"length", Node_builtin, LEX_BUILTIN, do_length}, + {"log", Node_builtin, LEX_BUILTIN, do_log}, + {"next", Node_K_next, LEX_NEXT, 0}, + {"print", Node_K_print, LEX_PRINT, 0}, + {"printf", Node_K_printf, LEX_PRINTF, 0}, +#ifndef FAST + {"prvars", Node_builtin, LEX_BUILTIN, do_prvars}, +#endif + {"split", Node_builtin, LEX_BUILTIN, do_split}, + {"sprintf", Node_builtin, LEX_BUILTIN, do_sprintf}, + {"sqrt", Node_builtin, LEX_BUILTIN, do_sqrt}, + {"substr", Node_builtin, LEX_BUILTIN, do_substr}, + {"while", Node_K_while, LEX_WHILE, 0}, + {NULL, Node_illegal, ERROR, 0} +}; + +/* Read one token, getting characters through lexptr. */ + +static int +yylex () +{ + register int c; + register int namelen; + register char *tokstart; + register struct token *toktab; + double atof(); /* JF know what happens if you forget this? */ + + + static did_newline = 0; /* JF the grammar insists that actions end + with newlines. This was easier than hacking + the grammar. */ + int do_concat; + + int seen_e = 0; /* These are for numbers */ + int seen_point = 0; + + retry: + + if(!lexptr) + return 0; + + if (want_regexp) { + want_regexp = 0; + /* there is a potential bug if a regexp is followed by an equal sign: + "/foo/=bar" would result in assign_quotient being returned as the + next token. Nothing is done about it since it is not valid awk, + but maybe something should be done anyway. */ + + tokstart = lexptr; + while (c = *lexptr++) { + switch (c) { + case '\\': + if (*lexptr++ == '\0') { + yyerror ("unterminated regexp ends with \\"); + return ERROR; + } + break; + case '/': /* end of the regexp */ + lexptr--; + yylval.sval = tokstart; + return REGEXP; + case '\n': + case '\0': + yyerror ("unterminated regexp"); + return ERROR; + } + } + } + do_concat=want_concat_token; + want_concat_token=0; + + if(*lexptr=='\0') { + lexptr=0; + return NEWLINE; + } + + /* if lexptr is at white space between two terminal tokens or parens, + it is a concatenation operator. */ + if(do_concat && (*lexptr==' ' || *lexptr=='\t')) { + while (*lexptr == ' ' || *lexptr == '\t') + lexptr++; + if (isalnum(*lexptr) || *lexptr == '\"' || *lexptr == '(' + || *lexptr == '.' || *lexptr == '$') /* the '.' is for decimal pt */ + return CONCAT_OP; + } + + while (*lexptr == ' ' || *lexptr == '\t') + lexptr++; + + tokstart = lexptr; /* JF */ + + switch (c = *lexptr++) { + case 0: + return 0; + + case '\n': + lineno++; + return NEWLINE; + + case '#': /* it's a comment */ + while (*lexptr != '\n' && *lexptr != '\0') + lexptr++; + goto retry; + + case '\\': + if(*lexptr=='\n') { + lexptr++; + goto retry; + } else break; + case ')': + case ']': + ++want_concat_token; + /* fall through */ + case '(': /* JF these were above, but I don't see why they should turn on concat. . . &*/ + case '[': + + case '{': + case ',': /* JF */ + case '$': + case ';': + /* set node type to ILLEGAL because the action should set it to + the right thing */ + yylval.nodetypeval = Node_illegal; + return c; + + case '*': + if(*lexptr=='=') { + yylval.nodetypeval=Node_assign_times; + lexptr++; + return ASSIGNOP; + } + yylval.nodetypeval=Node_illegal; + return c; + + case '/': + if(*lexptr=='=') { + yylval.nodetypeval=Node_assign_quotient; + lexptr++; + return ASSIGNOP; + } + yylval.nodetypeval=Node_illegal; + return c; + + case '%': + if(*lexptr=='=') { + yylval.nodetypeval=Node_assign_mod; + lexptr++; + return ASSIGNOP; + } + yylval.nodetypeval=Node_illegal; + return c; + + case '+': + if(*lexptr=='=') { + yylval.nodetypeval=Node_assign_plus; + lexptr++; + return ASSIGNOP; + } + if(*lexptr=='+') { + yylval.nodetypeval=Node_illegal; + lexptr++; + return INCREMENT; + } + yylval.nodetypeval=Node_illegal; + return c; + + case '!': + if(*lexptr=='=') { + yylval.nodetypeval=Node_notequal; + lexptr++; + return RELOP; + } + if(*lexptr=='~') { + yylval.nodetypeval=Node_nomatch; + lexptr++; + return MATCHOP; + } + yylval.nodetypeval=Node_illegal; + return c; + + case '<': + if(*lexptr=='=') { + yylval.nodetypeval=Node_leq; + lexptr++; + return RELOP; + } + yylval.nodetypeval=Node_less; + return RELOP; + + case '=': + if(*lexptr=='=') { + yylval.nodetypeval=Node_equal; + lexptr++; + return RELOP; + } + yylval.nodetypeval=Node_assign; + return ASSIGNOP; + + case '>': + if(want_redirect) { + if (*lexptr == '>') { + yylval.nodetypeval = Node_redirect_append; + lexptr++; + } else + yylval.nodetypeval = Node_redirect_output; + return REDIRECT_OP; + } + if(*lexptr=='=') { + yylval.nodetypeval=Node_geq; + lexptr++; + return RELOP; + } + yylval.nodetypeval=Node_greater; + return RELOP; + + case '~': + yylval.nodetypeval=Node_match; + return MATCHOP; + + case '}': /* JF added did newline stuff. Easier than hacking the grammar */ + if(did_newline) { + did_newline=0; + return c; + } + did_newline++; + --lexptr; + return NEWLINE; + + case '"': + while (*lexptr != '\0') { + switch (*lexptr++) { + case '\\': + if (*lexptr++ != '\0') + break; + /* fall through */ + case '\n': + yyerror ("unterminated string"); + return ERROR; + case '\"': + yylval.sval = tokstart + 1; /* JF Skip the doublequote */ + ++want_concat_token; + return YSTRING; + } + } + return ERROR; /* JF this was one level up, wrong? */ + + case '-': + if(*lexptr=='=') { + yylval.nodetypeval=Node_assign_minus; + lexptr++; + return ASSIGNOP; + } + if(*lexptr=='-') { + yylval.nodetypeval=Node_illegal; + lexptr++; + return DECREMENT; + } + /* JF I think space tab comma and newline are the legal places for + a UMINUS. Have I missed any? */ + if((!isdigit(*lexptr) && *lexptr!='.') || (lexptr>lexptr_begin+1 && + !index(" \t,\n",lexptr[-2]))) { + /* set node type to ILLEGAL because the action should set it to + the right thing */ + yylval.nodetypeval = Node_illegal; + return c; + } + /* FALL through into number code */ + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + case '.': + /* It's a number */ + if(c=='-') namelen=1; + else namelen=0; + for (; (c = tokstart[namelen]) != '\0'; namelen++) { + switch (c) { + case '.': + if (seen_point) + goto got_number; + ++seen_point; + break; + case 'e': + case 'E': + if (seen_e) + goto got_number; + ++seen_e; + if (tokstart[namelen+1] == '-' || tokstart[namelen+1] == '+') + namelen++; + break; + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + break; + default: + goto got_number; + } + } + +got_number: + lexptr = tokstart + namelen; + yylval.fval = atof(tokstart); + ++want_concat_token; + return NUMBER; + + case '&': + if(*lexptr=='&') { + yylval.nodetypeval=Node_and; + lexptr++; + return LEX_AND; + } + return ERROR; + + case '|': + if(want_redirect) { + lexptr++; + yylval.nodetypeval = Node_redirect_pipe; + return REDIRECT_OP; + } + if(*lexptr=='|') { + yylval.nodetypeval=Node_or; + lexptr++; + return LEX_OR; + } + return ERROR; + } + + if (!isalpha(c)) { + yyerror ("Invalid char '%c' in expression\n", c); + return ERROR; + } + + /* its some type of name-type-thing. Find its length */ + for (namelen = 0; is_identchar(tokstart[namelen]); namelen++) + ; + + + /* See if it is a special token. */ + for (toktab = tokentab; toktab->operator != NULL; toktab++) { + if(*tokstart==toktab->operator[0] && + !strncmp(tokstart,toktab->operator,namelen) && + toktab->operator[namelen]=='\0') { + lexptr=tokstart+namelen; + if(toktab->class == LEX_BUILTIN) + yylval.ptrval = toktab->ptr; + else + yylval.nodetypeval = toktab->value; + return toktab->class; + } + } + + /* It's a name. See how long it is. */ + yylval.sval = tokstart; + lexptr = tokstart+namelen; + ++want_concat_token; + return NAME; +} + +/*VARARGS1*/ +yyerror (mesg,a1,a2,a3,a4,a5,a6,a7,a8) + char *mesg; +{ + register char *ptr,*beg; + + /* Find the current line in the input file */ + if(!lexptr) { + beg="(END OF FILE)"; + ptr=beg+13; + } else { + if (*lexptr == '\n' && lexptr!=lexptr_begin) + --lexptr; + for (beg = lexptr;beg!=lexptr_begin && *beg != '\n';--beg) + ; + for (ptr = lexptr;*ptr && *ptr != '\n';ptr++) /*jfw: NL isn't guaranteed*/ + ; + if(beg!=lexptr_begin) + beg++; + } + fprintf (stderr, "Error near line %d, '%.*s'\n",lineno, ptr-beg, beg); + /* figure out line number, etc. later */ + fprintf (stderr, mesg, a1, a2, a3, a4, a5, a6, a7, a8); + fprintf (stderr,"\n"); + exit (1); +} + +/* Parse a C escape sequence. STRING_PTR points to a variable + containing a pointer to the string to parse. That pointer + is updated past the characters we use. The value of the + escape sequence is returned. + + A negative value means the sequence \ newline was seen, + which is supposed to be equivalent to nothing at all. + + If \ is followed by a null character, we return a negative + value and leave the string pointer pointing at the null character. + + If \ is followed by 000, we return 0 and leave the string pointer + after the zeros. A value of 0 does not mean end of string. */ + +static int +parse_escape (string_ptr) + char **string_ptr; +{ + register int c = *(*string_ptr)++; + switch (c) + { + case 'a': + return '\a'; + case 'b': + return '\b'; + case 'e': + return 033; + case 'f': + return '\f'; + case 'n': + return '\n'; + case 'r': + return '\r'; + case 't': + return '\t'; + case 'v': + return '\v'; + case '\n': + return -2; + case 0: + (*string_ptr)--; + return 0; + case '^': + c = *(*string_ptr)++; + if (c == '\\') + c = parse_escape (string_ptr); + if (c == '?') + return 0177; + return (c & 0200) | (c & 037); + + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + { + register int i = c - '0'; + register int count = 0; + while (++count < 3) + { + if ((c = *(*string_ptr)++) >= '0' && c <= '7') + { + i *= 8; + i += c - '0'; + } + else + { + (*string_ptr)--; + break; + } + } + return i; + } + default: + return c; + } +} @@ -0,0 +1,723 @@ +/* + * awk1 -- Expression tree constructors and main program for gawk. + * + * Copyright (C) 1986 Free Software Foundation + * Written by Paul Rubin, August 1986 + * + */ + +/* +GAWK is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY. No author or distributor accepts responsibility to anyone +for the consequences of using it or for whether it serves any +particular purpose or works at all, unless he says so in writing. +Refer to the GAWK General Public License for full details. + +Everyone is granted permission to copy, modify and redistribute GAWK, +but only under the conditions described in the GAWK General Public +License. A copy of this license is supposed to have been given to you +along with GAWK so you can know your rights and responsibilities. It +should be in a file named COPYING. Among other things, the copyright +notice and this notice must be preserved on all copies. + +In other words, go ahead and share GAWK, but don't try to stop +anyone else from sharing it farther. Help stamp out software hoarding! +*/ + +#include <stdio.h> +#include "regex.h" +#include "awk.h" + +/* Temporary nodes are stored here. ob_dummy is a dummy object used to + keep the obstack library from free()ing up the entire stack. */ +struct obstack temp_strings; +char *ob_dummy; + +/* The parse tree and field nodes are stored here. Parse_end is a dummy + item used to free up unneeded fields without freeing the program being run + */ +struct obstack other_stack; +char *parse_end; + +/* The global null string */ +NODE *Nnull_string; + +/* The special variable that contains the name of the current input file */ +extern NODE *FILENAME_node; + +/* The name the program was invoked under, for error messages */ +char *myname; + +/* A block of gAWK code to be run before running the program */ +NODE *begin_block = 0; + +/* A block of gAWK code to be run after the last input file */ +NODE *end_block = 0; + +FILE *input_file; /* Where to read from */ + +#ifndef FAST +/* non-zero means in debugging is enabled. Probably not very useful */ +int debugging; +#endif + +char *index(); + + +main(argc, argv) + int argc; + char **argv; +{ + register int i; + register NODE *tmp; + char **do_vars; +#ifndef FAST + /* Print out the parse tree. For debugging */ + register int dotree = 0; + extern int yydebug; +#endif + extern char *lexptr; + extern char *lexptr_begin; + FILE *fp,*fopen(); + + --argc; + myname= *argv++; + if(!argc) + usage(); + + /* Tell the regex routines how they should work. . . */ + re_set_syntax(RE_NO_BK_PARENS|RE_NO_BK_VBAR); + + /* Set up the stack for temporary strings */ + obstack_init (&temp_strings); + ob_dummy=obstack_alloc(&temp_strings,0); + + /* Set up the other stack for other things */ + obstack_init(&other_stack); + /* initialize the null string */ + Nnull_string = make_string("",0); + /* This was to keep Nnull_string from ever being free()d It didn't work */ + /* Nnull_string->stref=32000; */ + /* Set up the special variables */ + /* Note that this must be done BEFORE arg parsing else -R and -F + break horribly */ + init_vars(); + + + for(;*argv && **argv=='-';argc--,argv++) { + switch(argv[0][1]) { +#ifndef FAST + case 'd': + debugging++; + dotree++; + break; + + case 'D': + debugging++; + yydebug=2; + break; +#endif + /* This feature isn't in un*x awk, but might be useful */ + case 'R': + set_rs(&argv[0][2]); + break; + + case 'F': + set_fs(&argv[0][2]); + break; + + + /* It would be better to read the input file in as we parse + it. Its done this way for hysterical reasons. Feel + free to fix it. */ + case 'f': + if(lexptr) + panic("Can only use one -f option"); + if((fp=fopen(argv[1],"r"))==NULL) + er_panic(argv[1]); + else { + char *curptr; + int siz,nread; + + curptr=lexptr=malloc(2000); + if(curptr==NULL) + panic("Memory exhausted"); /* jfw: instead of abort() */ + siz=2000; + i=siz-1; + while((nread=fread(curptr,sizeof(char),i,fp)) > 0) { + curptr+=nread; + i-=nread; + if(i==0) { + lexptr=realloc(lexptr,siz*2); + if(lexptr==NULL) + panic("Memory exhausted"); /* jfw: instead of abort() */ + curptr=lexptr+siz-1; + i=siz; + siz*=2; + } + } + *curptr='\0'; + fclose(fp); + } + argc--; + argv++; + break; + + case '\0': /* A file */ + break; + + default: + panic("Unknown option %s",argv[0]); + } + } + if (debugging) setbuf(stdout, 0); /* jfw: make debugging easier */ + /* No -f option, use next arg */ + if(!lexptr) { + if(!argc) usage(); + lexptr= *argv++; + --argc; + } + + /* Read in the program */ + lexptr_begin=lexptr; + (void)yyparse (); + + /* Anything allocated on the other_stack after here will be freed + when the next input line is read. + */ + parse_end=obstack_alloc(&other_stack,0); + +#ifndef FAST + if(dotree) + print_parse_tree(expression_value); +#endif + /* Set up the field variables */ + init_fields(); + + /* Look for BEGIN and END blocks. Only one of each allowed */ + for(tmp=expression_value;tmp;tmp=tmp->rnode) { + if(!tmp->lnode || !tmp->lnode->lnode) + continue; + if(tmp->lnode->lnode->type==Node_K_BEGIN) + begin_block=tmp->lnode->rnode; + else if(tmp->lnode->lnode->type==Node_K_END) + end_block=tmp->lnode->rnode; + } + if(begin_block && interpret(begin_block) == 0) exit(0); /* jfw */ + do_vars=argv; + while(argc>0 && index(*argv,'=')) { + argv++; + --argc; + } + if(do_vars==argv) do_vars=0; + if(argc==0) { + static char *dumb[2]= { "-", 0}; + + argc=1; + argv= &dumb[0]; + } + while(argc--) { + if(!strcmp(*argv,"-")) { + input_file=stdin; + FILENAME_node->var_value=Nnull_string; + ADD_ONE_REFERENCE(Nnull_string); + } else { + extern NODE *deref; + + input_file=fopen(*argv,"r"); + /* This should print the error message from errno */ + if(!input_file) + er_panic(*argv); + /* This is a kludge. */ + deref=FILENAME_node->var_value; + do_deref(); + FILENAME_node->var_value=make_string(*argv,strlen(*argv)); + } + /* This is where it spends all its time. The infamous MAIN LOOP */ + if(inrec()==0) { + if(do_vars) { + while(do_vars!=argv && *do_vars) { + char *cp; + + cp=index(*do_vars,'='); + *cp++='\0'; + variable(*do_vars)->var_value=make_string(cp,strlen(cp)); + do_vars++; + } + do_vars=0; + } + do + obstack_free(&temp_strings, ob_dummy); + while (interpret(expression_value) && inrec() == 0); + } + if(input_file!=stdin) fclose(input_file); + argv++; + } + if(end_block) (void)interpret(end_block); + exit(0); +} + +/* These exit values are arbitrary */ +/*VARARGS1*/ +panic(str,arg) +char *str; +{ + fprintf(stderr,"%s: ",myname); + fprintf(stderr,str,arg); + fprintf(stderr,"\n"); + exit(12); +} + +er_panic(str) +char *str; +{ + fprintf(stderr,"%s: ",myname); + perror(str); + exit(15); +} + +usage() +{ + fprintf(stderr,"%s: usage: %s {-f progfile | program } [-F{c} -R{c}] file . . .\n",myname,myname); + exit(11); +} + + +/* This allocates a new node of type ty. Note that this node will not go + away unless freed, so don't use it for tmp storage */ +NODE * +newnode(ty) +NODETYPE ty; +{ + register NODE *r; + + r=(NODE *)malloc(sizeof(NODE)); + if(r==NULL) + abort(); + r->type=ty; + return r; +} + + +/* Duplicate a node. (For global strings, "duplicate" means crank up + the reference count.) This creates global nodes. . .*/ +NODE * +dupnode(n) +NODE *n; +{ + register NODE *r; + + if(n->type==Node_string) { + n->stref++; + return n; + } else if(n->type==Node_temp_string) { + r=newnode(Node_string); + r->stlen=n->stlen; + r->stref=1; + r->stptr=malloc(n->stlen+1); + if(r->stptr==NULL) + abort(); + bcopy (n->stptr, r->stptr, n->stlen); + r->stptr[r->stlen]='\0'; /* JF for hackval */ + return r; + } else { + r=newnode(Node_illegal); + *r= *n; + return r; + } +} + +/* This allocates a node with defined lnode and rnode. */ +/* This should only be used by yyparse+co while + reading in the program */ +NODE * +node (left, op, right) + NODE *left, *right; + NODETYPE op; +{ + register NODE *r; + + r = (NODE *)obstack_alloc(&other_stack,sizeof(NODE)); + r->type=op; + r->lnode = left; + r->rnode = right; + return r; +} + +/* This allocates a node with defined subnode and proc */ +/* Otherwise like node() */ +NODE * +snode(subn, op, procp) +NODETYPE op; +NODE *(*procp)(); +NODE *subn; +{ + register NODE *r; + + r=(NODE *)obstack_alloc(&other_stack,sizeof(NODE)); + r->type=op; + r->subnode=subn; + r->proc=procp; + return r; +} + +/* (jfw) This allocates a Node_line_range node + * with defined condpair and zeroes the trigger word + * to avoid the temptation of assuming that calling + * 'node( foo, Node_line_range, 0)' will properly initialize 'triggered'. + */ +/* Otherwise like node() */ +NODE * +mkrangenode(cpair) +NODE *cpair; +{ + register NODE *r; + + r=(NODE *)obstack_alloc(&other_stack,sizeof(NODE)); + r->type=Node_line_range; + r->condpair=cpair; + r->triggered = 0; + return r; +} + +/* this allocates a node with defined numbr */ +/* This creates global nodes! */ +NODE * +make_number (x) + AWKNUM x; +{ + register NODE *r; + + r=newnode(Node_number); + r->numbr = x; + return r; +} + +/* This creates temporary nodes. They go away quite quicly, so + don't use them for anything important */ +#ifndef FAST +NODE * +tmp_number(x) +AWKNUM x; +{ +#ifdef DONTDEF + return make_number(x); +#endif + NODE *r; + + r=(NODE *)obstack_alloc(&temp_strings,sizeof(NODE)); + r->type=Node_number; + r->numbr=x; + return r; +} +#endif + +/* Make a string node. If len==0, the string passed in S is supposed to end + with a double quote, but have had the beginning double quote + already stripped off by yylex. + If LEN!=0, we don't care what s ends with. This creates a global node */ + +NODE * +make_string (s,len) + char *s; +{ + register NODE *r; + register char *pf,*pt; + register int c; + + /* the aborts are impossible because yylex is supposed to have + already checked for unterminated strings */ + if(len==-1) { /* Called from yyparse, find our own len */ +#ifndef FAST + if (s[-1] != '\"') /* Didn't start with " */ + abort (); +#endif + + for(pf = pt = s; *pf != '\0' && *pf!='\"';) { + c= *pf++; + switch(c) { +#ifndef FAST + case '\0': + abort(); +#endif + + case '\\': +#ifndef FAST + if(*pf=='\0') + abort(); +#endif + + c= *pf++; + switch(c) { + case '\\': /* no massagary needed */ + case '\'': + case '\"': + break; + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + c-='0'; + while(*pf && *pf>='0' && *pf<='7') { + c=c*8+ *pf++ - '0'; + } + break; + case 'b': + c='\b'; + break; + case 'f': + c='\f'; + break; + case 'n': + c='\n'; + break; + case 'r': + c='\r'; + break; + case 't': + c='\t'; + break; + case 'v': + c='\v'; + break; + default: + *pt++='\\'; + break; + } + /* FALL THROUGH */ + default: + *pt++=c; + break; + } + } +#ifndef FAST + if(*pf=='\0') + abort(); /* JF hit the end of the buf */ +#endif + len = pt - s; /* JF was p - s - 1 */ + } + + r=newnode(Node_string); + r->stptr=(char *)malloc(len+1); + if(r->stptr==0) + abort(); + r->type=Node_string; + r->stlen=len; + r->stref=1; + bcopy (s, r->stptr, len); + r->stptr[len]='\0'; /* JF a hack */ + + return r; +} + +/* #ifndef FAST */ +/* This should be a macro for speed, but the C compiler chokes. */ +/* Read the warning under tmp_number */ +NODE * +tmp_string(s,len) +char *s; +{ + register NODE *r; + +#ifdef DONTDEF + return make_string(s,len); +#endif + r=(NODE *)obstack_alloc(&temp_strings,sizeof(NODE)); + r->stptr=(char *)obstack_alloc(&temp_strings,len+1); + r->type=Node_temp_string; + r->stlen=len; + r->stref=1; + bcopy (s, r->stptr, len); + r->stptr[len]='\0'; /* JF a hack */ + + return r; +} +/* #endif */ + +/* Generate compiled regular expressions */ +struct re_pattern_buffer * +make_regexp (s) + char *s; +{ + typedef struct re_pattern_buffer RPAT; + RPAT *rp; + char *p, *err; + + rp = (RPAT *) obstack_alloc(&other_stack, sizeof (RPAT)); + bzero((char *)rp,sizeof(RPAT)); + rp->buffer = (char *)malloc(8); /* JF I'd obstack allocate it, + except the regex routines + try to realloc() it, which fails. */ + /* Note that this means it may never be freed. Someone fix, please? */ + + rp->allocated = 8; + rp->fastmap = (char *)obstack_alloc(&other_stack, 256); + + for (p = s; *p != '\0'; p++) { + if (*p == '\\') + p++; + else if (*p == '/') + break; + } +#ifndef FAST + if (*p != '/') + abort (); /* impossible */ +#endif + + /* JF was re_compile_pattern, but that mishandles ( ) and |, + so I had to write my own front end. Sigh. */ + + if ((err = re_compile_pattern (s, p - s, rp)) != NULL) { + fprintf (stderr, "illegal regexp: "); + yyerror (err); /* fatal */ + } + + return rp; +} + +/* Build a for loop */ +FOR_LOOP_HEADER * +make_for_loop (init, cond, incr) + NODE *init, *cond, *incr; +{ + register FOR_LOOP_HEADER *r; + + r = (FOR_LOOP_HEADER *)obstack_alloc(&other_stack,sizeof (FOR_LOOP_HEADER)); + r->init = init; + r->cond = cond; + r->incr = incr; + return r; +} + +/* Name points to a variable name. Make sure its in the symbol table */ +NODE * +variable (name) + char *name; +{ + register NODE *r; + NODE *lookup(), *install(); + + if ((r = lookup (variables, name)) == NULL) { + r = install (variables, name, node(Nnull_string, Node_var, (NODE *)NULL)); + /* JF make_number (0.0) is WRONG */ + } + return r; +} + +/* Create a special variable */ +NODE * +spc_var (name,value) +char *name; +NODE *value; +{ + register NODE *r; + NODE *lookup(), *install(); + + if ((r = lookup(variables, name)) == NULL) + r = install (variables, name, node(value, Node_var, (NODE *)NULL)); + return r; +} + + +/* + * Install a name in the hash table specified, even if it is already there. + * Name stops with first non alphanumeric. + * Caller must check against redefinition if that is desired. + */ +NODE * +install (table, name, value) + HASHNODE **table; + char *name; + NODE *value; +{ + register HASHNODE *hp; + register int i, len, bucket; + register char *p; + + len = 0; + p = name; + while (is_identchar(*p)) + p++; + len = p - name; + + i = sizeof (HASHNODE) + len + 1; + hp = (HASHNODE *)obstack_alloc(&other_stack,i); + bucket = hashf(name, len, HASHSIZE); + hp->next = table[bucket]; + table[bucket] = hp; + hp->length = len; + hp->value = value; + hp->name = ((char *) hp) + sizeof (HASHNODE); + hp->length = len; + bcopy (name, hp->name, len); + return hp->value; +} + +/* + * find the most recent hash node for name name (ending with first + * non-identifier char) installed by install + */ +NODE * +lookup (table, name) + HASHNODE **table; + char *name; +{ + register char *bp; + register HASHNODE *bucket; + register int len; + + for (bp = name; is_identchar(*bp); bp++) + ; + len = bp - name; + bucket = table[hashf(name, len, HASHSIZE)]; + while (bucket) { + if (bucket->length == len && strncmp(bucket->name, name, len) == 0) + return bucket->value; + bucket = bucket->next; + } + return NULL; +} + +#define HASHSTEP(old, c) ((old << 1) + c) +#define MAKE_POS(v) (v & ~0x80000000) /* make number positive */ + +/* + * return hash function on name. must be compatible with the one + * computed a step at a time, elsewhere (JF: Where? I can't find it!) + */ +int +hashf(name, len, hashsize) + register char *name; + register int len; + int hashsize; +{ + register int r = 0; + + while (len--) + r = HASHSTEP(r, *name++); + + return MAKE_POS(r) % hashsize; +} + +/* Add new to the rightmost branch of LIST. This uses n^2 time, but + doesn't get used enough to make optimizing worth it. . . */ +/* You don't believe me? Profile it yourself! */ + +NODE * +append_right(list,new) +NODE *list,*new; +{ + register NODE *oldlist; + + oldlist = list; + while(list->rnode!=NULL) + list=list->rnode; + list->rnode = new; + return oldlist; +} @@ -0,0 +1,1129 @@ +/* + * awk2 --- gawk parse tree interpreter + * + * Copyright (C) 1986 Free Software Foundation + * Written by Paul Rubin, August 1986 + * + */ + +/* +GAWK is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY. No author or distributor accepts responsibility to anyone +for the consequences of using it or for whether it serves any +particular purpose or works at all, unless he says so in writing. +Refer to the GAWK General Public License for full details. + +Everyone is granted permission to copy, modify and redistribute GAWK, +but only under the conditions described in the GAWK General Public +License. A copy of this license is supposed to have been given to you +along with GAWK so you can know your rights and responsibilities. It +should be in a file named COPYING. Among other things, the copyright +notice and this notice must be preserved on all copies. + +In other words, go ahead and share GAWK, but don't try to stop +anyone else from sharing it farther. Help stamp out software hoarding! +*/ + +#include <setjmp.h> +#include <stdio.h> + +#ifdef SYSV +/* nasty nasty berkelixm */ +#define _setjmp setjmp +#define _longjmp longjmp +#endif + +#include "awk.h" + +NODE **get_lhs(); + +extern NODE dumb[],*OFMT_node; +/* BEGIN and END blocks need special handling, because we are handed them + * as raw Node_statement_lists, not as Node_rule_lists (jfw) + */ +extern NODE *begin_block, *end_block; +NODE *do_sprintf(); +extern struct obstack other_stack; + + +#define min(a,b) ((a) < (b) ? (a) : (b)) + +/* More of that debugging stuff */ +#ifdef FAST +#define DEBUG(X) +#else +#define DEBUG(X) print_debug X +#endif + +/* longjmp return codes, must be nonzero */ +/* Continue means either for loop/while continue, or next input record */ +#define TAG_CONTINUE 1 +/* Break means either for/while break, or stop reading input */ +#define TAG_BREAK 2 + +/* the loop_tag_valid variable allows continue/break-out-of-context + * to be caught and diagnosed (jfw) */ +#define PUSH_BINDING(stack, x) (bcopy ((char *)(x), (char *)(stack), sizeof (jmp_buf)), loop_tag_valid++) +#define RESTORE_BINDING(stack, x) (bcopy ((char *)(stack), (char *)(x), sizeof (jmp_buf)), loop_tag_valid--) + +/* for "for(iggy in foo) {" */ +struct search { + int numleft; + AHASH **arr_ptr; + AHASH *bucket; + NODE *symbol; + NODE *retval; +}; + +struct search *assoc_scan(),*assoc_next(); +/* Tree is a bunch of rules to run. + Returns zero if it hit an exit() statement */ +interpret (tree) + NODE *tree; +{ + register NODE *t; /* temporary */ + + auto jmp_buf loop_tag_stack; /* shallow binding stack for loop_tag */ + static jmp_buf loop_tag; /* always the current binding */ + static int loop_tag_valid = 0;/* nonzero when loop_tag valid (jfw) */ + + static jmp_buf rule_tag; /* tag the rule currently being run, + for NEXT and EXIT statements. It is + static because there are no nested rules */ + + register NODE **lhs; /* lhs == Left Hand Side for assigns, etc */ + register struct search *l; /* For array_for */ + + + extern struct obstack temp_strings; + extern char *ob_dummy; + NODE *do_printf(); + + /* clean up temporary strings created by evaluating expressions in + previous recursive calls */ + obstack_free (&temp_strings, ob_dummy); + + if(tree == NULL) + return 1; + switch (tree->type) { +#ifndef FAST + /* Can't run these! */ + case Node_illegal: + case Node_rule_node: + case Node_if_branches: + case Node_expression_list: + case Node_K_BEGIN: + case Node_K_END: + case Node_redirect_output: + case Node_redirect_append: + case Node_redirect_pipe: + case Node_var_array: + abort(); +#endif + + case Node_rule_list: + for (t = tree; t != NULL; t = t->rnode) { + switch (_setjmp(rule_tag)) { + case 0: /* normal non-jump */ + if (eval_condition (t->lnode->lnode)) { + DEBUG(("Found a rule",t->lnode->rnode)); + if (t->lnode->rnode == NULL) { + /* special case: pattern with no action is equivalent to + * an action of {print} (jfw) */ + NODE printnode; + printnode.type = Node_K_print; + printnode.lnode = NULL; + printnode.rnode = NULL; + hack_print_node(&printnode); + } else + (void)interpret (t->lnode->rnode); + } + break; + case TAG_CONTINUE: /* NEXT statement */ + return 1; + case TAG_BREAK: + return 0; + } + } + break; + + case Node_statement_list: + /* print_a_node(tree); */ + /* because BEGIN and END do not have Node_rule_list nature, yet can + * have exits and nexts, we special-case a setjmp of rule_tag here. + * (jfw) + */ + if (tree == begin_block || tree == end_block) { + switch (_setjmp(rule_tag)) { + case TAG_CONTINUE: /* next */ + panic("unexpected next"); + return 1; + case TAG_BREAK: return 0; + } + } + for (t = tree; t != NULL; t = t->rnode) { + DEBUG(("Statements",t->lnode)); + (void)interpret (t->lnode); + } + break; + + case Node_K_if: + DEBUG(("IF",tree->lnode)); + if (eval_condition(tree->lnode)) { + DEBUG(("True",tree->rnode->lnode)); + (void)interpret (tree->rnode->lnode); + } else { + DEBUG(("False",tree->rnode->rnode)); + (void)interpret (tree->rnode->rnode); + } + break; + + case Node_K_while: + PUSH_BINDING (loop_tag_stack, loop_tag); + + DEBUG(("WHILE",tree->lnode)); + while (eval_condition (tree->lnode)) { + switch (_setjmp (loop_tag)) { + case 0: /* normal non-jump */ + DEBUG(("DO",tree->rnode)); + (void)interpret (tree->rnode); + break; + case TAG_CONTINUE: /* continue statement */ + break; + case TAG_BREAK: /* break statement */ + RESTORE_BINDING (loop_tag_stack, loop_tag); + return 1; +#ifndef FAST + default: + abort (); /* never happens */ +#endif + } + } + RESTORE_BINDING (loop_tag_stack, loop_tag); + break; + + case Node_K_for: + PUSH_BINDING (loop_tag_stack, loop_tag); + + DEBUG(("FOR",tree->forloop->init)); + (void)interpret (tree->forloop->init); + + DEBUG(("FOR.WHILE",tree->forloop->cond)); + while (eval_condition (tree->forloop->cond)) { + switch (_setjmp (loop_tag)) { + case 0: /* normal non-jump */ + DEBUG(("FOR.DO",tree->lnode)); + (void)interpret (tree->lnode); + /* fall through */ + case TAG_CONTINUE: /* continue statement */ + DEBUG(("FOR.INCR",tree->forloop->incr)); + (void)interpret (tree->forloop->incr); + break; + case TAG_BREAK: /* break statement */ + RESTORE_BINDING (loop_tag_stack, loop_tag); + return 1; +#ifndef FAST + default: + abort (); /* never happens */ +#endif + } + } + RESTORE_BINDING (loop_tag_stack, loop_tag); + break; + + case Node_K_arrayfor: +#define hakvar forloop->init +#define arrvar forloop->incr + PUSH_BINDING(loop_tag_stack, loop_tag); + DEBUG(("AFOR.VAR",tree->hakvar)); + lhs=get_lhs(tree->hakvar); + do_deref(); + for(l=assoc_scan(tree->arrvar);l;l=assoc_next(l)) { + *lhs=dupnode(l->retval); + DEBUG(("AFOR.NEXTIS",*lhs)); + switch(_setjmp(loop_tag)) { + case 0: + DEBUG(("AFOR.DO",tree->lnode)); + (void)interpret(tree->lnode); + case TAG_CONTINUE: + break; + + case TAG_BREAK: + RESTORE_BINDING(loop_tag_stack, loop_tag); + return 1; +#ifndef FAST + default: + abort(); +#endif + } + } + RESTORE_BINDING(loop_tag_stack, loop_tag); + break; + + case Node_K_break: + DEBUG(("BREAK",NULL)); + if (loop_tag_valid == 0) /* jfw */ + panic("unexpected break or continue"); + _longjmp (loop_tag, TAG_BREAK); + break; + + case Node_K_continue: + DEBUG(("CONTINUE",NULL)); + if (loop_tag_valid == 0) /* jfw */ + panic("unexpected break or continue"); + _longjmp (loop_tag, TAG_CONTINUE); + break; + + case Node_K_print: + DEBUG(("PRINT",tree)); + (void)hack_print_node (tree); + break; + + case Node_K_printf: + DEBUG(("PRINTF",tree)); + (void)do_printf(tree); + break; + + case Node_K_next: + DEBUG(("NEXT",NULL)); + _longjmp (rule_tag, TAG_CONTINUE); + break; + + case Node_K_exit: + /* The unix awk doc says to skip the rest of the input. Does that + mean after performing all the rules on the current line? + Unix awk quits immediately, so this does too. */ + /* The UN*X exit can also take an optional arg return code. We don't */ + /* Well, we parse it, but never *DO* it */ + DEBUG(("EXIT",NULL)); + _longjmp (rule_tag, TAG_BREAK); + break; + + default: + /* Appears to be an expression statement. Throw away the value. */ + DEBUG(("E",NULL)); + (void)tree_eval (tree); + break; + } + return 1; +} + +/* evaluate a subtree, allocating strings on a temporary stack. */ +/* This used to return a whole NODE, instead of a ptr to one, but that + led to lots of obnoxious copying. I got rid of it (JF) */ +NODE * +tree_eval (tree) + NODE *tree; +{ + register NODE *r, *t1, *t2; /* return value and temporary subtrees */ + register NODE **lhs; + static AWKNUM x; /* Why are these static? */ + extern struct obstack temp_strings; + + if(tree == NULL) { + DEBUG(("NULL",NULL)); + return Nnull_string; + } + switch (tree->type) { + /* trivial data */ + case Node_string: + case Node_number: + DEBUG(("DATA",tree)); + return tree; + + /* Builtins */ + case Node_builtin: + DEBUG(("builtin",tree)); + return ((*tree->proc)(tree->subnode)); + + /* unary operations */ + + case Node_var: + case Node_subscript: + case Node_field_spec: + DEBUG(("var_type ref",tree)); + lhs=get_lhs(tree); + return *lhs; + + case Node_preincrement: + case Node_predecrement: + DEBUG(("+-X",tree)); + lhs=get_lhs(tree->subnode); + assign_number(lhs,force_number(*lhs) + (tree->type==Node_preincrement ? 1.0 : -1.0)); + return *lhs; + + case Node_postincrement: + case Node_postdecrement: + DEBUG(("X+-",tree)); + lhs=get_lhs(tree->subnode); + x = force_number(*lhs); + assign_number (lhs, x + (tree->type==Node_postincrement ? 1.0 : -1.0)); + return tmp_number(x); + + case Node_unary_minus: + DEBUG(("UMINUS",tree)); + return tmp_number(-force_number(tree_eval(tree->subnode))); + + /* assignments */ + case Node_assign: + DEBUG(("ASSIGN",tree)); + r = tree_eval (tree->rnode); + lhs=get_lhs(tree->lnode); + *lhs= dupnode(r); + do_deref(); + /* FOO we have to regenerate $0 here! */ + if(tree->lnode->type==Node_field_spec) + fix_fields(); + return r; + /* other assignment types are easier because they are numeric */ + case Node_assign_times: + r = tree_eval (tree->rnode); + lhs=get_lhs(tree->lnode); + assign_number(lhs, force_number(*lhs) * force_number(r)); + do_deref(); + return r; + + case Node_assign_quotient: + r = tree_eval (tree->rnode); + lhs=get_lhs(tree->lnode); + assign_number(lhs, force_number(*lhs) / force_number(r)); + do_deref(); + return r; + + case Node_assign_mod: + r = tree_eval (tree->rnode); + lhs=get_lhs(tree->lnode); + assign_number(lhs, (AWKNUM)(((int) force_number(*lhs)) % ((int) force_number(r)))); + do_deref(); + return r; + + case Node_assign_plus: + r = tree_eval (tree->rnode); + lhs=get_lhs(tree->lnode); + assign_number(lhs, force_number(*lhs) + force_number(r)); + do_deref(); + return r; + + case Node_assign_minus: + r = tree_eval (tree->rnode); + lhs=get_lhs(tree->lnode); + assign_number(lhs, force_number(*lhs) - force_number(r)); + do_deref(); + return r; + } + /* Note that if TREE is invalid, gAWK will probably bomb in one of these + tree_evals here. */ + /* evaluate subtrees in order to do binary operation, then keep going */ + t1 = tree_eval (tree->lnode); + t2 = tree_eval (tree->rnode); + + switch (tree->type) { + + case Node_concat: + t1=force_string(t1); + t2=force_string(t2); + + r=(NODE *)obstack_alloc(&temp_strings,sizeof(NODE)); + r->type=Node_temp_string; + r->stlen=t1->stlen+t2->stlen; + r->stref=1; + r->stptr=(char *)obstack_alloc(&temp_strings,r->stlen+1); + bcopy(t1->stptr,r->stptr,t1->stlen); + bcopy(t2->stptr,r->stptr+t1->stlen,t2->stlen); + r->stptr[r->stlen]='\0'; + return r; + + case Node_times: + return tmp_number(force_number(t1) * force_number(t2)); + + case Node_quotient: + x=force_number(t2); + if(x==(AWKNUM)0) return tmp_number((AWKNUM)0); + else return tmp_number(force_number(t1) / x); + + case Node_mod: + x=force_number(t2); + if(x==(AWKNUM)0) return tmp_number((AWKNUM)0); + return tmp_number((AWKNUM) /* uggh... */ + (((int) force_number(t1)) % ((int) x))); + + case Node_plus: + return tmp_number(force_number(t1) + force_number(t2)); + + case Node_minus: + return tmp_number(force_number(t1) - force_number(t2)); + +#ifndef FAST + default: + fprintf (stderr, "internal error: illegal numeric operation\n"); + abort (); +#endif + } + return 0; +} + +/* We can't dereference a variable until after we've given it its new value. + This variable points to the value we have to free up */ +NODE *deref; + +/* This returns a POINTER to a node pointer. + *get_lhs(ptr) is the current value of the var, or where to store the + var's new value */ + +NODE ** +get_lhs(ptr) +NODE *ptr; +{ + register NODE *subexp; + register NODE **aptr; + register int num; + extern NODE **fields_arr; + extern f_arr_siz; + NODE **assoc_lookup(); + extern char f_empty[]; /* jfw */ + +#ifndef FAST + if(ptr == NULL) + abort(); +#endif + deref = NULL; + switch(ptr->type) { + case Node_var: + deref=ptr->var_value; + return &(ptr->var_value); + + case Node_field_spec: + num=(int)force_number(tree_eval(ptr->lnode)); + if(num<0) num=0; /* JF what should I do? */ + if(num>f_arr_siz) + set_field(num,f_empty,0); /* jfw: so blank_strings can be simpler */ + deref = NULL; + return &fields_arr[num]; + + case Node_subscript: + subexp = tree_eval(ptr->rnode); + aptr=assoc_lookup(ptr->lnode,subexp); + deref= *aptr; + return aptr; + } +#ifndef FAST + abort(); + return 0; +#endif +} + +do_deref() +{ + if(deref) { + switch(deref->type) { + case Node_string: + if(deref!=Nnull_string) + FREE_ONE_REFERENCE(deref); + break; + case Node_number: + free((char *)deref); + break; +#ifndef FAST + default: + abort(); +#endif + } + deref = 0; + } +} + +/* This makes numeric operations slightly more efficient. + Just change the value of a numeric node, if possible */ +assign_number (ptr, value) +NODE **ptr; +AWKNUM value; +{ + switch ((*ptr)->type) { + case Node_string: + if(*ptr!=Nnull_string) + FREE_ONE_REFERENCE (*ptr); + case Node_temp_string: /* jfw: dont crash if we say $2 += 4 */ + *ptr=make_number(value); + return; + case Node_number: + (*ptr)->numbr = value; + deref=0; + break; +#ifndef FAST + default: + printf("assign_number nodetype %d\n", (*ptr)->type); /* jfw: add mesg. */ + abort (); +#endif + } +} + + +/* Routines to deal with fields */ +#define ORIG_F 30 + +NODE **fields_arr; +NODE *fields_nodes; +int f_arr_siz; +char f_empty [] = ""; + +init_fields() +{ + register NODE **tmp; + register NODE *xtmp; + + f_arr_siz=ORIG_F; + fields_arr=(NODE **)malloc(ORIG_F * sizeof(NODE *)); + fields_nodes=(NODE *)malloc(ORIG_F * sizeof(NODE)); + tmp= &fields_arr[f_arr_siz]; + xtmp= &fields_nodes[f_arr_siz]; + while(--tmp>= &fields_arr[0]) { + --xtmp; + *tmp=xtmp; + xtmp->type=Node_temp_string; + xtmp->stlen=0; + xtmp->stref=1; + xtmp->stptr=f_empty; + } +} + +blank_fields() +{ + register NODE **tmp; + extern char *parse_end; + + tmp= &fields_arr[f_arr_siz]; + while(--tmp>= &fields_arr[0]) { + switch(tmp[0]->type) { + case Node_number: + free((char *)*tmp); + *tmp= &fields_nodes[tmp-fields_arr]; + break; + case Node_string: + if(*tmp!=Nnull_string) + FREE_ONE_REFERENCE(*tmp); + *tmp= &fields_nodes[tmp-fields_arr]; + break; + case Node_temp_string: + break; +#ifndef FAST + default: + abort(); +#endif + } + if ((*tmp)->stptr != f_empty) { /* jfw */ + /*Then it was assigned a string with set_field */ + /*out of a private buffer to inrec, so don't free it*/ + (*tmp)->stptr = f_empty; + (*tmp)->stlen = 0; + (*tmp)->stref = 1; + } + /* *tmp=Nnull_string; */ + } + /* Free the strings */ + obstack_free(&other_stack,parse_end); +} + +/* Danger! Must only be called for fields we know have just been blanked, + or fields we know don't exist yet. */ +set_field(n,str,len) +char *str; +{ + NODE *field_string(); + + if(n>f_arr_siz) { + int t; + + fields_arr=(NODE **)realloc((char *)fields_arr,(n+1)*sizeof(NODE *)); + fields_nodes=(NODE *)realloc((char *)fields_nodes,(n+1)*sizeof(NODE)); + for(t=f_arr_siz;t<=n;t++) { + fields_arr[t]= &fields_nodes[t]; + fields_nodes[t].type=Node_temp_string; + fields_nodes[t].stlen=0; + fields_nodes[t].stref=1; + fields_nodes[t].stptr=f_empty; + } + f_arr_siz=n+1; + } + fields_nodes[n].stlen=len; + if(n==0) { + fields_nodes[n].stptr=(char*)obstack_alloc(&other_stack,len+1); + bcopy(str,fields_nodes[n].stptr,len); + fields_nodes[n].stptr[len]='\0'; + } else { + fields_nodes[n].stptr=str; + str[len]='\0'; + } +} + +#ifdef DONTDEF +/* Nodes created with this will go away when the next input line is read */ +NODE * +field_string(s,len) +char *s; +{ + register NODE *r; + + r=(NODE *)obstack_alloc(&other_stack,sizeof(NODE)); + r->type=Node_temp_string; + r->stref=1; + r->stlen=len; + r->stptr=(char*)obstack_alloc(&other_stack,len+1); + bcopy(s,r->stptr,len); + /* r->stptr=s; + r->stptr[len]='\0'; */ + + return r; +} +#endif + +/* Someone assigned a value to $(something). Fix up $0 to be right */ +fix_fields() +{ + register int tlen; + register NODE *tmp; + NODE *ofs; + char *ops; + register char *cops; + register NODE **ptr,**maxp; + extern NODE *OFS_node; + + maxp=0; + tlen=0; + ofs=force_string(*get_lhs(OFS_node)); + ptr= &fields_arr[f_arr_siz]; + while(--ptr> &fields_arr[0]) { + tmp=force_string(*ptr); + tlen+=tmp->stlen; + if(tmp->stlen && !maxp) + maxp=ptr; + } + if(!maxp) { + if (fields_arr[0] != fields_nodes) + FREE_ONE_REFERENCE(fields_arr[0]); + fields_arr[0]=Nnull_string; + return; + } + + tlen+=((maxp-fields_arr)-1)*ofs->stlen; + ops=(char *)malloc(tlen+1); + cops=ops; + for(ptr= &fields_arr[1];ptr<=maxp;ptr++) { + tmp=force_string(*ptr); + bcopy(tmp->stptr,cops,tmp->stlen); + cops+=tmp->stlen; + if(ptr!=maxp) { + bcopy(ofs->stptr,cops,ofs->stlen); + cops+=ofs->stlen; + } + } + tmp=newnode(Node_string); + tmp->stptr=ops; + tmp->stlen=tlen; + tmp->stref=1; + tmp->stptr[tlen]='\0'; + /* don't free unless it's new */ + if (fields_arr[0] != fields_nodes) + FREE_ONE_REFERENCE(fields_arr[0]); + fields_arr[0]=tmp; +} + + +/* Is TREE true or false? Returns 0==false, non-zero==true */ +int +eval_condition (tree) +NODE *tree; +{ + register int di; + register NODE *t1,*t2; + + if(tree==NULL) /* Null trees are the easiest kinds */ + return 1; + switch (tree->type) { + /* Maybe it's easy; check and see. */ + /* BEGIN and END are always false */ + case Node_K_BEGIN: + return 0; + break; + + case Node_K_END: + return 0; + break; + + case Node_and: + return eval_condition (tree->lnode) + && eval_condition (tree->rnode); + + case Node_or: + return eval_condition (tree->lnode) + || eval_condition (tree->rnode); + + case Node_not: + return !eval_condition (tree->lnode); + + /* Node_line_range is kind of like Node_match, EXCEPT: + * the lnode field (more properly, the condpair field) is a node of + * a Node_cond_pair; whether we evaluate the lnode of that node or the + * rnode depends on the triggered word. More precisely: if we are not + * yet triggered, we tree_eval the lnode; if that returns true, we set + * the triggered word. If we are triggered (not ELSE IF, note), we + * tree_eval the rnode, clear triggered if it succeeds, and perform our + * action (regardless of success or failure). We want to be able to + * begin and end on a single input record, so this isn't an ELSE IF, as + * noted above. + * This feature was implemented by John Woods, jfw@eddie.mit.edu, during + * a rainy weekend. + */ + case Node_line_range: + if (!tree->triggered) + if (!eval_condition(tree->condpair->lnode)) + return 0; + else + tree->triggered = 1; + /* Else we are triggered */ + if (eval_condition(tree->condpair->rnode)) + tree->triggered = 0; + return 1; + } + + /* Could just be J.random expression. + in which case, null and 0 are false, + anything else is true */ + + switch(tree->type) { + case Node_match: + case Node_nomatch: + case Node_equal: + case Node_notequal: + case Node_less: + case Node_greater: + case Node_leq: + case Node_geq: + break; + + default: /* This is so 'if(iggy)', etc, will work */ + /* Non-zero and non-empty are true */ + t1=tree_eval(tree); + switch(t1->type) { + case Node_number: + return t1->numbr!=0.0; + case Node_string: + case Node_temp_string: + return t1->stlen!=0; +#ifndef FAST + default: + abort(); +#endif + } + } + /* couldn't fob it off recursively, eval left subtree and + see if it's a pattern match operation */ + + t1 = tree_eval (tree->lnode); + + if (tree->type == Node_match || tree->type == Node_nomatch) { + t1=force_string(t1); + return (re_search (tree->rereg, t1->stptr, + t1->stlen, 0, t1->stlen, + NULL) == -1) + ^ (tree->type == Node_match); + } + + /* still no luck--- eval the right subtree and try binary ops */ + + t2 = tree_eval (tree->rnode); + + di=cmp_nodes(t1,t2); + + switch (tree->type) { + case Node_equal: + return di == 0; + case Node_notequal: + return di != 0; + case Node_less: + return di < 0; + case Node_greater: + return di > 0; + case Node_leq: + return di <= 0; + case Node_geq: + return di >= 0; +#ifndef FAST + default: + fprintf(stderr,"Panic: unknown conditonal\n"); + abort (); +#endif + } + return 0; +} + +/* FOO this doesn't properly compare "12.0" and 12.0 etc */ +/* or "1E1" and 10 etc */ +/* Perhaps someone should fix it. */ +/* Consider it fixed (jfw) */ + +/* strtod() would have been better, except (1) real awk is needlessly + * restrictive in what strings it will consider to be numbers, and + * (2) I couldn't find the public domain version anywhere handy. + */ +is_a_number(str) /* does the string str have pure-numeric syntax? */ +char *str; /* don't convert it, assume that atof is better */ +{ + if (*str == 0) return 1; /* null string has numeric value of0 */ + /* This is still a bug: in real awk, an explicit "" string + * is not treated as a number. Perhaps it is only variables + * that, when empty, are also 0s. This bug-lette here at + * least lets uninitialized variables to compare equal to + * zero like they should. + */ + if (*str == '-') str++; + if (*str == 0) return 0; + /* must be either . or digits (.4 is legal) */ + if (*str != '.' && !isdigit(*str)) return 0; + while (isdigit(*str)) str++; + if (*str == '.') { + str++; + while (isdigit(*str)) str++; + } + /* curiously, real awk DOESN'T consider "1E1" to be equal to 10! + * Or even equal to 1E1 for that matter! For a laugh, try: + * awk 'BEGIN {if ("1E1" == 1E1) print "eq"; else print "neq";exit}' + * Since this behavior is QUITE curious, I include the code for the + * adventurous. One might also feel like skipping leading whitespace + * (awk doesn't) and allowing a leading + (awk doesn't). +#ifdef Allow_Exponents + if (*str == 'e' || *str == 'E') { + str++; + if (*str == '+' || *str == '-') str++; + if (!isdigit(*str)) return 0; + while (isdigit(*str)) str++; + } +#endif + /* if we have digested the whole string, we are successful */ + return (*str == 0); +} + +cmp_nodes(t1,t2) +NODE *t1,*t2; +{ + register int di; + register AWKNUM d; + + + if(t1==t2) { + return 0; + } +#ifndef FAST + if(!t1 || !t2) { + abort(); + return t1 ? 1 : -1; + } + +#endif + if (t1->type == Node_number && t2->type == Node_number) { + d = t1->numbr - t2->numbr; + if (d < 0.0) + return -1; + if (d > 0.0) + return 1; + return 0; + } + t1=force_string(t1); + t2=force_string(t2); + /* "real" awk treats things as numbers if they both "look" like numbers. */ + if (*t1->stptr && *t2->stptr /* don't allow both to be empty strings(jfw)*/ + && is_a_number(t1->stptr) && is_a_number(t2->stptr)) { + double atof(); + d = atof(t1->stptr) - atof(t2->stptr); + if (d < 0.0) return -1; + if (d > 0.0) return 1; + return 0; + } + di = strncmp (t1->stptr, t2->stptr, min (t1->stlen, t2->stlen)); + if (di == 0) + di = t1->stlen - t2->stlen; + if(di>0) return 1; + if(di<0) return -1; + return 0; +} + + +#ifdef DONTDEF +int primes[] = {31,61,127,257,509,1021,2053,4099,8191,16381}; +#endif + +/* routines for associative arrays. SYMBOL is the address of the node + (or other pointer) being dereferenced. SUBS is a number or string + used as the subscript. */ + +/* #define ASSOC_HASHSIZE 1009 /* prime */ +#define ASSOC_HASHSIZE 29 +#define STIR_BITS(n) ((n) << 5 | (((n) >> 27) & 0x1f)) +#define HASHSTEP(old, c) ((old << 1) + c) +#define MAKE_POS(v) (v & ~0x80000000) /* make number positive */ + +/* static AHASH *assoc_table[ASSOC_HASHSIZE]; */ + + +/* Flush all the values in symbol[] before doing a split() */ +assoc_clear(symbol) +NODE *symbol; +{ + int i; + AHASH *bucket,*next; + + if(symbol->var_array==0) + return; + for(i=0;i<ASSOC_HASHSIZE;i++) { + for(bucket=symbol->var_array[i];bucket;bucket=next) { + next=bucket->next; + deref=bucket->name; + do_deref(); + deref=bucket->value; + do_deref(); + free((void *)bucket); + } + symbol->var_array[i]=0; + } +} + +/* Find SYMBOL[SUBS] in the assoc array. Install it with value "" if it + isn't there. */ +/* Returns a pointer ala get_lhs to where its value is stored */ +NODE ** +assoc_lookup (symbol, subs) +NODE *symbol, + *subs; +{ + int hash1 = 0, hashf(), i; + AHASH *bucket; + NODETYPE ty; + + if(subs->type==Node_number) { + hash1=(int)subs->numbr; + ty=Node_number; + } else { + ty=Node_string; + subs=force_string(subs); + for(i=0;i<subs->stlen;i++) + hash1=HASHSTEP(hash1,subs->stptr[i]); + + /* hash1 ^= (int) STIR_BITS((int)symbol); */ + } + hash1 = MAKE_POS(STIR_BITS((int)hash1)) % ASSOC_HASHSIZE; + + /* this table really should grow dynamically */ + if(symbol->var_array==0) { + symbol->var_array=(AHASH **)malloc(sizeof(AHASH *)*ASSOC_HASHSIZE); + for(i=0;i<ASSOC_HASHSIZE;i++) { + symbol->var_array[i]=0; + } + } else { + for (bucket = symbol->var_array[hash1]; bucket; bucket = bucket->next) { + if (bucket->name->type!= ty || cmp_nodes(bucket->name,subs)) + continue; + return &(bucket->value); + } + /* Didn't find it on first pass. Try again. */ + for (bucket = symbol->var_array[hash1]; bucket; bucket = bucket->next) { + if (cmp_nodes(bucket->name,subs)) + continue; + return &(bucket->value); + } + } + bucket = (AHASH *) malloc(sizeof (AHASH)); + bucket->symbol = symbol; + bucket->name = dupnode(subs); + bucket->value = Nnull_string; + bucket->next = symbol->var_array[hash1]; + symbol->var_array[hash1]=bucket; + return &(bucket->value); +} + +struct search * +assoc_scan(symbol) +NODE *symbol; +{ + struct search *lookat; + + if(!symbol->var_array) + return 0; + lookat=(struct search *)obstack_alloc(&other_stack,sizeof(struct search)); + /* lookat->symbol=symbol; */ + lookat->numleft=ASSOC_HASHSIZE; + lookat->arr_ptr=symbol->var_array; + lookat->bucket=symbol->var_array[0]; + return assoc_next(lookat); +} + +struct search * +assoc_next(lookat) +struct search *lookat; +{ + for(;lookat->numleft;lookat->numleft--) { + while(lookat->bucket!=0) { + lookat->retval=lookat->bucket->name; + lookat->bucket=lookat->bucket->next; + return lookat; + } + lookat->bucket= *++(lookat->arr_ptr); + } + return 0; +} + + +#ifdef FAST +NODE * +strforce(n) +NODE *n; +{ + extern NODE dumb[],*OFMT_node; + NODE *do_sprintf(); + + dumb[1].lnode=n; + if(OFMT_node->var_value->type!=Node_string) + panic("Insane value for OFMT detected."); + return do_sprintf(&dumb[0]); +} + +#else +AWKNUM +force_number (n) +NODE *n; +{ + double atof(); /* Forgetting this is bad */ + + if(n==NULL) + abort(); + switch (n->type) { + case Node_number: + return n->numbr; + case Node_string: + case Node_temp_string: + return atof(n->stptr); + default: + abort (); + } + return 0.0; +} + +NODE * +force_string(s) +NODE *s; +{ + if(s==NULL) + abort(); + switch(s->type) { + case Node_string: + case Node_temp_string: + return s; + case Node_number: + if((*get_lhs(OFMT_node))->type!=Node_string) + panic("Insane value for OFMT!",0); + dumb[1].lnode=s; + return do_sprintf(&dumb[0]); + default: + abort(); + } + return NULL; +} +#endif @@ -0,0 +1,900 @@ +/* awk3.c -- Builtin functions and various utility procedures + Copyright (C) 1986,1987 Free Software Foundation + Written by Jay Fenlason, December 1986 + + */ + +/* +GAWK is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY. No author or distributor accepts responsibility to anyone +for the consequences of using it or for whether it serves any +particular purpose or works at all, unless he says so in writing. +Refer to the GAWK General Public License for full details. + +Everyone is granted permission to copy, modify and redistribute GAWK, +but only under the conditions described in the GAWK General Public +License. A copy of this license is supposed to have been given to you +along with GAWK so you can know your rights and responsibilities. It +should be in a file named COPYING. Among other things, the copyright +notice and this notice must be preserved on all copies. + +In other words, go ahead and share GAWK, but don't try to stop +anyone else from sharing it farther. Help stamp out software hoarding! +*/ +#include <stdio.h> +#include "awk.h" + +#include <obstack.h> + +extern struct obstack temp_strings; + +/* This node is the cannonical null string, used everywhere */ +extern NODE *Nnull_string; + +/* These nodes store all the special variables gAWK uses */ +NODE *FS_node, *NF_node, *RS_node, *NR_node; +NODE *FILENAME_node, *OFS_node, *ORS_node, *OFMT_node; + +/* This dumb kludge is used by force_string to turn a floating point + number into a string */ +NODE dumb[2]; + +NODE **get_lhs(); +FILE *deal_redirect(); + +struct redirect { + int flag; /* JF was NODETYPE */ + NODE *value; + FILE *fp; +}; +struct redirect reds[20]; /* An arbitrary limit, surely, but there's an + arbitrary limit on open files, too. So it + doesn't make much difference, does it? */ + + +long NR; +int NF; + +/* The next #define tells how you find $0. Its a hack */ +extern NODE **fields_arr; +#define WHOLELINE fields_arr[0] + +/* Set all the special variables to their initial values. Also sets up + the dumb[] array for force_string */ +init_vars() +{ + NODE *spc_var(); + NODE *do_sprintf(); + + FS_node=spc_var("FS",make_string(" ",1)); + NF_node=spc_var("NF",make_number(0.0)); + RS_node=spc_var("RS",make_string("\n",1)); + NR_node=spc_var("NR",make_number(0.0)); + FILENAME_node=spc_var("FILENAME",Nnull_string); + OFS_node=spc_var("OFS",make_string(" ",1)); + ORS_node=spc_var("ORS",make_string("\n",1)); + OFMT_node=spc_var("OFMT",make_string("%.6g",4)); + + /* This ugly hack is used by force_string + to fake a call to sprintf */ + dumb[0].type=Node_expression_list; + dumb[0].lnode=OFMT_node; + dumb[0].rnode= &dumb[1]; + dumb[1].type=Node_expression_list; + dumb[1].lnode=(NODE *)0; /* fill in the var here */ + dumb[1].rnode=(NODE *)0; + reds[0].flag=0; /* Don't depend on uninit data being zero, although it should be */ +} + +/* OFMT is special because we don't dare use force_string on it for fear of + infinite loops. Thus, if it isn't a string, we return the default "%.6g" + This may or may not be the right thing to do, but its the easiest */ +/* This routine isn't used! It should be. */ +char *get_ofmt() +{ + register NODE *tmp; + + tmp= *get_lhs(OFMT_node); + if(tmp->type!=Node_string || tmp->stlen==0) + return "%.6g"; + return tmp->stptr; +} + +int +get_fs() +{ + register NODE *tmp; + + tmp=force_string(FS_node->var_value); + if(tmp->stlen==0) return 0; + return *(tmp->stptr); +} + +set_fs(str) +char *str; +{ + register NODE **tmp; + + tmp= get_lhs(FS_node); + do_deref(); + /* stupid special case so -F\t works as documented in awk */ + /* even though the shell hands us -Ft. Bleah! (jfw) */ + if (*str == 't') *str == '\t'; + *tmp=make_string(str,1); +} + +set_rs(str) +char *str; +{ + register NODE **tmp; + + tmp= get_lhs(RS_node); + do_deref(); + /* stupid special case to be consistent with -F (jfw) */ + if (*str == 't') *str == '\t'; + *tmp=make_string(str,1); +} + + +int +get_rs() +{ + register NODE *tmp; + + tmp=force_string(RS_node->var_value); + if(tmp->stlen==0) return 0; + return *(tmp->stptr); +} + + +/* Builtin functions */ +NODE * +do_exp(tree) +NODE *tree; +{ + NODE *tmp; + double exp(); + + get_one(tree,&tmp); + return tmp_number(exp(force_number(tmp))); +} + +/* JF: I don't know what this should return. */ +/* jfw: 1 if successful or by land, 0 if end of file or by sea */ +NODE * +do_getline(tree) +NODE *tree; +{ + if(inrec() == 0) + return tmp_number(1.0); + else + return tmp_number(0.0); +} + +NODE * +do_index(tree) +NODE *tree; +{ + NODE *s1,*s2; + register char *p1,*p2; + register int l1,l2; + + get_two(tree,&s1,&s2); + p1=s1->stptr; + p2=s2->stptr; + l1=s1->stlen; + l2=s2->stlen; + while(l1) { + if(!strncmp(p1,p2,l2)) + return tmp_number((AWKNUM)(1+s1->stlen-l1)); + l1--; + p1++; + } + return tmp_number(0.0); +} + +NODE * +do_int(tree) +NODE *tree; +{ + NODE *tmp; + double floor(); + + get_one(tree,&tmp); + return tmp_number(floor(force_number(tmp))); +} + +NODE * +do_length(tree) +NODE *tree; +{ + NODE *tmp; + + get_one(tree,&tmp); + return tmp_number((AWKNUM)(force_string(tmp)->stlen)); +} + +NODE * +do_log(tree) +NODE *tree; +{ + NODE *tmp; + double log(); + + get_one(tree,&tmp); + return tmp_number(log(force_number(tmp))); +} + + +NODE * +do_printf(tree) +NODE *tree; +{ + register FILE *fp; + NODE *do_sprintf(); + + fp=deal_redirect(tree->rnode); + print_simple(do_sprintf(tree->lnode),fp); + return Nnull_string; +} + + +NODE * +do_split(tree) +NODE *tree; +{ + NODE *t1,*t2,*t3; + register int splitc; + register int num,snum,olds; + register char *ptr,*oldp; + NODE **assoc_lookup(); + + if(a_get_three(tree,&t1,&t2,&t3)<3) + splitc= get_fs(); + else + splitc= *(force_string(t3)->stptr); + num=0; + tree=force_string(t1); + olds=snum=tree->stlen; + oldp=ptr=tree->stptr; + assoc_clear(t2); + while(snum--) { + if(*ptr++==splitc) { + *assoc_lookup(t2,make_number((AWKNUM)(++num)))=make_string(oldp,(olds-snum)-1); + oldp=ptr; + olds=snum; + } + } + *assoc_lookup(t2,make_number((AWKNUM)(++num)))=make_string(oldp,(olds-snum)-1); + return tmp_number((AWKNUM)num); +} + +/* Note that the output buffer cannot be static because sprintf may get called + recursively by force_string. Hence the wasteful alloca calls */ + +/* %e and %f formats are not properly implemented. Someone should fix them */ +NODE * +do_sprintf(tree) +NODE *tree; +{ +#define bchunk(s,l) if(l) {\ + if((l)>ofre) {\ + char *tmp;\ + tmp=(char *)alloca(osiz*2);\ + bcopy(obuf,tmp,olen);\ + obuf=tmp;\ + ofre+=osiz;\ + osiz*=2;\ + }\ + bcopy(s,obuf+olen,(l));\ + olen+=(l);\ + ofre-=(l);\ + } + +/* Is there space for something L big in the buffer? */ +#define chksize(l) if((l)>ofre) {\ + char *tmp;\ + tmp=(char *)alloca(osiz*2);\ + bcopy(obuf,tmp,olen);\ + obuf=tmp;\ + ofre+=osiz;\ + osiz*=2;\ + } +/* Get the next arg to be formatted. If we've run out of args, return + "" (Null string) */ +#define parse_next_arg() {\ + if(!carg) arg= Nnull_string;\ + else {\ + get_one(carg,&arg);\ + carg=carg->rnode;\ + }\ + } + + char *obuf; + int osiz,ofre,olen; + static char chbuf[] = "0123456789abcdef"; + static char sp[] =" "; + char *s0,*s1; + int n0; + NODE *sfmt,*arg; + register NODE *carg; + long fw,prec,lj,alt,big; + long *cur; + long val; + unsigned long uval; + int sgn; + int base; + char cpbuf[30]; /* if we have numbers bigger than 30 */ + char *cend= &cpbuf[30]; /* chars, we lose, but seems unlikely */ + char *cp; + char *fill; + double tmpval; + char *pr_str; + + + obuf=(char *)alloca(120); + osiz=120; + ofre=osiz; + olen=0; + get_one(tree,&sfmt); + sfmt=force_string(sfmt); + carg=tree->rnode; + for(s0=s1=sfmt->stptr,n0=sfmt->stlen;n0-->0;) { + if(*s1!='%') { + s1++; + continue; + } + + bchunk(s0,s1-s0); + s0=s1; + cur= &fw; + fw=0; + prec=0; + lj=alt=big=0; + fill= sp; + cp=cend; + s1++; + + retry: + --n0; + switch(*s1++) { + case '%': + bchunk("%",1); + s0=s1; + break; + + case '0': + if(fill!=sp || lj) goto lose; + fill="0"; /* FALL through */ + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + if(cur==0) + goto lose; + *cur= s1[-1]-'0'; + while(n0>0 && *s1>='0' && *s1<='9') { + --n0; + *cur= *cur * 10 + *s1++ - '0'; + } + goto retry; + case '-': + if(lj || fill!=sp) goto lose; + lj++; + goto retry; + case '.': + if(cur!=&fw) goto lose; + cur= ≺ + goto retry; + case '#': + if(alt) goto lose; + alt++; + goto retry; + case 'l': + if(big) goto lose; + big++; + goto retry; + case '*': + if(cur==0) goto lose; + parse_next_arg(); + *cur=(int)arg; + goto retry; + case 'c': + parse_next_arg(); + if(arg->type==Node_number) { + uval=(unsigned long)arg->numbr; + cpbuf[0]=uval; + prec=1; + pr_str=cpbuf; + goto dopr_string; + } + if(!prec || prec>arg->stlen) + prec=arg->stlen; + pr_str=cpbuf; + goto dopr_string; + case 's': + parse_next_arg(); + arg=force_string(arg); + if(!prec || prec>arg->stlen) + prec=arg->stlen; + pr_str=arg->stptr; + + dopr_string: + if(fw>prec && !lj) { + while(fw>prec) { + bchunk(sp,1); + fw--; + } + } + bchunk(pr_str,(int)prec); + if(fw>prec) { + while(fw>prec) { + bchunk(sp,1); + fw--; + } + } + s0=s1; + break; + case 'd': + parse_next_arg(); + val=(long)force_number(arg); + if(val<0) { + sgn=1; + val= -val; + } else sgn=0; + do { + *--cp='0'+val%10; + val/=10; + } while (val); + if(sgn) *--cp='-'; + prec=cend-cp; + if(fw>prec && !lj) { + if(fill!=sp && *cp=='-') { + bchunk(cp,1); + cp++; + prec--; + fw--; + } + while(fw>prec) { + bchunk(fill,1); + fw--; + } + } + bchunk(cp,(int)prec); + if(fw>prec) { + while(fw>prec) { + bchunk(fill,1); + fw--; + } + } + s0=s1; + break; + case 'u': + base=10; + goto pr_unsigned; + case 'o': + base=8; + goto pr_unsigned; + case 'x': + base=16; + goto pr_unsigned; + pr_unsigned: + parse_next_arg(); + uval=(unsigned long)force_number(arg); + do { + *--cp=chbuf[uval%base]; + uval/=base; + } while(uval); + prec=cend-cp; + if(fw>prec && !lj) { + while(fw>prec) { + bchunk(fill,1); + fw--; + } + } + bchunk(cp,(int)prec); + if(fw>prec) { + while(fw>prec) { + bchunk(fill,1); + fw--; + } + } + s0=s1; + break; + case 'g': + parse_next_arg(); + tmpval=force_number(arg); + if(prec==0) prec=13; + gcvt(tmpval,prec,cpbuf); + prec=strlen(cpbuf); + cp=cpbuf; + if(fw>prec && !lj) { + if(fill!=sp && *cp=='-') { + bchunk(cp,1); + cp++; + prec--; + } /* Deal with .5 as 0.5 */ + if(fill==sp && *cp=='.') { + --fw; + while(--fw>=prec) { + bchunk(fill,1); + } + bchunk("0",1); + } else + while(fw-->prec) bchunk(fill,1); + } else { /* Turn .5 into 0.5 */ + /* FOO */ + if(*cp=='.' && fill==sp) { + bchunk("0",1); + --fw; + } + } + bchunk(cp,(int)prec); + if(fw>prec) while(fw-->prec) bchunk(fill,1); + s0=s1; + break; + /* JF how to handle these!? */ + case 'f': + parse_next_arg(); + tmpval=force_number(arg); + chksize(fw+prec+5); /* 5==slop */ +/* cp=fcvt(tmpval,prec,&dec,&sgn); + prec=strlen(cp); + if(sgn) prec++; */ + cp=cpbuf; + *cp++='%'; + if(lj) *cp++='-'; + if(fill!=sp) *cp++='0'; + if(prec!=0) { + strcpy(cp,"*.*f"); + sprintf(obuf+olen,cpbuf,fw,prec,(double)tmpval); + } else { + strcpy(cp,"*f"); + sprintf(obuf+olen,cpbuf,fw,(double)tmpval); + } + cp=obuf+olen; + ofre-=strlen(obuf+olen); + olen+=strlen(obuf+olen);/* There may be nulls */ + s0=s1; + break; + case 'e': + parse_next_arg(); + tmpval=force_number(arg); + chksize(fw+prec+5); /* 5==slop */ + cp=cpbuf; + *cp++='%'; + if(lj) *cp++='-'; + if(fill!=sp) *cp++='0'; + if(prec!=0) { + strcpy(cp,"*.*e"); + sprintf(obuf+olen,cpbuf,fw,prec,(double)tmpval); + } else { + strcpy(cp,"*e"); + sprintf(obuf+olen,cpbuf,fw,(double)tmpval); + } + cp=obuf+olen; + ofre-=strlen(obuf+olen); + olen+=strlen(obuf+olen);/* There may be nulls */ + s0=s1; + break; + break; + /* case 'g': + parse_next_arg(); + tmpval=force_number(arg); + if(prec!=0) sprintf(obuf+osiz-ofre,"%*.*g",fw,prec,(double)tmpval); + else sprintf(obuf+osiz-ofre,"%*g",fw,(double)tmpval); + ofre-=strlen(obuf+osiz-ofre); + s0=s1; + break; */ + default: + lose: + break; + } + } + bchunk(s0,s1-s0); + return tmp_string(obuf,olen); +} + +NODE * +do_sqrt(tree) +NODE *tree; +{ + NODE *tmp; + double sqrt(); + + get_one(tree,&tmp); + return tmp_number(sqrt(force_number(tmp))); +} + +NODE * +do_substr(tree) +NODE *tree; +{ + NODE *t1,*t2,*t3; + register int n1,n2; + + if(get_three(tree,&t1,&t2,&t3)<3) + n2=32000; + else + n2=(int)force_number(t3); + n1=(int)force_number(t2)-1; + tree=force_string(t1); + if(n1<0 || n1>=tree->stlen || n2<=0) + return Nnull_string; + if(n1+n2>tree->stlen) + n2=tree->stlen-n1; + return tmp_string(tree->stptr+n1,n2); +} + +/* The print command. Its name is historical */ +hack_print_node(tree) +NODE *tree; +{ + register FILE *fp; + +#ifndef FAST + if(!tree || tree->type != Node_K_print) + abort(); +#endif + fp=deal_redirect(tree->rnode); + tree=tree->lnode; + if(!tree) tree=WHOLELINE; + if(tree->type!=Node_expression_list) { + print_simple(tree,fp); + } else { + while(tree) { + print_simple(tree_eval(tree->lnode),fp); + tree=tree->rnode; + if(tree) print_simple(OFS_node->var_value,fp); + } + } + print_simple(ORS_node->var_value,fp); +} + + +/* Get the arguments to functions. No function cares if you give it + too many args (they're ignored). Only a few fuctions complain + about being given too few args. The rest have defaults */ + +get_one(tree,res) +NODE *tree,**res; +{ + if(!tree) { + *res= WHOLELINE; + return; + } +#ifndef FAST + if(tree->type!=Node_expression_list) + abort(); +#endif + *res=tree_eval(tree->lnode); +} + +get_two(tree,res1,res2) +NODE *tree,**res1,**res2; +{ + if(!tree) { + *res1= WHOLELINE; + return; + } +#ifndef FAST + if(tree->type!=Node_expression_list) + abort(); +#endif + *res1=tree_eval(tree->lnode); + if(!tree->rnode) + return; + tree=tree->rnode; +#ifndef FAST + if(tree->type!=Node_expression_list) + abort(); +#endif + *res2=tree_eval(tree->lnode); +} + +get_three(tree,res1,res2,res3) +NODE *tree,**res1,**res2,**res3; +{ + if(!tree) { + *res1= WHOLELINE; + return 0; + } +#ifndef FAST + if(tree->type!=Node_expression_list) + abort(); +#endif + *res1=tree_eval(tree->lnode); + if(!tree->rnode) + return 1; + tree=tree->rnode; +#ifndef FAST + if(tree->type!=Node_expression_list) + abort(); +#endif + *res2=tree_eval(tree->lnode); + if(!tree->rnode) + return 2; + tree=tree->rnode; +#ifndef FAST + if(tree->type!=Node_expression_list) + abort(); +#endif + *res3=tree_eval(tree->lnode); + return 3; +} + +a_get_three(tree,res1,res2,res3) +NODE *tree,**res1,**res2,**res3; +{ + if(!tree) { + *res1= WHOLELINE; + return 0; + } +#ifndef FAST + if(tree->type!=Node_expression_list) + abort(); +#endif + *res1=tree_eval(tree->lnode); + if(!tree->rnode) + return 1; + tree=tree->rnode; +#ifndef FAST + if(tree->type!=Node_expression_list) + abort(); +#endif + *res2=tree->lnode; + if(!tree->rnode) + return 2; + tree=tree->rnode; +#ifndef FAST + if(tree->type!=Node_expression_list) + abort(); +#endif + *res3=tree_eval(tree->lnode); + return 3; +} + +/* FOO this should re-allocate the buffer if it isn't big enough. + Also, it should do RMS style only-parse-enough stuff. */ +/* This reads in a line from the input file */ +inrec() +{ + static char *buf,*buf_end; + static bsz; + register char *cur; + register char *tmp; + register char *ttmp; + int cnt; + int tcnt; + register int c; + int rs; + int fs; + extern FILE *input_file; + NODE **get_lhs(); + + rs = get_rs(); + fs = get_fs(); + blank_fields(); + NR++; + NF=0; + if(!buf) { + buf=malloc(128); + bsz=128; + buf_end=buf+bsz; + } + cur=buf; + cnt=0; + while ((c=getc(input_file))!=EOF) { + if((!rs && c=='\n' && cur[-1]=='\n' && cur!=buf) || (c == rs)) + break; + *cur++=c; + cnt++; + if(cur==buf_end) { + buf=realloc(buf,bsz*2); + cur=buf+bsz; + bsz*=2; + buf_end=buf+bsz; + } + } + *cur='\0'; + set_field(0,buf,cnt); + assign_number(&(NF_node->var_value),0.0); + if(c==EOF && cnt==0) + return 1; + assign_number(&(NR_node->var_value),1.0+force_number(NR_node->var_value)); + for(tmp=buf;tmp<cur;tmp++) { + if(fs==' ') { + while((*tmp==' ' || *tmp=='\t') && tmp<cur) + tmp++; + if(tmp>=cur) + break; + } + tcnt=0; + ttmp=tmp; + if(fs==' ') { + while(*tmp!=' ' && *tmp!='\t' && tmp<cur) { + tmp++; + tcnt++; + } + } else { + while(*tmp!=fs && tmp<cur) { + tmp++; + tcnt++; + } + } + set_field(++NF,ttmp,tcnt); + } + assign_number(&(NF_node->var_value),(AWKNUM)NF); + return 0; +} + +/* Redirection for printf and print commands */ +FILE * +deal_redirect(tree) +NODE *tree; +{ + register NODE *tmp; + register struct redirect *rp; + register char *str; + register FILE *fp; + FILE *popen(); + int tflag; + + + if(!tree) return stdout; + tflag= (tree->type==Node_redirect_pipe) ? 1 : 2; + tmp=tree_eval(tree->subnode); + for(rp=reds;rp->flag!=0 && rp<&reds[20];rp++) { /* That limit again */ + if(rp->flag==tflag && cmp_nodes(rp->value,tmp)==0) + break; + } + if(rp==&reds[20]) { + panic("too many redirections",0); + return 0; + } + if(rp->flag!=0) + return rp->fp; + rp->flag=tflag; + rp->value=dupnode(tmp); + str=force_string(tmp)->stptr; + switch(tree->type) { + case Node_redirect_output: + fp=rp->fp=fopen(str,"w"); + break; + case Node_redirect_append: + fp=rp->fp=fopen(str,"a"); + break; + case Node_redirect_pipe: + fp=rp->fp=popen(str,"w"); + break; + } + if(fp==0) panic("can't redirect to '%s'\n",str); + rp++; + rp->flag=0; + return fp; +} + +print_simple(tree,fp) +NODE *tree; +FILE *fp; +{ +#ifndef FAST + /* Deal with some obscure bugs */ + if(tree==(NODE *)0x55000000) { + fprintf(fp,"***HUH***"); + return; + } + if((int)tree&01) { + fprintf(fp,"$that's odd$"); + return; + } +#endif + tree=force_string(tree); + fwrite(tree->stptr,sizeof(char),tree->stlen,fp); +} + diff --git a/debug.c b/debug.c new file mode 100644 index 00000000..59033abf --- /dev/null +++ b/debug.c @@ -0,0 +1,485 @@ +/* + Debug.c -- Various debugging routines + + Copyright (C) 1986 Free Software Foundation + Written by Jay Fenlason, December 1986 + + */ + +/* +GAWK is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY. No author or distributor accepts responsibility to anyone +for the consequences of using it or for whether it serves any +particular purpose or works at all, unless he says so in writing. +Refer to the GAWK General Public License for full details. + +Everyone is granted permission to copy, modify and redistribute GAWK, +but only under the conditions described in the GAWK General Public +License. A copy of this license is supposed to have been given to you +along with GAWK so you can know your rights and responsibilities. It +should be in a file named COPYING. Among other things, the copyright +notice and this notice must be preserved on all copies. + +In other words, go ahead and share GAWK, but don't try to stop +anyone else from sharing it farther. Help stamp out software hoarding! +*/ +#include "awk.h" +#include <stdio.h> + +#ifndef FAST + +extern NODE **fields_arr; +extern f_arr_siz; + + +/* This is all debugging stuff. Ignore it and maybe it'll go away. */ + +/* Some of it could be turned into a really cute trace command, if anyone + wants to. */ +char *nnames[] = { + "Illegal Node", + "Times", "Divide", "Mod", "Plus", "Minus", + "Cond-pair" /* jfw */, "Subscript", "Concat", + "++Pre", "--Pre", "Post++", + "Post--", "Uminus", "Field", + "Assign", "*=", "/=", "%=", + "+=", "-=", + "And", "Or", + "Equal", "!=", "Less", "Greater", "<=", ">=", + "Not", + "Match", "Nomatch", + "String", "TmpString", "Number", + "Rule_list", "Rule_node", "State_list", "If_branches", "Exp_list", + "BEGIN", "END", "IF", "WHILE", "FOR", + "arrayfor", "BREAK", "CONTINUE", "PRINT", "PRINTF", + "next", "exit", "redirect", "Append", + "Pipe", "variable", "Varray", "builtin", + "Line-range" /*jfw*/, +}; + +ptree(n) +{ + print_parse_tree((NODE *)n); +} + +pt() +{ + int x; + scanf("%x",&x); + printf("0x%x\n",x); + print_parse_tree((NODE *)x); + fflush(stdout); +} + +static depth = 0; +print_parse_tree(ptr) +NODE *ptr; +{ + register int n; + + if(!ptr) { + printf("NULL\n"); + return; + } + if((int)(ptr->type)<0 || (int)(ptr->type)>sizeof(nnames)/sizeof(nnames[0])) { + printf("(0x%x Type %d??)\n",ptr,ptr->type); + return; + } + printf("(%d)%*s",depth,depth,""); + switch((int)ptr->type) { + case (int)Node_string: + case (int)Node_temp_string: + printf("(0x%x String \"%.*s\")\n",ptr,ptr->stlen,ptr->stptr); + return; + case (int)Node_number: + printf("(0x%x Number %g)\n",ptr,ptr->numbr); + return; + case (int)Node_var_array: + printf("(0x%x Array of %d)\n",ptr,ptr->arrsiz); + for(n=0;n<ptr->arrsiz;n++) { + printf("'"); + print_simple((ptr->array)[n*2],stdout); + printf("' is '"); + print_simple((ptr->array)[n*2+1],stdout); + printf("'\n"); + } + return; + } + if(ptr->lnode) printf("0x%x = left<--",ptr->lnode); + printf("(0x%x %s.%d)",ptr,nnames[(int)(ptr->type)],ptr->type); + if(ptr->rnode) printf("-->right = 0x%x",ptr->rnode); + printf("\n"); + depth++; + if(ptr->lnode) + print_parse_tree(ptr->lnode); + switch((int)ptr->type) { + case (int)Node_line_range: /* jfw */ + case (int)Node_match: + case (int)Node_nomatch: + break; + case (int)Node_builtin: + printf("Builtin: %d\n",ptr->proc); /* jfw: was \N */ + break; + case (int)Node_K_for: + case (int)Node_K_arrayfor: + printf("(%s:)\n",nnames[(int)(ptr->type)]); + print_parse_tree(ptr->forloop->init); + printf("looping:\n"); + print_parse_tree(ptr->forloop->cond); + printf("doing:\n"); + print_parse_tree(ptr->forloop->incr); + break; + default: + if(ptr->rnode) + print_parse_tree(ptr->rnode); + break; + } + --depth; +} +#endif + +#ifndef FAST +/* + * print out all the variables in the world + */ + +dump_vars() +{ + register int n; + register HASHNODE *buc; + + printf("Fields:"); + dump_fields(); + printf("Vars:\n"); + for(n=0;n<HASHSIZE;n++) { + for(buc=variables[n];buc;buc=buc->next) { + printf("'%.*s': ",buc->length,buc->name); + print_simple(buc->value->var_value,stdout); + printf(":"); + print_parse_tree(buc->value->lnode); + /* print_parse_tree(buc->value); */ + } + } + printf("End\n"); +} +#endif + +#ifndef FAST +dump_fields() +{ + register NODE **p; + register int n; + + printf("%d fields\n",f_arr_siz); + for(n=0,p= &fields_arr[0];n<f_arr_siz;n++,p++) { + printf("$%d is '",n); + print_simple(*p,stdout); + printf("'\n"); + } +} +#endif + +#ifndef FAST +/*VARARGS1*/ +print_debug(str,n) +char *str; +{ + extern int debugging; + + if(debugging) + printf("%s:%d\n",str,n); +} + +int indent = 0; + +print_a_node(ptr) +NODE *ptr; +{ + NODE *p1; + char *str,*str2; + int n; + HASHNODE *buc; + + if(!ptr) return; /* don't print null ptrs */ + switch(ptr->type) { + case Node_number: + printf("%g",ptr->numbr); + return; + case Node_string: + printf("\"%.*s\"",ptr->stlen,ptr->stptr); + return; + case Node_times: + str="*"; + goto pr_twoop; + case Node_quotient: + str="/"; + goto pr_twoop; + case Node_mod: + str="%"; + goto pr_twoop; + case Node_plus: + str="+"; + goto pr_twoop; + case Node_minus: + str="-"; + goto pr_twoop; + case Node_concat: + str=" "; + goto pr_twoop; + case Node_assign: + str="="; + goto pr_twoop; + case Node_assign_times: + str="*="; + goto pr_twoop; + case Node_assign_quotient: + str="/="; + goto pr_twoop; + case Node_assign_mod: + str="%="; + goto pr_twoop; + case Node_assign_plus: + str="+="; + goto pr_twoop; + case Node_assign_minus: + str="-="; + goto pr_twoop; + case Node_and: + str="&&"; + goto pr_twoop; + case Node_or: + str="||"; + goto pr_twoop; + case Node_equal: + str="=="; + goto pr_twoop; + case Node_notequal: + str="!="; + goto pr_twoop; + case Node_less: + str="<"; + goto pr_twoop; + case Node_greater: + str=">"; + goto pr_twoop; + case Node_leq: + str="<="; + goto pr_twoop; + case Node_geq: + str=">="; + goto pr_twoop; + + pr_twoop: + print_a_node(ptr->lnode); + printf("%s",str); + print_a_node(ptr->rnode); + return; + + case Node_not: + str="!"; + str2=""; + goto pr_oneop; + case Node_field_spec: + str="$("; + str2=")"; + goto pr_oneop; + case Node_postincrement: + str=""; + str2="++"; + goto pr_oneop; + case Node_postdecrement: + str=""; + str2="--"; + goto pr_oneop; + case Node_preincrement: + str="++"; + str2=""; + goto pr_oneop; + case Node_predecrement: + str="--"; + str2=""; + goto pr_oneop; + pr_oneop: + printf(str); + print_a_node(ptr->subnode); + printf(str2); + return; + + case Node_expression_list: + print_a_node(ptr->lnode); + if(ptr->rnode) { + printf(","); + print_a_node(ptr->rnode); + } + return; + + case Node_var: + for(n=0;n<HASHSIZE;n++) { + for(buc=variables[n];buc;buc=buc->next) { + if(buc->value==ptr) { + printf("%.*s",buc->length,buc->name); + n=HASHSIZE; + break; + } + } + } + return; + case Node_subscript: + print_a_node(ptr->lnode); + printf("["); + print_a_node(ptr->rnode); + printf("]"); + return; + case Node_builtin: + printf("some_builtin("); + print_a_node(ptr->subnode); + printf(")"); + return; + + case Node_statement_list: + printf("{\n"); + indent++; + for(n=indent;n;--n) + printf(" "); + while(ptr) { + print_maybe_semi(ptr->lnode); + if(ptr->rnode) + for(n=indent;n;--n) + printf(" "); + ptr=ptr->rnode; + } + --indent; + for(n=indent;n;--n) + printf(" "); + printf("}\n"); + for(n=indent;n;--n) + printf(" "); + return; + + case Node_K_if: + printf("if("); + print_a_node(ptr->lnode); + printf(") "); + ptr=ptr->rnode; + if(ptr->lnode->type==Node_statement_list) { + printf("{\n"); + indent++; + for(p1=ptr->lnode;p1;p1=p1->rnode) { + for(n=indent;n;--n) + printf(" "); + print_maybe_semi(p1->lnode); + } + --indent; + for(n=indent;n;--n) + printf(" "); + if(ptr->rnode) { + printf("} else "); + } else { + printf("}\n"); + return; + } + } else { + print_maybe_semi(ptr->lnode); + if(ptr->rnode) { + for(n=indent;n;--n) + printf(" "); + printf("else "); + } else return; + } + if(!ptr->rnode) return; + deal_with_curls(ptr->rnode); + return; + + case Node_K_for: + printf("for("); + print_a_node(ptr->forloop->init); + printf(";"); + print_a_node(ptr->forloop->cond); + printf(";"); + print_a_node(ptr->forloop->incr); + printf(") "); + deal_with_curls(ptr->forsub); + return; + case Node_K_arrayfor: + printf("for("); + print_a_node(ptr->forloop->init); + printf(" in "); + print_a_node(ptr->forloop->incr); + printf(") "); + deal_with_curls(ptr->forsub); + return; + + case Node_K_printf: + printf("printf("); + print_a_node(ptr->lnode); + printf(")"); + return; + case Node_K_print: + printf("print("); + print_a_node(ptr->lnode); + printf(")"); + return; + case Node_K_next: + printf("next"); + return; + case Node_K_break: + printf("break"); + return; + default: + print_parse_tree(ptr); + return; + } +} + +print_maybe_semi(ptr) +NODE *ptr; +{ + print_a_node(ptr); + switch(ptr->type) { + case Node_K_if: + case Node_K_for: + case Node_K_arrayfor: + case Node_statement_list: + break; + default: + printf(";\n"); + break; + } +} +deal_with_curls(ptr) +NODE *ptr; +{ + int n; + + if(ptr->type==Node_statement_list) { + printf("{\n"); + indent++; + while(ptr) { + for(n=indent;n;--n) + printf(" "); + print_maybe_semi(ptr->lnode); + ptr=ptr->rnode; + } + --indent; + for(n=indent;n;--n) + printf(" "); + printf("}\n"); + } else { + print_maybe_semi(ptr); + } +} + +NODE * +do_prvars() +{ + dump_vars(); + return Nnull_string; +} + +NODE * +do_bp() +{ + return Nnull_string; +} + +#endif diff --git a/obstack.c b/obstack.c new file mode 100644 index 00000000..66148106 --- /dev/null +++ b/obstack.c @@ -0,0 +1,157 @@ +/* obstack.c - subroutines used implicitly by object stack macros + Copyright (c) 1986 Free Software Foundation, Inc. + + NO WARRANTY + + BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY +NO WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW. EXCEPT +WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC, +RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE THIS PROGRAM "AS IS" +WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, +BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND +FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY +AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE +DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR +CORRECTION. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M. +STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY +WHO MAY MODIFY AND REDISTRIBUTE THIS PROGRAM AS PERMITTED BELOW, BE +LIABLE TO YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR +OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR +DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR +A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) THIS +PROGRAM, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH +DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY. + + GENERAL PUBLIC LICENSE TO COPY + + 1. You may copy and distribute verbatim copies of this source file +as you receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy a valid copyright notice "Copyright +(C) 1986 Free Software Foundation, Inc."; and include following the +copyright notice a verbatim copy of the above disclaimer of warranty +and of this License. + + 2. You may modify your copy or copies of this source file or +any portion of it, and copy and distribute such modifications under +the terms of Paragraph 1 above, provided that you also do the following: + + a) cause the modified files to carry prominent notices stating + that you changed the files and the date of any change; and + + b) cause the whole of any work that you distribute or publish, + that in whole or in part contains or is a derivative of this + program or any part thereof, to be freely distributed + and licensed to all third parties on terms identical to those + contained in this License Agreement (except that you may choose + to grant more extensive warranty protection to third parties, + at your option). + + 3. You may copy and distribute this program or any portion of it in +compiled, executable or object code form under the terms of Paragraphs +1 and 2 above provided that you do the following: + + a) cause each such copy to be accompanied by the + corresponding machine-readable source code, which must + be distributed under the terms of Paragraphs 1 and 2 above; or, + + b) cause each such copy to be accompanied by a + written offer, with no time limit, to give any third party + free (except for a nominal shipping charge) a machine readable + copy of the corresponding source code, to be distributed + under the terms of Paragraphs 1 and 2 above; or, + + c) in the case of a recipient of this program in compiled, executable + or object code form (without the corresponding source code) you + shall cause copies you distribute to be accompanied by a copy + of the written offer of source code which you received along + with the copy you received. + + 4. You may not copy, sublicense, distribute or transfer this program +except as expressly provided under this License Agreement. Any attempt +otherwise to copy, sublicense, distribute or transfer this program is void and +your rights to use the program under this License agreement shall be +automatically terminated. However, parties who have received computer +software programs from you with this License Agreement will not have +their licenses terminated so long as such parties remain in full compliance. +*/ + +#include <obstack.h> + +void +_obstack_begin (h, chunkfun) + struct obstack *h; + int (*chunkfun) (); +{ + register _Ll* chunk; /* points to new chunk */ + chunk = h->chunk = + (_Ll*) (*chunkfun) (h->chunk_size); + h->next_free = h->object_base = chunk->obstack_l_0; + h->chunk_limit = chunk->obstack_l_limit + = (char *) chunk + h->chunk_size; + chunk->obstack_l_prev = 0; +} + +/* Allocate a new current chunk for the obstack *H + on the assumption that LENGTH bytes need to be added + to the current object, or a new object of length LENGTH allocated. + Copies any partial object from the end of the old chunk + to the beginning of the new one. */ + +void +_obstack_newchunk (h, chunkfun, length) + struct obstack *h; + int (*chunkfun) (); + int length; +{ + register _Ll* old_chunk = h->chunk; + register _Ll* new_chunk; + register long new_size; + register int obj_size = h->next_free - h->object_base; + + /* Compute size for new chunk. */ + new_size = (obj_size + length) << 1; + if (new_size < h->chunk_size) + new_size = h->chunk_size; + + /* Allocate and initialize the new chunk. */ + new_chunk = h->chunk = (_Ll*) (*chunkfun) (new_size); + new_chunk->obstack_l_prev = old_chunk; + new_chunk->obstack_l_limit = h->chunk_limit = (char *) new_chunk + new_size; + + /* Move the existing object to the new chunk. */ + bcopy (h->object_base, new_chunk->obstack_l_0, obj_size); + h->object_base = new_chunk->obstack_l_0; + h->next_free = h->object_base + obj_size; + }; + +void +_obstack_free (h, freechunkfun, obj) + struct obstack *h; + void (*freechunkfun) (); + char *obj; +{ + register _Ll* lp; /* below addr of any objects in this chunk */ + register _Ll* plp; /* point to previous chunk if any */ + + lp = (h)->chunk; + while (lp != 0 && ((char *)lp > obj || (h)->chunk_limit < obj)) + { + plp = lp -> obstack_l_prev; + (*freechunkfun) (lp); + if(lp==plp) + plp=0; + lp = plp; + } + if (lp) + { + (h)->object_base = (h)->next_free = (char *)(obj); + (h)->chunk_limit = lp->obstack_l_limit; + (h)->chunk = lp; + } + else if (obj != 0) + /* obj is not in any of the chunks! */ + abort (); +} diff --git a/obstack.h b/obstack.h new file mode 100644 index 00000000..772a5baa --- /dev/null +++ b/obstack.h @@ -0,0 +1,204 @@ +/* obstack.h - object stack macros + Copyright (c) 1986 Free Software Foundation, Inc. + +Summary: + +All the apparent functions defined here are macros. The idea +is that you would use these pre-tested macros to solve a +very specific set of problems, and they would run fast. +Caution: no side-effects in arguments please!! They may be +evaluated MANY times!! + +These macros operate a stack of objects. Each object starts life +small, and may grow to maturity. (Consider building a word syllable +by syllable.) An object can move while it is growing. Once it has +been "finished" it never changes address again. So the "top of the +stack" is typically an immature growing object, while the rest of the +stack is of mature, fixed size and fixed address objects. + +These routines grab large chunks of memory, using a function you +supply, called `obstack_chunk_alloc'. On occasion, they free chunks, +by calling `obstack_chunk_free'. You must define them and declare +them before using any obstack macros. + +Each independent stack is represented by a `struct obstack'. +Each of the obstack macros expects a pointer to such a structure +as the first argument. + +One motivation for this package is the problem of growing char strings +in symbol tables. Unless you are "facist pig with a read-only mind" +[Gosper's immortal quote from HAKMEM item 154, out of context] you +would not like to put any arbitrary upper limit on the length of your +symbols. + +In practice this often means you will build many short symbols and a +few long symbols. At the time you are reading a symbol you don't know +how long it is. One traditional method is to read a symbol into a +buffer, realloc()ating the buffer every time you try to read a symbol +that is longer than the buffer. This is beaut, but you still will +want to copy the symbol from the buffer to a more permanent +symbol-table entry say about half the time. + +With obstacks, you can work differently. Use one obstack for all symbol +names. As you read a symbol, grow the name in the obstack gradually. +When the name is complete, finalize it. Then, if the symbol exists already, +free the newly read name. + +The way we do this is to take a large chunk, allocating memory from +low addresses. When you want to build a aymbol in the chunk you just +add chars above the current "high water mark" in the chunk. When you +have finished adding chars, because you got to the end of the symbol, +you know how long the chars are, and you can create a new object. +Mostly the chars will not burst over the highest address of the chunk, +because you would typically expect a chunk to be (say) 100 times as +long as an average object. + +In case that isn't clear, when we have enough chars to make up +the object, THEY ARE ALREADY CONTIGUOUS IN THE CHUNK (guaranteed) +so we just point to it where it lies. No moving of chars is +needed and this is the second win: potentially long strings need +never be explicitly shuffled. Once an object is formed, it does not +change its address during its lifetime. + +When the chars burst over a chunk boundary, we allocate a larger +chunk, and then copy the partly formed object from the end of the old +chunk to the beggining of the new larger chunk. We then carry on +accreting characters to the end of the object as we normaly would. + +A special macro is provided to add a single char at a time to a +growing object. This allows the use of register variables, which +break the ordinary 'growth' macro. + +Summary: + We allocate large chunks. + We carve out one object at a time from the current chunk. + Once carved, an object never moves. + We are free to append data of any size to the currently + growing object. + Exactly one object is growing in an obstack at any one time. + You can run one obstack per control block. + You may have as many control blocks as you dare. + Because of the way we do it, you can `unwind' a obstack + back to a previous state. (You may remove objects much + as you would with a stack.) +*/ + +#ifndef obstackH +#define obstackH + /* these #defines keep it brief */ +#define _Ll struct obstack_chunk +#define _LL (8) /* _L length in chars */ + +struct obstack_chunk /* Lives at front of each chunk. */ +{ + char *obstack_l_limit; /* 1 past end of this chunk */ + _Ll *obstack_l_prev; /* address of prior chunk or NULL */ + char obstack_l_0[4]; /* objects begin here */ +}; + +#if 0 +This function, called like malloc but not returning on failure, +must return a chunk of the size given to it as argument, +aligned on a boundary of 2**OBSTACK_LOG_DEFAULT_ALIGNMENT bytes. + +struct obstack_chunk * obstack_chunk_alloc(); +#endif /* 0 */ + +struct obstack /* control current object in current chunk */ +{ + long chunk_size; /* preferred size to allocate chunks in */ + _Ll* chunk; /* address of current struct obstack_chunk */ + char *object_base; /* address of object we are building */ + char *next_free; /* where to add next char to current object */ + char *chunk_limit; /* address of char after current chunk */ + int temp; /* Temporary for some macros. */ + int alignment_mask; /* Mask of alignment for each object. */ +}; + +/* Pointer to beginning of object being allocated or to be allocated next. + Note that this might not be the final address of the object + because a new chunk might be needed to hold the final size. */ + +#define obstack_base(h) ((h)->object_base) + +/* Pointer to next byte not yet allocated in current chunk. */ + +#define obstack_next_free(h) ((h)->next_free) + +/* Size of object currently growing */ + +#define obstack_object_size(h) ((h)->next_free - (h)->object_base) + +/* Mask specifying low bits that should be clear in address of an object. */ + +#define obstack_alignment_mask(h) ((h)->alignment_mask) + +#define obstack_init(h) obstack_begin (h, 4096 - 4 - _LL) + +#define obstack_begin(h,try_length) \ +((h)->chunk_size = (try_length) + (_LL), \ + (h)->alignment_mask = ((1 << 2) - 1), \ + _obstack_begin ((h), obstack_chunk_alloc)) + +#define obstack_grow(h,where,length) \ +( (h)->temp = (length), \ + (((h)->next_free + (h)->temp > (h)->chunk_limit) \ + ? _obstack_newchunk ((h), obstack_chunk_alloc, (h)->temp) : 0), \ + bcopy (where, (h)->next_free, (h)->temp), \ + (h)->next_free += (h)->temp) + +#define obstack_grow0(h,where,length) \ +( (h)->temp = (length), \ + (((h)->next_free + (h)->temp + 1 > (h)->chunk_limit) \ + ? _obstack_newchunk ((h), obstack_chunk_alloc, (h)->temp + 1) : 0), \ + bcopy (where, (h)->next_free, (h)->temp), \ + (h)->next_free += (h)->temp, \ + *((h)->next_free)++ = 0) + +#define obstack_1grow(h,datum) \ +( (((h)->next_free + 1 > (h)->chunk_limit) \ + ? _obstack_newchunk ((h), obstack_chunk_alloc, 1) : 0), \ + *((h)->next_free)++ = (datum)) + +#define obstack_blank(h,length) \ +( (h)->temp = (length), \ + (((h)->next_free + (h)->temp > (h)->chunk_limit) \ + ? _obstack_newchunk ((h), obstack_chunk_alloc, (h)->temp) : 0), \ + (h)->next_free += (h)->temp) + +#define obstack_alloc(h,length) \ + (obstack_blank ((h), (length)), obstack_finish (h)) + +#define obstack_copy(h,where,length) \ + (obstack_grow ((h), (where), (length)), obstack_finish (h)) + +#define obstack_copy0(h,where,length) \ + (obstack_grow0 ((h), (where), (length)), obstack_finish (h)) + +#define obstack_room(h) ((long unsigned int) \ + ((h)->chunk_limit - (h)->next_free)) + +#define obstack_1grow_fast(h,achar) (*((h)->next_free)++ = achar) + +#define obstack_blank_fast(h,n) ((h)->next_free += (n)) + +#define obstack_finish(h) \ + ((h)->temp = (int) (h)->object_base, \ + (h)->next_free \ + = (char*)((int)((h)->next_free+(h)->alignment_mask) \ + & ~ ((h)->alignment_mask)), \ + (((h)->next_free - (char *)(h)->chunk \ + > (h)->chunk_limit - (char *)(h)->chunk) \ + ? (h)->next_free = (h)->chunk_limit : 0), \ + (h)->object_base = (h)->next_free, \ + (char *) (h)->temp) + +#define obstack_free(h,obj) \ +(((h)->temp = (char *)(obj) - (char *) (h)->chunk), \ + (((h)->temp >= 0 && (h)->temp < (h)->chunk_limit - (char *) (h)->chunk)\ + ? (int) ((h)->next_free = (h)->object_base \ + = (h)->temp + (char *) (h)->chunk) \ + : (int) _obstack_free ((h), obstack_chunk_free, \ + (h)->temp + (char *) (h)->chunk))) + +#endif /* #ifndef obstackH */ diff --git a/regex.c b/regex.c new file mode 100644 index 00000000..ebfd612e --- /dev/null +++ b/regex.c @@ -0,0 +1,1689 @@ +/* Extended regular expression matching and search. + Copyright (C) 1985 Free Software Foundation, Inc. + + NO WARRANTY + + BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY +NO WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW. EXCEPT +WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC, +RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE THIS PROGRAM "AS IS" +WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, +BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND +FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY +AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE +DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR +CORRECTION. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M. +STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY +WHO MAY MODIFY AND REDISTRIBUTE THIS PROGRAM AS PERMITTED BELOW, BE +LIABLE TO YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR +OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR +DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR +A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) THIS +PROGRAM, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH +DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY. + + GENERAL PUBLIC LICENSE TO COPY + + 1. You may copy and distribute verbatim copies of this source file +as you receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy a valid copyright notice "Copyright +(C) 1985 Free Software Foundation, Inc."; and include following the +copyright notice a verbatim copy of the above disclaimer of warranty +and of this License. + + 2. You may modify your copy or copies of this source file or +any portion of it, and copy and distribute such modifications under +the terms of Paragraph 1 above, provided that you also do the following: + + a) cause the modified files to carry prominent notices stating + that you changed the files and the date of any change; and + + b) cause the whole of any work that you distribute or publish, + that in whole or in part contains or is a derivative of this + program or any part thereof, to be freely distributed + and licensed to all third parties on terms identical to those + contained in this License Agreement (except that you may choose + to grant more extensive warranty protection to third parties, + at your option). + + 3. You may copy and distribute this program or any portion of it in +compiled, executable or object code form under the terms of Paragraphs +1 and 2 above provided that you do the following: + + a) cause each such copy to be accompanied by the + corresponding machine-readable source code, which must + be distributed under the terms of Paragraphs 1 and 2 above; or, + + b) cause each such copy to be accompanied by a + written offer, with no time limit, to give any third party + free (except for a nominal shipping charge) a machine readable + copy of the corresponding source code, to be distributed + under the terms of Paragraphs 1 and 2 above; or, + + c) in the case of a recipient of this program in compiled, executable + or object code form (without the corresponding source code) you + shall cause copies you distribute to be accompanied by a copy + of the written offer of source code which you received along + with the copy you received. + + 4. You may not copy, sublicense, distribute or transfer this program +except as expressly provided under this License Agreement. Any attempt +otherwise to copy, sublicense, distribute or transfer this program is void and +your rights to use the program under this License agreement shall be +automatically terminated. However, parties who have received computer +software programs from you with this License Agreement will not have +their licenses terminated so long as such parties remain in full compliance. + + +In other words, you are welcome to use, share and improve this program. +You are forbidden to forbid anyone else to use, share and improve +what you give them. Help stamp out software-hoarding! */ + + +/* To test, compile with -Dtest. + This Dtestable feature turns this into a self-contained program + which reads a pattern, describes how it compiles, + then reads a string and searches for it. */ + +/* JF this var has taken on whole new meanings as time goes by. Various bits +in this int tell how certain pieces of syntax should work */ + +static int obscure_syntax = 0; + +#ifdef emacs + +/* The `emacs' switch turns on certain special matching commands + that make sense only in emacs. */ + +#include "config.h" +#include "lisp.h" +#include "buffer.h" +#include "syntax.h" + +#else /* not emacs */ + +/* + * Define the syntax stuff, so we can do the \<...\> things. + */ + +#ifndef Sword /* must be non-zero in some of the tests below... */ +#define Sword 1 +#endif + +#define SYNTAX(c) re_syntax_table[c] + +#ifdef SYNTAX_TABLE + +char *re_syntax_table; + +#else + +static char re_syntax_table[256]; + +static void +init_syntax_once () +{ + register int c; + static int done = 0; + + if (done) + return; + + bzero (re_syntax_table, sizeof re_syntax_table); + + for (c = 'a'; c <= 'z'; c++) + re_syntax_table[c] = Sword; + + for (c = 'A'; c <= 'Z'; c++) + re_syntax_table[c] = Sword; + + for (c = '0'; c <= '9'; c++) + re_syntax_table[c] = Sword; + + done = 1; +} + +#endif /* SYNTAX_TABLE */ +#endif /* not emacs */ + +#include "regex.h" + +/* Number of failure points to allocate space for initially, + when matching. If this number is exceeded, more space is allocated, + so it is not a hard limit. */ + +#ifndef NFAILURES +#define NFAILURES 80 +#endif NFAILURES + +/* width of a byte in bits */ + +#define BYTEWIDTH 8 + +#ifndef SIGN_EXTEND_CHAR +#define SIGN_EXTEND_CHAR(x) (x) +#endif + +/* compile_pattern takes a regular-expression descriptor string in the user's format + and converts it into a buffer full of byte commands for matching. + + pattern is the address of the pattern string + size is the length of it. + bufp is a struct re_pattern_buffer * which points to the info + on where to store the byte commands. + This structure contains a char * which points to the + actual space, which should have been obtained with malloc. + compile_pattern may use realloc to grow the buffer space. + + The number of bytes of commands can be found out by looking in + the struct re_pattern_buffer that bufp pointed to, + after compile_pattern returns. +*/ + +#define PATPUSH(ch) (*b++ = (char) (ch)) + +#define PATFETCH(c) \ + {if (p == pend) goto end_of_pattern; \ + c = * (unsigned char *) p++; \ + if (translate) c = translate[c]; } + +#define PATFETCH_RAW(c) \ + {if (p == pend) goto end_of_pattern; \ + c = * (unsigned char *) p++; } + +#define PATUNFETCH p-- + +#define EXTEND_BUFFER \ + { char *old_buffer = bufp->buffer; \ + if (bufp->allocated == (1<<16)) goto too_big; \ + bufp->allocated *= 2; \ + if (bufp->allocated > (1<<16)) bufp->allocated = (1<<16); \ + if (!(bufp->buffer = (char *) realloc (bufp->buffer, bufp->allocated))) \ + goto memory_exhausted; \ + c = bufp->buffer - old_buffer; \ + b += c; \ + if (fixup_jump) \ + fixup_jump += c; \ + if (laststart) \ + laststart += c; \ + begalt += c; \ + if (pending_exact) \ + pending_exact += c; \ + } + +static int store_jump (), insert_jump (); + +/* JF this function is used to compile UN*X style regexps. In particular, + ( ) and | don't have to be \ed to have a special meaning */ + +int +re_set_syntax(syntax) +{ + int ret; + + ret=obscure_syntax; + obscure_syntax=syntax; + return ret; +} + + +char * +re_compile_pattern (pattern, size, bufp) + char *pattern; + int size; + struct re_pattern_buffer *bufp; +{ + register char *b = bufp->buffer; + register char *p = pattern; + char *pend = pattern + size; + register unsigned c, c1; + char *p1; + unsigned char *translate = (unsigned char *) bufp->translate; + + /* address of the count-byte of the most recently inserted "exactn" command. + This makes it possible to tell whether a new exact-match character + can be added to that command or requires a new "exactn" command. */ + + char *pending_exact = 0; + + /* address of the place where a forward-jump should go + to the end of the containing expression. + Each alternative of an "or", except the last, ends with a forward-jump + of this sort. */ + + char *fixup_jump = 0; + + /* address of start of the most recently finished expression. + This tells postfix * where to find the start of its operand. */ + + char *laststart = 0; + + /* In processing a repeat, 1 means zero matches is allowed */ + + char zero_times_ok; + + /* In processing a repeat, 1 means many matches is allowed */ + + char many_times_ok; + + /* address of beginning of regexp, or inside of last \( */ + + char *begalt = b; + + /* Stack of information saved by \( and restored by \). + Four stack elements are pushed by each \(: + First, the value of b. + Second, the value of fixup_jump. + Third, the value of regnum. + Fourth, the value of begalt. */ + + int stackb[40]; + int *stackp = stackb; + int *stacke = stackb + 40; + int *stackt; + + /* Counts \('s as they are encountered. Remembered for the matching \), + where it becomes the "register number" to put in the stop_memory command */ + + int regnum = 1; + + bufp->fastmap_accurate = 0; + +#ifndef emacs +#ifndef SYNTAX_TABLE + /* + * Initialize the syntax table. + */ + init_syntax_once(); +#endif +#endif + + if (bufp->allocated == 0) + { + bufp->allocated = 28; + if (bufp->buffer) + /* EXTEND_BUFFER loses when bufp->allocated is 0 */ + bufp->buffer = (char *) realloc (bufp->buffer, 28); + else + /* Caller did not allocate a buffer. Do it for him */ + bufp->buffer = (char *) malloc (28); + if (!bufp->buffer) goto memory_exhausted; + begalt = b = bufp->buffer; + } + + while (p != pend) + { + if (b - bufp->buffer > bufp->allocated - 10) + /* Note that EXTEND_BUFFER clobbers c */ + EXTEND_BUFFER; + + PATFETCH (c); + + switch (c) + { + case '$': + /* $ means succeed if at end of line, but only in special contexts. + If randonly in the middle of a pattern, it is a normal character. */ + if (p == pend || (*p == '\\' && (p[1] == ')' || p[1] == '|'))) + { + PATPUSH (endline); + break; + } + goto normal_char; + + case '^': + /* ^ means succeed if at beg of line, but only if no preceding pattern. */ + if (laststart) goto normal_char; + PATPUSH (begline); + break; + + case '*': + case '+': + case '?': + /* If there is no previous pattern, char not special. */ + if (!laststart) + goto normal_char; + /* If there is a sequence of repetition chars, + collapse it down to equivalent to just one. */ + zero_times_ok = 0; + many_times_ok = 0; + while (1) + { + zero_times_ok |= c != '+'; + many_times_ok |= c != '?'; + if (p == pend) + break; + PATFETCH (c); + if (!(c == '*' || c == '+' || c == '?')) + { + PATUNFETCH; + break; + } + } + + /* Now we know whether 0 matches is allowed, + and whether 2 or more matches is allowed. */ + if (many_times_ok) + { + /* If more than one repetition is allowed, + put in a backward jump at the end. */ + store_jump (b, maybe_finalize_jump, laststart - 3); + b += 3; + } + insert_jump (on_failure_jump, laststart, b + 3, b); + pending_exact = 0; + b += 3; + if (!zero_times_ok) + { + /* At least one repetition required: insert before the loop + a skip over the initial on-failure-jump instruction */ + insert_jump (dummy_failure_jump, laststart, laststart + 6, b); + b += 3; + } + break; + + case '.': + laststart = b; + PATPUSH (anychar); + break; + + case '[': + if (b - bufp->buffer + > bufp->allocated - 3 - (1 << BYTEWIDTH) / BYTEWIDTH) + /* Note that EXTEND_BUFFER clobbers c */ + EXTEND_BUFFER; + + laststart = b; + if (*p == '^') + PATPUSH (charset_not), p++; + else + PATPUSH (charset); + p1 = p; + + PATPUSH ((1 << BYTEWIDTH) / BYTEWIDTH); + /* Clear the whole map */ + bzero (b, (1 << BYTEWIDTH) / BYTEWIDTH); + /* Read in characters and ranges, setting map bits */ + while (1) + { + PATFETCH (c); + if (c == ']' && p != p1 + 1) break; + if (*p == '-') + { + PATFETCH (c1); + PATFETCH (c1); + while (c <= c1) + b[c / BYTEWIDTH] |= 1 << (c % BYTEWIDTH), c++; + } + else + { + b[c / BYTEWIDTH] |= 1 << (c % BYTEWIDTH); + } + } + /* Discard any bitmap bytes that are all 0 at the end of the map. + Decrement the map-length byte too. */ + while (b[-1] > 0 && b[b[-1] - 1] == 0) + b[-1]--; + b += b[-1]; + break; + + case '(': + if(!(obscure_syntax&RE_NO_BK_PARENS)) goto normal_char; + if (stackp == stacke) goto nesting_too_deep; + if (regnum < RE_NREGS) + { + PATPUSH (start_memory); + PATPUSH (regnum); + } + *stackp++ = b - bufp->buffer; + *stackp++ = fixup_jump ? fixup_jump - bufp->buffer + 1 : 0; + *stackp++ = regnum++; + *stackp++ = begalt - bufp->buffer; + fixup_jump = 0; + laststart = 0; + begalt = b; + break; + + case ')': + if(!(obscure_syntax&RE_NO_BK_PARENS)) goto normal_char; + if (stackp == stackb) goto unmatched_close; + begalt = *--stackp + bufp->buffer; + if (fixup_jump) + store_jump (fixup_jump, jump, b); + if (stackp[-1] < RE_NREGS) + { + PATPUSH (stop_memory); + PATPUSH (stackp[-1]); + } + stackp -= 2; + fixup_jump = 0; + if (*stackp) + fixup_jump = *stackp + bufp->buffer - 1; + laststart = *--stackp + bufp->buffer; + break; + + case '|': + if(!(obscure_syntax&RE_NO_BK_VBAR)) goto normal_char; + insert_jump (on_failure_jump, begalt, b + 6, b); + pending_exact = 0; + b += 3; + if (fixup_jump) + store_jump (fixup_jump, jump, b); + fixup_jump = b; + b += 3; + laststart = 0; + begalt = b; + break; + + case '\\': + if (p == pend) goto invalid_pattern; + PATFETCH_RAW (c); + switch (c) + { +#ifdef emacs + case '=': + PATPUSH (at_dot); + break; + + case 's': + laststart = b; + PATPUSH (syntaxspec); + PATFETCH (c); + PATPUSH (syntax_spec_code[c]); + break; + + case 'S': + laststart = b; + PATPUSH (notsyntaxspec); + PATFETCH (c); + PATPUSH (syntax_spec_code[c]); + break; +#endif emacs + + case '(': + if(obscure_syntax&RE_NO_BK_PARENS) goto normal_backsl; + if (stackp == stacke) goto nesting_too_deep; + if (regnum < RE_NREGS) + { + PATPUSH (start_memory); + PATPUSH (regnum); + } + *stackp++ = b - bufp->buffer; + *stackp++ = fixup_jump ? fixup_jump - bufp->buffer + 1 : 0; + *stackp++ = regnum++; + *stackp++ = begalt - bufp->buffer; + fixup_jump = 0; + laststart = 0; + begalt = b; + break; + + case ')': + if(obscure_syntax&RE_NO_BK_PARENS) goto normal_backsl; + if (stackp == stackb) goto unmatched_close; + begalt = *--stackp + bufp->buffer; + if (fixup_jump) + store_jump (fixup_jump, jump, b); + if (stackp[-1] < RE_NREGS) + { + PATPUSH (stop_memory); + PATPUSH (stackp[-1]); + } + stackp -= 2; + fixup_jump = 0; + if (*stackp) + fixup_jump = *stackp + bufp->buffer - 1; + laststart = *--stackp + bufp->buffer; + break; + + case '|': + if(obscure_syntax&RE_NO_BK_VBAR) goto normal_backsl; + insert_jump (on_failure_jump, begalt, b + 6, b); + pending_exact = 0; + b += 3; + if (fixup_jump) + store_jump (fixup_jump, jump, b); + fixup_jump = b; + b += 3; + laststart = 0; + begalt = b; + break; + + case 'w': + laststart = b; + PATPUSH (wordchar); + break; + + case 'W': + laststart = b; + PATPUSH (notwordchar); + break; + + case '<': + PATPUSH (wordbeg); + break; + + case '>': + PATPUSH (wordend); + break; + + case 'b': + PATPUSH (wordbound); + break; + + case 'B': + PATPUSH (notwordbound); + break; + + case '`': + PATPUSH (begbuf); + break; + + case '\'': + PATPUSH (endbuf); + break; + + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + c1 = c - '0'; + if (c1 >= regnum) + goto normal_char; + for (stackt = stackp - 2; stackt > stackb; stackt -= 4) + if (*stackt == c1) + goto normal_char; + laststart = b; + PATPUSH (duplicate); + PATPUSH (c1); + break; + default: + normal_backsl: + /* You might think it wuld be useful for \ to mean + not to translate; but if we don't translate it + it will never match anything. */ + if (translate) c = translate[c]; + goto normal_char; + } + break; + + default: + normal_char: + if (!pending_exact || pending_exact + *pending_exact + 1 != b + || *pending_exact == 0177 || *p == '*' || *p == '^' + || *p == '+' || *p == '?') + { + laststart = b; + PATPUSH (exactn); + pending_exact = b; + PATPUSH (0); + } + PATPUSH (c); + (*pending_exact)++; + } + } + + if (fixup_jump) + store_jump (fixup_jump, jump, b); + + if (stackp != stackb) goto unmatched_open; + + bufp->used = b - bufp->buffer; + return 0; + + invalid_pattern: + return "Invalid regular expression"; + + unmatched_open: + return "Unmatched \\("; + + unmatched_close: + return "Unmatched \\)"; + + end_of_pattern: + return "Premature end of regular expression"; + + nesting_too_deep: + return "Nesting too deep"; + + too_big: + return "Regular expression too big"; + + memory_exhausted: + return "Memory exhausted"; +} + +/* Store where `from' points a jump operation to jump to where `to' points. + `opcode' is the opcode to store. */ + +static int +store_jump (from, opcode, to) + char *from, *to; + char opcode; +{ + from[0] = opcode; + from[1] = (to - (from + 3)) & 0377; + from[2] = (to - (from + 3)) >> 8; +} + +/* Open up space at char FROM, and insert there a jump to TO. + CURRENT_END gives te end of the storage no in use, + so we know how much data to copy up. + OP is the opcode of the jump to insert. + + If you call this function, you must zero out pending_exact. */ + +static int +insert_jump (op, from, to, current_end) + char op; + char *from, *to, *current_end; +{ + register char *pto = current_end + 3; + register char *pfrom = current_end; + while (pfrom != from) + *--pto = *--pfrom; + store_jump (from, op, to); +} + +/* Given a pattern, compute a fastmap from it. + The fastmap records which of the (1 << BYTEWIDTH) possible characters + can start a string that matches the pattern. + This fastmap is used by re_search to skip quickly over totally implausible text. + + The caller must supply the address of a (1 << BYTEWIDTH)-byte data area + as bufp->fastmap. + The other components of bufp describe the pattern to be used. */ + +void +re_compile_fastmap (bufp) + struct re_pattern_buffer *bufp; +{ + unsigned char *pattern = (unsigned char *) bufp->buffer; + int size = bufp->used; + register char *fastmap = bufp->fastmap; + register unsigned char *p = pattern; + register unsigned char *pend = pattern + size; + register int j, k; + unsigned char *translate = (unsigned char *) bufp->translate; + + unsigned char *stackb[NFAILURES]; + unsigned char **stackp = stackb; + + bzero (fastmap, (1 << BYTEWIDTH)); + bufp->fastmap_accurate = 1; + bufp->can_be_null = 0; + + while (p) + { + if (p == pend) + { + bufp->can_be_null = 1; + break; + } +#ifdef SWITCH_ENUM_BUG + switch ((int) ((enum regexpcode) *p++)) +#else + switch ((enum regexpcode) *p++) +#endif + { + case exactn: + if (translate) + fastmap[translate[p[1]]] = 1; + else + fastmap[p[1]] = 1; + break; + + case begline: + case before_dot: + case at_dot: + case after_dot: + case begbuf: + case endbuf: + case wordbound: + case notwordbound: + case wordbeg: + case wordend: + continue; + + case endline: + if (translate) + fastmap[translate['\n']] = 1; + else + fastmap['\n'] = 1; + if (bufp->can_be_null != 1) + bufp->can_be_null = 2; + break; + + case finalize_jump: + case maybe_finalize_jump: + case jump: + case dummy_failure_jump: + bufp->can_be_null = 1; + j = *p++ & 0377; + j += SIGN_EXTEND_CHAR (*(char *)p) << 8; + p += j + 1; /* The 1 compensates for missing ++ above */ + if (j > 0) + continue; + /* Jump backward reached implies we just went through + the body of a loop and matched nothing. + Opcode jumped to should be an on_failure_jump. + Just treat it like an ordinary jump. + For a * loop, it has pushed its failure point already; + if so, discard that as redundant. */ + if ((enum regexpcode) *p != on_failure_jump) + continue; + p++; + j = *p++ & 0377; + j += SIGN_EXTEND_CHAR (*(char *)p) << 8; + p += j + 1; /* The 1 compensates for missing ++ above */ + if (stackp != stackb && *stackp == p) + stackp--; + continue; + + case on_failure_jump: + j = *p++ & 0377; + j += SIGN_EXTEND_CHAR (*(char *)p) << 8; + p++; + *++stackp = p + j; + continue; + + case start_memory: + case stop_memory: + p++; + continue; + + case duplicate: + bufp->can_be_null = 1; + fastmap['\n'] = 1; + case anychar: + for (j = 0; j < (1 << BYTEWIDTH); j++) + if (j != '\n') + fastmap[j] = 1; + if (bufp->can_be_null) + return; + /* Don't return; check the alternative paths + so we can set can_be_null if appropriate. */ + break; + + case wordchar: + for (j = 0; j < (1 << BYTEWIDTH); j++) + if (SYNTAX (j) == Sword) + fastmap[j] = 1; + break; + + case notwordchar: + for (j = 0; j < (1 << BYTEWIDTH); j++) + if (SYNTAX (j) != Sword) + fastmap[j] = 1; + break; + +#ifdef emacs + case syntaxspec: + k = *p++; + for (j = 0; j < (1 << BYTEWIDTH); j++) + if (SYNTAX (j) == (enum syntaxcode) k) + fastmap[j] = 1; + break; + + case notsyntaxspec: + for (j = 0; j < (1 << BYTEWIDTH); j++) + if (SYNTAX (j) != (enum syntaxcode) k) + fastmap[j] = 1; + break; +#endif emacs + + case charset: + for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--) + if (p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH))) + { + if (translate) + fastmap[translate[j]] = 1; + else + fastmap[j] = 1; + } + break; + + case charset_not: + /* Chars beyond end of map must be allowed */ + for (j = *p * BYTEWIDTH; j < (1 << BYTEWIDTH); j++) + if (translate) + fastmap[translate[j]] = 1; + else + fastmap[j] = 1; + + for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--) + if (!(p[j / BYTEWIDTH] & (1 << (j % BYTEWIDTH)))) + { + if (translate) + fastmap[translate[j]] = 1; + else + fastmap[j] = 1; + } + break; + } + + /* Get here means we have successfully found the possible starting characters + of one path of the pattern. We need not follow this path any farther. + Instead, look at the next alternative remembered in the stack. */ + if (stackp != stackb) + p = *stackp--; + else + break; + } +} + +/* Like re_search_2, below, but only one string is specified. */ + +int +re_search (pbufp, string, size, startpos, range, regs) + struct re_pattern_buffer *pbufp; + char *string; + int size, startpos, range; + struct re_registers *regs; +{ + return re_search_2 (pbufp, 0, 0, string, size, startpos, range, regs, size); +} + +/* Like re_match_2 but tries first a match starting at index `startpos', + then at startpos + 1, and so on. + `range' is the number of places to try before giving up. + If `range' is negative, the starting positions tried are + startpos, startpos - 1, etc. + It is up to the caller to make sure that range is not so large + as to take the starting position outside of the input strings. + +The value returned is the position at which the match was found, + or -1 if no match was found. */ + +int +re_search_2 (pbufp, string1, size1, string2, size2, startpos, range, regs, mstop) + struct re_pattern_buffer *pbufp; + char *string1, *string2; + int size1, size2; + int startpos; + register int range; + struct re_registers *regs; + int mstop; +{ + register char *fastmap = pbufp->fastmap; + register char *translate = pbufp->translate; + int total = size1 + size2; + + /* Update the fastmap now if not correct already */ + if (fastmap && !pbufp->fastmap_accurate) + re_compile_fastmap (pbufp); + + while (1) + { + /* If a fastmap is supplied, skip quickly over characters + that cannot possibly be the start of a match. + Note, however, that if the pattern can possibly match + the null string, we must test it at each starting point + so that we take the first null string we get. */ + + if (fastmap && startpos < total && pbufp->can_be_null != 1) + { + if (range > 0) + { + register int lim = 0; + register char *p; + int irange = range; + if (startpos < size1 && startpos + range >= size1) + lim = range - (size1 - startpos); + + p = &(startpos >= size1 ? string2 - size1 : string1)[startpos]; + + if (translate) + { + while (range > lim && !fastmap[translate[*p++]]) + range--; + } + else + { + while (range > lim && !fastmap[*p++]) + range--; + } + startpos += irange - range; + } + else + { + register char c; + if (startpos >= size1) c = string2[startpos - size1]; + else c = string1[startpos]; + if (translate ? !fastmap[translate[c]] : !fastmap[c]) + goto advance; + } + } + + if (range >= 0 && startpos == total + && fastmap && pbufp->can_be_null == 0) + return -1; + + if (0 <= re_match_2 (pbufp, string1, size1, string2, size2, startpos, regs, mstop)) + return startpos; + +#ifdef C_ALLOCA + alloca (0); +#endif /* C_ALLOCA */ + + advance: + if (!range) break; + if (range > 0) range--, startpos++; else range++, startpos--; + } + return -1; +} + +#ifndef emacs /* emacs never uses this */ +int +re_match (pbufp, string, size, pos, regs) + struct re_pattern_buffer *pbufp; + char *string; + int size, pos; + struct re_registers *regs; +{ + return re_match_2 (pbufp, 0, 0, string, size, pos, regs, size); +} +#endif /* emacs */ + +/* Match the pattern described by `pbufp' + against data which is the virtual concatenation of `string1' and `string2'. + `size1' and `size2' are the sizes of the two data strings. + Start the match at position `pos'. + Do not consider matching past the position `mstop'. + + If pbufp->fastmap is nonzero, then it had better be up to date. + + The reason that the data to match is specified as two components + which are to be regarded as concatenated + is so that this function can be used directly on the contents of an Emacs buffer. + + -1 is returned if there is no match. Otherwise the value is the length + of the substring which was matched. +*/ + +int +re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop) + struct re_pattern_buffer *pbufp; + char *string1, *string2; + int size1, size2; + int pos; + struct re_registers *regs; + int mstop; +{ + register char *p = pbufp->buffer; + register char *pend = p + pbufp->used; + /* End of first string */ + char *end1; + /* End of second string */ + char *end2; + /* Pointer just past last char to consider matching */ + char *end_match_1, *end_match_2; + register char *d, *dend; + register int mcnt; + char *translate = pbufp->translate; + + /* Failure point stack. Each place that can handle a failure further down the line + pushes a failure point on this stack. It consists of two char *'s. + The first one pushed is where to resume scanning the pattern; + the second pushed is where to resume scanning the strings. + If the latter is zero, the failure point is a "dummy". + If a failure happens and the innermost failure point is dormant, + it discards that failure point and tries the next one. */ + + char **stackb = (char **) alloca (2 * NFAILURES * sizeof (char *)); + char **stackp = stackb, **stacke = &stackb[2 * NFAILURES]; + + /* Information on the "contents" of registers. + These are pointers into the input strings; they record + just what was matched (on this attempt) by some part of the pattern. + The start_memory command stores the start of a register's contents + and the stop_memory command stores the end. + + At that point, regstart[regnum] points to the first character in the register, + regend[regnum] points to the first character beyond the end of the register, + and regstart_segend[regnum] is either the same as regend[regnum] + or else points to the end of the input string into which regstart[regnum] points. + The latter case happens when regstart[regnum] is in string1 and + regend[regnum] is in string2. */ + + char *regstart[RE_NREGS]; + char *regstart_segend[RE_NREGS]; + char *regend[RE_NREGS]; + + /* Set up pointers to ends of strings. + Don't allow the second string to be empty unless both are empty. */ + if (!size2) + { + string2 = string1; + size2 = size1; + string1 = 0; + size1 = 0; + } + end1 = string1 + size1; + end2 = string2 + size2; + + /* Compute where to stop matching, within the two strings */ + if (mstop <= size1) + { + end_match_1 = string1 + mstop; + end_match_2 = string2; + } + else + { + end_match_1 = end1; + end_match_2 = string2 + mstop - size1; + } + + /* Initialize \( and \) text positions to -1 + to mark ones that no \( or \) has been seen for. */ + + for (mcnt = 0; mcnt < sizeof (regstart) / sizeof (*regstart); mcnt++) + regstart[mcnt] = (char *) -1; + + /* `p' scans through the pattern as `d' scans through the data. + `dend' is the end of the input string that `d' points within. + `d' is advanced into the following input string whenever necessary, + but this happens before fetching; + therefore, at the beginning of the loop, + `d' can be pointing at the end of a string, + but it cannot equal string2. */ + + if (pos <= size1) + d = string1 + pos, dend = end_match_1; + else + d = string2 + pos - size1, dend = end_match_2; + +/* Write PREFETCH; just before fetching a character with *d. */ +#define PREFETCH \ + while (d == dend) \ + { if (dend == end_match_2) goto fail; /* end of string2 => failure */ \ + d = string2; /* end of string1 => advance to string2. */ \ + dend = end_match_2; } + + /* This loop loops over pattern commands. + It exits by returning from the function if match is complete, + or it drops through if match fails at this starting point in the input data. */ + + while (1) + { + if (p == pend) + /* End of pattern means we have succeeded! */ + { + /* If caller wants register contents data back, convert it to indices */ + if (regs) + { + regend[0] = d; + regstart[0] = string1; + for (mcnt = 0; mcnt < RE_NREGS; mcnt++) + { + if ((mcnt != 0) && regstart[mcnt] == (char *) -1) + { + regs->start[mcnt] = -1; + regs->end[mcnt] = -1; + continue; + } + if (regstart[mcnt] - string1 < 0 || + regstart[mcnt] - string1 > size1) + regs->start[mcnt] = regstart[mcnt] - string2 + size1; + else + regs->start[mcnt] = regstart[mcnt] - string1; + if (regend[mcnt] - string1 < 0 || + regend[mcnt] - string1 > size1) + regs->end[mcnt] = regend[mcnt] - string2 + size1; + else + regs->end[mcnt] = regend[mcnt] - string1; + } + regs->start[0] = pos; + } + if (d - string1 >= 0 && d - string1 <= size1) + return d - string1 - pos; + else + return d - string2 + size1 - pos; + } + + /* Otherwise match next pattern command */ +#ifdef SWITCH_ENUM_BUG + switch ((int) ((enum regexpcode) *p++)) +#else + switch ((enum regexpcode) *p++) +#endif + { + + /* \( is represented by a start_memory, \) by a stop_memory. + Both of those commands contain a "register number" argument. + The text matched within the \( and \) is recorded under that number. + Then, \<digit> turns into a `duplicate' command which + is followed by the numeric value of <digit> as the register number. */ + + case start_memory: + regstart[*p] = d; + regstart_segend[*p++] = dend; + break; + + case stop_memory: + regend[*p] = d; + if (regstart_segend[*p] == dend) + regstart_segend[*p] = d; + p++; + break; + + case duplicate: + { + int regno = *p++; /* Get which register to match against */ + register char *d2, *dend2; + + d2 = regstart[regno]; + dend2 = regstart_segend[regno]; + while (1) + { + /* Advance to next segment in register contents, if necessary */ + while (d2 == dend2) + { + if (dend2 == end_match_2) break; + if (dend2 == regend[regno]) break; + d2 = string2, dend2 = regend[regno]; /* end of string1 => advance to string2. */ + } + /* At end of register contents => success */ + if (d2 == dend2) break; + + /* Advance to next segment in data being matched, if necessary */ + PREFETCH; + + /* mcnt gets # consecutive chars to compare */ + mcnt = dend - d; + if (mcnt > dend2 - d2) + mcnt = dend2 - d2; + /* Compare that many; failure if mismatch, else skip them. */ + if (translate ? bcmp_translate (d, d2, mcnt, translate) : bcmp (d, d2, mcnt)) + goto fail; + d += mcnt, d2 += mcnt; + } + } + break; + + case anychar: + /* fetch a data character */ + PREFETCH; + /* Match anything but a newline. */ + if ((translate ? translate[*d++] : *d++) == '\n') + goto fail; + break; + + case charset: + case charset_not: + { + /* Nonzero for charset_not */ + int not = 0; + register int c; + if (*(p - 1) == (char) charset_not) + not = 1; + + /* fetch a data character */ + PREFETCH; + + if (translate) + c = translate [*(unsigned char *)d]; + else + c = *(unsigned char *)d; + + if (c < *p * BYTEWIDTH + && p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH))) + not = !not; + + p += 1 + *p; + + if (!not) goto fail; + d++; + break; + } + + case begline: + if (d == string1 || d[-1] == '\n') + break; + goto fail; + + case endline: + if (d == end2 + || (d == end1 ? (size2 == 0 || *string2 == '\n') : *d == '\n')) + break; + goto fail; + + /* "or" constructs ("|") are handled by starting each alternative + with an on_failure_jump that points to the start of the next alternative. + Each alternative except the last ends with a jump to the joining point. + (Actually, each jump except for the last one really jumps + to the following jump, because tensioning the jumps is a hassle.) */ + + /* The start of a stupid repeat has an on_failure_jump that points + past the end of the repeat text. + This makes a failure point so that, on failure to match a repetition, + matching restarts past as many repetitions have been found + with no way to fail and look for another one. */ + + /* A smart repeat is similar but loops back to the on_failure_jump + so that each repetition makes another failure point. */ + + case on_failure_jump: + if (stackp == stacke) + { + char **stackx = (char **) alloca (2 * (stacke - stackb) * sizeof (char *)); + bcopy (stackb, stackx, (stacke - stackb) * sizeof (char *)); + stackp += stackx - stackb; + stacke = stackx + 2 * (stacke - stackb); + stackb = stackx; + } + mcnt = *p++ & 0377; + mcnt += SIGN_EXTEND_CHAR (*p) << 8; + p++; + *stackp++ = mcnt + p; + *stackp++ = d; + break; + + /* The end of a smart repeat has an maybe_finalize_jump back. + Change it either to a finalize_jump or an ordinary jump. */ + + case maybe_finalize_jump: + mcnt = *p++ & 0377; + mcnt += SIGN_EXTEND_CHAR (*p) << 8; + p++; + /* Compare what follows with the begining of the repeat. + If we can establish that there is nothing that they would + both match, we can change to finalize_jump */ + if (p == pend) + p[-3] = (char) finalize_jump; + else if (*p == (char) exactn || *p == (char) endline) + { + register int c = *p == (char) endline ? '\n' : p[2]; + register char *p1 = p + mcnt; + /* p1[0] ... p1[2] are an on_failure_jump. + Examine what follows that */ + if (p1[3] == (char) exactn && p1[5] != c) + p[-3] = (char) finalize_jump; + else if (p1[3] == (char) charset || p1[3] == (char) charset_not) + { + int not = p1[3] == (char) charset_not; + if (c < p1[4] * BYTEWIDTH + && p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH))) + not = !not; + /* not is 1 if c would match */ + /* That means it is not safe to finalize */ + if (!not) + p[-3] = (char) finalize_jump; + } + } + p -= 2; + if (p[-1] != (char) finalize_jump) + { + p[-1] = (char) jump; + goto nofinalize; + } + + /* The end of a stupid repeat has a finalize-jump + back to the start, where another failure point will be made + which will point after all the repetitions found so far. */ + + case finalize_jump: + stackp -= 2; + + case jump: + nofinalize: + mcnt = *p++ & 0377; + mcnt += SIGN_EXTEND_CHAR (*p) << 8; + p += mcnt + 1; /* The 1 compensates for missing ++ above */ + break; + + case dummy_failure_jump: + if (stackp == stacke) + { + char **stackx = (char **) alloca (2 * (stacke - stackb) * sizeof (char *)); + bcopy (stackb, stackx, (stacke - stackb) * sizeof (char *)); + stackp += stackx - stackb; + stacke = stackx + 2 * (stacke - stackb); + stackb = stackx; + } + *stackp++ = 0; + *stackp++ = 0; + goto nofinalize; + + case wordbound: + if (d == string1 /* Points to first char */ + || d == end2 /* Points to end */ + || (d == end1 && size2 == 0)) /* Points to end */ + break; + if ((SYNTAX (((unsigned char *)d)[-1]) == Sword) + != (SYNTAX (d == end1 ? *(unsigned char *)string2 : *(unsigned char *)d) == Sword)) + break; + goto fail; + + case notwordbound: + if (d == string1 /* Points to first char */ + || d == end2 /* Points to end */ + || (d == end1 && size2 == 0)) /* Points to end */ + goto fail; + if ((SYNTAX (((unsigned char *)d)[-1]) == Sword) + != (SYNTAX (d == end1 ? *(unsigned char *)string2 : *(unsigned char *)d) == Sword)) + goto fail; + break; + + case wordbeg: + if (d == end2 /* Points to end */ + || (d == end1 && size2 == 0) /* Points to end */ + || SYNTAX (*(unsigned char *) (d == end1 ? string2 : d)) != Sword) /* Next char not a letter */ + goto fail; + if (d == string1 /* Points to first char */ + || SYNTAX (((unsigned char *)d)[-1]) != Sword) /* prev char not letter */ + break; + goto fail; + + case wordend: + if (d == string1 /* Points to first char */ + || SYNTAX (((unsigned char *)d)[-1]) != Sword) /* prev char not letter */ + goto fail; + if (d == end2 /* Points to end */ + || (d == end1 && size2 == 0) /* Points to end */ + || SYNTAX (d == end1 ? *(unsigned char *)string2 : *(unsigned char *)d) != Sword) /* Next char not a letter */ + break; + goto fail; + +#ifdef emacs + case before_dot: + if (((d - string2 <= (unsigned) size2) + ? d - (char *) bf_p2 : d - (char *) bf_p1) + <= point) + goto fail; + break; + + case at_dot: + if (((d - string2 <= (unsigned) size2) + ? d - (char *) bf_p2 : d - (char *) bf_p1) + == point) + goto fail; + break; + + case after_dot: + if (((d - string2 <= (unsigned) size2) + ? d - (char *) bf_p2 : d - (char *) bf_p1) + >= point) + goto fail; + break; + + case wordchar: + mcnt = (int) Sword; + goto matchsyntax; + + case syntaxspec: + mcnt = *p++; + matchsyntax: + PREFETCH; + if (SYNTAX (*(unsigned char *)d++) != (enum syntaxcode) mcnt) goto fail; + break; + + case notwordchar: + mcnt = (int) Sword; + goto matchnotsyntax; + + case notsyntaxspec: + mcnt = *p++; + matchnotsyntax: + PREFETCH; + if (SYNTAX (*(unsigned char *)d++) == (enum syntaxcode) mcnt) goto fail; + break; +#else + case wordchar: + PREFETCH; + if (SYNTAX (*(unsigned char *)d++) == 0) goto fail; + break; + + case notwordchar: + PREFETCH; + if (SYNTAX (*(unsigned char *)d++) != 0) goto fail; + break; +#endif not emacs + + case begbuf: + if (d == string1) /* Note, d cannot equal string2 */ + break; /* unless string1 == string2. */ + goto fail; + + case endbuf: + if (d == end2 || (d == end1 && size2 == 0)) + break; + goto fail; + + case exactn: + /* Match the next few pattern characters exactly. + mcnt is how many characters to match. */ + mcnt = *p++; + if (translate) + { + do + { + PREFETCH; + if (translate[*(unsigned char *)d++] != *p++) goto fail; + } + while (--mcnt); + } + else + { + do + { + PREFETCH; + if (*d++ != *p++) goto fail; + } + while (--mcnt); + } + break; + } + continue; /* Successfully matched one pattern command; keep matching */ + + /* Jump here if any matching operation fails. */ + fail: + if (stackp != stackb) + /* A restart point is known. Restart there and pop it. */ + { + if (!stackp[-2]) + { /* If innermost failure point is dormant, flush it and keep looking */ + stackp -= 2; + goto fail; + } + d = *--stackp; + p = *--stackp; + if (d >= string1 && d <= end1) + dend = end_match_1; + } + else break; /* Matching at this starting point really fails! */ + } + return -1; /* Failure to match */ +} + +static int +bcmp_translate (s1, s2, len, translate) + char *s1, *s2; + register int len; + char *translate; +{ + register char *p1 = s1, *p2 = s2; + while (len) + { + if (translate [*p1++] != translate [*p2++]) return 1; + len--; + } + return 0; +} + +/* Entry points compatible with bsd4.2 regex library */ + +#ifndef emacs + +static struct re_pattern_buffer re_comp_buf; + +char * +re_comp (s) + char *s; +{ + if (!s) + { + if (!re_comp_buf.buffer) + return "No previous regular expression"; + return 0; + } + + if (!re_comp_buf.buffer) + { + if (!(re_comp_buf.buffer = (char *) malloc (200))) + return "Memory exhausted"; + re_comp_buf.allocated = 200; + if (!(re_comp_buf.fastmap = (char *) malloc (1 << BYTEWIDTH))) + return "Memory exhausted"; + } + return re_compile_pattern (s, strlen (s), &re_comp_buf); +} + +int +re_exec (s) + char *s; +{ + int len = strlen (s); + return 0 <= re_search (&re_comp_buf, s, len, 0, len, 0); +} + +#endif /* emacs */ + +#ifdef test + +#include <stdio.h> + +/* Indexed by a character, gives the upper case equivalent of the character */ + +static char upcase[0400] = + { 000, 001, 002, 003, 004, 005, 006, 007, + 010, 011, 012, 013, 014, 015, 016, 017, + 020, 021, 022, 023, 024, 025, 026, 027, + 030, 031, 032, 033, 034, 035, 036, 037, + 040, 041, 042, 043, 044, 045, 046, 047, + 050, 051, 052, 053, 054, 055, 056, 057, + 060, 061, 062, 063, 064, 065, 066, 067, + 070, 071, 072, 073, 074, 075, 076, 077, + 0100, 0101, 0102, 0103, 0104, 0105, 0106, 0107, + 0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117, + 0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127, + 0130, 0131, 0132, 0133, 0134, 0135, 0136, 0137, + 0140, 0101, 0102, 0103, 0104, 0105, 0106, 0107, + 0110, 0111, 0112, 0113, 0114, 0115, 0116, 0117, + 0120, 0121, 0122, 0123, 0124, 0125, 0126, 0127, + 0130, 0131, 0132, 0173, 0174, 0175, 0176, 0177, + 0200, 0201, 0202, 0203, 0204, 0205, 0206, 0207, + 0210, 0211, 0212, 0213, 0214, 0215, 0216, 0217, + 0220, 0221, 0222, 0223, 0224, 0225, 0226, 0227, + 0230, 0231, 0232, 0233, 0234, 0235, 0236, 0237, + 0240, 0241, 0242, 0243, 0244, 0245, 0246, 0247, + 0250, 0251, 0252, 0253, 0254, 0255, 0256, 0257, + 0260, 0261, 0262, 0263, 0264, 0265, 0266, 0267, + 0270, 0271, 0272, 0273, 0274, 0275, 0276, 0277, + 0300, 0301, 0302, 0303, 0304, 0305, 0306, 0307, + 0310, 0311, 0312, 0313, 0314, 0315, 0316, 0317, + 0320, 0321, 0322, 0323, 0324, 0325, 0326, 0327, + 0330, 0331, 0332, 0333, 0334, 0335, 0336, 0337, + 0340, 0341, 0342, 0343, 0344, 0345, 0346, 0347, + 0350, 0351, 0352, 0353, 0354, 0355, 0356, 0357, + 0360, 0361, 0362, 0363, 0364, 0365, 0366, 0367, + 0370, 0371, 0372, 0373, 0374, 0375, 0376, 0377 + }; + +main () +{ + char pat[80]; + struct re_pattern_buffer buf; + int i; + char c; + char fastmap[(1 << BYTEWIDTH)]; + char *gets(); + + buf.allocated = 40; + buf.buffer = (char *) malloc (buf.allocated); + buf.fastmap = fastmap; + buf.translate = upcase; + + while (gets(pat)) + { + + if (*pat) + { + re_compile_pattern (pat, strlen(pat), &buf); + + for (i = 0; i < buf.used; i++) + printchar (buf.buffer[i]); + + putchar ('\n'); + + printf ("%d allocated, %d used.\n", buf.allocated, buf.used); + + re_compile_fastmap (&buf); + printf ("Allowed by fastmap: "); + for (i = 0; i < (1 << BYTEWIDTH); i++) + if (fastmap[i]) printchar (i); + putchar ('\n'); + } + + gets (pat); /* Now read the string to match against */ + + i = re_match (&buf, pat, strlen (pat), 0, 0); + printf ("Match value %d.\n", i); + } +} + +#ifdef NOTDEF +print_buf (bufp) + struct re_pattern_buffer *bufp; +{ + int i; + + printf ("buf is :\n----------------\n"); + for (i = 0; i < bufp->used; i++) + printchar (bufp->buffer[i]); + + printf ("\n%d allocated, %d used.\n", bufp->allocated, bufp->used); + + printf ("Allowed by fastmap: "); + for (i = 0; i < (1 << BYTEWIDTH); i++) + if (bufp->fastmap[i]) + printchar (i); + printf ("\nAllowed by translate: "); + if (bufp->translate) + for (i = 0; i < (1 << BYTEWIDTH); i++) + if (bufp->translate[i]) + printchar (i); + printf ("\nfastmap is%s accurate\n", bufp->fastmap_accurate ? "" : "n't"); + printf ("can %s be null\n----------", bufp->can_be_null ? "" : "not"); +} +#endif + +printchar (c) + char c; +{ + if (c < 041 || c >= 0177) + { + putchar ('\\'); + putchar (((c >> 6) & 3) + '0'); + putchar (((c >> 3) & 7) + '0'); + putchar ((c & 7) + '0'); + } + else + putchar (c); +} + +error (string) + char *string; +{ + puts (string); + exit (1); +} + +#endif test diff --git a/regex.h b/regex.h new file mode 100644 index 00000000..b87b47e6 --- /dev/null +++ b/regex.h @@ -0,0 +1,213 @@ +/* Definitions for data structures callers pass the regex library. + Copyright (C) 1985 Free Software Foundation, Inc. + + NO WARRANTY + + BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY +NO WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW. EXCEPT +WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC, +RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE THIS PROGRAM "AS IS" +WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, +BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND +FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY +AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE +DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR +CORRECTION. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M. +STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY +WHO MAY MODIFY AND REDISTRIBUTE THIS PROGRAM AS PERMITTED BELOW, BE +LIABLE TO YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR +OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR +DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR +A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) THIS +PROGRAM, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH +DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY. + + GENERAL PUBLIC LICENSE TO COPY + + 1. You may copy and distribute verbatim copies of this source file +as you receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy a valid copyright notice "Copyright +(C) 1985 Free Software Foundation, Inc."; and include following the +copyright notice a verbatim copy of the above disclaimer of warranty +and of this License. + + 2. You may modify your copy or copies of this source file or +any portion of it, and copy and distribute such modifications under +the terms of Paragraph 1 above, provided that you also do the following: + + a) cause the modified files to carry prominent notices stating + that you changed the files and the date of any change; and + + b) cause the whole of any work that you distribute or publish, + that in whole or in part contains or is a derivative of this + program or any part thereof, to be freely distributed + and licensed to all third parties on terms identical to those + contained in this License Agreement (except that you may choose + to grant more extensive warranty protection to third parties, + at your option). + + 3. You may copy and distribute this program or any portion of it in +compiled, executable or object code form under the terms of Paragraphs +1 and 2 above provided that you do the following: + + a) cause each such copy to be accompanied by the + corresponding machine-readable source code, which must + be distributed under the terms of Paragraphs 1 and 2 above; or, + + b) cause each such copy to be accompanied by a + written offer, with no time limit, to give any third party + free (except for a nominal shipping charge) a machine readable + copy of the corresponding source code, to be distributed + under the terms of Paragraphs 1 and 2 above; or, + + c) in the case of a recipient of this program in compiled, executable + or object code form (without the corresponding source code) you + shall cause copies you distribute to be accompanied by a copy + of the written offer of source code which you received along + with the copy you received. + + 4. You may not copy, sublicense, distribute or transfer this program +except as expressly provided under this License Agreement. Any attempt +otherwise to copy, sublicense, distribute or transfer this program is void and +your rights to use the program under this License agreement shall be +automatically terminated. However, parties who have received computer +software programs from you with this License Agreement will not have +their licenses terminated so long as such parties remain in full compliance. + + +In other words, you are welcome to use, share and improve this program. +You are forbidden to forbid anyone else to use, share and improve +what you give them. Help stamp out software-hoarding! */ + + +#ifndef RE_NREGS +#define RE_NREGS 10 +#endif + + +/* JF for syntax stuff */ +/* To add more variable-syntax features, just use more bits. If we go over 16, + we probably should make obscure_syntax a long. (JF: Yes, virgina, there +really are 16 bit machines out there) */ +#define RE_NO_BK_PARENS (1<<0) +#define RE_NO_BK_VBAR (1<<1) + +/* This data structure is used to represent a compiled pattern. */ + +struct re_pattern_buffer + { + char *buffer; /* Space holding the compiled pattern commands. */ + int allocated; /* Size of space that buffer points to */ + int used; /* Length of portion of buffer actually occupied */ + char *fastmap; /* Pointer to fastmap, if any, or zero if none. */ + /* re_search uses the fastmap, if there is one, + to skip quickly over totally implausible characters */ + char *translate; /* Translate table to apply to all characters before comparing. + Or zero for no translation. + The translation is applied to a pattern when it is compiled + and to data when it is matched. */ + char fastmap_accurate; + /* Set to zero when a new pattern is stored, + set to one when the fastmap is updated from it. */ + char can_be_null; /* Set to one by compiling fastmap + if this pattern might match the null string. + It does not necessarily match the null string + in that case, but if this is zero, it cannot. + 2 as value means can match null string + but at end of range or before a character + listed in the fastmap. */ + }; + +/* Structure to store "register" contents data in. + + Pass the address of such a structure as an argument to re_match, etc., + if you want this information back. + + start[i] and end[i] record the string matched by \( ... \) grouping i, + for i from 1 to RE_NREGS - 1. + start[0] and end[0] record the entire string matched. */ + +struct re_registers + { + int start[RE_NREGS]; + int end[RE_NREGS]; + }; + +/* These are the command codes that appear in compiled regular expressions, one per byte. + Some command codes are followed by argument bytes. + A command code can specify any interpretation whatever for its arguments. + Zero-bytes may appear in the compiled regular expression. */ + +enum regexpcode + { + unused, + exactn, /* followed by one byte giving n, and then by n literal bytes */ + begline, /* fails unless at beginning of line */ + endline, /* fails unless at end of line */ + jump, /* followed by two bytes giving relative address to jump to */ + on_failure_jump, /* followed by two bytes giving relative address of place + to resume at in case of failure. */ + finalize_jump, /* Throw away latest failure point and then jump to address. */ + maybe_finalize_jump, /* Like jump but finalize if safe to do so. + This is used to jump back to the beginning + of a repeat. If the command that follows + this jump is clearly incompatible with the + one at the beginning of the repeat, such that + we can be sure that there is no use backtracking + out of repetitions already completed, + then we finalize. */ + dummy_failure_jump, /* jump, and push a dummy failure point. + This failure point will be thrown away + if an attempt is made to use it for a failure. + A + construct makes this before the first repeat. */ + anychar, /* matches any one character */ + charset, /* matches any one char belonging to specified set. + First following byte is # bitmap bytes. + Then come bytes for a bit-map saying which chars are in. + Bits in each byte are ordered low-bit-first. + A character is in the set if its bit is 1. + A character too large to have a bit in the map + is automatically not in the set */ + charset_not, /* similar but match any character that is NOT one of those specified */ + start_memory, /* starts remembering the text that is matched + and stores it in a memory register. + followed by one byte containing the register number. + Register numbers must be in the range 0 through NREGS. */ + stop_memory, /* stops remembering the text that is matched + and stores it in a memory register. + followed by one byte containing the register number. + Register numbers must be in the range 0 through NREGS. */ + duplicate, /* match a duplicate of something remembered. + Followed by one byte containing the index of the memory register. */ + before_dot, /* Succeeds if before dot */ + at_dot, /* Succeeds if at dot */ + after_dot, /* Succeeds if after dot */ + begbuf, /* Succeeds if at beginning of buffer */ + endbuf, /* Succeeds if at end of buffer */ + wordchar, /* Matches any word-constituent character */ + notwordchar, /* Matches any char that is not a word-constituent */ + wordbeg, /* Succeeds if at word beginning */ + wordend, /* Succeeds if at word end */ + wordbound, /* Succeeds if at a word boundary */ + notwordbound, /* Succeeds if not at a word boundary */ + syntaxspec, /* Matches any character whose syntax is specified. + followed by a byte which contains a syntax code, Sword or such like */ + notsyntaxspec /* Matches any character whose syntax differs from the specified. */ + }; + +extern char *re_compile_pattern (); +/* Is this really advertised? */ +extern void re_compile_fastmap (); +extern int re_search (), re_search_2 (); +extern int re_match (), re_match_2 (); + +/* 4.2 bsd compatibility (yuck) */ +extern char *re_comp (); +extern int re_exec (); + +#ifdef SYNTAX_TABLE +extern char *re_syntax_table; +#endif |