author Kaz Kylheku <kaz@kylheku.com> 2022-04-09 11:55:51 -0700
committer Kaz Kylheku <kaz@kylheku.com> 2022-04-09 11:55:51 -0700
commit 5a0d83f4b42b9ca28cc6b8dd190a570c47b203c8
tree 2ef4893a8f91d411f48f527877b35f180597d2ed /doc/gawktexi.in
parent b6fd48c9891858d9f84ee49b6735be5db950a8a0
Feature: @let statement provides block-scoped locals

This is a feature which builds on the @local work, providing a statement which looks like this:

  @let (var1 [= init1], var2 [= init2], ...) statement

The variables are bound, given their optional values or else left undefined, and the statement is executed in the scope in which they exist. These variables may have the same name as existing block-scoped locals or function-wide @local locals, or function parameters, in which case shadowing takes place.

* awk.h (NODETYPE): New enum symbol Node_alias. This marks entries in the new symbol table alias_table which is needed for compiling @let statements that occur outside of functions.
(NODE): New union member sub.nodep.n. Member sub.nodep.rn is moved into this union becoming sub.nodep.n.rn. New member sub.nodep.n.rpn, of type NODE **. This is now the implementation of fparms.
(nxparam): New macro, denoting sub.nodep.x.extra. This is used as the link pointer for maintaining a stack of lexicals while compiling a function.
(let_alias): New macro, denoting sub.nodep.l.lptr. This is the new Node_alias type's semantic payload: the node being aliased.
(frame_cnt): New macro for sub.nodep.reserved. This is used to keep track of a function's frame size (number of local variables). If there are no local variables other than function parameters, then f->frame_cnt == f->param_cnt. Otherwise f->frame_cnt > f->param_cnt. Note that traditional Awk local variables, which are indistinguishable from parameters, are part of param_cnt. We are talking about the new style local variables here.
(fparms): Now sub.nodep.n.rpn.
(for_array, xarray): Macros follow the move of sub.nodep.rn to sub.nodep.n.rn.
(OPCODE): New opcode, Op_clear_var, for clearing a variable to the null value.
(make_params): Return NODE **, rather than NODE *.
(extend_locals, install_let, install_global_let, remove_let): Functions declared.

* awkgram.y (in_function): Global variable changes type from bool to INSTRUCTION *. It still functions as a Boolean, indicating that the parser is processing a function, but it also gives the instruction node.
(in_loop): New static variable: loop nesting count. If this is positive, we are parsing a construct located inside a loop.
(let_free): New static variable. This is the free list of lexical variables. When, during the compilation of a function, a let block is done, the lexical variable nodes are returned to the free list. The next time a let block opens, it can allocate nodes from that list. Thanks to this recycling list, lexical variables with non-overlapping scopes will map to the same underlying locations in the function's param/local frame.
(let_stack): New static variable. When a new let variable is introduced into the scope, it is pushed onto this stack. The @let construct remembers the original stack top. When it's done compiling, it pops the stack back down to the original top, and returns all the entries thus removed into the let_free list.
(LEX_LOCAL, LEX_LET): New terminal symbols for the param keyword in the @ param notation.
(function_prologue): Reset let_stack and let_free to NULL when starting to compile a function. Store the function node $1 into in_function.
(statement): For all the looping constructs, decrement the in_loop nesting count that is incremented by the lexical analyzer. New @ 'LEX_LET' '(' ... phrase structure here. Add new production for '@' LEX_LOCAL.
(local_var_list_opt, local_var_list): New nonterminal symbols.
(let_var_list_opt, let_var_list): New nonterminal symbols.
(simple_variable): New production for @ LEX_LOCAL ':' NAME. This is a copy of the NAME production, with the additional logic that when in_function is true, the symbol is added as a local to the current function via add_local before being processed as a variable reference. So it is as if it had been a parameter all along.
(tokentab): Register "let" with the LEX_LET token number, attributing it as a special symbol that is a Gawk extension, much like "include" and other "@" items. Same for "local" and LEX_LOCAL.
(yylex): Recognize "let" and "local" only after "@". Increment in_loop nesting counter for for, while and do.
(parms_shadow): Check the entire frame, not just the parameters, for shadowing. Update to NODE ** fparms representation.
(mk_function): Updated due to rename of remove_params to remove_locals.
(install_function): Pass the new third parameter to make_params in order to receive the allocation size of the parameter vector. This is installed into f->param_alloc. The new add_local function makes use of this to diagnose the case in which there is no room to add parameters. Update to NODE ** fparms, and initialize frame_cnt equal to param_cnt.
(add_local): New static function for adding a parameter to the function currently being compiled. This has to perform diagnostic checks on the local variable, similar to the checks done on a parameter. The extend_locals function in symbol.c is relied upon to do the reallocation to add the parameter and also register it in the symbol table. We strdup the parm->lextok parameter name, because that will later be freed during parsing.
(add_let): New function, closely based on add_local, but with these differences: there is no duplicate check, since shadowing is allowed, and extend_locals is only used if the let_free list is empty.
(check_params): A few parameter checks are moved into the new check_param function. This was because they were shared with an earlier implementation of add_local. The function then got cloned into check_local in order to have different wording.
(check_param): New static function, split off from check_params.
(check_local): New static function, closely based on check_param, but using different wording to distinguish locals from params. Uses NULL value of fname to disable checks; this is for when we are compiling a let that isn't in a function.
(genym): New static function. Used for generating anonymous globals. Let variables outside of functions are implemented as aliases for anonymous globals. The globals are allocated/freed in a stack-like manner, exactly like frame locations are allocated for let variables inside a function.

* command.y (variable_generator): Update to NODE ** fparms.

* debug.c (do_info): Report the locals as if they were parameters, so that they are visible under debugging. Update parameter access to reflect NODE ** representation.
(find_param): Find a local variable too. Update to NODE ** fparms.
(print_function, print_memory, print_instruction): Adjust to NODE ** fparms. In print_instruction, handle new Op_clear_var.
(do_eval): Follow rename of remove_params to remove_locals. Append all the locals to the frame stack. Update to NODE ** fparms.
(parse_condition): Follow rename of remove_params to remove_locals.

* eval.c (nodetypes): Entry for Node_alias.
(optypes): Entry for Op_clear_var.
(setup_frame): Allocate the full locals frame (f->frame_cnt), but check the arguments against only the param count (f->param_cnt). Update to NODE ** fparms.
(restore_frame): Destroy all the locals, not just parameters.

* interpret.h (r_interpret): Implement Op_clear_var opcode.

* profile.c (func_params): Change to NODE ** type.
(pprint, pp_func): Update to NODE ** fparms. In pprint, handle Op_clear_var.

* symbol.c (alias_table): New static variable.
(TAB_PARAM, TAB_ALIAS, TAB_GLOBAL, TAB_FUNC, TAB_SYM, TAB_COUNT): New enum symbols. These identify indices into the tables array.
(tables): New static array variable: replaces the automatic array used in the lookup function which is initialized on every lookup call. This array is no longer terminated by a null pointer.
(init_symbol_table): Initialize alias_table. Initialize entries of tables.
(lookup): Local array tables and its initialization removed. Iterate over the tables array up to below TAB_COUNT rather than by looking for a null pointer. Check for the global_table by comparing the index to TAB_GLOBAL rather than the table pointer to global_table. Interestingly, gawk already has support for placing multiple NODE objects in the param_table under the same name with LIFO discipline, i.e. shadowing. Each node in the param_table is understood to have a stack of duplicates using the dup_ent link pointer. The only problem is that the lookup() function ignores this. We fix it so that if there is a stack of duplicates, it substitutes the top indicated by dup_ent. We implement the alias table: if lookup finds an entry in this table, we don't return that entry, but what it aliases for, indicated by n->let_alias. The alias table thus implements symbol macros; Node_alias nodes are never seen at run-time because aliases resolve during compiling, and that's why we don't check for them in any of the run-time code, like eval.c, debug.c, profile.c or interpret.h.
(make_params): Returns NODE ** now and allocates an array of pointers, using individual getnode calls to allocate the NODE objects, rather than allocating params as a contiguous block of NODE objects.
(extend_locals): New function. This reallocates the param array to hold one more parameter, on the assumption that the caller's lcount parameter is one greater than the current size. The parameter is initialized and registered in the symbol table.
(install_params): Update to NODE ** fparms.
(remove_params): Renamed to remove_locals. Refers to frame_cnt rather than param_cnt. Symbol logic moved to remove_common.
(remove_common): New static function, factored out from remove_locals.
(install_let): New public wrapper around install: lets us install a node under a specific alias name.
(install_global_let): New function: this is the public interface for installing an entry in the alias table.
(remove_let): New function: essentially a wrapper around remove_common. Previously, this functionality was only exposed to other modules via install_params and remove_locals, which take a function object from which they retrieve an array of locals. For the let construct, we need more flexibility to bind and unbind individual symbols, not everything at once.
(destroy_symbol): Refer to frame_cnt rather than param_cnt.
(install): Handle nodes of NODETYPE Node_alias, by selecting the alias_table. The alias_table supports entries via the dup_ent mechanism just like param_table.
(check_param_names): Refer to frame_cnt rather than param_cnt and update to NODE ** representation of fparms.

* test/let[1-6].awk, test/let[1-6].ok: New files.

* test/Makefile.am, test/Makefile.in, test/Maketests: Rules for new tests.

* doc/gawktexi.in, doc/gawk.texi: Documented.

Diffstat (limited to 'doc/gawktexi.in')
-rw-r--r-- doc/gawktexi.in | 185
1 files changed, 183 insertions, 2 deletions
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index bfefda24..651bd8d2 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -778,6 +778,7 @@ particular records in a file and perform operations upon them.
* Function Calling:: Calling user-defined functions.
* Calling A Function:: Don't use spaces.
* Variable Scope:: Controlling variable scope.
+* Local Variables:: Enhanced Awk (@command{egawk}) local variables.
* Pass By Value/Reference:: Passing parameters.
* Function Caveats:: Other points to know about functions.
* Return Statement:: Specifying the value a function
@@ -20376,6 +20377,7 @@ the function.
@menu
* Calling A Function:: Don't use spaces.
* Variable Scope:: Controlling variable scope.
+* Local Variables:: Enhanced Awk (@command{egawk}) local variables.
* Pass By Value/Reference:: Passing parameters.
* Function Caveats:: Other points to know about functions.
@end menu
@@ -20414,8 +20416,11 @@ there is no way to make a variable local to a @code{@{} @dots{} @code{@}} block
good practice to do so whenever a variable is needed only in that
function.
-To make a variable local to a function, simply declare the variable as
-an argument after the actual function arguments
+Enhanced GNU Awk (@command{egawk}) has language extensions in this area,
+described in @ref{Local Variables}.
+
+In standard @command{awk}, to make a variable local to a function, simply declare the
+variable as an argument after the actual function arguments
(@pxref{Definition Syntax}).
Look at the following example, where variable
@code{i} is a global variable used by both functions @code{foo()} and
@@ -20540,6 +20545,182 @@ At level 2, index 1 is not found in a
At level 2, index 2 is found in a
@end example
+@node Local Variables
+@subsubsection @command{egawk} Local Variable Extension
+@cindex @code{@@let} statement
+This @value{SECTION} describes an extension specific to a custom
+version of GNU Awk called Enhanced GNU Awk, which is installed
+under the command name @command{egawk}.
+
+As documented in @ref{Variable Scope}, function-wide local variables
+are defined as function parameters in standard @command{awk}. The
+language does not distinguish parameters used as local variables
+from true parameters that receive arguments. This is only a programmer
+convention, which is enforced by discipline and the use of traditional
+annotation devices, such as visually separating the parameters intended
+for argument passing from the parameters intended to serve as local
+variables.
+
+@command{egawk} provides a language extension in this area, allowing
+the programmer to declare block-scoped local variables which do
+not appear in the parameter list and cannot receive arguments.
+
+The extension takes the form of the @code{@@let}
+statement.
+
+The @code{@@let} statement is introduced by the @code{@@} symbol
+followed by the special keyword @code{let}. These tokens are
+then followed by a comma-separated list of variable declarators,
+enclosed in parentheses. After the parentheses comes a required statement.
+The list of variables may be empty.
+
+The statement is executed in a scope in which the specified variables are
+visible, in addition to any other variables that were previously visible that
+do not have the same names. When the statement terminates, the variables
+specified in that statement disappear.
+
+Declarators consist of variable names, optionally initialized by expressions.
+The initializing expressions are indicated by the @code{=} sign:
+
+@example
+function fun(x)
+@{
+    ...
+    @@let (a, b = 3, ir2 = 0.707) @{
+        ...
+    @}
+    ...
+@}
+@end example
+
+Local variables introduced by @code{@@let} may have the same names as global
+variables, or, in a function, the parameter names of the enclosing function.
+In this situation, over their scope, the @code{@@let} variables are visible,
+hiding the same-named parameters or variables. This is called @emph{shadowing}.
+
+Shadowing also takes place among same-named @code{@@let} variables,
+which occurs when a variable name is repeated in the same @code{@@let}
+construct, or in two different @code{@@let} constructs which are nested.
+
+A @code{@@let} variable may not have the same name as the enclosing
+function, or the same name as an Awk special variable such as @code{NF}.
+A name with a namespace prefix such as @code{awk::score} also may not be used
+as a local variable.
+
+The @code{@@let} construct may be used outside or inside of a function.
+The semantics is identical, but the implementation is different.
+Inside a function, the construct allocates variables from the local variable
+frame of the function invocation. Outside of a function, it allocates
+anonymous variables from the global namespace. These hidden variables
+can be seen in the output of the @code{-d} option, having numbered names
+which look like @code{$let0001}. This is an implementation detail that may
+change in the future.
+
+A local variable that has no initializing expression has the empty numeric
+string value, just like a regular Awk variable that has not been assigned: it
+compares equal to the empty string as well as to zero.
+
+In the following example, the function's first reference to @code{accum} is a
+reference to the global variable. The second reference is local.
+
+@example
+function fun()
+@{
+    accum = 42
+    @@let (accum) @{
+        print "fun: accum =", accum
+        accum = 43
+    @}
+@}
+
+BEGIN @{ fun(); print "BEGIN: accum =", accum @}
+@end example
+
+The output is:
+
+@example
+fun: accum =
+BEGIN: accum = 42
+@end example
+
+Inside the @code{@@let} statement in the function, @code{accum} no longer
+appears to have a defined value, even though @code{accum} was just assigned the
+value 42. This is because @code{@@let} has introduced a local variable
+unrelated to any global variable, and that variable is not initialized.
+
+The @code{print} statement in the @code{BEGIN} block confirms that
+assigning the value 43 to the local @code{accum} had no effect on the global
+@code{accum}.
+
+The scope of a local variable begins from its point of declaration, just
+after the initializing expression, if any. The initializing expression
+is evaluated in a scope in which the variable is not yet visible.
+
+@example
+function helper()
+@{
+    print "helper: level =", level
+@}
+
+function main()
+@{
+    @@let (level = level + 1) @{
+        print "main: level =", level
+        helper()
+    @}
+@}
+
+BEGIN @{
+    level = 0
+    main()
+@}
+@end example
+
+The output is:
+
+@example
+main: level = 1
+helper: level = 0
+@end example
+
+In this example, the function @code{main} locally shadows the global
+variable @code{level}, giving the local @code{level} a value which is one greater
+than the global @code{level}.
+
+This local variable is lexically scoped; when @code{main} invokes
+@code{helper}, it is evident that @code{helper} is, again, referring to the
+global @code{level} variable; the @code{helper} function has no visibility
+into the scope of the caller, @code{main}.
+
+Because a local variable's scope begins immediately after its declaration,
+within a single @code{@@let} statement, the initializing expressions of
+later variables are evaluated in the scope of earlier variables. Furthermore,
+later variables may repeat the names of earlier variables. These later
+variables are new variables which shadow the earlier ones.
+
+The following statement makes sense:
+
+@example
+BEGIN @{
+    @@let (x = 0, x = x + 1, x = x + 1, x = x + 1)
+        print x
+@}
+@end example
+
+Output:
+
+@example
+3
+@end example
+
+While the variable initializations may resemble the steps of an
+imperative program which assigns four successive values to a single
+accumulator, that is not the case; four different variables named
+@code{x} are defined here, each one shadowing the preceding one.
+The @code{print} statement is then executed in the scope of the
+rightmost @code{x}. The initializing expressions @code{x + 1}
+have the previous @code{x} still in scope.
+
@node Pass By Value/Reference
@subsubsection Passing Function Arguments by Value Or by Reference