From 2ab35a998fb2707ed63ffac9bc926db69f454ab3 Mon Sep 17 00:00:00 2001
From: Kaz Kylheku
Date: Tue, 12 Apr 2022 07:58:00 -0700
Subject: Add README.md for cgit, about egawk and @let.

* README.md: New file.
---
 README.md       | 225 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pc/Makefile.tst |   3 +-
 2 files changed, 227 insertions(+), 1 deletion(-)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 00000000..b16afb38
--- /dev/null
+++ b/README.md
@@ -0,0 +1,225 @@
+## What is `egawk`?
+
+`egawk` is Enhanced GNU Awk. It is a fork of GNU Awk with some
+enhancements designed and implemented by Kaz Kylheku.
+
+**NOTE:** If you have problems with `egawk` or questions about it, please do
+not contact the GNU Awk maintainers or the `bug-gawk` mailing list at
+`gnu.org`. Only contact those people if you have a bug that you can
+reproduce with the mainline GNU Awk. Always remember to try to produce
+a minimal sample of code, and any required input, to reproduce the problem.
+
+## The `@let` statement.
+
+The `@let` statement in Enhanced GNU Awk provides block-scoped lexical
+variables. The syntax looks like this:
+
+    ::awk
+    @let (x = 1, y = 3, z = f(x, y))
+      print z
+
+The token sequence `@` and `let` introduces the statement. This is
+followed by a list of variable bindings in parentheses. That list is
+then followed by a statement.
+
+The statement is executed in a scope in which the variables are visible.
+
+As the above example shows, the bindings are established sequentially, which is
+why `z` can be initialized using an expression which depends on `x` and `y`.
+
+A `@let` variable need not have an initializer:
+
+    ::awk
+    @let (a, b)
+      print a == 0 && b == ""  # prints 1
+
+Variables without an initializer are reliably initialized to the Awk
+null value: the same value that is exhibited by ordinary Awk variables
+that have not been assigned.
+This value compares equal to both 0 and the empty string `""` under the `==` operator.
+
+The scope of a `@let` variable begins immediately after its binding,
+including its initializing expression, if any. The following is possible:
+
+    ::awk
+    function f(x)
+    {
+      @let (x = x + 1)
+        return x
+    }
+
+Here `x` is initialized with an expression that uses `x`. That expression
+still refers to the previously visible `x`; the scope of the newly
+introduced `x` begins after that initializing expression.
+The new `x` shadows the previous `x`.
+
+## Restrictions
+
+`@let` variables may not have the same names as Awk's special variables such as
+`NF`, `FS` and whatnot.
+
+Inside a function, a `@let` variable must not have the same name as the
+function.
+
+Lastly, variables may not use namespace prefixes: `foo::bar` cannot be used
+as a `@let` variable name.
+
+These restrictions are not new; mainline GNU Awk's function parameters
+have the same restrictions.
+
+`@let` may appear inside functions, as well as outside of functions in
+the action bodies of patterns, and in the `BEGIN` and `END` blocks:
+
+    ::awk
+    BEGIN { @let (x = 3) ... }
+    /^id=/ { @let (id = ...) ... }
+
+
+## Rationale
+
+Why not JavaScript-like syntax?
+
+    ::js
+    {
+       let x = 3
+       ...
+    }
+
+The reason is that this syntax is not friendly toward macros. The motivation
+for `egawk` comes from the [`cppawk`](https://www.kylheku.com/cgit/cppawk)
+project. With `@let`, this sort of thing is possible:
+
+    ::c
+    #define repeat(n) @let (__c, __n = (n)) for (__c = 0; __c < __n; __c++)
+
+Here, the expansion of `repeat(42)` produces the structure
+`@let (...) for (...)` which just requires the addition of a statement
+to produce a complete construct:
+
+    ::c
+    repeat(42) { print "hello" }
+
+The JavaScript-style syntax doesn't make it possible.
+We would have to rely on the feature of declaring variables inside the `for`:
+
+    ::js
+    for (let __c = 0, __n = (n); __c < __n; __c++)
+
+This is not attractive because it requires us to inject the `let`
+syntax into the phrase structure of every statement type: `if`,
+`while`, `switch`. By contrast, the selected design combines easily with
+any statement, acting as a prefix:
+
+    ::awk
+    @let (x = 3)
+      return x
+
+    @let (x = c / 2) switch (x) {
+
+    }
+
+The `@` prefix in `@let` follows a convention established by GNU Awk.
+GNU Awk has extensions like `@include` for including files, and `@fun(arg)`
+for indirect function calls.
+
+## Compatibility
+
+If you have GNU Awk code that uses `let` as the name of an
+indirect function, `egawk` interprets that as the start of a `@let` statement.
+It's possible that no syntax error will take place, only different
+behavior. This GNU Awk program produces the output `42`, because
+`@let()` means "call the function whose name is stored in the `let`
+variable":
+
+    ::awk
+    function f() { print 42 }
+    BEGIN { let = "f"; @let(); }
+
+When executed with `egawk`, it produces no output, because `@let();`
+looks like an empty `@let` statement. The superfluous semicolon satisfies
+its need for a statement and so everything parses.
+
+## Implementation Notes
+
+The implementation of `@let` is different inside functions versus outside.
+`@let` statements outside of a function are compiled to code which
+uses hidden, global variables. These variables have numbered names similar to
+`$let0001`. When the GNU Awk `-d-` option is used to dump the symbol table,
+these names show up in it.
+
+Inside a function, `@let` is compiled to code which assumes that the variables
+are allocated in the function's local frame. Unmodified, upstream GNU Awk
+has a parameter frame which is entirely dedicated to parameter passing.
+Local variables are simulated by defining additional parameters, which
+is a standard Awk idiom.
+Enhanced GNU Awk separates the frame into a parameter area and a locals-only
+area that is off-limits to the parameter passing mechanism. The compiler extends
+this locals-only area to accommodate all the `@let` variables that occur in the function.
+
+Whether inside or outside a function, `@let` statements allocate variables
+in a stack-like fashion. Whenever a `@let` scope terminates, the compiler
+releases the storage locations used for that `@let`, allowing them to be re-used
+for a subsequent `@let`. Thus this program allocates exactly two hidden
+global variables:
+
+    ::awk
+    BEGIN {
+      @let (a, b);
+      @let (c, d);
+      @let (e, f);
+    }
+
+This one allocates three:
+
+    ::awk
+    BEGIN {
+      @let (z) {
+        @let (a, b);
+        @let (c, d);
+        @let (e, f);
+      }
+    }
+
+In order to support the dynamic (compile-time) extension of the local frame
+with new local variables, I changed the representation of the function
+parameter frame. Upstream `gawk` has it as a dynamic array of `NODE` objects; I
+made it a dynamic array of `NODE *` pointers to individually allocated `NODE`
+objects. Gawk's array of `NODE` objects cannot be reallocated to fit the exact size,
+because the `NODE` addresses would change after they have been inserted into
+generated bytecode. I have an idea for solving that, which could restore
+the original representation.
+
+Enhanced GNU Awk adds one bytecode instruction called `Op_clear_var`.
+This is necessary to reset lexical variables. Recall from the above
+paragraphs that lexical variables whose scopes do not overlap are allocated
+in the same storage. This causes several problems. When a new variable
+is allocated in the space of an old one, the space contains the prior value.
+That "garbage" must be cleared out. (Contrast that with the C language in
+which uninitialized block-scope locals appear to have garbage values,
+taking on whatever bits happen to be in the memory.) There is another
+problem though, which affects even initialized variables.
+Awk does not like it when a variable that holds an array is used as a scalar, or vice versa:
+
+    ::awk
+    x[3] = 42
+    x = "abc"   # error: array used as scalar
+
+    y = 3.14
+    y["foo"] = "bar"   # error: scalar used as array
+
+In standard Awk, and in GNU Awk, there is nothing that a program can do
+to change the variable `x` such that it forgets it was an array, or to
+change `y` to forget that it was a scalar and work as an array.
+
+The new `Op_clear_var` opcode used by the `@let` implementation in `egawk`
+solves this problem, thanks to its access to the internal representation of
+a variable.
+
+## Credits
+
+The `@let` syntax is inspired by Lisp:
+
+    ::lisp
+    (let* ((x 1)
+           (y 2))
+      ...)
diff --git a/pc/Makefile.tst b/pc/Makefile.tst
index 98ec2853..16d19a6d 100644
--- a/pc/Makefile.tst
+++ b/pc/Makefile.tst
@@ -158,7 +158,8 @@ BASIC_TESTS = \
 	getline4 getline5 getlnbuf getnr2tb getnr2tm gsubasgn gsubtest \
 	gsubtst2 gsubtst3 gsubtst4 gsubtst5 gsubtst6 gsubtst7 gsubtst8 \
 	hex hex2 hsprint inpref inputred intest intprec iobug1 leaddig \
-	leadnl litoct longsub longwrds manglprm math membug1 memleak \
+	leadnl litoct let1 let2 let3 let4 let5 let6 \
+	longsub longwrds manglprm math membug1 memleak \
 	messages minusstr mmap8k nasty nasty2 negexp negrange nested \
 	nfldstr nfloop nfneg nfset nlfldsep nlinstr nlstrina noeffect \
 	nofile nofmtch noloop1 noloop2 nonl noparms nors nulinsrc \
-- 
cgit v1.2.3