author Kaz Kylheku <kaz@kylheku.com> 2022-04-12 07:58:00 -0700
committer Kaz Kylheku <kaz@kylheku.com> 2022-04-12 07:58:00 -0700
commit 2ab35a998fb2707ed63ffac9bc926db69f454ab3
tree 04465281e4ae1812e5ad70d26f2271f9ed17415a
parent ca8dc591b84286462c941ebe88b9411cffbffbcc
Add README.md for cgit, about egawk and @let.
* README.md: New file.
-rw-r--r-- README.md        225
-rw-r--r-- pc/Makefile.tst    3
2 files changed, 227 insertions, 1 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 00000000..b16afb38
--- /dev/null
+++ b/README.md
@@ -0,0 +1,225 @@
+## What is `egawk`?
+
+`egawk` is Enhanced GNU Awk. It is a fork of GNU Awk with some
+enhancements designed and implemented by Kaz Kylheku.
+
+**NOTE:** If you have problems with `egawk` or questions about it, please do
+not contact the GNU Awk maintainers or the `bug-gawk` mailing list at
+`gnu.org`. Only contact those people if you have a bug that you can
+reproduce with the mainline GNU Awk. Always remember to try to produce
+a minimal sample of code, and any required input, to reproduce the problem.
+
+## The `@let` statement
+
+The `@let` statement in Enhanced GNU Awk provides block-scoped lexical
+variables. The syntax looks like this:
+
+    ::awk
+    @let (x = 1, y = 3, z = f(x, y))
+      print z
+
+The token sequence `@` and `let` introduces the statement. This is
+followed by a list of variable bindings in parentheses. That list is
+then followed by a statement.
+
+The statement is executed in a scope in which the variables are visible.
+
+As the above example shows, the bindings are established sequentially, which is
+why `z` can be initialized using an expression which depends on `x` and `y`.
+
+A `@let` variable need not have an initializer:
+
+    ::awk
+    @let (a, b)
+      print a == 0 && b == ""  # prints 1
+
+Variables without an initializer are reliably initialized to the Awk
+null value: the same value that is exhibited by ordinary Awk variables
+that have not been assigned. This value compares equal to both 0 and
+the empty string `""` under the `==` operator.
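To make the "equal to both 0 and `""`" rule concrete, here is a deliberately simplified C model of an Awk value and the `==` operator. It is only a sketch: `struct value`, `awk_eq` and the other names are hypothetical, not gawk's internals, and it covers only the cases in the example above (the null value compared with a number constant or with a string constant).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified Awk value; gawk's real NODE is far richer. */
enum kind { UNINIT, NUMBER, STRING };

struct value {
    enum kind kind;
    double num;       /* used when kind == NUMBER */
    const char *str;  /* used when kind == STRING */
};

/* The null value held by unassigned variables and uninitialized @let variables. */
static const struct value null_value = { UNINIT, 0.0, NULL };

/* Numeric view: the null value reads as 0. */
static double as_number(struct value v)
{
    assert(v.kind != STRING);   /* string-to-number conversion not modeled */
    return v.kind == NUMBER ? v.num : 0.0;
}

/* String view: the null value reads as "". */
static const char *as_string(struct value v)
{
    assert(v.kind != NUMBER);   /* number-to-string (CONVFMT) not modeled */
    return v.kind == STRING ? v.str : "";
}

/* Simplified ==: if either side is a string, compare as strings;
   otherwise compare numerically. */
static int awk_eq(struct value a, struct value b)
{
    if (a.kind == STRING || b.kind == STRING)
        return strcmp(as_string(a), as_string(b)) == 0;
    return as_number(a) == as_number(b);
}
```

Because the null value has both a numeric view (0) and a string view (`""`), whichever comparison the other operand selects comes out equal.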
+
+The scope of a `@let` variable begins immediately after its binding,
+that is, after its initializing expression, if any. The following is possible:
+
+    ::awk
+    function f(x)
+    {
+      @let (x = x + 1)
+        return x
+    }
+
+Here `x` is initialized with an expression that uses `x`. That expression
+still refers to the previously visible `x`; the scope of the newly
+introduced `x` begins after that initializing expression.
+The new `x` shadows the previous `x`.
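The scoping rules above can be modeled with a compile-time stack of bindings. The following C sketch is hypothetical (the names `scope_push`, `scope_lookup` and `scope_pop` are invented, and `f(x, y)` from the first example is replaced by `x + y`); it is not how `egawk` is actually written. The key point is ordering: each initializer is evaluated before its own binding is pushed, so it sees only the earlier bindings and whatever was visible outside the `@let`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of @let scoping; not egawk's compiler code. */
#define MAX_BINDINGS 64

struct binding {
    const char *name;
    int value;
};

static struct binding scope[MAX_BINDINGS];
static size_t depth = 0;

/* Innermost visible binding with this name, or NULL if none. */
static struct binding *scope_lookup(const char *name)
{
    for (size_t i = depth; i > 0; i--)
        if (strcmp(scope[i - 1].name, name) == 0)
            return &scope[i - 1];
    return NULL;
}

/* Bring a new name into scope. The caller evaluates the initializer
   BEFORE calling this, so the initializer only sees earlier bindings. */
static void scope_push(const char *name, int value)
{
    assert(depth < MAX_BINDINGS);
    scope[depth].name = name;
    scope[depth].value = value;
    depth++;
}

/* End of a @let statement: drop the n bindings it introduced. */
static void scope_pop(size_t n)
{
    assert(n <= depth);
    depth -= n;
}

/* Model of @let (x = 1, y = 3, z = x + y): bindings are sequential,
   so z's initializer can see x and y. */
static int model_sequential(void)
{
    scope_push("x", 1);
    scope_push("y", 3);
    scope_push("z", scope_lookup("x")->value + scope_lookup("y")->value);
    int z = scope_lookup("z")->value;
    scope_pop(3);
    return z;
}

/* Model of: function f(x) { @let (x = x + 1) return x } */
static int model_f(int arg)
{
    scope_push("x", arg);                      /* parameter x */
    int init = scope_lookup("x")->value + 1;   /* initializer sees outer x */
    scope_push("x", init);                     /* new x shadows the parameter */
    int result = scope_lookup("x")->value;     /* return x: the new x */
    scope_pop(2);
    return result;
}
```

With this ordering, `model_f(5)` yields 6: the initializer reads the parameter, and only the `return` sees the shadowing binding.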
+
+## Restrictions
+
+`@let` variables may not have the same names as Awk's special variables such as
+`NF`, `FS` and whatnot.
+
+Inside a function, a `@let` variable must not have the same name as the
+function.
+
+Lastly, variables may not use namespace prefixes: `foo::bar` cannot be used
+as a `@let` variable name.
+
+These restrictions are not new; mainline GNU Awk's function parameters
+have the same restrictions.
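The three rules could be checked with something like the following C sketch. The function `let_name_ok` is hypothetical, and the list of special variables is deliberately partial; GNU Awk has more, such as `RT` and `PROCINFO`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check mirroring the rules above; not egawk's code.
   The special-variable list is partial, for illustration only. */
static const char *const special_vars[] = {
    "NF", "NR", "FNR", "FS", "OFS", "RS", "ORS", "SUBSEP",
    "CONVFMT", "OFMT", "FILENAME", "RSTART", "RLENGTH",
    "ENVIRON", "ARGC", "ARGV",
};

/* func_name is NULL when the @let appears outside any function. */
static bool let_name_ok(const char *name, const char *func_name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof special_vars / sizeof special_vars[0]; i++)
        if (strcmp(name, special_vars[i]) == 0)
            return false;   /* special variable such as NF */
    if (func_name != NULL && strcmp(name, func_name) == 0)
        return false;       /* same name as the enclosing function */
    if (strstr(name, "::") != NULL)
        return false;       /* namespace prefix such as foo::bar */
    return true;
}
```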
+
+`@let` may appear inside functions, as well as outside of functions in
+the action bodies of pattern-action rules, and in `BEGIN` and `END` blocks:
+
+    ::awk
+    BEGIN { @let (x = 3) ... }
+    /^id=/ { @let (id = ...) ... }
+
+
+## Rationale
+
+Why not JavaScript-like syntax?
+
+    ::js
+    {
+      let x = 3
+      ...
+    }
+
+The reason is that this syntax is not friendly toward macros. The motivation
+for `egawk` comes from the [`cppawk`](https://www.kylheku.com/cgit/cppawk)
+project. With `@let`, this sort of thing is possible:
+
+    ::c
+    #define repeat(n) @let (__c, __n = (n)) for (__c = 0; __c < __n; __c++)
+
+Here, the expansion of `repeat(42)` produces the structure
+`@let (...) for (...)` which just requires the addition of a statement
+to produce a complete construct:
+
+    ::c
+    repeat(42) { print "hello" }
+
+The JavaScript-style syntax doesn't make this possible. We would have
+to rely on the feature of declaring variables inside the `for`:
+
+    ::js
+    for (let __c = 0, __n = (n); __c < __n; __c++)
+
+This is not attractive because it requires us to inject the `let`
+syntax into the phrase structure of every statement type: `if`,
+`while`, `switch` and so on. By contrast, the selected design blends easily
+with any statement as a prefix:
+
+    ::awk
+    @let (x = 3)
+      return x
+
+    @let (x = c / 2) switch (x) {
+      ...
+    }
+
+The `@` prefix in `@let` follows a convention established by GNU Awk,
+which has extensions like `@include` for including files, and `@fun(arg)`
+for indirect function calls.
+
+## Compatibility
+
+If you have GNU Awk code that uses a variable named `let` to make
+indirect function calls, `egawk` interprets `@let` as the start of a
+`@let` statement instead. It's possible that no syntax error will take
+place, only different behavior. This GNU Awk program produces the output
+`42`, because `@let()` means "call the function whose name is stored in
+the `let` variable":
+
+    ::awk
+    function f() { print 42 }
+    BEGIN { let = "f"; @let(); }
+
+When executed with `egawk`, it produces no output, because `@let();`
+parses as a `@let` statement with an empty binding list. The otherwise
+superfluous semicolon is an empty statement, which satisfies the `@let`
+statement's need for a body, and so everything parses.
+
+## Implementation Notes
+
+The implementation of `@let` is different inside functions versus outside.
+`@let` statements outside of a function are compiled to code which
+uses hidden, global variables. These variables have numbered names similar to
+`$let0001`. When the GNU Awk `-d-` option is used to dump the symbol table,
+these names show up in it.
+
+Inside a function, `@let` is compiled to code which assumes that the variables
+are allocated in the function's local frame. Unmodified, upstream GNU Awk
+has a parameter frame which is entirely dedicated to parameter passing.
+Local variables are simulated by defining additional parameters, which
+is a standard Awk idiom. Enhanced GNU Awk separates the frame into a parameter
+area and a locals-only area that is off-limits to the parameter passing
+mechanism. The compiler extends this locals-only area to accommodate all the
+`@let` variables that occur in the function.
+
+Whether inside or outside a function, `@let` statements allocate variables
+in a stack-like fashion. Whenever a `@let` scope terminates, the compiler
+releases the storage locations used for that `@let`, allowing them to be reused
+for a subsequent `@let`. Thus this program allocates exactly two hidden
+global variables:
+
+    ::awk
+    BEGIN {
+      @let (a, b);
+      @let (c, d);
+      @let (e, f);
+    }
+
+This one allocates three:
+
+    ::awk
+    BEGIN {
+      @let (z) {
+        @let (a, b);
+        @let (c, d);
+        @let (e, f);
+      }
+    }
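The stack discipline amounts to a counter plus a high-water mark. In this C sketch (the names `let_enter`, `let_leave` and `struct slot_alloc` are hypothetical, not taken from the `egawk` sources), entering a `@let` claims consecutive slots, leaving it releases them, and the high-water mark is the number of hidden variables needed. The two demo functions model the two programs above.

```c
#include <assert.h>

/* Hypothetical model of the stack-like slot reuse described above. */
struct slot_alloc {
    int next;   /* first free slot */
    int high;   /* high-water mark: number of hidden variables needed */
};

/* Entering a @let with n variables: claim n consecutive slots and
   return the index of the first one. */
static int let_enter(struct slot_alloc *sa, int n)
{
    assert(n >= 0);
    int base = sa->next;
    sa->next += n;
    if (sa->next > sa->high)
        sa->high = sa->next;
    return base;
}

/* Leaving the @let scope: release its slots for reuse. */
static void let_leave(struct slot_alloc *sa, int base)
{
    assert(base <= sa->next);
    sa->next = base;
}

/* BEGIN { @let (a, b); @let (c, d); @let (e, f); } */
static int first_program(void)
{
    struct slot_alloc sa = { 0, 0 };
    for (int i = 0; i < 3; i++) {
        int base = let_enter(&sa, 2);
        let_leave(&sa, base);   /* each scope ends at its ';' */
    }
    return sa.high;
}

/* BEGIN { @let (z) { @let (a, b); @let (c, d); @let (e, f); } } */
static int second_program(void)
{
    struct slot_alloc sa = { 0, 0 };
    int outer = let_enter(&sa, 1);   /* z stays live across the block */
    for (int i = 0; i < 3; i++) {
        int base = let_enter(&sa, 2);
        let_leave(&sa, base);
    }
    let_leave(&sa, outer);
    return sa.high;
}
```

`first_program()` reports 2 and `second_program()` reports 3, matching the counts given above.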
+
+In order to support the dynamic (compile-time) extension of the local frame
+with new local variables, I changed the representation of the function
+parameter frame. Upstream `gawk` represents it as a dynamically allocated
+array of `NODE` objects; I made it a dynamic array of `NODE *` pointers to
+individually allocated `NODE` objects. Upstream's array of `NODE` objects
+cannot simply be reallocated to a larger size, because the `NODE` objects
+would move, and their addresses have already been inserted into generated
+bytecode. I have an idea for solving that, which could restore
+the original representation.
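Why the pointer representation tolerates growth can be shown in a few lines of C. This is a sketch with hypothetical names (`struct frame`, `frame_add_local`); `NODE` here is a one-field stand-in for gawk's much larger structure. A `NODE *` taken from the frame stays valid no matter how often the pointer array is reallocated, because only the array of pointers moves.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for gawk's NODE; the real structure is much larger. */
typedef struct node {
    int value;
} NODE;

/* Hypothetical frame: an array of pointers to individually allocated
   NODEs. Growing the pointer array never moves the NODEs themselves. */
struct frame {
    NODE **slots;
    size_t count;
};

/* Add one local slot. Grows by one for simplicity; a real
   implementation would grow geometrically. */
static NODE *frame_add_local(struct frame *f)
{
    NODE **grown = realloc(f->slots, (f->count + 1) * sizeof *grown);
    assert(grown != NULL);   /* error handling elided in this sketch */
    f->slots = grown;
    NODE *n = calloc(1, sizeof *n);
    assert(n != NULL);
    f->slots[f->count++] = n;
    return n;
}

static void frame_free(struct frame *f)
{
    for (size_t i = 0; i < f->count; i++)
        free(f->slots[i]);
    free(f->slots);
    f->slots = NULL;
    f->count = 0;
}
```

Had `slots` been an array of `NODE` values, each `realloc` could relocate every `NODE`, invalidating addresses already baked into bytecode.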
+
+Enhanced GNU Awk adds one bytecode instruction called `Op_clear_var`.
+This is necessary to reset lexical variables. Recall from the above
+paragraphs that lexical variables whose scopes do not overlap are allocated
+in the same storage. This causes two problems. When a new variable
+is allocated in the space of an old one, the space contains the prior value.
+That "garbage" must be cleared out. (Contrast that with the C language in
+which uninitialized block-scope locals appear to have garbage values,
+taking on whatever bits happen to be in the memory.) There is another
+problem though, which affects even initialized variables. Awk does not
+like it when a variable that holds an array is used as a scalar, or vice versa:
+
+    ::awk
+    x[3] = 42
+    x = "abc"         # error: array used as scalar
+
+    y = 3.14
+    y["foo"] = "bar"  # error: scalar used as array
+
+In standard Awk, and in GNU Awk, there is nothing that a program can do
+to change the variable `x` so that it forgets it was an array, or to
+change `y` so that it forgets it was a scalar and works as an array.
+
+The new `Op_clear_var` opcode used by the `@let` implementation in `egawk`
+solves this problem, thanks to its access to the internal representation of
+a variable.
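What a clearing operation has to accomplish can be sketched in C with a simplified variable cell carrying a type tag. Everything here (`struct var`, `assign_scalar`, `make_array`, `clear_var`) is hypothetical and much simpler than gawk's `NODE`; it is not the actual `Op_clear_var` implementation, only an illustration of why reusing a slot fails without a reset and succeeds after one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified variable cell; gawk's real NODE differs. */
enum var_type { VAR_UNTYPED, VAR_SCALAR, VAR_ARRAY };

struct var {
    enum var_type type;
    double scalar;
    double *elems;   /* toy stand-in for array storage */
};

/* Scalar assignment: fails if the variable is already an array. */
static bool assign_scalar(struct var *v, double x)
{
    if (v->type == VAR_ARRAY)
        return false;   /* "array used as scalar" */
    v->type = VAR_SCALAR;
    v->scalar = x;
    return true;
}

/* Array use: fails if the variable is already a scalar. */
static bool make_array(struct var *v)
{
    if (v->type == VAR_SCALAR)
        return false;   /* "scalar used as array" */
    if (v->type == VAR_UNTYPED) {
        v->elems = calloc(8, sizeof *v->elems);
        assert(v->elems != NULL);
        v->type = VAR_ARRAY;
    }
    return true;
}

/* What a clear operation must achieve: discard the old value and
   make the slot forget whether it was a scalar or an array. */
static void clear_var(struct var *v)
{
    if (v->type == VAR_ARRAY)
        free(v->elems);
    v->elems = NULL;
    v->scalar = 0.0;
    v->type = VAR_UNTYPED;
}
```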
+
+## Credits
+
+The `@let` syntax is inspired by Lisp:
+
+    ::lisp
+    (let* ((x 1)
+           (y 2))
+      ...)
diff --git a/pc/Makefile.tst b/pc/Makefile.tst
index 98ec2853..16d19a6d 100644
--- a/pc/Makefile.tst
+++ b/pc/Makefile.tst
@@ -158,7 +158,8 @@ BASIC_TESTS = \
getline4 getline5 getlnbuf getnr2tb getnr2tm gsubasgn gsubtest \
gsubtst2 gsubtst3 gsubtst4 gsubtst5 gsubtst6 gsubtst7 gsubtst8 \
hex hex2 hsprint inpref inputred intest intprec iobug1 leaddig \
- leadnl litoct longsub longwrds manglprm math membug1 memleak \
+ leadnl litoct let1 let2 let3 let4 let5 let6 \
+ longsub longwrds manglprm math membug1 memleak \
messages minusstr mmap8k nasty nasty2 negexp negrange nested \
nfldstr nfloop nfneg nfset nlfldsep nlinstr nlstrina noeffect \
nofile nofmtch noloop1 noloop2 nonl noparms nors nulinsrc \