## What is `egawk`?

`egawk` is Enhanced GNU Awk. It is a fork of GNU Awk with some
enhancements designed and implemented by Kaz Kylheku.

**NOTE:** If you have problems with `egawk` or questions about it, please do
not contact the GNU Awk maintainers or the `bug-gawk` mailing list at
`gnu.org`. Only contact those people if you have a bug that you can
reproduce with the mainline GNU Awk. Always remember to try to produce
a minimal sample of code, and any required input, to reproduce the problem.

## The `@let` statement

The `@let` statement in Enhanced GNU Awk provides block-scoped lexical
variables. The syntax looks like this:

    ::awk
    @let (x = 1, y = 3, z = f(x, y))
      print z

The token sequence `@` and `let` introduces the statement. This is
followed by a list of variable bindings in parentheses. That list is
then followed by a statement.

The statement is executed in a scope in which the variables are visible.

As the above example shows, the bindings are established sequentially, which is
why `z` can be initialized using an expression which depends on `x` and `y`.

A `@let` variable need not have an initializer:

    ::awk
    @let (a, b)
      print a == 0 && b == ""   # prints 1

Variables without an initializer are reliably initialized to the Awk
null value: the same value that is exhibited by ordinary Awk variables
that have not been assigned. This value compares equal to both 0 and
the empty string `""` under the `==` operator.

The scope of a `@let` variable begins immediately after its binding,
including initializing expression, if any. The following is possible:

    ::awk
    function f(x)
    {
       @let (x = x + 1)
         return x
    }

Here `x` is initialized with an expression that uses `x`. That expression
still refers to the previously visible `x`; the scope of the newly
introduced `x` begins after that initializing expression.
The new `x` shadows the previous `x`.
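
The same rule applies when one `@let` is nested inside another. The
following is a minimal sketch based only on the scoping rules described
above; it needs `egawk`, not mainline GNU Awk. By those rules, it should
print 2 and then 1:

    ::awk
    BEGIN {
      @let (x = 1) {
        @let (x = x + 1)   # initializer still sees the outer x
          print x          # inner x, which shadows the outer one
        print x            # inner scope has ended: outer x again
      }
    }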

## Restrictions

`@let` variables may not have the same names as Awk's special variables, such
as `NF` and `FS`.

Inside a function, a `@let` variable must not have the same name as the
function.

Lastly, variables may not use namespace prefixes: `foo::bar` cannot be used
as a `@let` variable name.

These restrictions are not new; mainline GNU Awk's function parameters
have the same restrictions.

`@let` may appear inside functions, as well as outside of functions in
the action bodies of pattern rules, and in the `BEGIN` and `END` blocks:

    ::awk
    BEGIN { @let (x = 3) ... }
    /^id=/ { @let (id = ...) ... }


## Rationale

Why not JavaScript-like syntax?

    ::js
    {
       let x = 3
       ...
    }

The reason is that this syntax is not friendly toward macros. The motivation
for `egawk` comes from the [`cppawk`](https://www.kylheku.com/cgit/cppawk)
project. With `@let`, this sort of thing is possible:

    ::c
    #define repeat(n) @let (__c, __n = (n)) for (__c = 0; __c < __n; __c++)

Here, the expansion of `repeat(42)` produces the structure
`@let (...) for (...)` which just requires the addition of a statement
to produce a complete construct:

    ::c
    repeat(42) { print "hello" }

The JavaScript-style syntax doesn't make this possible. We would have
to rely on the feature of declaring variables inside the `for`:

    ::js
    for (let __c = 0, __n = (n); __c < __n; __c++)

This is not attractive because it requires us to inject the `let`
syntax into the phrase structure of every statement type: `if`,
`while`, `switch`. The selected design, by contrast, blends easily with
any statement, like a prefix:

    ::awk
    @let (x = 3)
      return x

    @let (x = c / 2) switch (x) {

    }

The `@` prefix in `@let` follows a convention established by GNU Awk.
GNU Awk has extensions like `@include` for including files, and `@fun(arg)`
for indirect function calls.

## Compatibility

If you have GNU Awk code that uses `let` as the name of an
indirect function, `egawk` interprets that as the start of a `@let` statement.
It's possible that no syntax error will take place, only different
behavior. This GNU Awk program produces the output `42`, because
`@let()` means "call the function whose name is stored in the `let`
variable":

    ::awk
    function f() { print 42 }
    BEGIN { let = "f"; @let(); }

When executed with `egawk`, it produces no output, because `@let();`
looks like an empty let statement. The superfluous semicolon satisfies
its need for a statement and so everything parses.
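
One way to avoid the clash is to rename the variable that holds the function
name. The sketch below uses the hypothetical name `fn`. It assumes that
`egawk` treats only `@let` specially and leaves other indirect calls alone,
as the description above implies. In mainline GNU Awk this is an ordinary
indirect call:

    ::awk
    function f() { print 42 }

    # fn is an arbitrary replacement for the variable formerly named let
    BEGIN { fn = "f"; @fn() }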

## Implementation Notes

The implementation of `@let` is different inside functions versus outside.
`@let` statements outside of a function are compiled to code which
uses hidden, global variables. These variables have numbered names similar to
`$let0001`. When the GNU Awk `-d-` option is used to dump the symbol table,
these names show up in it.

Inside a function `@let` is compiled to code which assumes that the variables
are allocated in the function's local frame. Unmodified, upstream GNU Awk
has a parameter frame which is entirely dedicated to parameter passing.
Local variables are simulated by defining additional parameters, which
is a standard Awk idiom. Enhanced GNU Awk separates the frame into a parameter
area and a locals-only area that is off-limits to the parameter passing
mechanism. The compiler extends this local-only area to accommodate all the
`@let` variables that occur in the function.
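
For comparison, here is a small sketch of the extra-parameter idiom next to
a `@let` equivalent. The function names `sum_to` and `sum_to2` are made up
for illustration. The first function is standard Awk; the second requires
`egawk` and relies on the rule that uninitialized `@let` variables start out
with the null value:

    ::awk
    # Standard Awk idiom: i and total are extra parameters used as locals.
    # The wide gap in the parameter list is only a convention.
    function sum_to(n,    i, total)
    {
        for (i = 1; i <= n; i++)
            total += i
        return total
    }

    # egawk only: the locals are declared where they are used.
    function sum_to2(n)
    {
        @let (i, total) {
            for (i = 1; i <= n; i++)
                total += i
            return total
        }
    }

    BEGIN { print sum_to(4) }   # 1 + 2 + 3 + 4 = 10

With the idiom, a caller can accidentally pass a value for `i` or `total`.
The `@let` version has no such extra parameters in its signature.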

Whether inside or outside a function, `@let` statements allocate variables
in a stack-like fashion. Whenever a `@let` scope terminates, the compiler
releases the storage locations used for that let, allowing them to be re-used
for a subsequent `@let`. Thus this program allocates exactly two hidden
global variables:

    ::awk
    BEGIN {
      @let (a, b);
      @let (c, d);
      @let (e, f);
    }

This one allocates three:

    ::awk
    BEGIN {
      @let (z) {
        @let (a, b);
        @let (c, d);
        @let (e, f);
      }
    }

In order to support the dynamic (compile-time) extension of the local frame
with new local variables, I changed the representation of the function
parameter frame. Upstream `gawk` has it as a dynamic array of `NODE` objects; I
made it a dynamic array of `NODE *` pointers to individually allocated `NODE`
objects. Gawk's fixed array cannot be reallocated to fit the exact size,
because the `NODE` addresses would change after they have been inserted into
generated bytecode. I have an idea for solving that, which could restore
the original representation.

Enhanced GNU Awk adds one bytecode instruction called `Op_clear_var`.
This is necessary to reset lexical variables. Recall from the above
paragraphs that lexical variables whose scopes do not overlap are allocated
in the same storage. This causes several problems. When a new variable
is allocated in the space of an old one, the space contains the prior value.
That "garbage" must be cleared out. (Contrast that with the C language in
which uninitialized block-scope locals appear to have garbage values,
taking on whatever bits happen to be in the memory.) There is another
problem though, which affects even initialized variables. Awk does not
like it when a variable that holds an array is used as a scalar, or vice versa:

    ::awk
    x[3] = 42
    x = "abc"          # error: array used as scalar

    y = 3.14
    y["foo"] = "bar"   # error: scalar used as array

In standard Awk, and in GNU Awk, there is nothing that a program can do
to change the variable `x` such that it forgets it was an array, or to
change `y` to forget that it was a scalar and work as an array.

The new `Op_clear_var` opcode used by the `@let` implementation in `egawk`
solves this problem, thanks to its access to the internal representation of
a variable.
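
The situation this addresses can be sketched as follows. The example is an
illustration that assumes the storage reuse described above: `a` and `b` are
sibling scopes, so `b` may land in the storage previously used by `a`. Going
by the description of `Op_clear_var`, `b` should nevertheless be usable as a
scalar:

    ::awk
    BEGIN {
      @let (a) {
        a["k"] = "v"     # a is used as an array in this scope
        print length(a)
      }
      @let (b) {
        b = 3.14         # b may reuse a's storage, yet is used as a scalar
        print b
      }
    }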

## Credits

The `@let` syntax is inspired by Lisp:

    ::lisp
    (let* ((x 1)
           (y 2))
      ...)