## What is `cppawk`?

`cppawk` is a tiny shell script that is used like `awk`. It invokes
the C preprocessor (GNU `cpp`) on the Awk code and calls Awk on the result.

`cppawk` understands the basic Awk options like `-F` and `-v`, and also
understands common `cpp` options like `-I` and `-Dmacro=value`.
The [`cppawk` man page](../tree/cppawk.1) describes all the invocation and
usage details.

For instance, if we define a file called `awkloop.h` which has these contents:

    ::c
    #define awkloop(file)  for (; getline < file > 0 || (close(file) && 0); )
    #define nextrec        continue
    #define rule(cond)     if (cond)

Then this sort of code is possible:

    ::c
    #include "awkloop.h"

    function main()
    {
      awkloop ("/proc/mounts") {
        rule ($3 != "ext4") { nextrec }
        rule ($2 == "/") { print $1 }
      }
    }

    BEGIN {
      main()
    }

We have implemented a facsimile of an Awk input-scanning loop inside a function
with a bit of syntactic sugar. However, these few preprocessing directives are
just a toy example compared to what is provided in the `cppawk` standard
headers.

`cppawk` has few dependencies. It's written in shell, and makes use of the
`sed` and `printf` utilities.  Preprocessed programs can be captured and
transferred for execution to systems that have Awk but do not have a
preprocessor.

## The `cppawk` Library

`cppawk` is sprouting a small library of useful macros and functions.
One of them is a powerful `loop` facility that allows iteration (both
parallel and nested) to be expressed by combining abstract clauses.

Here is a program designed to demonstrate the `cppawk` `loop`
macro, with its multiple clauses.  It solves the following problem: a
projectile is fired vertically with an initial speed of 5. Every step of the
simulation, the speed drops by 1 due to gravity, eventually becoming negative.
What is the maximum height achieved?

    ::c
    #include <iter.h>

    BEGIN {
      loop (from_step   (vel, 5, -1),
            from_step   (pos, 0, vel),
            while       (pos >= 0),
            maximizing  (maxpos, pos))
      {
        print pos
      }
      print "maxpos =", maxpos
    }

The output is

    ::txt
    0
    4
    7
    9
    10
    10
    9
    7
    4
    0
    maxpos = 10

This example is taken from the
[`testcases-iter`](../tree/testcases-iter) file.
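
To make the clause semantics concrete, here is a hand-written plain Awk loop
that reproduces the same output. It is not the expansion that `loop` actually
generates. The stepping order (first `vel`, then `pos` using the updated
`vel`) is inferred from the output shown above, and the function name
`projectile_plain_awk` is hypothetical.

```shell
# Hand-written equivalent of the loop example, run through the shell.
projectile_plain_awk() {
  awk '
  BEGIN {
    vel = 5; pos = 0       # initial values of the two from_step clauses
    maxpos = pos           # running maximum kept by the maximizing clause
    while (pos >= 0) {     # the while clause terminates the loop
      if (pos > maxpos)
        maxpos = pos
      print pos            # loop body
      vel += -1            # step vel first ...
      pos += vel           # ... then step pos by the updated vel
    }
    print "maxpos =", maxpos
  }'
}

projectile_plain_awk
```

Compared with this version, the `loop` form keeps each variable's
initialization, stepping and termination together in a single clause.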

By the way, how is it possible that we can implement an iteration language with
expressive, not to mention **user-definable**, clauses using a preprocessor that
is famous for lacking power?

It turns out that a significant part of the problem with C preprocessing is the
backend languages being targeted. The C and C++ languages rob their
preprocessing frontend of its full power.  C macros in the context of the
"home language" have to contend with syntactic roadblocks, such as identifiers
having to be declared before use.  Because Awk is a flexibly typed language in
which variables don't have to be declared, it creates opportunities for
significantly more freedom in how the C preprocessor can be applied, and the
resulting macros can be much more clutter-free and ergonomic than similar
techniques attempted on the "home turf". It's as if the C preprocessor were
tailor-made for Awk.

## Roadmap

`cppawk` has been carefully developed, and has a regression test suite.
Nearly every feature and fix was developed by first writing one or more
failing tests and getting them to pass. The script is stable and nearly
feature-complete, since it is out of the project scope to modify Awk
or the C preprocessor. The remaining work is likely to be solving portability
issues, such as supporting different implementations of the C preprocessor.

Among future directions for `cppawk` is the development of a small
library of useful standard headers. The foundation has been laid for
this: when an angle-bracket `#include <...>` is used, `cppawk` looks in
a subdirectory called `cppawk-include`, located in the same directory as
the `cppawk` script itself. For instance, if `cppawk` is `/usr/bin/cppawk`,
it looks in `/usr/bin/cppawk-include`.
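
The directory rule can be expressed as a one-line shell function. This is a
sketch of the rule as stated, not code taken from the script; the function
name `cppawk_include_dir` is hypothetical, and how `cppawk` passes the
directory on to `cpp` is not shown.

```shell
# Hypothetical helper: derive the angle-bracket include directory from the
# path of the cppawk script, following the rule described above.
cppawk_include_dir() {
  printf '%s/cppawk-include\n' "$(dirname "$1")"
}

cppawk_include_dir /usr/bin/cppawk   # prints /usr/bin/cppawk-include
```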

The following headers are currently available:

*   [`<case.h>`](../tree/cppawk-case.1): provides a portable
    `case` statement macro which
    efficiently translates to a GNU Awk `switch` statement or else to less
    efficient but portable code. Additionally, the `case` statement requires
    clauses to be explicit about whether they fall through or break, which
    makes it safer to use.

*   [`<narg.h>`](../tree/cppawk-narg.1): provides useful primitives for easily
    writing variadic macros.

*   [`<iter.h>`](../tree/cppawk-iter.1): provides powerful iteration
    constructs, including a `loop`
    macro that features the ability for the application to define
    new iteration clauses, in addition to the numerous useful ones that
    come with `loop`.

*   [`<cons.h>`](../tree/cppawk-cons.1): provides Lisp-like functional, heterogeneous list manipulation,
    higher order functions, some useful control operators, and functions
    combining Lisp lists and Awk arrays such as `group_by`.

*   [`<fun.h>`](../tree/cppawk-fun.1): three macros for indirect functions,
    with a simple partial application mechanism for binding the leftmost
    argument.  This requires GNU Awk 4.0 or higher, which features indirect
    function calls. Note: there are bugs in GNU Awk's indirect function
    calls feature that are present right through 5.1.1.

*   [`<varg.h>`](../tree/cppawk-varg.1): utilities for working with
    variadic functions in Awk, as well as with optional arguments.

*   [`<field.h>`](../tree/cppawk-field.1): utilities for manipulating
    the Awk positional parameters ("fields").

*   [`<quote.h>`](../tree/cppawk-quote.1): provides the `q` function for
    quoting text for safe insertion into shell scripts.

Several unreleased headers are in the development queue:

*   `<array.h>`: some associative array utilities.

*   `<alist.h>`: Lisp-like assoc lists: addendum to `<cons.h>`.

*   Certain utilities in the private header `<base.h>` should be made public.

## License

`cppawk` is offered under the two-clause BSD license. See the copyright
header in the source files and the LICENSE file in the source tree.

## Why?

*   Why not?

*   You know Awk. You know C preprocessing inside out. Now use two things
    that you know, together, in obvious ways.

*   You can organize an Awk program into a tree of files that
    the preprocessor "compiles" into a single "executable".

*   You can use macros for C-style metaprogramming, and for conditional
    selection of code.

*   Powerful library: list manipulation, iteration, variadic functions.

*   Other minor benefits: Awk has no comments other than from a `#`
    character to the end of the line. You get `/* ... */` comments
    with `cppawk`, and also `#if 0` ... `#endif` for temporarily
    disabling code.

*   Some techniques from the `cppawk` header files would be useful in C and
    C++. Everything is BSD-licensed; you are welcome to use it as you please,
    whole or just bits and pieces.

## But GNU Awk has `@include`?

*   GNU Awk's `@include` isn't a full preprocessor. There are no conditional
    expressions, and no macros.

*   It is only implemented in GNU Awk.

*   It provides no way to capture all the included output.

*   The way `@include` searches for files is inferior to `cpp`;
    it doesn't look in the same directory as the parent file which contains the
    `@include` syntax. It reacts to an `AWKPATH` environment variable which has
    no provision for referencing relative to the location of the parent file.

*   `@include` requires, syntactically, a string-literal–like specification
    of the path name to be included. An expression is not allowed. For
    instance, a GNU Awk program cannot do this:

        ::awk
        self = calculate_own_path_somehow();
        @include self "lib/util"  # error

    By contrast, a `cppawk` program just does this:

        ::c
        #include "lib/util"  // no problem

    The C preprocessor allows macro-replacement to take place in `#include`:

        ::c
        #include FOO_LIB   // conditionally-defined macro to select lib