What is cppawk?
cppawk is a tiny shell script that is used like awk. It invokes
the C preprocessor (GNU cpp) on the Awk code and calls Awk on the result.

cppawk understands the basic Awk options like -F and -v, and also
understands common cpp options like -I and -Dmacro=value.

The cppawk man page describes all the invocation and usage details.
For instance, if we define a file called awkloop.h which has these contents:
#define awkloop(file) for (; getline < file > 0 || (close(file) && 0); )
#define nextrec continue
#define rule(cond) if (cond)
Then this sort of code is possible:
#include "awkloop.h"

function main()
{
  awkloop ("/proc/mounts") {
    rule ($3 != "ext4") { nextrec }
    rule ($2 == "/") { print $1 }
  }
}

BEGIN {
  main()
}
We have implemented a facsimile of an Awk input-scanning loop inside a function
with a bit of syntactic sugar. However, these few preprocessing directives are
just a toy example compared to what is provided in the cppawk standard headers.
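To see what Awk actually receives, the three macros can be expanded by hand. The sketch below is that expansion (modulo whitespace), wrapped in a shell function so it runs with any plain awk. The wrapper name scan_mounts and the variable table are hypothetical; they stand in for the literal "/proc/mounts" argument so the same code can be pointed at any file in the same format.

```shell
# Hand-expanded form of the awkloop example. scan_mounts and table are
# illustrative names, not part of cppawk.
scan_mounts() {
    awk -v table="$1" '
    function main()
    {
        # awkloop (table) expands to this for loop header
        for (; getline < table > 0 || (close(table) && 0); ) {
            if ($3 != "ext4") { continue }   # rule (...) { nextrec }
            if ($2 == "/") { print $1 }      # rule (...) { print $1 }
        }
    }
    BEGIN {
        main()
    }'
}
```

Calling scan_mounts /proc/mounts corresponds to the original program: it prints the device mounted on / if that file system is ext4. The loop ends when getline stops returning a positive value; the close(file) && 0 part then closes the file and yields a false condition.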
cppawk has few dependencies. It's written in shell, and makes use of the
sed and printf utilities. Preprocessed programs can be captured and
transferred for execution to systems that have Awk but do not have a
preprocessor.
The cppawk Library
cppawk is sprouting a small library of useful macros and functions.
One of them is a powerful loop facility that allows iteration (both
parallel and nested) to be expressed by combining abstract clauses.
Here is a program designed to demonstrate the cppawk loop macro, with its
multiple clauses. It solves the following problem: a projectile is fired
vertically with an initial speed of 5. Every step of the simulation, the speed
drops by 1 due to gravity, eventually becoming negative. What is the maximum
height achieved?
#include <iter.h>

BEGIN {
  loop (from_step (vel, 5, -1),
        from_step (pos, 0, vel),
        while (pos >= 0),
        maximizing (maxpos, pos))
  {
    print pos
  }
  print "maxpos =", maxpos
}
The output is
0
4
7
9
10
10
9
7
4
0
maxpos = 10
This example is taken from the testcases-iter file.
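To make the clause semantics concrete, the same simulation can be written out by hand in plain Awk. This is only a sketch that reproduces the output above; it is not the code that the loop macro expands to. The order of the two updates (vel first, then pos using the new vel) is inferred from that output, where the first increment of pos is 4 rather than 5. The wrapper name simulate is hypothetical.

```shell
# Plain-Awk equivalent of the loop example, written by hand.
# simulate is an illustrative name, not part of cppawk.
simulate() {
    awk '
    BEGIN {
        vel = 5; pos = 0                 # initial values of the two from_step clauses
        while (pos >= 0) {               # while (pos >= 0)
            if (maxpos == "" || pos > maxpos)
                maxpos = pos             # maximizing (maxpos, pos)
            print pos                    # loop body
            vel += -1                    # step vel first
            pos += vel                   # then step pos by the updated vel
        }
        print "maxpos =", maxpos
    }'
}
```

Running simulate prints the same eleven lines as the cppawk program. The comparison shows what the loop macro saves: the initialization, the stepping, the termination test and the maximum tracking each become one declarative clause instead of being scattered around a while loop.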
By the way, how is it possible that we can implement an iteration language with expressive, not to mention user-definable, clauses using a preprocessor that is famous for lacking power?
It turns out that a significant part of the problem with C preprocessing is the backend languages being targeted. The C and C++ languages rob their preprocessing frontend of its full power. C macros in the context of the "home language" have to contend with syntactic roadblocks, such as identifiers having to be declared before use. Because Awk is a flexibly typed language in which variables don't have to be declared, it allows significantly more freedom in how the C preprocessor can be applied, and the resulting macros can be much more clutter-free and ergonomic than similar techniques attempted on the "home turf". It's as if the C preprocessor were tailor-made for Awk.
Roadmap
cppawk has been carefully developed, and has a regression test suite.
Nearly every feature and fix was developed by first writing one or more
failing tests and getting them to pass. The script is stable and nearly
feature-complete, since it is out of the project scope to modify Awk
or the C preprocessor. The remaining work is likely to be solving portability
issues, such as supporting different implementations of the C preprocessor.
Among future directions for cppawk is the development of a small
library of useful standard headers. The foundation has been laid for
this: when #include <...> is used (angle bracket include), cppawk looks in
a subdirectory called cppawk-include, which is in the same directory as
cppawk itself. For instance, if cppawk is /usr/bin/cppawk, it looks in
/usr/bin/cppawk-include.
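The lookup rule described above can be sketched in a few lines of shell. This is an illustration of the rule only, not code taken from the cppawk script; the function name cppawk_include_dir is hypothetical.

```shell
# Given the path of the cppawk script, print the directory that an
# angle-bracket include would be searched in, per the rule above.
# cppawk_include_dir is an illustrative name, not part of cppawk.
cppawk_include_dir() {
    script_dir=$(dirname "$1")
    printf '%s\n' "$script_dir/cppawk-include"
}
```

For example, cppawk_include_dir /usr/bin/cppawk prints /usr/bin/cppawk-include. Deriving the directory from the script's own location means an installation can be moved as a unit without reconfiguring any search path.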
The following headers currently exist:
- <case.h>: provides a portable case statement macro which efficiently translates to a GNU Awk switch statement or else to less efficient but portable code. Additionally, the case statement requires clauses to be explicit about whether they fall through or break, which makes it safer to use.
- <narg.h>: provides useful primitives for easily writing variadic macros.
- <iter.h>: provides powerful iteration constructs, including a loop macro that features the ability for the application to define new iteration clauses, in addition to the numerous useful ones that come with loop.
- <cons.h>: provides Lisp-like functional, heterogeneous list manipulation, higher order functions, some useful control operators, and functions combining Lisp lists and Awk arrays such as group_by.
- <fun.h>: three macros for indirect functions, with a simple partial application mechanism for binding the leftmost argument. This requires GNU Awk 4.0 or higher, which features indirect function calls. Note: there are bugs in GNU Awk's indirect function calls feature that are present right through 5.1.1.
- <varg.h>: utilities for working with variadic functions in Awk, as well as with optional arguments.
- <field.h>: utilities for manipulating the Awk positional parameters ("fields").
- <quote.h>: provides the q function for quoting text for safe insertion into shell scripts.
Several unreleased headers are in the development queue:
- <array.h>: some associative array utilities.
- <alist.h>: Lisp-like assoc lists: addendum to <cons.h>.
- Certain utilities in the private header <base.h> should be made public.
License
cppawk is offered under the two-clause BSD license. See the copyright
header in the source files and the LICENSE file in the source tree.
Why?
- Why not?
- You know Awk. You know C preprocessing inside out. Now use two things that you know, together, in obvious ways.
- You can organize an Awk program into a tree of files that the preprocessor "compiles" into a single "executable".
- You can use macros for C-style metaprogramming, and for conditional selection of code.
- Powerful library: list manipulation, iteration, variadic functions.
- Other minor benefits: Awk has no comments other than from a # character to the end of the line. You get /* ... */ comments with cppawk, and also #if 0 ... #endif for temporarily disabling code.
- Some techniques from the cppawk header files would be useful in C and C++. Everything is BSD-licensed; you are welcome to use it as you please, whole or just bits and pieces.
But GNU Awk has @include?
- GNU Awk's @include isn't a full preprocessor. There are no conditional expressions, and no macros.
- It is only implemented in GNU Awk.
- It provides no way to capture all the included output.
- The way @include searches for files is inferior to cpp; it doesn't look in the same directory as the parent file which contains the @include syntax. It reacts to an AWKPATH environment variable which has no provision for referencing relative to the location of the parent file.
- @include requires, syntactically, a string-literal–like specification of the path name to be included. An expression is not allowed. For instance, a GNU Awk program cannot do this:

    self = calculate_own_path_somehow();
    @include self "lib/util" # error

  By contrast, a cppawk program just does this:

    #include "lib/util" // no problem

  The C preprocessor allows macro-replacement to take place in #include:

    #include FOO_LIB // conditionally-defined macro to select lib