What is cppawk?

cppawk is a tiny shell script that is used like awk. It invokes the C preprocessor (GNU cpp) on the Awk code and calls Awk on the result.

cppawk understands the basic Awk options like -F and -v, and also understands common cpp options like -I and -Dmacro=value. The cppawk man page describes all the invocation and usage details.

For instance, if we define a file called awkloop.h which has these contents:

#define awkloop(file)  for (; getline < file > 0 || (close(file) && 0); )
#define nextrec        continue
#define rule(cond)     if (cond)

Then this sort of code is possible:

#include "awkloop.h"

function main()
{
  awkloop ("/proc/mounts") {
    rule ($3 != "ext4") { nextrec }
    rule ($2 == "/") { print $1 }
  }
}

BEGIN {
  main()
}

We have implemented a facsimile of an Awk input-scanning loop inside a function, with a bit of syntactic sugar. However, these few preprocessing directives are just a toy example compared to what the cppawk standard headers provide.
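For comparison, the sketch below shows roughly what the awkloop example above reduces to once the three macros are expanded, wrapped in a small shell function so that it can be tried with plain awk. The function name root_ext4_device and the mounts variable are illustrative assumptions; the original passes the literal "/proc/mounts" to awkloop.

```shell
# Roughly the plain Awk left over after expanding awkloop, nextrec and rule.
# Assumption: the file name arrives in the variable mounts (hypothetical)
# rather than as the literal "/proc/mounts", so any input file can be used.
root_ext4_device() {
  awk -v mounts="$1" '
    function main()
    {
      # awkloop (mounts) becomes this for loop
      for (; getline < mounts > 0 || (close(mounts) && 0); ) {
        if ($3 != "ext4") { continue }   # rule (...) { nextrec }
        if ($2 == "/") { print $1 }      # rule (...) { print $1 }
      }
    }

    BEGIN {
      main()
    }
  '
}
```

On a Linux system, running root_ext4_device /proc/mounts should print the device mounted at / when that file system is ext4, and nothing otherwise.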

cppawk has few dependencies. It's written in shell, and makes use of the sed and printf utilities. Preprocessed programs can be captured and transferred for execution to systems that have Awk but do not have a preprocessor.

The cppawk Library

cppawk is sprouting a small library of useful macros and functions. One of them is a powerful loop facility that allows iteration (both parallel and nested) to be expressed by combining abstract clauses.

Here is a program designed to demonstrate the cppawk loop macro, with its multiple clauses. It solves the following problem: a projectile is fired vertically with an initial speed of 5. Every step of the simulation, the speed drops by 1 due to gravity, eventually becoming negative. What is the maximum height achieved?

#include <iter.h>

BEGIN {
  loop (from_step   (vel, 5, -1),
        from_step   (pos, 0, vel),
        while       (pos >= 0),
        maximizing  (maxpos, pos))
  {
    print pos
  }
  print "maxpos =", maxpos
}

The output is

0
4
7
9
10
10
9
7
4
0
maxpos = 10

This example is taken from the testcases-iter file.
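To make the clause semantics concrete, here is a hand-written plain-Awk equivalent of the program above, wrapped in a shell function. It is inferred from the listed output and is not what the loop macro actually expands to; the function name projectile is hypothetical. Note that the output implies the clauses step in order: pos advances by the already-decremented vel, which is why the second value is 4 rather than 5.

```shell
# Hand-written equivalent of the loop example; not the real macro expansion.
projectile() {
  awk 'BEGIN {
    vel = 5                      # from_step (vel, 5, -1): initial value
    pos = 0                      # from_step (pos, 0, vel): initial value
    while (pos >= 0) {           # while (pos >= 0)
      # maximizing (maxpos, pos): remember the largest pos seen so far
      if (maxpos == "" || pos > maxpos)
        maxpos = pos
      print pos                  # loop body
      vel += -1                  # step vel first ...
      pos += vel                 # ... then step pos by the new vel
    }
    print "maxpos =", maxpos
  }'
}
```

Calling projectile should reproduce the eleven lines of output shown above.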

By the way, how is it possible to implement an iteration language with expressive, not to mention user-definable, clauses using a preprocessor that is famous for lacking power?

It turns out that a significant part of the problem with C preprocessing is the backend languages being targeted. The C and C++ languages rob their preprocessing frontend of its full power. C macros in the context of the "home language" have to contend with syntactic roadblocks, such as identifiers having to be declared before use. Because Awk is a flexibly typed language in which variables don't have to be declared, it gives the C preprocessor significantly more freedom in how it can be applied, and the resulting macros can be much more clutter-free and ergonomic than the same techniques would be on the "home turf". It's as if the C preprocessor were tailor-made for Awk.

Roadmap

cppawk has been carefully developed, and has a regression test suite. Nearly every feature and fix was developed by first writing one or more failing tests and getting them to pass. The script is stable and nearly feature-complete, since it is out of the project scope to modify Awk or the C preprocessor. The remaining work is likely to consist of portability issues, such as supporting different implementations of the C preprocessor.

Among future directions for cppawk is the development of a small library of useful standard headers. The foundation for this is in place: when the angle-bracket form #include <...> is used, cppawk looks in a subdirectory called cppawk-include, located in the same directory as the cppawk script. For instance, if cppawk is /usr/bin/cppawk, it looks in /usr/bin/cppawk-include.

The following headers currently exist:

  • <case.h>: provides a portable case statement macro which efficiently translates to a GNU Awk switch statement or else to less efficient but portable code. Additionally, the case statement requires clauses to be explicit about whether they fall through or break, which makes it safer to use.

  • <narg.h>: provides useful primitives for easily writing variadic macros.

  • <iter.h>: provides powerful iteration constructs, including a loop macro that features the ability for the application to define new iteration clauses, in addition to the numerous useful ones that come with loop.

  • <cons.h>: provides Lisp-like functional, heterogeneous list manipulation, higher order functions, some useful control operators, and functions combining Lisp lists and Awk arrays such as group_by.

  • <fun.h>: three macros for indirect functions, with a simple partial application mechanism for binding the leftmost argument. This requires GNU Awk 4.0 or higher, which features indirect function calls. Note: there are bugs in GNU Awk's indirect function calls feature that are present right through 5.1.1.

  • <varg.h>: utilities for working with variadic functions in Awk, as well as with optional arguments.

  • <field.h>: utilities for manipulating the Awk positional parameters ("fields").

  • <quote.h>: provides the q function for quoting text for safe insertion into shell scripts.
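
As an illustration of the kind of problem <quote.h> addresses, the usual technique for shell quoting is to wrap the text in single quotes and rewrite each embedded single quote as '\''. The following is a minimal plain-Awk sketch of that technique, wrapped in a shell function; it is not the implementation of q, and the name shquote is hypothetical.

```shell
# Sketch of single-quote shell quoting in plain Awk (not the q function).
shquote() {
  awk '
    # Wrap s in single quotes, rewriting each embedded quote as: quote,
    # backslash, quote, quote. \047 is the octal escape for a single quote.
    function shquote(s,    out, i)
    {
      out = ""
      while ((i = index(s, "\047")) > 0) {
        out = out substr(s, 1, i - 1) "\047\\\047\047"
        s = substr(s, i + 1)
      }
      return "\047" out s "\047"
    }

    # The text is taken from ARGV so that backslashes are not reinterpreted,
    # as they would be in a -v assignment.
    BEGIN {
      print shquote(ARGV[1])
    }
  ' "$1"
}
```

A quoted string can be handed back to the shell, for example with eval "set -- $(shquote "$text")", after which $1 should hold the original text unchanged.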

Several unreleased headers are in the development queue:

  • <array.h>: some associative array utilities.

  • <alist.h>: Lisp-like assoc lists: addendum to <cons.h>.

  • Certain utilities in the private header <base.h> should be made public.

License

cppawk is offered under the two-clause BSD license. See the copyright header in the source files and the LICENSE file in the source tree.

Why?

  • Why not?

  • You know Awk. You know C preprocessing inside out. Now use two things that you know, together, in obvious ways.

  • You can organize an Awk program into a tree of files that the preprocessor "compiles" into a single "executable".

  • You can use macros for C-style metaprogramming, and for conditional selection of code.

  • Powerful library: list manipulation, iteration, variadic functions.

  • Other minor benefits: Awk has no comments other than from a # character to the end of the line. You get /* ... */ comments with cppawk, and also #if 0 ... #endif for temporarily disabling code.

  • Some techniques from the cppawk header files would be useful in C and C++. Everything is BSD-licensed; you are welcome to use it as you please, whole or just bits and pieces.

But GNU Awk has @include?

  • GNU Awk's @include isn't a full preprocessor. There are no conditional expressions, and no macros.

  • It is only implemented in GNU Awk.

  • It provides no way to capture all the included output.

  • The way @include searches for files is inferior to cpp; it doesn't look in the same directory as the parent file which contains the @include syntax. It reacts to an AWKPATH environment variable which has no provision for referencing relative to the location of the parent file.

  • @include requires, syntactically, a string-literal–like specification of the path name to be included. An expression is not allowed. For instance, a GNU Awk program cannot do this:

    self = calculate_own_path_somehow();
    @include self "lib/util"  # error
    

    By contrast, a cppawk program just does this:

    #include "lib/util"  // no problem
    

    The C preprocessor allows macro-replacement to take place in #include:

    #include FOO_LIB   // conditionally-defined macro to select lib