Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 1599 |
1 files changed, 717 insertions, 882 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index 40b85aae..3c5fa0ba 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -295,12 +295,14 @@ particular records in a file and perform operations upon them. * Sample Programs:: Many @command{awk} programs with complete explanations. * Debugger:: The @code{gawk} debugger. +* Dynamic Extensions:: Adding new built-in functions to + @command{gawk}. * Language History:: The evolution of the @command{awk} language. * Installation:: Installing @command{gawk} under various operating systems. -* Notes:: Notes about @command{gawk} extensions and - possible future work. +* Notes:: Notes about adding things to @command{gawk} + and possible future work. * Basic Concepts:: A very quick introduction to programming concepts. * Glossary:: An explanation of some unfamiliar terms. @@ -565,21 +567,21 @@ particular records in a file and perform operations upon them. Numbers. * POSIX Floating Point Problems:: Standards Versus Existing Practice. * Integer Programming:: Effective integer programming. -* Floating-point Programming:: Effective floating-point programming. +* Floating-point Programming:: Effective Floating-point Programming. * Floating-point Representation:: Binary floating-point representation. * Floating-point Context:: Floating-point context. * Rounding Mode:: Floating-point rounding mode. * Gawk and MPFR:: How @command{gawk} provides aribitrary-precision arithmetic. -* Arbitrary Precision Floats:: Arbitrary precision floating-point - arithmetic with @command{gawk}. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with @command{gawk}. * Setting Precision:: Setting the working precision. * Setting Rounding Mode:: Setting the rounding mode. * Floating-point Constants:: Representing floating-point constants. * Changing Precision:: Changing the precision of a number. * Exact Arithmetic:: Exact arithmetic with floating-point numbers. 
-* Arbitrary Precision Integers:: Arbitrary precision integer arithmetic with +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with @command{gawk}. * Nondecimal Data:: Allowing nondecimal input data. * Array Sorting:: Facilities for controlling array traversal @@ -605,7 +607,7 @@ particular records in a file and perform operations upon them. * Ordinal Functions:: Functions for using characters as numbers and vice versa. * Join Function:: A function to join an array into a string. -* Gettimeofday Function:: A function to get formatted times. +* Getlocaltime Function:: A function to get formatted times. * Data File Management:: Functions for managing command-line data files. * Filetrans Function:: A function for handling data file @@ -662,6 +664,11 @@ particular records in a file and perform operations upon them. * Miscellaneous Debugger Commands:: Miscellaneous Commands. * Readline Support:: Readline support. * Limitations:: Limitations and future plans. +* Plugin License:: A note about licensing. +* Sample Library:: A example of new functions. +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. * V7/SVR3.1:: The major changes between V7 and System V Release 3.1. * SVR4:: Minor changes between System V Releases 3.1 @@ -712,16 +719,8 @@ particular records in a file and perform operations upon them. @command{gawk}. * New Ports:: Porting @command{gawk} to a new operating system. -* Dynamic Extensions:: Adding new built-in functions to - @command{gawk}. -* Internals:: A brief look at some @command{gawk} - internals. -* Plugin License:: A note about licensing. -* Loading Extensions:: How to load dynamic extensions. -* Sample Library:: A example of new functions. -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. 
-* Using Internal File Ops:: How to use an external extension. +* Derived Files:: Why derived files are kept in the + @command{git} repository. * Future Extensions:: New features that may be implemented one day. * Basic High Level:: The high level view. @@ -1209,8 +1208,7 @@ available @command{awk} implementations. @ref{Notes}, describes how to disable @command{gawk}'s extensions, as well as how to contribute new code to @command{gawk}, -how to write extension libraries, and some possible -future directions for @command{gawk} development. +and some possible future directions for @command{gawk} development. @ref{Basic Concepts}, provides some very cursory background material for those who @@ -3025,6 +3023,22 @@ This option may be given multiple times; the @command{awk} program consists of the concatenation the contents of each specified @var{source-file}. +@item -i @var{source-file} +@itemx --include @var{source-file} +@cindex @code{-i} option +@cindex @code{--include} option +@cindex @command{awk} programs, location of +Read @command{awk} source library from @var{source-file}. This option is +completely equivalent to using the @samp{@@include} directive inside +your program. This option is very +similar to the @option{-f} option, but there are two important differences. +First, when @option{-i} is used, the program source will not be loaded if it has +been previously loaded, whereas the @option{-f} will always load the file. +Second, because this option is intended to be used with code libraries, the +@command{awk} command does not recognize such files as constituting main program +input. Thus, after processing an @option{-i} argument, we still expect to +find the main source code via the @option{-f} option or on the command-line. + @item -v @var{var}=@var{val} @itemx --assign @var{var}=@var{val} @cindex @code{-v} option @@ -3222,7 +3236,7 @@ that @command{gawk} accepts and then exit. Load a shared library @var{lib}. 
This searches for the library using the @env{AWKLIBPATH} environment variable. The correct library suffix for your platform will be supplied by default, so it need not be specified in the library name. -The library initialization routine should be named @code{dlload()}. +The library initialization routine should be named @code{dl_load()}. An alternative is to use the @samp{@@load} keyword inside the program to load a shared library. @@ -3622,7 +3636,8 @@ on the command-line with the @option{-f} option. In most @command{awk} implementations, you must supply a precise path name for each program file, unless the file is in the current directory. -But in @command{gawk}, if the @value{FN} supplied to the @option{-f} option +But in @command{gawk}, if the @value{FN} supplied to the @option{-f} +or @option{-i} options does not contain a @samp{/}, then @command{gawk} searches a list of directories (called the @dfn{search path}), one by one, looking for a file with the specified name. @@ -3644,13 +3659,16 @@ standard directory in the default path and then specified on the command line with a short @value{FN}. Otherwise, the full @value{FN} would have to be typed for each file. -By using both the @option{--source} and @option{-f} options, your command-line +By using the @option{-i} option, or the @option{--source} and @option{-f} options, your command-line @command{awk} programs can use facilities in @command{awk} library files (@pxref{Library Functions}). Path searching is not done if @command{gawk} is in compatibility mode. This is true for both @option{--traditional} and @option{--posix}. @xref{Options}. +If the source code is not found after the initial search, the path is searched +again after adding the default @samp{.awk} suffix to the filename. + @quotation NOTE To include the current directory in the path, either place @@ -3797,7 +3815,8 @@ code from various @command{awk} scripts. 
In other words, you can group together @command{awk} functions, used to carry out specific tasks, into external files. These files can be used just like function libraries, using the @samp{@@include} keyword in conjunction with the @env{AWKPATH} -environment variable. +environment variable. Note that source files may also be included +using the @option{-i} option. Let's see an example. We'll start with two (trivial) @command{awk} scripts, namely @@ -7304,6 +7323,7 @@ can cause @code{FILENAME} to be updated if they cause summarizes the eight variants of @code{getline}, listing which built-in variables are set by each one, and whether the variant is standard or a @command{gawk} extension. +Note: for each variant, @command{gawk} sets the @code{RT} built-in variable. @float Table,table-getline-variants @caption{getline Variants and What They Set} @@ -11545,9 +11565,9 @@ fatal error. @item If you have written extensions that modify the record handling (by inserting -an ``open hook''), you can invoke them at this point, before @command{gawk} +an ``input parser''), you can invoke them at this point, before @command{gawk} has started processing the file. (This is a @emph{very} advanced feature, -currently used only by the @uref{http://xmlgawk.sourceforge.net, XMLgawk project}.) +currently used only by the @uref{http://gawkextlib.sourceforge.net, @code{gawkextlib} project}.) @end itemize The @code{ENDFILE} rule is called when @command{gawk} has finished processing @@ -16454,8 +16474,8 @@ bitwise operations just described. They are: @cindex @command{gawk}, bitwise operations in @table @code @cindex @code{and()} function (@command{gawk}) -@item and(@var{v1}, @var{v2}) -Return the bitwise AND of the values provided by @var{v1} and @var{v2}. +@item and(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) +Return the bitwise AND of the arguments. There must be at least two. 
@cindex @code{compl()} function (@command{gawk}) @item compl(@var{val}) @@ -16466,16 +16486,16 @@ Return the bitwise complement of @var{val}. Return the value of @var{val}, shifted left by @var{count} bits. @cindex @code{or()} function (@command{gawk}) -@item or(@var{v1}, @var{v2}) -Return the bitwise OR of the values provided by @var{v1} and @var{v2}. +@item or(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) +Return the bitwise OR of the arguments. There must be at least two. @cindex @code{rshift()} function (@command{gawk}) @item rshift(@var{val}, @var{count}) Return the value of @var{val}, shifted right by @var{count} bits. @cindex @code{xor()} function (@command{gawk}) -@item xor(@var{v1}, @var{v2}) -Return the bitwise XOR of the values provided by @var{v1} and @var{v2}. +@item xor(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) +Return the bitwise XOR of the arguments. There must be at least two. @end table For all of these functions, first the double precision floating-point value is @@ -18468,12 +18488,12 @@ arithmetic}, a feature which is specific to @command{gawk}. @menu * General Arithmetic:: An introduction to computer arithmetic. -* Floating-point Programming:: Effective floating-point programming. +* Floating-point Programming:: Effective Floating-point Programming. * Gawk and MPFR:: How @command{gawk} provides aribitrary-precision arithmetic. -* Arbitrary Precision Floats:: Arbitrary precision floating-point arithmetic +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic with @command{gawk}. -* Arbitrary Precision Integers:: Arbitrary precision integer arithmetic with +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with @command{gawk}. @end menu @@ -20942,7 +20962,7 @@ programming use. * Ordinal Functions:: Functions for using characters as numbers and vice versa. * Join Function:: A function to join an array into a string. -* Gettimeofday Function:: A function to get formatted times. 
+* Getlocaltime Function:: A function to get formatted times. @end menu @node Strtonum Function @@ -21467,7 +21487,7 @@ be nice if @command{awk} had an assignment operator for concatenation. The lack of an explicit operator for concatenation makes string operations more difficult than they really need to be.} -@node Gettimeofday Function +@node Getlocaltime Function @subsection Managing the Time of Day @cindex libraries of @command{awk} functions, managing, time @@ -21481,14 +21501,14 @@ in human readable form. While @code{strftime()} is extensive, the control formats are not necessarily easy to remember or intuitively obvious when reading a program. -The following function, @code{gettimeofday()}, populates a user-supplied array +The following function, @code{getlocaltime()}, populates a user-supplied array with preformatted time information. It returns a string with the current time formatted in the same way as the @command{date} utility: -@cindex @code{gettimeofday()} user-defined function +@cindex @code{getlocaltime()} user-defined function @example @c file eg/lib/gettime.awk -# gettimeofday.awk --- get the time of day in a usable format +# getlocaltime.awk --- get the time of day in a usable format @c endfile @ignore @c file eg/lib/gettime.awk @@ -21521,7 +21541,7 @@ time formatted in the same way as the @command{date} utility: # time["weeknum"] -- week number, Sunday first day # time["altweeknum"] -- week number, Monday first day -function gettimeofday(time, ret, now, i) +function getlocaltime(time, ret, now, i) @{ # get time once, avoids unnecessary system calls now = systime() @@ -21563,7 +21583,7 @@ The string indices are easier to use and read than the various formats required by @code{strftime()}. The @code{alarm} program presented in @ref{Alarm Program}, uses this function. 
-A more general design for the @code{gettimeofday()} function would have +A more general design for the @code{getlocaltime()} function would have allowed the user to supply an optional timestamp value to use instead of the current time. @@ -24855,8 +24875,8 @@ it prints the message on the standard output. In addition, you can give it the number of times to repeat the message as well as a delay between repetitions. -This program uses the @code{gettimeofday()} function from -@ref{Gettimeofday Function}. +This program uses the @code{getlocaltime()} function from +@ref{Getlocaltime Function}. All the work is done in the @code{BEGIN} rule. The first part is argument checking and setting of defaults: the delay, the count, and the message to @@ -24875,7 +24895,7 @@ Here is the program: @c file eg/prog/alarm.awk # alarm.awk --- set an alarm # -# Requires gettimeofday() library function +# Requires getlocaltime() library function @c endfile @ignore @c file eg/prog/alarm.awk @@ -24947,7 +24967,7 @@ is how long to wait before setting off the alarm: minute = atime[2] + 0 # force numeric # get current broken down time - gettimeofday(now) + getlocaltime(now) # if time given is 12-hour hours and it's after that # hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m., @@ -27863,6 +27883,471 @@ The @command{gawk} debugger only accepts source supplied with the @option{-f} op Look forward to a future release when these and other missing features may be added, and of course feel free to try to add them yourself! +@node Dynamic Extensions +@chapter Writing Extensions for @command{gawk} + +This chapter is a placeholder, pending a rewrite for the new API. +Some of the old bits remain, since they can be partially reused. 
+ + +@c STARTOFRANGE gladfgaw +@cindex @command{gawk}, functions, adding +@c STARTOFRANGE adfugaw +@cindex adding, functions to @command{gawk} +@c STARTOFRANGE fubadgaw +@cindex functions, built-in, adding to @command{gawk} +It is possible to add new built-in +functions to @command{gawk} using dynamically loaded libraries. This +facility is available on systems (such as GNU/Linux) that support +the C @code{dlopen()} and @code{dlsym()} functions. +This @value{CHAPTER} describes how to write and use dynamically +loaded extensions for @command{gawk}. +Experience with programming in +C or C++ is necessary when reading this @value{SECTION}. + +@quotation NOTE +When @option{--sandbox} is specified, extensions are disabled +(@pxref{Options}. +@end quotation + +@menu +* Plugin License:: A note about licensing. +* Sample Library:: A example of new functions. +@end menu + +@node Plugin License +@section Extension Licensing + +Every dynamic extension should define the global symbol +@code{plugin_is_GPL_compatible} to assert that it has been licensed under +a GPL-compatible license. If this symbol does not exist, @command{gawk} +will emit a fatal error and exit. + +The declared type of the symbol should be @code{int}. It does not need +to be in any allocated section, though. The code merely asserts that +the symbol exists in the global scope. 
Something like this is enough: + +@example +int plugin_is_GPL_compatible; +@end example + +@node Sample Library +@section Example: Directory and File Operation Built-ins +@c STARTOFRANGE chdirg +@cindex @code{chdir()} function@comma{} implementing in @command{gawk} +@c STARTOFRANGE statg +@cindex @code{stat()} function@comma{} implementing in @command{gawk} +@c STARTOFRANGE filre +@cindex files, information about@comma{} retrieving +@c STARTOFRANGE dirch +@cindex directories, changing + +Two useful functions that are not in @command{awk} are @code{chdir()} +(so that an @command{awk} program can change its directory) and +@code{stat()} (so that an @command{awk} program can gather information about +a file). +This @value{SECTION} implements these functions for @command{gawk} in an +external extension library. + +@menu +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. +@end menu + +@node Internal File Description +@subsection Using @code{chdir()} and @code{stat()} + +This @value{SECTION} shows how to use the new functions at the @command{awk} +level once they've been integrated into the running @command{gawk} +interpreter. +Using @code{chdir()} is very straightforward. It takes one argument, +the new directory to change to: + +@example +@dots{} +newdir = "/home/arnold/funstuff" +ret = chdir(newdir) +if (ret < 0) @{ + printf("could not change to %s: %s\n", + newdir, ERRNO) > "/dev/stderr" + exit 1 +@} +@dots{} +@end example + +The return value is negative if the @code{chdir} failed, +and @code{ERRNO} +(@pxref{Built-in Variables}) +is set to a string indicating the error. + +Using @code{stat()} is a bit more complicated. +The C @code{stat()} function fills in a structure that has a fair +amount of information. 
+The right way to model this in @command{awk} is to fill in an associative +array with the appropriate information: + +@c broke printf for page breaking +@example +file = "/home/arnold/.profile" +fdata[1] = "x" # force `fdata' to be an array +ret = stat(file, fdata) +if (ret < 0) @{ + printf("could not stat %s: %s\n", + file, ERRNO) > "/dev/stderr" + exit 1 +@} +printf("size of %s is %d bytes\n", file, fdata["size"]) +@end example + +The @code{stat()} function always clears the data array, even if +the @code{stat()} fails. It fills in the following elements: + +@table @code +@item "name" +The name of the file that was @code{stat()}'ed. + +@item "dev" +@itemx "ino" +The file's device and inode numbers, respectively. + +@item "mode" +The file's mode, as a numeric value. This includes both the file's +type and its permissions. + +@item "nlink" +The number of hard links (directory entries) the file has. + +@item "uid" +@itemx "gid" +The numeric user and group ID numbers of the file's owner. + +@item "size" +The size in bytes of the file. + +@item "blocks" +The number of disk blocks the file actually occupies. This may not +be a function of the file's size if the file has holes. + +@item "atime" +@itemx "mtime" +@itemx "ctime" +The file's last access, modification, and inode update times, +respectively. These are numeric timestamps, suitable for formatting +with @code{strftime()} +(@pxref{Built-in}). + +@item "pmode" +The file's ``printable mode.'' This is a string representation of +the file's type and permissions, such as what is produced by +@samp{ls -l}---for example, @code{"drwxr-xr-x"}. + +@item "type" +A printable string representation of the file's type. The value +is one of the following: + +@table @code +@item "blockdev" +@itemx "chardev" +The file is a block or character device (``special file''). + +@ignore +@item "door" +The file is a Solaris ``door'' (special file used for +interprocess communications). 
+@end ignore + +@item "directory" +The file is a directory. + +@item "fifo" +The file is a named-pipe (also known as a FIFO). + +@item "file" +The file is just a regular file. + +@item "socket" +The file is an @code{AF_UNIX} (``Unix domain'') socket in the +filesystem. + +@item "symlink" +The file is a symbolic link. +@end table +@end table + +Several additional elements may be present depending upon the operating +system and the type of the file. You can test for them in your @command{awk} +program by using the @code{in} operator +(@pxref{Reference to Elements}): + +@table @code +@item "blksize" +The preferred block size for I/O to the file. This field is not +present on all POSIX-like systems in the C @code{stat} structure. + +@item "linkval" +If the file is a symbolic link, this element is the name of the +file the link points to (i.e., the value of the link). + +@item "rdev" +@itemx "major" +@itemx "minor" +If the file is a block or character device file, then these values +represent the numeric device number and the major and minor components +of that number, respectively. +@end table + +@node Internal File Ops +@subsection C Code for @code{chdir()} and @code{stat()} + +Here is the C code for these extensions. They were written for +GNU/Linux. The code needs some more work for complete portability +to other POSIX-compliant systems:@footnote{This version is edited +slightly for presentation. 
See +@file{extension/filefuncs.c} in the @command{gawk} distribution +for the complete version.} + +@c break line for page breaking +@example +#include "awk.h" + +#include <sys/sysmacros.h> + +int plugin_is_GPL_compatible; + +/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ + +static NODE * +do_chdir(int nargs) +@{ + NODE *newdir; + int ret = -1; + + if (do_lint && nargs != 1) + lintwarn("chdir: called with incorrect number of arguments"); + + newdir = get_scalar_argument(0, FALSE); +@end example + +The file includes the @code{"awk.h"} header file for definitions +for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>} +for access to the @code{major()} and @code{minor}() macros. + +@cindex programming conventions, @command{gawk} internals +By convention, for an @command{awk} function @code{foo}, the function that +implements it is called @samp{do_foo}. The function should take +a @samp{int} argument, usually called @code{nargs}, that +represents the number of defined arguments for the function. The @code{newdir} +variable represents the new directory to change to, retrieved +with @code{get_scalar_argument()}. Note that the first argument is +numbered zero. + +This code actually accomplishes the @code{chdir()}. It first forces +the argument to be a string and passes the string value to the +@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO} +is updated. + +@example + (void) force_string(newdir); + ret = chdir(newdir->stptr); + if (ret < 0) + update_ERRNO_int(errno); +@end example + +Finally, the function returns the return value to the @command{awk} level: + +@example + return make_number((AWKNUM) ret); +@} +@end example + +The @code{stat()} built-in is more involved. First comes a function +that turns a numeric mode into a printable representation +(e.g., 644 becomes @samp{-rw-r--r--}). 
This is omitted here for brevity: + +@c break line for page breaking +@example +/* format_mode --- turn a stat mode field into something readable */ + +static char * +format_mode(unsigned long fmode) +@{ + @dots{} +@} +@end example + +Next comes the @code{do_stat()} function. It starts with +variable declarations and argument checking: + +@ignore +Changed message for page breaking. Used to be: + "stat: called with incorrect number of arguments (%d), should be 2", +@end ignore +@example +/* do_stat --- provide a stat() function for gawk */ + +static NODE * +do_stat(int nargs) +@{ + NODE *file, *array, *tmp; + struct stat sbuf; + int ret; + NODE **aptr; + char *pmode; /* printable mode */ + char *type = "unknown"; + + if (do_lint && nargs > 2) + lintwarn("stat: called with too many arguments"); +@end example + +Then comes the actual work. First, the function gets the arguments. +Then, it always clears the array. +The code use @code{lstat()} (instead of @code{stat()}) +to get the file information, +in case the file is a symbolic link. +If there's an error, it sets @code{ERRNO} and returns: + +@c comment made multiline for page breaking +@example + /* file is first arg, array to hold results is second */ + file = get_scalar_argument(0, FALSE); + array = get_array_argument(1, FALSE); + + /* empty out the array */ + assoc_clear(array); + + /* lstat the file, if error, set ERRNO and return */ + (void) force_string(file); + ret = lstat(file->stptr, & sbuf); + if (ret < 0) @{ + update_ERRNO_int(errno); + return make_number((AWKNUM) ret); + @} +@end example + +Now comes the tedious part: filling in the array. 
Only a few of the +calls are shown here, since they all follow the same pattern: + +@example + /* fill in the array */ + aptr = assoc_lookup(array, tmp = make_string("name", 4)); + *aptr = dupnode(file); + unref(tmp); + + aptr = assoc_lookup(array, tmp = make_string("mode", 4)); + *aptr = make_number((AWKNUM) sbuf.st_mode); + unref(tmp); + + aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); + pmode = format_mode(sbuf.st_mode); + *aptr = make_string(pmode, strlen(pmode)); + unref(tmp); +@end example + +When done, return the @code{lstat()} return value: + +@example + + return make_number((AWKNUM) ret); +@} +@end example + +@cindex programming conventions, @command{gawk} internals +Finally, it's necessary to provide the ``glue'' that loads the +new function(s) into @command{gawk}. By convention, each library has +a routine named @code{dl_load()} that does the job. The simplest way +is to use the @code{dl_load_func} macro in @code{gawkapi.h}. + +And that's it! As an exercise, consider adding functions to +implement system calls such as @code{chown()}, @code{chmod()}, +and @code{umask()}. + +@node Using Internal File Ops +@subsection Integrating the Extensions + +@cindex @command{gawk}, interpreter@comma{} adding code to +Now that the code is written, it must be possible to add it at +runtime to the running @command{gawk} interpreter. First, the +code must be compiled. Assuming that the functions are in +a file named @file{filefuncs.c}, and @var{idir} is the location +of the @command{gawk} include files, +the following steps create +a GNU/Linux shared library: + +@example +$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c} +$ @kbd{ld -o filefuncs.so -shared filefuncs.o} +@end example + +@cindex @code{extension()} function (@command{gawk}) +Once the library exists, it is loaded by calling the @code{extension()} +built-in function. 
+This function takes two arguments: the name of the +library to load and the name of a function to call when the library +is first loaded. This function adds the new functions to @command{gawk}. +It returns the value returned by the initialization function +within the shared library: + +@example +# file testff.awk +BEGIN @{ + extension("./filefuncs.so", "dl_load") + + chdir(".") # no-op + + data[1] = 1 # force `data' to be an array + print "Info for testff.awk" + ret = stat("testff.awk", data) + print "ret =", ret + for (i in data) + printf "data[\"%s\"] = %s\n", i, data[i] + print "testff.awk modified:", + strftime("%m %d %y %H:%M:%S", data["mtime"]) + + print "\nInfo for JUNK" + ret = stat("JUNK", data) + print "ret =", ret + for (i in data) + printf "data[\"%s\"] = %s\n", i, data[i] + print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) +@} +@end example + +Here are the results of running the program: + +@example +$ @kbd{gawk -f testff.awk} +@print{} Info for testff.awk +@print{} ret = 0 +@print{} data["size"] = 607 +@print{} data["ino"] = 14945891 +@print{} data["name"] = testff.awk +@print{} data["pmode"] = -rw-rw-r-- +@print{} data["nlink"] = 1 +@print{} data["atime"] = 1293993369 +@print{} data["mtime"] = 1288520752 +@print{} data["mode"] = 33204 +@print{} data["blksize"] = 4096 +@print{} data["dev"] = 2054 +@print{} data["type"] = file +@print{} data["gid"] = 500 +@print{} data["uid"] = 500 +@print{} data["blocks"] = 8 +@print{} data["ctime"] = 1290113572 +@print{} testff.awk modified: 10 31 10 12:25:52 +@print{} +@print{} Info for JUNK +@print{} ret = -1 +@print{} JUNK modified: 01 01 70 02:00:00 +@end example +@c ENDOFRANGE filre +@c ENDOFRANGE dirch +@c ENDOFRANGE statg +@c ENDOFRANGE chdirg +@c ENDOFRANGE gladfgaw +@c ENDOFRANGE adfugaw +@c ENDOFRANGE fubadgaw + @ignore @c Try this @iftex @@ -27917,8 +28402,6 @@ This @value{CHAPTER} briefly describes the evolution of the @command{awk} language, with cross-references to other parts of the 
@value{DOCUMENT} where you can find more information. -@c FIXME: Try to determine whether it was 3.1 or 3.2 that had new awk. - @menu * V7/SVR3.1:: The major changes between V7 and System V Release 3.1. @@ -28333,6 +28816,7 @@ and @code{xor()} functions for bit manipulation (@pxref{Bitwise Functions}). +@c In 4.1, and(), or() and xor() grew the ability to take > 2 arguments @item The @code{asort()} and @code{asorti()} functions for sorting arrays @@ -28344,11 +28828,6 @@ functions for internationalization (@pxref{Programmer i18n}). @item -The @code{extension()} built-in function and the ability to add -new functions dynamically -(@pxref{Dynamic Extensions}). - -@item The @code{fflush()} function from Brian Kernighan's version of @command{awk} (@pxref{I/O Functions}). @@ -28380,29 +28859,65 @@ the @option{-l} command-line option (@pxref{Options}). @item -The ability to use GNU-style long-named options that start with @option{--} +The +@option{-b}, +@option{-c}, +@option{-C}, +@option{-d}, +@option{-D}, +@option{-e}, +@option{-E}, +@option{-g}, +@option{-h}, +@option{-i}, +@option{-l}, +@option{-L}, +@option{-M}, +@option{-n}, +@option{-N}, +@option{-o}, +@option{-O}, +@option{-p}, +@option{-P}, +@option{-r}, +@option{-S}, +@option{-t}, +and +@option{-V} +short options. 
Also, the +ability to use GNU-style long-named options that start with @option{--} and the +@option{--assign}, +@option{--bignum}, @option{--characters-as-bytes}, -@option{--compat}, +@option{--copyright}, +@option{--debug}, @option{--dump-variables}, -@option{--exec}, +@option{--execle}, +@option{--field-separator}, +@option{--file}, @option{--gen-pot}, +@option{--help}, +@option{--include}, @option{--lint}, @option{--lint-old}, +@option{--load}, @option{--non-decimal-data}, +@option{--optimize}, @option{--posix}, +@option{--pretty-print}, @option{--profile}, @option{--re-interval}, @option{--sandbox}, @option{--source}, @option{--traditional}, +@option{--use-lc-numeric}, and -@option{--use-lc-numeric} -options +@option{--version} +long options (@pxref{Options}). @end itemize - @c new ports @item @@ -28708,6 +29223,7 @@ the various PC platforms. Christos Zoulas provided the @code{extension()} built-in function for dynamically adding new modules. +(This was removed at @command{gawk} 4.1.) @item @cindex Kahrs, J@"urgen @@ -30136,8 +30652,6 @@ maintainers of @command{gawk}. Everything in it applies specifically to * Compatibility Mode:: How to disable certain @command{gawk} extensions. * Additions:: Making Additions To @command{gawk}. -* Dynamic Extensions:: Adding new built-in functions to - @command{gawk}. * Future Extensions:: New features that may be implemented one day. @end menu @@ -30183,6 +30697,8 @@ as well as any considerations you should bear in mind. @command{gawk}. * New Ports:: Porting @command{gawk} to a new operating system. +* Derived Files:: Why derived files are kept in the + @command{git} repository. @end menu @node Accessing The Source @@ -30373,8 +30889,9 @@ You will also have to sign paperwork for your documentation changes. Submit changes as unified diffs. Use @samp{diff -u -r -N} to compare the original @command{gawk} source tree with your version. -I recommend using the GNU version of @command{diff}. 
-Send the output produced by either run of @command{diff} to me when you +I recommend using the GNU version of @command{diff}, or best of all, +@samp{git diff} or @samp{git format-patch}. +Send the output produced by @command{diff} to me when you submit your changes. (@xref{Bugs}, for the electronic mail information.) @@ -30500,848 +31017,188 @@ operating systems' code that is already there. In the code that you supply and maintain, feel free to use a coding style and brace layout that suits your taste. -@node Dynamic Extensions -@appendixsec Adding New Built-in Functions to @command{gawk} -@cindex Robinson, Will -@cindex robot, the -@cindex Lost In Space -@quotation -@i{Danger Will Robinson! Danger!!@* -Warning! Warning!}@* -The Robot -@end quotation - -@c STARTOFRANGE gladfgaw -@cindex @command{gawk}, functions, adding -@c STARTOFRANGE adfugaw -@cindex adding, functions to @command{gawk} -@c STARTOFRANGE fubadgaw -@cindex functions, built-in, adding to @command{gawk} -It is possible to add new built-in -functions to @command{gawk} using dynamically loaded libraries. This -facility is available on systems (such as GNU/Linux) that support -the C @code{dlopen()} and @code{dlsym()} functions. -This @value{SECTION} describes how to write and use dynamically -loaded extensions for @command{gawk}. -Experience with programming in -C or C++ is necessary when reading this @value{SECTION}. - -@quotation CAUTION -The facilities described in this @value{SECTION} -are very much subject to change in a future @command{gawk} release. -Be aware that you may have to re-do everything, -at some future time. - -If you have written your own dynamic extensions, -be sure to recompile them for each new @command{gawk} release. -There is no guarantee of binary compatibility between different -releases, nor will there ever be such a guarantee. -@end quotation - -@quotation NOTE -When @option{--sandbox} is specified, extensions are disabled -(@pxref{Options}. 
-@end quotation - -@menu -* Internals:: A brief look at some @command{gawk} internals. -* Plugin License:: A note about licensing. -* Loading Extensions:: How to load dynamic extensions. -* Sample Library:: A example of new functions. -@end menu - -@node Internals -@appendixsubsec A Minimal Introduction to @command{gawk} Internals -@c STARTOFRANGE gawint -@cindex @command{gawk}, internals - -The truth is that @command{gawk} was not designed for simple extensibility. -The facilities for adding functions using shared libraries work, but -are something of a ``bag on the side.'' Thus, this tour is -brief and simplistic; would-be @command{gawk} hackers are encouraged to -spend some time reading the source code before trying to write -extensions based on the material presented here. Of particular note -are the files @file{awk.h}, @file{builtin.c}, and @file{eval.c}. -Reading @file{awkgram.y} in order to see how the parse tree is built -would also be of use. - -@cindex @code{awk.h} file (internal) -With the disclaimers out of the way, the following types, structure -members, functions, and macros are declared in @file{awk.h} and are of -use when writing extensions. The next @value{SECTION} -shows how they are used: - -@table @code -@cindex floating-point, numbers, @code{AWKNUM} internal type -@cindex numbers, floating-point, @code{AWKNUM} internal type -@cindex @code{AWKNUM} internal type -@cindex internal type, @code{AWKNUM} -@item AWKNUM -An @code{AWKNUM} is the internal type of @command{awk} -floating-point numbers. Typically, it is a C @code{double}. - -@cindex @code{NODE} internal type -@cindex internal type, @code{NODE} -@cindex strings, @code{NODE} internal type -@cindex numbers, @code{NODE} internal type -@item NODE -Just about everything is done using objects of type @code{NODE}. -These contain both strings and numbers, as well as variables and arrays. 
- -@cindex @code{force_number()} internal function -@cindex internal function, @code{force_number()} -@cindex numeric, values -@item AWKNUM force_number(NODE *n) -This macro forces a value to be numeric. It returns the actual -numeric value contained in the node. -It may end up calling an internal @command{gawk} function. - -@cindex @code{force_string()} internal function -@cindex internal function, @code{force_string()} -@item void force_string(NODE *n) -This macro guarantees that a @code{NODE}'s string value is current. -It may end up calling an internal @command{gawk} function. -It also guarantees that the string is zero-terminated. - -@cindex @code{force_wstring()} internal function -@cindex internal function, @code{force_wstring()} -@item void force_wstring(NODE *n) -Similarly, this -macro guarantees that a @code{NODE}'s wide-string value is current. -It may end up calling an internal @command{gawk} function. -It also guarantees that the wide string is zero-terminated. - -@cindex parameters@comma{} number of -@cindex @code{nargs} internal variable -@cindex internal variable, @code{nargs} -@item nargs -Inside an extension function, this is the actual number of -parameters passed to the current function. - -@cindex @code{stptr} internal variable -@cindex internal variable, @code{stptr} -@cindex @code{stlen} internal variable -@cindex internal variable, @code{stlen} -@item n->stptr -@itemx n->stlen -The data and length of a @code{NODE}'s string value, respectively. -The string is @emph{not} guaranteed to be zero-terminated. -If you need to pass the string value to a C library function, save -the value in @code{n->stptr[n->stlen]}, assign @code{'\0'} to it, -call the routine, and then restore the value. 
- -@cindex @code{wstptr} internal variable -@cindex internal variable, @code{wstptr} -@cindex @code{wstlen} internal variable -@cindex internal variable, @code{wstlen} -@item n->wstptr -@itemx n->wstlen -The data and length of a @code{NODE}'s wide-string value, respectively. -Use @code{force_wstring()} to make sure these values are current. - -@cindex @code{type} internal variable -@cindex internal variable, @code{type} -@item n->type -The type of the @code{NODE}. This is a C @code{enum}. Values should -be one of @code{Node_var}, @code{Node_var_new}, or @code{Node_var_array} -for function parameters. - -@cindex @code{vname} internal variable -@cindex internal variable, @code{vname} -@item n->vname -The ``variable name'' of a node. This is not of much use inside -externally written extensions. - -@cindex arrays, associative, clearing -@cindex @code{assoc_clear()} internal function -@cindex internal function, @code{assoc_clear()} -@item void assoc_clear(NODE *n) -Clears the associative array pointed to by @code{n}. -Make sure that @samp{n->type == Node_var_array} first. - -@cindex arrays, elements, installing -@cindex @code{assoc_lookup()} internal function -@cindex internal function, @code{assoc_lookup()} -@item NODE **assoc_lookup(NODE *symbol, NODE *subs) -Finds, and installs if necessary, array elements. -@code{symbol} is the array, @code{subs} is the subscript. -This is usually a value created with @code{make_string()} (see below). - -@cindex strings -@cindex @code{make_string()} internal function -@cindex internal function, @code{make_string()} -@item NODE *make_string(char *s, size_t len) -Take a C string and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is permanent storage; understanding -of @command{gawk} memory management is helpful. 
- -@cindex numbers -@cindex @code{make_number()} internal function -@cindex internal function, @code{make_number()} -@item NODE *make_number(AWKNUM val) -Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is permanent storage; understanding -of @command{gawk} memory management is helpful. - - -@cindex nodes@comma{} duplicating -@cindex @code{dupnode()} internal function -@cindex internal function, @code{dupnode()} -@item NODE *dupnode(NODE *n) -Duplicate a node. In most cases, this increments an internal -reference count instead of actually duplicating the entire @code{NODE}; -understanding of @command{gawk} memory management is helpful. - -@cindex memory, releasing -@cindex @code{unref()} internal function -@cindex internal function, @code{unref()} -@item void unref(NODE *n) -This macro releases the memory associated with a @code{NODE} -allocated with @code{make_string()} or @code{make_number()}. -Understanding of @command{gawk} memory management is helpful. - -@cindex @code{make_builtin()} internal function -@cindex internal function, @code{make_builtin()} -@item void make_builtin(const char *name, NODE *(*func)(NODE *), int count) -Register a C function pointed to by @code{func} as new built-in -function @code{name}. @code{name} is a regular C string. @code{count} -is the maximum number of arguments that the function takes. -The function should be written in the following manner: - -@example -/* do_xxx --- do xxx function for gawk */ - -NODE * -do_xxx(int nargs) -@{ - @dots{} -@} -@end example - -@cindex arguments, retrieving -@cindex @code{get_argument()} internal function -@cindex internal function, @code{get_argument()} -@item NODE *get_argument(int i) -This function is called from within a C extension function to get -the @code{i}-th argument from the function call. -The first argument is argument zero. 
- -@cindex @code{get_actual_argument()} internal function -@cindex internal function, @code{get_actual_argument()} -@item NODE *get_actual_argument(int i, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int@ optional,@ int@ wantarray); -This function retrieves a particular argument @code{i}. @code{wantarray} is @code{TRUE} -if the argument should be an array, @code{FALSE} otherwise. If @code{optional} is -@code{TRUE}, the argument need not have been supplied. If it wasn't, the return -value is @code{NULL}. It is a fatal error if @code{optional} is @code{TRUE} but -the argument was not provided. - -@cindex @code{get_scalar_argument()} internal macro -@cindex internal macro, @code{get_scalar_argument()} -@item get_scalar_argument(i, opt) -This is a convenience macro that calls @code{get_actual_argument()}. - -@cindex @code{get_array_argument()} internal macro -@cindex internal macro, @code{get_array_argument()} -@item get_array_argument(i, opt) -This is a convenience macro that calls @code{get_actual_argument()}. - -@cindex functions, return values@comma{} setting - -@cindex @code{ERRNO} variable -@cindex @code{update_ERRNO_int()} internal function -@cindex internal function, @code{update_ERRNO_int()} -@item void update_ERRNO_int(int errno_saved) -This function is called from within a C extension function to set -the value of @command{gawk}'s @code{ERRNO} variable, based on the error -value provided as the argument. -It is provided as a convenience. - -@cindex @code{ERRNO} variable -@cindex @code{update_ERRNO_string()} internal function -@cindex internal function, @code{update_ERRNO_string()} -@item void update_ERRNO_string(const char *string, enum errno_translate) -This function is called from within a C extension function to set -the value of @command{gawk}'s @code{ERRNO} variable to a given string. -The second argument determines whether the string is translated before being -installed into @code{ERRNO}. It is provided as a convenience. 
- -@cindex @code{ERRNO} variable -@cindex @code{unset_ERRNO()} internal function -@cindex internal function, @code{unset_ERRNO()} -@item void unset_ERRNO(void) -This function is called from within a C extension function to set -the value of @command{gawk}'s @code{ERRNO} variable to a null string. -It is provided as a convenience. - -@cindex @code{ENVIRON} array -@cindex @code{PROCINFO} array -@cindex @code{register_deferred_variable()} internal function -@cindex internal function, @code{register_deferred_variable()} -@item void register_deferred_variable(const char *name, NODE *(*load_func)(void)) -This function is called to register a function to be called when a -reference to an undefined variable with the given name is encountered. -The callback function will never be called if the variable exists already, -so, unless the calling code is running at program startup, it should first -check whether a variable of the given name already exists. -The argument function must return a pointer to a @code{NODE} containing the -newly created variable. This function is used to implement the builtin -@code{ENVIRON} and @code{PROCINFO} arrays, so you can refer to them -for examples. - -@cindex @code{IOBUF} internal structure -@cindex internal structure, @code{IOBUF} -@cindex @code{iop_alloc()} internal function -@cindex internal function, @code{iop_alloc()} -@cindex @code{get_record()} input method -@cindex @code{close_func}() input method -@cindex @code{INVALID_HANDLE} internal constant -@cindex internal constant, @code{INVALID_HANDLE} -@cindex XML (eXtensible Markup Language) -@cindex eXtensible Markup Language (XML) -@cindex @code{register_open_hook()} internal function -@cindex internal function, @code{register_open_hook()} -@item void register_open_hook(void *(*open_func)(IOBUF *)) -This function is called to register a function to be called whenever -a new data file is opened, leading to the creation of an @code{IOBUF} -structure in @code{iop_alloc()}. 
After creating the new @code{IOBUF}, -@code{iop_alloc()} will call (in reverse order of registration, so the last -function registered is called first) each open hook until one returns -non-@code{NULL}. If any hook returns a non-@code{NULL} value, that value is assigned -to the @code{IOBUF}'s @code{opaque} field (which will presumably point -to a structure containing additional state associated with the input -processing), and no further open hooks are called. - -The function called will most likely want to set the @code{IOBUF}'s -@code{get_record} method to indicate that future input records should -be retrieved by calling that method instead of using the standard -@command{gawk} input processing. - -And the function will also probably want to set the @code{IOBUF}'s -@code{close_func} method to be called when the file is closed to clean -up any state associated with the input. - -Finally, hook functions should be prepared to receive an @code{IOBUF} -structure where the @code{fd} field is set to @code{INVALID_HANDLE}, -meaning that @command{gawk} was not able to open the file itself. In -this case, the hook function must be able to successfully open the file -and place a valid file descriptor there. - -Currently, for example, the hook function facility is used to implement -the XML parser shared library extension. For more info, please look in -@file{awk.h} and in @file{io.c}. -@end table - -An argument that is supposed to be an array needs to be handled with -some extra code, in case the array being passed in is actually -from a function parameter. - -The following boilerplate code shows how to do this: - -@example -NODE *the_arg; - -/* assume need 3rd arg, 0-based */ -the_arg = get_array_argument(2, FALSE); -@end example - -Again, you should spend time studying the @command{gawk} internals; -don't just blindly copy this code. 
-@c ENDOFRANGE gawint - -@node Plugin License -@appendixsubsec Extension Licensing - -Every dynamic extension should define the global symbol -@code{plugin_is_GPL_compatible} to assert that it has been licensed under -a GPL-compatible license. If this symbol does not exist, @command{gawk} -will emit a fatal error and exit. +@node Derived Files +@appendixsubsec Why Generated Files Are Kept In @command{git} -The declared type of the symbol should be @code{int}. It does not need -to be in any allocated section, though. The code merely asserts that -the symbol exists in the global scope. Something like this is enough: - -@example -int plugin_is_GPL_compatible; -@end example - -@node Loading Extensions -@appendixsubsec Loading a Dynamic Extension -@cindex loading extension -@cindex @command{gawk}, functions, loading -There are two ways to load a dynamically linked library. The first is to use the -builtin @code{extension()}: - -@example -extension(libname, init_func) -@end example - -where @file{libname} is the library to load, and @samp{init_func} is the -name of the initialization or bootstrap routine to run once loaded. - -The second method for dynamic loading of a library is to use the -command line option @option{-l}: - -@example -$ @kbd{gawk -l libname -f myprog} -@end example - -This will work only if the initialization routine is named @code{dlload()}. - -If you use @code{extension()}, the library will be loaded -at run time. This means that the functions are available only to the rest of -your script. If you use the command line option @option{-l} instead, -the library will be loaded before @command{gawk} starts compiling the -actual program. The net effect is that you can use those functions -anywhere in the program. - -@command{gawk} has a list of directories where it searches for libraries. -By default, the list includes directories that depend upon how gawk was built -and installed (@pxref{AWKLIBPATH Variable}). 
If you want @command{gawk} -to look for libraries in your private directory, you have to tell it. -The way to do it is to set the @env{AWKLIBPATH} environment variable -(@pxref{AWKLIBPATH Variable}). -@command{gawk} supplies the default shared library platform suffix if it is not -present in the name of the library. -If the name of your library is @file{mylib.so}, you can simply type +@c From emails written March 22, 2012, to the gawk developers list. -@example -$ @kbd{gawk -l mylib -f myprog} -@end example +If you look at the @command{gawk} source in the @command{git} +repository, you will notice that it includes files that are automatically +generated by GNU infrastructure tools, such as @file{Makefile.in} from +@command{automake} and even @file{configure} from @command{autoconf}. -and @command{gawk} will do everything necessary to load in your library, -and then call your @code{dlload()} routine. +This is different from many Free Software projects that do not store +the derived files, because that keeps the repository less cluttered, +and it is easier to see the substantive changes when comparing versions +and trying to understand what changed between commits. -You can always specify the library using an absolute pathname, in which -case @command{gawk} will not use @env{AWKLIBPATH} to search for it. - -@node Sample Library -@appendixsubsec Example: Directory and File Operation Built-ins -@c STARTOFRANGE chdirg -@cindex @code{chdir()} function@comma{} implementing in @command{gawk} -@c STARTOFRANGE statg -@cindex @code{stat()} function@comma{} implementing in @command{gawk} -@c STARTOFRANGE filre -@cindex files, information about@comma{} retrieving -@c STARTOFRANGE dirch -@cindex directories, changing +However, there are two reasons why the @command{gawk} maintainer +likes to have everything in the repository. 
-Two useful functions that are not in @command{awk} are @code{chdir()} -(so that an @command{awk} program can change its directory) and -@code{stat()} (so that an @command{awk} program can gather information about -a file). -This @value{SECTION} implements these functions for @command{gawk} in an -external extension library. +First, because it is then easy to reproduce any given version completely, +without relying upon the availability of (older, likely obsolete, and +maybe even impossible to find) other tools. -@menu -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. -@end menu +As an extreme example, if you ever even think about trying to compile, +oh, say, the V7 @command{awk}, you will discover that not only do you +have to bootstrap the V7 @command{yacc} to do so, but you also need the +V7 @command{lex}. And the latter is pretty much impossible to bring up +on a modern GNU/Linux system.@footnote{We tried. It was painful.} -@node Internal File Description -@appendixsubsubsec Using @code{chdir()} and @code{stat()} +(Or, let's say @command{gawk} 1.2 required @command{bison} whatever-it-was +in 1989 and that there was no @file{awkgram.c} file in the repository. Is +there a guarantee that we could find that @command{bison} version? Or that +@emph{it} would build?) -This @value{SECTION} shows how to use the new functions at the @command{awk} -level once they've been integrated into the running @command{gawk} -interpreter. -Using @code{chdir()} is very straightforward. It takes one argument, -the new directory to change to: +If the repository has all the generated files, then it's easy to just check +them out and build. (Or @emph{easier}, depending upon how far back we go. 
+@code{:-)}) -@example -@dots{} -newdir = "/home/arnold/funstuff" -ret = chdir(newdir) -if (ret < 0) @{ - printf("could not change to %s: %s\n", - newdir, ERRNO) > "/dev/stderr" - exit 1 -@} -@dots{} -@end example +And that brings us to the second (and stronger) reason why all the files +really need to be in @command{git}. It boils down to who do you cater +to---the @command{gawk} developer(s), or the user who just wants to check +out a version and try it out? -The return value is negative if the @code{chdir} failed, -and @code{ERRNO} -(@pxref{Built-in Variables}) -is set to a string indicating the error. +The @command{gawk} maintainer +wants it to be possible for any interested @command{awk} user in the +world to just clone the repository, check out the branch of interest and +build it. Without their having to have the correct version(s) of the +autotools.@footnote{There is one GNU program that is (in our opinion) +severely difficult to bootstrap from the @command{git} repository. For +example, on the author's old (but still working) PowerPC macintosh with +Mac OS X 10.5, it was necessary to bootstrap a ton of software, starting +with @command{git} itself, in order to try to work with the latest code. +It's not pleasant, and especially on older systems, it's a big waste +of time. -Using @code{stat()} is a bit more complicated. -The C @code{stat()} function fills in a structure that has a fair -amount of information. -The right way to model this in @command{awk} is to fill in an associative -array with the appropriate information: +Starting with the latest tarball was no picnic either. The maintainers +had dropped @file{.gz} and @file{.bz2} files and only distribute +@file{.tar.xz} files. It was necessary to bootstrap @command{xz} first!} +That is the point of the @file{bootstrap.sh} file. 
It touches the +various other files in the right order such that -@c broke printf for page breaking @example -file = "/home/arnold/.profile" -fdata[1] = "x" # force `fdata' to be an array -ret = stat(file, fdata) -if (ret < 0) @{ - printf("could not stat %s: %s\n", - file, ERRNO) > "/dev/stderr" - exit 1 -@} -printf("size of %s is %d bytes\n", file, fdata["size"]) +# The canonical incantation for building GNU software: +./bootstrap.sh && ./configure && make @end example -The @code{stat()} function always clears the data array, even if -the @code{stat()} fails. It fills in the following elements: - -@table @code -@item "name" -The name of the file that was @code{stat()}'ed. +@noindent +will @emph{just work}. -@item "dev" -@itemx "ino" -The file's device and inode numbers, respectively. +This is extremely important for the @code{master} and +@code{gawk-@var{X}.@var{Y}-stable} branches. -@item "mode" -The file's mode, as a numeric value. This includes both the file's -type and its permissions. +Further, the @command{gawk} maintainer would argue that it's also +important for the @command{gawk} developers. When he tried to check out +the @code{xgawk} branch@footnote{A branch created by one of the other +developers that did not include the generated files.} to build it, he +couldn't. (No @file{ltmain.sh} file, and he had no idea how to create it, +and that was not the only problem.) -@item "nlink" -The number of hard links (directory entries) the file has. +He felt @emph{extremely} frustrated. With respect to that branch, +the maintainer is no different than Jane User who wants to try to build +@code{gawk-4.0-stable} or @code{master} from the repository. -@item "uid" -@itemx "gid" -The numeric user and group ID numbers of the file's owner. +Thus, the maintainer thinks that it's not just important, but critical, +that for any given branch, the above incantation @emph{just works}. -@item "size" -The size in bytes of the file. +@c So - that's my reasoning and philosophy. 
-@item "blocks" -The number of disk blocks the file actually occupies. This may not -be a function of the file's size if the file has holes. +What are some of the consequences and/or actions to take? -@item "atime" -@itemx "mtime" -@itemx "ctime" -The file's last access, modification, and inode update times, -respectively. These are numeric timestamps, suitable for formatting -with @code{strftime()} -(@pxref{Built-in}). +@enumerate 1 +@item +We don't mind that there are differing files in the different branches +as a result of different versions of the autotools. -@item "pmode" -The file's ``printable mode.'' This is a string representation of -the file's type and permissions, such as what is produced by -@samp{ls -l}---for example, @code{"drwxr-xr-x"}. +@enumerate A +@item +It's the maintainer's job to merge them and he will deal with it. -@item "type" -A printable string representation of the file's type. The value -is one of the following: +@item +He is really good at @samp{git diff x y > /tmp/diff1 ; gvim /tmp/diff1} to +remove the diffs that aren't of interest in order to review code. @code{:-)} +@end enumerate -@table @code -@item "blockdev" -@itemx "chardev" -The file is a block or character device (``special file''). +@item +It would certainly help if everyone used the same versions of the GNU tools +as he does, which in general are the latest released versions of +@command{automake}, +@command{autoconf}, +@command{bison}, +and +@command{gettext}. @ignore -@item "door" -The file is a Solaris ``door'' (special file used for -interprocess communications). +If it would help if I sent out an "I just upgraded to version x.y +of tool Z" kind of message to this list, I can do that. Up until +now it hasn't been a real issue since I'm the only one who's been +dorking with the configuration machinery. @end ignore -@item "directory" -The file is a directory. - -@item "fifo" -The file is a named-pipe (also known as a FIFO). 
- -@item "file" -The file is just a regular file. - -@item "socket" -The file is an @code{AF_UNIX} (``Unix domain'') socket in the -filesystem. - -@item "symlink" -The file is a symbolic link. -@end table -@end table - -Several additional elements may be present depending upon the operating -system and the type of the file. You can test for them in your @command{awk} -program by using the @code{in} operator -(@pxref{Reference to Elements}): - -@table @code -@item "blksize" -The preferred block size for I/O to the file. This field is not -present on all POSIX-like systems in the C @code{stat} structure. - -@item "linkval" -If the file is a symbolic link, this element is the name of the -file the link points to (i.e., the value of the link). - -@item "rdev" -@itemx "major" -@itemx "minor" -If the file is a block or character device file, then these values -represent the numeric device number and the major and minor components -of that number, respectively. -@end table - -@node Internal File Ops -@appendixsubsubsec C Code for @code{chdir()} and @code{stat()} - -Here is the C code for these extensions. They were written for -GNU/Linux. The code needs some more work for complete portability -to other POSIX-compliant systems:@footnote{This version is edited -slightly for presentation. See -@file{extension/filefuncs.c} in the @command{gawk} distribution -for the complete version.} - -@c break line for page breaking -@example -#include "awk.h" - -#include <sys/sysmacros.h> - -int plugin_is_GPL_compatible; - -/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ - -static NODE * -do_chdir(int nargs) -@{ - NODE *newdir; - int ret = -1; - - if (do_lint && nargs != 1) - lintwarn("chdir: called with incorrect number of arguments"); - - newdir = get_scalar_argument(0, FALSE); -@end example - -The file includes the @code{"awk.h"} header file for definitions -for the @command{gawk} internals. 
It includes @code{<sys/sysmacros.h>} -for access to the @code{major()} and @code{minor}() macros. - -@cindex programming conventions, @command{gawk} internals -By convention, for an @command{awk} function @code{foo}, the function that -implements it is called @samp{do_foo}. The function should take -a @samp{int} argument, usually called @code{nargs}, that -represents the number of defined arguments for the function. The @code{newdir} -variable represents the new directory to change to, retrieved -with @code{get_scalar_argument()}. Note that the first argument is -numbered zero. - -This code actually accomplishes the @code{chdir()}. It first forces -the argument to be a string and passes the string value to the -@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO} -is updated. - -@example - (void) force_string(newdir); - ret = chdir(newdir->stptr); - if (ret < 0) - update_ERRNO_int(errno); -@end example - -Finally, the function returns the return value to the @command{awk} level: - -@example - return make_number((AWKNUM) ret); -@} -@end example - -The @code{stat()} built-in is more involved. First comes a function -that turns a numeric mode into a printable representation -(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity: +@enumerate A +@item +Installing from source is quite easy. It's how the maintainer worked for years +under Fedora. +He had @file{/usr/local/bin} at the front of his @env{PATH} and just did: -@c break line for page breaking @example -/* format_mode --- turn a stat mode field into something readable */ - -static char * -format_mode(unsigned long fmode) -@{ - @dots{} -@} +wget http://ftp.gnu.org/gnu/@var{package}/@var{package}-@var{x}.@var{y}.@var{z}.tar.gz +tar -xpzvf @var{package}-@var{x}.@var{y}.@var{z}.tar.gz +cd @var{package}-@var{x}.@var{y}.@var{z} +./configure && make && make check +make install # as root @end example -Next comes the @code{do_stat()} function.
It starts with -variable declarations and argument checking: +@item +These days the maintainer uses Ubuntu 10.11 which is medium current, but +he is already doing the above for @command{autoconf} and @command{bison}. @ignore -Changed message for page breaking. Used to be: - "stat: called with incorrect number of arguments (%d), should be 2", +(C. Rant: Recent Linux versions with GNOME 3 really suck. What + are all those people thinking? Fedora 15 was such a bust it drove + me to Ubuntu, but Ubuntu 11.04 and 11.10 are totally unusable from + a UI perspective. Bleah.) @end ignore -@example -/* do_stat --- provide a stat() function for gawk */ - -static NODE * -do_stat(int nargs) -@{ - NODE *file, *array, *tmp; - struct stat sbuf; - int ret; - NODE **aptr; - char *pmode; /* printable mode */ - char *type = "unknown"; - - if (do_lint && nargs > 2) - lintwarn("stat: called with too many arguments"); -@end example - -Then comes the actual work. First, the function gets the arguments. -Then, it always clears the array. -The code use @code{lstat()} (instead of @code{stat()}) -to get the file information, -in case the file is a symbolic link. -If there's an error, it sets @code{ERRNO} and returns: - -@c comment made multiline for page breaking -@example - /* file is first arg, array to hold results is second */ - file = get_scalar_argument(0, FALSE); - array = get_array_argument(1, FALSE); - - /* empty out the array */ - assoc_clear(array); - - /* lstat the file, if error, set ERRNO and return */ - (void) force_string(file); - ret = lstat(file->stptr, & sbuf); - if (ret < 0) @{ - update_ERRNO_int(errno); - return make_number((AWKNUM) ret); - @} -@end example - -Now comes the tedious part: filling in the array. 
Only a few of the -calls are shown here, since they all follow the same pattern: - -@example - /* fill in the array */ - aptr = assoc_lookup(array, tmp = make_string("name", 4)); - *aptr = dupnode(file); - unref(tmp); - - aptr = assoc_lookup(array, tmp = make_string("mode", 4)); - *aptr = make_number((AWKNUM) sbuf.st_mode); - unref(tmp); - - aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); - pmode = format_mode(sbuf.st_mode); - *aptr = make_string(pmode, strlen(pmode)); - unref(tmp); -@end example - -When done, return the @code{lstat()} return value: - -@example - - return make_number((AWKNUM) ret); -@} -@end example - -@cindex programming conventions, @command{gawk} internals -Finally, it's necessary to provide the ``glue'' that loads the -new function(s) into @command{gawk}. By convention, each library has -a routine named @code{dlload()} that does the job: +@end enumerate -@example -/* dlload --- load new builtins in this library */ +@ignore +@item +If someone still feels really strongly about all this, then perhaps they +can have two branches, one for their development with just the clean +changes, and one that is buildable (xgawk and xgawk-buildable, maybe). +Or, as I suggested in another mail, make commits in pairs, the first with +the "real" changes and the second with "everything else needed for + building". +@end ignore +@end enumerate -NODE * -dlload(NODE *tree, void *dl) -@{ - make_builtin("chdir", do_chdir, 1); - make_builtin("stat", do_stat, 2); - return make_number((AWKNUM) 0); -@} -@end example +Most of the above was originally written by the maintainer to other +@command{gawk} developers. It raised the objection from one of +the developers ``@dots{} that anybody pulling down the source from +@command{git} is not an end user.'' -And that's it! As an exercise, consider adding functions to -implement system calls such as @code{chown()}, @code{chmod()}, -and @code{umask()}. +However, this is not true. 
There are ``power @command{awk} users'' +who can build @command{gawk} (using the magic incantation shown previously) +but who can't program in C. Thus, the major branches should be +kept buildable all the time. -@node Using Internal File Ops -@appendixsubsubsec Integrating the Extensions +It was then suggested that there be a @command{cron} job to create +nightly tarballs of ``the source.'' Here, the problem is that there +are source trees, corresponding to the various branches! So, +nightly tarballs aren't the answer, especially as the repository can go +for weeks without significant change being introduced. -@cindex @command{gawk}, interpreter@comma{} adding code to -Now that the code is written, it must be possible to add it at -runtime to the running @command{gawk} interpreter. First, the -code must be compiled. Assuming that the functions are in -a file named @file{filefuncs.c}, and @var{idir} is the location -of the @command{gawk} include files, -the following steps create -a GNU/Linux shared library: +Fortunately, the @command{git} server can meet this need. For any given +branch named @var{branchname}, use: @example -$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c} -$ @kbd{ld -o filefuncs.so -shared filefuncs.o} +wget http://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-@var{branchname}.tar.gz @end example -@cindex @code{extension()} function (@command{gawk}) -Once the library exists, it is loaded by calling the @code{extension()} -built-in function. -This function takes two arguments: the name of the -library to load and the name of a function to call when the library -is first loaded. This function adds the new functions to @command{gawk}. 
-It returns the value returned by the initialization function -within the shared library: - -@example -# file testff.awk -BEGIN @{ - extension("./filefuncs.so", "dlload") - - chdir(".") # no-op - - data[1] = 1 # force `data' to be an array - print "Info for testff.awk" - ret = stat("testff.awk", data) - print "ret =", ret - for (i in data) - printf "data[\"%s\"] = %s\n", i, data[i] - print "testff.awk modified:", - strftime("%m %d %y %H:%M:%S", data["mtime"]) - - print "\nInfo for JUNK" - ret = stat("JUNK", data) - print "ret =", ret - for (i in data) - printf "data[\"%s\"] = %s\n", i, data[i] - print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) -@} -@end example +@noindent +to retrieve a snapshot of the given branch. -Here are the results of running the program: - -@example -$ @kbd{gawk -f testff.awk} -@print{} Info for testff.awk -@print{} ret = 0 -@print{} data["size"] = 607 -@print{} data["ino"] = 14945891 -@print{} data["name"] = testff.awk -@print{} data["pmode"] = -rw-rw-r-- -@print{} data["nlink"] = 1 -@print{} data["atime"] = 1293993369 -@print{} data["mtime"] = 1288520752 -@print{} data["mode"] = 33204 -@print{} data["blksize"] = 4096 -@print{} data["dev"] = 2054 -@print{} data["type"] = file -@print{} data["gid"] = 500 -@print{} data["uid"] = 500 -@print{} data["blocks"] = 8 -@print{} data["ctime"] = 1290113572 -@print{} testff.awk modified: 10 31 10 12:25:52 -@print{} -@print{} Info for JUNK -@print{} ret = -1 -@print{} JUNK modified: 01 01 70 02:00:00 -@end example -@c ENDOFRANGE filre -@c ENDOFRANGE dirch -@c ENDOFRANGE statg -@c ENDOFRANGE chdirg -@c ENDOFRANGE gladfgaw -@c ENDOFRANGE adfugaw -@c ENDOFRANGE fubadgaw @node Future Extensions @appendixsec Probable Future Extensions @@ -31400,12 +31257,8 @@ Following is a list of probable future changes visible at the @c these are ordered by likelihood @table @asis -@item Loadable module interface -It is not clear that the @command{awk}-level interface to the -modules facility is as 
good as it should be. The interface needs to be -redesigned, particularly taking namespace issues into account, as -well as possibly including issues such as library search path order -and versioning. +@item Databases +It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array. @item @code{RECLEN} variable for fixed-length records Along with @code{FIELDWIDTHS}, this would speed up the processing of @@ -31413,9 +31266,6 @@ fixed-length records. @code{PROCINFO["RS"]} would be @code{"RS"} or @code{"RECLEN"}, depending upon which kind of record processing is in effect. -@item Databases -It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array. - @item More @code{lint} warnings There are more things that could be checked for portability. @end table @@ -31424,21 +31274,6 @@ Following is a list of probable improvements that will make @command{gawk}'s source code easier to work with: @table @asis -@item Loadable module mechanics -The current extension mechanism works -(@pxref{Dynamic Extensions}), -but is rather primitive. It requires a fair amount of manual work -to create and integrate a loadable module. -Nor is the current mechanism as portable as might be desired. -The GNU @command{libtool} package provides a number of features that -would make using loadable modules much easier. -@command{gawk} should be changed to use @command{libtool}. - -@item Loadable module internals -The API to its internals that @command{gawk} ``exports'' should be revised. -Too many things are needlessly exposed. A new API should be designed -and implemented to make module writing easier. - @item Better array subscript management @command{gawk}'s management of array subscript storage could use revamping, so that using the same value to index multiple arrays only |