aboutsummaryrefslogtreecommitdiffstats
path: root/doc/gawk.texi
diff options
context:
space:
mode:
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r--doc/gawk.texi1449
1 files changed, 529 insertions, 920 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 12b77556..ceea9a92 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -295,12 +295,14 @@ particular records in a file and perform operations upon them.
* Sample Programs:: Many @command{awk} programs with complete
explanations.
* Debugger:: The @code{gawk} debugger.
+* Dynamic Extensions:: Adding new built-in functions to
+ @command{gawk}.
* Language History:: The evolution of the @command{awk}
language.
* Installation:: Installing @command{gawk} under various
operating systems.
-* Notes:: Notes about @command{gawk} extensions and
- possible future work.
+* Notes:: Notes about adding things to @command{gawk}
+ and possible future work.
* Basic Concepts:: A very quick introduction to programming
concepts.
* Glossary:: An explanation of some unfamiliar terms.
@@ -558,21 +560,22 @@ particular records in a file and perform operations upon them.
* I18N Portability:: @command{awk}-level portability issues.
* I18N Example:: A simple i18n example.
* Gawk I18N:: @command{gawk} is also internationalized.
-* Floating-point Programming:: Effective floating-point programming.
-* Floating-point Representation:: Binary floating-point representation.
-* Floating-point Context:: Floating-point context.
-* Rounding Mode:: Floating-point rounding mode.
-* Arbitrary Precision Floats:: Arbitrary precision floating-point
- arithmetic with @command{gawk}.
-* Setting Precision:: Setting the working precision.
-* Setting Rounding Mode:: Setting the rounding mode.
-* Floating-point Constants:: Representing floating-point constants.
-* Changing Precision:: Changing the precision of a number.
-* Exact Arithmetic:: Exact arithmetic with floating-point numbers.
-* Integer Programming:: Effective integer programming.
-* Arbitrary Precision Integers:: Arbitrary precision integer
- arithmetic with @command{gawk}.
-* MPFR and GMP Libraries:: Information about the MPFR and GMP libraries.
+* Floating-point Programming:: Effective Floating-point Programming.
+* Floating-point Representation:: Binary Floating-point Representation.
+* Floating-point Context:: Floating-point Context.
+* Rounding Mode:: Floating-point Rounding Mode.
+* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
+ Arithmetic with @command{gawk}.
+* Setting Precision:: Setting the Working Precision.
+* Setting Rounding Mode:: Setting the Rounding Mode.
+* Floating-point Constants:: Representing Floating-point Constants.
+* Changing Precision:: Changing the Precision of a Number.
+* Exact Arithmetic:: Exact Arithmetic with Floating-point
+ Numbers.
+* Integer Programming:: Effective Integer Programming.
+* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
+ @command{gawk}.
+* MPFR and GMP Libraries ::
* Nondecimal Data:: Allowing nondecimal input data.
* Array Sorting:: Facilities for controlling array traversal
and sorting arrays.
@@ -637,14 +640,14 @@ particular records in a file and perform operations upon them.
* Anagram Program:: Finding anagrams from a dictionary.
* Signature Program:: People do amazing things with too much time
on their hands.
-* Debugging:: Introduction to @command{gawk} Debugger.
+* Debugging:: Introduction to @command{gawk} debugger.
* Debugging Concepts:: Debugging in General.
* Debugging Terms:: Additional Debugging Concepts.
* Awk Debugging:: Awk Debugging.
-* Sample Debugging Session:: Sample Debugging Session.
+* Sample Debugging Session:: Sample debugging session.
* Debugger Invocation:: How to Start the Debugger.
* Finding The Bug:: Finding the Bug.
-* List of Debugger Commands:: Main Commands.
+* List of Debugger Commands:: Main debugger commands.
* Breakpoint Control:: Control of Breakpoints.
* Debugger Execution Control:: Control of Execution.
* Viewing And Changing Data:: Viewing and Changing Data.
@@ -652,8 +655,13 @@ particular records in a file and perform operations upon them.
* Debugger Info:: Obtaining Information about the Program and
the Debugger State.
* Miscellaneous Debugger Commands:: Miscellaneous Commands.
-* Readline Support:: Readline Support.
-* Limitations:: Limitations and Future Plans.
+* Readline Support:: Readline support.
+* Limitations:: Limitations and future plans.
+* Plugin License:: A note about licensing.
+* Sample Library:: A example of new functions.
+* Internal File Description:: What the new functions will do.
+* Internal File Ops:: The code for internal file operations.
+* Using Internal File Ops:: How to use an external extension.
* V7/SVR3.1:: The major changes between V7 and System V
Release 3.1.
* SVR4:: Minor changes between System V Releases 3.1
@@ -704,16 +712,6 @@ particular records in a file and perform operations upon them.
@command{gawk}.
* New Ports:: Porting @command{gawk} to a new operating
system.
-* Dynamic Extensions:: Adding new built-in functions to
- @command{gawk}.
-* Internals:: A brief look at some @command{gawk}
- internals.
-* Plugin License:: A note about licensing.
-* Loading Extensions:: How to load dynamic extensions.
-* Sample Library:: A example of new functions.
-* Internal File Description:: What the new functions will do.
-* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension.
* Future Extensions:: New features that may be implemented one
day.
* Basic High Level:: The high level view.
@@ -1206,8 +1204,7 @@ available @command{awk} implementations.
@ref{Notes},
describes how to disable @command{gawk}'s extensions, as
well as how to contribute new code to @command{gawk},
-how to write extension libraries, and some possible
-future directions for @command{gawk} development.
+and some possible future directions for @command{gawk} development.
@ref{Basic Concepts},
provides some very cursory background material for those who
@@ -3616,8 +3613,8 @@ behaves.
@menu
* AWKPATH Variable:: Searching directories for @command{awk}
programs.
-* AWKLIBPATH Variable:: Searching directories for @command{awk}
- shared libraries.
+* AWKLIBPATH Variable:: Searching directories for @command{awk} shared
+ libraries.
* Other Environment Variables:: The environment variables.
@end menu
@@ -5263,7 +5260,6 @@ used with it do not have to be named on the @command{awk} command line
* Getline:: Reading files under explicit program control
using the @code{getline} function.
* Read Timeout:: Reading input with a timeout.
-
* Command line directories:: What happens if you put a directory on the
command line.
@end menu
@@ -11565,9 +11561,9 @@ fatal error.
@item
If you have written extensions that modify the record handling (by inserting
-an ``open hook''), you can invoke them at this point, before @command{gawk}
+an ``input parser''), you can invoke them at this point, before @command{gawk}
has started processing the file. (This is a @emph{very} advanced feature,
-currently used only by the @uref{http://xmlgawk.sourceforge.net, XMLgawk project}.)
+currently used only by the @uref{http://gawkextlib.sourceforge.net, @code{gawkextlib} project}.)
@end itemize
The @code{ENDFILE} rule is called when @command{gawk} has finished processing
@@ -18508,21 +18504,22 @@ in general, and the limitations of doing arithmetic with ordinary
@command{gawk} numbers.
@menu
-* Floating-point Programming:: Effective Floating-point Programming.
-* Floating-point Representation:: Binary Floating-point Representation.
-* Floating-point Context:: Floating-point Context.
-* Rounding Mode:: Floating-point Rounding Mode.
-* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
- Arithmetic with @command{gawk}.
-* Setting Precision:: Setting the Working Precision.
-* Setting Rounding Mode:: Setting the Rounding Mode.
-* Floating-point Constants:: Representing Floating-point Constants.
-* Changing Precision:: Changing the Precision of a Number.
-* Exact Arithmetic:: Exact Arithmetic with Floating-point Numbers.
-* Integer Programming:: Effective Integer Programming.
-* Arbitrary Precision Integers:: Arbitrary Precision Integer
- Arithmetic with @command{gawk}.
-* MPFR and GMP Libraries:: Information About the MPFR and GMP Libraries.
+* Floating-point Programming:: Effective Floating-point Programming.
+* Floating-point Representation:: Binary Floating-point Representation.
+* Floating-point Context:: Floating-point Context.
+* Rounding Mode:: Floating-point Rounding Mode.
+* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
+ Arithmetic with @command{gawk}.
+* Setting Precision:: Setting the Working Precision.
+* Setting Rounding Mode:: Setting the Rounding Mode.
+* Floating-point Constants:: Representing Floating-point Constants.
+* Changing Precision:: Changing the Precision of a Number.
+* Exact Arithmetic:: Exact Arithmetic with Floating-point
+ Numbers.
+* Integer Programming:: Effective Integer Programming.
+* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
+ @command{gawk}.
+* MPFR and GMP Libraries ::
@end menu
@node Floating-point Programming
@@ -27530,6 +27527,471 @@ The @command{gawk} debugger only accepts source supplied with the @option{-f} op
Look forward to a future release when these and other missing features may
be added, and of course feel free to try to add them yourself!
+@node Dynamic Extensions
+@chapter Writing Extensions for @command{gawk}
+
+This chapter is a placeholder, pending a rewrite for the new API.
+Some of the old bits remain, since they can be partially reused.
+
+
+@c STARTOFRANGE gladfgaw
+@cindex @command{gawk}, functions, adding
+@c STARTOFRANGE adfugaw
+@cindex adding, functions to @command{gawk}
+@c STARTOFRANGE fubadgaw
+@cindex functions, built-in, adding to @command{gawk}
+It is possible to add new built-in
+functions to @command{gawk} using dynamically loaded libraries. This
+facility is available on systems (such as GNU/Linux) that support
+the C @code{dlopen()} and @code{dlsym()} functions.
+This @value{CHAPTER} describes how to write and use dynamically
+loaded extensions for @command{gawk}.
+Experience with programming in
+C or C++ is necessary when reading this @value{SECTION}.
+
+@quotation NOTE
+When @option{--sandbox} is specified, extensions are disabled
+(@pxref{Options}.
+@end quotation
+
+@menu
+* Plugin License:: A note about licensing.
+* Sample Library:: A example of new functions.
+@end menu
+
+@node Plugin License
+@section Extension Licensing
+
+Every dynamic extension should define the global symbol
+@code{plugin_is_GPL_compatible} to assert that it has been licensed under
+a GPL-compatible license. If this symbol does not exist, @command{gawk}
+will emit a fatal error and exit.
+
+The declared type of the symbol should be @code{int}. It does not need
+to be in any allocated section, though. The code merely asserts that
+the symbol exists in the global scope. Something like this is enough:
+
+@example
+int plugin_is_GPL_compatible;
+@end example
+
+@node Sample Library
+@section Example: Directory and File Operation Built-ins
+@c STARTOFRANGE chdirg
+@cindex @code{chdir()} function@comma{} implementing in @command{gawk}
+@c STARTOFRANGE statg
+@cindex @code{stat()} function@comma{} implementing in @command{gawk}
+@c STARTOFRANGE filre
+@cindex files, information about@comma{} retrieving
+@c STARTOFRANGE dirch
+@cindex directories, changing
+
+Two useful functions that are not in @command{awk} are @code{chdir()}
+(so that an @command{awk} program can change its directory) and
+@code{stat()} (so that an @command{awk} program can gather information about
+a file).
+This @value{SECTION} implements these functions for @command{gawk} in an
+external extension library.
+
+@menu
+* Internal File Description:: What the new functions will do.
+* Internal File Ops:: The code for internal file operations.
+* Using Internal File Ops:: How to use an external extension.
+@end menu
+
+@node Internal File Description
+@subsection Using @code{chdir()} and @code{stat()}
+
+This @value{SECTION} shows how to use the new functions at the @command{awk}
+level once they've been integrated into the running @command{gawk}
+interpreter.
+Using @code{chdir()} is very straightforward. It takes one argument,
+the new directory to change to:
+
+@example
+@dots{}
+newdir = "/home/arnold/funstuff"
+ret = chdir(newdir)
+if (ret < 0) @{
+ printf("could not change to %s: %s\n",
+ newdir, ERRNO) > "/dev/stderr"
+ exit 1
+@}
+@dots{}
+@end example
+
+The return value is negative if the @code{chdir} failed,
+and @code{ERRNO}
+(@pxref{Built-in Variables})
+is set to a string indicating the error.
+
+Using @code{stat()} is a bit more complicated.
+The C @code{stat()} function fills in a structure that has a fair
+amount of information.
+The right way to model this in @command{awk} is to fill in an associative
+array with the appropriate information:
+
+@c broke printf for page breaking
+@example
+file = "/home/arnold/.profile"
+fdata[1] = "x" # force `fdata' to be an array
+ret = stat(file, fdata)
+if (ret < 0) @{
+ printf("could not stat %s: %s\n",
+ file, ERRNO) > "/dev/stderr"
+ exit 1
+@}
+printf("size of %s is %d bytes\n", file, fdata["size"])
+@end example
+
+The @code{stat()} function always clears the data array, even if
+the @code{stat()} fails. It fills in the following elements:
+
+@table @code
+@item "name"
+The name of the file that was @code{stat()}'ed.
+
+@item "dev"
+@itemx "ino"
+The file's device and inode numbers, respectively.
+
+@item "mode"
+The file's mode, as a numeric value. This includes both the file's
+type and its permissions.
+
+@item "nlink"
+The number of hard links (directory entries) the file has.
+
+@item "uid"
+@itemx "gid"
+The numeric user and group ID numbers of the file's owner.
+
+@item "size"
+The size in bytes of the file.
+
+@item "blocks"
+The number of disk blocks the file actually occupies. This may not
+be a function of the file's size if the file has holes.
+
+@item "atime"
+@itemx "mtime"
+@itemx "ctime"
+The file's last access, modification, and inode update times,
+respectively. These are numeric timestamps, suitable for formatting
+with @code{strftime()}
+(@pxref{Built-in}).
+
+@item "pmode"
+The file's ``printable mode.'' This is a string representation of
+the file's type and permissions, such as what is produced by
+@samp{ls -l}---for example, @code{"drwxr-xr-x"}.
+
+@item "type"
+A printable string representation of the file's type. The value
+is one of the following:
+
+@table @code
+@item "blockdev"
+@itemx "chardev"
+The file is a block or character device (``special file'').
+
+@ignore
+@item "door"
+The file is a Solaris ``door'' (special file used for
+interprocess communications).
+@end ignore
+
+@item "directory"
+The file is a directory.
+
+@item "fifo"
+The file is a named-pipe (also known as a FIFO).
+
+@item "file"
+The file is just a regular file.
+
+@item "socket"
+The file is an @code{AF_UNIX} (``Unix domain'') socket in the
+filesystem.
+
+@item "symlink"
+The file is a symbolic link.
+@end table
+@end table
+
+Several additional elements may be present depending upon the operating
+system and the type of the file. You can test for them in your @command{awk}
+program by using the @code{in} operator
+(@pxref{Reference to Elements}):
+
+@table @code
+@item "blksize"
+The preferred block size for I/O to the file. This field is not
+present on all POSIX-like systems in the C @code{stat} structure.
+
+@item "linkval"
+If the file is a symbolic link, this element is the name of the
+file the link points to (i.e., the value of the link).
+
+@item "rdev"
+@itemx "major"
+@itemx "minor"
+If the file is a block or character device file, then these values
+represent the numeric device number and the major and minor components
+of that number, respectively.
+@end table
+
+@node Internal File Ops
+@subsection C Code for @code{chdir()} and @code{stat()}
+
+Here is the C code for these extensions. They were written for
+GNU/Linux. The code needs some more work for complete portability
+to other POSIX-compliant systems:@footnote{This version is edited
+slightly for presentation. See
+@file{extension/filefuncs.c} in the @command{gawk} distribution
+for the complete version.}
+
+@c break line for page breaking
+@example
+#include "awk.h"
+
+#include <sys/sysmacros.h>
+
+int plugin_is_GPL_compatible;
+
+/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */
+
+static NODE *
+do_chdir(int nargs)
+@{
+ NODE *newdir;
+ int ret = -1;
+
+ if (do_lint && nargs != 1)
+ lintwarn("chdir: called with incorrect number of arguments");
+
+ newdir = get_scalar_argument(0, FALSE);
+@end example
+
+The file includes the @code{"awk.h"} header file for definitions
+for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>}
+for access to the @code{major()} and @code{minor}() macros.
+
+@cindex programming conventions, @command{gawk} internals
+By convention, for an @command{awk} function @code{foo}, the function that
+implements it is called @samp{do_foo}. The function should take
+a @samp{int} argument, usually called @code{nargs}, that
+represents the number of defined arguments for the function. The @code{newdir}
+variable represents the new directory to change to, retrieved
+with @code{get_scalar_argument()}. Note that the first argument is
+numbered zero.
+
+This code actually accomplishes the @code{chdir()}. It first forces
+the argument to be a string and passes the string value to the
+@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO}
+is updated.
+
+@example
+ (void) force_string(newdir);
+ ret = chdir(newdir->stptr);
+ if (ret < 0)
+ update_ERRNO_int(errno);
+@end example
+
+Finally, the function returns the return value to the @command{awk} level:
+
+@example
+ return make_number((AWKNUM) ret);
+@}
+@end example
+
+The @code{stat()} built-in is more involved. First comes a function
+that turns a numeric mode into a printable representation
+(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity:
+
+@c break line for page breaking
+@example
+/* format_mode --- turn a stat mode field into something readable */
+
+static char *
+format_mode(unsigned long fmode)
+@{
+ @dots{}
+@}
+@end example
+
+Next comes the @code{do_stat()} function. It starts with
+variable declarations and argument checking:
+
+@ignore
+Changed message for page breaking. Used to be:
+ "stat: called with incorrect number of arguments (%d), should be 2",
+@end ignore
+@example
+/* do_stat --- provide a stat() function for gawk */
+
+static NODE *
+do_stat(int nargs)
+@{
+ NODE *file, *array, *tmp;
+ struct stat sbuf;
+ int ret;
+ NODE **aptr;
+ char *pmode; /* printable mode */
+ char *type = "unknown";
+
+ if (do_lint && nargs > 2)
+ lintwarn("stat: called with too many arguments");
+@end example
+
+Then comes the actual work. First, the function gets the arguments.
+Then, it always clears the array.
+The code use @code{lstat()} (instead of @code{stat()})
+to get the file information,
+in case the file is a symbolic link.
+If there's an error, it sets @code{ERRNO} and returns:
+
+@c comment made multiline for page breaking
+@example
+ /* file is first arg, array to hold results is second */
+ file = get_scalar_argument(0, FALSE);
+ array = get_array_argument(1, FALSE);
+
+ /* empty out the array */
+ assoc_clear(array);
+
+ /* lstat the file, if error, set ERRNO and return */
+ (void) force_string(file);
+ ret = lstat(file->stptr, & sbuf);
+ if (ret < 0) @{
+ update_ERRNO_int(errno);
+ return make_number((AWKNUM) ret);
+ @}
+@end example
+
+Now comes the tedious part: filling in the array. Only a few of the
+calls are shown here, since they all follow the same pattern:
+
+@example
+ /* fill in the array */
+ aptr = assoc_lookup(array, tmp = make_string("name", 4));
+ *aptr = dupnode(file);
+ unref(tmp);
+
+ aptr = assoc_lookup(array, tmp = make_string("mode", 4));
+ *aptr = make_number((AWKNUM) sbuf.st_mode);
+ unref(tmp);
+
+ aptr = assoc_lookup(array, tmp = make_string("pmode", 5));
+ pmode = format_mode(sbuf.st_mode);
+ *aptr = make_string(pmode, strlen(pmode));
+ unref(tmp);
+@end example
+
+When done, return the @code{lstat()} return value:
+
+@example
+
+ return make_number((AWKNUM) ret);
+@}
+@end example
+
+@cindex programming conventions, @command{gawk} internals
+Finally, it's necessary to provide the ``glue'' that loads the
+new function(s) into @command{gawk}. By convention, each library has
+a routine named @code{dl_load()} that does the job. The simplest way
+is to use the @code{dl_load_func} macro in @code{gawkapi.h}.
+
+And that's it! As an exercise, consider adding functions to
+implement system calls such as @code{chown()}, @code{chmod()},
+and @code{umask()}.
+
+@node Using Internal File Ops
+@subsection Integrating the Extensions
+
+@cindex @command{gawk}, interpreter@comma{} adding code to
+Now that the code is written, it must be possible to add it at
+runtime to the running @command{gawk} interpreter. First, the
+code must be compiled. Assuming that the functions are in
+a file named @file{filefuncs.c}, and @var{idir} is the location
+of the @command{gawk} include files,
+the following steps create
+a GNU/Linux shared library:
+
+@example
+$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c}
+$ @kbd{ld -o filefuncs.so -shared filefuncs.o}
+@end example
+
+@cindex @code{extension()} function (@command{gawk})
+Once the library exists, it is loaded by calling the @code{extension()}
+built-in function.
+This function takes two arguments: the name of the
+library to load and the name of a function to call when the library
+is first loaded. This function adds the new functions to @command{gawk}.
+It returns the value returned by the initialization function
+within the shared library:
+
+@example
+# file testff.awk
+BEGIN @{
+ extension("./filefuncs.so", "dl_load")
+
+ chdir(".") # no-op
+
+ data[1] = 1 # force `data' to be an array
+ print "Info for testff.awk"
+ ret = stat("testff.awk", data)
+ print "ret =", ret
+ for (i in data)
+ printf "data[\"%s\"] = %s\n", i, data[i]
+ print "testff.awk modified:",
+ strftime("%m %d %y %H:%M:%S", data["mtime"])
+
+ print "\nInfo for JUNK"
+ ret = stat("JUNK", data)
+ print "ret =", ret
+ for (i in data)
+ printf "data[\"%s\"] = %s\n", i, data[i]
+ print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"])
+@}
+@end example
+
+Here are the results of running the program:
+
+@example
+$ @kbd{gawk -f testff.awk}
+@print{} Info for testff.awk
+@print{} ret = 0
+@print{} data["size"] = 607
+@print{} data["ino"] = 14945891
+@print{} data["name"] = testff.awk
+@print{} data["pmode"] = -rw-rw-r--
+@print{} data["nlink"] = 1
+@print{} data["atime"] = 1293993369
+@print{} data["mtime"] = 1288520752
+@print{} data["mode"] = 33204
+@print{} data["blksize"] = 4096
+@print{} data["dev"] = 2054
+@print{} data["type"] = file
+@print{} data["gid"] = 500
+@print{} data["uid"] = 500
+@print{} data["blocks"] = 8
+@print{} data["ctime"] = 1290113572
+@print{} testff.awk modified: 10 31 10 12:25:52
+@print{}
+@print{} Info for JUNK
+@print{} ret = -1
+@print{} JUNK modified: 01 01 70 02:00:00
+@end example
+@c ENDOFRANGE filre
+@c ENDOFRANGE dirch
+@c ENDOFRANGE statg
+@c ENDOFRANGE chdirg
+@c ENDOFRANGE gladfgaw
+@c ENDOFRANGE adfugaw
+@c ENDOFRANGE fubadgaw
+
@ignore
@c Try this
@iftex
@@ -28010,11 +28472,6 @@ functions for internationalization
(@pxref{Programmer i18n}).
@item
-The @code{extension()} built-in function and the ability to add
-new functions dynamically
-(@pxref{Dynamic Extensions}).
-
-@item
The @code{fflush()} function from Brian Kernighan's
version of @command{awk}
(@pxref{I/O Functions}).
@@ -28048,15 +28505,21 @@ the @option{-l} command-line option
@item
The ability to use GNU-style long-named options that start with @option{--}
and the
+@option{--bignum},
@option{--characters-as-bytes},
-@option{--compat},
+@option{--copyright},
+@option{--debug},
@option{--dump-variables},
@option{--exec},
@option{--gen-pot},
+@option{--include},
@option{--lint},
@option{--lint-old},
+@option{--load},
@option{--non-decimal-data},
+@option{--optimize},
@option{--posix},
+@option{--pretty-print},
@option{--profile},
@option{--re-interval},
@option{--sandbox},
@@ -28374,6 +28837,7 @@ the various PC platforms.
Christos Zoulas
provided the @code{extension()}
built-in function for dynamically adding new modules.
+(This was removed at @command{gawk} 4.1.)
@item
@cindex Kahrs, J@"urgen
@@ -29802,8 +30266,6 @@ maintainers of @command{gawk}. Everything in it applies specifically to
* Compatibility Mode:: How to disable certain @command{gawk}
extensions.
* Additions:: Making Additions To @command{gawk}.
-* Dynamic Extensions:: Adding new built-in functions to
- @command{gawk}.
* Future Extensions:: New features that may be implemented one day.
@end menu
@@ -30039,8 +30501,9 @@ You will also have to sign paperwork for your documentation changes.
Submit changes as unified diffs.
Use @samp{diff -u -r -N} to compare
the original @command{gawk} source tree with your version.
-I recommend using the GNU version of @command{diff}.
-Send the output produced by either run of @command{diff} to me when you
+I recommend using the GNU version of @command{diff}, or best of all,
+@samp{git diff} or @samp{git format-patch}.
+Send the output produced by @command{diff} to me when you
submit your changes.
(@xref{Bugs}, for the electronic mail
information.)
@@ -30166,838 +30629,6 @@ operating systems' code that is already there.
In the code that you supply and maintain, feel free to use a
coding style and brace layout that suits your taste.
-@node Dynamic Extensions
-@appendixsec Adding New Built-in Functions to @command{gawk}
-@cindex Robinson, Will
-@cindex robot, the
-@cindex Lost In Space
-@quotation
-@i{Danger Will Robinson! Danger!!@*
-Warning! Warning!}@*
-The Robot
-@end quotation
-
-@c STARTOFRANGE gladfgaw
-@cindex @command{gawk}, functions, adding
-@c STARTOFRANGE adfugaw
-@cindex adding, functions to @command{gawk}
-@c STARTOFRANGE fubadgaw
-@cindex functions, built-in, adding to @command{gawk}
-It is possible to add new built-in
-functions to @command{gawk} using dynamically loaded libraries. This
-facility is available on systems (such as GNU/Linux) that support
-the C @code{dlopen()} and @code{dlsym()} functions.
-This @value{SECTION} describes how to write and use dynamically
-loaded extensions for @command{gawk}.
-Experience with programming in
-C or C++ is necessary when reading this @value{SECTION}.
-
-@quotation CAUTION
-The facilities described in this @value{SECTION}
-are very much subject to change in a future @command{gawk} release.
-Be aware that you may have to re-do everything,
-at some future time.
-
-If you have written your own dynamic extensions,
-be sure to recompile them for each new @command{gawk} release.
-There is no guarantee of binary compatibility between different
-releases, nor will there ever be such a guarantee.
-@end quotation
-
-@quotation NOTE
-When @option{--sandbox} is specified, extensions are disabled
-(@pxref{Options}.
-@end quotation
-
-@menu
-* Internals:: A brief look at some @command{gawk} internals.
-* Plugin License:: A note about licensing.
-* Loading Extensions:: How to load dynamic extensions.
-* Sample Library:: A example of new functions.
-@end menu
-
-@node Internals
-@appendixsubsec A Minimal Introduction to @command{gawk} Internals
-@c STARTOFRANGE gawint
-@cindex @command{gawk}, internals
-
-The truth is that @command{gawk} was not designed for simple extensibility.
-The facilities for adding functions using shared libraries work, but
-are something of a ``bag on the side.'' Thus, this tour is
-brief and simplistic; would-be @command{gawk} hackers are encouraged to
-spend some time reading the source code before trying to write
-extensions based on the material presented here. Of particular note
-are the files @file{awk.h}, @file{builtin.c}, and @file{eval.c}.
-Reading @file{awkgram.y} in order to see how the parse tree is built
-would also be of use.
-
-@cindex @code{awk.h} file (internal)
-With the disclaimers out of the way, the following types, structure
-members, functions, and macros are declared in @file{awk.h} and are of
-use when writing extensions. The next @value{SECTION}
-shows how they are used:
-
-@table @code
-@cindex floating-point, numbers, @code{AWKNUM} internal type
-@cindex numbers, floating-point, @code{AWKNUM} internal type
-@cindex @code{AWKNUM} internal type
-@cindex internal type, @code{AWKNUM}
-@item AWKNUM
-An @code{AWKNUM} is the internal type of @command{awk}
-floating-point numbers. Typically, it is a C @code{double}.
-
-@cindex @code{NODE} internal type
-@cindex internal type, @code{NODE}
-@cindex strings, @code{NODE} internal type
-@cindex numbers, @code{NODE} internal type
-@item NODE
-Just about everything is done using objects of type @code{NODE}.
-These contain both strings and numbers, as well as variables and arrays.
-
-@cindex @code{force_number()} internal function
-@cindex internal function, @code{force_number()}
-@cindex numeric, values
-@item AWKNUM force_number(NODE *n)
-This macro forces a value to be numeric. It returns the actual
-numeric value contained in the node.
-It may end up calling an internal @command{gawk} function.
-
-@cindex @code{force_string()} internal function
-@cindex internal function, @code{force_string()}
-@item void force_string(NODE *n)
-This macro guarantees that a @code{NODE}'s string value is current.
-It may end up calling an internal @command{gawk} function.
-It also guarantees that the string is zero-terminated.
-
-@cindex @code{force_wstring()} internal function
-@cindex internal function, @code{force_wstring()}
-@item void force_wstring(NODE *n)
-Similarly, this
-macro guarantees that a @code{NODE}'s wide-string value is current.
-It may end up calling an internal @command{gawk} function.
-It also guarantees that the wide string is zero-terminated.
-
-@cindex parameters@comma{} number of
-@cindex @code{nargs} internal variable
-@cindex internal variable, @code{nargs}
-@item nargs
-Inside an extension function, this is the actual number of
-parameters passed to the current function.
-
-@cindex @code{stptr} internal variable
-@cindex internal variable, @code{stptr}
-@cindex @code{stlen} internal variable
-@cindex internal variable, @code{stlen}
-@item n->stptr
-@itemx n->stlen
-The data and length of a @code{NODE}'s string value, respectively.
-The string is @emph{not} guaranteed to be zero-terminated.
-If you need to pass the string value to a C library function, save
-the value in @code{n->stptr[n->stlen]}, assign @code{'\0'} to it,
-call the routine, and then restore the value.
-
-@cindex @code{wstptr} internal variable
-@cindex internal variable, @code{wstptr}
-@cindex @code{wstlen} internal variable
-@cindex internal variable, @code{wstlen}
-@item n->wstptr
-@itemx n->wstlen
-The data and length of a @code{NODE}'s wide-string value, respectively.
-Use @code{force_wstring()} to make sure these values are current.
-
-@cindex @code{type} internal variable
-@cindex internal variable, @code{type}
-@item n->type
-The type of the @code{NODE}. This is a C @code{enum}. Values should
-be one of @code{Node_var}, @code{Node_var_new}, or @code{Node_var_array}
-for function parameters.
-
-@cindex @code{vname} internal variable
-@cindex internal variable, @code{vname}
-@item n->vname
-The ``variable name'' of a node. This is not of much use inside
-externally written extensions.
-
-@cindex arrays, associative, clearing
-@cindex @code{assoc_clear()} internal function
-@cindex internal function, @code{assoc_clear()}
-@item void assoc_clear(NODE *n)
-Clears the associative array pointed to by @code{n}.
-Make sure that @samp{n->type == Node_var_array} first.
-
-@cindex arrays, elements, installing
-@cindex @code{assoc_lookup()} internal function
-@cindex internal function, @code{assoc_lookup()}
-@item NODE **assoc_lookup(NODE *symbol, NODE *subs)
-Finds, and installs if necessary, array elements.
-@code{symbol} is the array, @code{subs} is the subscript.
-This is usually a value created with @code{make_string()} (see below).
-
-@cindex strings
-@cindex @code{make_string()} internal function
-@cindex internal function, @code{make_string()}
-@item NODE *make_string(char *s, size_t len)
-Take a C string and turn it into a pointer to a @code{NODE} that
-can be stored appropriately. This is permanent storage; understanding
-of @command{gawk} memory management is helpful.
-
-@cindex numbers
-@cindex @code{make_number()} internal function
-@cindex internal function, @code{make_number()}
-@item NODE *make_number(AWKNUM val)
-Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that
-can be stored appropriately. This is permanent storage; understanding
-of @command{gawk} memory management is helpful.
-
-
-@cindex nodes@comma{} duplicating
-@cindex @code{dupnode()} internal function
-@cindex internal function, @code{dupnode()}
-@item NODE *dupnode(NODE *n)
-Duplicate a node. In most cases, this increments an internal
-reference count instead of actually duplicating the entire @code{NODE};
-understanding of @command{gawk} memory management is helpful.
-
-@cindex memory, releasing
-@cindex @code{unref()} internal function
-@cindex internal function, @code{unref()}
-@item void unref(NODE *n)
-This macro releases the memory associated with a @code{NODE}
-allocated with @code{make_string()} or @code{make_number()}.
-Understanding of @command{gawk} memory management is helpful.
-
-@cindex @code{make_builtin()} internal function
-@cindex internal function, @code{make_builtin()}
-@item void make_builtin(const char *name, NODE *(*func)(NODE *), int count)
-Register a C function pointed to by @code{func} as new built-in
-function @code{name}. @code{name} is a regular C string. @code{count}
-is the maximum number of arguments that the function takes.
-The function should be written in the following manner:
-
-@example
-/* do_xxx --- do xxx function for gawk */
-
-NODE *
-do_xxx(int nargs)
-@{
- @dots{}
-@}
-@end example
-
-@cindex arguments, retrieving
-@cindex @code{get_argument()} internal function
-@cindex internal function, @code{get_argument()}
-@item NODE *get_argument(int i)
-This function is called from within a C extension function to get
-the @code{i}-th argument from the function call.
-The first argument is argument zero.
-
-@cindex @code{get_actual_argument()} internal function
-@cindex internal function, @code{get_actual_argument()}
-@item NODE *get_actual_argument(int i,
-@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int@ optional,@ int@ wantarray);
-This function retrieves a particular argument @code{i}. @code{wantarray} is @code{TRUE}
-if the argument should be an array, @code{FALSE} otherwise. If @code{optional} is
-@code{TRUE}, the argument need not have been supplied. If it wasn't, the return
-value is @code{NULL}. It is a fatal error if @code{optional} is @code{TRUE} but
-the argument was not provided.
-
-@cindex @code{get_scalar_argument()} internal macro
-@cindex internal macro, @code{get_scalar_argument()}
-@item get_scalar_argument(i, opt)
-This is a convenience macro that calls @code{get_actual_argument()}.
-
-@cindex @code{get_array_argument()} internal macro
-@cindex internal macro, @code{get_array_argument()}
-@item get_array_argument(i, opt)
-This is a convenience macro that calls @code{get_actual_argument()}.
-
-@cindex functions, return values@comma{} setting
-
-@cindex @code{ERRNO} variable
-@cindex @code{update_ERRNO_int()} internal function
-@cindex internal function, @code{update_ERRNO_int()}
-@item void update_ERRNO_int(int errno_saved)
-This function is called from within a C extension function to set
-the value of @command{gawk}'s @code{ERRNO} variable, based on the error
-value provided as the argument.
-It is provided as a convenience.
-
-@cindex @code{ERRNO} variable
-@cindex @code{update_ERRNO_string()} internal function
-@cindex internal function, @code{update_ERRNO_string()}
-@item void update_ERRNO_string(const char *string, enum errno_translate)
-This function is called from within a C extension function to set
-the value of @command{gawk}'s @code{ERRNO} variable to a given string.
-The second argument determines whether the string is translated before being
-installed into @code{ERRNO}. It is provided as a convenience.
-
-@cindex @code{ERRNO} variable
-@cindex @code{unset_ERRNO()} internal function
-@cindex internal function, @code{unset_ERRNO()}
-@item void unset_ERRNO(void)
-This function is called from within a C extension function to set
-the value of @command{gawk}'s @code{ERRNO} variable to a null string.
-It is provided as a convenience.
-
-@cindex @code{ENVIRON} array
-@cindex @code{PROCINFO} array
-@cindex @code{register_deferred_variable()} internal function
-@cindex internal function, @code{register_deferred_variable()}
-@item void register_deferred_variable(const char *name, NODE *(*load_func)(void))
-This function is called to register a function to be called when a
-reference to an undefined variable with the given name is encountered.
-The callback function will never be called if the variable exists already,
-so, unless the calling code is running at program startup, it should first
-check whether a variable of the given name already exists.
-The argument function must return a pointer to a @code{NODE} containing the
-newly created variable. This function is used to implement the builtin
-@code{ENVIRON} and @code{PROCINFO} arrays, so you can refer to them
-for examples.
-
-@cindex @code{IOBUF} internal structure
-@cindex internal structure, @code{IOBUF}
-@cindex @code{iop_alloc()} internal function
-@cindex internal function, @code{iop_alloc()}
-@cindex @code{get_record()} input method
-@cindex @code{close_func}() input method
-@cindex @code{INVALID_HANDLE} internal constant
-@cindex internal constant, @code{INVALID_HANDLE}
-@cindex XML (eXtensible Markup Language)
-@cindex eXtensible Markup Language (XML)
-@cindex @code{register_open_hook()} internal function
-@cindex internal function, @code{register_open_hook()}
-@item void register_open_hook(void *(*open_func)(IOBUF *))
-This function is called to register a function to be called whenever
-a new data file is opened, leading to the creation of an @code{IOBUF}
-structure in @code{iop_alloc()}. After creating the new @code{IOBUF},
-@code{iop_alloc()} will call (in reverse order of registration, so the last
-function registered is called first) each open hook until one returns
-non-@code{NULL}. If any hook returns a non-@code{NULL} value, that value is assigned
-to the @code{IOBUF}'s @code{opaque} field (which will presumably point
-to a structure containing additional state associated with the input
-processing), and no further open hooks are called.
-
-The function called will most likely want to set the @code{IOBUF}'s
-@code{get_record} method to indicate that future input records should
-be retrieved by calling that method instead of using the standard
-@command{gawk} input processing.
-
-And the function will also probably want to set the @code{IOBUF}'s
-@code{close_func} method to be called when the file is closed to clean
-up any state associated with the input.
-
-Finally, hook functions should be prepared to receive an @code{IOBUF}
-structure where the @code{fd} field is set to @code{INVALID_HANDLE},
-meaning that @command{gawk} was not able to open the file itself. In
-this case, the hook function must be able to successfully open the file
-and place a valid file descriptor there.
-
-Currently, for example, the hook function facility is used to implement
-the XML parser shared library extension. For more info, please look in
-@file{awk.h} and in @file{io.c}.
-@end table
-
-An argument that is supposed to be an array needs to be handled with
-some extra code, in case the array being passed in is actually
-from a function parameter.
-
-The following boilerplate code shows how to do this:
-
-@example
-NODE *the_arg;
-
-/* assume need 3rd arg, 0-based */
-the_arg = get_array_argument(2, FALSE);
-@end example
-
-Again, you should spend time studying the @command{gawk} internals;
-don't just blindly copy this code.
-@c ENDOFRANGE gawint
-
-@node Plugin License
-@appendixsubsec Extension Licensing
-
-Every dynamic extension should define the global symbol
-@code{plugin_is_GPL_compatible} to assert that it has been licensed under
-a GPL-compatible license. If this symbol does not exist, @command{gawk}
-will emit a fatal error and exit.
-
-The declared type of the symbol should be @code{int}. It does not need
-to be in any allocated section, though. The code merely asserts that
-the symbol exists in the global scope. Something like this is enough:
-
-@example
-int plugin_is_GPL_compatible;
-@end example
-
-@node Loading Extensions
-@appendixsubsec Loading a Dynamic Extension
-@cindex loading extension
-@cindex @command{gawk}, functions, loading
-There are two ways to load a dynamically linked library. The first is to use the
-builtin @code{extension()}:
-
-@example
-extension(libname, init_func)
-@end example
-
-where @file{libname} is the library to load, and @samp{init_func} is the
-name of the initialization or bootstrap routine to run once loaded.
-
-The second method for dynamic loading of a library is to use the
-command line option @option{-l}:
-
-@example
-$ @kbd{gawk -l libname -f myprog}
-@end example
-
-This will work only if the initialization routine is named @code{dl_load()}.
-
-If you use @code{extension()}, the library will be loaded
-at run time. This means that the functions are available only to the rest of
-your script. If you use the command line option @option{-l} instead,
-the library will be loaded before @command{gawk} starts compiling the
-actual program. The net effect is that you can use those functions
-anywhere in the program.
-
-@command{gawk} has a list of directories where it searches for libraries.
-By default, the list includes directories that depend upon how gawk was built
-and installed (@pxref{AWKLIBPATH Variable}). If you want @command{gawk}
-to look for libraries in your private directory, you have to tell it.
-The way to do it is to set the @env{AWKLIBPATH} environment variable
-(@pxref{AWKLIBPATH Variable}).
-@command{gawk} supplies the default shared library platform suffix if it is not
-present in the name of the library.
-If the name of your library is @file{mylib.so}, you can simply type
-
-@example
-$ @kbd{gawk -l mylib -f myprog}
-@end example
-
-and @command{gawk} will do everything necessary to load in your library,
-and then call your @code{dl_load()} routine.
-
-You can always specify the library using an absolute pathname, in which
-case @command{gawk} will not use @env{AWKLIBPATH} to search for it.
-
-@node Sample Library
-@appendixsubsec Example: Directory and File Operation Built-ins
-@c STARTOFRANGE chdirg
-@cindex @code{chdir()} function@comma{} implementing in @command{gawk}
-@c STARTOFRANGE statg
-@cindex @code{stat()} function@comma{} implementing in @command{gawk}
-@c STARTOFRANGE filre
-@cindex files, information about@comma{} retrieving
-@c STARTOFRANGE dirch
-@cindex directories, changing
-
-Two useful functions that are not in @command{awk} are @code{chdir()}
-(so that an @command{awk} program can change its directory) and
-@code{stat()} (so that an @command{awk} program can gather information about
-a file).
-This @value{SECTION} implements these functions for @command{gawk} in an
-external extension library.
-
-@menu
-* Internal File Description:: What the new functions will do.
-* Internal File Ops:: The code for internal file operations.
-* Using Internal File Ops:: How to use an external extension.
-@end menu
-
-@node Internal File Description
-@appendixsubsubsec Using @code{chdir()} and @code{stat()}
-
-This @value{SECTION} shows how to use the new functions at the @command{awk}
-level once they've been integrated into the running @command{gawk}
-interpreter.
-Using @code{chdir()} is very straightforward. It takes one argument,
-the new directory to change to:
-
-@example
-@dots{}
-newdir = "/home/arnold/funstuff"
-ret = chdir(newdir)
-if (ret < 0) @{
- printf("could not change to %s: %s\n",
- newdir, ERRNO) > "/dev/stderr"
- exit 1
-@}
-@dots{}
-@end example
-
-The return value is negative if the @code{chdir} failed,
-and @code{ERRNO}
-(@pxref{Built-in Variables})
-is set to a string indicating the error.
-
-Using @code{stat()} is a bit more complicated.
-The C @code{stat()} function fills in a structure that has a fair
-amount of information.
-The right way to model this in @command{awk} is to fill in an associative
-array with the appropriate information:
-
-@c broke printf for page breaking
-@example
-file = "/home/arnold/.profile"
-fdata[1] = "x" # force `fdata' to be an array
-ret = stat(file, fdata)
-if (ret < 0) @{
- printf("could not stat %s: %s\n",
- file, ERRNO) > "/dev/stderr"
- exit 1
-@}
-printf("size of %s is %d bytes\n", file, fdata["size"])
-@end example
-
-The @code{stat()} function always clears the data array, even if
-the @code{stat()} fails. It fills in the following elements:
-
-@table @code
-@item "name"
-The name of the file that was @code{stat()}'ed.
-
-@item "dev"
-@itemx "ino"
-The file's device and inode numbers, respectively.
-
-@item "mode"
-The file's mode, as a numeric value. This includes both the file's
-type and its permissions.
-
-@item "nlink"
-The number of hard links (directory entries) the file has.
-
-@item "uid"
-@itemx "gid"
-The numeric user and group ID numbers of the file's owner.
-
-@item "size"
-The size in bytes of the file.
-
-@item "blocks"
-The number of disk blocks the file actually occupies. This may not
-be a function of the file's size if the file has holes.
-
-@item "atime"
-@itemx "mtime"
-@itemx "ctime"
-The file's last access, modification, and inode update times,
-respectively. These are numeric timestamps, suitable for formatting
-with @code{strftime()}
-(@pxref{Built-in}).
-
-@item "pmode"
-The file's ``printable mode.'' This is a string representation of
-the file's type and permissions, such as what is produced by
-@samp{ls -l}---for example, @code{"drwxr-xr-x"}.
-
-@item "type"
-A printable string representation of the file's type. The value
-is one of the following:
-
-@table @code
-@item "blockdev"
-@itemx "chardev"
-The file is a block or character device (``special file'').
-
-@ignore
-@item "door"
-The file is a Solaris ``door'' (special file used for
-interprocess communications).
-@end ignore
-
-@item "directory"
-The file is a directory.
-
-@item "fifo"
-The file is a named-pipe (also known as a FIFO).
-
-@item "file"
-The file is just a regular file.
-
-@item "socket"
-The file is an @code{AF_UNIX} (``Unix domain'') socket in the
-filesystem.
-
-@item "symlink"
-The file is a symbolic link.
-@end table
-@end table
-
-Several additional elements may be present depending upon the operating
-system and the type of the file. You can test for them in your @command{awk}
-program by using the @code{in} operator
-(@pxref{Reference to Elements}):
-
-@table @code
-@item "blksize"
-The preferred block size for I/O to the file. This field is not
-present on all POSIX-like systems in the C @code{stat} structure.
-
-@item "linkval"
-If the file is a symbolic link, this element is the name of the
-file the link points to (i.e., the value of the link).
-
-@item "rdev"
-@itemx "major"
-@itemx "minor"
-If the file is a block or character device file, then these values
-represent the numeric device number and the major and minor components
-of that number, respectively.
-@end table
-
-@node Internal File Ops
-@appendixsubsubsec C Code for @code{chdir()} and @code{stat()}
-
-Here is the C code for these extensions. They were written for
-GNU/Linux. The code needs some more work for complete portability
-to other POSIX-compliant systems:@footnote{This version is edited
-slightly for presentation. See
-@file{extension/filefuncs.c} in the @command{gawk} distribution
-for the complete version.}
-
-@c break line for page breaking
-@example
-#include "awk.h"
-
-#include <sys/sysmacros.h>
-
-int plugin_is_GPL_compatible;
-
-/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */
-
-static NODE *
-do_chdir(int nargs)
-@{
- NODE *newdir;
- int ret = -1;
-
- if (do_lint && nargs != 1)
- lintwarn("chdir: called with incorrect number of arguments");
-
- newdir = get_scalar_argument(0, FALSE);
-@end example
-
-The file includes the @code{"awk.h"} header file for definitions
-for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>}
-for access to the @code{major()} and @code{minor}() macros.
-
-@cindex programming conventions, @command{gawk} internals
-By convention, for an @command{awk} function @code{foo}, the function that
-implements it is called @samp{do_foo}. The function should take
-a @samp{int} argument, usually called @code{nargs}, that
-represents the number of defined arguments for the function. The @code{newdir}
-variable represents the new directory to change to, retrieved
-with @code{get_scalar_argument()}. Note that the first argument is
-numbered zero.
-
-This code actually accomplishes the @code{chdir()}. It first forces
-the argument to be a string and passes the string value to the
-@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO}
-is updated.
-
-@example
- (void) force_string(newdir);
- ret = chdir(newdir->stptr);
- if (ret < 0)
- update_ERRNO_int(errno);
-@end example
-
-Finally, the function returns the return value to the @command{awk} level:
-
-@example
- return make_number((AWKNUM) ret);
-@}
-@end example
-
-The @code{stat()} built-in is more involved. First comes a function
-that turns a numeric mode into a printable representation
-(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity:
-
-@c break line for page breaking
-@example
-/* format_mode --- turn a stat mode field into something readable */
-
-static char *
-format_mode(unsigned long fmode)
-@{
- @dots{}
-@}
-@end example
-
-Next comes the @code{do_stat()} function. It starts with
-variable declarations and argument checking:
-
-@ignore
-Changed message for page breaking. Used to be:
- "stat: called with incorrect number of arguments (%d), should be 2",
-@end ignore
-@example
-/* do_stat --- provide a stat() function for gawk */
-
-static NODE *
-do_stat(int nargs)
-@{
- NODE *file, *array, *tmp;
- struct stat sbuf;
- int ret;
- NODE **aptr;
- char *pmode; /* printable mode */
- char *type = "unknown";
-
- if (do_lint && nargs > 2)
- lintwarn("stat: called with too many arguments");
-@end example
-
-Then comes the actual work. First, the function gets the arguments.
-Then, it always clears the array.
-The code use @code{lstat()} (instead of @code{stat()})
-to get the file information,
-in case the file is a symbolic link.
-If there's an error, it sets @code{ERRNO} and returns:
-
-@c comment made multiline for page breaking
-@example
- /* file is first arg, array to hold results is second */
- file = get_scalar_argument(0, FALSE);
- array = get_array_argument(1, FALSE);
-
- /* empty out the array */
- assoc_clear(array);
-
- /* lstat the file, if error, set ERRNO and return */
- (void) force_string(file);
- ret = lstat(file->stptr, & sbuf);
- if (ret < 0) @{
- update_ERRNO_int(errno);
- return make_number((AWKNUM) ret);
- @}
-@end example
-
-Now comes the tedious part: filling in the array. Only a few of the
-calls are shown here, since they all follow the same pattern:
-
-@example
- /* fill in the array */
- aptr = assoc_lookup(array, tmp = make_string("name", 4));
- *aptr = dupnode(file);
- unref(tmp);
-
- aptr = assoc_lookup(array, tmp = make_string("mode", 4));
- *aptr = make_number((AWKNUM) sbuf.st_mode);
- unref(tmp);
-
- aptr = assoc_lookup(array, tmp = make_string("pmode", 5));
- pmode = format_mode(sbuf.st_mode);
- *aptr = make_string(pmode, strlen(pmode));
- unref(tmp);
-@end example
-
-When done, return the @code{lstat()} return value:
-
-@example
-
- return make_number((AWKNUM) ret);
-@}
-@end example
-
-@cindex programming conventions, @command{gawk} internals
-Finally, it's necessary to provide the ``glue'' that loads the
-new function(s) into @command{gawk}. By convention, each library has
-a routine named @code{dl_load()} that does the job. The simplest way
-is to use the @code{dl_load_func} macro in @code{gawkapi.h}.
-
-And that's it! As an exercise, consider adding functions to
-implement system calls such as @code{chown()}, @code{chmod()},
-and @code{umask()}.
-
-@node Using Internal File Ops
-@appendixsubsubsec Integrating the Extensions
-
-@cindex @command{gawk}, interpreter@comma{} adding code to
-Now that the code is written, it must be possible to add it at
-runtime to the running @command{gawk} interpreter. First, the
-code must be compiled. Assuming that the functions are in
-a file named @file{filefuncs.c}, and @var{idir} is the location
-of the @command{gawk} include files,
-the following steps create
-a GNU/Linux shared library:
-
-@example
-$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c}
-$ @kbd{ld -o filefuncs.so -shared filefuncs.o}
-@end example
-
-@cindex @code{extension()} function (@command{gawk})
-Once the library exists, it is loaded by calling the @code{extension()}
-built-in function.
-This function takes two arguments: the name of the
-library to load and the name of a function to call when the library
-is first loaded. This function adds the new functions to @command{gawk}.
-It returns the value returned by the initialization function
-within the shared library:
-
-@example
-# file testff.awk
-BEGIN @{
- extension("./filefuncs.so", "dl_load")
-
- chdir(".") # no-op
-
- data[1] = 1 # force `data' to be an array
- print "Info for testff.awk"
- ret = stat("testff.awk", data)
- print "ret =", ret
- for (i in data)
- printf "data[\"%s\"] = %s\n", i, data[i]
- print "testff.awk modified:",
- strftime("%m %d %y %H:%M:%S", data["mtime"])
-
- print "\nInfo for JUNK"
- ret = stat("JUNK", data)
- print "ret =", ret
- for (i in data)
- printf "data[\"%s\"] = %s\n", i, data[i]
- print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"])
-@}
-@end example
-
-Here are the results of running the program:
-
-@example
-$ @kbd{gawk -f testff.awk}
-@print{} Info for testff.awk
-@print{} ret = 0
-@print{} data["size"] = 607
-@print{} data["ino"] = 14945891
-@print{} data["name"] = testff.awk
-@print{} data["pmode"] = -rw-rw-r--
-@print{} data["nlink"] = 1
-@print{} data["atime"] = 1293993369
-@print{} data["mtime"] = 1288520752
-@print{} data["mode"] = 33204
-@print{} data["blksize"] = 4096
-@print{} data["dev"] = 2054
-@print{} data["type"] = file
-@print{} data["gid"] = 500
-@print{} data["uid"] = 500
-@print{} data["blocks"] = 8
-@print{} data["ctime"] = 1290113572
-@print{} testff.awk modified: 10 31 10 12:25:52
-@print{}
-@print{} Info for JUNK
-@print{} ret = -1
-@print{} JUNK modified: 01 01 70 02:00:00
-@end example
-@c ENDOFRANGE filre
-@c ENDOFRANGE dirch
-@c ENDOFRANGE statg
-@c ENDOFRANGE chdirg
-@c ENDOFRANGE gladfgaw
-@c ENDOFRANGE adfugaw
-@c ENDOFRANGE fubadgaw
-
@node Future Extensions
@appendixsec Probable Future Extensions
@ignore
@@ -31055,12 +30686,8 @@ Following is a list of probable future changes visible at the
@c these are ordered by likelihood
@table @asis
-@item Loadable module interface
-It is not clear that the @command{awk}-level interface to the
-modules facility is as good as it should be. The interface needs to be
-redesigned, particularly taking namespace issues into account, as
-well as possibly including issues such as library search path order
-and versioning.
+@item Databases
+It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array.
@item @code{RECLEN} variable for fixed-length records
Along with @code{FIELDWIDTHS}, this would speed up the processing of
@@ -31068,9 +30695,6 @@ fixed-length records.
@code{PROCINFO["RS"]} would be @code{"RS"} or @code{"RECLEN"},
depending upon which kind of record processing is in effect.
-@item Databases
-It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array.
-
@item More @code{lint} warnings
There are more things that could be checked for portability.
@end table
@@ -31079,21 +30703,6 @@ Following is a list of probable improvements that will make @command{gawk}'s
source code easier to work with:
@table @asis
-@item Loadable module mechanics
-The current extension mechanism works
-(@pxref{Dynamic Extensions}),
-but is rather primitive. It requires a fair amount of manual work
-to create and integrate a loadable module.
-Nor is the current mechanism as portable as might be desired.
-The GNU @command{libtool} package provides a number of features that
-would make using loadable modules much easier.
-@command{gawk} should be changed to use @command{libtool}.
-
-@item Loadable module internals
-The API to its internals that @command{gawk} ``exports'' should be revised.
-Too many things are needlessly exposed. A new API should be designed
-and implemented to make module writing easier.
-
@item Better array subscript management
@command{gawk}'s management of array subscript storage could use revamping,
so that using the same value to index multiple arrays only