Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 3828 |
1 files changed, 2511 insertions, 1317 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index f0f74fef..2da2a246 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -88,9 +88,11 @@ @c some special symbols @iftex @set LEQ @math{@leq} +@set PI @math{@pi} @end iftex @ifnottex @set LEQ <= +@set PI @i{pi} @end ifnottex @ifnottex @@ -299,13 +301,17 @@ particular records in a file and perform operations upon them. * Library Functions:: A Library of @command{awk} Functions. * Sample Programs:: Many @command{awk} programs with complete explanations. -* Debugger:: The @code{dgawk} debugger. +* Debugger:: The @code{gawk} debugger. +* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with + @command{gawk}. +* Dynamic Extensions:: Adding new built-in functions to + @command{gawk}. * Language History:: The evolution of the @command{awk} language. * Installation:: Installing @command{gawk} under various operating systems. -* Notes:: Notes about @command{gawk} extensions and - possible future work. +* Notes:: Notes about adding things to @command{gawk} + and possible future work. * Basic Concepts:: A very quick introduction to programming concepts. * Glossary:: An explanation of some unfamiliar terms. @@ -360,9 +366,12 @@ particular records in a file and perform operations upon them. uses. * AWKPATH Variable:: Searching directories for @command{awk} programs. +* AWKLIBPATH Variable:: Searching directories for @command{awk} + shared libraries. * Other Environment Variables:: The environment variables. * Exit Status:: @command{gawk}'s exit status. * Include Files:: Including other files into your program. +* Loading Shared Libraries:: Loading shared libraries into your program. * Obsolete:: Obsolete Options and/or features. * Undocumented:: Undocumented Options and Features. * Regexp Usage:: How to Use Regular Expressions. @@ -402,6 +411,7 @@ particular records in a file and perform operations upon them. * Getline Notes:: Important things to know about @code{getline}. 
* Getline Summary:: Summary of @code{getline} Variants. +* Read Timeout:: Reading input with a timeout. * Command line directories:: What happens if you put a directory on the command line. * Print:: The @code{print} statement. @@ -583,7 +593,7 @@ particular records in a file and perform operations upon them. * Ordinal Functions:: Functions for using characters as numbers and vice versa. * Join Function:: A function to join an array into a string. -* Gettimeofday Function:: A function to get formatted times. +* Getlocaltime Function:: A function to get formatted times. * Data File Management:: Functions for managing command-line data files. * Filetrans Function:: A function for handling data file @@ -623,23 +633,51 @@ particular records in a file and perform operations upon them. * Anagram Program:: Finding anagrams from a dictionary. * Signature Program:: People do amazing things with too much time on their hands. -* Debugging:: Introduction to @command{dgawk}. -* Debugging Concepts:: Debugging In General. +* Debugging:: Introduction to @command{gawk} debugger. +* Debugging Concepts:: Debugging in General. * Debugging Terms:: Additional Debugging Concepts. * Awk Debugging:: Awk Debugging. -* Sample dgawk session:: Sample @command{dgawk} session. -* dgawk invocation:: @command{dgawk} Invocation. -* Finding The Bug:: Finding The Bug. -* List of Debugger Commands:: Main @command{dgawk} Commands. -* Breakpoint Control:: Control of breakpoints. -* Dgawk Execution Control:: Control of execution. -* Viewing And Changing Data:: Viewing and changing data. -* Dgawk Stack:: Dealing with the stack. -* Dgawk Info:: Obtaining information about the program and - the debugger state. -* Miscellaneous Dgawk Commands:: Miscellaneous Commands. -* Readline Support:: Readline Support. -* Dgawk Limitations:: Limitations and future plans. +* Sample Debugging Session:: Sample debugging session. +* Debugger Invocation:: How to Start the Debugger. +* Finding The Bug:: Finding the Bug. 
+* List of Debugger Commands:: Main debugger commands. +* Breakpoint Control:: Control of Breakpoints. +* Debugger Execution Control:: Control of Execution. +* Viewing And Changing Data:: Viewing and Changing Data. +* Execution Stack:: Dealing with the Stack. +* Debugger Info:: Obtaining Information about the Program and + the Debugger State. +* Miscellaneous Debugger Commands:: Miscellaneous Commands. +* Readline Support:: Readline support. +* Limitations:: Limitations and future plans. +* General Arithmetic:: An introduction to computer arithmetic. +* Floating Point Issues:: Stuff to know about floating-point numbers. +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not Abstract + Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice. +* Integer Programming:: Effective integer programming. +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +* Gawk and MPFR:: How @command{gawk} provides + arbitrary-precision arithmetic. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with @command{gawk}. +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point + numbers. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with + @command{gawk}. +* Plugin License:: A note about licensing. +* Sample Library:: A example of new functions. +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. 
* V7/SVR3.1:: The major changes between V7 and System V Release 3.1. * SVR4:: Minor changes between System V Releases 3.1 @@ -690,24 +728,12 @@ particular records in a file and perform operations upon them. @command{gawk}. * New Ports:: Porting @command{gawk} to a new operating system. -* Dynamic Extensions:: Adding new built-in functions to - @command{gawk}. -* Internals:: A brief look at some @command{gawk} - internals. -* Plugin License:: A note about licensing. -* Sample Library:: A example of new functions. -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. +* Derived Files:: Why derived files are kept in the + @command{git} repository. * Future Extensions:: New features that may be implemented one day. * Basic High Level:: The high level view. * Basic Data Typing:: A very quick intro to data types. -* Floating Point Issues:: Stuff to know about floating-point numbers. -* String Conversion Precision:: The String Value Can Lie. -* Unexpected Results:: Floating Point Numbers Are Not Abstract - Numbers. -* POSIX Floating Point Problems:: Standards Versus Existing Practice. @end detailmenu @end menu @@ -1173,8 +1199,14 @@ provide many sample @command{awk} programs. Reading them allows you to see @command{awk} solving real problems. -@ref{Debugger}, describes the @command{awk} debugger, -@command{dgawk}. +@ref{Debugger}, describes the @command{awk} debugger. + +@ref{Arbitrary Precision Arithmetic}, +describes advanced arithmetic facilities provided by +@command{gawk}. + +@ref{Dynamic Extensions}, describes how to add new variables and +functions to @command{gawk} by writing extensions in C. @ref{Language History}, describes how the @command{awk} language has evolved since @@ -1192,8 +1224,7 @@ available @command{awk} implementations. 
@ref{Notes}, describes how to disable @command{gawk}'s extensions, as well as how to contribute new code to @command{gawk}, -how to write extension libraries, and some possible -future directions for @command{gawk} development. +and some possible future directions for @command{gawk} development. @ref{Basic Concepts}, provides some very cursory background material for those who @@ -1603,10 +1634,13 @@ has been and continues to be a pleasure working with this team of fine people. John Haque contributed the modifications to convert @command{gawk} -into a byte-code interpreter, including the debugger. Stephen Davies +into a byte-code interpreter, including the debugger, and the +additional modifications for support of arbitrary precision arithmetic. +Stephen Davies contributed to the effort to bring the byte-code changes into the mainstream code base. Efraim Yawitz contributed the initial text of @ref{Debugger}. +John Haque contributed the initial text of @ref{Arbitrary Precision Arithmetic}. @cindex Kernighan, Brian I would like to thank Brian Kernighan for invaluable assistance during the @@ -2913,6 +2947,7 @@ things in this @value{CHAPTER} that don't interest you right now. * Environment Variables:: The environment variables @command{gawk} uses. * Exit Status:: @command{gawk}'s exit status. * Include Files:: Including other files into your program. +* Loading Shared Libraries:: Loading shared libraries into your program. * Obsolete:: Obsolete Options and/or features. * Undocumented:: Undocumented Options and Features. @end menu @@ -3004,6 +3039,22 @@ This option may be given multiple times; the @command{awk} program consists of the concatenation the contents of each specified @var{source-file}. +@item -i @var{source-file} +@itemx --include @var{source-file} +@cindex @code{-i} option +@cindex @code{--include} option +@cindex @command{awk} programs, location of +Read @command{awk} source library from @var{source-file}. 
This option is +completely equivalent to using the @samp{@@include} directive inside +your program. This option is very +similar to the @option{-f} option, but there are two important differences. +First, when @option{-i} is used, the program source will not be loaded if it has +been previously loaded, whereas the @option{-f} will always load the file. +Second, because this option is intended to be used with code libraries, the +@command{awk} command does not recognize such files as constituting main program +input. Thus, after processing an @option{-i} argument, we still expect to +find the main source code via the @option{-f} option or on the command-line. + @item -v @var{var}=@var{val} @itemx --assign @var{var}=@var{val} @cindex @code{-v} option @@ -3115,6 +3166,19 @@ inadvertently use global variables that you meant to be local. (This is a particularly easy mistake to make with simple variable names like @code{i}, @code{j}, etc.) +@item -D@r{[}@var{file}@r{]} +@itemx --debug=@r{[}@var{file}@r{]} +@cindex @code{-D} option +@cindex @code{--debug} option +@cindex @command{awk} debugging, enabling +Enable debugging of @command{awk} programs +(@pxref{Debugging}). +By default, the debugger reads commands interactively from the terminal. +The optional @var{file} argument allows you to specify a file with a list +of commands for the debugger to execute non-interactively. +No space is allowed between the @option{-D} and @var{file}, if +@var{file} is supplied. + @item -e @var{program-text} @itemx --source @var{program-text} @cindex @code{-e} option @@ -3180,6 +3244,18 @@ for information about this option. Print a ``usage'' message summarizing the short and long style options that @command{gawk} accepts and then exit. +@item -l @var{lib} +@itemx --load @var{lib} +@cindex @code{-l} option +@cindex @code{--load} option +@cindex loading, library +Load a shared library @var{lib}. This searches for the library using the @env{AWKLIBPATH} +environment variable. 
The correct library suffix for your platform will be +supplied by default, so it need not be specified in the library name. +The library initialization routine should be named @code{dl_load()}. +An alternative is to use the @samp{@@load} keyword inside the program to load +a shared library. + @item -L @r{[}value@r{]} @itemx --lint@r{[}=value@r{]} @cindex @code{-l} option @@ -3203,6 +3279,14 @@ when eliminating problems pointed out by @option{--lint}, you should take care to search for all occurrences of each inappropriate construct. As @command{awk} programs are usually short, doing so is not burdensome. +@item -M +@itemx --bignum +@cindex @code{-M} option +@cindex @code{--bignum} option +Force arbitrary precision arithmetic on numbers. This option has no effect +if @command{gawk} is not compiled to use the GNU MPFR and MP libraries +(@pxref{Arbitrary Precision Arithmetic}). + @item -n @itemx --non-decimal-data @cindex @code{-n} option @@ -3226,6 +3310,18 @@ Use with care. Force the use of the locale's decimal point character when parsing numeric input data (@pxref{Locales}). +@item -o@r{[}@var{file}@r{]} +@itemx --pretty-print@r{[}=@var{file}@r{]} +@cindex @code{-o} option +@cindex @code{--pretty-print} option +@cindex @command{awk} enabling +Enable pretty-printing of @command{awk} programs. +By default, output program is created in a file named @file{awkprof.out}. +The optional @var{file} argument allows you to specify a different +@value{FN} for the output. +No space is allowed between the @option{-o} and @var{file}, if +@var{file} is supplied. + @item -O @itemx --optimize @cindex @code{--optimize} option @@ -3238,7 +3334,7 @@ maintainer hopes to add more optimizations over time. @itemx --profile@r{[}=@var{file}@r{]} @cindex @code{-p} option @cindex @code{--profile} option -@cindex @command{awk} programs, profiling, enabling +@cindex @command{awk} profiling, enabling Enable profiling of @command{awk} programs (@pxref{Profiling}). 
By default, profiles are created in a file named @file{awkprof.out}. @@ -3247,10 +3343,8 @@ The optional @var{file} argument allows you to specify a different No space is allowed between the @option{-p} and @var{file}, if @var{file} is supplied. -When run with @command{gawk}, the profile is just a ``pretty printed'' version -of the program. When run with @command{pgawk}, the profile contains execution -counts for each statement in the program in the left margin, and function -call counts for each function. +The profile contains execution counts for each statement in the program +in the left margin, and function call counts for each function. @item -P @itemx --posix @@ -3314,14 +3408,6 @@ This is now @command{gawk}'s default behavior. Nevertheless, this option remains both for backward compatibility, and for use in combination with the @option{--traditional} option. -@item -R @var{file} -@itemx --command=@var{file} -@cindex @code{-R} option -@cindex @code{--command} option -@command{dgawk} only. -Read @command{dgawk} debugger options and commands from @var{file}. -@xref{Dgawk Info}, for more information. - @item -S @itemx --sandbox @cindex @code{-S} option @@ -3547,6 +3633,8 @@ behaves. @menu * AWKPATH Variable:: Searching directories for @command{awk} programs. +* AWKLIBPATH Variable:: Searching directories for @command{awk} shared + libraries. * Other Environment Variables:: The environment variables. @end menu @@ -3564,7 +3652,8 @@ on the command-line with the @option{-f} option. In most @command{awk} implementations, you must supply a precise path name for each program file, unless the file is in the current directory. -But in @command{gawk}, if the @value{FN} supplied to the @option{-f} option +But in @command{gawk}, if the @value{FN} supplied to the @option{-f} +or @option{-i} options does not contain a @samp{/}, then @command{gawk} searches a list of directories (called the @dfn{search path}), one by one, looking for a file with the specified name. 
@@ -3586,13 +3675,16 @@ standard directory in the default path and then specified on the command line with a short @value{FN}. Otherwise, the full @value{FN} would have to be typed for each file. -By using both the @option{--source} and @option{-f} options, your command-line +By using the @option{-i} option, or the @option{--source} and @option{-f} options, your command-line @command{awk} programs can use facilities in @command{awk} library files (@pxref{Library Functions}). Path searching is not done if @command{gawk} is in compatibility mode. This is true for both @option{--traditional} and @option{--posix}. @xref{Options}. +If the source code is not found after the initial search, the path is searched +again after adding the default @samp{.awk} suffix to the filename. + @quotation NOTE To include the current directory in the path, either place @@ -3622,6 +3714,21 @@ sense: the @env{AWKPATH} environment variable is used to find the program source files. Once your program is running, all the files have been found, and @command{gawk} no longer needs to use @env{AWKPATH}. +@node AWKLIBPATH Variable +@subsection The @env{AWKLIBPATH} Environment Variable +@cindex @env{AWKLIBPATH} environment variable +@cindex directories, searching +@cindex search paths +@cindex search paths, for shared libraries +@cindex differences in @command{awk} and @command{gawk}, @code{AWKLIBPATH} environment variable + +The @env{AWKLIBPATH} environment variable is similar to the @env{AWKPATH} +variable, but it is used to search for shared libraries specified +with the @option{-l} option rather than for source files. If the library +is not found, the path is searched again after adding the appropriate +shared library suffix for the platform. For example, on GNU/Linux systems, +the suffix @samp{.so} is used. + @node Other Environment Variables @subsection Other Environment Variables @@ -3645,6 +3752,11 @@ Specifies the interval between connection retries, in milliseconds. 
On systems that do not support the @code{usleep()} system call, the value is rounded up to an integral number of seconds. + +@item GAWK_READ_TIMEOUT +Specifies the time, in milliseconds, for @command{gawk} to +wait for input before returning with an error. +@xref{Read Timeout}. @end table The environment variables in the following list are meant @@ -3718,8 +3830,9 @@ into smaller, more manageable pieces, and also lets you reuse common @command{aw code from various @command{awk} scripts. In other words, you can group together @command{awk} functions, used to carry out specific tasks, into external files. These files can be used just like function libraries, -using the @samp{@@include} keyword in conjunction with the @code{AWKPATH} -environment variable. +using the @samp{@@include} keyword in conjunction with the @env{AWKPATH} +environment variable. Note that source files may also be included +using the @option{-i} option. Let's see an example. We'll start with two (trivial) @command{awk} scripts, namely @@ -3825,6 +3938,41 @@ As mentioned in @ref{AWKPATH Variable}, the current directory is always searched first for source files, before searching in @env{AWKPATH}, and this also applies to files named with @samp{@@include}. +@node Loading Shared Libraries +@section Loading Shared Libraries Into Your Program + +This @value{SECTION} describes a feature that is specific to @command{gawk}. + +The @samp{@@load} keyword can be used to read external @command{awk} shared +libraries. This allows you to link in compiled code that may offer superior +performance and/or give you access to extended capabilities not supported +by the @command{awk} language. The @env{AWKLIBPATH} variable is used to +search for the shared library. Using @samp{@@load} is completely equivalent +to using the @option{-l} command-line option. 
+ +If the shared library is not initially found in @env{AWKLIBPATH}, another +search is conducted after appending the platform's default shared library +suffix to the filename. For example, on GNU/Linux systems, the suffix +@samp{.so} is used. + +@example +$ @kbd{gawk '@@load "ordchr"; BEGIN @{print chr(65)@}'} +@print{} A +@end example + +@noindent +This is equivalent to the following example: + +@example +$ @kbd{gawk -lordchr 'BEGIN @{print chr(65)@}'} +@print{} A +@end example + +@noindent +For command-line usage, the @option{-l} option is more convenient, +but @samp{@@load} is useful for embedding inside an @command{awk} source file +that requires access to a shared library. + @node Obsolete @section Obsolete Options and/or Features @@ -5131,6 +5279,7 @@ used with it do not have to be named on the @command{awk} command line * Multiple Line:: Reading multi-line records. * Getline:: Reading files under explicit program control using the @code{getline} function. +* Read Timeout:: Reading input with a timeout. * Command line directories:: What happens if you put a directory on the command line. @end menu @@ -7224,6 +7373,7 @@ know that there is a string value to be assigned. Caveat Emptor. summarizes the eight variants of @code{getline}, listing which built-in variables are set by each one, and whether the variant is standard or a @command{gawk} extension. +Note: for each variant, @command{gawk} sets the @code{RT} built-in variable. @float Table,table-getline-variants @caption{getline Variants and What They Set} @@ -7243,6 +7393,110 @@ and whether the variant is standard or a @command{gawk} extension. @c ENDOFRANGE inex @c ENDOFRANGE infir +@node Read Timeout +@section Reading Input With A Timeout +@cindex timeout, reading input + +You may specify a timeout in milliseconds for reading input from a terminal, +pipe or two-way communication including, TCP/IP sockets. 
This can be done +on a per input, command or connection basis, by setting a special element +in the @code{PROCINFO} array: + +@example +PROCINFO["input_name", "READ_TIMEOUT"] = @var{timeout in milliseconds} +@end example + +When set, this will cause @command{gawk} to time out and return failure +if no data is available to read within the specified timeout period. +For example, a TCP client can decide to give up on receiving +any response from the server after a certain amount of time: + +@example +Service = "/inet/tcp/0/localhost/daytime" +PROCINFO[Service, "READ_TIMEOUT"] = 100 +if ((Service |& getline) > 0) + print $0 +else if (ERRNO != "") + print ERRNO +@end example + +Here is how to read interactively from the terminal@footnote{This assumes +that standard input is the keyboard} without waiting +for more than five seconds: + +@example +PROCINFO["/dev/stdin", "READ_TIMEOUT"] = 5000 +while ((getline < "/dev/stdin") > 0) + print $0 +@end example + +@command{gawk} will terminate the read operation if input does not +arrive after waiting for the timeout period, return failure +and set the @code{ERRNO} variable to an appropriate string value. +A negative or zero value for the timeout is the same as specifying +no timeout at all. + +A timeout can also be set for reading from the terminal in the implicit +loop that reads input records and matches them against patterns, +like so: + +@example +$ @kbd{ gawk 'BEGIN @{ PROCINFO["-", "READ_TIMEOUT"] = 5000 @}} +> @kbd{@{ print "You entered: " $0 @}'} +@kbd{gawk} +@print{} You entered: gawk +@end example + +In this case, failure to respond within five seconds results in the following +error message: + +@example +@error{} gawk: cmd. line:2: (FILENAME=- FNR=1) fatal: error reading input file `-': Connection timed out +@end example + +The timeout can be set or changed at any time, and will take effect on the +next attempt to read from the input device. 
In the following example, +we start with a timeout value of one second, and progressively +reduce it by one-tenth of a second until we wait indefinitely +for the input to arrive: + +@example +PROCINFO[Service, "READ_TIMEOUT"] = 1000 +while ((Service |& getline) > 0) @{ + print $0 + PROCINFO[S, "READ_TIMEOUT"] -= 100 +@} +@end example + +@quotation NOTE +You should not assume that the read operation will block +exactly after the tenth record has been printed. It is possible that +@command{gawk} will read and buffer more than one record's +worth of data the first time. Because of this, changing the value +of timeout like in the above example is not very useful. +@end quotation + +If the @code{PROCINFO} element is not present and the environment +variable @env{GAWK_READ_TIMEOUT} exists, +@command{gawk} uses its value to initialize the timeout value. +The exclusive use of the environment variable to specify timeout +has the disadvantage of not being able to control it +on a per command or connection basis. + +@command{gawk} considers a timeout event to be an error even though +the attempt to read from the underlying device may +succeed in a later attempt. This is a limitation, and it also +means that you cannot use this to multiplex input from +two or more sources. + +Assigning a timeout value prevents read operations from +blocking indefinitely. But bear in mind that there are other ways +@command{gawk} can stall waiting for an input device to be ready. +A network client can sometimes take a long time to establish +a connection before it can start reading any data, +or the attempt to open a FIFO special file for reading can block +indefinitely until some other process opens it for writing. + @node Command line directories @section Directories On The Command Line @cindex directories, command line @@ -11361,9 +11615,9 @@ fatal error. 
@item If you have written extensions that modify the record handling (by inserting -an ``open hook''), you can invoke them at this point, before @command{gawk} +an ``input parser''), you can invoke them at this point, before @command{gawk} has started processing the file. (This is a @emph{very} advanced feature, -currently used only by the @uref{http://xmlgawk.sourceforge.net, XMLgawk project}.) +currently used only by the @uref{http://gawkextlib.sourceforge.net, @code{gawkextlib} project}.) @end itemize The @code{ENDFILE} rule is called when @command{gawk} has finished processing @@ -12525,6 +12779,18 @@ This is the output record separator. It is output at the end of every @code{print} statement. Its default value is @code{"\n"}, the newline character. (@xref{Output Separators}.) +@cindex @code{PREC} variable +@item PREC # +The working precision of arbitrary precision floating-point numbers, +53 by default (@pxref{Setting Precision}). + +@cindex @code{ROUNDMODE} variable +@item ROUNDMODE # +The rounding mode to use for arbitrary precision arithmetic on +numbers, by default @code{"N"} (@samp{roundTiesToEven} in +the IEEE-754 standard) +(@pxref{Setting Rounding Mode}). + @cindex @code{RS} variable @cindex separators, for records @cindex record separators @@ -12671,7 +12937,9 @@ does not affect the environment passed on to any programs that Some operating systems may not have environment variables. On such systems, the @code{ENVIRON} array is empty (except for @w{@code{ENVIRON["AWKPATH"]}}, -@pxref{AWKPATH Variable}). +@pxref{AWKPATH Variable} and +@w{@code{ENVIRON["AWKLIBPATH"]}}, +@pxref{AWKLIBPATH Variable}). @cindex @command{gawk}, @code{ERRNO} variable in @cindex @code{ERRNO} variable @@ -12747,6 +13015,16 @@ assigning a value to @code{NF} has the potential to affect to @code{NF} can be used to create or remove fields from the current record. @xref{Changing Fields}. 
+@cindex @code{FUNCTAB} array +@cindex @command{gawk}, @code{FUNCTAB} array in +@cindex differences in @command{awk} and @command{gawk}, @code{FUNCTAB} variable +@item FUNCTAB # +An array whose indices are the names of all the user-defined +or extension functions in the program. +@strong{NOTE}: The array values cannot currently be used. +Also, you may not use the @code{delete} statement with the +@code{FUNCTAB} array. + @cindex @code{NR} variable @item NR The number of input records @command{awk} has processed since @@ -12776,6 +13054,34 @@ This is @code{"FIELDWIDTHS"} if field splitting with @code{FIELDWIDTHS} is in effect, or @code{"FPAT"} if field matching with @code{FPAT} is in effect. +@item PROCINFO["identifiers"] +A subarray, indexed by the names of all identifiers used in the +text of the AWK program. For each identifier, the value of the element is one of the following: + +@table @code +@item "array" +The identifier is an array. + +@item "extension" +The identifier is an extension function loaded via +@code{@@load}. + +@item "scalar" +The identifier is a scalar. + +@item "untyped" +The identifier is untyped (could be used as a scalar or array, +@command{gawk} doesn't know yet). + +@item "user" +The identifier is a user-defined function. +@end table + +@noindent +The values indicate what @command{gawk} knows about the identifiers +after it has finished parsing the program; they are @emph{not} updated +while the program runs. + @item PROCINFO["gid"] The value of the @code{getgid()} system call. @@ -12808,6 +13114,25 @@ The value of the @code{getuid()} system call. The version of @command{gawk}. @end table +The following additional elements in the array +are available to provide information about the MPFR and GMP libraries +if your version of @command{gawk} supports arbitrary precision numbers +(@pxref{Arbitrary Precision Arithmetic}): + +@table @code +@item PROCINFO["mpfr_version"] +The version of the GNU MPFR library. 
+ +@item PROCINFO["gmp_version"] +The version of the GNU MP library. + +@item PROCINFO["prec_max"] +The maximum precision supported by MPFR. + +@item PROCINFO["prec_min"] +The minimum precision required by MPFR. +@end table + On some systems, there may be elements in the array, @code{"group1"} through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of supplementary groups that the process has. Use the @code{in} operator @@ -12855,6 +13180,57 @@ In other @command{awk} implementations, or if @command{gawk} is in compatibility mode (@pxref{Options}), it is not special. + +@cindex @command{gawk}, @code{SYMTAB} array in +@cindex @code{SYMTAB} array +@cindex differences in @command{awk} and @command{gawk}, @code{SYMTAB} variable +@item SYMTAB # +An array whose indices are the names of all currently defined +global variables and arrays in the program. The array may be used +for indirect access to read or write the value of a variable: + +@example +foo = 5 +SYMTAB["foo"] = 4 +print foo # prints 4 +@end example + +@noindent +The @code{isarray()} function (@pxref{Type Functions}) may be used to test +if an element in @code{SYMTAB} is an array. +Also, you may not use the @code{delete} statement with the +@code{SYMTAB} array. + +You may use an index for @code{SYMTAB} that is not a predefined identifer: + +@example +SYMTAB["xxx"] = 5 +print SYMTAB["xxx"] +@end example + +@noindent +This works as expected: in this case @code{SYMTAB} acts just like +a regular array. The only difference is that you can't then delete +@code{SYMTAB["xxx"]}. + +The @code{SYMTAB} array is more interesting than it looks. Andrew Schorr +points out that it effectively gives @command{awk} data pointers. 
Consider his +example: + +@example +# Indirect multiply of any variable by amount, return result + +function multiply(variable, amount) +@{ + return SYMTAB[variable] *= amount +@} +@end example + +@quotation NOTE +In order to avoid severe time-travel paradoxes@footnote{Not to mention difficult +implementation issues.}, neither @code{FUNCTAB} nor @code{SYMTAB} +are available as elements within the @code{SYMTAB} array. +@end quotation @end table @c ENDOFRANGE bvconi @c ENDOFRANGE vbconi @@ -16253,8 +16629,8 @@ bitwise operations just described. They are: @cindex @command{gawk}, bitwise operations in @table @code @cindex @code{and()} function (@command{gawk}) -@item and(@var{v1}, @var{v2}) -Return the bitwise AND of the values provided by @var{v1} and @var{v2}. +@item and(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) +Return the bitwise AND of the arguments. There must be at least two. @cindex @code{compl()} function (@command{gawk}) @item compl(@var{val}) @@ -16265,16 +16641,16 @@ Return the bitwise complement of @var{val}. Return the value of @var{val}, shifted left by @var{count} bits. @cindex @code{or()} function (@command{gawk}) -@item or(@var{v1}, @var{v2}) -Return the bitwise OR of the values provided by @var{v1} and @var{v2}. +@item or(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) +Return the bitwise OR of the arguments. There must be at least two. @cindex @code{rshift()} function (@command{gawk}) @item rshift(@var{val}, @var{count}) Return the value of @var{val}, shifted right by @var{count} bits. @cindex @code{xor()} function (@command{gawk}) -@item xor(@var{v1}, @var{v2}) -Return the bitwise XOR of the values provided by @var{v1} and @var{v2}. +@item xor(@var{v1}, @var{v2} @r{[}, @r{@dots{}]}) +Return the bitwise XOR of the arguments. There must be at least two. 
@end table For all of these functions, first the double precision floating-point value is @@ -18537,7 +18913,7 @@ Running the program produces the following output: @example -$ @kbd{gawk -vPOS=1 -F: -f sort.awk /etc/passwd} +$ @kbd{gawk -v POS=1 -F: -f sort.awk /etc/passwd} @print{} adm:x:3:4:adm:/var/adm:/sbin/nologin @print{} apache:x:48:48:Apache:/var/www:/sbin/nologin @print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin @@ -19029,40 +19405,32 @@ extensive examples. @cindex @command{awk} programs, profiling @c STARTOFRANGE proawk @cindex profiling @command{awk} programs -@c STARTOFRANGE pgawk -@cindex @command{pgawk} program -@cindex profiling @command{gawk}, See @command{pgawk} program - -You may produce execution -traces of your @command{awk} programs. -This is done with a specially compiled version of @command{gawk}, -called @command{pgawk} (``profiling @command{gawk}''). - +@cindex profiling @command{gawk} @cindex @code{awkprof.out} file @cindex files, @code{awkprof.out} -@cindex @command{pgawk} program, @code{awkprof.out} file -@command{pgawk} is identical in every way to @command{gawk}, except that when -it has finished running, it creates a profile of your program in a file -named @file{awkprof.out}. -Because it is profiling, it also executes up to 45% slower than + +You may produce execution traces of your @command{awk} programs. +This is done by passing the option @option{--profile} to @command{gawk}. +When @command{gawk} has finished running, it creates a profile of your program in a file +named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than @command{gawk} normally does. 
@cindex @code{--profile} option As shown in the following example, the @option{--profile} option can be used to change the name of the file -where @command{pgawk} will write the profile: +where @command{gawk} will write the profile: @example -pgawk --profile=myprog.prof -f myprog.awk data1 data2 +gawk --profile=myprog.prof -f myprog.awk data1 data2 @end example @noindent -In the above example, @command{pgawk} places the profile in +In the above example, @command{gawk} places the profile in @file{myprog.prof} instead of in @file{awkprof.out}. -Here is a sample -session showing a simple @command{awk} program, its input data, and the -results from running @command{pgawk}. First, the @command{awk} program: +Here is a sample session showing a simple @command{awk} program, its input data, and the +results from running @command{gawk} with the @option{--profile} option. +First, the @command{awk} program: @example BEGIN @{ print "First BEGIN rule" @} @@ -19102,12 +19470,12 @@ foo junk @end example -Here is the @file{awkprof.out} that results from running @command{pgawk} -on this program and data (this example also illustrates that @command{awk} +Here is the @file{awkprof.out} that results from running the @command{gawk} +profiler on this program and data (this example also illustrates that @command{awk} programmers sometimes have to work late): -@cindex @code{BEGIN} pattern, @command{pgawk} program -@cindex @code{END} pattern, @command{pgawk} program +@cindex @code{BEGIN} pattern +@cindex @code{END} pattern @example # gawk profile, created Sun Aug 13 00:00:15 2000 @@ -19199,15 +19567,15 @@ keyword indicates how many times the function was called. The counts next to the statements in the body show how many times those statements were executed. -@cindex @code{@{@}} (braces), @command{pgawk} program -@cindex braces (@code{@{@}}), @command{pgawk} program +@cindex @code{@{@}} (braces) +@cindex braces (@code{@{@}}) @item The layout uses ``K&R'' style with TABs. 
Braces are used everywhere, even when the body of an @code{if}, @code{else}, or loop is only a single statement. -@cindex @code{()} (parentheses), @command{pgawk} program -@cindex parentheses @code{()}, @command{pgawk} program +@cindex @code{()} (parentheses) +@cindex parentheses @code{()} @item Parentheses are used only where needed, as indicated by the structure of the program and the precedence rules. @@ -19230,16 +19598,16 @@ Similarly, if the target of a redirection isn't a scalar, it gets parenthesized. @item -@command{pgawk} supplies leading comments in +@command{gawk} supplies leading comments in front of the @code{BEGIN} and @code{END} rules, the pattern/action rules, and the functions. @end itemize The profiled version of your program may not look exactly like what you -typed when you wrote it. This is because @command{pgawk} creates the +typed when you wrote it. This is because @command{gawk} creates the profiled version by ``pretty printing'' its internal representation of -the program. The advantage to this is that @command{pgawk} can produce +the program. The advantage to this is that @command{gawk} can produce a standard representation. The disadvantage is that all source-code comments are lost, as are the distinctions among multiple @code{BEGIN}, @code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules. Also, things such as: @@ -19261,15 +19629,16 @@ come out as: which is correct, but possibly surprising. @cindex profiling @command{awk} programs, dynamically -@cindex @command{pgawk} program, dynamic profiling +@cindex @command{gawk} program, dynamic profiling Besides creating profiles when a program has completed, -@command{pgawk} can produce a profile while it is running. +@command{gawk} can produce a profile while it is running. This is useful if your @command{awk} program goes into an infinite loop and you want to see what has been executed. 
-To use this feature, run @command{pgawk} in the background: +To use this feature, run @command{gawk} with the @option{--profile} +option in the background: @example -$ @kbd{pgawk -f myprog &} +$ @kbd{gawk --profile -f myprog &} [1] 13992 @end example @@ -19280,7 +19649,7 @@ $ @kbd{pgawk -f myprog &} @noindent The shell prints a job number and process ID number; in this case, 13992. Use the @command{kill} command to send the @code{USR1} signal -to @command{pgawk}: +to @command{gawk}: @example $ @kbd{kill -USR1 13992} @@ -19288,8 +19657,8 @@ $ @kbd{kill -USR1 13992} @noindent As usual, the profiled version of the program is written to -@file{awkprof.out}, or to a different file if you use the @option{--profile} -option. +@file{awkprof.out}, or to a different file if one is specified with +the @option{--profile} option. Along with the regular profile, as shown earlier, the profile includes a trace of any active functions: @@ -19303,7 +19672,7 @@ includes a trace of any active functions: # -- main -- @end example -You may send @command{pgawk} the @code{USR1} signal as many times as you like. +You may send @command{gawk} the @code{USR1} signal as many times as you like. Each time, the profile and function call trace are appended to the output profile file. @@ -19311,7 +19680,7 @@ profile file. @cindex @code{SIGHUP} signal @cindex signals, @code{HUP}/@code{SIGHUP} If you use the @code{HUP} signal instead of the @code{USR1} signal, -@command{pgawk} produces the profile and the function call trace and then exits. +@command{gawk} produces the profile and the function call trace and then exits. 
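The signal-driven workflow described above can be tried end to end with a short shell script. The following is only a sketch under stated assumptions: it assumes a POSIX shell, that a @command{gawk} with the @option{--profile} option is on the search path, and that the @code{USR1} and @code{HUP} signals behave as documented here. The file names @file{loop.awk} and @file{loop.prof} and the function @code{spin()} are made up for illustration.

```shell
#!/bin/sh
# Sketch: ask a running gawk for a profile by sending it signals.
# Assumptions: POSIX shell, gawk with --profile on PATH.
# loop.awk, loop.prof, and spin() are hypothetical names.

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# A program that never finishes on its own, standing in for a
# script that is stuck in an infinite loop.
cat > loop.awk <<'EOF'
function spin(n) { return n + 1 }
BEGIN { while (1) i = spin(i) }
EOF

# Run gawk with profiling enabled, in the background.
gawk --profile=loop.prof -f loop.awk &
pid=$!
sleep 1                 # give gawk time to start executing

kill -USR1 "$pid"       # write a profile, keep running
sleep 1
kill -HUP "$pid"        # write a profile again, then exit
wait "$pid"             # reap the background job; its status is ignored

if [ -s loop.prof ]; then
    echo "profile written: $workdir/loop.prof"
else
    echo "no profile produced" >&2
fi
```

Because each signal appends to the same profile file, @file{loop.prof} should end up holding more than one dump, which makes it possible to compare the execution counts taken at different moments.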
@cindex @code{INT} signal (MS-Windows) @cindex @code{SIGINT} signal (MS-Windows) @@ -19319,21 +19688,20 @@ If you use the @code{HUP} signal instead of the @code{USR1} signal, @cindex @code{QUIT} signal (MS-Windows) @cindex @code{SIGQUIT} signal (MS-Windows) @cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows) -When @command{pgawk} runs on MS-Windows systems, it uses the +When @command{gawk} runs on MS-Windows systems, it uses the @code{INT} and @code{QUIT} signals for producing the profile and, in -the case of the @code{INT} signal, @command{pgawk} exits. This is +the case of the @code{INT} signal, @command{gawk} exits. This is because these systems don't support the @command{kill} command, so the only signals you can deliver to a program are those generated by the keyboard. The @code{INT} signal is generated by the @kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the @code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key. -Finally, regular @command{gawk} also accepts the @option{--profile} option. +Finally, @command{gawk} also accepts another option @option{--pretty-print}. When called this way, @command{gawk} ``pretty prints'' the program into @file{awkprof.out}, without any execution counts. @c ENDOFRANGE advgaw @c ENDOFRANGE gawadv -@c ENDOFRANGE pgawk @c ENDOFRANGE awkp @c ENDOFRANGE proawk @@ -19539,7 +19907,7 @@ programming use. * Ordinal Functions:: Functions for using characters as numbers and vice versa. * Join Function:: A function to join an array into a string. -* Gettimeofday Function:: A function to get formatted times. +* Getlocaltime Function:: A function to get formatted times. @end menu @node Strtonum Function @@ -20064,7 +20432,7 @@ be nice if @command{awk} had an assignment operator for concatenation. 
The lack of an explicit operator for concatenation makes string operations more difficult than they really need to be.} -@node Gettimeofday Function +@node Getlocaltime Function @subsection Managing the Time of Day @cindex libraries of @command{awk} functions, managing, time @@ -20078,14 +20446,14 @@ in human readable form. While @code{strftime()} is extensive, the control formats are not necessarily easy to remember or intuitively obvious when reading a program. -The following function, @code{gettimeofday()}, populates a user-supplied array +The following function, @code{getlocaltime()}, populates a user-supplied array with preformatted time information. It returns a string with the current time formatted in the same way as the @command{date} utility: -@cindex @code{gettimeofday()} user-defined function +@cindex @code{getlocaltime()} user-defined function @example @c file eg/lib/gettime.awk -# gettimeofday.awk --- get the time of day in a usable format +# getlocaltime.awk --- get the time of day in a usable format @c endfile @ignore @c file eg/lib/gettime.awk @@ -20118,7 +20486,7 @@ time formatted in the same way as the @command{date} utility: # time["weeknum"] -- week number, Sunday first day # time["altweeknum"] -- week number, Monday first day -function gettimeofday(time, ret, now, i) +function getlocaltime(time, ret, now, i) @{ # get time once, avoids unnecessary system calls now = systime() @@ -20160,7 +20528,7 @@ The string indices are easier to use and read than the various formats required by @code{strftime()}. The @code{alarm} program presented in @ref{Alarm Program}, uses this function. -A more general design for the @code{gettimeofday()} function would have +A more general design for the @code{getlocaltime()} function would have allowed the user to supply an optional timestamp value to use instead of the current time. @@ -23452,8 +23820,8 @@ it prints the message on the standard output. 
In addition, you can give it the number of times to repeat the message as well as a delay between repetitions. -This program uses the @code{gettimeofday()} function from -@ref{Gettimeofday Function}. +This program uses the @code{getlocaltime()} function from +@ref{Getlocaltime Function}. All the work is done in the @code{BEGIN} rule. The first part is argument checking and setting of defaults: the delay, the count, and the message to @@ -23472,7 +23840,7 @@ Here is the program: @c file eg/prog/alarm.awk # alarm.awk --- set an alarm # -# Requires gettimeofday() library function +# Requires getlocaltime() library function @c endfile @ignore @c file eg/prog/alarm.awk @@ -23544,7 +23912,7 @@ is how long to wait before setting off the alarm: minute = atime[2] + 0 # force numeric # get current broken down time - gettimeofday(now) + getlocaltime(now) # if time given is 12-hour hours and it's after that # hour, e.g., `alarm 5:30' at 9 a.m. means 5:30 p.m., @@ -25215,41 +25583,41 @@ BEGIN { @c FIXME: Add more indexing. @node Debugger -@chapter @command{dgawk}: The @command{awk} Debugger -@cindex @command{dgawk} +@chapter Debugging @command{awk} Programs +@cindex debugging @command{awk} programs It would be nice if computer programs worked perfectly the first time they were run, but in real life, this rarely happens for programs of any complexity. Thus, most programming languages have facilities available for ``debugging'' programs, and now @command{awk} is no exception. -The @command{dgawk} debugger is purposely modeled after +The @command{gawk} debugger is purposely modeled after @uref{http://www.gnu.org/software/gdb/, the GNU Debugger (GDB)} command-line debugger. If you are familiar with GDB, learning -@command{dgawk} is easy. +how to use @command{gawk} for debugging your program is easy. @menu -* Debugging:: Introduction to @command{dgawk}. -* Sample dgawk session:: Sample @command{dgawk} session. -* List of Debugger Commands:: Main @command{dgawk} Commands. 
-* Readline Support:: Readline Support. -* Dgawk Limitations:: Limitations and future plans. +* Debugging:: Introduction to @command{gawk} debugger. +* Sample Debugging Session:: Sample debugging session. +* List of Debugger Commands:: Main debugger commands. +* Readline Support:: Readline support. +* Limitations:: Limitations and future plans. @end menu @node Debugging -@section Introduction to @command{dgawk} +@section Introduction to @command{gawk} Debugger This @value{SECTION} introduces debugging in general and begins the discussion of debugging in @command{gawk}. @menu -* Debugging Concepts:: Debugging In General. +* Debugging Concepts:: Debugging in General. * Debugging Terms:: Additional Debugging Concepts. * Awk Debugging:: Awk Debugging. @end menu @node Debugging Concepts -@subsection Debugging In General +@subsection Debugging in General (If you have used debuggers in other languages, you may want to skip ahead to the next section on the specific features of the @command{awk} @@ -25295,8 +25663,7 @@ functional program that you or someone else wrote). @subsection Additional Debugging Concepts Before diving in to the details, we need to introduce several -important concepts that apply to just about all debuggers, including -@command{dgawk}. +important concepts that apply to just about all debuggers. The following list defines terms used throughout the rest of this @value{CHAPTER}. @@ -25315,7 +25682,7 @@ that contains the function's parameters, local variables, and return value, as well as any other ``bookkeeping'' information needed to manage the call stack. This data area is termed a @dfn{stack frame}. -@command{gawk} also follows this model, and @command{dgawk} gives you +@command{gawk} also follows this model, and gives you access to the call stack and to each stack frame. You can see the call stack, as well as from where each function on the stack was invoked. 
Commands that print the call stack print information about @@ -25360,48 +25727,48 @@ each line of @command{awk} code. The debugger provides the opportunity to look at the individual primitive instructions carried out by the higher-level @command{awk} commands. -@node Sample dgawk session -@section Sample @command{dgawk} session +@node Sample Debugging Session +@section Sample Debugging Session -In order to illustrate the use of @command{dgawk}, let's look at a sample +In order to illustrate the use of @command{gawk} as a debugger, let's look at a sample debugging session. We will use the @command{awk} implementation of the POSIX @command{uniq} command described earlier (@pxref{Uniq Program}) as our example. @menu -* dgawk invocation:: @command{dgawk} Invocation. -* Finding The Bug:: Finding The Bug. +* Debugger Invocation:: How to Start the Debugger. +* Finding The Bug:: Finding the Bug. @end menu -@node dgawk invocation -@subsection @command{dgawk} Invocation +@node Debugger Invocation +@subsection How to Start the Debugger -Starting @command{dgawk} is exactly like running @command{awk}. The -file(s) containing the program and any supporting code are given on the -command line as arguments to one or more @option{-f} options. -(@command{dgawk} is not designed to debug command-line -programs, only programs contained in files.) In our case, -we call @command{dgawk} like this: +Starting the debugger is almost exactly like running @command{awk}, except you have to +pass an additional option @option{--debug} or the corresponding short option @option{-D}. +The file(s) containing the program and any supporting code are given on the command +line as arguments to one or more @option{-f} options. (@command{gawk} is not designed +to debug command-line programs, only programs contained in files.) 
In our case, +we invoke the debugger like this: @example -$ @kbd{dgawk -f getopt.awk -f join.awk -f uniq.awk inputfile} +$ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk inputfile} @end example @noindent where both @file{getopt.awk} and @file{uniq.awk} are in @env{$AWKPATH}. (Experienced users of GDB or similar debuggers should note that this syntax is slightly different from what they are used to. -With @command{dgawk}, the arguments for running the program are given +With the @command{gawk} debugger, the arguments for running the program are given in the command line to the debugger rather than as part of the @code{run} command at the debugger prompt.) Instead of immediately running the program on @file{inputfile}, as -@command{gawk} would ordinarily do, @command{dgawk} merely loads all +@command{gawk} would ordinarily do, the debugger merely loads all the program source files, compiles them internally, and then gives us a prompt: @example -dgawk> +gawk> @end example @noindent @@ -25409,7 +25776,7 @@ from which we can issue commands to the debugger. At this point, no code has been executed. @node Finding The Bug -@subsection Finding The Bug +@subsection Finding the Bug Let's say that we are having a problem using (a faulty version of) @file{uniq.awk} in the ``field-skipping'' mode, and it doesn't seem to be @@ -25445,7 +25812,7 @@ a breakpoint in @file{uniq.awk} is at the beginning of the function the breakpoint, use the @code{b} (breakpoint) command: @example -dgawk> @kbd{b are_equal} +gawk> @kbd{b are_equal} @print{} Breakpoint 1 set at file `awklib/eg/prog/uniq.awk', line 64 @end example @@ -25454,22 +25821,22 @@ Now type @samp{r} or @samp{run} and the program runs until it hits the breakpoint for the first time: @example -dgawk> @kbd{r} +gawk> @kbd{r} @print{} Starting program: @print{} Stopping in Rule ... 
@print{} Breakpoint 1, are_equal(n, m, clast, cline, alast, aline) at `awklib/eg/prog/uniq.awk':64 @print{} 64 if (fcount == 0 && charcount == 0) -dgawk> +gawk> @end example Now we can look at what's going on inside our program. First of all, let's see how we got to where we are. At the prompt, we type @samp{bt} -(short for ``backtrace''), and @command{dgawk} responds with a +(short for ``backtrace''), and the debugger responds with a listing of the current stack frames: @example -dgawk> @kbd{bt} +gawk> @kbd{bt} @print{} #0 are_equal(n, m, clast, cline, alast, aline) at `awklib/eg/prog/uniq.awk':69 @print{} #1 in main() at `awklib/eg/prog/uniq.awk':89 @@ -25484,11 +25851,11 @@ the key to finding the source of the problem.) Now that we're in @code{are_equal()}, we can start looking at the values of some variables. Let's say we type @samp{p n} (@code{p} is short for ``print''). We would expect to see the value of -@code{n}, a parameter to @code{are_equal()}. Actually, @command{dgawk} +@code{n}, a parameter to @code{are_equal()}. Actually, the debugger gives us: @example -dgawk> @kbd{p n} +gawk> @kbd{p n} @print{} n = untyped variable @end example @@ -25499,7 +25866,7 @@ function was called without arguments (@pxref{Function Calls}). A more useful variable to display might be the current record: @example -dgawk> @kbd{p $0} +gawk> @kbd{p $0} @print{} $0 = string ("gawk is a wonderful program!") @end example @@ -25508,7 +25875,7 @@ This might be a bit puzzling at first since this is the second line of our test input above. Let's look at @code{NR}: @example -dgawk> @kbd{p NR} +gawk> @kbd{p NR} @print{} NR = number (2) @end example @@ -25527,7 +25894,7 @@ NR == 1 @{ OK, let's just check that that rule worked correctly: @example -dgawk> @kbd{p last} +gawk> @kbd{p last} @print{} last = string ("awk is a wonderful program!") @end example @@ -25538,7 +25905,7 @@ be inside this function. 
To investigate further, we must begin @samp{n} (for ``next''): @example -dgawk> @kbd{n} +gawk> @kbd{n} @print{} 67 if (fcount > 0) @{ @end example @@ -25558,9 +25925,9 @@ Continuing to step, we now get to the splitting of the current and last records: @example -dgawk> @kbd{n} +gawk> @kbd{n} @print{} 68 n = split(last, alast) -dgawk> @kbd{n} +gawk> @kbd{n} @print{} 69 m = split($0, aline) @end example @@ -25568,7 +25935,7 @@ At this point, we should be curious to see what our records were split into, so we try to look: @example -dgawk> @kbd{p n m alast aline} +gawk> @kbd{p n m alast aline} @print{} n = number (5) @print{} m = number (5) @print{} alast = array, 5 elements @@ -25587,7 +25954,7 @@ inside the array? The first choice would be to use subscripts: @example -dgawk> @kbd{p alast[0]} +gawk> @kbd{p alast[0]} @print{} "0" not in array `alast' @end example @@ -25595,16 +25962,16 @@ dgawk> @kbd{p alast[0]} Oops! @example -dgawk> @kbd{p alast[1]} +gawk> @kbd{p alast[1]} @print{} alast["1"] = string ("awk") @end example This would be kind of slow for a 100-member array, though, so -@command{dgawk} provides a shortcut (reminiscent of another language +@command{gawk} provides a shortcut (reminiscent of another language not to be mentioned): @example -dgawk> @kbd{p @@alast} +gawk> @kbd{p @@alast} @print{} alast["1"] = string ("awk") @print{} alast["2"] = string ("is") @print{} alast["3"] = string ("a") @@ -25616,9 +25983,9 @@ It looks like we got this far OK. Let's take another step or two: @example -dgawk> @kbd{n} +gawk> @kbd{n} @print{} 70 clast = join(alast, fcount, n) -dgawk> @kbd{n} +gawk> @kbd{n} @print{} 71 cline = join(aline, fcount, m) @end example @@ -25628,7 +25995,7 @@ the virtual record to compare, and if the first field was numbered zero, this would work. 
Let's look at what we've got: @example -dgawk> @kbd{p cline clast} +gawk> @kbd{p cline clast} @print{} cline = string ("gawk is a wonderful program!") @print{} clast = string ("awk is a wonderful program!") @end example @@ -25637,10 +26004,10 @@ Hey, those look pretty familiar! They're just our original, unaltered, input records. A little thinking (the human brain is still the best debugging tool), and we realize that we were off by one! -We get out of @command{dgawk}: +We get out of the debugger: @example -dgawk> @kbd{q} +gawk> @kbd{q} @print{} The program is running. Exit anyway (y/n)? @kbd{y} @end example @@ -25656,9 +26023,9 @@ cline = join(aline, fcount+1, m) and problem solved! @node List of Debugger Commands -@section Main @command{dgawk} Commands +@section Main Debugger Commands -The @command{dgawk} command set can be divided into the +The @command{gawk} debugger command set can be divided into the following categories: @itemize @bullet{} @@ -25685,24 +26052,24 @@ Miscellaneous Each of these are discussed in the following subsections. In the following descriptions, commands which may be abbreviated show the abbreviation on a second description line. -A @command{dgawk} command name may also be truncated if that partial -name is unambiguous. @command{dgawk} has the built-in capability to +A debugger command name may also be truncated if that partial +name is unambiguous. The debugger has the built-in capability to automatically repeat the previous command when just hitting @key{Enter}. This works for the commands @code{list}, @code{next}, @code{nexti}, @code{step}, @code{stepi} and @code{continue} executed without any argument. @menu -* Breakpoint Control:: Control of breakpoints. -* Dgawk Execution Control:: Control of execution. -* Viewing And Changing Data:: Viewing and changing data. -* Dgawk Stack:: Dealing with the stack. -* Dgawk Info:: Obtaining information about the program and - the debugger state. 
-* Miscellaneous Dgawk Commands:: Miscellaneous Commands. +* Breakpoint Control:: Control of Breakpoints. +* Debugger Execution Control:: Control of Execution. +* Viewing And Changing Data:: Viewing and Changing Data. +* Execution Stack:: Dealing with the Stack. +* Debugger Info:: Obtaining Information about the Program and + the Debugger State. +* Miscellaneous Debugger Commands:: Miscellaneous Commands. @end menu @node Breakpoint Control -@subsection Control Of Breakpoints +@subsection Control of Breakpoints As we saw above, the first thing you probably want to do in a debugging session is to get your breakpoints set up, since otherwise your program @@ -25737,10 +26104,10 @@ Each breakpoint is assigned a number which can be used to delete it from the breakpoint list using the @code{delete} command. With a breakpoint, you may also supply a condition. This is an -@command{awk} expression (enclosed in double quotes) that @command{dgawk} +@command{awk} expression (enclosed in double quotes) that the debugger evaluates whenever the breakpoint is reached. If the condition is true, -then @command{dgawk} stops execution and prompts for a command. Otherwise, -@command{dgawk} continues executing the program. +then the debugger stops execution and prompts for a command. Otherwise, +it continues executing the program. @cindex debugger commands, @code{clear} @cindex @code{clear} debugger command @@ -25766,10 +26133,10 @@ Delete breakpoint(s) set at entry to function @var{function}. @cindex @code{condition} debugger command @item @code{condition} @var{n} @code{"@var{expression}"} Add a condition to existing breakpoint or watchpoint @var{n}. The -condition is an @command{awk} expression that @command{dgawk} evaluates +condition is an @command{awk} expression that the debugger evaluates whenever the breakpoint or watchpoint is reached. If the condition is true, then -@command{dgawk} stops execution and prompts for a command. 
Otherwise, -@command{dgawk} continues executing the program. If the condition expression is +the debugger stops execution and prompts for a command. Otherwise, +the debugger continues executing the program. If the condition expression is not specified, any existing condition is removed; i.e., the breakpoint or watchpoint is made unconditional. @@ -25825,7 +26192,7 @@ Set a temporary breakpoint (enabled for only one stop). The arguments are the same as for @code{break}. @end table -@node Dgawk Execution Control +@node Debugger Execution Control @subsection Control of Execution Now that your breakpoints are ready, you can start running the program @@ -25854,14 +26221,14 @@ in the list that resumes execution (e.g., @code{continue}) terminates the list For example: @example -dgawk> @kbd{commands} +gawk> @kbd{commands} > @kbd{silent} > @kbd{printf "A silent breakpoint; i = %d\n", i} > @kbd{info locals} > @kbd{set i = 10} > @kbd{continue} > @kbd{end} -dgawk> +gawk> @end example @cindex debugger commands, @code{c} (@code{continue}) @@ -25911,7 +26278,7 @@ and the caller of that frame becomes the innermost frame. @cindex @code{r} debugger command (alias for @code{run}) @item @code{run} @itemx @code{r} -Start/restart execution of the program. When restarting, @command{dgawk} +Start/restart execution of the program. When restarting, the debugger retains the current breakpoints, watchpoints, command history, automatic display variables, and debugger options. @@ -25934,7 +26301,7 @@ stopping, unless it encounters a breakpoint or watchpoint. @itemx @code{si} [@var{count}] Execute one (or @var{count}) instruction(s), stepping inside function calls. (For illustration of what is meant by an ``instruction'' in @command{gawk}, -see the output shown under @code{dump} in @ref{Miscellaneous Dgawk Commands}.) +see the output shown under @code{dump} in @ref{Miscellaneous Debugger Commands}.) 
@cindex debugger commands, @code{u} (@code{until}) @cindex debugger commands, @code{until} @@ -25962,7 +26329,7 @@ The value of the variable or field is displayed each time the program stops. Each variable added to the list is identified by a unique number: @example -dgawk> @kbd{display x} +gawk> @kbd{display x} @print{} 10: x = 1 @end example @@ -25999,7 +26366,7 @@ Print the value of a @command{gawk} variable or field. Fields must be referenced by constants: @example -dgawk> @kbd{print $3} +gawk> @kbd{print $3} @end example @noindent @@ -26041,16 +26408,16 @@ You can also set special @command{awk} variables, such as @code{FS}, @item @code{watch} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}] @itemx @code{w} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}] Add variable @var{var} (or field @code{$@var{n}}) to the watch list. -@command{dgawk} then stops whenever +The debugger then stops whenever the value of the variable or field changes. Each watched item is assigned a number which can be used to delete it from the watch list using the @code{unwatch} command. With a watchpoint, you may also supply a condition. This is an -@command{awk} expression (enclosed in double quotes) that @command{dgawk} +@command{awk} expression (enclosed in double quotes) that the debugger evaluates whenever the watchpoint is reached. If the condition is true, -then @command{dgawk} stops execution and prompts for a command. Otherwise, -@command{dgawk} continues executing the program. +then the debugger stops execution and prompts for a command. Otherwise, +@command{gawk} continues executing the program. @cindex debugger commands, @code{undisplay} @cindex @code{undisplay} debugger command @@ -26066,8 +26433,8 @@ watch list. 
@end table -@node Dgawk Stack -@subsection Dealing With The Stack +@node Execution Stack +@subsection Dealing with the Stack Whenever you run a program which contains any function calls, @command{gawk} maintains a stack of all of the function calls leading up @@ -26111,12 +26478,12 @@ Move @var{count} (default 1) frames up the stack toward the outermost frame. Then select and print the frame. @end table -@node Dgawk Info -@subsection Obtaining Information About The Program and The Debugger State +@node Debugger Info +@subsection Obtaining Information about the Program and the Debugger State Besides looking at the values of variables, there is often a need to get other sorts of information about the state of your program and of the -debugging environment itself. @command{dgawk} has one command which +debugging environment itself. The @command{gawk} debugger has one command which provides this information, appropriately called @code{info}. @code{info} is used with one of a number of arguments that tell it exactly what you want to know: @@ -26154,7 +26521,7 @@ Local variables of the selected frame. @item source The name of the current source file. Each time the program stops, the current source file is the file containing the current instruction. -When @command{dgawk} first starts, the current source file is the first file +When the debugger first starts, the current source file is the first file included via the @option{-f} option. The @samp{list @var{filename}:@var{lineno}} command can be used at any time to change the current source. @@ -26190,7 +26557,7 @@ The available options are: @c nested table @table @code @item history_size -The maximum number of lines to keep in the history file @file{./.dgawk_history}. +The maximum number of lines to keep in the history file @file{./.gawk_history}. The default is 100. @item listsize @@ -26202,14 +26569,14 @@ to standard output. An empty string (@code{""}) resets output to standard output. @item prompt -The debugger prompt. 
The default is @samp{@w{dgawk> }}. +The debugger prompt. The default is @samp{@w{gawk> }}. @item save_history @r{[}on @r{|} off@r{]} -Save command history to file @file{./.dgawk_history}. +Save command history to file @file{./.gawk_history}. The default is @code{on}. @item save_options @r{[}on @r{|} off@r{]} -Save current options to file @file{./.dgawkrc} upon exit. +Save current options to file @file{./.gawkrc} upon exit. The default is @code{on}. Options are read back in to the next session upon startup. @@ -26229,16 +26596,16 @@ Empty lines are ignored; they do @emph{not} repeat the last command. You can't restart the program by having more than one @code{run} command in the file. Also, the list of commands may include additional -@code{source} commands; however, @command{dgawk} will not source the +@code{source} commands; however, the @command{gawk} debugger will not source the same file more than once in order to avoid infinite recursion. In addition to, or instead of the @code{source} command, you can use -the @option{-R @var{file}} or @option{--command=@var{file}} command-line +the @option{-D @var{file}} or @option{--debug=@var{file}} command-line options to execute commands from a file non-interactively (@pxref{Options}). 
@end table -@node Miscellaneous Dgawk Commands +@node Miscellaneous Debugger Commands @subsection Miscellaneous Commands There are a few more commands which do not fit into the @@ -26256,7 +26623,7 @@ partial dump of Davide Brini's obfuscated code (@pxref{Signature Program}) demonstrates: @smallexample -dgawk> @kbd{dump} +gawk> @kbd{dump} @print{} # BEGIN @print{} @print{} [ 2:0x89faef4] Op_rule : [in_rule = BEGIN] [source_file = brini.awk] @@ -26305,7 +26672,7 @@ dgawk> @kbd{dump} @print{} [ :0x89fa3b0] Op_after_beginfile : @print{} [ :0x89fa388] Op_no_op : @print{} [ :0x89fa3c4] Op_after_endfile : -dgawk> +gawk> @end smallexample @cindex debugger commands, @code{h} (@code{help}) @@ -26314,7 +26681,7 @@ dgawk> @cindex @code{h} debugger command (alias for @code{help}) @item @code{help} @itemx @code{h} -Print a list of all of the @command{dgawk} commands with a short +Print a list of all of the @command{gawk} debugger commands with a short summary of their usage. @samp{help @var{command}} prints the information about the command @var{command}. @@ -26361,7 +26728,7 @@ function @var{function}. This command may change the current source file. Exit the debugger. Debugging is great fun, but sometimes we all have to tend to other obligations in life, and sometimes we find the bug, and are free to go on to the next one! As we saw above, if you are -running a program, @command{dgawk} warns you if you accidentally type +running a program, the debugger warns you if you accidentally type @samp{q} or @samp{quit}, to make sure you really want to quit. @cindex debugger commands, @code{trace} @@ -26380,7 +26747,7 @@ fairly self-explanatory, and using @code{stepi} and @code{nexti} while @node Readline Support @section Readline Support -If @command{dgawk} is compiled with the @code{readline} library, you +If @command{gawk} is compiled with the @code{readline} library, you can take advantage of that library's command completion and history expansion features. 
The following types of completion are available: @@ -26412,28 +26779,28 @@ and @end table -@node Dgawk Limitations +@node Limitations @section Limitations and Future Plans -We hope you find @command{dgawk} useful and enjoyable to work with, +We hope you find the @command{gawk} debugger useful and enjoyable to work with, but as with any program, especially in its early releases, it still has some limitations. A few which are worth being aware of are: @itemize @bullet{} @item -At this point, @command{dgawk} does not give a detailed explanation of +At this point, the debugger does not give a detailed explanation of what you did wrong when you type in something it doesn't like. Rather, it just responds @samp{syntax error}. When you do figure out what your mistake was, though, you'll feel like a real guru. @item -If you perused the dump of opcodes in @ref{Miscellaneous Dgawk Commands}, +If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands}, (or if you are already familiar with @command{gawk} internals), you will realize that much of the internal manipulation of data in @command{gawk}, as in many interpreters, is done on a stack. @code{Op_push}, @code{Op_pop}, etc., are the ``bread and butter'' of -most @command{gawk} code. Unfortunately, as of now, @command{dgawk} -does not allow you to examine the stack's contents. +most @command{gawk} code. Unfortunately, as of now, the @command{gawk} +debugger does not allow you to examine the stack's contents. That is, the intermediate results of expression evaluation are on the stack, but cannot be printed. Rather, only variables which are defined @@ -26448,19 +26815,1702 @@ programmer, you are expected to know what @code{/[^[:alnum:][:blank:]]/} means. @item -@command{dgawk} is designed to be used by running a program (with all its -parameters) on the command line, as described in @ref{dgawk invocation}. 
+The @command{gawk} debugger is designed to be used by running a program (with all its +parameters) on the command line, as described in @ref{Debugger Invocation}. There is no way (as of now) to attach or ``break in'' to a running program. This seems reasonable for a language which is used mainly for quickly executing, short programs. @item -@command{dgawk} only accepts source supplied with the @option{-f} option. +The @command{gawk} debugger only accepts source supplied with the @option{-f} option. @end itemize Look forward to a future release when these and other missing features may be added, and of course feel free to try to add them yourself! +@node Arbitrary Precision Arithmetic +@chapter Arithmetic and Arbitrary Precision Arithmetic with @command{gawk} +@cindex arbitrary precision +@cindex multiple precision +@cindex infinite precision +@cindex floating-point numbers, arbitrary precision +@cindex MPFR +@cindex GMP + +@cindex Knuth, Donald +@quotation +@i{There's a credibility gap: We don't know how much of the computer's answers +to believe. Novice computer users solve this problem by implicitly trusting +in the computer as an infallible authority; they tend to believe that all +digits of a printed answer are significant. Disillusioned computer users have +just the opposite approach; they are constantly afraid that their answers +are almost meaningless.}@* +Donald Knuth@footnote{Donald E.@: Knuth. +@cite{The Art of Computer Programming}. Volume 2, +@cite{Seminumerical Algorithms}, third edition, +1998, ISBN 0-201-89683-4, p.@: 229.} +@end quotation + +This @value{CHAPTER} discusses issues that you may encounter +when performing arithmetic. It begins by discussing some of +the general attributes of computer arithmetic, along with how +this can influence what you see when running @command{awk} programs. +This discussion applies to all versions of @command{awk}. 
+ +Then the @value{CHAPTER} moves on to @dfn{arbitrary precision +arithmetic}, a feature which is specific to @command{gawk}. + +@menu +* General Arithmetic:: An introduction to computer arithmetic. +* Floating-point Programming:: Effective Floating-point Programming. +* Gawk and MPFR:: How @command{gawk} provides + arbitrary-precision arithmetic. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic + with @command{gawk}. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with + @command{gawk}. +@end menu + +@node General Arithmetic +@section A General Description of Computer Arithmetic + +@cindex integers +@cindex floating-point, numbers +@cindex numbers, floating-point +Within computers, there are two kinds of numeric values: @dfn{integers} +and @dfn{floating-point}. +In school, integer values were referred to as ``whole'' numbers---that is, +numbers without any fractional part, such as 1, 42, or @minus{}17. +The advantage to integer numbers is that they represent values exactly. +The disadvantage is that their range is limited. On most systems, +this range is @minus{}2,147,483,648 to 2,147,483,647. +However, many systems now support a range from +@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. + +@cindex unsigned integers +@cindex integers, unsigned +Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}. +Signed values may be negative or positive, with the range of values just +described. +Unsigned values are always positive. On most systems, +the range is from 0 to 4,294,967,295. +However, many systems now support a range from +0 to 18,446,744,073,709,551,615. + +@cindex double precision floating-point +@cindex single precision floating-point +Floating-point numbers represent what are called ``real'' numbers; i.e., +those that do have a fractional part, such as 3.1415927. +The advantage to floating-point numbers is that they +can represent a much larger range of values. 
+The disadvantage is that there are numbers that they cannot represent +exactly. +@command{awk} uses @dfn{double precision} floating-point numbers, which +can hold more digits than @dfn{single precision} +floating-point numbers. +@c Floating-point issues are discussed more fully in +@c @ref{Floating Point Issues}. + +There are several important issues to be aware of, described next. + +@menu +* Floating Point Issues:: Stuff to know about floating-point numbers. +* Integer Programming:: Effective integer programming. +@end menu + +@node Floating Point Issues +@subsection Floating-Point Number Caveats + +This @value{SECTION} describes some of the issues +involved in using floating-point numbers. + +There is a very nice +@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic} +by David Goldberg, +``What Every Computer Scientist Should Know About Floating-point Arithmetic,'' +@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48. +This is worth reading if you are interested in the details, +but it does require a background in computer science. + +@menu +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not Abstract + Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice. +@end menu + +@node String Conversion Precision +@subsubsection The String Value Can Lie + +Internally, @command{awk} keeps both the numeric value +(double precision floating-point) and the string value for a variable. +Separately, @command{awk} keeps +track of what type the variable has +(@pxref{Typing and Comparison}), +which plays a role in how variables are used in comparisons. + +It is important to note that the string value for a number may not +reflect the full value (all the digits) that the numeric value +actually contains. 
+The following program (@file{values.awk}) illustrates this: + +@example +@{ + sum = $1 + $2 + # see it for what it is + printf("sum = %.12g\n", sum) + # use CONVFMT + a = "<" sum ">" + print "a =", a + # use OFMT + print "sum =", sum +@} +@end example + +@noindent +This program shows the full value of the sum of @code{$1} and @code{$2} +using @code{printf}, and then prints the string values obtained +from both automatic conversion (via @code{CONVFMT}) and +from printing (via @code{OFMT}). + +Here is what happens when the program is run: + +@example +$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk} +@print{} sum = 4.8888888 +@print{} a = <4.88889> +@print{} sum = 4.88889 +@end example + +This makes it clear that the full numeric value is different from +what the default string representations show. + +@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with +at most six significant digits. For some applications, you might want to +change it to specify more precision. +On most modern machines, most of the time, +17 digits is enough to capture a floating-point number's +value exactly.@footnote{Pathological cases can require up to +752 digits (!), but we doubt that you need to worry about this.} + +@node Unexpected Results +@subsubsection Floating Point Numbers Are Not Abstract Numbers + +@cindex floating-point, numbers +Unlike numbers in the abstract sense (such as what you studied in high school +or college arithmetic), numbers stored in computers are limited in certain ways. +They cannot represent an infinite number of digits, nor can they always +represent things exactly. +In particular, +floating-point numbers cannot +always represent values exactly. 
Here is an example: + +@example +$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'} +515.79 +@print{} 0000051579 +515.80 +@print{} 0000051579 +515.81 +@print{} 0000051580 +515.82 +@print{} 0000051582 +@kbd{@value{CTL}-d} +@end example + +@noindent +This shows that some values can be represented exactly, +whereas others are only approximated. This is not a ``bug'' +in @command{awk}, but simply an artifact of how computers +represent numbers. + +@quotation NOTE +It cannot be emphasized enough that the behavior just +described is fundamental to modern computers. You will +see this kind of thing happen in @emph{any} programming +language using hardware floating-point numbers. It is @emph{not} +a bug in @command{gawk}, nor is it something that can be ``just +fixed.'' +@end quotation + +@cindex negative zero +@cindex positive zero +@cindex zero@comma{} negative vs.@: positive +Another peculiarity of floating-point numbers on modern systems +is that they often have more than one representation for the number zero! +In particular, it is possible to represent ``minus zero'' as well as +regular, or ``positive'' zero. + +This example shows that negative and positive zero are distinct values +when stored internally, but that they are in fact equal to each other, +as well as to ``regular'' zero: + +@example +$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0} +> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz} +> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0} +> @kbd{@}'} +@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1 +@print{} mz == 0 -> 1, pz == 0 -> 1 +@end example + +It helps to keep this in mind should you process numeric data +that contains negative zero values; the fact that the zero is negative +is noted and can affect comparisons. + +@node POSIX Floating Point Problems +@subsubsection Standards Versus Existing Practice + +Historically, @command{awk} has converted any non-numeric looking string +to the numeric value zero, when required. 
Furthermore, the original +definition of the language and the original POSIX standards specified that +@command{awk} only understands decimal numbers (base 10), and not octal +(base 8) or hexadecimal numbers (base 16). + +Changes in the language of the +2001 and 2004 POSIX standards can be interpreted to imply that @command{awk} +should support additional features. These features are: + +@itemize @bullet +@item +Interpretation of floating point data values specified in hexadecimal +notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not} +source code constants.) + +@item +Support for the special IEEE 754 floating point values ``Not A Number'' +(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf''). +In particular, the format for these values is as specified by the ISO 1999 +C standard, which ignores case and can allow machine-dependent additional +characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}. +@end itemize + +The first problem is that both of these are clear changes to historical +practice: + +@itemize @bullet +@item +The @command{gawk} maintainer feels that supporting hexadecimal floating +point values, in particular, is ugly, and was never intended by the +original designers to be part of the language. + +@item +Allowing completely alphabetic strings to have valid numeric +values is also a very severe departure from historical practice. +@end itemize + +The second problem is that the @code{gawk} maintainer feels that this +interpretation of the standard, which requires a certain amount of +``language lawyering'' to arrive at in the first place, was not even +intended by the standard developers. 
In other words, ``we see how you +got where you are, but we don't think that that's where you want to be.'' + +Recognizing the above issues, but attempting to provide compatibility +with the earlier versions of the standard, +the 2008 POSIX standard added explicit wording to allow, but not require, +that @command{awk} support hexadecimal floating point values and +special values for ``Not A Number'' and infinity. + +Although the @command{gawk} maintainer continues to feel that +providing those features is inadvisable, +nevertheless, on systems that support IEEE floating point, it seems +reasonable to provide @emph{some} way to support NaN and Infinity values. +The solution implemented in @command{gawk} is as follows: + +@itemize @bullet +@item +With the @option{--posix} command-line option, @command{gawk} becomes +``hands off.'' String values are passed directly to the system library's +@code{strtod()} function, and if it successfully returns a numeric value, +that is what's used.@footnote{You asked for it, you got it.} +By definition, the results are not portable across +different systems. They are also a little surprising: + +@example +$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'} +@print{} nan +$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'} +@print{} 3735928559 +@end example + +@item +Without @option{--posix}, @command{gawk} interprets the four strings +@samp{+inf}, +@samp{-inf}, +@samp{+nan}, +and +@samp{-nan} +specially, producing the corresponding special numeric values. +The leading sign acts as a signal to @command{gawk} (and the user) +that the value is really numeric. Hexadecimal floating point is +not supported (unless you also use @option{--non-decimal-data}, +which is @emph{not} recommended). 
For example: + +@example +$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'} +@print{} 0 +$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'} +@print{} nan +$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'} +@print{} 0 +@end example + +@command{gawk} does ignore case in the four special values. +Thus @samp{+nan} and @samp{+NaN} are the same. +@end itemize + +@node Integer Programming +@subsection Mixing Integers And Floating-point + +As has been mentioned already, @command{gawk} ordinarily uses hardware double +precision with 64-bit IEEE binary floating-point representation +for numbers on most systems. A large integer like 9,007,199,254,740,997 +has a binary representation that, although finite, is more than 53 bits long; +it must also be rounded to 53 bits. +The biggest integer that can be stored in a C @code{double} is usually the same +as the largest possible value of a @code{double}. If your system @code{double} +is an IEEE 64-bit @code{double}, this largest possible value is an integer and +can be represented precisely. What more should one know about integers? + +If you want to know what is the largest integer, such that it and +all smaller integers can be stored in 64-bit doubles without losing precision, +then the answer is +@iftex +@math{2^{53}}. +@end iftex +@ifnottex +2^53. +@end ifnottex +The next representable number is the even number +@iftex +@math{2^{53} + 2}, +@end iftex +@ifnottex +2^53 + 2, +@end ifnottex +meaning it is unlikely that you will be able to make +@command{gawk} print +@iftex +@math{2^{53} + 1} +@end iftex +@ifnottex +2^53 + 1 +@end ifnottex +in integer format. +The range of integers exactly representable by a 64-bit double +is +@iftex +@math{[-2^{53}, 2^{53}]}. +@end iftex +@ifnottex +[@minus{}2^53, 2^53]. +@end ifnottex +If you ever see an integer outside this range in @command{gawk} +using 64-bit doubles, you have reason to be very suspicious about +the accuracy of the output. 
Here is a simple program with erroneous output: + +@example +$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'} +@print{} 9007199254740991 +@print{} 9007199254740992 +@print{} 9007199254740992 +@print{} 9007199254740994 +@end example + +The lesson is to not assume that any large integer printed by @command{gawk} +represents an exact result from your computation, especially if it wraps +around on your screen. + +@node Floating-point Programming +@section Understanding Floating-point Programming + +Numerical programming is an extensive area; if you need to develop +sophisticated numerical algorithms then @command{gawk} may not be +the ideal tool, and this documentation may not be sufficient. +@c FIXME: JOHN: Do you want to cite some actual books? +It might require digesting a book or two to really internalize how to compute +with ideal accuracy and precision, +and the result often depends on the particular application. + +@quotation NOTE +A floating-point calculation's @dfn{accuracy} is how close it comes +to the real value. This is as opposed to the @dfn{precision}, which +usually refers to the number of bits used to represent the number +(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision, +the Wikipedia article} for more information). +@end quotation + +There are two options for doing floating-point calculations: +hardware floating-point (as used by standard @command{awk} and +the default for @command{gawk}), and @dfn{arbitrary-precision} +floating-point, which is software based. +From this point forward, this @value{CHAPTER} +aims to provide enough information to understand both, and then +will focus on @command{gawk}'s facilities for the latter.@footnote{If you +are interested in other tools that perform arbitrary precision arithmetic, +you may want to investigate the POSIX @command{bc} tool. 
See +@uref{http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html, +the POSIX specification for it}, for more information.} + +Binary floating-point representations and arithmetic are inexact. +Simple values like 0.1 cannot be precisely represented using +binary floating-point numbers, and the limited precision of +floating-point numbers means that slight changes in +the order of operations or the precision of intermediate storage +can change the result. To make matters worse, with arbitrary precision +floating-point, you can set the precision before starting a computation, +but then you cannot be sure of the number of significant decimal places +in the final result. + +Sometimes, before you start to write any code, you should think more +about what you really want and what's really happening. Consider the +two numbers in the following example: + +@example +x = 0.875 # 1/2 + 1/4 + 1/8 +y = 0.425 +@end example + +Unlike the number in @code{y}, the number stored in @code{x} +is exactly representable +in binary since it can be written as a finite sum of one or +more fractions whose denominators are all powers of two. +When @command{gawk} reads a floating-point number from +program source, it automatically rounds that number to whatever +precision your machine supports. If you try to print the numeric +content of a variable using an output format string of @code{"%.17g"}, +it may not produce the same number as you assigned to it: + +@example +$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425} +> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'} +@print{} 0.875, 0.42499999999999999 +@end example + +Often the error is so small you do not even notice it, and if you do, +you can always specify how much precision you would like in your output. +Usually this is a format string like @code{"%.15g"}, which when +used in the previous example, produces an output identical to the input. 
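The effect of the output precision can be checked directly. The following is a minimal sketch, assuming IEEE-754 double precision and any POSIX @command{awk}; the shell function name @code{show_digits} is hypothetical and only serves to wrap the one-line program:

```shell
# show_digits VALUE DIGITS
# Hypothetical helper (not part of gawk): print VALUE using "%.<DIGITS>g".
show_digits() {
    awk -v x="$1" -v d="$2" 'BEGIN {
        # x + 0 forces a numeric interpretation of the command-line value
        printf("%." d "g\n", x + 0)
    }'
}

show_digits 0.425 17    # exposes the stored binary approximation
show_digits 0.425 15    # rounds back to the decimal value that was typed
```

With 17 significant digits the stored approximation of 0.425 becomes visible, while 15 digits are few enough that the rounding hides it again.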
+ +Because the underlying representation can be a little bit off from the exact value, +comparing floating-point values to see if they are equal is generally not a good idea. +Here is an example where it does not work like you expect: + +@example +$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +The loss of accuracy during a single computation with floating-point numbers +usually isn't enough to worry about. However, if you compute a value +which is the result of a sequence of floating point operations, +the error can accumulate and greatly affect the computation itself. +Here is an attempt to compute the value of the constant +@value{PI} using one of its many series representations: + +@example +BEGIN @{ + x = 1.0 / sqrt(3.0) + n = 6 + for (i = 1; i < 30; i++) @{ + n = n * 2.0 + x = (sqrt(x * x + 1) - 1) / x + printf("%.15f\n", n * x) + @} +@} +@end example + +When run, the early errors propagating through later computations +cause the loop to terminate prematurely after an attempt to divide by zero. + +@example +$ @kbd{gawk -f pi.awk} +@print{} 3.215390309173475 +@print{} 3.159659942097510 +@print{} 3.146086215131467 +@print{} 3.142714599645573 +@dots{} +@print{} 3.224515243534819 +@print{} 2.791117213058638 +@print{} 0.000000000000000 +@error{} gawk: pi.awk:6: fatal: division by zero attempted +@end example + +Here is an additional example where the inaccuracies in internal representations +yield an unexpected result: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} +> @kbd{i++} +> @kbd{print i} +> @kbd{@}'} +@print{} 4 +@end example + +Can computation using arbitrary precision help with the previous examples? +If you are impatient to know, see +@ref{Exact Arithmetic}. + +Instead of arbitrary precision floating-point arithmetic, +often all you need is an adjustment of your logic +or a different order for the operations in your calculation. 
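As a sketch of what an adjustment of logic can look like for the equality test and the loop over @code{d} shown earlier: compare against a tolerance instead of testing for exact equality, and drive the loop with an integer counter, deriving the fractional value from it. The names @code{float_logic_demo} and @code{approx_equal}, and the tolerance @code{1e-9}, are assumptions chosen for illustration; they are not part of @command{gawk}.

```shell
# float_logic_demo: hypothetical wrapper around a small awk program.
float_logic_demo() {
    awk '
    # approx_equal() is an illustrative helper, not a built-in function.
    # It returns 1 when a and b differ by no more than eps, else 0.
    function approx_equal(a, b, eps,    diff) {
        diff = a - b
        if (diff < 0)
            diff = -diff
        return diff <= eps
    }
    BEGIN {
        # Tolerance-based comparison instead of (0.1 + 12.2 == 12.3)
        print approx_equal(0.1 + 12.2, 12.3, 1e-9)

        # Integer loop counter; d takes the values 1.1, 1.2, ..., 1.5
        n = 0
        for (k = 11; k <= 15; k++) {
            d = k / 10
            n++
        }
        print n
    }'
}

float_logic_demo
```

The integer counter is exact, so the loop runs five times regardless of how 0.1 is represented. A suitable tolerance depends on the magnitude of the values being compared and on the application.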
+The stability and the accuracy of the computation of the constant @value{PI} +in the previous example can be enhanced by using the following +simple algebraic transformation: + +@example +(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1) +@end example + +@noindent +After making this change, the program does converge to +@value{PI} in under 30 iterations: + +@example +$ @kbd{gawk -f /tmp/pi2.awk} +@print{} 3.215390309173473 +@print{} 3.159659942097501 +@print{} 3.146086215131436 +@print{} 3.142714599645370 +@print{} 3.141873049979825 +@dots{} +@print{} 3.141592653589797 +@print{} 3.141592653589797 +@end example + +There is no need to be unduly suspicious about the results from +floating-point arithmetic. The lesson to remember is that +floating-point arithmetic is always more complex than arithmetic using +pencil and paper. In order to take advantage of the power +of computer floating-point, you need to know its limitations +and work within them. For most casual use of floating-point arithmetic, +you will often get the expected result in the end if you simply round +the display of your final results to the correct number of significant +decimal digits. + +As general advice, avoid presenting numerical data in a manner that +implies better precision than is actually the case. + +@menu +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +@end menu + +@node Floating-point Representation +@subsection Binary Floating-point Representation +@cindex IEEE-754 format + +Although floating-point representations vary from machine to machine, +the most commonly encountered representation is that defined by the +IEEE 754 Standard. An IEEE-754 format value has three components: + +@itemize @bullet +@item +A sign bit telling whether the number is positive or negative. + +@item +An @dfn{exponent}, @var{e}, giving its order of magnitude. 
+ +@item +A @dfn{significand}, @var{s}, +specifying the actual digits of the number. +@end itemize + +The value of the +number is then +@iftex +@math{s @cdot 2^e}. +@end iftex +@ifnottex +@var{s * 2^e}. +@end ifnottex +The first bit of a non-zero binary significand +is always one, so the significand in an IEEE-754 format only includes the +fractional part, leaving the leading one implicit. +The significand is stored in @dfn{normalized} format, +which means that the first bit is always a one. + +Three of the standard IEEE-754 types are 32-bit single precision, +64-bit double precision and 128-bit quadruple precision. +The standard also specifies extended precision formats +to allow greater precisions and larger exponent ranges. + +@node Floating-point Context +@subsection Floating-point Context +@cindex context, floating-point + +A floating-point @dfn{context} defines the environment for arithmetic operations. +It governs precision, sets rules for rounding, and limits the range for exponents. +The context has the following primary components: + +@table @dfn +@item Precision +Precision of the floating-point format in bits. +@item emax +Maximum exponent allowed for this format. +@item emin +Minimum exponent allowed for this format. +@item Underflow behavior +The format may or may not support gradual underflow. +@item Rounding +The rounding mode of this context. 
+@end table + +@ref{table-ieee-formats} lists the precision and exponent +field values for the basic IEEE-754 binary formats: + +@float Table,table-ieee-formats +@caption{Basic IEEE Format Context Values} +@multitable @columnfractions .20 .20 .20 .20 .20 +@headitem Name @tab Total bits @tab Precision @tab emin @tab emax +@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127 +@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023 +@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383 +@end multitable +@end float + +@quotation NOTE +The precision numbers include the implied leading one that gives them +one extra bit of significand. +@end quotation + +A floating-point context can also determine which signals are treated +as exceptions, and can set rules for arithmetic with special values. +Please consult the IEEE-754 standard or other resources for details. + +@command{gawk} ordinarily uses the hardware double precision +representation for numbers. On most systems, this is IEEE-754 +floating-point format, corresponding to 64-bit binary with 53 bits +of precision. + +@quotation NOTE +In case an underflow occurs, the standard allows, but does not require, +the result from an arithmetic operation to be a number smaller than +the smallest nonzero normalized number. Such numbers do +not have as many significant digits as normal numbers, and are called +@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero, +is called @dfn{flush to zero}. The basic IEEE-754 binary formats +support subnormal numbers. +@end quotation + +@node Rounding Mode +@subsection Floating-point Rounding Mode +@cindex rounding mode, floating-point + +The @dfn{rounding mode} specifies the behavior for the results of numerical +operations when discarding extra precision. Each rounding mode indicates +how the least significant returned digit of a rounded result is to +be calculated. 
+@ref{table-rounding-modes} lists the IEEE-754 defined +rounding modes: + +@float Table,table-rounding-modes +@caption{IEEE 754 Rounding Modes} +@multitable @columnfractions .45 .55 +@headitem Rounding Mode @tab IEEE Name +@item Round to nearest, ties to even @tab @code{roundTiesToEven} +@item Round toward plus Infinity @tab @code{roundTowardPositive} +@item Round toward negative Infinity @tab @code{roundTowardNegative} +@item Round toward zero @tab @code{roundTowardZero} +@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} +@end multitable +@end float + +The default mode @code{roundTiesToEven} is the most preferred, +but the least intuitive. This method does the obvious thing for most values, +by rounding them up or down to the nearest digit. +For example, rounding 1.132 to two digits yields 1.13, +and rounding 1.157 yields 1.16. + +However, when it comes to rounding a value that is exactly halfway between, +things do not work the way you probably learned in school. +In this case, the number is rounded to the nearest even digit. +So rounding 0.125 to two digits rounds down to 0.12, +but rounding 0.6875 to three digits rounds up to 0.688. +You probably have already encountered this rounding mode when +using the @code{printf} routine to format floating-point numbers. 
+For example: + +@example +BEGIN @{ + x = -4.5 + for (i = 1; i < 10; i++) @{ + x += 1.0 + printf("%4.1f => %2.0f\n", x, x) + @} +@} +@end example + +@noindent +produces the following output when run:@footnote{It +is possible for the output to be completely different if the +C library in your system does not use the IEEE-754 even-rounding +rule to round halfway cases for @code{printf()}.} + +@example +-3.5 => -4 +-2.5 => -2 +-1.5 => -2 +-0.5 => 0 + 0.5 => 0 + 1.5 => 2 + 2.5 => 2 + 3.5 => 4 + 4.5 => 4 +@end example + +The theory behind the rounding mode @code{roundTiesToEven} is that +it more or less evenly distributes upward and downward rounds +of exact halves, which might cause the round-off error +to cancel itself out. This is the default rounding mode used +in IEEE-754 computing functions and operators. + +The other rounding modes are rarely used. +Round toward positive infinity (@code{roundTowardPositive}) +and round toward negative infinity (@code{roundTowardNegative}) +are often used to implement interval arithmetic, +where you adjust the rounding mode to calculate upper and lower bounds +for the range of output. The @code{roundTowardZero} +mode can be used for converting floating-point numbers to integers. +The rounding mode @code{roundTiesToAway} rounds the result to the +nearest number and selects the number with the larger magnitude +if a tie occurs. + +Some numerical analysts will tell you that your choice of rounding style +has tremendous impact on the final outcome, and advise you to wait until +final output for any rounding. Instead, you can often avoid round-off error problems by +setting the precision initially to some value sufficiently larger than +the final desired precision, so that the accumulation of round-off error +does not influence the outcome. 
+If you suspect that results from your computation are +sensitive to accumulation of round-off error, +one way to be sure is to look for a significant difference in output +when you change the rounding mode. + +@node Gawk and MPFR +@section @command{gawk} + MPFR = Powerful Arithmetic + +The rest of this @value{CHAPTER} describes how to use the arbitrary precision +(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric +capabilities in @command{gawk} to produce maximally accurate results +when you need it. + +But first you should check if your version of +@command{gawk} supports arbitrary precision arithmetic. +The easiest way to find out is to look at the output of +the following command: + +@example +$ @kbd{gawk --version} +@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3) +@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation. +@dots{} +@end example + +@command{gawk} uses the +@uref{http://www.mpfr.org, GNU MPFR} +and +@uref{http://gmplib.org, GNU MP} (GMP) +libraries for arbitrary precision +arithmetic on numbers. So if you do not see the names of these libraries +in the output, then your version of @command{gawk} does not support +arbitrary precision arithmetic. + +Additionally, +there are a few elements available in the @code{PROCINFO} array +to provide information about the MPFR and GMP libraries. +@xref{Auto-set}, for more information. + +@ignore +Even if you aren't interested in arbitrary precision arithmetic, you +may still benefit from knowing about how @command{gawk} handles numbers +in general, and the limitations of doing arithmetic with ordinary +@command{gawk} numbers. +@end ignore + + +@node Arbitrary Precision Floats +@section Arbitrary Precision Floating-point Arithmetic with @command{gawk} + +@command{gawk} uses the GNU MPFR library +for arbitrary precision floating-point arithmetic. 
The MPFR library +provides precise control over precisions and rounding modes, and gives +correctly rounded, reproducible, platform-independent results. With the +command-line option @option{--bignum} or @option{-M}, +all floating-point arithmetic operators and numeric functions can yield +results to any desired precision level supported by MPFR. +Two built-in variables, @code{PREC} and @code{ROUNDMODE}, +provide control over the working precision and the rounding mode +(@pxref{Setting Precision}, and +@pxref{Setting Rounding Mode}). +The precision and the rounding mode are set globally for every operation +to follow. + +The default working precision for arbitrary precision floating-point values is 53, +and the default value for @code{ROUNDMODE} is @code{"N"}, +which selects the IEEE-754 @code{roundTiesToEven} rounding mode +(@pxref{Rounding Mode}).@footnote{The +default precision is 53, since according to the MPFR documentation, +the library should be able to exactly reproduce all computations with +double-precision machine floating-point numbers (@code{double} type +in C), except the default exponent range is much wider and subnormal +numbers are not implemented.} +@command{gawk} uses the default exponent range in MPFR +@iftex +(@math{emax = 2^{30} - 1, emin = -emax}) +@end iftex +@ifnottex +(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax}) +@end ifnottex +for all floating-point contexts. +There is no explicit mechanism to adjust the exponent range. +MPFR does not implement subnormal numbers by default, +and this behavior cannot be changed in @command{gawk}. + +@quotation NOTE +When emulating an IEEE-754 format (@pxref{Setting Precision}), +@command{gawk} internally adjusts the exponent range +to the value defined for the format and also performs computations needed for +gradual underflow (subnormal numbers). 
+@end quotation + +@quotation NOTE +MPFR numbers are variable-size entities, consuming only as much space as +needed to store the significant digits. Since the performance using MPFR +numbers pales in comparison to doing arithmetic using the underlying machine +types, you should consider using only as much precision as needed by +your program. +@end quotation + +@menu +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point numbers. +@end menu + +@node Setting Precision +@subsection Setting the Working Precision +@cindex @code{PREC} variable + +@command{gawk} uses a global working precision; it does not keep track of +the precision or accuracy of individual numbers. Performing an arithmetic +operation or calling a built-in function rounds the result to the current +working precision. The default working precision is 53, which can be +modified using the built-in variable @code{PREC}. You can also set the +value to one of the following pre-defined case-insensitive strings +to emulate an IEEE-754 binary format: + +@multitable {@code{"double"}} {12345678901234567890123456789012345} +@headitem @code{PREC} @tab IEEE-754 Binary Format +@item @code{"half"} @tab 16-bit half-precision. +@item @code{"single"} @tab Basic 32-bit single precision. +@item @code{"double"} @tab Basic 64-bit double precision. +@item @code{"quad"} @tab Basic 128-bit quadruple precision. +@item @code{"oct"} @tab 256-bit octuple precision. 
+@end multitable + +The following example illustrates the effects of changing precision +on arithmetic operations: + +@example +$ @kbd{gawk -M -v PREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \} +> @kbd{PREC = "double"; print x + 0 @}'} +@print{} 1e-400 +@print{} 0 +@end example + +Binary and decimal precisions are related approximately, according to the +formula: + +@iftex +@math{prec = 3.322 @cdot dps} +@end iftex +@ifnottex +@var{prec} = 3.322 * @var{dps} +@end ifnottex + +@noindent +Here, @var{prec} denotes the binary precision +(measured in bits) and @var{dps} (short for decimal places) +is the number of decimal digits. We can easily calculate how many decimal +digits the 53-bit significand of an IEEE double is equivalent to: +53 / 3.322 which is equal to about 15.95. +But what does 15.95 digits actually mean? It depends whether you are +concerned about how many digits you can rely on, or how many digits +you need. + +It is important to know how many decimal digits it takes to uniquely identify +a double-precision value (the C type @code{double}). If you want to +convert from @code{double} to decimal and back to @code{double} (e.g., +saving a @code{double} representing an intermediate result to a file, and +later reading it back to restart the computation), then a few more decimal +digits are required. 17 digits is generally enough for a @code{double}. + +It can also be important to know what decimal numbers can be uniquely +represented with a @code{double}. If you want to convert +from decimal to @code{double} and back again, 15 digits is the most that +you can get. Stated differently, you should not present +the numbers from your floating-point computations with more than 15 +significant digits in them. + +Conversely, it takes a precision of 332 bits to hold an approximation +of the constant @value{PI} that is accurate to 100 decimal places. 
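
The conversion formula above can be sketched as a pair of small shell helpers around @command{awk}. This is only an illustration under stated assumptions: the function names @code{bits_for_digits} and @code{digits_for_bits} are hypothetical and not part of @command{gawk}, and the constant 3.322 is the same rounded approximation used in the formula, so the results are estimates rather than exact bounds.

```shell
#!/bin/sh
# Hypothetical helpers (not part of gawk) illustrating prec = 3.322 * dps.

# bits_for_digits DPS: estimate the binary precision for DPS decimal digits.
bits_for_digits() {
    awk -v dps="$1" 'BEGIN { print int(3.322 * dps) }'
}

# digits_for_bits PREC: estimate the decimal digits that PREC bits can hold.
digits_for_bits() {
    awk -v prec="$1" 'BEGIN { printf("%.2f\n", prec / 3.322) }'
}

bits_for_digits 100   # prints 332
digits_for_bits 53    # prints 15.95
```

The helpers use only plain @command{awk} arithmetic, so they do not require the @option{-M} option; the value printed by @code{bits_for_digits} is a starting point for @code{PREC}, not a guaranteed minimum.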
+ +You should always add some extra bits in order to avoid the confusing round-off +issues that occur because numbers are stored internally in binary. + +@node Setting Rounding Mode +@subsection Setting the Rounding Mode +@cindex @code{ROUNDMODE} variable + +The @code{ROUNDMODE} variable provides +program level control over the rounding mode. +The correspondence between @code{ROUNDMODE} and the IEEE +rounding modes is shown in @ref{table-gawk-rounding-modes}. + +@float Table,table-gawk-rounding-modes +@caption{@command{gawk} Rounding Modes} +@multitable @columnfractions .45 .30 .25 +@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE} +@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"} +@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"} +@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"} +@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"} +@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"} +@end multitable +@end float + +@code{ROUNDMODE} has the default value @code{"N"}, +which selects the IEEE-754 rounding mode @code{roundTiesToEven}. +@ref{table-gawk-rounding-modes}, lists @code{"A"} to select the IEEE-754 mode +@code{roundTiesToAway}. This is only available +if your version of the MPFR library supports it; otherwise setting +@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode}, +for the meanings of the various rounding modes. + +Here is an example of how to change the default rounding behavior of +@code{printf}'s output: + +@example +$ @kbd{gawk -M -v ROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'} +@print{} 1.37 +@end example + +@node Floating-point Constants +@subsection Representing Floating-point Constants +@cindex constants, floating-point + +Be wary of floating-point constants! 
When reading a floating-point constant +from program source code, @command{gawk} uses the default precision, +unless overridden +by an assignment to the special variable @code{PREC} on the command +line, to store it internally as an MPFR number. +Changing the precision using @code{PREC} in the program text does +@emph{not} change the precision of a constant. If you need to +represent a floating-point constant at a higher precision than the +default and cannot use a command line assignment to @code{PREC}, +you should either specify the constant as a string, or +as a rational number, whenever possible. The following example +illustrates the differences among various ways to +print a floating-point constant: + +@example +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000055511151 +$ @kbd{gawk -M -v PREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'} +@print{} 0.1000000000000000000000000 +@end example + +In the first case, the number is stored with the default precision of 53. + +@node Changing Precision +@subsection Changing the Precision of a Number + +@cindex Laurie, Dirk +@quotation +@i{The point is that in any variable-precision package, +a decision is made on how to treat numbers given as data, +or arising in intermediate results, which are represented in +floating-point format to a precision lower than working precision. +Do we promote them to full membership of the high-precision club, +or do we treat them and all their associates as second-class citizens? +Sometimes the first course is proper, sometimes the second, and it takes +careful analysis to tell which.} + +Dirk Laurie@footnote{Dirk Laurie. +@cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}. 
+Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.} +@end quotation + +@command{gawk} does not implicitly modify the precision of any previously +computed results when the working precision is changed with an assignment +to @code{PREC}. The precision of a number is always the one that was +used at the time of its creation, and there is no way for the user +to explicitly change it afterwards. However, since the result of a +floating-point arithmetic operation is always an arbitrary precision +floating-point value---with a precision set by the value of @code{PREC}---one of the +following workarounds effectively accomplishes the desired behavior: + +@example +x = x + 0.0 +@end example + +@noindent +or: + +@example +x += 0.0 +@end example + +@node Exact Arithmetic +@subsection Exact Arithmetic with Floating-point Numbers + +@quotation CAUTION +Never depend on the exactness of floating-point arithmetic, +even for apparently simple expressions! +@end quotation + +Can arbitrary precision arithmetic give exact results? There are +no easy answers. The standard rules of algebra often do not apply +when using floating-point arithmetic. +Among other things, the distributive and associative laws +do not hold completely, and order of operation may be important +for your computation. Rounding error, cumulative precision loss +and underflow are often troublesome. + +When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3} +for equality +using the machine double precision arithmetic, it decides that they +are not equal! +(@xref{Floating-point Programming}.) +You can get the result you want by increasing the precision; +56 in this case will get the job done: + +@example +$ @kbd{gawk -M -v PREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 1 +@end example + +If adding more bits is good, perhaps adding even more bits of +precision is better? 
+Here is what happens if we use an even larger value of @code{PREC}: + +@example +$ @kbd{gawk -M -v PREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +This is not a bug in @command{gawk} or in the MPFR library. +It is easy to forget that the finite number of bits used to store the value +is often just an approximation after proper rounding. +The test for equality succeeds if and only if @emph{all} bits in the two operands +are exactly the same. Since this is not necessarily true after floating-point +computations with a particular precision and effective rounding rule, +a straight test for equality may not work. + +So, don't assume that floating-point values can be compared for equality. +You should also exercise caution when using other forms of comparisons. +The standard way to compare between floating-point numbers is to determine +how much error (or @dfn{tolerance}) you will allow in a comparison and +check to see if one value is within this error range of the other. + +In applications where 15 or fewer decimal places suffice, +hardware double precision arithmetic can be adequate, and is usually much faster. +But you do need to keep in mind that every floating-point operation +can suffer a new rounding error with catastrophic consequences as illustrated +by our earlier attempt to compute the value of the constant @value{PI} +(@pxref{Floating-point Programming}). +Extra precision can greatly enhance the stability and the accuracy +of your computation in such cases. + +Repeated addition is not necessarily equivalent to multiplication +in floating-point arithmetic. In the example in +@ref{Floating-point Programming}: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} +> @kbd{i++} +> @kbd{print i} +> @kbd{@}'} +@print{} 4 +@end example + +@noindent +you may or may not succeed in getting the correct result by choosing +an arbitrarily large value for @code{PREC}. 
Reformulation of +the problem at hand is often the correct approach in such situations. + +@node Arbitrary Precision Integers +@section Arbitrary Precision Integer Arithmetic with @command{gawk} +@cindex integer, arbitrary precision + +If the option @option{--bignum} or @option{-M} is specified, +@command{gawk} performs all +integer arithmetic using GMP arbitrary precision integers. +Any number that looks like an integer in a program source or data file +is stored as an arbitrary precision integer. +The size of the integer is limited only by your computer's memory. +The current floating-point context has no effect on operations involving integers. +For example, the following computes +@iftex +@math{5^{4^{3^{2}}}}, +@end iftex +@ifnottex +5^4^3^2, +@end ifnottex +the result of which is beyond the +limits of ordinary @command{gawk} numbers: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{x = 5^4^3^2} +> @kbd{print "# of digits =", length(x)} +> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)} +> @kbd{@}'} +@print{} # of digits = 183231 +@print{} 62060698786608744707 ... 92256259918212890625 +@end example + +If you were to compute the same value using arbitrary precision +floating-point values instead, the precision needed for correct output +(using the formula +@iftex +@math{prec = 3.322 @cdot dps}), +would be @math{3.322 @cdot 183231}, +@end iftex +@ifnottex +@samp{prec = 3.322 * dps}), +would be 3.322 x 183231, +@end ifnottex +or 608693. + +The result from an arithmetic operation with an integer and a floating-point value +is a floating-point value with a precision equal to the working precision. +The following program calculates the eighth term in +Sylvester's sequence@footnote{Weisstein, Eric W. +@cite{Sylvester's Sequence}. From MathWorld---A Wolfram Web Resource. 
+@url{http://mathworld.wolfram.com/SylvestersSequence.html}} +using a recurrence: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{s = 2.0} +> @kbd{for (i = 1; i <= 7; i++)} +> @kbd{s = s * (s - 1) + 1} +> @kbd{print s} +> @kbd{@}'} +@print{} 113423713055421845118910464 +@end example + +The output differs from the actual number, 113,423,713,055,421,844,361,000,443, +because the default precision of 53 is not enough to represent the +floating-point results exactly. You can either increase the precision +(100 is enough in this case), or replace the floating-point constant +@samp{2.0} with an integer, to perform all computations using integer +arithmetic to get the correct output. + +It will sometimes be necessary for @command{gawk} to implicitly convert an +arbitrary precision integer into an arbitrary precision floating-point value. +This is primarily because the MPFR library does not always provide the +relevant interface to process arbitrary precision integers or mixed-mode +numbers as needed by an operation or function. +In such a case, the precision is set to the minimum value necessary +for exact conversion, and the working precision is not used for this purpose. +If this is not what you need or want, you can employ a subterfuge +like this: + +@example +gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}' +@end example + +You can avoid this issue altogether by specifying the number as a floating-point value +to begin with: + +@example +gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}' +@end example + +Note that for the particular example above, it is likely best +to just use the following: + +@example +gawk -M 'BEGIN @{ n = 13; print n % 2 @}' +@end example + +@node Dynamic Extensions +@chapter Writing Extensions for @command{gawk} + +This chapter is a placeholder, pending a rewrite for the new API. +Some of the old bits remain, since they can be partially reused. 
+ + +@c STARTOFRANGE gladfgaw +@cindex @command{gawk}, functions, adding +@c STARTOFRANGE adfugaw +@cindex adding, functions to @command{gawk} +@c STARTOFRANGE fubadgaw +@cindex functions, built-in, adding to @command{gawk} +It is possible to add new built-in +functions to @command{gawk} using dynamically loaded libraries. This +facility is available on systems (such as GNU/Linux) that support +the C @code{dlopen()} and @code{dlsym()} functions. +This @value{CHAPTER} describes how to write and use dynamically +loaded extensions for @command{gawk}. +Experience with programming in +C or C++ is necessary when reading this @value{SECTION}. + +@quotation NOTE +When @option{--sandbox} is specified, extensions are disabled +(@pxref{Options}). +@end quotation + +@menu +* Plugin License:: A note about licensing. +* Sample Library:: An example of new functions. +@end menu + +@node Plugin License +@section Extension Licensing + +Every dynamic extension should define the global symbol +@code{plugin_is_GPL_compatible} to assert that it has been licensed under +a GPL-compatible license. If this symbol does not exist, @command{gawk} +will emit a fatal error and exit. + +The declared type of the symbol should be @code{int}. It does not need +to be in any allocated section, though. The code merely asserts that +the symbol exists in the global scope. 
Something like this is enough: + +@example +int plugin_is_GPL_compatible; +@end example + +@node Sample Library +@section Example: Directory and File Operation Built-ins +@c STARTOFRANGE chdirg +@cindex @code{chdir()} function@comma{} implementing in @command{gawk} +@c STARTOFRANGE statg +@cindex @code{stat()} function@comma{} implementing in @command{gawk} +@c STARTOFRANGE filre +@cindex files, information about@comma{} retrieving +@c STARTOFRANGE dirch +@cindex directories, changing + +Two useful functions that are not in @command{awk} are @code{chdir()} +(so that an @command{awk} program can change its directory) and +@code{stat()} (so that an @command{awk} program can gather information about +a file). +This @value{SECTION} implements these functions for @command{gawk} in an +external extension library. + +@menu +* Internal File Description:: What the new functions will do. +* Internal File Ops:: The code for internal file operations. +* Using Internal File Ops:: How to use an external extension. +@end menu + +@node Internal File Description +@subsection Using @code{chdir()} and @code{stat()} + +This @value{SECTION} shows how to use the new functions at the @command{awk} +level once they've been integrated into the running @command{gawk} +interpreter. +Using @code{chdir()} is very straightforward. It takes one argument, +the new directory to change to: + +@example +@dots{} +newdir = "/home/arnold/funstuff" +ret = chdir(newdir) +if (ret < 0) @{ + printf("could not change to %s: %s\n", + newdir, ERRNO) > "/dev/stderr" + exit 1 +@} +@dots{} +@end example + +The return value is negative if the @code{chdir} failed, +and @code{ERRNO} +(@pxref{Built-in Variables}) +is set to a string indicating the error. + +Using @code{stat()} is a bit more complicated. +The C @code{stat()} function fills in a structure that has a fair +amount of information. 
+The right way to model this in @command{awk} is to fill in an associative +array with the appropriate information: + +@c broke printf for page breaking +@example +file = "/home/arnold/.profile" +fdata[1] = "x" # force `fdata' to be an array +ret = stat(file, fdata) +if (ret < 0) @{ + printf("could not stat %s: %s\n", + file, ERRNO) > "/dev/stderr" + exit 1 +@} +printf("size of %s is %d bytes\n", file, fdata["size"]) +@end example + +The @code{stat()} function always clears the data array, even if +the @code{stat()} fails. It fills in the following elements: + +@table @code +@item "name" +The name of the file that was @code{stat()}'ed. + +@item "dev" +@itemx "ino" +The file's device and inode numbers, respectively. + +@item "mode" +The file's mode, as a numeric value. This includes both the file's +type and its permissions. + +@item "nlink" +The number of hard links (directory entries) the file has. + +@item "uid" +@itemx "gid" +The numeric user and group ID numbers of the file's owner. + +@item "size" +The size in bytes of the file. + +@item "blocks" +The number of disk blocks the file actually occupies. This may not +be a function of the file's size if the file has holes. + +@item "atime" +@itemx "mtime" +@itemx "ctime" +The file's last access, modification, and inode update times, +respectively. These are numeric timestamps, suitable for formatting +with @code{strftime()} +(@pxref{Built-in}). + +@item "pmode" +The file's ``printable mode.'' This is a string representation of +the file's type and permissions, such as what is produced by +@samp{ls -l}---for example, @code{"drwxr-xr-x"}. + +@item "type" +A printable string representation of the file's type. The value +is one of the following: + +@table @code +@item "blockdev" +@itemx "chardev" +The file is a block or character device (``special file''). + +@ignore +@item "door" +The file is a Solaris ``door'' (special file used for +interprocess communications). 
+@end ignore + +@item "directory" +The file is a directory. + +@item "fifo" +The file is a named-pipe (also known as a FIFO). + +@item "file" +The file is just a regular file. + +@item "socket" +The file is an @code{AF_UNIX} (``Unix domain'') socket in the +filesystem. + +@item "symlink" +The file is a symbolic link. +@end table +@end table + +Several additional elements may be present depending upon the operating +system and the type of the file. You can test for them in your @command{awk} +program by using the @code{in} operator +(@pxref{Reference to Elements}): + +@table @code +@item "blksize" +The preferred block size for I/O to the file. This field is not +present on all POSIX-like systems in the C @code{stat} structure. + +@item "linkval" +If the file is a symbolic link, this element is the name of the +file the link points to (i.e., the value of the link). + +@item "rdev" +@itemx "major" +@itemx "minor" +If the file is a block or character device file, then these values +represent the numeric device number and the major and minor components +of that number, respectively. +@end table + +@node Internal File Ops +@subsection C Code for @code{chdir()} and @code{stat()} + +Here is the C code for these extensions. They were written for +GNU/Linux. The code needs some more work for complete portability +to other POSIX-compliant systems:@footnote{This version is edited +slightly for presentation. 
See +@file{extension/filefuncs.c} in the @command{gawk} distribution +for the complete version.} + +@c break line for page breaking +@example +#include "awk.h" + +#include <sys/sysmacros.h> + +int plugin_is_GPL_compatible; + +/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ + +static NODE * +do_chdir(int nargs) +@{ + NODE *newdir; + int ret = -1; + + if (do_lint && nargs != 1) + lintwarn("chdir: called with incorrect number of arguments"); + + newdir = get_scalar_argument(0, FALSE); +@end example + +The file includes the @code{"awk.h"} header file for definitions +for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>} +for access to the @code{major()} and @code{minor()} macros. + +@cindex programming conventions, @command{gawk} internals +By convention, for an @command{awk} function @code{foo}, the function that +implements it is called @samp{do_foo}. The function should take +an @samp{int} argument, usually called @code{nargs}, that +represents the number of defined arguments for the function. The @code{newdir} +variable represents the new directory to change to, retrieved +with @code{get_scalar_argument()}. Note that the first argument is +numbered zero. + +This code actually accomplishes the @code{chdir()}. It first forces +the argument to be a string and passes the string value to the +@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO} +is updated. + +@example + (void) force_string(newdir); + ret = chdir(newdir->stptr); + if (ret < 0) + update_ERRNO_int(errno); +@end example + +Finally, the function returns the return value to the @command{awk} level: + +@example + return make_number((AWKNUM) ret); +@} +@end example + +The @code{stat()} built-in is more involved. First comes a function +that turns a numeric mode into a printable representation +(e.g., 644 becomes @samp{-rw-r--r--}). 
This is omitted here for brevity: + +@c break line for page breaking +@example +/* format_mode --- turn a stat mode field into something readable */ + +static char * +format_mode(unsigned long fmode) +@{ + @dots{} +@} +@end example + +Next comes the @code{do_stat()} function. It starts with +variable declarations and argument checking: + +@ignore +Changed message for page breaking. Used to be: + "stat: called with incorrect number of arguments (%d), should be 2", +@end ignore +@example +/* do_stat --- provide a stat() function for gawk */ + +static NODE * +do_stat(int nargs) +@{ + NODE *file, *array, *tmp; + struct stat sbuf; + int ret; + NODE **aptr; + char *pmode; /* printable mode */ + char *type = "unknown"; + + if (do_lint && nargs > 2) + lintwarn("stat: called with too many arguments"); +@end example + +Then comes the actual work. First, the function gets the arguments. +Then, it always clears the array. +The code uses @code{lstat()} (instead of @code{stat()}) +to get the file information, +in case the file is a symbolic link. +If there's an error, it sets @code{ERRNO} and returns: + +@c comment made multiline for page breaking +@example + /* file is first arg, array to hold results is second */ + file = get_scalar_argument(0, FALSE); + array = get_array_argument(1, FALSE); + + /* empty out the array */ + assoc_clear(array); + + /* lstat the file, if error, set ERRNO and return */ + (void) force_string(file); + ret = lstat(file->stptr, & sbuf); + if (ret < 0) @{ + update_ERRNO_int(errno); + return make_number((AWKNUM) ret); + @} +@end example + +Now comes the tedious part: filling in the array. 
Only a few of the +calls are shown here, since they all follow the same pattern: + +@example + /* fill in the array */ + aptr = assoc_lookup(array, tmp = make_string("name", 4)); + *aptr = dupnode(file); + unref(tmp); + + aptr = assoc_lookup(array, tmp = make_string("mode", 4)); + *aptr = make_number((AWKNUM) sbuf.st_mode); + unref(tmp); + + aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); + pmode = format_mode(sbuf.st_mode); + *aptr = make_string(pmode, strlen(pmode)); + unref(tmp); +@end example + +When done, return the @code{lstat()} return value: + +@example + + return make_number((AWKNUM) ret); +@} +@end example + +@cindex programming conventions, @command{gawk} internals +Finally, it's necessary to provide the ``glue'' that loads the +new function(s) into @command{gawk}. By convention, each library has +a routine named @code{dl_load()} that does the job. The simplest way +is to use the @code{dl_load_func} macro in @code{gawkapi.h}. + +And that's it! As an exercise, consider adding functions to +implement system calls such as @code{chown()}, @code{chmod()}, +and @code{umask()}. + +@node Using Internal File Ops +@subsection Integrating the Extensions + +@cindex @command{gawk}, interpreter@comma{} adding code to +Now that the code is written, it must be possible to add it at +runtime to the running @command{gawk} interpreter. First, the +code must be compiled. Assuming that the functions are in +a file named @file{filefuncs.c}, and @var{idir} is the location +of the @command{gawk} include files, +the following steps create +a GNU/Linux shared library: + +@example +$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c} +$ @kbd{ld -o filefuncs.so -shared filefuncs.o} +@end example + +@cindex @code{extension()} function (@command{gawk}) +Once the library exists, it is loaded by calling the @code{extension()} +built-in function. 
+This function takes two arguments: the name of the +library to load and the name of a function to call when the library +is first loaded. This function adds the new functions to @command{gawk}. +It returns the value returned by the initialization function +within the shared library: + +@example +# file testff.awk +BEGIN @{ + extension("./filefuncs.so", "dl_load") + + chdir(".") # no-op + + data[1] = 1 # force `data' to be an array + print "Info for testff.awk" + ret = stat("testff.awk", data) + print "ret =", ret + for (i in data) + printf "data[\"%s\"] = %s\n", i, data[i] + print "testff.awk modified:", + strftime("%m %d %y %H:%M:%S", data["mtime"]) + + print "\nInfo for JUNK" + ret = stat("JUNK", data) + print "ret =", ret + for (i in data) + printf "data[\"%s\"] = %s\n", i, data[i] + print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) +@} +@end example + +Here are the results of running the program: + +@example +$ @kbd{gawk -f testff.awk} +@print{} Info for testff.awk +@print{} ret = 0 +@print{} data["size"] = 607 +@print{} data["ino"] = 14945891 +@print{} data["name"] = testff.awk +@print{} data["pmode"] = -rw-rw-r-- +@print{} data["nlink"] = 1 +@print{} data["atime"] = 1293993369 +@print{} data["mtime"] = 1288520752 +@print{} data["mode"] = 33204 +@print{} data["blksize"] = 4096 +@print{} data["dev"] = 2054 +@print{} data["type"] = file +@print{} data["gid"] = 500 +@print{} data["uid"] = 500 +@print{} data["blocks"] = 8 +@print{} data["ctime"] = 1290113572 +@print{} testff.awk modified: 10 31 10 12:25:52 +@print{} +@print{} Info for JUNK +@print{} ret = -1 +@print{} JUNK modified: 01 01 70 02:00:00 +@end example +@c ENDOFRANGE filre +@c ENDOFRANGE dirch +@c ENDOFRANGE statg +@c ENDOFRANGE chdirg +@c ENDOFRANGE gladfgaw +@c ENDOFRANGE adfugaw +@c ENDOFRANGE fubadgaw + @ignore @c Try this @iftex @@ -26515,8 +28565,6 @@ This @value{CHAPTER} briefly describes the evolution of the @command{awk} language, with cross-references to other parts of the 
@value{DOCUMENT} where you can find more information. -@c FIXME: Try to determine whether it was 3.1 or 3.2 that had new awk. - @menu * V7/SVR3.1:: The major changes between V7 and System V Release 3.1. @@ -26931,6 +28979,7 @@ and @code{xor()} functions for bit manipulation (@pxref{Bitwise Functions}). +@c In 4.1, and(), or() and xor() grew the ability to take > 2 arguments @item The @code{asort()} and @code{asorti()} functions for sorting arrays @@ -26942,11 +28991,6 @@ functions for internationalization (@pxref{Programmer i18n}). @item -The @code{extension()} built-in function and the ability to add -new functions dynamically -(@pxref{Dynamic Extensions}). - -@item The @code{fflush()} function from Brian Kernighan's version of @command{awk} (@pxref{I/O Functions}). @@ -26973,29 +29017,70 @@ the @option{-f} command-line option (@pxref{Options}). @item -The ability to use GNU-style long-named options that start with @option{--} +The @env{AWKLIBPATH} environment variable for specifying a path search for +the @option{-l} command-line option +(@pxref{Options}). + +@item +The +@option{-b}, +@option{-c}, +@option{-C}, +@option{-d}, +@option{-D}, +@option{-e}, +@option{-E}, +@option{-g}, +@option{-h}, +@option{-i}, +@option{-l}, +@option{-L}, +@option{-M}, +@option{-n}, +@option{-N}, +@option{-o}, +@option{-O}, +@option{-p}, +@option{-P}, +@option{-r}, +@option{-S}, +@option{-t}, +and +@option{-V} +short options. 
Also, the +ability to use GNU-style long-named options that start with @option{--} and the +@option{--assign}, +@option{--bignum}, @option{--characters-as-bytes}, -@option{--compat}, +@option{--copyright}, +@option{--debug}, @option{--dump-variables}, -@option{--exec}, +@option{--exec}, +@option{--field-separator}, +@option{--file}, @option{--gen-pot}, +@option{--help}, +@option{--include}, @option{--lint}, @option{--lint-old}, +@option{--load}, @option{--non-decimal-data}, +@option{--optimize}, @option{--posix}, +@option{--pretty-print}, @option{--profile}, @option{--re-interval}, @option{--sandbox}, @option{--source}, @option{--traditional}, +@option{--use-lc-numeric}, and -@option{--use-lc-numeric} -options +@option{--version} +long options (@pxref{Options}). @end itemize - @c new ports @item @@ -27060,7 +29145,7 @@ the three most widely-used freely available versions of @command{awk} @item @samp{\x} Escape sequence @tab X @tab X @tab X @item @code{RS} as regexp @tab @tab X @tab X @item @code{FS} as null string @tab X @tab X @tab X -@item @file{/dev/stdin} special file @tab X @tab X @tab X +@item @file{/dev/stdin} special file @tab X @tab @tab X @item @file{/dev/stdout} special file @tab X @tab X @tab X @item @file{/dev/stderr} special file @tab X @tab X @tab X @item @code{**} and @code{**=} operators @tab X @tab @tab X @@ -27171,7 +29256,6 @@ to implementors to implement ranges in whatever way they choose. The @command{gawk} maintainer chose to apply the pre-POSIX meaning in all cases: the default regexp matching; with @option{--traditional}, and with @option{--posix}; in all cases, @command{gawk} remains POSIX compliant. - @node Contributors @appendixsec Major Contributors to @command{gawk} @cindex @command{gawk}, list of contributors to @@ -27304,6 +29388,7 @@ the various PC platforms. Christos Zoulas provided the @code{extension()} built-in function for dynamically adding new modules. +(This was removed at @command{gawk} 4.1.)
@item @cindex Kahrs, J@"urgen @@ -27367,7 +29452,7 @@ environments. @cindex Haque, John John Haque reworked the @command{gawk} internals to use a byte-code engine, -providing the @command{dgawk} debugger for @command{awk} programs. +providing the @command{gawk} debugger for @command{awk} programs. @item @cindex Yawitz, Efraim @@ -27933,10 +30018,6 @@ install-info --info-dir=x:/usr/info x:/usr/info/gawkinet.info The binary distribution may contain a separate file containing additional or more detailed installation instructions. -As of April, 2012, up to date @command{gawk} binaries for MS Windows -are available from @uref{http://sourceforge.net/projects/ezwinports/files/, -Eli Zaretskii's ports project}. - @node PC Compiling @appendixsubsubsec Compiling @command{gawk} for PC Operating Systems @@ -28641,7 +30722,7 @@ since approximately 2003. @item @command{pawk} Nelson H.F.@: Beebe at the University of Utah has modified Brian Kernighan's @command{awk} to provide timing and profiling information. -It is different from @command{pgawk} +It is different from @command{gawk} with the @option{--profile} option (@pxref{Profiling}), in that it uses CPU-based profiling, not line-count profiling. You may find it at either @@ -28736,8 +30817,6 @@ maintainers of @command{gawk}. Everything in it applies specifically to * Compatibility Mode:: How to disable certain @command{gawk} extensions. * Additions:: Making Additions To @command{gawk}. -* Dynamic Extensions:: Adding new built-in functions to - @command{gawk}. * Future Extensions:: New features that may be implemented one day. @end menu @@ -28783,6 +30862,8 @@ as well as any considerations you should bear in mind. @command{gawk}. * New Ports:: Porting @command{gawk} to a new operating system. +* Derived Files:: Why derived files are kept in the + @command{git} repository.
@end menu @node Accessing The Source @@ -28814,7 +30895,7 @@ git clone http://git.savannah.gnu.org/r/gawk.git @end example Once you have made changes, you can use @samp{git diff} to produce a -patch, and send that to the @command{gawk} maintainer; see @ref{Bugs} +patch, and send that to the @command{gawk} maintainer; see @ref{Bugs}, for how to do that. Finally, if you cannot install Git (e.g., if it hasn't been ported @@ -28926,7 +31007,8 @@ of @code{switch} statements, instead of just the plain pointer or character value. @item -Use the @code{TRUE}, @code{FALSE} and @code{NULL} symbolic constants +Use @code{true}, @code{false} for @code{bool} values, +the @code{NULL} symbolic constant for pointer values, and the character constant @code{'\0'} where appropriate, instead of @code{1} and @code{0}. @@ -28973,8 +31055,9 @@ You will also have to sign paperwork for your documentation changes. Submit changes as unified diffs. Use @samp{diff -u -r -N} to compare the original @command{gawk} source tree with your version. -I recommend using the GNU version of @command{diff}. -Send the output produced by either run of @command{diff} to me when you +I recommend using the GNU version of @command{diff}, or best of all, +@samp{git diff} or @samp{git format-patch}. +Send the output produced by @command{diff} to me when you submit your changes. (@xref{Bugs}, for the electronic mail information.) @@ -29100,802 +31183,188 @@ operating systems' code that is already there. In the code that you supply and maintain, feel free to use a coding style and brace layout that suits your taste. -@node Dynamic Extensions -@appendixsec Adding New Built-in Functions to @command{gawk} -@cindex Robinson, Will -@cindex robot, the -@cindex Lost In Space -@quotation -@i{Danger Will Robinson! Danger!!@* -Warning! 
Warning!}@* -The Robot -@end quotation +@node Derived Files +@appendixsubsec Why Generated Files Are Kept In @command{git} -@c STARTOFRANGE gladfgaw -@cindex @command{gawk}, functions, adding -@c STARTOFRANGE adfugaw -@cindex adding, functions to @command{gawk} -@c STARTOFRANGE fubadgaw -@cindex functions, built-in, adding to @command{gawk} -It is possible to add new built-in -functions to @command{gawk} using dynamically loaded libraries. This -facility is available on systems (such as GNU/Linux) that support -the C @code{dlopen()} and @code{dlsym()} functions. -This @value{SECTION} describes how to write and use dynamically -loaded extensions for @command{gawk}. -Experience with programming in -C or C++ is necessary when reading this @value{SECTION}. +@c From emails written March 22, 2012, to the gawk developers list. -@quotation CAUTION -The facilities described in this @value{SECTION} -are very much subject to change in a future @command{gawk} release. -Be aware that you may have to re-do everything, -at some future time. - -If you have written your own dynamic extensions, -be sure to recompile them for each new @command{gawk} release. -There is no guarantee of binary compatibility between different -releases, nor will there ever be such a guarantee. -@end quotation - -@quotation NOTE -When @option{--sandbox} is specified, extensions are disabled -(@pxref{Options}. -@end quotation +If you look at the @command{gawk} source in the @command{git} +repository, you will notice that it includes files that are automatically +generated by GNU infrastructure tools, such as @file{Makefile.in} from +@command{automake} and even @file{configure} from @command{autoconf}. -@menu -* Internals:: A brief look at some @command{gawk} internals. -* Plugin License:: A note about licensing. -* Sample Library:: A example of new functions. 
-@end menu +This is different from many Free Software projects that do not store +the derived files, because that keeps the repository less cluttered, +and it is easier to see the substantive changes when comparing versions +and trying to understand what changed between commits. -@node Internals -@appendixsubsec A Minimal Introduction to @command{gawk} Internals -@c STARTOFRANGE gawint -@cindex @command{gawk}, internals - -The truth is that @command{gawk} was not designed for simple extensibility. -The facilities for adding functions using shared libraries work, but -are something of a ``bag on the side.'' Thus, this tour is -brief and simplistic; would-be @command{gawk} hackers are encouraged to -spend some time reading the source code before trying to write -extensions based on the material presented here. Of particular note -are the files @file{awk.h}, @file{builtin.c}, and @file{eval.c}. -Reading @file{awkgram.y} in order to see how the parse tree is built -would also be of use. - -@cindex @code{awk.h} file (internal) -With the disclaimers out of the way, the following types, structure -members, functions, and macros are declared in @file{awk.h} and are of -use when writing extensions. The next @value{SECTION} -shows how they are used: +However, there are two reasons why the @command{gawk} maintainer +likes to have everything in the repository. -@table @code -@cindex floating-point, numbers, @code{AWKNUM} internal type -@cindex numbers, floating-point, @code{AWKNUM} internal type -@cindex @code{AWKNUM} internal type -@cindex internal type, @code{AWKNUM} -@item AWKNUM -An @code{AWKNUM} is the internal type of @command{awk} -floating-point numbers. Typically, it is a C @code{double}. - -@cindex @code{NODE} internal type -@cindex internal type, @code{NODE} -@cindex strings, @code{NODE} internal type -@cindex numbers, @code{NODE} internal type -@item NODE -Just about everything is done using objects of type @code{NODE}. 
-These contain both strings and numbers, as well as variables and arrays. - -@cindex @code{force_number()} internal function -@cindex internal function, @code{force_number()} -@cindex numeric, values -@item AWKNUM force_number(NODE *n) -This macro forces a value to be numeric. It returns the actual -numeric value contained in the node. -It may end up calling an internal @command{gawk} function. - -@cindex @code{force_string()} internal function -@cindex internal function, @code{force_string()} -@item void force_string(NODE *n) -This macro guarantees that a @code{NODE}'s string value is current. -It may end up calling an internal @command{gawk} function. -It also guarantees that the string is zero-terminated. - -@cindex @code{force_wstring()} internal function -@cindex internal function, @code{force_wstring()} -@item void force_wstring(NODE *n) -Similarly, this -macro guarantees that a @code{NODE}'s wide-string value is current. -It may end up calling an internal @command{gawk} function. -It also guarantees that the wide string is zero-terminated. - -@cindex @code{get_curfunc_arg_count()} internal function -@cindex internal function, @code{get_curfunc_arg_count()} -@item size_t get_curfunc_arg_count(void) -This function returns the actual number of parameters passed -to the current function. Inside the code of an extension -this can be used to determine the maximum index which is -safe to use with @code{get_actual_argument}. If this value is -greater than @code{nargs}, the function was -called incorrectly from the @command{awk} program. - -@cindex parameters@comma{} number of -@cindex @code{nargs} internal variable -@cindex internal variable, @code{nargs} -@item nargs -Inside an extension function, this is the maximum number of -expected parameters, as set by the @code{make_builtin()} function. 
- -@cindex @code{stptr} internal variable -@cindex internal variable, @code{stptr} -@cindex @code{stlen} internal variable -@cindex internal variable, @code{stlen} -@item n->stptr -@itemx n->stlen -The data and length of a @code{NODE}'s string value, respectively. -The string is @emph{not} guaranteed to be zero-terminated. -If you need to pass the string value to a C library function, save -the value in @code{n->stptr[n->stlen]}, assign @code{'\0'} to it, -call the routine, and then restore the value. - -@cindex @code{wstptr} internal variable -@cindex internal variable, @code{wstptr} -@cindex @code{wstlen} internal variable -@cindex internal variable, @code{wstlen} -@item n->wstptr -@itemx n->wstlen -The data and length of a @code{NODE}'s wide-string value, respectively. -Use @code{force_wstring()} to make sure these values are current. - -@cindex @code{type} internal variable -@cindex internal variable, @code{type} -@item n->type -The type of the @code{NODE}. This is a C @code{enum}. Values should -be one of @code{Node_var}, @code{Node_var_new}, or @code{Node_var_array} -for function parameters. - -@cindex @code{vname} internal variable -@cindex internal variable, @code{vname} -@item n->vname -The ``variable name'' of a node. This is not of much use inside -externally written extensions. - -@cindex arrays, associative, clearing -@cindex @code{assoc_clear()} internal function -@cindex internal function, @code{assoc_clear()} -@item void assoc_clear(NODE *n) -Clears the associative array pointed to by @code{n}. -Make sure that @samp{n->type == Node_var_array} first. - -@cindex arrays, elements, installing -@cindex @code{assoc_lookup()} internal function -@cindex internal function, @code{assoc_lookup()} -@item NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference) -Finds, and installs if necessary, array elements. -@code{symbol} is the array, @code{subs} is the subscript. -This is usually a value created with @code{make_string()} (see below). 
-@code{reference} should be @code{TRUE} if it is an error to use the -value before it is created. Typically, @code{FALSE} is the -correct value to use from extension functions. - -@cindex strings -@cindex @code{make_string()} internal function -@cindex internal function, @code{make_string()} -@item NODE *make_string(char *s, size_t len) -Take a C string and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is permanent storage; understanding -of @command{gawk} memory management is helpful. - -@cindex numbers -@cindex @code{make_number()} internal function -@cindex internal function, @code{make_number()} -@item NODE *make_number(AWKNUM val) -Take an @code{AWKNUM} and turn it into a pointer to a @code{NODE} that -can be stored appropriately. This is permanent storage; understanding -of @command{gawk} memory management is helpful. - - -@cindex nodes@comma{} duplicating -@cindex @code{dupnode()} internal function -@cindex internal function, @code{dupnode()} -@item NODE *dupnode(NODE *n) -Duplicate a node. In most cases, this increments an internal -reference count instead of actually duplicating the entire @code{NODE}; -understanding of @command{gawk} memory management is helpful. - -@cindex memory, releasing -@cindex @code{unref()} internal function -@cindex internal function, @code{unref()} -@item void unref(NODE *n) -This macro releases the memory associated with a @code{NODE} -allocated with @code{make_string()} or @code{make_number()}. -Understanding of @command{gawk} memory management is helpful. - -@cindex @code{make_builtin()} internal function -@cindex internal function, @code{make_builtin()} -@item void make_builtin(const char *name, NODE *(*func)(NODE *), int count) -Register a C function pointed to by @code{func} as new built-in -function @code{name}. @code{name} is a regular C string. @code{count} -is the maximum number of arguments that the function takes. 
-The function should be written in the following manner: - -@example -/* do_xxx --- do xxx function for gawk */ - -NODE * -do_xxx(int nargs) -@{ - @dots{} -@} -@end example +First, because it is then easy to reproduce any given version completely, +without relying upon the availability of (older, likely obsolete, and +maybe even impossible to find) other tools. -@cindex arguments, retrieving -@cindex @code{get_argument()} internal function -@cindex internal function, @code{get_argument()} -@item NODE *get_argument(int i) -This function is called from within a C extension function to get -the @code{i}-th argument from the function call. -The first argument is argument zero. - -@cindex @code{get_actual_argument()} internal function -@cindex internal function, @code{get_actual_argument()} -@item NODE *get_actual_argument(int i, -@itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ int@ optional,@ int@ wantarray); -This function retrieves a particular argument @code{i}. @code{wantarray} is @code{TRUE} -if the argument should be an array, @code{FALSE} otherwise. If @code{optional} is -@code{TRUE}, the argument need not have been supplied. If it wasn't, the return -value is @code{NULL}. It is a fatal error if @code{optional} is @code{TRUE} but -the argument was not provided. - -@cindex @code{get_scalar_argument()} internal macro -@cindex internal macro, @code{get_scalar_argument()} -@item get_scalar_argument(i, opt) -This is a convenience macro that calls @code{get_actual_argument()}. - -@cindex @code{get_array_argument()} internal macro -@cindex internal macro, @code{get_array_argument()} -@item get_array_argument(i, opt) -This is a convenience macro that calls @code{get_actual_argument()}. 
- -@cindex functions, return values@comma{} setting +As an extreme example, if you ever even think about trying to compile, +oh, say, the V7 @command{awk}, you will discover that not only do you +have to bootstrap the V7 @command{yacc} to do so, but you also need the +V7 @command{lex}. And the latter is pretty much impossible to bring up +on a modern GNU/Linux system.@footnote{We tried. It was painful.} -@cindex @code{ERRNO} variable -@cindex @code{update_ERRNO()} internal function -@cindex internal function, @code{update_ERRNO()} -@item void update_ERRNO(void) -This function is called from within a C extension function to set -the value of @command{gawk}'s @code{ERRNO} variable, based on the current -value of the C @code{errno} global variable. -It is provided as a convenience. +(Or, let's say @command{gawk} 1.2 required @command{bison} whatever-it-was +in 1989 and that there was no @file{awkgram.c} file in the repository. Is +there a guarantee that we could find that @command{bison} version? Or that +@emph{it} would build?) -@cindex @code{ERRNO} variable -@cindex @code{update_ERRNO_saved()} internal function -@cindex internal function, @code{update_ERRNO_saved()} -@item void update_ERRNO_saved(int errno_saved) -This function is called from within a C extension function to set -the value of @command{gawk}'s @code{ERRNO} variable, based on the error -value provided as the argument. -It is provided as a convenience. +If the repository has all the generated files, then it's easy to just check +them out and build. (Or @emph{easier}, depending upon how far back we go. 
+@code{:-)}) -@cindex @code{ENVIRON} array -@cindex @code{PROCINFO} array -@cindex @code{register_deferred_variable()} internal function -@cindex internal function, @code{register_deferred_variable()} -@item void register_deferred_variable(const char *name, NODE *(*load_func)(void)) -This function is called to register a function to be called when a -reference to an undefined variable with the given name is encountered. -The callback function will never be called if the variable exists already, -so, unless the calling code is running at program startup, it should first -check whether a variable of the given name already exists. -The argument function must return a pointer to a @code{NODE} containing the -newly created variable. This function is used to implement the builtin -@code{ENVIRON} and @code{PROCINFO} arrays, so you can refer to them -for examples. - -@cindex @code{IOBUF} internal structure -@cindex internal structure, @code{IOBUF} -@cindex @code{iop_alloc()} internal function -@cindex internal function, @code{iop_alloc()} -@cindex @code{get_record()} input method -@cindex @code{close_func}() input method -@cindex @code{INVALID_HANDLE} internal constant -@cindex internal constant, @code{INVALID_HANDLE} -@cindex XML (eXtensible Markup Language) -@cindex eXtensible Markup Language (XML) -@cindex @code{register_open_hook()} internal function -@cindex internal function, @code{register_open_hook()} -@item void register_open_hook(void *(*open_func)(IOBUF *)) -This function is called to register a function to be called whenever -a new data file is opened, leading to the creation of an @code{IOBUF} -structure in @code{iop_alloc()}. After creating the new @code{IOBUF}, -@code{iop_alloc()} will call (in reverse order of registration, so the last -function registered is called first) each open hook until one returns -non-@code{NULL}. 
If any hook returns a non-@code{NULL} value, that value is assigned -to the @code{IOBUF}'s @code{opaque} field (which will presumably point -to a structure containing additional state associated with the input -processing), and no further open hooks are called. - -The function called will most likely want to set the @code{IOBUF}'s -@code{get_record} method to indicate that future input records should -be retrieved by calling that method instead of using the standard -@command{gawk} input processing. - -And the function will also probably want to set the @code{IOBUF}'s -@code{close_func} method to be called when the file is closed to clean -up any state associated with the input. - -Finally, hook functions should be prepared to receive an @code{IOBUF} -structure where the @code{fd} field is set to @code{INVALID_HANDLE}, -meaning that @command{gawk} was not able to open the file itself. In -this case, the hook function must be able to successfully open the file -and place a valid file descriptor there. - -Currently, for example, the hook function facility is used to implement -the XML parser shared library extension. For more info, please look in -@file{awk.h} and in @file{io.c}. -@end table +And that brings us to the second (and stronger) reason why all the files +really need to be in @command{git}. It boils down to who do you cater +to---the @command{gawk} developer(s), or the user who just wants to check +out a version and try it out? -An argument that is supposed to be an array needs to be handled with -some extra code, in case the array being passed in is actually -from a function parameter. +The @command{gawk} maintainer +wants it to be possible for any interested @command{awk} user in the +world to just clone the repository, check out the branch of interest and +build it. 
Without their having to have the correct version(s) of the +autotools.@footnote{There is one GNU program that is (in our opinion) +severely difficult to bootstrap from the @command{git} repository. For +example, on the author's old (but still working) PowerPC Macintosh with +Mac OS X 10.5, it was necessary to bootstrap a ton of software, starting +with @command{git} itself, in order to try to work with the latest code. +It's not pleasant, and especially on older systems, it's a big waste +of time. -The following boilerplate code shows how to do this: +Starting with the latest tarball was no picnic either. The maintainers +had dropped @file{.gz} and @file{.bz2} files and only distributed +@file{.tar.xz} files. It was necessary to bootstrap @command{xz} first!} +That is the point of the @file{bootstrap.sh} file. It touches the +various other files in the right order such that @example -NODE *the_arg; - -/* assume need 3rd arg, 0-based */ -the_arg = get_array_argument(2, FALSE); +# The canonical incantation for building GNU software: +./bootstrap.sh && ./configure && make @end example -Again, you should spend time studying the @command{gawk} internals; -don't just blindly copy this code. -@c ENDOFRANGE gawint - -@node Plugin License -@appendixsubsec Extension Licensing - -Every dynamic extension should define the global symbol -@code{plugin_is_GPL_compatible} to assert that it has been licensed under -a GPL-compatible license. If this symbol does not exist, @command{gawk} -will emit a fatal error and exit. - -The declared type of the symbol should be @code{int}. It does not need -to be in any allocated section, though. The code merely asserts that -the symbol exists in the global scope.
Something like this is enough: - -@example -int plugin_is_GPL_compatible; -@end example - -@node Sample Library -@appendixsubsec Example: Directory and File Operation Built-ins -@c STARTOFRANGE chdirg -@cindex @code{chdir()} function@comma{} implementing in @command{gawk} -@c STARTOFRANGE statg -@cindex @code{stat()} function@comma{} implementing in @command{gawk} -@c STARTOFRANGE filre -@cindex files, information about@comma{} retrieving -@c STARTOFRANGE dirch -@cindex directories, changing - -Two useful functions that are not in @command{awk} are @code{chdir()} -(so that an @command{awk} program can change its directory) and -@code{stat()} (so that an @command{awk} program can gather information about -a file). -This @value{SECTION} implements these functions for @command{gawk} in an -external extension library. - -@menu -* Internal File Description:: What the new functions will do. -* Internal File Ops:: The code for internal file operations. -* Using Internal File Ops:: How to use an external extension. -@end menu - -@node Internal File Description -@appendixsubsubsec Using @code{chdir()} and @code{stat()} - -This @value{SECTION} shows how to use the new functions at the @command{awk} -level once they've been integrated into the running @command{gawk} -interpreter. -Using @code{chdir()} is very straightforward. It takes one argument, -the new directory to change to: - -@example -@dots{} -newdir = "/home/arnold/funstuff" -ret = chdir(newdir) -if (ret < 0) @{ - printf("could not change to %s: %s\n", - newdir, ERRNO) > "/dev/stderr" - exit 1 -@} -@dots{} -@end example - -The return value is negative if the @code{chdir} failed, -and @code{ERRNO} -(@pxref{Built-in Variables}) -is set to a string indicating the error. +@noindent +will @emph{just work}. -Using @code{stat()} is a bit more complicated. -The C @code{stat()} function fills in a structure that has a fair -amount of information. 
-The right way to model this in @command{awk} is to fill in an associative -array with the appropriate information: +This is extremely important for the @code{master} and +@code{gawk-@var{X}.@var{Y}-stable} branches. -@c broke printf for page breaking -@example -file = "/home/arnold/.profile" -fdata[1] = "x" # force `fdata' to be an array -ret = stat(file, fdata) -if (ret < 0) @{ - printf("could not stat %s: %s\n", - file, ERRNO) > "/dev/stderr" - exit 1 -@} -printf("size of %s is %d bytes\n", file, fdata["size"]) -@end example +Further, the @command{gawk} maintainer would argue that it's also +important for the @command{gawk} developers. When he tried to check out +the @code{xgawk} branch@footnote{A branch created by one of the other +developers that did not include the generated files.} to build it, he +couldn't. (No @file{ltmain.sh} file, and he had no idea how to create it, +and that was not the only problem.) -The @code{stat()} function always clears the data array, even if -the @code{stat()} fails. It fills in the following elements: +He felt @emph{extremely} frustrated. With respect to that branch, +the maintainer is no different than Jane User who wants to try to build +@code{gawk-4.0-stable} or @code{master} from the repository. -@table @code -@item "name" -The name of the file that was @code{stat()}'ed. +Thus, the maintainer thinks that it's not just important, but critical, +that for any given branch, the above incantation @emph{just works}. -@item "dev" -@itemx "ino" -The file's device and inode numbers, respectively. +@c So - that's my reasoning and philosophy. -@item "mode" -The file's mode, as a numeric value. This includes both the file's -type and its permissions. - -@item "nlink" -The number of hard links (directory entries) the file has. - -@item "uid" -@itemx "gid" -The numeric user and group ID numbers of the file's owner. - -@item "size" -The size in bytes of the file. - -@item "blocks" -The number of disk blocks the file actually occupies. 
This may not -be a function of the file's size if the file has holes. +What are some of the consequences and/or actions to take? -@item "atime" -@itemx "mtime" -@itemx "ctime" -The file's last access, modification, and inode update times, -respectively. These are numeric timestamps, suitable for formatting -with @code{strftime()} -(@pxref{Built-in}). +@enumerate 1 +@item +We don't mind that there are differing files in the different branches +as a result of different versions of the autotools. -@item "pmode" -The file's ``printable mode.'' This is a string representation of -the file's type and permissions, such as what is produced by -@samp{ls -l}---for example, @code{"drwxr-xr-x"}. +@enumerate A +@item +It's the maintainer's job to merge them and he will deal with it. -@item "type" -A printable string representation of the file's type. The value -is one of the following: +@item +He is really good at @samp{git diff x y > /tmp/diff1 ; gvim /tmp/diff1} to +remove the diffs that aren't of interest in order to review code. @code{:-)} +@end enumerate -@table @code -@item "blockdev" -@itemx "chardev" -The file is a block or character device (``special file''). +@item +It would certainly help if everyone used the same versions of the GNU tools +as he does, which in general are the latest released versions of +@command{automake}, +@command{autoconf}, +@command{bison}, +and +@command{gettext}. @ignore -@item "door" -The file is a Solaris ``door'' (special file used for -interprocess communications). +If it would help if I sent out an "I just upgraded to version x.y +of tool Z" kind of message to this list, I can do that. Up until +now it hasn't been a real issue since I'm the only one who's been +dorking with the configuration machinery. @end ignore -@item "directory" -The file is a directory. - -@item "fifo" -The file is a named-pipe (also known as a FIFO). - -@item "file" -The file is just a regular file. 
- -@item "socket" -The file is an @code{AF_UNIX} (``Unix domain'') socket in the -filesystem. - -@item "symlink" -The file is a symbolic link. -@end table -@end table - -Several additional elements may be present depending upon the operating -system and the type of the file. You can test for them in your @command{awk} -program by using the @code{in} operator -(@pxref{Reference to Elements}): - -@table @code -@item "blksize" -The preferred block size for I/O to the file. This field is not -present on all POSIX-like systems in the C @code{stat} structure. - -@item "linkval" -If the file is a symbolic link, this element is the name of the -file the link points to (i.e., the value of the link). - -@item "rdev" -@itemx "major" -@itemx "minor" -If the file is a block or character device file, then these values -represent the numeric device number and the major and minor components -of that number, respectively. -@end table - -@node Internal File Ops -@appendixsubsubsec C Code for @code{chdir()} and @code{stat()} - -Here is the C code for these extensions. They were written for -GNU/Linux. The code needs some more work for complete portability -to other POSIX-compliant systems:@footnote{This version is edited -slightly for presentation. See -@file{extension/filefuncs.c} in the @command{gawk} distribution -for the complete version.} - -@c break line for page breaking -@example -#include "awk.h" - -#include <sys/sysmacros.h> - -int plugin_is_GPL_compatible; - -/* do_chdir --- provide dynamically loaded chdir() builtin for gawk */ - -static NODE * -do_chdir(int nargs) -@{ - NODE *newdir; - int ret = -1; - - if (do_lint && get_curfunc_arg_count() != 1) - lintwarn("chdir: called with incorrect number of arguments"); - - newdir = get_scalar_argument(0, FALSE); -@end example - -The file includes the @code{"awk.h"} header file for definitions -for the @command{gawk} internals. It includes @code{<sys/sysmacros.h>} -for access to the @code{major()} and @code{minor}() macros. 
- -@cindex programming conventions, @command{gawk} internals -By convention, for an @command{awk} function @code{foo}, the function that -implements it is called @samp{do_foo}. The function should take -a @samp{int} argument, usually called @code{nargs}, that -represents the number of defined arguments for the function. The @code{newdir} -variable represents the new directory to change to, retrieved -with @code{get_scalar_argument()}. Note that the first argument is -numbered zero. - -This code actually accomplishes the @code{chdir()}. It first forces -the argument to be a string and passes the string value to the -@code{chdir()} system call. If the @code{chdir()} fails, @code{ERRNO} -is updated. - -@example - (void) force_string(newdir); - ret = chdir(newdir->stptr); - if (ret < 0) - update_ERRNO(); -@end example - -Finally, the function returns the return value to the @command{awk} level: - -@example - return make_number((AWKNUM) ret); -@} -@end example - -The @code{stat()} built-in is more involved. First comes a function -that turns a numeric mode into a printable representation -(e.g., 644 becomes @samp{-rw-r--r--}). This is omitted here for brevity: +@enumerate A +@item +Installing from source is quite easy. It's how the maintainer worked for years +under Fedora. +He had @file{/usr/local/bin} at the front of his @env{PATH} and just did: -@c break line for page breaking @example -/* format_mode --- turn a stat mode field into something readable */ - -static char * -format_mode(unsigned long fmode) -@{ - @dots{} -@} +wget http://ftp.gnu.org/gnu/@var{package}/@var{package}-@var{x}.@var{y}.@var{z}.tar.gz +tar -xpzvf @var{package}-@var{x}.@var{y}.@var{z}.tar.gz +cd @var{package}-@var{x}.@var{y}.@var{z} +./configure && make && make check +make install # as root @end example -Next comes the @code{do_stat()} function.
It starts with -variable declarations and argument checking: +@item +These days the maintainer uses Ubuntu 10.11 which is medium current, but +he is already doing the above for @command{autoconf} and @command{bison}. @ignore -Changed message for page breaking. Used to be: - "stat: called with incorrect number of arguments (%d), should be 2", +(C. Rant: Recent Linux versions with GNOME 3 really suck. What + are all those people thinking? Fedora 15 was such a bust it drove + me to Ubuntu, but Ubuntu 11.04 and 11.10 are totally unusable from + a UI perspective. Bleah.) @end ignore -@example -/* do_stat --- provide a stat() function for gawk */ - -static NODE * -do_stat(int nargs) -@{ - NODE *file, *array, *tmp; - struct stat sbuf; - int ret; - NODE **aptr; - char *pmode; /* printable mode */ - char *type = "unknown"; - - if (do_lint && get_curfunc_arg_count() > 2) - lintwarn("stat: called with too many arguments"); -@end example - -Then comes the actual work. First, the function gets the arguments. -Then, it always clears the array. -The code use @code{lstat()} (instead of @code{stat()}) -to get the file information, -in case the file is a symbolic link. -If there's an error, it sets @code{ERRNO} and returns: - -@c comment made multiline for page breaking -@example - /* file is first arg, array to hold results is second */ - file = get_scalar_argument(0, FALSE); - array = get_array_argument(1, FALSE); - - /* empty out the array */ - assoc_clear(array); - - /* lstat the file, if error, set ERRNO and return */ - (void) force_string(file); - ret = lstat(file->stptr, & sbuf); - if (ret < 0) @{ - update_ERRNO(); - return make_number((AWKNUM) ret); - @} -@end example - -Now comes the tedious part: filling in the array. 
Only a few of the -calls are shown here, since they all follow the same pattern: - -@example - /* fill in the array */ - aptr = assoc_lookup(array, tmp = make_string("name", 4), FALSE); - *aptr = dupnode(file); - unref(tmp); - - aptr = assoc_lookup(array, tmp = make_string("mode", 4), FALSE); - *aptr = make_number((AWKNUM) sbuf.st_mode); - unref(tmp); - - aptr = assoc_lookup(array, tmp = make_string("pmode", 5), FALSE); - pmode = format_mode(sbuf.st_mode); - *aptr = make_string(pmode, strlen(pmode)); - unref(tmp); -@end example - -When done, return the @code{lstat()} return value: - -@example - - return make_number((AWKNUM) ret); -@} -@end example - -@cindex programming conventions, @command{gawk} internals -Finally, it's necessary to provide the ``glue'' that loads the -new function(s) into @command{gawk}. By convention, each library has -a routine named @code{dlload()} that does the job: - -@example -/* dlload --- load new builtins in this library */ - -NODE * -dlload(NODE *tree, void *dl) -@{ - make_builtin("chdir", do_chdir, 1); - make_builtin("stat", do_stat, 2); - return make_number((AWKNUM) 0); -@} -@end example +@end enumerate -And that's it! As an exercise, consider adding functions to -implement system calls such as @code{chown()}, @code{chmod()}, -and @code{umask()}. +@ignore +@item +If someone still feels really strongly about all this, then perhaps they +can have two branches, one for their development with just the clean +changes, and one that is buildable (xgawk and xgawk-buildable, maybe). +Or, as I suggested in another mail, make commits in pairs, the first with +the "real" changes and the second with "everything else needed for + building". +@end ignore +@end enumerate -@node Using Internal File Ops -@appendixsubsubsec Integrating the Extensions +Most of the above was originally written by the maintainer to other +@command{gawk} developers. 
It raised the objection from one of +the developers ``@dots{} that anybody pulling down the source from +@command{git} is not an end user.'' -@cindex @command{gawk}, interpreter@comma{} adding code to -Now that the code is written, it must be possible to add it at -runtime to the running @command{gawk} interpreter. First, the -code must be compiled. Assuming that the functions are in -a file named @file{filefuncs.c}, and @var{idir} is the location -of the @command{gawk} include files, -the following steps create -a GNU/Linux shared library: +However, this is not true. There are ``power @command{awk} users'' +who can build @command{gawk} (using the magic incantation shown previously) +but who can't program in C. Thus, the major branches should be +kept buildable all the time. -@example -$ @kbd{gcc -fPIC -shared -DHAVE_CONFIG_H -c -O -g -I@var{idir} filefuncs.c} -$ @kbd{ld -o filefuncs.so -shared filefuncs.o} -@end example +It was then suggested that there be a @command{cron} job to create +nightly tarballs of ``the source.'' Here, the problem is that there +are source trees, corresponding to the various branches! So, +nightly tarballs aren't the answer, especially as the repository can go +for weeks without significant change being introduced. -@cindex @code{extension()} function (@command{gawk}) -Once the library exists, it is loaded by calling the @code{extension()} -built-in function. -This function takes two arguments: the name of the -library to load and the name of a function to call when the library -is first loaded. This function adds the new functions to @command{gawk}. -It returns the value returned by the initialization function -within the shared library: +Fortunately, the @command{git} server can meet this need.
For any given +branch named @var{branchname}, use: @example -# file testff.awk -BEGIN @{ - extension("./filefuncs.so", "dlload") - - chdir(".") # no-op - - data[1] = 1 # force `data' to be an array - print "Info for testff.awk" - ret = stat("testff.awk", data) - print "ret =", ret - for (i in data) - printf "data[\"%s\"] = %s\n", i, data[i] - print "testff.awk modified:", - strftime("%m %d %y %H:%M:%S", data["mtime"]) - - print "\nInfo for JUNK" - ret = stat("JUNK", data) - print "ret =", ret - for (i in data) - printf "data[\"%s\"] = %s\n", i, data[i] - print "JUNK modified:", strftime("%m %d %y %H:%M:%S", data["mtime"]) -@} +wget http://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-@var{branchname}.tar.gz @end example -Here are the results of running the program: +@noindent +to retrieve a snapshot of the given branch. -@example -$ @kbd{gawk -f testff.awk} -@print{} Info for testff.awk -@print{} ret = 0 -@print{} data["size"] = 607 -@print{} data["ino"] = 14945891 -@print{} data["name"] = testff.awk -@print{} data["pmode"] = -rw-rw-r-- -@print{} data["nlink"] = 1 -@print{} data["atime"] = 1293993369 -@print{} data["mtime"] = 1288520752 -@print{} data["mode"] = 33204 -@print{} data["blksize"] = 4096 -@print{} data["dev"] = 2054 -@print{} data["type"] = file -@print{} data["gid"] = 500 -@print{} data["uid"] = 500 -@print{} data["blocks"] = 8 -@print{} data["ctime"] = 1290113572 -@print{} testff.awk modified: 10 31 10 12:25:52 -@print{} -@print{} Info for JUNK -@print{} ret = -1 -@print{} JUNK modified: 01 01 70 02:00:00 -@end example -@c ENDOFRANGE filre -@c ENDOFRANGE dirch -@c ENDOFRANGE statg -@c ENDOFRANGE chdirg -@c ENDOFRANGE gladfgaw -@c ENDOFRANGE adfugaw -@c ENDOFRANGE fubadgaw @node Future Extensions @appendixsec Probable Future Extensions @@ -29954,12 +31423,8 @@ Following is a list of probable future changes visible at the @c these are ordered by likelihood @table @asis -@item Loadable module interface -It is not clear that the @command{awk}-level 
interface to the -modules facility is as good as it should be. The interface needs to be -redesigned, particularly taking namespace issues into account, as -well as possibly including issues such as library search path order -and versioning. +@item Databases +It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array. @item @code{RECLEN} variable for fixed-length records Along with @code{FIELDWIDTHS}, this would speed up the processing of @@ -29967,9 +31432,6 @@ fixed-length records. @code{PROCINFO["RS"]} would be @code{"RS"} or @code{"RECLEN"}, depending upon which kind of record processing is in effect. -@item Databases -It may be possible to map a GDBM/NDBM/SDBM file into an @command{awk} array. - @item More @code{lint} warnings There are more things that could be checked for portability. @end table @@ -29978,21 +31440,6 @@ Following is a list of probable improvements that will make @command{gawk}'s source code easier to work with: @table @asis -@item Loadable module mechanics -The current extension mechanism works -(@pxref{Dynamic Extensions}), -but is rather primitive. It requires a fair amount of manual work -to create and integrate a loadable module. -Nor is the current mechanism as portable as might be desired. -The GNU @command{libtool} package provides a number of features that -would make using loadable modules much easier. -@command{gawk} should be changed to use @command{libtool}. - -@item Loadable module internals -The API to its internals that @command{gawk} ``exports'' should be revised. -Too many things are needlessly exposed. A new API should be designed -and implemented to make module writing easier. - @item Better array subscript management @command{gawk}'s management of array subscript storage could use revamping, so that using the same value to index multiple arrays only @@ -30024,7 +31471,6 @@ other introductory texts that you should refer to instead.) @menu * Basic High Level:: The high level view. 
* Basic Data Typing:: A very quick intro to data types. -* Floating Point Issues:: Stuff to know about floating-point numbers. @end menu @node Basic High Level @@ -30175,47 +31621,10 @@ Individual variables, as well as numeric and string variables, are referred to as @dfn{scalar} values. Groups of values, such as arrays, are not scalars. -@cindex integers -@cindex floating-point, numbers -@cindex numbers, floating-point -Within computers, there are two kinds of numeric values: @dfn{integers} -and @dfn{floating-point}. -In school, integer values were referred to as ``whole'' numbers---that is, -numbers without any fractional part, such as 1, 42, or @minus{}17. -The advantage to integer numbers is that they represent values exactly. -The disadvantage is that their range is limited. On most systems, -this range is @minus{}2,147,483,648 to 2,147,483,647. -However, many systems now support a range from -@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. - -@cindex unsigned integers -@cindex integers, unsigned -Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}. -Signed values may be negative or positive, with the range of values just -described. -Unsigned values are always positive. On most systems, -the range is from 0 to 4,294,967,295. -However, many systems now support a range from -0 to 18,446,744,073,709,551,615. - -@cindex double precision floating-point -@cindex single precision floating-point -Floating-point numbers represent what are called ``real'' numbers; i.e., -those that do have a fractional part, such as 3.1415927. -The advantage to floating-point numbers is that they -can represent a much larger range of values. -The disadvantage is that there are numbers that they cannot represent -exactly. -@command{awk} uses @dfn{double precision} floating-point numbers, which -can hold more digits than @dfn{single precision} -floating-point numbers. -Floating-point issues are discussed more fully in -@ref{Floating Point Issues}. 
- -At the very lowest level, computers store values as groups of binary digits, -or @dfn{bits}. Modern computers group bits into groups of eight, called @dfn{bytes}. -Advanced applications sometimes have to manipulate bits directly, -and @command{gawk} provides functions for doing so. +@ref{General Arithmetic}, provided a basic introduction to numeric +types (integer and floating-point) and how they are used in a computer. +Please review that information, including a number of caveats that +were presented. @cindex null strings While you are probably used to the idea of a number without a value (i.e., zero), @@ -30239,6 +31648,11 @@ plus 0 times 1, or decimal 10. Octal and hexadecimal are discussed more in @ref{Nondecimal-numbers}. +At the very lowest level, computers store values as groups of binary digits, +or @dfn{bits}. Modern computers group bits into groups of eight, called @dfn{bytes}. +Advanced applications sometimes have to manipulate bits directly, +and @command{gawk} provides functions for doing so. + Programs are written in programming languages. Hundreds, if not thousands, of programming languages exist. One of the most popular is the C programming language. @@ -30258,239 +31672,6 @@ standard for C. This standard became an ISO standard in 1990. In 1999, a revised ISO C standard was approved and released. Where it makes sense, POSIX @command{awk} is compatible with 1999 ISO C. -@node Floating Point Issues -@appendixsec Floating-Point Number Caveats - -As mentioned earlier, floating-point numbers represent what are called -``real'' numbers, i.e., those that have a fractional part. @command{awk} -uses double precision floating-point numbers to represent all -numeric values. This @value{SECTION} describes some of the issues -involved in using floating-point numbers. 
- -There is a very nice -@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic} -by David Goldberg, -``What Every Computer Scientist Should Know About Floating-point Arithmetic,'' -@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48. -This is worth reading if you are interested in the details, -but it does require a background in computer science. - -@menu -* String Conversion Precision:: The String Value Can Lie. -* Unexpected Results:: Floating Point Numbers Are Not Abstract - Numbers. -* POSIX Floating Point Problems:: Standards Versus Existing Practice. -@end menu - -@node String Conversion Precision -@appendixsubsec The String Value Can Lie - -Internally, @command{awk} keeps both the numeric value -(double precision floating-point) and the string value for a variable. -Separately, @command{awk} keeps -track of what type the variable has -(@pxref{Typing and Comparison}), -which plays a role in how variables are used in comparisons. - -It is important to note that the string value for a number may not -reflect the full value (all the digits) that the numeric value -actually contains. -The following program (@file{values.awk}) illustrates this: - -@example -@{ - sum = $1 + $2 - # see it for what it is - printf("sum = %.12g\n", sum) - # use CONVFMT - a = "<" sum ">" - print "a =", a - # use OFMT - print "sum =", sum -@} -@end example - -@noindent -This program shows the full value of the sum of @code{$1} and @code{$2} -using @code{printf}, and then prints the string values obtained -from both automatic conversion (via @code{CONVFMT}) and -from printing (via @code{OFMT}). - -Here is what happens when the program is run: - -@example -$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk} -@print{} sum = 4.8888888 -@print{} a = <4.88889> -@print{} sum = 4.88889 -@end example - -This makes it clear that the full numeric value is different from -what the default string representations show. 
- -@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with -at least six significant digits. For some applications, you might want to -change it to specify more precision. -On most modern machines, most of the time, -17 digits is enough to capture a floating-point number's -value exactly.@footnote{Pathological cases can require up to -752 digits (!), but we doubt that you need to worry about this.} - -@node Unexpected Results -@appendixsubsec Floating Point Numbers Are Not Abstract Numbers - -@cindex floating-point, numbers -Unlike numbers in the abstract sense (such as what you studied in high school -or college math), numbers stored in computers are limited in certain ways. -They cannot represent an infinite number of digits, nor can they always -represent things exactly. -In particular, -floating-point numbers cannot -always represent values exactly. Here is an example: - -@example -$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'} -515.79 -@print{} 0000051579 -515.80 -@print{} 0000051579 -515.81 -@print{} 0000051580 -515.82 -@print{} 0000051582 -@kbd{@value{CTL}-d} -@end example - -@noindent -This shows that some values can be represented exactly, -whereas others are only approximated. This is not a ``bug'' -in @command{awk}, but simply an artifact of how computers -represent numbers. - -@cindex negative zero -@cindex positive zero -@cindex zero@comma{} negative vs.@: positive -Another peculiarity of floating-point numbers on modern systems -is that they often have more than one representation for the number zero! -In particular, it is possible to represent ``minus zero'' as well as -regular, or ``positive'' zero. 
- -This example shows that negative and positive zero are distinct values -when stored internally, but that they are in fact equal to each other, -as well as to ``regular'' zero: - -@example -$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0} -> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz} -> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0} -> @kbd{@}'} -@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1 -@print{} mz == 0 -> 1, pz == 0 -> 1 -@end example - -It helps to keep this in mind should you process numeric data -that contains negative zero values; the fact that the zero is negative -is noted and can affect comparisons. - -@node POSIX Floating Point Problems -@appendixsubsec Standards Versus Existing Practice - -Historically, @command{awk} has converted any non-numeric looking string -to the numeric value zero, when required. Furthermore, the original -definition of the language and the original POSIX standards specified that -@command{awk} only understands decimal numbers (base 10), and not octal -(base 8) or hexadecimal numbers (base 16). - -Changes in the language of the -2001 and 2004 POSIX standard can be interpreted to imply that @command{awk} -should support additional features. These features are: - -@itemize @bullet -@item -Interpretation of floating point data values specified in hexadecimal -notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not} -source code constants.) - -@item -Support for the special IEEE 754 floating point values ``Not A Number'' -(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf''). -In particular, the format for these values is as specified by the ISO 1999 -C standard, which ignores case and can allow machine-dependent additional -characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}. 
-@end itemize - -The first problem is that both of these are clear changes to historical -practice: - -@itemize @bullet -@item -The @command{gawk} maintainer feels that supporting hexadecimal floating -point values, in particular, is ugly, and was never intended by the -original designers to be part of the language. - -@item -Allowing completely alphabetic strings to have valid numeric -values is also a very severe departure from historical practice. -@end itemize - -The second problem is that the @code{gawk} maintainer feels that this -interpretation of the standard, which requires a certain amount of -``language lawyering'' to arrive at in the first place, was not even -intended by the standard developers. In other words, ``we see how you -got where you are, but we don't think that that's where you want to be.'' - -The 2008 POSIX standard added explicit wording to allow, but not require, -that @command{awk} support hexadecimal floating point values and -special values for ``Not A Number'' and infinity. - -Although the @command{gawk} maintainer continues to feel that -providing those features is inadvisable, -nevertheless, on systems that support IEEE floating point, it seems -reasonable to provide @emph{some} way to support NaN and Infinity values. -The solution implemented in @command{gawk} is as follows: - -@itemize @bullet -@item -With the @option{--posix} command-line option, @command{gawk} becomes -``hands off.'' String values are passed directly to the system library's -@code{strtod()} function, and if it successfully returns a numeric value, -that is what's used.@footnote{You asked for it, you got it.} -By definition, the results are not portable across -different systems. 
They are also a little surprising: - -@example -$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'} -@print{} nan -$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'} -@print{} 3735928559 -@end example - -@item -Without @option{--posix}, @command{gawk} interprets the four strings -@samp{+inf}, -@samp{-inf}, -@samp{+nan}, -and -@samp{-nan} -specially, producing the corresponding special numeric values. -The leading sign acts a signal to @command{gawk} (and the user) -that the value is really numeric. Hexadecimal floating point is -not supported (unless you also use @option{--non-decimal-data}, -which is @emph{not} recommended). For example: - -@example -$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'} -@print{} 0 -$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'} -@print{} nan -$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'} -@print{} 0 -@end example - -@command{gawk} does ignore case in the four special values. -Thus @samp{+nan} and @samp{+NaN} are the same. -@end itemize - @c ENDOFRANGE procon @node Glossary @@ -32580,14 +33761,27 @@ ORA uses filename, thus the macro. Suggestions: ------------ -Enhance FIELDWIDTHS with some way to indicate "the rest of the record". -E.g., a length of 0 or -1 or something. May be "n"? - -Make FIELDWIDTHS be an array? - % Next edition: -% 1. Talk about common extensions, those in nawk, gawk, mawk -% 2. Use @code{foo} for variables and @code{foo()} for functions -% 3. Standardize the error messages from the functions and programs +% 1. Standardize the error messages from the functions and programs % in Chapters 12 and 13. -% 4. Nuke the BBS stuff and use something that won't be obsolete +% 2. Nuke the BBS stuff and use something that won't be obsolete + +From: Doug McIlroy <doug@cs.dartmouth.edu> +Date: Sat, 13 Oct 2012 19:55:25 -0400 +To: arnold@skeeve.com +Subject: Re: origin of the term "cookie"? 
+ +I believe the term "cookie", for a more or less inscrutable +saying or crumb of information, was injected into Unix +jargon by Bob Morris, who used the word quite frequently. +It had no fixed meaning as it now does in browsers. + +The word had been around long before it was recognized in +the 8th edition glossary (earlier editions had no glossary): + +cookie a peculiar goodie, token, saying or remembrance +returned by or presented to a program. [I would say that +"returned by" would better read "produced by", and assume +responsibility for the inexactitude.] + +Doug McIlroy