diff options
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 1505 |
1 files changed, 1294 insertions, 211 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index 83bd3b5d..8cd7e38e 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -79,9 +79,11 @@ @c some special symbols @iftex @set LEQ @math{@leq} +@set PI @math{@pi} @end iftex @ifnottex @set LEQ <= +@set PI @i{pi} @end ifnottex @ifnottex @@ -285,12 +287,14 @@ particular records in a file and perform operations upon them. * Functions:: Built-in and user-defined functions. * Internationalization:: Getting @command{gawk} to speak your language. +* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with + @command{gawk}. * Advanced Features:: Stuff for advanced users, specific to @command{gawk}. * Library Functions:: A Library of @command{awk} Functions. * Sample Programs:: Many @command{awk} programs with complete explanations. -* Debugger:: The @code{dgawk} debugger. +* Debugger:: The @code{gawk} debugger. * Language History:: The evolution of the @command{awk} language. * Installation:: Installing @command{gawk} under various @@ -393,6 +397,7 @@ particular records in a file and perform operations upon them. * Getline Notes:: Important things to know about @code{getline}. * Getline Summary:: Summary of @code{getline} Variants. +* Read Timeout:: Reading input with a timeout. * Command line directories:: What happens if you put a directory on the command line. * Print:: The @code{print} statement. @@ -550,6 +555,21 @@ particular records in a file and perform operations upon them. * I18N Portability:: @command{awk}-level portability issues. * I18N Example:: A simple i18n example. * Gawk I18N:: @command{gawk} is also internationalized. +* Floating-point Programming:: Effective floating-point programming. +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +* Arbitrary Precision Floats:: Arbitrary precision floating-point + arithmetic with @command{gawk}. +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point numbers. +* Integer Programming:: Effective integer programming. +* Arbitrary Precision Integers:: Arbitrary precision integer + arithmetic with @command{gawk}. +* MPFR and GMP Libraries:: Information about the MPFR and GMP libraries. * Nondecimal Data:: Allowing nondecimal input data. * Array Sorting:: Facilities for controlling array traversal and sorting arrays. @@ -614,23 +634,23 @@ particular records in a file and perform operations upon them. * Anagram Program:: Finding anagrams from a dictionary. * Signature Program:: People do amazing things with too much time on their hands. -* Debugging:: Introduction to @command{dgawk}. -* Debugging Concepts:: Debugging In General. +* Debugging:: Introduction to @command{gawk} Debugger. +* Debugging Concepts:: Debugging in General. * Debugging Terms:: Additional Debugging Concepts. * Awk Debugging:: Awk Debugging. -* Sample dgawk session:: Sample @command{dgawk} session. -* dgawk invocation:: @command{dgawk} Invocation. -* Finding The Bug:: Finding The Bug. -* List of Debugger Commands:: Main @command{dgawk} Commands. -* Breakpoint Control:: Control of breakpoints. -* Dgawk Execution Control:: Control of execution. -* Viewing And Changing Data:: Viewing and changing data. -* Dgawk Stack:: Dealing with the stack. -* Dgawk Info:: Obtaining information about the program and - the debugger state. -* Miscellaneous Dgawk Commands:: Miscellaneous Commands. +* Sample Debugging Session:: Sample Debugging Session. +* Debugger Invocation:: How to Start the Debugger. +* Finding The Bug:: Finding the Bug. +* List of Debugger Commands:: Main Commands. +* Breakpoint Control:: Control of Breakpoints. +* Debugger Execution Control:: Control of Execution. +* Viewing And Changing Data:: Viewing and Changing Data. +* Execution Stack:: Dealing with the Stack. +* Debugger Info:: Obtaining Information about the Program and + the Debugger State. +* Miscellaneous Debugger Commands:: Miscellaneous Commands. * Readline Support:: Readline Support. -* Dgawk Limitations:: Limitations and future plans. +* Limitations:: Limitations and Future Plans. * V7/SVR3.1:: The major changes between V7 and System V Release 3.1. * SVR4:: Minor changes between System V Releases 3.1 @@ -686,6 +706,7 @@ particular records in a file and perform operations upon them. * Internals:: A brief look at some @command{gawk} internals. * Plugin License:: A note about licensing. +* Loading Extensions:: How to load dynamic extensions. * Sample Library:: A example of new functions. * Internal File Description:: What the new functions will do. * Internal File Ops:: The code for internal file operations. @@ -1164,8 +1185,7 @@ provide many sample @command{awk} programs. Reading them allows you to see @command{awk} solving real problems. -@ref{Debugger}, describes the @command{awk} debugger, -@command{dgawk}. +@ref{Debugger}, describes the @command{awk} debugger. @ref{Language History}, describes how the @command{awk} language has evolved since @@ -1594,10 +1614,13 @@ has been and continues to be a pleasure working with this team of fine people. John Haque contributed the modifications to convert @command{gawk} -into a byte-code interpreter, including the debugger. Stephen Davies +into a byte-code interpreter, including the debugger, and the +additional modifications for support of arbitrary precision arithmetic. +Stephen Davies contributed to the effort to bring the byte-code changes into the mainstream code base. Efraim Yawitz contributed the initial text of @ref{Debugger}. +John Haque contributed the initial text of @ref{Arbitrary Precision Arithmetic}. @cindex Kernighan, Brian I would like to thank Brian Kernighan for invaluable assistance during the @@ -3103,6 +3126,19 @@ inadvertently use global variables that you meant to be local. (This is a particularly easy mistake to make with simple variable names like @code{i}, @code{j}, etc.) +@item -D@r{[}@var{file}@r{]} +@itemx --debug=@r{[}@var{file}@r{]} +@cindex @code{-D} option +@cindex @code{--debug} option +@cindex @command{awk} debugging, enabling +Enable debugging of @command{awk} programs +(@pxref{Debugging}). +By default, the debugger reads commands interactively from the terminal. +The optional @var{file} argument allows you to specify a file with a list +of commands for the debugger to execute non-interactively. +No space is allowed between the @option{-D} and @var{file}, if +@var{file} is supplied. + @item -e @var{program-text} @itemx --source @var{program-text} @cindex @code{-e} option @@ -3168,6 +3204,15 @@ for information about this option. Print a ``usage'' message summarizing the short and long style options that @command{gawk} accepts and then exit. +@item -l @var{lib} +@itemx --load @var{lib} +@cindex @code{-l} option +@cindex @code{--load} option +@cindex loading, library +Load a shared library @var{lib}. This searches for the library using the @env{AWKPATH} +environment variable. The suffix @samp{.so} in the library name is optional. +The library initialization routine should be named @code{dlload()}. + @item -L @r{[}value@r{]} @itemx --lint@r{[}=value@r{]} @cindex @code{-l} option @@ -3191,6 +3236,14 @@ when eliminating problems pointed out by @option{--lint}, you should take care to search for all occurrences of each inappropriate construct. As @command{awk} programs are usually short, doing so is not burdensome. +@item -M +@itemx --bignum +@cindex @code{-M} option +@cindex @code{--bignum} option +Force arbitrary precision arithmetic on numbers. This option has no effect +if @command{gawk} is not compiled to use the GNU MPFR and MP libraries +(@pxref{Arbitrary Precision Arithmetic}). + @item -n @itemx --non-decimal-data @cindex @code{-n} option @@ -3214,6 +3267,18 @@ Use with care. Force the use of the locale's decimal point character when parsing numeric input data (@pxref{Locales}). +@item -o@r{[}@var{file}@r{]} +@itemx --pretty-print@r{[}=@var{file}@r{]} +@cindex @code{-o} option +@cindex @code{--pretty-print} option +@cindex @command{awk} enabling +Enable pretty-printing of @command{awk} programs. +By default, output program is created in a file named @file{awkprof.out}. +The optional @var{file} argument allows you to specify a different +@value{FN} for the output. +No space is allowed between the @option{-o} and @var{file}, if +@var{file} is supplied. + @item -O @itemx --optimize @cindex @code{--optimize} option @@ -3226,7 +3291,7 @@ maintainer hopes to add more optimizations over time. @itemx --profile@r{[}=@var{file}@r{]} @cindex @code{-p} option @cindex @code{--profile} option -@cindex @command{awk} programs, profiling, enabling +@cindex @command{awk} profiling, enabling Enable profiling of @command{awk} programs (@pxref{Profiling}). By default, profiles are created in a file named @file{awkprof.out}. @@ -3235,10 +3300,8 @@ The optional @var{file} argument allows you to specify a different No space is allowed between the @option{-p} and @var{file}, if @var{file} is supplied. -When run with @command{gawk}, the profile is just a ``pretty printed'' version -of the program. When run with @command{pgawk}, the profile contains execution -counts for each statement in the program in the left margin, and function -call counts for each function. +The profile contains execution counts for each statement in the program +in the left margin, and function call counts for each function. @item -P @itemx --posix @@ -3302,14 +3365,6 @@ This is now @command{gawk}'s default behavior. Nevertheless, this option remains both for backward compatibility, and for use in combination with the @option{--traditional} option. -@item -R @var{file} -@itemx --command=@var{file} -@cindex @code{-R} option -@cindex @code{--command} option -@command{dgawk} only. -Read @command{dgawk} debugger options and commands from @var{file}. -@xref{Dgawk Info}, for more information. - @item -S @itemx --sandbox @cindex @code{-S} option @@ -3633,6 +3688,11 @@ Specifies the interval between connection retries, in milliseconds. On systems that do not support the @code{usleep()} system call, the value is rounded up to an integral number of seconds. + +@item GAWK_READ_TIMEOUT +Specifies the time, in milliseconds, for @command{gawk} to +wait for input before returning with an error. +@xref{Read Timeout}. @end table The environment variables in the following list are meant @@ -3706,7 +3766,7 @@ into smaller, more manageable pieces, and also lets you reuse common @command{aw code from various @command{awk} scripts. In other words, you can group together @command{awk} functions, used to carry out specific tasks, into external files. These files can be used just like function libraries, -using the @samp{@@include} keyword in conjunction with the @code{AWKPATH} +using the @samp{@@include} keyword in conjunction with the @env{AWKPATH} environment variable. Let's see an example. @@ -5119,6 +5179,8 @@ used with it do not have to be named on the @command{awk} command line * Multiple Line:: Reading multi-line records. * Getline:: Reading files under explicit program control using the @code{getline} function. +* Read Timeout:: Reading input with a timeout. + * Command line directories:: What happens if you put a directory on the command line. @end menu @@ -7197,6 +7259,110 @@ and whether the variant is standard or a @command{gawk} extension. @c ENDOFRANGE inex @c ENDOFRANGE infir +@node Read Timeout +@section Reading Input With A Timeout +@cindex timeout, reading input + +You may specify a timeout in milliseconds for reading input from a terminal, +pipe or two-way communication including, TCP/IP sockets. This can be done +on a per input, command or connection basis, by setting a special element +in the @code{PROCINFO} array: + +@example +PROCINFO["input_name", "READ_TIMEOUT"] = @var{timeout in milliseconds} +@end example + +When set, this will cause @command{gawk} to time out and return failure +if no data is available to read within the specified timeout period. +For example, a TCP client can decide to give up on receiving +any response from the server after a certain amount of time: + +@example +Service = "/inet/tcp/0/localhost/daytime" +PROCINFO[Service, "READ_TIMEOUT"] = 100 +if ((Service |& getline) > 0) + print $0 +else if (ERRNO != "") + print ERRNO +@end example + +Here is how to read interactively from the terminal@footnote{This assumes +that standard input is the keyboard} without waiting +for more than five seconds: + +@example +PROCINFO["/dev/stdin", "READ_TIMEOUT"] = 5000 +while ((getline < "/dev/stdin") > 0) + print $0 +@end example + +@command{gawk} will terminate the read operation if input does not +arrive after waiting for the timeout period, return failure +and set the @code{ERRNO} variable to an appropriate string value. +A negative or zero value for the timeout is the same as specifying +no timeout at all. + +A timeout can also be set for reading from the terminal in the implicit +loop that reads input records and matches them against patterns, +like so: + +@example +$ @kbd{ gawk 'BEGIN @{ PROCINFO["-", "READ_TIMEOUT"] = 5000 @}} +> @kbd{@{ print "You entered: " $0 @}'} +@kbd{gawk} +@print{} You entered: gawk +@end example + +In this case, failure to respond within five seconds results in the following +error message: + +@example +@error{} gawk: cmd. line:2: (FILENAME=- FNR=1) fatal: error reading input file `-': Connection timed out +@end example + +The timeout can be set or changed at any time, and will take effect on the +next attempt to read from the input device. In the following example, +we start with a timeout value of one second, and progressively +reduce it by one-tenth of a second until we wait indefinitely +for the input to arrive: + +@example +PROCINFO[Service, "READ_TIMEOUT"] = 1000 +while ((Service |& getline) > 0) @{ + print $0 + PROCINFO[S, "READ_TIMEOUT"] -= 100 +@} +@end example + +@quotation NOTE +You should not assume that the read operation will block +exactly after the tenth record has been printed. It is possible that +@command{gawk} will read and buffer more than one record's +worth of data the first time. Because of this, changing the value +of timeout like in the above example is not very useful. +@end quotation + +If the @code{PROCINFO} element is not present and the environment +variable @env{GAWK_READ_TIMEOUT} exists, +@command{gawk} uses its value to initialize the timeout value. +The exclusive use of the environment variable to specify timeout +has the disadvantage of not being able to control it +on a per command or connection basis. + +@command{gawk} considers a timeout event to be an error even though +the attempt to read from the underlying device may +succeed in a later attempt. This is a limitation, and it also +means that you cannot use this to multiplex input from +two or more sources. + +Assigning a timeout value prevents read operations from +blocking indefinitely. But bear in mind that there are other ways +@command{gawk} can stall waiting for an input device to be ready. +A network client can sometimes take a long time to establish +a connection before it can start reading any data, +or the attempt to open a FIFO special file for reading can block +indefinitely until some other process opens it for writing. + @node Command line directories @section Directories On The Command Line @cindex directories, command line @@ -12470,6 +12636,18 @@ This is the output record separator. It is output at the end of every @code{print} statement. Its default value is @code{"\n"}, the newline character. (@xref{Output Separators}.) +@cindex @code{PREC} variable +@item PREC # +The working precision of arbitrary precision floating-point numbers, +53 by default (@pxref{Setting Precision}). + +@cindex @code{ROUNDMODE} variable +@item ROUNDMODE # +The rounding mode to use for arbitrary precision arithmetic on +numbers, by default @code{"N"} (@samp{roundTiesToEven} in +the IEEE-754 standard) +(@pxref{Setting Rounding Mode}). + @cindex @code{RS} variable @cindex separators, for records @cindex record separators @@ -12753,6 +12931,25 @@ The value of the @code{getuid()} system call. The version of @command{gawk}. @end table +The following additional elements in the array +are available to provide information about the MPFR and GMP libraries +if your version of @command{gawk} supports arbitrary precision numbers +(@pxref{Arbitrary Precision Arithmetic}): + +@table @code +@item PROCINFO["mpfr_version"] +The version of the GNU MPFR library. + +@item PROCINFO["gmp_version"] +The version of the GNU MP library. + +@item PROCINFO["prec_max"] +The maximum precision supported by MPFR. + +@item PROCINFO["prec_min"] +The minimum precision required by MPFR. +@end table + On some systems, there may be elements in the array, @code{"group1"} through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of supplementary groups that the process has. Use the @code{in} operator @@ -18171,6 +18368,863 @@ then @command{gawk} produces usage messages, warnings, and fatal errors in the local language. @c ENDOFRANGE inloc +@node Arbitrary Precision Arithmetic +@chapter Arbitrary Precision Arithmetic with @command{gawk} +@cindex arbitrary precision +@cindex multiple precision +@cindex infinite precision +@cindex floating-point numbers, arbitrary precision +@cindex MPFR +@cindex GMP + +@cindex Knuth, Donald +@quotation +@i{There's a credibility gap: We don't know how much of the computer's answers +to believe. Novice computer users solve this problem by implicitly trusting +in the computer as an infallible authority; they tend to believe that all +digits of a printed answer are significant. Disillusioned computer users have +just the opposite approach; they are constantly afraid that their answers +are almost meaningless.} + +Donald Knuth@footnote{Donald E.@: Knuth. +@cite{The Art of Computer Programming}. Volume 2, +@cite{Seminumerical Algorithms}, third edition, +1998, ISBN 0-201-89683-4, p.@: 229.} +@end quotation + +This @value{SECTION} decsribes how to use the arbitrary precision +(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric +capabilites in @command{gawk} to produce maximally accurate results +when you need it. But first you should check if your version of +@command{gawk} supports arbitrary precision arithmetic. +The easiest way to find out is to look at the output of +the following command: + +@example +$ @kbd{gawk --version} +@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3) +@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation. +@dots{} +@end example + +@command{gawk} uses the +@uref{http://www.mpfr.org, GNU MPFR} +and +@uref{http://gmplib.org, GNU MP} (GMP) +libraries for arbitrary precision +arithmetic on numbers. So if you do not see the names of these libraries +in the output, then your version of @command{gawk} does not support +arbitrary precision arithmetic. + +Even if you aren't interested in arbitrary precision arithmetic, you +may still benifit from knowing about how @command{gawk} handles numbers +in general, and the limitations of doing arithmetic with ordinary +@command{gawk} numbers. + +@menu +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary Floating-point Representation. +* Floating-point Context:: Floating-point Context. +* Rounding Mode:: Floating-point Rounding Mode. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with @command{gawk}. +* Setting Precision:: Setting the Working Precision. +* Setting Rounding Mode:: Setting the Rounding Mode. +* Floating-point Constants:: Representing Floating-point Constants. +* Changing Precision:: Changing the Precision of a Number. +* Exact Arithmetic:: Exact Arithmetic with Floating-point Numbers. +* Integer Programming:: Effective Integer Programming. +* Arbitrary Precision Integers:: Arbitrary Precision Integer + Arithmetic with @command{gawk}. +* MPFR and GMP Libraries:: Information About the MPFR and GMP Libraries. +@end menu + +@node Floating-point Programming +@section Effective Floating-point Programming + +Numerical programming is an extensive area; if you need to develop +sophisticated numerical algorithms then @command{gawk} may not be +the ideal tool, and this documentation may not be sufficient. +@c FIXME: JOHN: Do you want to cite some actual books? +It might require a book or two to communicate how to compute +with ideal accuracy and precision +and the result often depends on the particular application. + +@quotation NOTE +A floating-point calculation's @dfn{accuracy} is how close it comes +to the real value. This is as opposed to the @dfn{precision}, which +usually refers to the number of bits used to represent the number +(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision, +the Wikipedia article} for more information). +@end quotation + +Binary floating-point representations and arithmetic are inexact. +Simple values like 0.1 cannot be precisely represented using +binary floating-point numbers, and the limited precision of +floating-point numbers means that slight changes in +the order of operations or the precision of intermediate storage +can change the result. To make matters worse with arbitrary precision +floating-point, you can set the precision before starting a computation, +but then you cannot be sure of the number of significant decimal places +in the final result. + +Sometimes you need to think more about what you really want +and what's really happening. Consider the two numbers +in the following example: + +@example +x = 0.875 # 1/2 + 1/4 + 1/8 +y = 0.425 +@end example + +Unlike the number in @code{y}, the number stored in @code{x} +is exactly representable +in binary since it can be written as a finite sum of one or +more fractions whose denominators are all powers of two. +When @command{gawk} reads a floating-point number from +program source, it automatically rounds that number to whatever +precision your machine supports. If you try to print the numeric +content of a variable using an output format string of @code{"%.17g"}, +it may not produce the same number as you assigned to it: + +@example +$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425} +> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'} +@print{} 0.875, 0.42499999999999999 +@end example + +Often the error is so small you do not even notice it, and if you do, +you can always specify how much precision you would like in your output. +Usually this is a format string like @code{"%.15g"}, which when +used in the previous example, produces an output identical to the input. + +Because the underlying representation can be little bit off from the exact value, +comparing floats to see if they are equal is generally not a good idea. +Here is an example where it does not work like you expect: + +@example +$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +The loss of accuracy during a single computation with floating-point numbers +usually isn't enough to worry about. However, if you compute a value +which is the result of a sequence of floating point operations, +the error can accumulate and greatly affect the computation itself. +Here is an attempt to compute the value of the constant +@value{PI} using one of its many series representations: + +@example +BEGIN @{ + x = 1.0 / sqrt(3.0) + n = 6 + for (i = 1; i < 30; i++) @{ + n = n * 2.0 + x = (sqrt(x * x + 1) - 1) / x + printf("%.15f\n", n * x) + @} +@} +@end example + +When run, the early errors propagating through later computations +cause the loop to terminate prematurely after an attempt to divide by zero. + +@example +$ @kbd{gawk -f pi.awk} +@print{} 3.215390309173475 +@print{} 3.159659942097510 +@print{} 3.146086215131467 +@print{} 3.142714599645573 +@dots{} +@print{} 3.224515243534819 +@print{} 2.791117213058638 +@print{} 0.000000000000000 +@error{} gawk: pi.awk:6: fatal: division by zero attempted +@end example + +Here is one more example where the inaccuracies in internal representations +yield an unexpected result: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} +> @kbd{i++} +> @kbd{print i} +> @kbd{@}'} +@print{} 4 +@end example + +Can computation using aribitrary precision help with the previous examples? +If you are impatient to know, see +@ref{Exact Arithmetic}. + +Instead of aribitrary precision floating-point arithmetic, +often all you need is an adjustment of your logic +or a different order for the operations in your calculation. +The stability and the accuracy of the computation of the constant @value{PI} +in the previous example can be enhanced by using the following +simple algebraic transformation: + +@example +(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + x) +@end example + +There is no need to be unduly suspicious about the results from +floating-point arithmetic. The lesson to remember is that +floating-point math is always more complex than the math using +pencil and paper. In order to take advantage of the power +of computer floating-point, you need to know its limitations +and work within them. For most casual use of floating-point arithmetic, +you will often get the expected result in the end if you simply round +the display of your final results to the correct number of significant +decimal digits. Avoid presenting numerical data in a manner that +implies better precision than is actually the case. + +@node Floating-point Representation +@section Binary Floating-point Representation +@cindex IEEE-754 format + +Although floating-point representations vary from machine to machine, +the most commonly encountered representation is that defined by the +IEEE 754 Standard. An IEEE-754 format value has three components: + +@itemize @bullet +@item +a sign bit telling whether the number is positive or negative, + +@item +an @dfn{exponent} giving its order of magnitude, @var{e}, + +@item +and a @dfn{significand}, @var{s}, +specifying the actual digits of the number. +@end itemize + +The value of the +number is then +@iftex +@math{s @cdot 2^e}. +@end iftex +@ifnottex +@var{s * 2^e}. +@end ifnottex +The first bit of a non-zero binary significand +is always one, so the significand in an IEEE-754 format only includes the +fractional part, leaving the leading one implicit. + +Three of the standard IEEE-754 types are 32-bit single precision, +64-bit double precision and 128-bit quadruple precision. +The standard also specifies extended precision formats +to allow greater precisions and larger exponent ranges. + +@node Floating-point Context +@section Floating-point Context +@cindex context, floating-point + +A floating-point context defines the environment for arithmetic operations. +It governs precision, sets rules for rounding and limits range for exponents. +The context has the following primary components: + +@table @code +@item precision +Precision of the floating-point format in bits. +@item emax +Maximum exponent allowed for this format. +@item emin +Minimum exponent allowed for this format. +@item underflow behavior +The format may or may not support gradual underflow. +@item rounding +The rounding mode of this context. +@end table + +@ref{table-ieee-formats} lists the precision and exponent +field values for the basic IEEE-754 binary formats: + +@float Table,table-ieee-formats +@caption{Basic IEEE Formats} +@multitable @columnfractions .20 .20 .20 .20 .20 +@headitem Name @tab Total bits @tab Precision @tab emin @tab emax +@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127 +@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023 +@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383 +@end multitable +@end float + +@quotation NOTE +The precision numbers include the implied leading one that gives them +one extra bit of significand. +@end quotation + +A floating-point context can also determine which signals are treated +as exceptions, and can set rules for arithmetic with special values. +Please consult the IEEE-754 standard or other resources for details. + +@command{gawk} ordinarily uses the hardware double precision +representation for numbers. On most systems, this is IEEE-754 +floating-point format, corresponding to 64-bit binary with 53 bits +of precision. + +@quotation NOTE +In case an underflow occurs, the standard allows, but does not require, +the result from an arithmetic operation to be a number smaller than +the smallest nonzero normalized number. Such numbers do +not have as many significant digits as normal numbers, and are called +@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero, +is called @dfn{flush to zero}. The basic IEEE-754 binary formats +support subnormal numbers. +@end quotation + +@node Rounding Mode +@section Floating-point Rounding Mode +@cindex rounding mode, floating-point + +The @dfn{rounding mode} specifies the behavior for the results of numerical +operations when discarding extra precision. Each rounding mode indicates +how the least significant returned digit of a rounded result is to +be calculated. +The @code{ROUNDMODE} variable (@pxref{Setting Rounding Mode}) provides +program level control over the rounding mode. +@ref{table-rounding-modes} lists the IEEE-754 defined +rounding modes: + +@float Table,table-rounding-modes +@caption{Rounding Modes} +@multitable @columnfractions .45 .30 .25 +@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE} +@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"} +@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"} +@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"} +@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"} +@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"} +@end multitable +@end float + +The default mode @samp{roundTiesToEven} is the most preferred, +but the least intuitive. This method does the obvious thing for most values, +by rounding them up or down to the nearest digit. +For example, rounding 1.132 to two digits yields 1.13, +and rounding 1.157 yields 1.16. + +However, when it comes to rounding a value that is exactly halfway between, +things do not work the way you probably learned in school. +In this case, the number is rounded to the nearest even digit. +So rounding 0.125 to two digits rounds down to 0.12, +but rounding 0.6875 to three digits rounds up to 0.688. +You probably have already encountered this rounding mode when +using the @code{printf} routine to format floating-point numbers. +For example: + +@example +BEGIN @{ + x = -4.5 + for (i = 1; i < 10; i++) @{ + x += 1.0 + printf("%4.1f => %2.0f\n", x, x) + @} +@} +@end example + +@noindent +produces the following output when run@footnote{It +is possible for the output to be completely different if the +C library in your system does not use the IEEE-754 even-rounding +rule to round halfway cases for @code{printf()}.}: + +@example +-3.5 => -4 +-2.5 => -2 +-1.5 => -2 +-0.5 => 0 + 0.5 => 0 + 1.5 => 2 + 2.5 => 2 + 3.5 => 4 + 4.5 => 4 +@end example + +The theory behind the rounding mode @samp{roundTiesToEven} is that +it more or less evenly distributes upward and downward rounds +of exact halves, which might cause the round-off error +to cancel itself out. This is the default rounding mode used +in IEEE-754 computing functions and operators. + +The other rounding modes are rarely used. +Round toward positive infinity (@samp{roundTowardPositive}) +and round toward negative infinity (@samp{roundTowardNegative}) +are often used to implement interval arithmetic, +where you adjust the rounding mode to calculate upper and lower bounds +for the range of output. The @samp{roundTowardZero} +mode can be used for converting floating-point numbers to integers. +The rounding mode @samp{roundTiesToAway} rounds the result to the +nearest number and selects the number with the larger magnitude +if a tie occurs. + +Some numerical analysts will tell you that your choice of rounding style +has tremendous impact on the final outcome, and advise you to wait until +final output for any rounding. Instead, you can often achieve this goal by +setting the precision initially to some value sufficiently larger than +the final desired precision, so that the accumulation of round-off error +does not influence the outcome. +If you suspect that results from your computation are +sensitive to accumulation of round-off error, +one way to be sure is to look for a significant difference in output +when you change the rounding mode. + +@node Arbitrary Precision Floats +@section Arbitrary Precision Floating-point Arithmetic with @command{gawk} + +@command{gawk} uses the GNU MPFR library +for arbitrary precision floating-point arithmetic. The MPFR library +provides precise control over precisions and rounding modes, and gives +correctly rounded reproducible platform-independent results. With the +command-line option @option{--bignum} or @option{-M}, +all floating-point arithmetic operators and numeric functions can yield +results to any desired precision level supported by MPFR. +Two built-in +variables @code{PREC} +(@pxref{Setting Precision}) +and @code{ROUNDMODE} +(@pxref{Setting Rounding Mode}) +provide control over the working precision and the rounding mode. +The precision and the rounding mode are set globally for every operation +to follow. + +The default working precision for arbitrary precision floats is 53, +and the default value for @code{ROUNDMODE} is @code{"N"}, +which selects the IEEE-754 +@samp{roundTiesToEven} (@pxref{Rounding Mode}) rounding mode.@footnote{The +default precision is 53, since according to the MPFR documentation, +the library should be able to exactly reproduce all computations with +double-precision machine floating-point numbers (@code{double} type +in C), except the default exponent range is much wider and subnormal +numbers are not implemented.} +@command{gawk} uses the default exponent range in MPFR +@iftex +(@math{emax = 2^{30} - 1, emin = -emax}) +@end iftex +@ifnottex +(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax}) +@end ifnottex +for all floating-point contexts. +There is no explicit mechanism to adjust the exponent range. +MPFR does not implement subnormal numbers by default, +and this behavior cannot be changed in @command{gawk}. + +@quotation NOTE +When emulating an IEEE-754 format (@pxref{Setting Precision}), +@command{gawk} internally adjusts the exponent range +to the value defined for the format and also performs computations needed for +gradual underflow (subnormal numbers). +@end quotation + +@quotation NOTE +MPFR numbers are variable-size entities, consuming only as much space as +needed to store the significant digits. Since the performance using MPFR +numbers pales in comparison to doing math using the underlying machine +types, you should consider using only as much precision as needed by +your program. +@end quotation + +@node Setting Precision +@section Setting the Working Precision +@cindex @code{PREC} variable + +@command{gawk} uses a global working precision; it does not keep track of +the precision or accuracy of individual numbers. Performing an arithmetic +operation or calling a built-in function rounds the result to the current +working precision. The default working precision is 53 which can be +modified using the built-in variable @code{PREC}. You can also set the +value to one of the following pre-defined case-insensitive strings +to emulate an IEEE-754 binary format: + +@multitable {@code{"double"}} {12345678901234567890123456789012345} +@headitem @code{PREC} @tab IEEE-754 Binary Format +@item @code{"half"} @tab 16-bit half-precision. +@item @code{"single"} @tab Basic 32-bit single precision. +@item @code{"double"} @tab Basic 64-bit double precision. +@item @code{"quad"} @tab Basic 128-bit quadruple precision. +@item @code{"oct"} @tab 256-bit octuple precision. +@end multitable + +The following example illustrates the effects of changing precision +on arithmetic operations: + +@example +$ @kbd{gawk -M -vPREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \} +> @kbd{PREC = "double"; print x + 0 @}'} +@print{} 1e-400 +@print{} 0 +@end example + +Binary and decimal precisions are related approximately according to the +formula: + +@iftex +@math{prec = 3.322 @cdot dps} +@end iftex +@ifnottex +@var{prec} = 3.322 * @var{dps} +@end ifnottex + +@noindent +Here, @var{prec} denotes the binary precision +(measured in bits) and @var{dps} (short for decimal places) +is the decimal digits. We can easily calculate how many decimal +digits the 53-bit significand of an IEEE double is equivalent to: +53 / 3.332 which is equal to about 15.95. +But what does 15.95 digits actually mean? It depends whether you are +concerned about how many digits you can rely on, or how many digits +you need. + +It is important to know how many bits it takes to uniquely identify +a double-precision value (the C type @code{double}). If you want to +convert from @code{double} to decimal and back to @code{double} (e.g., +saving a @code{double} representing an intermediate result to a file, and +later reading it back to restart the computation), then a few more decimal +digits are required. 17 digits is generally enough for a @code{double}. + +It can also be important to know what decimal numbers can be uniquely +represented with a @code{double}. If you want to convert +from decimal to @code{double} and back again, 15 digits is the most that +you can get. Stated differently, you should not present +the numbers from your floating-point computations with more than 15 +significant digits in them. + +Conversely, it takes a precision of 332 bits to hold an approximation +of constant @value{PI} that is accurate to 100 decimal places. +You should always add some extra bits in order to avoid the confusing round-off +issues that occur because numbers are stored internally in binary. + +@node Setting Rounding Mode +@section Setting the Rounding Mode +@cindex @code{ROUNDMODE} variable + +The built-in variable @code{ROUNDMODE} has the default value @code{"N"}, +which selects the IEEE-754 rounding mode @samp{roundTiesToEven}. +The other possible values for @code{ROUNDMODE} are @code{"U"} for rounding mode +@samp{roundTowardPositive}, @code{"D"} for @samp{roundTowardNegative}, +and @code{"Z"} for @samp{roundTowardZero}. +@command{gawk} also accepts @code{"A"} to select the IEEE-754 mode +@samp{roundTiesToAway} +if your version of the MPFR library supports it; otherwise setting +@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode}, +for the meanings of the various rounding modes. + +Here is an example of how to change the default rounding behavior of +@code{printf}'s output: + +@example +$ @kbd{gawk -M -vROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'} +@print{} 1.37 +@end example + +@node Floating-point Constants +@section Representing Floating-point Constants +@cindex constants, floating-point + +Be wary of floating-point constants! When reading a floating-point constant +from program source code, @command{gawk} uses the default precision, +unless overridden +by an assignment to the special variable @code{PREC} on the command +line, to store it internally as a MPFR number. +Changing the precision using @code{PREC} in the program text does +not change the precision of a constant. If you need to +represent a floating-point constant at a higher precision than the +default and cannot use a command line assignment to @code{PREC}, +you should either specify the constant as a string, or +a rational number whenever possible. The following example +illustrates the differences among various ways to +print a floating-point constant: + +@example +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000055511151 +$ @kbd{gawk -M -vPREC = 113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'} +@print{} 0.1000000000000000000000000 +@end example + +In the first case, the number is stored with the default precision of 53. + +@node Changing Precision +@section Changing the Precision of a Number + +@cindex Laurie, Dirk +@quotation +@i{The point is that in any variable-precision package, +a decision is made on how to treat numbers given as data, +or arising in intermediate results, which are represented in +floating-point format to a precision lower than working precision. +Do we promote them to full membership of the high-precision club, +or do we treat them and all their associates as second-class citizens? +Sometimes the first course is proper, sometimes the second, and it takes +careful analysis to tell which.} + +Dirk Laurie@footnote{Dirk Laurie. +@cite{Variable-precision Arithmetic Considered Perilous -- A Detective Story}. +Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.} +@end quotation + +@command{gawk} does not implicitly modify the precision of any previously +computed results when the working precision is changed with an assignment +to @code{PREC}. The precision of a number is always the one that was +used at the time of its creation, and there is no way for the user +to explicitly change it afterwards. However, since the result of a +floating-point arithmetic operation is always an arbitrary precision +floating-point value---with a precision set by the value of @code{PREC}---one of the +following workarounds effectively accomplishes the desired behavior: + +@example +x = x + 0.0 +@end example + +@noindent +or: + +@example +x += 0.0 +@end example + +@node Exact Arithmetic +@section Exact Arithmetic with Floating-point Numbers + +@quotation CAUTION +Never depend on the exactness of floating-point arithmetic, +even for apparently simple expressions! +@end quotation + +Can arbitrary precision arithmetic give exact results? There are +no easy answers. The standard rules of algebra often do not apply +when using floating-point arithmetic. +Among other things, the distributive and associative laws +do not hold completely, and order of operation may be important +for your computation. Rounding error, cumulative precision loss +and underflow are often troublesome. + +When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3} +for equality +using the machine double precision arithmetic, it decides that they +are not equal! +(@xref{Floating-point Programming}.) +You can get the result you want by increasing the precision; +56 in this case will get the job done: + +@example +$ @kbd{gawk -M -vPREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 1 +@end example + +If adding more bits is good, perhaps adding even more bits of +precision is better? +Here is what happens if we use an even larger value of @code{PREC}: + +@example +$ @kbd{gawk -M -vPREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +This is not a bug in @command{gawk} or in the MPFR library. +It is easy to forget that the finite number of bits used to store the value +is often just an approximation after proper rounding. +The test for equality succeeds if and only if @emph{all} bits in the two operands +are exactly the same. Since this is not necessarily true after floating-point +computations with a particular precision and effective rounding rule, +a straight test for equality may not work. + +So, don't assume that floating-point values can be compared for equality. +You should also exercise caution when using other forms of comparisons. +The standard way to compare between floating-point numbers is to determine +how much error (or @dfn{tolerance}) you will allow in a comparison and +check to see if one value is within this error range of the other. + +In applications where 15 or fewer decimal places suffice, +hardware double precision arithmetic can be adequate, and is usually much faster. +But you do need to keep in mind that every floating-point operation +can suffer a new rounding error with catastrophic consequences as illustrated +by our attempt to compute the value of the constant @value{PI}, +(@pxref{Floating-point Programming}). +Extra precision can greatly enhance the stability and the accuracy +of your computation in such cases. + +Repeated addition is not necessarily equivalent to multiplication +in floating-point arithmetic. In the last example +(@pxref{Floating-point Programming}), +you may or may not succeed in getting the correct result by choosing +an arbitrarily large value for @code{PREC}. Reformulation of +the problem at hand is often the correct approach in such situations. + + +@node Integer Programming +@section Effective Integer Programming + +As has been mentioned already, @command{gawk} ordinarily uses hardware double +precision with 64-bit IEEE binary floating-point representation +for numbers on most systems. A large integer like 9007199254740997 +has a binary representation that, although finite, is more than 53 bits long; +it must also be rounded to 53 bits. +The biggest integer that can be stored in a C @code{double} is usually the same +as the largest possible value of a @code{double}. If your system @code{double} +is an IEEE 64-bit @code{double}, this largest possible value is an integer and +can be represented precisely. What more should one know about integers? + +If you want to know what is the largest integer, such that it and +all smaller integers can be stored in 64-bit doubles without losing precision, +then the answer is +@iftex +@math{2^{53}}. +@end iftex +@ifnottex +2^53. +@end ifnottex +The next representable number is the even number +@iftex +@math{2^{53} + 2}, +@end iftex +@ifnottex +2^53 + 2, +@end ifnottex +meaning it is unlikely that you will be able to make +@command{gawk} print +@iftex +@math{2^{53} + 1} +@end iftex +@ifnottex +2^53 + 1 +@end ifnottex +in integer format. +The range of integers exactly representable by a 64-bit double +is +@iftex +@math{[-2^{53}, 2^{53}]}. +@end iftex +@ifnottex +[@minus{}2^53, 2^53]. +@end ifnottex +If you ever see an integer outside this range in @command{gawk} +using 64-bit doubles, you have reason to be very suspicious about +the accuracy of the output. Here is a simple program with erroneous output: + +@example +$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'} +@print{} 9007199254740991 +@print{} 9007199254740992 +@print{} 9007199254740992 +@print{} 9007199254740994 +@end example + +The lesson is to not assume that any large integer printed by @command{gawk} +represents an exact result from your computation, especially if it wraps +around on your screen. + +@node Arbitrary Precision Integers +@section Arbitrary Precision Integer Arithmetic with @command{gawk} +@cindex integer, arbitrary precision + +If the option @option{--bignum} or @option{-M} is specified, +@command{gawk} performs all +integer arithmetic using GMP arbitrary precision integers. +Any number that looks like an integer in a program source or data file +is stored as an arbitrary precision integer. +The size of the integer is limited only by your computer's memory. +The current floating-point context has no effect on operations involving integers. +For example, the following computes +@iftex +@math{5^{4^{3^{2}}}}, +@end iftex +@ifnottex +5^4^3^2, +@end ifnottex +the result of which is beyond the +limits of ordinary @command{gawk} numbers: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{x = 5^4^3^2} +> @kbd{print "# of digits =", length(x)} +> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)} +> @kbd{@}'} +@print{} # of digits = 183231 +@print{} 62060698786608744707 ... 92256259918212890625 +@end example + +If you were to compute the same value using arbitrary precision +floating-point values instead, the precision needed for correct output +(using the formula +@iftex +@math{prec = 3.322 @cdot dps}), +would be @math{3.322 @cdot 183231}, +@end iftex +@ifnottex +@samp{prec = 3.322 * dps}), +would be 3.322 x 183231, +@end ifnottex +or 608693. + +The result from an arithmetic operation with an integer and a floating-point value +is a floating-point value with a precision equal to the working precision. +The following program calculates the eighth term in +Sylvester's sequence@footnote{Weisstein, Eric W. +@cite{Sylvester's Sequence}. From MathWorld--A Wolfram Web Resource. +@url{http://mathworld.wolfram.com/SylvestersSequence.html}} +using a recurrence: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{s = 2.0} +> @kbd{for (i = 1; i <= 7; i++)} +> @kbd{s = s * (s - 1) + 1} +> @kbd{print s} +> @kbd{@}'} +@print{} 113423713055421845118910464 +@end example + +The output differs from the acutal number, 113423713055421844361000443, +because the default precision of 53 is not enough to represent the +floating-point results exactly. You can either increase the precision +(100 is enough in this case), or replace the floating-point constant +@code{2.0} with an integer, to perform all computations using integer +arithmetic to get the correct output. + +It will sometimes be necessary for @command{gawk} to implicitly convert an +arbitrary precision integer into an arbitrary precision floating-point value. +This is primarily because the MPFR library does not always provide the +relevant interface to process arbitrary precision integers or mixed-mode +numbers as needed by an operation or function. +In such a case, the precision is set to the minimum value necessary +for exact conversion, and the working precision is not used for this purpose. +If this is not what you need or want, you can employ a subterfuge +like this: + +@example +gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}' +@end example + +You can avoid this issue altogether by specifying the number as a float +to begin with: + +@example +gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}' +@end example + +Note that for the particular example above, there is unlikely to be a +reason for simply not using the following: + +@example +gawk -M 'BEGIN @{ n = 13; print n % 2 @}' +@end example + + +@node MPFR and GMP Libraries +@section Information About the MPFR and GMP Libraries + +There are a few elements available in the @code{PROCINFO} array +to provide information about the MPFR and GMP libraries. +@xref{Auto-set}, for more information. + @node Advanced Features @chapter Advanced Features of @command{gawk} @cindex advanced features, network connections, See Also networks, connections @@ -18967,40 +20021,32 @@ extensive examples. @cindex @command{awk} programs, profiling @c STARTOFRANGE proawk @cindex profiling @command{awk} programs -@c STARTOFRANGE pgawk -@cindex @command{pgawk} program -@cindex profiling @command{gawk}, See @command{pgawk} program - -You may produce execution -traces of your @command{awk} programs. -This is done with a specially compiled version of @command{gawk}, -called @command{pgawk} (``profiling @command{gawk}''). - +@cindex profiling @command{gawk} @cindex @code{awkprof.out} file @cindex files, @code{awkprof.out} -@cindex @command{pgawk} program, @code{awkprof.out} file -@command{pgawk} is identical in every way to @command{gawk}, except that when -it has finished running, it creates a profile of your program in a file -named @file{awkprof.out}. -Because it is profiling, it also executes up to 45% slower than + +You may produce execution traces of your @command{awk} programs. +This is done by passing the option @option{--profile} to @command{gawk}. +When @command{gawk} has finished running, it creates a profile of your program in a file +named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than @command{gawk} normally does. @cindex @code{--profile} option As shown in the following example, the @option{--profile} option can be used to change the name of the file -where @command{pgawk} will write the profile: +where @command{gawk} will write the profile: @example -pgawk --profile=myprog.prof -f myprog.awk data1 data2 +gawk --profile=myprog.prof -f myprog.awk data1 data2 @end example @noindent -In the above example, @command{pgawk} places the profile in +In the above example, @command{gawk} places the profile in @file{myprog.prof} instead of in @file{awkprof.out}. -Here is a sample -session showing a simple @command{awk} program, its input data, and the -results from running @command{pgawk}. First, the @command{awk} program: +Here is a sample session showing a simple @command{awk} program, its input data, and the +results from running @command{gawk} with the @option{--profile} option. +First, the @command{awk} program: @example BEGIN @{ print "First BEGIN rule" @} @@ -19040,12 +20086,12 @@ foo junk @end example -Here is the @file{awkprof.out} that results from running @command{pgawk} -on this program and data (this example also illustrates that @command{awk} +Here is the @file{awkprof.out} that results from running the @command{gawk} +profiler on this program and data (this example also illustrates that @command{awk} programmers sometimes have to work late): -@cindex @code{BEGIN} pattern, @command{pgawk} program -@cindex @code{END} pattern, @command{pgawk} program +@cindex @code{BEGIN} pattern +@cindex @code{END} pattern @example # gawk profile, created Sun Aug 13 00:00:15 2000 @@ -19137,15 +20183,15 @@ keyword indicates how many times the function was called. The counts next to the statements in the body show how many times those statements were executed. -@cindex @code{@{@}} (braces), @command{pgawk} program -@cindex braces (@code{@{@}}), @command{pgawk} program +@cindex @code{@{@}} (braces) +@cindex braces (@code{@{@}}) @item The layout uses ``K&R'' style with TABs. Braces are used everywhere, even when the body of an @code{if}, @code{else}, or loop is only a single statement. -@cindex @code{()} (parentheses), @command{pgawk} program -@cindex parentheses @code{()}, @command{pgawk} program +@cindex @code{()} (parentheses) +@cindex parentheses @code{()} @item Parentheses are used only where needed, as indicated by the structure of the program and the precedence rules. @@ -19168,16 +20214,16 @@ Similarly, if the target of a redirection isn't a scalar, it gets parenthesized. @item -@command{pgawk} supplies leading comments in +@command{gawk} supplies leading comments in front of the @code{BEGIN} and @code{END} rules, the pattern/action rules, and the functions. @end itemize The profiled version of your program may not look exactly like what you -typed when you wrote it. This is because @command{pgawk} creates the +typed when you wrote it. This is because @command{gawk} creates the profiled version by ``pretty printing'' its internal representation of -the program. The advantage to this is that @command{pgawk} can produce +the program. The advantage to this is that @command{gawk} can produce a standard representation. The disadvantage is that all source-code comments are lost, as are the distinctions among multiple @code{BEGIN}, @code{END}, @code{BEGINFILE}, and @code{ENDFILE} rules. Also, things such as: @@ -19199,15 +20245,16 @@ come out as: which is correct, but possibly surprising. @cindex profiling @command{awk} programs, dynamically -@cindex @command{pgawk} program, dynamic profiling +@cindex @command{gawk} program, dynamic profiling Besides creating profiles when a program has completed, -@command{pgawk} can produce a profile while it is running. +@command{gawk} can produce a profile while it is running. This is useful if your @command{awk} program goes into an infinite loop and you want to see what has been executed. -To use this feature, run @command{pgawk} in the background: +To use this feature, run @command{gawk} with the @option{--profile} +option in the background: @example -$ @kbd{pgawk -f myprog &} +$ @kbd{gawk --profile -f myprog &} [1] 13992 @end example @@ -19218,7 +20265,7 @@ $ @kbd{pgawk -f myprog &} @noindent The shell prints a job number and process ID number; in this case, 13992. Use the @command{kill} command to send the @code{USR1} signal -to @command{pgawk}: +to @command{gawk}: @example $ @kbd{kill -USR1 13992} @@ -19226,8 +20273,8 @@ $ @kbd{kill -USR1 13992} @noindent As usual, the profiled version of the program is written to -@file{awkprof.out}, or to a different file if you use the @option{--profile} -option. +@file{awkprof.out}, or to a different file if one specified with +the @option{--profile} option. Along with the regular profile, as shown earlier, the profile includes a trace of any active functions: @@ -19241,7 +20288,7 @@ includes a trace of any active functions: # -- main -- @end example -You may send @command{pgawk} the @code{USR1} signal as many times as you like. +You may send @command{gawk} the @code{USR1} signal as many times as you like. Each time, the profile and function call trace are appended to the output profile file. @@ -19249,7 +20296,7 @@ profile file. @cindex @code{SIGHUP} signal @cindex signals, @code{HUP}/@code{SIGHUP} If you use the @code{HUP} signal instead of the @code{USR1} signal, -@command{pgawk} produces the profile and the function call trace and then exits. +@command{gawk} produces the profile and the function call trace and then exits. @cindex @code{INT} signal (MS-Windows) @cindex @code{SIGINT} signal (MS-Windows) @@ -19257,21 +20304,20 @@ If you use the @code{HUP} signal instead of the @code{USR1} signal, @cindex @code{QUIT} signal (MS-Windows) @cindex @code{SIGQUIT} signal (MS-Windows) @cindex signals, @code{QUIT}/@code{SIGQUIT} (MS-Windows) -When @command{pgawk} runs on MS-Windows systems, it uses the +When @command{gawk} runs on MS-Windows systems, it uses the @code{INT} and @code{QUIT} signals for producing the profile and, in -the case of the @code{INT} signal, @command{pgawk} exits. This is +the case of the @code{INT} signal, @command{gawk} exits. This is because these systems don't support the @command{kill} command, so the only signals you can deliver to a program are those generated by the keyboard. The @code{INT} signal is generated by the @kbd{@value{CTL}-@key{C}} or @kbd{@value{CTL}-@key{BREAK}} key, while the @code{QUIT} signal is generated by the @kbd{@value{CTL}-@key{\}} key. -Finally, regular @command{gawk} also accepts the @option{--profile} option. +Finally, @command{gawk} also accepts another option @option{--pretty-print}. When called this way, @command{gawk} ``pretty prints'' the program into @file{awkprof.out}, without any execution counts. @c ENDOFRANGE advgaw @c ENDOFRANGE gawadv -@c ENDOFRANGE pgawk @c ENDOFRANGE awkp @c ENDOFRANGE proawk @@ -25153,41 +26199,41 @@ BEGIN { @c FIXME: Add more indexing. @node Debugger -@chapter @command{dgawk}: The @command{awk} Debugger -@cindex @command{dgawk} +@chapter Debugging @command{awk} Programs +@cindex debugging @command{awk} programs It would be nice if computer programs worked perfectly the first time they were run, but in real life, this rarely happens for programs of any complexity. Thus, most programming languages have facilities available for ``debugging'' programs, and now @command{awk} is no exception. -The @command{dgawk} debugger is purposely modeled after +The @command{gawk} debugger is purposely modeled after @uref{http://www.gnu.org/software/gdb/, the GNU Debugger (GDB)} command-line debugger. If you are familiar with GDB, learning -@command{dgawk} is easy. +how to use @command{gawk} for debugging your program is easy. @menu -* Debugging:: Introduction to @command{dgawk}. -* Sample dgawk session:: Sample @command{dgawk} session. -* List of Debugger Commands:: Main @command{dgawk} Commands. -* Readline Support:: Readline Support. -* Dgawk Limitations:: Limitations and future plans. +* Debugging:: Introduction to @command{gawk} debugger. +* Sample Debugging Session:: Sample debugging session. +* List of Debugger Commands:: Main debugger commands. +* Readline Support:: Readline support. +* Limitations:: Limitations and future plans. @end menu @node Debugging -@section Introduction to @command{dgawk} +@section Introduction to @command{gawk} Debugger This @value{SECTION} introduces debugging in general and begins the discussion of debugging in @command{gawk}. @menu -* Debugging Concepts:: Debugging In General. +* Debugging Concepts:: Debugging in General. * Debugging Terms:: Additional Debugging Concepts. * Awk Debugging:: Awk Debugging. @end menu @node Debugging Concepts -@subsection Debugging In General +@subsection Debugging in General (If you have used debuggers in other languages, you may want to skip ahead to the next section on the specific features of the @command{awk} @@ -25233,8 +26279,7 @@ functional program that you or someone else wrote). @subsection Additional Debugging Concepts Before diving in to the details, we need to introduce several -important concepts that apply to just about all debuggers, including -@command{dgawk}. +important concepts that apply to just about all debuggers. The following list defines terms used throughout the rest of this @value{CHAPTER}. @@ -25253,7 +26298,7 @@ that contains the function's parameters, local variables, and return value, as well as any other ``bookkeeping'' information needed to manage the call stack. This data area is termed a @dfn{stack frame}. -@command{gawk} also follows this model, and @command{dgawk} gives you +@command{gawk} also follows this model, and gives you access to the call stack and to each stack frame. You can see the call stack, as well as from where each function on the stack was invoked. Commands that print the call stack print information about @@ -25298,48 +26343,48 @@ each line of @command{awk} code. The debugger provides the opportunity to look at the individual primitive instructions carried out by the higher-level @command{awk} commands. -@node Sample dgawk session -@section Sample @command{dgawk} session +@node Sample Debugging Session +@section Sample Debugging Session -In order to illustrate the use of @command{dgawk}, let's look at a sample +In order to illustrate the use of @command{gawk} as a debugger, let's look at a sample debugging session. We will use the @command{awk} implementation of the POSIX @command{uniq} command described earlier (@pxref{Uniq Program}) as our example. @menu -* dgawk invocation:: @command{dgawk} Invocation. -* Finding The Bug:: Finding The Bug. +* Debugger Invocation:: How to Start the Debugger. +* Finding The Bug:: Finding the Bug. @end menu -@node dgawk invocation -@subsection @command{dgawk} Invocation +@node Debugger Invocation +@subsection How to Start the Debugger -Starting @command{dgawk} is exactly like running @command{awk}. The -file(s) containing the program and any supporting code are given on the -command line as arguments to one or more @option{-f} options. -(@command{dgawk} is not designed to debug command-line -programs, only programs contained in files.) In our case, -we call @command{dgawk} like this: +Starting the debugger is almost exactly like running @command{awk}, except you have to +pass an additional option @option{--debug} or the corresponding short option @option{-D}. +The file(s) containing the program and any supporting code are given on the command +line as arguments to one or more @option{-f} options. (@command{gawk} is not designed +to debug command-line programs, only programs contained in files.) In our case, +we invoke the debugger like this: @example -$ @kbd{dgawk -f getopt.awk -f join.awk -f uniq.awk inputfile} +$ @kbd{gawk -D -f getopt.awk -f join.awk -f uniq.awk inputfile} @end example @noindent where both @file{getopt.awk} and @file{uniq.awk} are in @env{$AWKPATH}. (Experienced users of GDB or similar debuggers should note that this syntax is slightly different from what they are used to. -With @command{dgawk}, the arguments for running the program are given +With @command{gawk} debugger, the arguments for running the program are given in the command line to the debugger rather than as part of the @code{run} command at the debugger prompt.) Instead of immediately running the program on @file{inputfile}, as -@command{gawk} would ordinarily do, @command{dgawk} merely loads all +@command{gawk} would ordinarily do, the debugger merely loads all the program source files, compiles them internally, and then gives us a prompt: @example -dgawk> +gawk> @end example @noindent @@ -25347,7 +26392,7 @@ from which we can issue commands to the debugger. At this point, no code has been executed. @node Finding The Bug -@subsection Finding The Bug +@subsection Finding the Bug Let's say that we are having a problem using (a faulty version of) @file{uniq.awk} in the ``field-skipping'' mode, and it doesn't seem to be @@ -25383,7 +26428,7 @@ a breakpoint in @file{uniq.awk} is at the beginning of the function the breakpoint, use the @code{b} (breakpoint) command: @example -dgawk> @kbd{b are_equal} +gawk> @kbd{b are_equal} @print{} Breakpoint 1 set at file `awklib/eg/prog/uniq.awk', line 64 @end example @@ -25392,22 +26437,22 @@ Now type @samp{r} or @samp{run} and the program runs until it hits the breakpoint for the first time: @example -dgawk> @kbd{r} +gawk> @kbd{r} @print{} Starting program: @print{} Stopping in Rule ... @print{} Breakpoint 1, are_equal(n, m, clast, cline, alast, aline) at `awklib/eg/prog/uniq.awk':64 @print{} 64 if (fcount == 0 && charcount == 0) -dgawk> +gawk> @end example Now we can look at what's going on inside our program. First of all, let's see how we got to where we are. At the prompt, we type @samp{bt} -(short for ``backtrace''), and @command{dgawk} responds with a +(short for ``backtrace''), and the debugger responds with a listing of the current stack frames: @example -dgawk> @kbd{bt} +gawk> @kbd{bt} @print{} #0 are_equal(n, m, clast, cline, alast, aline) at `awklib/eg/prog/uniq.awk':69 @print{} #1 in main() at `awklib/eg/prog/uniq.awk':89 @@ -25422,11 +26467,11 @@ the key to finding the source of the problem.) Now that we're in @code{are_equal()}, we can start looking at the values of some variables. Let's say we type @samp{p n} (@code{p} is short for ``print''). We would expect to see the value of -@code{n}, a parameter to @code{are_equal()}. Actually, @command{dgawk} +@code{n}, a parameter to @code{are_equal()}. Actually, the debugger gives us: @example -dgawk> @kbd{p n} +gawk> @kbd{p n} @print{} n = untyped variable @end example @@ -25437,7 +26482,7 @@ function was called without arguments (@pxref{Function Calls}). A more useful variable to display might be the current record: @example -dgawk> @kbd{p $0} +gawk> @kbd{p $0} @print{} $0 = string ("gawk is a wonderful program!") @end example @@ -25446,7 +26491,7 @@ This might be a bit puzzling at first since this is the second line of our test input above. Let's look at @code{NR}: @example -dgawk> @kbd{p NR} +gawk> @kbd{p NR} @print{} NR = number (2) @end example @@ -25465,7 +26510,7 @@ NR == 1 @{ OK, let's just check that that rule worked correctly: @example -dgawk> @kbd{p last} +gawk> @kbd{p last} @print{} last = string ("awk is a wonderful program!") @end example @@ -25476,7 +26521,7 @@ be inside this function. To investigate further, we must begin @samp{n} (for ``next''): @example -dgawk> @kbd{n} +gawk> @kbd{n} @print{} 67 if (fcount > 0) @{ @end example @@ -25496,9 +26541,9 @@ Continuing to step, we now get to the splitting of the current and last records: @example -dgawk> @kbd{n} +gawk> @kbd{n} @print{} 68 n = split(last, alast) -dgawk> @kbd{n} +gawk> @kbd{n} @print{} 69 m = split($0, aline) @end example @@ -25506,7 +26551,7 @@ At this point, we should be curious to see what our records were split into, so we try to look: @example -dgawk> @kbd{p n m alast aline} +gawk> @kbd{p n m alast aline} @print{} n = number (5) @print{} m = number (5) @print{} alast = array, 5 elements @@ -25525,7 +26570,7 @@ inside the array? The first choice would be to use subscripts: @example -dgawk> @kbd{p alast[0]} +gawk> @kbd{p alast[0]} @print{} "0" not in array `alast' @end example @@ -25533,16 +26578,16 @@ dgawk> @kbd{p alast[0]} Oops! @example -dgawk> @kbd{p alast[1]} +gawk> @kbd{p alast[1]} @print{} alast["1"] = string ("awk") @end example This would be kind of slow for a 100-member array, though, so -@command{dgawk} provides a shortcut (reminiscent of another language +@command{gawk} provides a shortcut (reminiscent of another language not to be mentioned): @example -dgawk> @kbd{p @@alast} +gawk> @kbd{p @@alast} @print{} alast["1"] = string ("awk") @print{} alast["2"] = string ("is") @print{} alast["3"] = string ("a") @@ -25554,9 +26599,9 @@ It looks like we got this far OK. Let's take another step or two: @example -dgawk> @kbd{n} +gawk> @kbd{n} @print{} 70 clast = join(alast, fcount, n) -dgawk> @kbd{n} +gawk> @kbd{n} @print{} 71 cline = join(aline, fcount, m) @end example @@ -25566,7 +26611,7 @@ the virtual record to compare, and if the first field was numbered zero, this would work. Let's look at what we've got: @example -dgawk> @kbd{p cline clast} +gawk> @kbd{p cline clast} @print{} cline = string ("gawk is a wonderful program!") @print{} clast = string ("awk is a wonderful program!") @end example @@ -25575,10 +26620,10 @@ Hey, those look pretty familiar! They're just our original, unaltered, input records. A little thinking (the human brain is still the best debugging tool), and we realize that we were off by one! -We get out of @command{dgawk}: +We get out of the debugger: @example -dgawk> @kbd{q} +gawk> @kbd{q} @print{} The program is running. Exit anyway (y/n)? @kbd{y} @end example @@ -25594,9 +26639,9 @@ cline = join(aline, fcount+1, m) and problem solved! @node List of Debugger Commands -@section Main @command{dgawk} Commands +@section Main Debugger Commands -The @command{dgawk} command set can be divided into the +The @command{gawk} debugger command set can be divided into the following categories: @itemize @bullet{} @@ -25623,24 +26668,24 @@ Miscellaneous Each of these are discussed in the following subsections. In the following descriptions, commands which may be abbreviated show the abbreviation on a second description line. -A @command{dgawk} command name may also be truncated if that partial -name is unambiguous. @command{dgawk} has the built-in capability to +A debugger command name may also be truncated if that partial +name is unambiguous. The debugger has the built-in capability to automatically repeat the previous command when just hitting @key{Enter}. This works for the commands @code{list}, @code{next}, @code{nexti}, @code{step}, @code{stepi} and @code{continue} executed without any argument. @menu -* Breakpoint Control:: Control of breakpoints. -* Dgawk Execution Control:: Control of execution. -* Viewing And Changing Data:: Viewing and changing data. -* Dgawk Stack:: Dealing with the stack. -* Dgawk Info:: Obtaining information about the program and - the debugger state. -* Miscellaneous Dgawk Commands:: Miscellaneous Commands. +* Breakpoint Control:: Control of Breakpoints. +* Debugger Execution Control:: Control of Execution. +* Viewing And Changing Data:: Viewing and Changing Data. +* Execution Stack:: Dealing with the Stack. +* Debugger Info:: Obtaining Information about the Program and + the Debugger State. +* Miscellaneous Debugger Commands:: Miscellaneous Commands. @end menu @node Breakpoint Control -@subsection Control Of Breakpoints +@subsection Control of Breakpoints As we saw above, the first thing you probably want to do in a debugging session is to get your breakpoints set up, since otherwise your program @@ -25675,10 +26720,10 @@ Each breakpoint is assigned a number which can be used to delete it from the breakpoint list using the @code{delete} command. With a breakpoint, you may also supply a condition. This is an -@command{awk} expression (enclosed in double quotes) that @command{dgawk} +@command{awk} expression (enclosed in double quotes) that the debugger evaluates whenever the breakpoint is reached. If the condition is true, -then @command{dgawk} stops execution and prompts for a command. Otherwise, -@command{dgawk} continues executing the program. +then the debugger stops execution and prompts for a command. Otherwise, +it continues executing the program. @cindex debugger commands, @code{clear} @cindex @code{clear} debugger command @@ -25704,10 +26749,10 @@ Delete breakpoint(s) set at entry to function @var{function}. @cindex @code{condition} debugger command @item @code{condition} @var{n} @code{"@var{expression}"} Add a condition to existing breakpoint or watchpoint @var{n}. The -condition is an @command{awk} expression that @command{dgawk} evaluates +condition is an @command{awk} expression that the debugger evaluates whenever the breakpoint or watchpoint is reached. If the condition is true, then -@command{dgawk} stops execution and prompts for a command. Otherwise, -@command{dgawk} continues executing the program. If the condition expression is +the debugger stops execution and prompts for a command. Otherwise, +the debugger continues executing the program. If the condition expression is not specified, any existing condition is removed; i.e., the breakpoint or watchpoint is made unconditional. @@ -25763,7 +26808,7 @@ Set a temporary breakpoint (enabled for only one stop). The arguments are the same as for @code{break}. @end table -@node Dgawk Execution Control +@node Debugger Execution Control @subsection Control of Execution Now that your breakpoints are ready, you can start running the program @@ -25792,14 +26837,14 @@ in the list that resumes execution (e.g., @code{continue}) terminates the list For example: @example -dgawk> @kbd{commands} +gawk> @kbd{commands} > @kbd{silent} > @kbd{printf "A silent breakpoint; i = %d\n", i} > @kbd{info locals} > @kbd{set i = 10} > @kbd{continue} > @kbd{end} -dgawk> +gawk> @end example @cindex debugger commands, @code{c} (@code{continue}) @@ -25849,7 +26894,7 @@ and the caller of that frame becomes the innermost frame. @cindex @code{r} debugger command (alias for @code{run}) @item @code{run} @itemx @code{r} -Start/restart execution of the program. When restarting, @command{dgawk} +Start/restart execution of the program. When restarting, the debugger retains the current breakpoints, watchpoints, command history, automatic display variables, and debugger options. @@ -25872,7 +26917,7 @@ stopping, unless it encounters a breakpoint or watchpoint. @itemx @code{si} [@var{count}] Execute one (or @var{count}) instruction(s), stepping inside function calls. (For illustration of what is meant by an ``instruction'' in @command{gawk}, -see the output shown under @code{dump} in @ref{Miscellaneous Dgawk Commands}.) +see the output shown under @code{dump} in @ref{Miscellaneous Debugger Commands}.) @cindex debugger commands, @code{u} (@code{until}) @cindex debugger commands, @code{until} @@ -25900,7 +26945,7 @@ The value of the variable or field is displayed each time the program stops. Each variable added to the list is identified by a unique number: @example -dgawk> @kbd{display x} +gawk> @kbd{display x} @print{} 10: x = 1 @end example @@ -25937,7 +26982,7 @@ Print the value of a @command{gawk} variable or field. Fields must be referenced by constants: @example -dgawk> @kbd{print $3} +gawk> @kbd{print $3} @end example @noindent @@ -25979,16 +27024,16 @@ You can also set special @command{awk} variables, such as @code{FS}, @item @code{watch} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}] @itemx @code{w} @var{var} | @code{$}@var{n} [@code{"@var{expression}"}] Add variable @var{var} (or field @code{$@var{n}}) to the watch list. -@command{dgawk} then stops whenever +The debugger then stops whenever the value of the variable or field changes. Each watched item is assigned a number which can be used to delete it from the watch list using the @code{unwatch} command. With a watchpoint, you may also supply a condition. This is an -@command{awk} expression (enclosed in double quotes) that @command{dgawk} +@command{awk} expression (enclosed in double quotes) that the debugger evaluates whenever the watchpoint is reached. If the condition is true, -then @command{dgawk} stops execution and prompts for a command. Otherwise, -@command{dgawk} continues executing the program. +then the debugger stops execution and prompts for a command. Otherwise, +@command{gawk} continues executing the program. @cindex debugger commands, @code{undisplay} @cindex @code{undisplay} debugger command @@ -26004,8 +27049,8 @@ watch list. @end table -@node Dgawk Stack -@subsection Dealing With The Stack +@node Execution Stack +@subsection Dealing with the Stack Whenever you run a program which contains any function calls, @command{gawk} maintains a stack of all of the function calls leading up @@ -26049,12 +27094,12 @@ Move @var{count} (default 1) frames up the stack toward the outermost frame. Then select and print the frame. @end table -@node Dgawk Info -@subsection Obtaining Information About The Program and The Debugger State +@node Debugger Info +@subsection Obtaining Information about the Program and the Debugger State Besides looking at the values of variables, there is often a need to get other sorts of information about the state of your program and of the -debugging environment itself. @command{dgawk} has one command which +debugging environment itself. The @command{gawk} debugger has one command which provides this information, appropriately called @code{info}. @code{info} is used with one of a number of arguments that tell it exactly what you want to know: @@ -26092,7 +27137,7 @@ Local variables of the selected frame. @item source The name of the current source file. Each time the program stops, the current source file is the file containing the current instruction. -When @command{dgawk} first starts, the current source file is the first file +When the debugger first starts, the current source file is the first file included via the @option{-f} option. The @samp{list @var{filename}:@var{lineno}} command can be used at any time to change the current source. @@ -26128,7 +27173,7 @@ The available options are: @c nested table @table @code @item history_size -The maximum number of lines to keep in the history file @file{./.dgawk_history}. +The maximum number of lines to keep in the history file @file{./.gawk_history}. The default is 100. @item listsize @@ -26140,14 +27185,14 @@ to standard output. An empty string (@code{""}) resets output to standard output. @item prompt -The debugger prompt. The default is @samp{@w{dgawk> }}. +The debugger prompt. The default is @samp{@w{gawk> }}. @item save_history @r{[}on @r{|} off@r{]} -Save command history to file @file{./.dgawk_history}. +Save command history to file @file{./.gawk_history}. The default is @code{on}. @item save_options @r{[}on @r{|} off@r{]} -Save current options to file @file{./.dgawkrc} upon exit. +Save current options to file @file{./.gawkrc} upon exit. The default is @code{on}. Options are read back in to the next session upon startup. @@ -26167,16 +27212,16 @@ Empty lines are ignored; they do @emph{not} repeat the last command. You can't restart the program by having more than one @code{run} command in the file. Also, the list of commands may include additional -@code{source} commands; however, @command{dgawk} will not source the +@code{source} commands; however, the @command{gawk} debugger will not source the same file more than once in order to avoid infinite recursion. In addition to, or instead of the @code{source} command, you can use -the @option{-R @var{file}} or @option{--command=@var{file}} command-line +the @option{-D @var{file}} or @option{--debug=@var{file}} command-line options to execute commands from a file non-interactively (@pxref{Options}. @end table -@node Miscellaneous Dgawk Commands +@node Miscellaneous Debugger Commands @subsection Miscellaneous Commands There are a few more commands which do not fit into the @@ -26194,7 +27239,7 @@ partial dump of Davide Brini's obfuscated code (@pxref{Signature Program}) demonstrates: @smallexample -dgawk> @kbd{dump} +gawk> @kbd{dump} @print{} # BEGIN @print{} @print{} [ 2:0x89faef4] Op_rule : [in_rule = BEGIN] [source_file = brini.awk] @@ -26243,7 +27288,7 @@ dgawk> @kbd{dump} @print{} [ :0x89fa3b0] Op_after_beginfile : @print{} [ :0x89fa388] Op_no_op : @print{} [ :0x89fa3c4] Op_after_endfile : -dgawk> +gawk> @end smallexample @cindex debugger commands, @code{h} (@code{help}) @@ -26252,7 +27297,7 @@ dgawk> @cindex @code{h} debugger command (alias for @code{help}) @item @code{help} @itemx @code{h} -Print a list of all of the @command{dgawk} commands with a short +Print a list of all of the @command{gawk} debugger commands with a short summary of their usage. @samp{help @var{command}} prints the information about the command @var{command}. @@ -26299,7 +27344,7 @@ function @var{function}. This command may change the current source file. Exit the debugger. Debugging is great fun, but sometimes we all have to tend to other obligations in life, and sometimes we find the bug, and are free to go on to the next one! As we saw above, if you are -running a program, @command{dgawk} warns you if you accidentally type +running a program, the debugger warns you if you accidentally type @samp{q} or @samp{quit}, to make sure you really want to quit. @cindex debugger commands, @code{trace} @@ -26318,7 +27363,7 @@ fairly self-explanatory, and using @code{stepi} and @code{nexti} while @node Readline Support @section Readline Support -If @command{dgawk} is compiled with the @code{readline} library, you +If @command{gawk} is compiled with the @code{readline} library, you can take advantage of that library's command completion and history expansion features. The following types of completion are available: @@ -26350,28 +27395,28 @@ and @end table -@node Dgawk Limitations +@node Limitations @section Limitations and Future Plans -We hope you find @command{dgawk} useful and enjoyable to work with, +We hope you find the @command{gawk} debugger useful and enjoyable to work with, but as with any program, especially in its early releases, it still has some limitations. A few which are worth being aware of are: @itemize @bullet{} @item -At this point, @command{dgawk} does not give a detailed explanation of +At this point, the debugger does not give a detailed explanation of what you did wrong when you type in something it doesn't like. Rather, it just responds @samp{syntax error}. When you do figure out what your mistake was, though, you'll feel like a real guru. @item -If you perused the dump of opcodes in @ref{Miscellaneous Dgawk Commands}, +If you perused the dump of opcodes in @ref{Miscellaneous Debugger Commands}, (or if you are already familiar with @command{gawk} internals), you will realize that much of the internal manipulation of data in @command{gawk}, as in many interpreters, is done on a stack. @code{Op_push}, @code{Op_pop}, etc., are the ``bread and butter'' of -most @command{gawk} code. Unfortunately, as of now, @command{dgawk} -does not allow you to examine the stack's contents. +most @command{gawk} code. Unfortunately, as of now, the @command{gawk} +debugger does not allow you to examine the stack's contents. That is, the intermediate results of expression evaluation are on the stack, but cannot be printed. Rather, only variables which are defined @@ -26386,14 +27431,14 @@ programmer, you are expected to know what @code{/[^[:alnum:][:blank:]]/} means. @item -@command{dgawk} is designed to be used by running a program (with all its -parameters) on the command line, as described in @ref{dgawk invocation}. +The @command{gawk} debugger is designed to be used by running a program (with all its +parameters) on the command line, as described in @ref{Debugger Invocation}. There is no way (as of now) to attach or ``break in'' to a running program. This seems reasonable for a language which is used mainly for quickly executing, short programs. @item -@command{dgawk} only accepts source supplied with the @option{-f} option. +The @command{gawk} debugger only accepts source supplied with the @option{-f} option. @end itemize Look forward to a future release when these and other missing features may @@ -27302,7 +28347,7 @@ environments. @cindex Haque, John John Haque reworked the @command{gawk} internals to use a byte-code engine, -providing the @command{dgawk} debugger for @command{awk} programs. +providing the @command{gawk} debugger for @command{awk} programs. @item @cindex Yawitz, Efraim @@ -28572,7 +29617,7 @@ since approximately 2003. @item @command{pawk} Nelson H.F.@: Beebe at the University of Utah has modified Brian Kernighan's @command{awk} to provide timing and profiling information. -It is different from @command{pgawk} +It is different from @command{gawk} with the @option{--profile} option. (@pxref{Profiling}), in that it uses CPU-based profiling, not line-count profiling. You may find it at either @@ -29077,6 +30122,7 @@ When @option{--sandbox} is specified, extensions are disabled @menu * Internals:: A brief look at some @command{gawk} internals. * Plugin License:: A note about licensing. +* Loading Extensions:: How to load dynamic extensions. * Sample Library:: A example of new functions. @end menu @@ -29141,22 +30187,12 @@ macro guarantees that a @code{NODE}'s wide-string value is current. It may end up calling an internal @command{gawk} function. It also guarantees that the wide string is zero-terminated. -@cindex @code{get_curfunc_arg_count()} internal function -@cindex internal function, @code{get_curfunc_arg_count()} -@item size_t get_curfunc_arg_count(void) -This function returns the actual number of parameters passed -to the current function. Inside the code of an extension -this can be used to determine the maximum index which is -safe to use with @code{get_actual_argument}. If this value is -greater than @code{nargs}, the function was -called incorrectly from the @command{awk} program. - @cindex parameters@comma{} number of @cindex @code{nargs} internal variable @cindex internal variable, @code{nargs} @item nargs -Inside an extension function, this is the maximum number of -expected parameters, as set by the @code{make_builtin()} function. +Inside an extension function, this is the actual number of +parameters passed to the current function. @cindex @code{stptr} internal variable @cindex internal variable, @code{stptr} @@ -29202,13 +30238,10 @@ Make sure that @samp{n->type == Node_var_array} first. @cindex arrays, elements, installing @cindex @code{assoc_lookup()} internal function @cindex internal function, @code{assoc_lookup()} -@item NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference) +@item NODE **assoc_lookup(NODE *symbol, NODE *subs) Finds, and installs if necessary, array elements. @code{symbol} is the array, @code{subs} is the subscript. This is usually a value created with @code{make_string()} (see below). -@code{reference} should be @code{TRUE} if it is an error to use the -value before it is created. Typically, @code{FALSE} is the -correct value to use from extension functions. @cindex strings @cindex @code{make_string()} internal function @@ -29400,6 +30433,56 @@ the symbol exists in the global scope. Something like this is enough: int plugin_is_GPL_compatible; @end example +@node Loading Extensions +@appendixsubsec Loading a Dynamic Extension +@cindex loading extension +@cindex @command{gawk}, functions, loading +There are two ways to load a dynamically linked library. The first is to use the +builtin @code{extension()}: + +@example +extension(libname, init_func) +@end example + +where @file{libname} is the library to load, and @samp{init_func} is the +name of the initialization or bootstrap routine to run once loaded. + +The second method for dynamic loading of a library is to use the +command line option @option{-l}: + +@example +$ @kbd{gawk -l libname -f myprog} +@end example + +This will work only if the initialization routine is named @code{dlload()}. + +If you use @code{extension()}, the library will be loaded +at run time. This means that the functions are available only to the rest of +your script. If you use the command line option @option{-l} instead, +the library will be loaded before @command{gawk} starts compiling the +actual program. The net effect is that you can use those functions +anywhere in the program. + +@command{gawk} has a list of directories where it searches for libraries. +By default, the list includes directories that depend upon how gawk was built +and installed (@pxref{AWKPATH Variable}). If you want @command{gawk} +to look for libraries in your private directory, you have to tell it. +The way to do it is to set the @env{AWKPATH} environment variable +(@pxref{AWKPATH Variable}). +@command{gawk} supplies the default suffix @samp{.so} if it is not +present in the name of the library. +If the name of your library is @file{mylib.so}, you can simply type + +@example +$ @kbd{gawk -l mylib -f myprog} +@end example + +and @command{gawk} will do everything necessary to load in your library, +and then call your @code{dlload()} routine. + +You can always specify the library using an absolute pathname, in which +case @command{gawk} will not use @env{AWKPATH} to search for it. + @node Sample Library @appendixsubsec Example: Directory and File Operation Built-ins @c STARTOFRANGE chdirg @@ -29592,7 +30675,7 @@ do_chdir(int nargs) NODE *newdir; int ret = -1; - if (do_lint && get_curfunc_arg_count() != 1) + if (do_lint && nargs != 1) lintwarn("chdir: called with incorrect number of arguments"); newdir = get_scalar_argument(0, FALSE); @@ -29665,7 +30748,7 @@ do_stat(int nargs) char *pmode; /* printable mode */ char *type = "unknown"; - if (do_lint && get_curfunc_arg_count() > 2) + if (do_lint && nargs > 2) lintwarn("stat: called with too many arguments"); @end example @@ -29699,15 +30782,15 @@ calls are shown here, since they all follow the same pattern: @example /* fill in the array */ - aptr = assoc_lookup(array, tmp = make_string("name", 4), FALSE); + aptr = assoc_lookup(array, tmp = make_string("name", 4)); *aptr = dupnode(file); unref(tmp); - aptr = assoc_lookup(array, tmp = make_string("mode", 4), FALSE); + aptr = assoc_lookup(array, tmp = make_string("mode", 4)); *aptr = make_number((AWKNUM) sbuf.st_mode); unref(tmp); - aptr = assoc_lookup(array, tmp = make_string("pmode", 5), FALSE); + aptr = assoc_lookup(array, tmp = make_string("pmode", 5)); pmode = format_mode(sbuf.st_mode); *aptr = make_string(pmode, strlen(pmode)); unref(tmp); |