diff options
Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 944 |
1 files changed, 939 insertions, 5 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index 7b46026b..d3f5c672 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -20,7 +20,7 @@ @c applies to and all the info about who's publishing this edition @c These apply across the board. -@set UPDATE-MONTH November, 2011 +@set UPDATE-MONTH February, 2012 @set VERSION 4.0 @set PATCHLEVEL 1 @@ -79,9 +79,11 @@ @c some special symbols @iftex @set LEQ @math{@leq} +@set PI @math{@pi} @end iftex @ifnottex @set LEQ <= +@set PI @i{pi} @end ifnottex @ifnottex @@ -143,7 +145,7 @@ Some comments on the layout for TeX. @copying Copyright @copyright{} 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, -2000, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011 +2000, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012 Free Software Foundation, Inc. @sp 2 @@ -285,6 +287,8 @@ particular records in a file and perform operations upon them. * Functions:: Built-in and user-defined functions. * Internationalization:: Getting @command{gawk} to speak your language. +* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with + @command{gawk}. * Advanced Features:: Stuff for advanced users, specific to @command{gawk}. * Library Functions:: A Library of @command{awk} Functions. @@ -554,6 +558,21 @@ particular records in a file and perform operations upon them. * I18N Portability:: @command{awk}-level portability issues. * I18N Example:: A simple i18n example. * Gawk I18N:: @command{gawk} is also internationalized. +* Floating-point Programming:: Effective floating-point programming. +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +* Arbitrary Precision Floats:: Arbitrary precision floating-point + arithmetic with @command{gawk}. +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point numbers. +* Integer Programming:: Effective integer programming. +* Arbitrary Precision Integers:: Arbitrary precision integer + arithmetic with @command{gawk}. +* MPFR and GMP Libraries:: Information about the MPFR and GMP libraries. * Nondecimal Data:: Allowing nondecimal input data. * Array Sorting:: Facilities for controlling array traversal and sorting arrays. @@ -1598,10 +1617,13 @@ has been and continues to be a pleasure working with this team of fine people. John Haque contributed the modifications to convert @command{gawk} -into a byte-code interpreter, including the debugger. Stephen Davies +into a byte-code interpreter, including the debugger, and the +additional modifications for support of arbitrary precision arithmetic. +Stephen Davies contributed to the effort to bring the byte-code changes into the mainstream code base. Efraim Yawitz contributed the initial text of @ref{Debugger}. +John Haque contributed the initial text of @ref{Arbitrary Precision Arithmetic}. @cindex Kernighan, Brian I would like to thank Brian Kernighan for invaluable assistance during the @@ -1973,9 +1995,11 @@ line beginning with @samp{#!} lists the full @value{FN} of an interpreter to run and an optional initial command-line argument to pass to that interpreter. The operating system then runs the interpreter with the given argument and the full argument list of the executed program. The first argument -in the list is the full @value{FN} of the @command{awk} program. The rest of the +in the list is the full @value{FN} of the @command{awk} program. +The rest of the argument list contains either options to @command{awk}, or @value{DF}s, -or both.} as if you had +or both. Note that on many systems @command{awk} may be found in +@file{/usr/bin} instead of in @file{/bin}. Caveat Emptor.} as if you had typed @samp{awk -f advice}: @example @@ -3058,6 +3082,9 @@ The following list describes @command{gawk}-specific options: @cindex @code{-b} option @cindex @code{--characters-as-bytes} option Cause @command{gawk} to treat all input data as single-byte characters. +In addition, all output written with @code{print} or @code{printf} +are treated as single-byte characters. + Normally, @command{gawk} follows the POSIX standard and attempts to process its input data according to the current locale. This can often involve converting multibyte characters into wide characters (internally), and @@ -3219,6 +3246,14 @@ when eliminating problems pointed out by @option{--lint}, you should take care to search for all occurrences of each inappropriate construct. As @command{awk} programs are usually short, doing so is not burdensome. +@item -M +@itemx --bignum +@cindex @code{-M} option +@cindex @code{--bignum} option +Force arbitrary precision arithmetic on numbers. This option has no effect +if @command{gawk} is not compiled to use the GNU MPFR and MP libraries +(@pxref{Arbitrary Precision Arithmetic}). + @item -n @itemx --non-decimal-data @cindex @code{-n} option @@ -4573,6 +4608,10 @@ it is good practice to always escape them with a backslash. Then the regexp constants are valid and work the way you want them to, using any version of @command{awk}.@footnote{Use two backslashes if you're using a string constant with a regexp operator or function.} + +Finally, when @samp{@{} and @samp{@}} appear in regexp constants +in a way that cannot be interpreted as an interval expression +(such as @code{/q@{a@}/}), then they stand for themselves. @end table @cindex precedence, regexp operators @@ -12659,6 +12698,18 @@ This is the output record separator. It is output at the end of every @code{print} statement. Its default value is @code{"\n"}, the newline character. (@xref{Output Separators}.) +@cindex @code{PREC} variable +@item PREC # +The working precision of arbitrary precision floating-point numbers, +53 by default (@pxref{Setting Precision}). + +@cindex @code{ROUNDMODE} variable +@item ROUNDMODE # +The rounding mode to use for arbitrary precision arithmetic on +numbers, by default @code{"N"} (@samp{roundTiesToEven} in +the IEEE-754 standard) +(@pxref{Setting Rounding Mode}). + @cindex @code{RS} variable @cindex separators, for records @cindex record separators @@ -12944,6 +12995,25 @@ The value of the @code{getuid()} system call. The version of @command{gawk}. @end table +The following additional elements in the array +are available to provide information about the MPFR and GMP libraries +if your version of @command{gawk} supports arbitrary precision numbers +(@pxref{Arbitrary Precision Arithmetic}): + +@table @code +@item PROCINFO["mpfr_version"] +The version of the GNU MPFR library. + +@item PROCINFO["gmp_version"] +The version of the GNU MP library. + +@item PROCINFO["prec_max"] +The maximum precision supported by MPFR. + +@item PROCINFO["prec_min"] +The minimum precision required by MPFR. +@end table + On some systems, there may be elements in the array, @code{"group1"} through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of supplementary groups that the process has. Use the @code{in} operator @@ -14400,6 +14470,13 @@ Optional parameters are enclosed in square brackets@w{ ([ ]):} @item atan2(@var{y}, @var{x}) @cindex @code{atan2()} function Return the arctangent of @code{@var{y} / @var{x}} in radians. +You can use @samp{pi = atan2(0, -1)} to retrieve the value of +@tex +$\pi$. +@end tex +@ifnottex +pi. +@end ifnottex @item cos(@var{x}) @cindex @code{cos()} function @@ -18355,6 +18432,863 @@ then @command{gawk} produces usage messages, warnings, and fatal errors in the local language. @c ENDOFRANGE inloc +@node Arbitrary Precision Arithmetic +@chapter Arbitrary Precision Arithmetic with @command{gawk} +@cindex arbitrary precision +@cindex multiple precision +@cindex infinite precision +@cindex floating-point numbers, arbitrary precision +@cindex MPFR +@cindex GMP + +@cindex Knuth, Donald +@quotation +@i{There's a credibility gap: We don't know how much of the computer's answers +to believe. Novice computer users solve this problem by implicitly trusting +in the computer as an infallible authority; they tend to believe that all +digits of a printed answer are significant. Disillusioned computer users have +just the opposite approach; they are constantly afraid that their answers +are almost meaningless.} + +Donald Knuth@footnote{Donald E.@: Knuth. +@cite{The Art of Computer Programming}. Volume 2, +@cite{Seminumerical Algorithms}, third edition, +1998, ISBN 0-201-89683-4, p.@: 229.} +@end quotation + +This @value{SECTION} decsribes how to use the arbitrary precision +(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric +capabilites in @command{gawk} to produce maximally accurate results +when you need it. But first you should check if your version of +@command{gawk} supports arbitrary precision arithmetic. +The easiest way to find out is to look at the output of +the following command: + +@example +$ @kbd{gawk --version} +@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3) +@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation. +@dots{} +@end example + +@command{gawk} uses the +@uref{http://www.mpfr.org, GNU MPFR} +and +@uref{http://gmplib.org, GNU MP} (GMP) +libraries for arbitrary precision +arithmetic on numbers. So if you do not see the names of these libraries +in the output, then your version of @command{gawk} does not support +arbitrary precision arithmetic. + +Even if you aren't interested in arbitrary precision arithmetic, you +may still benifit from knowing about how @command{gawk} handles numbers +in general, and the limitations of doing arithmetic with ordinary +@command{gawk} numbers. + +@menu +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary Floating-point Representation. +* Floating-point Context:: Floating-point Context. +* Rounding Mode:: Floating-point Rounding Mode. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with @command{gawk}. +* Setting Precision:: Setting the Working Precision. +* Setting Rounding Mode:: Setting the Rounding Mode. +* Floating-point Constants:: Representing Floating-point Constants. +* Changing Precision:: Changing the Precision of a Number. +* Exact Arithmetic:: Exact Arithmetic with Floating-point Numbers. +* Integer Programming:: Effective Integer Programming. +* Arbitrary Precision Integers:: Arbitrary Precision Integer + Arithmetic with @command{gawk}. +* MPFR and GMP Libraries:: Information About the MPFR and GMP Libraries. +@end menu + +@node Floating-point Programming +@section Effective Floating-point Programming + +Numerical programming is an extensive area; if you need to develop +sophisticated numerical algorithms then @command{gawk} may not be +the ideal tool, and this documentation may not be sufficient. +@c FIXME: JOHN: Do you want to cite some actual books? +It might require a book or two to communicate how to compute +with ideal accuracy and precision +and the result often depends on the particular application. + +@quotation NOTE +A floating-point calculation's @dfn{accuracy} is how close it comes +to the real value. This is as opposed to the @dfn{precision}, which +usually refers to the number of bits used to represent the number +(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision, +the Wikipedia article} for more information). +@end quotation + +Binary floating-point representations and arithmetic are inexact. +Simple values like 0.1 cannot be precisely represented using +binary floating-point numbers, and the limited precision of +floating-point numbers means that slight changes in +the order of operations or the precision of intermediate storage +can change the result. To make matters worse with arbitrary precision +floating-point, you can set the precision before starting a computation, +but then you cannot be sure of the number of significant decimal places +in the final result. + +Sometimes you need to think more about what you really want +and what's really happening. Consider the two numbers +in the following example: + +@example +x = 0.875 # 1/2 + 1/4 + 1/8 +y = 0.425 +@end example + +Unlike the number in @code{y}, the number stored in @code{x} +is exactly representable +in binary since it can be written as a finite sum of one or +more fractions whose denominators are all powers of two. +When @command{gawk} reads a floating-point number from +program source, it automatically rounds that number to whatever +precision your machine supports. If you try to print the numeric +content of a variable using an output format string of @code{"%.17g"}, +it may not produce the same number as you assigned to it: + +@example +$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425} +> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'} +@print{} 0.875, 0.42499999999999999 +@end example + +Often the error is so small you do not even notice it, and if you do, +you can always specify how much precision you would like in your output. +Usually this is a format string like @code{"%.15g"}, which when +used in the previous example, produces an output identical to the input. + +Because the underlying representation can be little bit off from the exact value, +comparing floats to see if they are equal is generally not a good idea. +Here is an example where it does not work like you expect: + +@example +$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +The loss of accuracy during a single computation with floating-point numbers +usually isn't enough to worry about. However, if you compute a value +which is the result of a sequence of floating point operations, +the error can accumulate and greatly affect the computation itself. +Here is an attempt to compute the value of the constant +@value{PI} using one of its many series representations: + +@example +BEGIN @{ + x = 1.0 / sqrt(3.0) + n = 6 + for (i = 1; i < 30; i++) @{ + n = n * 2.0 + x = (sqrt(x * x + 1) - 1) / x + printf("%.15f\n", n * x) + @} +@} +@end example + +When run, the early errors propagating through later computations +cause the loop to terminate prematurely after an attempt to divide by zero. + +@example +$ @kbd{gawk -f pi.awk} +@print{} 3.215390309173475 +@print{} 3.159659942097510 +@print{} 3.146086215131467 +@print{} 3.142714599645573 +@dots{} +@print{} 3.224515243534819 +@print{} 2.791117213058638 +@print{} 0.000000000000000 +@error{} gawk: pi.awk:6: fatal: division by zero attempted +@end example + +Here is one more example where the inaccuracies in internal representations +yield an unexpected result: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} +> @kbd{i++} +> @kbd{print i} +> @kbd{@}'} +@print{} 4 +@end example + +Can computation using aribitrary precision help with the previous examples? +If you are impatient to know, see +@ref{Exact Arithmetic}. + +Instead of aribitrary precision floating-point arithmetic, +often all you need is an adjustment of your logic +or a different order for the operations in your calculation. +The stability and the accuracy of the computation of the constant @value{PI} +in the previous example can be enhanced by using the following +simple algebraic transformation: + +@example +(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + x) +@end example + +There is no need to be unduly suspicious about the results from +floating-point arithmetic. The lesson to remember is that +floating-point math is always more complex than the math using +pencil and paper. In order to take advantage of the power +of computer floating-point, you need to know its limitations +and work within them. For most casual use of floating-point arithmetic, +you will often get the expected result in the end if you simply round +the display of your final results to the correct number of significant +decimal digits. Avoid presenting numerical data in a manner that +implies better precision than is actually the case. + +@node Floating-point Representation +@section Binary Floating-point Representation +@cindex IEEE-754 format + +Although floating-point representations vary from machine to machine, +the most commonly encountered representation is that defined by the +IEEE 754 Standard. An IEEE-754 format value has three components: + +@itemize @bullet +@item +a sign bit telling whether the number is positive or negative, + +@item +an @dfn{exponent} giving its order of magnitude, @var{e}, + +@item +and a @dfn{significand}, @var{s}, +specifying the actual digits of the number. +@end itemize + +The value of the +number is then +@iftex +@math{s @cdot 2^e}. +@end iftex +@ifnottex +@var{s * 2^e}. +@end ifnottex +The first bit of a non-zero binary significand +is always one, so the significand in an IEEE-754 format only includes the +fractional part, leaving the leading one implicit. + +Three of the standard IEEE-754 types are 32-bit single precision, +64-bit double precision and 128-bit quadruple precision. +The standard also specifies extended precision formats +to allow greater precisions and larger exponent ranges. + +@node Floating-point Context +@section Floating-point Context +@cindex context, floating-point + +A floating-point context defines the environment for arithmetic operations. +It governs precision, sets rules for rounding and limits range for exponents. +The context has the following primary components: + +@table @code +@item precision +Precision of the floating-point format in bits. +@item emax +Maximum exponent allowed for this format. +@item emin +Minimum exponent allowed for this format. +@item underflow behavior +The format may or may not support gradual underflow. +@item rounding +The rounding mode of this context. +@end table + +@ref{table-ieee-formats} lists the precision and exponent +field values for the basic IEEE-754 binary formats: + +@float Table,table-ieee-formats +@caption{Basic IEEE Formats} +@multitable @columnfractions .20 .20 .20 .20 .20 +@headitem Name @tab Total bits @tab Precision @tab emin @tab emax +@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127 +@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023 +@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383 +@end multitable +@end float + +@quotation NOTE +The precision numbers include the implied leading one that gives them +one extra bit of significand. +@end quotation + +A floating-point context can also determine which signals are treated +as exceptions, and can set rules for arithmetic with special values. +Please consult the IEEE-754 standard or other resources for details. + +@command{gawk} ordinarily uses the hardware double precision +representation for numbers. On most systems, this is IEEE-754 +floating-point format, corresponding to 64-bit binary with 53 bits +of precision. + +@quotation NOTE +In case an underflow occurs, the standard allows, but does not require, +the result from an arithmetic operation to be a number smaller than +the smallest nonzero normalized number. Such numbers do +not have as many significant digits as normal numbers, and are called +@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero, +is called @dfn{flush to zero}. The basic IEEE-754 binary formats +support subnormal numbers. +@end quotation + +@node Rounding Mode +@section Floating-point Rounding Mode +@cindex rounding mode, floating-point + +The @dfn{rounding mode} specifies the behavior for the results of numerical +operations when discarding extra precision. Each rounding mode indicates +how the least significant returned digit of a rounded result is to +be calculated. +The @code{ROUNDMODE} variable (@pxref{Setting Rounding Mode}) provides +program level control over the rounding mode. +@ref{table-rounding-modes} lists the IEEE-754 defined +rounding modes: + +@float Table,table-rounding-modes +@caption{Rounding Modes} +@multitable @columnfractions .45 .30 .25 +@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE} +@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"} +@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"} +@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"} +@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"} +@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"} +@end multitable +@end float + +The default mode @samp{roundTiesToEven} is the most preferred, +but the least intuitive. This method does the obvious thing for most values, +by rounding them up or down to the nearest digit. +For example, rounding 1.132 to two digits yields 1.13, +and rounding 1.157 yields 1.16. + +However, when it comes to rounding a value that is exactly halfway between, +things do not work the way you probably learned in school. +In this case, the number is rounded to the nearest even digit. +So rounding 0.125 to two digits rounds down to 0.12, +but rounding 0.6875 to three digits rounds up to 0.688. +You probably have already encountered this rounding mode when +using the @code{printf} routine to format floating-point numbers. +For example: + +@example +BEGIN @{ + x = -4.5 + for (i = 1; i < 10; i++) @{ + x += 1.0 + printf("%4.1f => %2.0f\n", x, x) + @} +@} +@end example + +@noindent +produces the following output when run@footnote{It +is possible for the output to be completely different if the +C library in your system does not use the IEEE-754 even-rounding +rule to round halfway cases for @code{printf()}.}: + +@example +-3.5 => -4 +-2.5 => -2 +-1.5 => -2 +-0.5 => 0 + 0.5 => 0 + 1.5 => 2 + 2.5 => 2 + 3.5 => 4 + 4.5 => 4 +@end example + +The theory behind the rounding mode @samp{roundTiesToEven} is that +it more or less evenly distributes upward and downward rounds +of exact halves, which might cause the round-off error +to cancel itself out. This is the default rounding mode used +in IEEE-754 computing functions and operators. + +The other rounding modes are rarely used. +Round toward positive infinity (@samp{roundTowardPositive}) +and round toward negative infinity (@samp{roundTowardNegative}) +are often used to implement interval arithmetic, +where you adjust the rounding mode to calculate upper and lower bounds +for the range of output. The @samp{roundTowardZero} +mode can be used for converting floating-point numbers to integers. +The rounding mode @samp{roundTiesToAway} rounds the result to the +nearest number and selects the number with the larger magnitude +if a tie occurs. + +Some numerical analysts will tell you that your choice of rounding style +has tremendous impact on the final outcome, and advise you to wait until +final output for any rounding. Instead, you can often achieve this goal by +setting the precision initially to some value sufficiently larger than +the final desired precision, so that the accumulation of round-off error +does not influence the outcome. +If you suspect that results from your computation are +sensitive to accumulation of round-off error, +one way to be sure is to look for a significant difference in output +when you change the rounding mode. + +@node Arbitrary Precision Floats +@section Arbitrary Precision Floating-point Arithmetic with @command{gawk} + +@command{gawk} uses the GNU MPFR library +for arbitrary precision floating-point arithmetic. The MPFR library +provides precise control over precisions and rounding modes, and gives +correctly rounded reproducible platform-independent results. With the +command-line option @option{--bignum} or @option{-M}, +all floating-point arithmetic operators and numeric functions can yield +results to any desired precision level supported by MPFR. +Two built-in +variables @code{PREC} +(@pxref{Setting Precision}) +and @code{ROUNDMODE} +(@pxref{Setting Rounding Mode}) +provide control over the working precision and the rounding mode. +The precision and the rounding mode are set globally for every operation +to follow. + +The default working precision for arbitrary precision floats is 53, +and the default value for @code{ROUNDMODE} is @code{"N"}, +which selects the IEEE-754 +@samp{roundTiesToEven} (@pxref{Rounding Mode}) rounding mode.@footnote{The +default precision is 53, since according to the MPFR documentation, +the library should be able to exactly reproduce all computations with +double-precision machine floating-point numbers (@code{double} type +in C), except the default exponent range is much wider and subnormal +numbers are not implemented.} +@command{gawk} uses the default exponent range in MPFR +@iftex +(@math{emax = 2^{30} - 1, emin = -emax}) +@end iftex +@ifnottex +(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax}) +@end ifnottex +for all floating-point contexts. +There is no explicit mechanism to adjust the exponent range. +MPFR does not implement subnormal numbers by default, +and this behavior cannot be changed in @command{gawk}. + +@quotation NOTE +When emulating an IEEE-754 format (@pxref{Setting Precision}), +@command{gawk} internally adjusts the exponent range +to the value defined for the format and also performs computations needed for +gradual underflow (subnormal numbers). +@end quotation + +@quotation NOTE +MPFR numbers are variable-size entities, consuming only as much space as +needed to store the significant digits. Since the performance using MPFR +numbers pales in comparison to doing math using the underlying machine +types, you should consider using only as much precision as needed by +your program. +@end quotation + +@node Setting Precision +@section Setting the Working Precision +@cindex @code{PREC} variable + +@command{gawk} uses a global working precision; it does not keep track of +the precision or accuracy of individual numbers. Performing an arithmetic +operation or calling a built-in function rounds the result to the current +working precision. The default working precision is 53 which can be +modified using the built-in variable @code{PREC}. You can also set the +value to one of the following pre-defined case-insensitive strings +to emulate an IEEE-754 binary format: + +@multitable {@code{"double"}} {12345678901234567890123456789012345} +@headitem @code{PREC} @tab IEEE-754 Binary Format +@item @code{"half"} @tab 16-bit half-precision. +@item @code{"single"} @tab Basic 32-bit single precision. +@item @code{"double"} @tab Basic 64-bit double precision. +@item @code{"quad"} @tab Basic 128-bit quadruple precision. +@item @code{"oct"} @tab 256-bit octuple precision. +@end multitable + +The following example illustrates the effects of changing precision +on arithmetic operations: + +@example +$ @kbd{gawk -M -vPREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \} +> @kbd{PREC = "double"; print x + 0 @}'} +@print{} 1e-400 +@print{} 0 +@end example + +Binary and decimal precisions are related approximately according to the +formula: + +@iftex +@math{prec = 3.322 @cdot dps} +@end iftex +@ifnottex +@var{prec} = 3.322 * @var{dps} +@end ifnottex + +@noindent +Here, @var{prec} denotes the binary precision +(measured in bits) and @var{dps} (short for decimal places) +is the decimal digits. We can easily calculate how many decimal +digits the 53-bit significand of an IEEE double is equivalent to: +53 / 3.332 which is equal to about 15.95. +But what does 15.95 digits actually mean? It depends whether you are +concerned about how many digits you can rely on, or how many digits +you need. + +It is important to know how many bits it takes to uniquely identify +a double-precision value (the C type @code{double}). If you want to +convert from @code{double} to decimal and back to @code{double} (e.g., +saving a @code{double} representing an intermediate result to a file, and +later reading it back to restart the computation), then a few more decimal +digits are required. 17 digits is generally enough for a @code{double}. + +It can also be important to know what decimal numbers can be uniquely +represented with a @code{double}. If you want to convert +from decimal to @code{double} and back again, 15 digits is the most that +you can get. Stated differently, you should not present +the numbers from your floating-point computations with more than 15 +significant digits in them. + +Conversely, it takes a precision of 332 bits to hold an approximation +of constant @value{PI} that is accurate to 100 decimal places. +You should always add some extra bits in order to avoid the confusing round-off +issues that occur because numbers are stored internally in binary. + +@node Setting Rounding Mode +@section Setting the Rounding Mode +@cindex @code{ROUNDMODE} variable + +The built-in variable @code{ROUNDMODE} has the default value @code{"N"}, +which selects the IEEE-754 rounding mode @samp{roundTiesToEven}. +The other possible values for @code{ROUNDMODE} are @code{"U"} for rounding mode +@samp{roundTowardPositive}, @code{"D"} for @samp{roundTowardNegative}, +and @code{"Z"} for @samp{roundTowardZero}. +@command{gawk} also accepts @code{"A"} to select the IEEE-754 mode +@samp{roundTiesToAway} +if your version of the MPFR library supports it; otherwise setting +@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode}, +for the meanings of the various rounding modes. + +Here is an example of how to change the default rounding behavior of +@code{printf}'s output: + +@example +$ @kbd{gawk -M -vROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'} +@print{} 1.37 +@end example + +@node Floating-point Constants +@section Representing Floating-point Constants +@cindex constants, floating-point + +Be wary of floating-point constants! When reading a floating-point constant +from program source code, @command{gawk} uses the default precision, +unless overridden +by an assignment to the special variable @code{PREC} on the command +line, to store it internally as a MPFR number. +Changing the precision using @code{PREC} in the program text does +not change the precision of a constant. If you need to +represent a floating-point constant at a higher precision than the +default and cannot use a command line assignment to @code{PREC}, +you should either specify the constant as a string, or +a rational number whenever possible. The following example +illustrates the differences among various ways to +print a floating-point constant: + +@example +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000055511151 +$ @kbd{gawk -M -vPREC = 113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'} +@print{} 0.1000000000000000000000000 +@end example + +In the first case, the number is stored with the default precision of 53. + +@node Changing Precision +@section Changing the Precision of a Number + +@cindex Laurie, Dirk +@quotation +@i{The point is that in any variable-precision package, +a decision is made on how to treat numbers given as data, +or arising in intermediate results, which are represented in +floating-point format to a precision lower than working precision. +Do we promote them to full membership of the high-precision club, +or do we treat them and all their associates as second-class citizens? +Sometimes the first course is proper, sometimes the second, and it takes +careful analysis to tell which.} + +Dirk Laurie@footnote{Dirk Laurie. +@cite{Variable-precision Arithmetic Considered Perilous -- A Detective Story}. +Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.} +@end quotation + +@command{gawk} does not implicitly modify the precision of any previously +computed results when the working precision is changed with an assignment +to @code{PREC}. The precision of a number is always the one that was +used at the time of its creation, and there is no way for the user +to explicitly change it afterwards. However, since the result of a +floating-point arithmetic operation is always an arbitrary precision +floating-point value---with a precision set by the value of @code{PREC}---one of the +following workarounds effectively accomplishes the desired behavior: + +@example +x = x + 0.0 +@end example + +@noindent +or: + +@example +x += 0.0 +@end example + +@node Exact Arithmetic +@section Exact Arithmetic with Floating-point Numbers + +@quotation CAUTION +Never depend on the exactness of floating-point arithmetic, +even for apparently simple expressions! +@end quotation + +Can arbitrary precision arithmetic give exact results? There are +no easy answers. The standard rules of algebra often do not apply +when using floating-point arithmetic. +Among other things, the distributive and associative laws +do not hold completely, and order of operation may be important +for your computation. Rounding error, cumulative precision loss +and underflow are often troublesome. + +When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3} +for equality +using the machine double precision arithmetic, it decides that they +are not equal! +(@xref{Floating-point Programming}.) +You can get the result you want by increasing the precision; +56 in this case will get the job done: + +@example +$ @kbd{gawk -M -vPREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 1 +@end example + +If adding more bits is good, perhaps adding even more bits of +precision is better? +Here is what happens if we use an even larger value of @code{PREC}: + +@example +$ @kbd{gawk -M -vPREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +This is not a bug in @command{gawk} or in the MPFR library. +It is easy to forget that the finite number of bits used to store the value +is often just an approximation after proper rounding. +The test for equality succeeds if and only if @emph{all} bits in the two operands +are exactly the same. Since this is not necessarily true after floating-point +computations with a particular precision and effective rounding rule, +a straight test for equality may not work. + +So, don't assume that floating-point values can be compared for equality. +You should also exercise caution when using other forms of comparisons. +The standard way to compare between floating-point numbers is to determine +how much error (or @dfn{tolerance}) you will allow in a comparison and +check to see if one value is within this error range of the other. + +In applications where 15 or fewer decimal places suffice, +hardware double precision arithmetic can be adequate, and is usually much faster. +But you do need to keep in mind that every floating-point operation +can suffer a new rounding error with catastrophic consequences as illustrated +by our attempt to compute the value of the constant @value{PI}, +(@pxref{Floating-point Programming}). +Extra precision can greatly enhance the stability and the accuracy +of your computation in such cases. + +Repeated addition is not necessarily equivalent to multiplication +in floating-point arithmetic. In the last example +(@pxref{Floating-point Programming}), +you may or may not succeed in getting the correct result by choosing +an arbitrarily large value for @code{PREC}. Reformulation of +the problem at hand is often the correct approach in such situations. + + +@node Integer Programming +@section Effective Integer Programming + +As has been mentioned already, @command{gawk} ordinarily uses hardware double +precision with 64-bit IEEE binary floating-point representation +for numbers on most systems. A large integer like 9007199254740997 +has a binary representation that, although finite, is more than 53 bits long; +it must also be rounded to 53 bits. +The biggest integer that can be stored in a C @code{double} is usually the same +as the largest possible value of a @code{double}. If your system @code{double} +is an IEEE 64-bit @code{double}, this largest possible value is an integer and +can be represented precisely. What more should one know about integers? + +If you want to know what is the largest integer, such that it and +all smaller integers can be stored in 64-bit doubles without losing precision, +then the answer is +@iftex +@math{2^{53}}. +@end iftex +@ifnottex +2^53. +@end ifnottex +The next representable number is the even number +@iftex +@math{2^{53} + 2}, +@end iftex +@ifnottex +2^53 + 2, +@end ifnottex +meaning it is unlikely that you will be able to make +@command{gawk} print +@iftex +@math{2^{53} + 1} +@end iftex +@ifnottex +2^53 + 1 +@end ifnottex +in integer format. +The range of integers exactly representable by a 64-bit double +is +@iftex +@math{[-2^{53}, 2^{53}]}. +@end iftex +@ifnottex +[@minus{}2^53, 2^53]. +@end ifnottex +If you ever see an integer outside this range in @command{gawk} +using 64-bit doubles, you have reason to be very suspicious about +the accuracy of the output. Here is a simple program with erroneous output: + +@example +$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'} +@print{} 9007199254740991 +@print{} 9007199254740992 +@print{} 9007199254740992 +@print{} 9007199254740994 +@end example + +The lesson is to not assume that any large integer printed by @command{gawk} +represents an exact result from your computation, especially if it wraps +around on your screen. + +@node Arbitrary Precision Integers +@section Arbitrary Precision Integer Arithmetic with @command{gawk} +@cindex integer, arbitrary precision + +If the option @option{--bignum} or @option{-M} is specified, +@command{gawk} performs all +integer arithmetic using GMP arbitrary precision integers. +Any number that looks like an integer in a program source or data file +is stored as an arbitrary precision integer. +The size of the integer is limited only by your computer's memory. +The current floating-point context has no effect on operations involving integers. +For example, the following computes +@iftex +@math{5^{4^{3^{2}}}}, +@end iftex +@ifnottex +5^4^3^2, +@end ifnottex +the result of which is beyond the +limits of ordinary @command{gawk} numbers: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{x = 5^4^3^2} +> @kbd{print "# of digits =", length(x)} +> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)} +> @kbd{@}'} +@print{} # of digits = 183231 +@print{} 62060698786608744707 ... 92256259918212890625 +@end example + +If you were to compute the same value using arbitrary precision +floating-point values instead, the precision needed for correct output +(using the formula +@iftex +@math{prec = 3.322 @cdot dps}), +would be @math{3.322 @cdot 183231}, +@end iftex +@ifnottex +@samp{prec = 3.322 * dps}), +would be 3.322 x 183231, +@end ifnottex +or 608693. + +The result from an arithmetic operation with an integer and a floating-point value +is a floating-point value with a precision equal to the working precision. +The following program calculates the eighth term in +Sylvester's sequence@footnote{Weisstein, Eric W. +@cite{Sylvester's Sequence}. From MathWorld--A Wolfram Web Resource. +@url{http://mathworld.wolfram.com/SylvestersSequence.html}} +using a recurrence: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{s = 2.0} +> @kbd{for (i = 1; i <= 7; i++)} +> @kbd{s = s * (s - 1) + 1} +> @kbd{print s} +> @kbd{@}'} +@print{} 113423713055421845118910464 +@end example + +The output differs from the acutal number, 113423713055421844361000443, +because the default precision of 53 is not enough to represent the +floating-point results exactly. You can either increase the precision +(100 is enough in this case), or replace the floating-point constant +@code{2.0} with an integer, to perform all computations using integer +arithmetic to get the correct output. + +It will sometimes be necessary for @command{gawk} to implicitly convert an +arbitrary precision integer into an arbitrary precision floating-point value. +This is primarily because the MPFR library does not always provide the +relevant interface to process arbitrary precision integers or mixed-mode +numbers as needed by an operation or function. +In such a case, the precision is set to the minimum value necessary +for exact conversion, and the working precision is not used for this purpose. +If this is not what you need or want, you can employ a subterfuge +like this: + +@example +gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}' +@end example + +You can avoid this issue altogether by specifying the number as a float +to begin with: + +@example +gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}' +@end example + +Note that for the particular example above, there is unlikely to be a +reason for simply not using the following: + +@example +gawk -M 'BEGIN @{ n = 13; print n % 2 @}' +@end example + + +@node MPFR and GMP Libraries +@section Information About the MPFR and GMP Libraries + +There are a few elements available in the @code{PROCINFO} array +to provide information about the MPFR and GMP libraries. +@xref{Auto-set}, for more information. + @node Advanced Features @chapter Advanced Features of @command{gawk} @cindex advanced features, network connections, See Also networks, connections |