Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 766 |
1 files changed, 766 insertions, 0 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index 5b3dd71c..2d68b9cc 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -285,6 +285,7 @@ particular records in a file and perform operations upon them. * Functions:: Built-in and user-defined functions. * Internationalization:: Getting @command{gawk} to speak your language. +* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with @command{gawk}. * Advanced Features:: Stuff for advanced users, specific to @command{gawk}. * Library Functions:: A Library of @command{awk} Functions. @@ -551,6 +552,21 @@ particular records in a file and perform operations upon them. * I18N Portability:: @command{awk}-level portability issues. * I18N Example:: A simple i18n example. * Gawk I18N:: @command{gawk} is also internationalized. +* Floating-point Programming:: Effective floating-point programming. +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +* Arbitrary Precision Floats:: Arbitrary precision floating-point + arithmetic with @command{gawk}. +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point numbers. +* Integer Programming:: Effective integer programming. +* Arbitrary Precision Integers:: Arbitrary precision integer + arithmetic with @command{gawk}. +* MPFR and GMP Libraries:: Information about the MPFR and GMP libraries. * Nondecimal Data:: Allowing nondecimal input data. * Array Sorting:: Facilities for controlling array traversal and sorting arrays. @@ -3212,6 +3228,14 @@ when eliminating problems pointed out by @option{--lint}, you should take care to search for all occurrences of each inappropriate construct. 
As @command{awk} programs are usually short, doing so is not burdensome. +@item -M +@itemx --bcmath +@cindex @code{-M} option +@cindex @code{--bcmath} option +Force arbitrary precision arithmetic on numbers. This option has no effect +if @command{gawk} is not compiled to use the GNU MPFR and MP libraries +(@pxref{Arbitrary Precision Arithmetic}). + @item -n @itemx --non-decimal-data @cindex @code{-n} option @@ -18294,6 +18318,748 @@ then @command{gawk} produces usage messages, warnings, and fatal errors in the local language. @c ENDOFRANGE inloc +@node Arbitrary Precision Arithmetic +@chapter Arbitrary Precision Arithmetic with @command{gawk} +@cindex arbitrary precision +@cindex multiple precision +@cindex infinite precision +@cindex floating-point numbers, arbitrary precision +@cindex MPFR +@cindex GMP + +@cindex Knuth, Donald +@quotation +@i{There's a credibility gap: We don't know how much of the computer's answers +to believe. Novice computer users solve this problem by implicitly trusting +in the computer as an infallible authority; they tend to believe that all +digits of a printed answer are significant. Disillusioned computer users have +just the opposite approach; they are constantly afraid that their answers +are almost meaningless.}@footnote{ +Donald E. Knuth. The Art of Computer Programming. Volume 2, +Seminumerical Algorithms, 3rd edition, 1998, ISBN 0-201-89683-4, p. 229. +} + +Donald Knuth +@end quotation + + +This chapter is about how to use the arbitrary precision +(also known as multiple precision or infinite precision) numeric +capabilities in @command{gawk} to produce maximally accurate results +when you need them. But first you should check if your version of +@command{gawk} supports arbitrary precision arithmetic. 
+The easiest way to find out is to look at the output of +the following command: + +@example +$ @kbd{gawk --version} +@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3) +@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation. +.. +@end example + +Gawk uses the GNU MPFR and MP libraries for arbitrary precision arithmetic +on numbers. So if you do not see the names of these libraries in the output above, +then your version of @command{gawk} does not support arbitrary precision math. + +Even if you aren't interested in arbitrary precision arithmetic, you +may still benefit from knowing how @command{gawk} handles numbers +in general, and the limitations of doing arithmetic with ordinary +@command{gawk} numbers. + +@menu +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary Floating-point Representation. +* Floating-point Context:: Floating-point Context. +* Rounding Mode:: Floating-point Rounding Mode. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with @command{gawk}. +* Setting Precision:: Setting the Working Precision. +* Setting Rounding Mode:: Setting the Rounding Mode. +* Floating-point Constants:: Representing Floating-point Constants. +* Changing Precision:: Changing the Precision of a Number. +* Exact Arithmetic:: Exact Arithmetic with Floating-point Numbers. +* Integer Programming:: Effective Integer Programming. +* Arbitrary Precision Integers:: Arbitrary Precision Integer + Arithmetic with @command{gawk}. +* MPFR and GMP Libraries:: Information About the MPFR and GMP Libraries. +@end menu + +@node Floating-point Programming +@section Effective Floating-point Programming + +Numerical programming is an extensive area; if you need to develop +sophisticated numerical algorithms then @command{gawk} may not be +the ideal tool, and this documentation may not be sufficient. 
+It might require a book or two to communicate how to compute +with ideal accuracy and precision, and the result often depends +on the particular application. + +Binary floating-point representations and arithmetic are inexact. +Simple values like 0.1 cannot be precisely represented using +binary floating-point numbers, and the limited precision of +floating-point numbers means that slight changes in +the order of operations or the precision of intermediate storage +can change the result. To make matters worse, with arbitrary precision +floating-point you can set the precision before starting a computation, +but you still cannot be sure how accurate the final result is. + +Sometimes you need to think more about what you really want +and what's really happening. Consider the two numbers +in the following example: + +@example + x = 0.875 # 1/2 + 1/4 + 1/8 + y = 0.425 +@end example + +Unlike the number in y, the number stored in x is exactly representable +in binary since it can be written as a finite sum of one or +more fractions whose denominators are all powers of two. +When @command{gawk} reads a floating-point number from +a program source, it automatically rounds that number to whatever +precision your machine supports. If you try to print the numeric +content of a variable using an output format string "%.17g", +it may not produce the same number as you assigned to it: + +@example +$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425; printf("%0.17g, %0.17g\n", x, y) @}'} +@print{} 0.875, 0.42499999999999999 +@end example + +Often the error is so small you do not even notice it, and if you do, +you can always specify how much precision you would like in your output. +Usually this is a format string like "%.15g", which when +used in the example above will produce an output identical to the input. + +Because the underlying representation can be a little bit off from the exact value, +comparing floats to see if they are equal is generally not a good idea. 
+Here is an example where it does not work like you expect: + +@example +$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +The loss of accuracy during a single computation with floating-point numbers +usually isn't enough to worry about. However, if you compute a value +which is the result of a sequence of floating-point operations, +the error can accumulate and greatly affect the computation itself. +Here is an attempt to compute the value of the constant @samp{pi} using one of its many +series representations: + +@example +$ cat pi.awk +BEGIN @{ + x = 1.0 / sqrt(3.0) + n = 6 + for (i = 1; i < 30; i++) @{ + n = n * 2.0 + x = (sqrt(x * x + 1) - 1) / x + printf("%.15f\n", n * x) + @} +@} +@end example + +When run, the early errors propagating through later computations will +cause the loop to terminate prematurely after an attempt to divide by zero. +Here is one more example where the inaccuracies in internal representations +yield an unexpected result: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} +> @kbd{i++} +> @kbd{print i} +> @kbd{@}'} +@print{} 4 +@end example + +Can computation using arbitrary precision help with the examples above? +If you are impatient to know, +see @ref{Exact Arithmetic}. +Instead of arbitrary precision floating-point arithmetic, +often all you need is an adjustment of your logic +or a different order for the operations in your calculation. +The stability and the accuracy of the computation of the constant @samp{pi} +in the example above can be enhanced by using the following +simple algebraic transformation: + +@example + (sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1) +@end example + +There is no need to be unduly suspicious about the results from +floating-point arithmetic. The lesson to remember is that +floating-point math is always more complex than the math using +pencil and paper. 
In order to take advantage of the power +of computer floating-point, you need to know its limitations +and work within them. For most casual use of floating-point arithmetic, +you will often get the expected result in the end if you simply round +the display of your final results to the correct number of significant +decimal digits. Avoid presenting numerical data in a manner that +implies better precision than is actually the case. + +@node Floating-point Representation +@section Binary Floating-point Representation +@cindex IEEE-754 format + +Although floating-point representations vary from machine to machine, +the most commonly encountered representation is that defined by the +IEEE 754 Standard. An IEEE-754 format has three components: +a sign bit telling whether the number is positive or negative, +an exponent giving its order of magnitude @var{e}, and a significand @var{s} +specifying the actual digits of the number. The value of the +number is then @var{s * 2^e}. The first bit of a non-zero binary significand +is always one so the significand in an IEEE-754 format only includes the +fractional part leaving the leading one implicit. + +Three of the standard IEEE-754 types are 32-bit single precision, +64-bit double precision and 128-bit quadruple precision. +The standard also specifies extended precision formats +to allow greater precisions and larger exponent ranges. + + +@node Floating-point Context +@section Floating-point Context +@cindex context, floating-point + +A floating-point context defines the environment for arithmetic operations. +It governs precision, sets rules for rounding and limits range for exponents. +The context has the following primary components: + +@table @code +@item precision +Precision of the floating-point format in bits. +@item emax +Maximum exponent allowed for this format. +@item emin +Minimum exponent allowed for this format. +@item subnormal behavior +The format may or may not support gradual underflow. 
+@item rounding +The rounding mode of this context. +@end table + +@ref{table-ieee-formats} lists the precision and exponent +field values for the basic IEEE-754 binary formats: + +@float Table,table-ieee-formats +@caption{Basic IEEE Formats} +@multitable @columnfractions .20 .20 .20 .20 .20 +@headitem Name @tab Total bits @tab Precision @tab emin @tab emax +@item Single @tab 32 @tab 24 @tab -126 @tab +127 +@item Double @tab 64 @tab 53 @tab -1022 @tab +1023 +@item Quadruple @tab 128 @tab 113 @tab -16382 @tab +16383 +@end multitable +@end float + +@quotation NOTE +The precision numbers include the implied leading one that gives them +one extra bit of significand. +@end quotation + +A floating-point context can also determine which signals are treated as exceptions, +or can set rules for arithmetic with special values. The interested reader should +consult the IEEE-754 standard or other resources for details. + +Gawk ordinarily uses the hardware double precision for a number. +On most systems, it is in IEEE-754 floating-point format which corresponds +to 64-bit binary with 53 bits of precision. + + +@quotation NOTE +In case an underflow occurs, the standard allows, but does not require, the result +of an arithmetic operation to lose precision gradually when it is not +exactly zero but is too close to zero to be stored as a normal number. Such numbers do not have as many significant +digits as normal numbers, and are called denormals or subnormals. +The basic IEEE-754 binary formats support subnormal numbers. +@end quotation + + +@node Rounding Mode +@section Floating-point Rounding Mode +@cindex rounding mode, floating-point + +Rounding mode specifies the behavior for the results of numerical operations when +discarding extra precision. Each rounding mode indicates how the +least significant returned digit of a rounded result is to be calculated. 
+@ref{table-rounding-modes} lists the IEEE-754 defined rounding modes: + +@float Table,table-rounding-modes +@caption{Rounding Modes} +@multitable @columnfractions .45 .25 .30 +@headitem Rounding Mode @tab IEEE Name @tab @code{RNDMODE} (@pxref{Setting Rounding Mode}) +@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"} +@item Round toward positive Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"} +@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"} +@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"} +@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"} +@end multitable +@end float + +The default mode @samp{roundTiesToEven} is the most preferred, +but the least intuitive. This method does the obvious thing for most values, +by rounding them up or down to the nearest digit. +For example, rounding 1.132 to two digits yields 1.13, +and rounding 1.157 yields 1.16. +When it comes to rounding a value that is exactly halfway between, +it probably does not work the way you learned in school. +In this case, the number is rounded to the nearest even digit. +So rounding 0.125 to two digits rounds down to 0.12, +but rounding 0.6875 to three digits rounds up to 0.688. +You probably have already encountered this rounding mode when +using the @code{printf} routine to format floating-point numbers. 
+For example: + +@example +BEGIN @{ + x = -4.5 + for (i = 1; i < 10; i++) @{ + x += 1.0 + printf("%4.1f => %2.0f\n", x, x) + @} +@} +@end example + +@noindent +produces the following output when run@footnote{ +It is possible for the output to be completely different if the +C library in your system does not use the IEEE-754 even-rounding +rule to round halfway cases for @code{printf()}.}: + +@example +-3.5 => -4 +-2.5 => -2 +-1.5 => -2 +-0.5 => 0 + 0.5 => 0 + 1.5 => 2 + 2.5 => 2 + 3.5 => 4 + 4.5 => 4 +@end example + +The theory behind the rounding mode @samp{roundTiesToEven} is that +it more or less evenly distributes upward and downward rounds +of exact halves, which might cause the round-off error +to cancel itself out. This is the default rounding mode used +in IEEE-754 computing functions and operators. + +The other rounding modes are rarely used. +Round toward positive infinity @samp{roundTowardPositive} +and round toward negative infinity @samp{roundTowardNegative} +are often used to implement interval arithmetic, +where you adjust the rounding mode to calculate upper and lower bounds +for the range of output. The @samp{roundTowardZero} +mode can be used for converting floating-point numbers to integers. +The rounding mode @samp{roundTiesToAway} rounds the result to the +nearest number and selects the number with the larger magnitude +if a tie occurs. + +Some numerical analysts will tell you that your choice of rounding style +has a tremendous impact on the final outcome, and advise you to wait until +final output for any rounding. This goal can often be achieved by +setting the precision initially to some value sufficiently larger than +the final desired precision so that the accumulation of round-off error +does not influence the outcome. +If you suspect that results from your computation are +sensitive to accumulation of round-off error, +one way to be sure is to look for a significant difference in output +when you change the rounding mode. 
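The sensitivity check described in the last paragraph of the Rounding Mode section can be sketched as a small shell helper. This is only an illustration under stated assumptions: it presumes a gawk built with MPFR support, so that the -M option and the RNDMODE and PREC variables (described in the sections that follow) take effect. The names has_mpfr and run_mode are hypothetical and not part of gawk.

```shell
# Hypothetical helpers; not part of gawk itself.

# Succeeds only if gawk is installed and reports MPFR support.
has_mpfr() {
    gawk -M 'BEGIN { exit !("mpfr_version" in PROCINFO) }' 2>/dev/null
}

# Run the awk program in $2 with the rounding mode in $1 (N, U, D or Z).
run_mode() {
    gawk -M -v PREC=53 -v RNDMODE="$1" "$2"
}

# A sum whose terms are not exactly representable in binary.
prog='BEGIN { for (i = 0; i < 1000; i++) s += 0.1; printf("%.20g\n", s) }'

if has_mpfr; then
    # Print the same computation rounded down, to nearest, and up.
    for mode in D N U; do
        printf '%s: ' "$mode"
        run_mode "$mode" "$prog"
    done
else
    echo "gawk with MPFR support not found; skipping"
fi
```

If the three results agree in every digit you intend to report, accumulated round-off is unlikely to matter for that computation. If they diverge in digits you care about, a larger PREC or a reformulation of the computation is worth considering.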
+ + +@node Arbitrary Precision Floats +@section Arbitrary Precision Floating-point Arithmetic with @command{gawk} + +Gawk uses the GNU MPFR library for arbitrary precision floating-point arithmetic. +The MPFR library provides precise control over precisions and rounding modes, +and gives correctly rounded reproducible platform-independent results. +With the command-line option @option{--bcmath} or @option{-M}, all floating-point +arithmetic operators and numeric functions can yield results to any +desired precision level supported by MPFR. Two built-in variables @code{PREC} +(@pxref{Setting Precision}) +and @code{RNDMODE} +(@pxref{Setting Rounding Mode}) +give a simple way of controlling the working precision and the rounding mode in @command{gawk}. +The precision and the rounding mode are set globally for every operation to follow. +The default working precision for arbitrary precision floats is 53@footnote{The +default precision is 53, since according to the MPFR documentation, mpfr should be able to exactly +reproduce all computations with double-precision machine floating-point numbers (double type in C), +except the default exponent range is much wider and subnormal numbers are not implemented.} +and the default value for @code{RNDMODE} is @code{"N"} which selects the IEEE-754 +@samp{roundTiesToEven} (@pxref{Rounding Mode}) rounding mode. +The default exponent range in MPFR (@var{emax} = 2^30 - 1, @var{emin} = -@var{emax}) +is used by @command{gawk} for all floating-point contexts. +There is no explicit mechanism in @command{gawk} to adjust the exponent range. +MPFR does not implement subnormal numbers by default, +and this behavior cannot be changed in @command{gawk}. + +@quotation NOTE +When emulating an IEEE-754 format (@pxref{Setting Precision}), +@command{gawk} internally adjusts the exponent range +to the value defined for the format and also performs computations needed for +gradual underflow (subnormal numbers). 
+@end quotation + +@quotation NOTE +MPFR numbers are variable-size entities, consuming only as much space as needed to store +the significant digits. Since the performance using MPFR numbers pales compared to +doing math on the underlying machine types, you should consider only using as much +precision as needed by your program. +@end quotation + + +@node Setting Precision +@section Setting the Working Precision +@cindex @code{PREC} variable + +Gawk uses a global working precision; it does not keep track of +the precision or accuracy of individual numbers. Performing an arithmetic +operation or calling a built-in function rounds the result to the current +working precision. The default working precision is 53 bits, which can be +modified using the built-in variable @code{PREC}. You can also set the +value to one of the following pre-defined case-insensitive strings +to emulate an IEEE-754 binary format: + +@multitable {double} {12345678901234567890123456789012345} +@headitem @code{PREC} @tab IEEE-754 Binary Format +@item @code{"half"} @tab 16-bit half-precision. +@item @code{"single"} @tab Basic 32-bit single precision. +@item @code{"double"} @tab Basic 64-bit double precision. +@item @code{"quad"} @tab Basic 128-bit quadruple precision. +@item @code{"oct"} @tab 256-bit octuple precision. +@end multitable + +The following example illustrates the effects of changing precision +on arithmetic operations: + +@example +$ @kbd{gawk -M -vPREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \} +> @kbd{PREC = "double"; print x + 0 @}'} +@print{} 1e-400 +@print{} 0 +@end example + +Binary and decimal precisions are related approximately according to the +formula @code{prec = 3.322 * dps}, where @code{prec} denotes the binary precision +(measured in bits) and @code{dps} (short for decimal places) +is the number of decimal digits. We can easily calculate how many decimal +digits the 53-bit significand of an IEEE double is equivalent to: +53 / 3.322 which is equal to about 15.95. 
+But what does 15.95 digits actually mean? It depends on whether you are +concerned about how many digits you can rely on, or how many digits +you need. + +It is important to know how many bits it takes to uniquely +identify a double. If you want to round-trip from double to decimal and +back to double (saving a double representing an intermediate result +to a file, and later reading it back to restart the computation for instance) +then a few more decimal digits are required. 17 digits will generally +be enough for a double. + +It can also be important to know what decimal numbers can be uniquely +represented with a floating-point double. If you want to round-trip +from decimal to double and back again, 15 is the most that +you can get. Stated differently, you should not present +the numbers from your floating-point computations with more than 15 +significant digits in them. + +Conversely, it takes a precision of 332 bits to hold an approximation +of constant @samp{pi} that is accurate to 100 decimal places. +You should always add a few extra bits in order to avoid confusing round-off +issues that occur because numbers are stored internally in binary. + + +@node Setting Rounding Mode +@section Setting the Rounding Mode +@cindex @code{RNDMODE} variable + +The built-in variable @code{RNDMODE} has the default value @code{"N"} which selects +the IEEE-754 rounding mode @samp{roundTiesToEven}. +The other possible values for @code{RNDMODE} are @code{"U"} for rounding mode +@samp{roundTowardPositive}, @code{"D"} for @samp{roundTowardNegative}, +and @code{"Z"} for @samp{roundTowardZero}. +Gawk also accepts @code{"A"} to select the IEEE-754 mode @samp{roundTiesToAway} +if the version of your MPFR library supports it, otherwise setting +@code{RNDMODE} to this value has no effect. @xref{Rounding Mode}, +for the meanings of the various rounding modes. 
+ +Here is an example of how to change the default rounding behavior of +the @code{printf} output: + +@example +$ @kbd{gawk -M -vRNDMODE="Z" 'BEGIN@{ printf("%.2f\n", 1.378)@}'} +@print{} 1.37 +@end example + + +@node Floating-point Constants +@section Representing Floating-point Constants +@cindex constants, floating-point + +Be wary of floating-point constants! When reading a floating-point constant +from a program source, @command{gawk} uses the default precision, unless overridden +by an assignment to the special variable @code{PREC} in the command +line, to store it internally as an MPFR number. +Changing the precision using @code{PREC} in the program text does +not change the precision of a constant. If you need to +represent a floating-point constant at a higher precision than the +default and cannot use a command line assignment to @code{PREC}, +you should either specify the constant as a string, or +a rational number whenever possible. The following example +illustrates the differences among various ways to +print a floating-point constant: + +@example +$ @kbd{gawk -M 'BEGIN @{ PREC=113; printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000055511151 +$ @kbd{gawk -M -vPREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC=113; printf("%0.25f\n", "0.1") @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC=113; printf("%0.25f\n", 1/10) @}'} +@print{} 0.1000000000000000000000000 +@end example + +In the first case above, the number is stored with the default precision of 53. + + +@node Changing Precision +@section Changing the Precision of a Number + +@cindex Laurie, Dirk +@quotation +@i{.. The point is that in any variable-precision package, +a decision is made on how to treat numbers given as data, +or arising in intermediate results, which are represented in +floating-point format to a precision lower than working precision. 
+Do we promote them to full membership of the high-precision club, +or do we treat them and all their associates as second-class citizens? +Sometimes the first course is proper, sometimes the second, and it takes +careful analysis to tell which.}@footnote{ +Dirk Laurie. Variable-precision Arithmetic Considered Perilous - A Detective Story. +Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008. +} + +Dirk Laurie +@end quotation + + +Gawk does not implicitly modify the precision of any previously computed results +when the working precision is changed with an assignment to @code{PREC} in the +program. The precision of a number is always the one that was used at the time +of its creation, and there is no way for the user to explicitly change it +thereafter. However, since the result of a floating-point arithmetic operation +is always an arbitrary precision float with a precision set by the value +of @code{PREC}, the following workaround will effectively accomplish +the same desired behavior: + +@example + x = x + 0.0 +@end example + +@node Exact Arithmetic +@section Exact Arithmetic with Floating-point Numbers + +@quotation CAUTION +Never depend on the exactness of floating-point arithmetic, +even for apparently simple expressions! +@end quotation + +Can arbitrary precision arithmetic give exact results? There are +no easy answers. The standard rules of algebra often do not apply +when using floating-point arithmetic. +Among other things, the distributive and associative laws +do not hold completely, and order of operation may be important +for your computation. Rounding error, cumulative precision loss, +and underflow are often troublesome. + +When @command{gawk} tests the expressions 0.1 + 12.2 and 12.3 for equality +using the machine double precision arithmetic it decides that they +are not equal +(@pxref{Floating-point Programming})! 
+You can get the result you want by increasing the precision; +56 in this case will get the job done: + +@example +$ @kbd{gawk -M -vPREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 1 +@end example + +Using an even larger value of @code{PREC}, however, makes the test fail again: + +@example +$ @kbd{gawk -M -vPREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +This is not a bug in @command{gawk} or in the MPFR library. +It is easy to forget that the finite number of bits used to store the value +is often just an approximation after proper rounding. +The test for equality succeeds if and only if all bits in the two operands +are exactly the same. Since this is not necessarily true after floating-point +computations with a particular precision and the effective rounding rule, +a straight test for equality may not work. + +So don't assume that floating-point values can be compared for equality. +You should also exercise caution when using other forms of comparisons. +The standard way to compare floating-point numbers is to determine +how much error (or tolerance) you will allow in a comparison and +check to see if one value is within this error range of the other. + +In applications where 15 or fewer decimal places suffice, +hardware double precision arithmetic can be adequate, and is usually much faster. +But you do need to keep in mind that every floating-point operation +can suffer a new rounding error with catastrophic consequences as illustrated +by our attempt to compute the value of the constant @samp{pi} +(@pxref{Floating-point Programming}). +Extra precision can greatly enhance the stability and the accuracy +of your computation in such cases. + +Repeated addition is not necessarily equivalent to multiplication +in floating-point arithmetic. In the last example +(@pxref{Floating-point Programming}), +you may or may not succeed in getting the correct result by choosing +an arbitrarily large value for @code{PREC}. 
Reformulation of +the problem at hand is often the correct approach in such situations. + + +@node Integer Programming +@section Effective Integer Programming + +As has been mentioned already, @command{gawk} ordinarily uses hardware double +precision with 64-bit IEEE binary floating-point representation +for numbers on most systems. A large integer like 9007199254740997 +has a binary representation that, although finite, is more than 53 bits long; +it must be rounded to 53 bits. +The biggest integer that can be stored in a double is usually the same +as the largest possible value of a double. If your system double is +an IEEE 64-bit double, this largest value is an integer and can be represented precisely. +What more should one know about integers? + +If you want to know what is the largest integer, such that it and +all smaller integers can be stored in 64-bit doubles without losing precision, +then the answer is 2^53. The next representable number is the even number +2^53 + 2, meaning it is unlikely that you will be able to make +@command{gawk} print 2^53 + 1 in integer format. +The range of integers exactly representable by a 64-bit double +is [-2^53, 2^53]. If you ever see an integer outside this range in @command{gawk} +using 64-bit doubles, you have reason to be very suspicious about +the accuracy of the output. Here is a simple program with erroneous output: + +@example +$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'} +@print{} 9007199254740991 +@print{} 9007199254740992 +@print{} 9007199254740992 +@print{} 9007199254740994 +@end example + +The lesson is not to assume a large integer printed by @command{gawk} +to be an exact result from your computation, especially if it wraps around on +your terminal screen. 
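The 2^53 limit described above can be checked before trusting a large integer. The following is a minimal sketch, assuming an awk that stores numbers as IEEE-754 64-bit doubles (ordinary gawk without -M does so on most systems) and an input written in canonical decimal form with no leading zeros. The helper name fits_double is hypothetical.

```shell
# Hypothetical helper: succeed if the decimal integer in $1 survives a
# round trip through awk's double-precision numbers unchanged.
fits_double() {
    # n + 0 forces a numeric conversion; %.0f prints the stored value in full.
    # The result is compared, as a string, with the original text of n.
    awk -v n="$1" 'BEGIN { exit !(sprintf("%.0f", n + 0) == (n "")) }'
}

for n in 9007199254740991 9007199254740992 9007199254740993; do
    if fits_double "$n"; then
        echo "$n: exact"
    else
        echo "$n: rounded when stored as a double"
    fi
done
```

With 64-bit doubles, the first two values (2^53 - 1 and 2^53) are stored exactly, while 2^53 + 1 is not, which matches the repeated 9007199254740992 in the output shown above.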
+ +@node Arbitrary Precision Integers +@section Arbitrary Precision Integer Arithmetic with @command{gawk} +@cindex integer, arbitrary precision + +If the option @option{--bcmath} or @option{-M} is specified, @command{gawk} will perform all +integer arithmetic using GMP arbitrary precision integers. +Any number that looks like an integer in a program source or data file +will be stored as an arbitrary precision integer. +The size of the integer is limited only by your computer's memory. +The current floating-point context has no effect on operations involving integers. +For example, the following computes 5^4^3^2, the result of which is beyond the +limits of ordinary @command{gawk} numbers: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{x = 5^4^3^2} +> @kbd{print "# of digits =", length(x)} +> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)} +> @kbd{@}'} +@print{} # of digits = 183231 +@print{} 62060698786608744707 ... 92256259918212890625 +@end example + +If you were to compute the same using arbitrary precision floats instead, +the precision needed for correct output, +using the formula @code{prec = 3.322 * dps}, +would be 3.322 * 183231 or 608693. + +The result from an arithmetic operation with an integer and a float +is a float with a precision equal to the working precision. +The following program calculates the eighth term in +Sylvester's sequence using a recurrence: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{s = 2.0} +> @kbd{for (i = 1; i <= 7; i++)} +> @kbd{s = s * (s - 1) + 1} +> @kbd{print s@}'} +@print{} 113423713055421845118910464 +@end example + +The output differs from the actual number 113423713055421844361000443 +because the default precision 53 is not enough to represent the +floating-point results exactly. You can either increase +the precision (100 in this case is enough), or replace the float 2.0 with +an integer to perform all computations using integer arithmetic to +get the correct output. 
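Both remedies mentioned above can be tried side by side. This is a sketch that assumes a gawk built with MPFR and GMP support. The helper names has_mpfr and sylvester8 are hypothetical, and the awk program is the recurrence from the example above with the starting literal and the value of PREC turned into parameters.

```shell
# Hypothetical helpers; not part of gawk itself.

# Succeeds only if gawk is installed and reports MPFR support.
has_mpfr() {
    gawk -M 'BEGIN { exit !("mpfr_version" in PROCINFO) }' 2>/dev/null
}

# Eighth term of Sylvester's sequence.
#   $1: starting literal, "2" (integer) or "2.0" (float)
#   $2: working precision in bits, passed to PREC
sylvester8() {
    gawk -M -v PREC="$2" "BEGIN {
        s = $1
        for (i = 1; i <= 7; i++)
            s = s * (s - 1) + 1
        print s
    }"
}

if has_mpfr; then
    sylvester8 2.0 53     # float at the default precision: inexact
    sylvester8 2.0 100    # float with enough bits for the result
    sylvester8 2   53     # integer arithmetic; PREC plays no role here
else
    echo "gawk with MPFR support not found; skipping"
fi
```

The integer form is usually the simpler choice when every operand is integral, since there is no precision to estimate in advance.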
+ +It will sometimes be necessary for @command{gawk} to implicitly convert an +arbitrary precision integer into an arbitrary precision float. +This is primarily because the MPFR library does not always provide the +relevant interface to process arbitrary precision integers or mixed-mode +numbers as needed by an operation or function. +In such a case, the precision is set to the minimum value necessary +for exact conversion, and the working precision is not used for this purpose. +If this is not what you need or want, you can employ a subterfuge +like this: + +@example +$ @kbd{gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}'} +@end example + +You can avoid this issue altogether by specifying the number as a float +to begin with: + +@example +$ @kbd{gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}'} +@end example + +Note that for the particular example above, there is unlikely to be a +reason not to simply use the following: + +@example +$ @kbd{gawk -M 'BEGIN @{ n = 13; print n % 2 @}'} +@end example + + +@node MPFR and GMP Libraries +@section Information About the MPFR and GMP Libraries +@cindex @code{PROCINFO} array + +The following elements of the @code{PROCINFO} array (@pxref{Built-in Variables}) +are available to provide information about the MPFR and GMP libraries: + +@table @code +@item PROCINFO["mpfr_version"] +The version of the GNU MPFR library. + +@item PROCINFO["gmp_version"] +The version of the GNU MP library. + +@item PROCINFO["prec_max"] +The maximum precision supported by MPFR. + +@item PROCINFO["prec_min"] +The minimum precision required by MPFR. +@end table + + @node Advanced Features @chapter Advanced Features of @command{gawk} @cindex advanced features, network connections, See Also networks, connections |