Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 2503 |
1 files changed, 1255 insertions, 1248 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index 7d463a3d..d700f2a7 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -296,14 +296,14 @@ particular records in a file and perform operations upon them. * Functions:: Built-in and user-defined functions. * Internationalization:: Getting @command{gawk} to speak your language. -* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with - @command{gawk}. * Advanced Features:: Stuff for advanced users, specific to @command{gawk}. * Library Functions:: A Library of @command{awk} Functions. * Sample Programs:: Many @command{awk} programs with complete explanations. * Debugger:: The @code{gawk} debugger. +* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with + @command{gawk}. * Dynamic Extensions:: Adding new built-in functions to @command{gawk}. * Language History:: The evolution of the @command{awk} @@ -569,29 +569,6 @@ particular records in a file and perform operations upon them. * I18N Portability:: @command{awk}-level portability issues. * I18N Example:: A simple i18n example. * Gawk I18N:: @command{gawk} is also internationalized. -* General Arithmetic:: An introduction to computer arithmetic. -* Floating Point Issues:: Stuff to know about floating-point numbers. -* String Conversion Precision:: The String Value Can Lie. -* Unexpected Results:: Floating Point Numbers Are Not Abstract - Numbers. -* POSIX Floating Point Problems:: Standards Versus Existing Practice. -* Integer Programming:: Effective integer programming. -* Floating-point Programming:: Effective Floating-point Programming. -* Floating-point Representation:: Binary floating-point representation. -* Floating-point Context:: Floating-point context. -* Rounding Mode:: Floating-point rounding mode. -* Gawk and MPFR:: How @command{gawk} provides - arbitrary-precision arithmetic. -* Arbitrary Precision Floats:: Arbitrary Precision Floating-point - Arithmetic with @command{gawk}. -* Setting Precision:: Setting the working precision.
-* Setting Rounding Mode:: Setting the rounding mode. -* Floating-point Constants:: Representing floating-point constants. -* Changing Precision:: Changing the precision of a number. -* Exact Arithmetic:: Exact arithmetic with floating-point - numbers. -* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with - @command{gawk}. * Nondecimal Data:: Allowing nondecimal input data. * Array Sorting:: Facilities for controlling array traversal and sorting arrays. @@ -673,6 +650,29 @@ particular records in a file and perform operations upon them. * Miscellaneous Debugger Commands:: Miscellaneous Commands. * Readline Support:: Readline support. * Limitations:: Limitations and future plans. +* General Arithmetic:: An introduction to computer arithmetic. +* Floating Point Issues:: Stuff to know about floating-point numbers. +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not Abstract + Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice. +* Integer Programming:: Effective integer programming. +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +* Gawk and MPFR:: How @command{gawk} provides + arbitrary-precision arithmetic. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with @command{gawk}. +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point + numbers. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with + @command{gawk}. * Plugin License:: A note about licensing.
* Sample Library:: An example of new functions. * Internal File Description:: What the new functions will do. @@ -1201,6 +1201,13 @@ solving real problems. @ref{Debugger}, describes the @command{awk} debugger. +@ref{Arbitrary Precision Arithmetic}, +describes advanced arithmetic facilities provided by +@command{gawk}. + +@ref{Dynamic Extensions}, describes how to add new variables and +functions to @command{gawk} by writing extensions in C. + @ref{Language History}, describes how the @command{awk} language has evolved since its first release to the present. It also describes how @command{gawk} @@ -18497,1229 +18504,6 @@ then @command{gawk} produces usage messages, warnings, and fatal errors in the local language. @c ENDOFRANGE inloc -@node Arbitrary Precision Arithmetic -@chapter Arithmetic and Arbitrary Precision Arithmetic with @command{gawk} -@cindex arbitrary precision -@cindex multiple precision -@cindex infinite precision -@cindex floating-point numbers, arbitrary precision -@cindex MPFR -@cindex GMP - -@cindex Knuth, Donald -@quotation -@i{There's a credibility gap: We don't know how much of the computer's answers -to believe. Novice computer users solve this problem by implicitly trusting -in the computer as an infallible authority; they tend to believe that all -digits of a printed answer are significant. Disillusioned computer users have -just the opposite approach; they are constantly afraid that their answers -are almost meaningless.}@* -Donald Knuth@footnote{Donald E.@: Knuth. -@cite{The Art of Computer Programming}. Volume 2, -@cite{Seminumerical Algorithms}, third edition, -1998, ISBN 0-201-89683-4, p.@: 229.} -@end quotation - -This @value{CHAPTER} discusses issues that you may encounter -when performing arithmetic. It begins by discussing some of -the general attributes of computer arithmetic, along with how -this can influence what you see when running @command{awk} programs. -This discussion applies to all versions of @command{awk}.
- -Then the discussion moves on to @dfn{arbitrary precision -arithmetic}, a feature which is specific to @command{gawk}. - -@menu -* General Arithmetic:: An introduction to computer arithmetic. -* Floating-point Programming:: Effective Floating-point Programming. -* Gawk and MPFR:: How @command{gawk} provides - arbitrary-precision arithmetic. -* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic - with @command{gawk}. -* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with - @command{gawk}. -@end menu - -@node General Arithmetic -@section A General Description of Computer Arithmetic - -@cindex integers -@cindex floating-point, numbers -@cindex numbers, floating-point -Within computers, there are two kinds of numeric values: @dfn{integers} -and @dfn{floating-point}. -In school, integer values were referred to as ``whole'' numbers---that is, -numbers without any fractional part, such as 1, 42, or @minus{}17. -The advantage to integer numbers is that they represent values exactly. -The disadvantage is that their range is limited. On most systems, -this range is @minus{}2,147,483,648 to 2,147,483,647. -However, many systems now support a range from -@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. - -@cindex unsigned integers -@cindex integers, unsigned -Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}. -Signed values may be negative or positive, with the range of values just -described. -Unsigned values are never negative. On most systems, -the range is from 0 to 4,294,967,295. -However, many systems now support a range from -0 to 18,446,744,073,709,551,615. - -@cindex double precision floating-point -@cindex single precision floating-point -Floating-point numbers represent what are called ``real'' numbers; i.e., -those that do have a fractional part, such as 3.1415927. -The advantage to floating-point numbers is that they -can represent a much larger range of values.
-The disadvantage is that there are numbers that they cannot represent -exactly. -@command{awk} uses @dfn{double precision} floating-point numbers, which -can hold more digits than @dfn{single precision} -floating-point numbers. -@c Floating-point issues are discussed more fully in -@c @ref{Floating Point Issues}. - -There are several important issues to be aware of, described next. - -@menu -* Floating Point Issues:: Stuff to know about floating-point numbers. -* Integer Programming:: Effective integer programming. -@end menu - -@node Floating Point Issues -@subsection Floating-Point Number Caveats - -As mentioned earlier, floating-point numbers represent what are called -``real'' numbers, i.e., those that have a fractional part. @command{awk} -uses double precision floating-point numbers to represent all -numeric values. This @value{SECTION} describes some of the issues -involved in using floating-point numbers. - -There is a very nice -@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic} -by David Goldberg, -``What Every Computer Scientist Should Know About Floating-point Arithmetic,'' -@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48. -This is worth reading if you are interested in the details, -but it does require a background in computer science. - -@menu -* String Conversion Precision:: The String Value Can Lie. -* Unexpected Results:: Floating Point Numbers Are Not Abstract - Numbers. -* POSIX Floating Point Problems:: Standards Versus Existing Practice. -@end menu - -@node String Conversion Precision -@subsubsection The String Value Can Lie - -Internally, @command{awk} keeps both the numeric value -(double precision floating-point) and the string value for a variable. -Separately, @command{awk} keeps -track of what type the variable has -(@pxref{Typing and Comparison}), -which plays a role in how variables are used in comparisons.
- -It is important to note that the string value for a number may not -reflect the full value (all the digits) that the numeric value -actually contains. -The following program (@file{values.awk}) illustrates this: - -@example -@{ - sum = $1 + $2 - # see it for what it is - printf("sum = %.12g\n", sum) - # use CONVFMT - a = "<" sum ">" - print "a =", a - # use OFMT - print "sum =", sum -@} -@end example - -@noindent -This program shows the full value of the sum of @code{$1} and @code{$2} -using @code{printf}, and then prints the string values obtained -from both automatic conversion (via @code{CONVFMT}) and -from printing (via @code{OFMT}). - -Here is what happens when the program is run: - -@example -$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk} -@print{} sum = 4.8888888 -@print{} a = <4.88889> -@print{} sum = 4.88889 -@end example - -This makes it clear that the full numeric value is different from -what the default string representations show. - -@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with -at most six significant digits. For some applications, you might want to -change it to specify more precision. -On most modern machines, most of the time, -17 digits is enough to capture a floating-point number's -value exactly.@footnote{Pathological cases can require up to -752 digits (!), but we doubt that you need to worry about this.} - -@node Unexpected Results -@subsubsection Floating Point Numbers Are Not Abstract Numbers - -@cindex floating-point, numbers -Unlike numbers in the abstract sense (such as what you studied in high school -or college arithmetic), numbers stored in computers are limited in certain ways. -They cannot represent an infinite number of digits, nor can they always -represent things exactly. -In particular, -floating-point numbers cannot -always represent values exactly.
Here is an example: - -@example -$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'} -515.79 -@print{} 0000051579 -515.80 -@print{} 0000051579 -515.81 -@print{} 0000051580 -515.82 -@print{} 0000051582 -@kbd{@value{CTL}-d} -@end example - -@noindent -This shows that some values can be represented exactly, -whereas others are only approximated. This is not a ``bug'' -in @command{awk}, but simply an artifact of how computers -represent numbers. - -@quotation NOTE -It cannot be emphasized enough that the behavior just -described is fundamental to modern computers. You will -see this kind of thing happen in @emph{any} programming -language using hardware floating-point numbers. It is @emph{not} -a bug in @command{gawk}, nor is it something that can be ``just -fixed.'' -@end quotation - -@cindex negative zero -@cindex positive zero -@cindex zero@comma{} negative vs.@: positive -Another peculiarity of floating-point numbers on modern systems -is that they often have more than one representation for the number zero! -In particular, it is possible to represent ``minus zero'' as well as -regular, or ``positive'' zero. - -This example shows that negative and positive zero are distinct values -when stored internally, but that they are in fact equal to each other, -as well as to ``regular'' zero: - -@example -$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0} -> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz} -> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0} -> @kbd{@}'} -@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1 -@print{} mz == 0 -> 1, pz == 0 -> 1 -@end example - -It helps to keep this in mind should you process numeric data -that contains negative zero values; the fact that the zero is negative -is noted and can affect comparisons. - -@node POSIX Floating Point Problems -@subsubsection Standards Versus Existing Practice - -Historically, @command{awk} has converted any non-numeric looking string -to the numeric value zero, when required. 
Furthermore, the original -definition of the language and the original POSIX standards specified that -@command{awk} only understands decimal numbers (base 10), and not octal -(base 8) or hexadecimal numbers (base 16). - -Changes in the language of the -2001 and 2004 POSIX standards can be interpreted to imply that @command{awk} -should support additional features. These features are: - -@itemize @bullet -@item -Interpretation of floating point data values specified in hexadecimal -notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not} -source code constants.) - -@item -Support for the special IEEE 754 floating point values ``Not A Number'' -(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf''). -In particular, the format for these values is as specified by the ISO 1999 -C standard, which ignores case and can allow machine-dependent additional -characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}. -@end itemize - -The first problem is that both of these are clear changes to historical -practice: - -@itemize @bullet -@item -The @command{gawk} maintainer feels that supporting hexadecimal floating -point values, in particular, is ugly, and was never intended by the -original designers to be part of the language. - -@item -Allowing completely alphabetic strings to have valid numeric -values is also a very severe departure from historical practice. -@end itemize - -The second problem is that the @code{gawk} maintainer feels that this -interpretation of the standard, which requires a certain amount of -``language lawyering'' to arrive at in the first place, was not even -intended by the standard developers. 
In other words, ``we see how you -got where you are, but we don't think that that's where you want to be.'' - -Recognizing the above issues, but attempting to provide compatibility -with the earlier versions of the standard, -the 2008 POSIX standard added explicit wording to allow, but not require, -that @command{awk} support hexadecimal floating point values and -special values for ``Not A Number'' and infinity. - -Although the @command{gawk} maintainer continues to feel that -providing those features is inadvisable, -nevertheless, on systems that support IEEE floating point, it seems -reasonable to provide @emph{some} way to support NaN and Infinity values. -The solution implemented in @command{gawk} is as follows: - -@itemize @bullet -@item -With the @option{--posix} command-line option, @command{gawk} becomes -``hands off.'' String values are passed directly to the system library's -@code{strtod()} function, and if it successfully returns a numeric value, -that is what's used.@footnote{You asked for it, you got it.} -By definition, the results are not portable across -different systems. They are also a little surprising: - -@example -$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'} -@print{} nan -$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'} -@print{} 3735928559 -@end example - -@item -Without @option{--posix}, @command{gawk} interprets the four strings -@samp{+inf}, -@samp{-inf}, -@samp{+nan}, -and -@samp{-nan} -specially, producing the corresponding special numeric values. -The leading sign acts as a signal to @command{gawk} (and the user) -that the value is really numeric. Hexadecimal floating point is -not supported (unless you also use @option{--non-decimal-data}, -which is @emph{not} recommended).
For example: - -@example -$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'} -@print{} 0 -$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'} -@print{} nan -$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'} -@print{} 0 -@end example - -@command{gawk} does ignore case in the four special values. -Thus @samp{+nan} and @samp{+NaN} are the same. -@end itemize - -@node Integer Programming -@subsection Mixing Integers And Floating-point - -As has been mentioned already, @command{gawk} ordinarily uses hardware double -precision with 64-bit IEEE binary floating-point representation -for numbers on most systems. A large integer like 9007199254740997 -has a binary representation that, although finite, is more than 53 bits long; -it must also be rounded to 53 bits. -The biggest integer that can be stored in a C @code{double} is usually the same -as the largest possible value of a @code{double}. If your system @code{double} -is an IEEE 64-bit @code{double}, this largest possible value is an integer and -can be represented precisely. What more should one know about integers? - -If you want to know what is the largest integer, such that it and -all smaller integers can be stored in 64-bit doubles without losing precision, -then the answer is -@iftex -@math{2^{53}}. -@end iftex -@ifnottex -2^53. -@end ifnottex -The next representable number is the even number -@iftex -@math{2^{53} + 2}, -@end iftex -@ifnottex -2^53 + 2, -@end ifnottex -meaning it is unlikely that you will be able to make -@command{gawk} print -@iftex -@math{2^{53} + 1} -@end iftex -@ifnottex -2^53 + 1 -@end ifnottex -in integer format. -The range of integers exactly representable by a 64-bit double -is -@iftex -@math{[-2^{53}, 2^{53}]}. -@end iftex -@ifnottex -[@minus{}2^53, 2^53]. -@end ifnottex -If you ever see an integer outside this range in @command{gawk} -using 64-bit doubles, you have reason to be very suspicious about -the accuracy of the output. 
Here is a simple program with erroneous output: - -@example -$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'} -@print{} 9007199254740991 -@print{} 9007199254740992 -@print{} 9007199254740992 -@print{} 9007199254740994 -@end example - -The lesson is to not assume that any large integer printed by @command{gawk} -represents an exact result from your computation, especially if it wraps -around on your screen. - -@node Floating-point Programming -@section Understanding Floating-point Programming - -Numerical programming is an extensive area; if you need to develop -sophisticated numerical algorithms, then @command{gawk} may not be -the ideal tool, and this documentation may not be sufficient. -@c FIXME: JOHN: Do you want to cite some actual books? -It might require digesting a book or two to really internalize how to compute -with ideal accuracy and precision, -and the result often depends on the particular application. - -@quotation NOTE -A floating-point calculation's @dfn{accuracy} is how close it comes -to the real value. This is as opposed to the @dfn{precision}, which -usually refers to the number of bits used to represent the number -(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision, -the Wikipedia article} for more information). -@end quotation - -There are two options for doing floating-point calculations: -hardware floating-point (as used by standard @command{awk} and -the default for @command{gawk}), and @dfn{arbitrary-precision} -floating-point, which is software based. This @value{CHAPTER} -aims to provide enough information to understand both, and then -will focus on @command{gawk}'s facilities for the latter.@footnote{If you -are interested in other tools that perform arbitrary precision arithmetic, -you may want to investigate the POSIX @command{bc} tool.
See -@uref{http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html, -the POSIX specification for it}, for more information.} - -Binary floating-point representations and arithmetic are inexact. -Simple values like 0.1 cannot be precisely represented using -binary floating-point numbers, and the limited precision of -floating-point numbers means that slight changes in -the order of operations or the precision of intermediate storage -can change the result. To make matters worse, with arbitrary precision -floating-point, you can set the precision before starting a computation, -but then you cannot be sure of the number of significant decimal places -in the final result. - -Sometimes, before you start to write any code, you should think more -about what you really want and what's really happening. Consider the -two numbers in the following example: - -@example -x = 0.875 # 1/2 + 1/4 + 1/8 -y = 0.425 -@end example - -Unlike the number in @code{y}, the number stored in @code{x} -is exactly representable -in binary since it can be written as a finite sum of one or -more fractions whose denominators are all powers of two. -When @command{gawk} reads a floating-point number from -program source, it automatically rounds that number to whatever -precision your machine supports. If you try to print the numeric -content of a variable using an output format string of @code{"%.17g"}, -it may not produce the same number as you assigned to it: - -@example -$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425} -> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'} -@print{} 0.875, 0.42499999999999999 -@end example - -Often the error is so small you do not even notice it, and if you do, -you can always specify how much precision you would like in your output. -Usually this is a format string like @code{"%.15g"}, which when -used in the previous example, produces an output identical to the input. 
- -Because the underlying representation can be a little bit off from the exact value, -comparing floating-point values to see if they are equal is generally not a good idea. -Here is an example where it does not work as you expect: - -@example -$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} -@print{} 0 -@end example - -The loss of accuracy during a single computation with floating-point numbers -usually isn't enough to worry about. However, if you compute a value -which is the result of a sequence of floating point operations, -the error can accumulate and greatly affect the computation itself. -Here is an attempt to compute the value of the constant -@value{PI} using one of its many series representations: - -@example -BEGIN @{ - x = 1.0 / sqrt(3.0) - n = 6 - for (i = 1; i < 30; i++) @{ - n = n * 2.0 - x = (sqrt(x * x + 1) - 1) / x - printf("%.15f\n", n * x) - @} -@} -@end example - -When run, the early errors propagating through later computations -cause the loop to terminate prematurely after an attempt to divide by zero. - -@example -$ @kbd{gawk -f pi.awk} -@print{} 3.215390309173475 -@print{} 3.159659942097510 -@print{} 3.146086215131467 -@print{} 3.142714599645573 -@dots{} -@print{} 3.224515243534819 -@print{} 2.791117213058638 -@print{} 0.000000000000000 -@error{} gawk: pi.awk:6: fatal: division by zero attempted -@end example - -Here is one more example where the inaccuracies in internal representations -yield an unexpected result: - -@example -$ @kbd{gawk 'BEGIN @{} -> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} -> @kbd{i++} -> @kbd{print i} -> @kbd{@}'} -@print{} 4 -@end example - -Can computation using arbitrary precision help with the previous examples? -If you are impatient to know, see -@ref{Exact Arithmetic}. - -Instead of arbitrary precision floating-point arithmetic, -often all you need is an adjustment of your logic -or a different order for the operations in your calculation.
-The stability and the accuracy of the computation of the constant @value{PI} -in the previous example can be enhanced by using the following -simple algebraic transformation: - -@example -(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1) -@end example - -@noindent -After making this change, the program does converge to -@value{PI} in under 30 iterations: - -@example -$ @kbd{gawk -f /tmp/pi2.awk} -@print{} 3.215390309173473 -@print{} 3.159659942097501 -@print{} 3.146086215131436 -@print{} 3.142714599645370 -@print{} 3.141873049979825 -@dots{} -@print{} 3.141592653589797 -@print{} 3.141592653589797 -@end example - -There is no need to be unduly suspicious about the results from -floating-point arithmetic. The lesson to remember is that -floating-point arithmetic is always more complex than the arithmetic using -pencil and paper. In order to take advantage of the power -of computer floating-point, you need to know its limitations -and work within them. For most casual use of floating-point arithmetic, -you will often get the expected result in the end if you simply round -the display of your final results to the correct number of significant -decimal digits. Also, avoid presenting numerical data in a manner that -implies better precision than is actually the case. - -@menu -* Floating-point Representation:: Binary floating-point representation. -* Floating-point Context:: Floating-point context. -* Rounding Mode:: Floating-point rounding mode. -@end menu - -@node Floating-point Representation -@subsection Binary Floating-point Representation -@cindex IEEE-754 format - -Although floating-point representations vary from machine to machine, -the most commonly encountered representation is that defined by the -IEEE 754 Standard. An IEEE-754 format value has three components: - -@itemize @bullet -@item -A sign bit telling whether the number is positive or negative. - -@item -An @dfn{exponent} giving its order of magnitude, @var{e}.
- -@item -A @dfn{significand}, @var{s}, -specifying the actual digits of the number. -@end itemize - -The value of the -number is then -@iftex -@math{s @cdot 2^e}. -@end iftex -@ifnottex -@var{s * 2^e}. -@end ifnottex -The first bit of a non-zero binary significand -is always one, so the significand in an IEEE-754 format only includes the -fractional part, leaving the leading one implicit. - -Three of the standard IEEE-754 types are 32-bit single precision, -64-bit double precision and 128-bit quadruple precision. -The standard also specifies extended precision formats -to allow greater precisions and larger exponent ranges. - -The significand is stored in @dfn{normalized} format, -which means that the first bit is always a one. - -@node Floating-point Context -@subsection Floating-point Context -@cindex context, floating-point - -A floating-point @dfn{context} defines the environment for arithmetic operations. -It governs precision, sets rules for rounding, and limits the range for exponents. -The context has the following primary components: - -@table @dfn -@item Precision -Precision of the floating-point format in bits. -@item emax -Maximum exponent allowed for this format. -@item emin -Minimum exponent allowed for this format. -@item Underflow behavior -The format may or may not support gradual underflow. -@item Rounding -The rounding mode of this context. 
-@end table - -@ref{table-ieee-formats} lists the precision and exponent -field values for the basic IEEE-754 binary formats: - -@float Table,table-ieee-formats -@caption{Basic IEEE Format Context Values} -@multitable @columnfractions .20 .20 .20 .20 .20 -@headitem Name @tab Total bits @tab Precision @tab emin @tab emax -@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127 -@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023 -@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383 -@end multitable -@end float - -@quotation NOTE -The precision numbers include the implied leading one that gives them -one extra bit of significand. -@end quotation - -A floating-point context can also determine which signals are treated -as exceptions, and can set rules for arithmetic with special values. -Please consult the IEEE-754 standard or other resources for details. - -@command{gawk} ordinarily uses the hardware double precision -representation for numbers. On most systems, this is IEEE-754 -floating-point format, corresponding to 64-bit binary with 53 bits -of precision. - -@quotation NOTE -In case an underflow occurs, the standard allows, but does not require, -the result from an arithmetic operation to be a number smaller than -the smallest nonzero normalized number. Such numbers do -not have as many significant digits as normal numbers, and are called -@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero, -is called @dfn{flush to zero}. The basic IEEE-754 binary formats -support subnormal numbers. -@end quotation - -@node Rounding Mode -@subsection Floating-point Rounding Mode -@cindex rounding mode, floating-point - -The @dfn{rounding mode} specifies the behavior for the results of numerical -operations when discarding extra precision. Each rounding mode indicates -how the least significant returned digit of a rounded result is to -be calculated. 
-@ref{table-rounding-modes} lists the IEEE-754 defined -rounding modes: - -@float Table,table-rounding-modes -@caption{IEEE 754 Rounding Modes} -@multitable @columnfractions .45 .55 -@headitem Rounding Mode @tab IEEE Name -@item Round to nearest, ties to even @tab @code{roundTiesToEven} -@item Round toward plus Infinity @tab @code{roundTowardPositive} -@item Round toward negative Infinity @tab @code{roundTowardNegative} -@item Round toward zero @tab @code{roundTowardZero} -@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} -@end multitable -@end float - -The default mode @code{roundTiesToEven} is the most preferred, -but the least intuitive. This method does the obvious thing for most values, -by rounding them up or down to the nearest digit. -For example, rounding 1.132 to two digits yields 1.13, -and rounding 1.157 yields 1.16. - -However, when it comes to rounding a value that is exactly halfway between, -things do not work the way you probably learned in school. -In this case, the number is rounded to the nearest even digit. -So rounding 0.125 to two digits rounds down to 0.12, -but rounding 0.6875 to three digits rounds up to 0.688. -You probably have already encountered this rounding mode when -using the @code{printf} routine to format floating-point numbers. 
-For example: - -@example -BEGIN @{ - x = -4.5 - for (i = 1; i < 10; i++) @{ - x += 1.0 - printf("%4.1f => %2.0f\n", x, x) - @} -@} -@end example - -@noindent -produces the following output when run:@footnote{It -is possible for the output to be completely different if the -C library in your system does not use the IEEE-754 even-rounding -rule to round halfway cases for @code{printf()}.} - -@example --3.5 => -4 --2.5 => -2 --1.5 => -2 --0.5 => 0 - 0.5 => 0 - 1.5 => 2 - 2.5 => 2 - 3.5 => 4 - 4.5 => 4 -@end example - -The theory behind the rounding mode @code{roundTiesToEven} is that -it more or less evenly distributes upward and downward rounds -of exact halves, which might cause the round-off error -to cancel itself out. This is the default rounding mode used -in IEEE-754 computing functions and operators. - -The other rounding modes are rarely used. -Round toward positive infinity (@code{roundTowardPositive}) -and round toward negative infinity (@code{roundTowardNegative}) -are often used to implement interval arithmetic, -where you adjust the rounding mode to calculate upper and lower bounds -for the range of output. The @code{roundTowardZero} -mode can be used for converting floating-point numbers to integers. -The rounding mode @code{roundTiesToAway} rounds the result to the -nearest number and selects the number with the larger magnitude -if a tie occurs. - -Some numerical analysts will tell you that your choice of rounding style -has tremendous impact on the final outcome, and advise you to wait until -final output for any rounding. Instead, you can often avoid round-off error problems by -setting the precision initially to some value sufficiently larger than -the final desired precision, so that the accumulation of round-off error -does not influence the outcome. 
-If you suspect that results from your computation are -sensitive to accumulation of round-off error, -one way to be sure is to look for a significant difference in output -when you change the rounding mode. - -@node Gawk and MPFR -@section @command{gawk} + MPFR = Powerful Arithmetic - -The rest of this @value{CHAPTER} describes how to use the arbitrary precision -(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric -capabilities in @command{gawk} to produce maximally accurate results -when you need them. - -But first you should check whether your version of -@command{gawk} supports arbitrary precision arithmetic. -The easiest way to find out is to look at the output of -the following command: - -@example -$ @kbd{gawk --version} -@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3) -@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation. -@dots{} -@end example - -@command{gawk} uses the -@uref{http://www.mpfr.org, GNU MPFR} -and -@uref{http://gmplib.org, GNU MP} (GMP) -libraries for arbitrary precision -arithmetic on numbers. So if you do not see the names of these libraries -in the output, then your version of @command{gawk} does not support -arbitrary precision arithmetic. - -Additionally, -there are a few elements available in the @code{PROCINFO} array -to provide information about the MPFR and GMP libraries. -@xref{Auto-set}, for more information. - -@ignore -Even if you aren't interested in arbitrary precision arithmetic, you -may still benefit from knowing about how @command{gawk} handles numbers -in general, and the limitations of doing arithmetic with ordinary -@command{gawk} numbers. -@end ignore - - -@node Arbitrary Precision Floats -@section Arbitrary Precision Floating-point Arithmetic with @command{gawk} - -@command{gawk} uses the GNU MPFR library -for arbitrary precision floating-point arithmetic.
The MPFR library -provides precise control over precisions and rounding modes, and gives -correctly rounded reproducible platform-independent results. With the -command-line option @option{--bignum} or @option{-M}, -all floating-point arithmetic operators and numeric functions can yield -results to any desired precision level supported by MPFR. -Two built-in -variables @code{PREC} -(@pxref{Setting Precision}) -and @code{ROUNDMODE} -(@pxref{Setting Rounding Mode}) -provide control over the working precision and the rounding mode. -The precision and the rounding mode are set globally for every operation -to follow. - -The default working precision for arbitrary precision floating-point values is 53, -and the default value for @code{ROUNDMODE} is @code{"N"}, -which selects the IEEE-754 -@code{roundTiesToEven} (@pxref{Rounding Mode}) rounding mode.@footnote{The -default precision is 53, since according to the MPFR documentation, -the library should be able to exactly reproduce all computations with -double-precision machine floating-point numbers (@code{double} type -in C), except the default exponent range is much wider and subnormal -numbers are not implemented.} -@command{gawk} uses the default exponent range in MPFR -@iftex -(@math{emax = 2^{30} - 1, emin = -emax}) -@end iftex -@ifnottex -(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax}) -@end ifnottex -for all floating-point contexts. -There is no explicit mechanism to adjust the exponent range. -MPFR does not implement subnormal numbers by default, -and this behavior cannot be changed in @command{gawk}. - -@quotation NOTE -When emulating an IEEE-754 format (@pxref{Setting Precision}), -@command{gawk} internally adjusts the exponent range -to the value defined for the format and also performs computations needed for -gradual underflow (subnormal numbers). 
-@end quotation - -@quotation NOTE -MPFR numbers are variable-size entities, consuming only as much space as -needed to store the significant digits. Since the performance using MPFR -numbers pales in comparison to doing arithmetic using the underlying machine -types, you should consider using only as much precision as needed by -your program. -@end quotation - -@menu -* Setting Precision:: Setting the working precision. -* Setting Rounding Mode:: Setting the rounding mode. -* Floating-point Constants:: Representing floating-point constants. -* Changing Precision:: Changing the precision of a number. -* Exact Arithmetic:: Exact arithmetic with floating-point numbers. -@end menu - -@node Setting Precision -@subsection Setting the Working Precision -@cindex @code{PREC} variable - -@command{gawk} uses a global working precision; it does not keep track of -the precision or accuracy of individual numbers. Performing an arithmetic -operation or calling a built-in function rounds the result to the current -working precision. The default working precision is 53 which can be -modified using the built-in variable @code{PREC}. You can also set the -value to one of the following pre-defined case-insensitive strings -to emulate an IEEE-754 binary format: - -@multitable {@code{"double"}} {12345678901234567890123456789012345} -@headitem @code{PREC} @tab IEEE-754 Binary Format -@item @code{"half"} @tab 16-bit half-precision. -@item @code{"single"} @tab Basic 32-bit single precision. -@item @code{"double"} @tab Basic 64-bit double precision. -@item @code{"quad"} @tab Basic 128-bit quadruple precision. -@item @code{"oct"} @tab 256-bit octuple precision. 
-@end multitable - -The following example illustrates the effects of changing precision -on arithmetic operations: - -@example -$ @kbd{gawk -M -vPREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \} -> @kbd{PREC = "double"; print x + 0 @}'} -@print{} 1e-400 -@print{} 0 -@end example - -Binary and decimal precisions are related approximately according to the -formula: - -@iftex -@math{prec = 3.322 @cdot dps} -@end iftex -@ifnottex -@var{prec} = 3.322 * @var{dps} -@end ifnottex - -@noindent -Here, @var{prec} denotes the binary precision -(measured in bits) and @var{dps} (short for decimal places) -is the number of decimal digits. We can easily calculate how many decimal -digits the 53-bit significand of an IEEE double is equivalent to: -53 / 3.322, which is equal to about 15.95. -But what does 15.95 digits actually mean? It depends on whether you are -concerned about how many digits you can rely on, or how many digits -you need. - -It is important to know how many decimal digits it takes to uniquely identify -a double-precision value (the C type @code{double}). If you want to -convert from @code{double} to decimal and back to @code{double} (e.g., -saving a @code{double} representing an intermediate result to a file, and -later reading it back to restart the computation), then a few more decimal -digits are required. 17 digits is generally enough for a @code{double}. - -It can also be important to know what decimal numbers can be uniquely -represented with a @code{double}. If you want to convert -from decimal to @code{double} and back again, 15 digits is the most that -you can get. Stated differently, you should not present -the numbers from your floating-point computations with more than 15 -significant digits in them. - -Conversely, it takes a precision of 332 bits to hold an approximation -of the constant @value{PI} that is accurate to 100 decimal places.
-You should always add some extra bits in order to avoid the confusing round-off -issues that occur because numbers are stored internally in binary. - -@node Setting Rounding Mode -@subsection Setting the Rounding Mode -@cindex @code{ROUNDMODE} variable - -The @code{ROUNDMODE} variable provides -program-level control over the rounding mode. -The correspondence between @code{ROUNDMODE} and the IEEE -rounding modes is shown in @ref{table-gawk-rounding-modes}. - -@float Table,table-gawk-rounding-modes -@caption{@command{gawk} Rounding Modes} -@multitable @columnfractions .45 .30 .25 -@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE} -@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"} -@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"} -@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"} -@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"} -@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"} -@end multitable -@end float - -@code{ROUNDMODE} has the default value @code{"N"}, -which selects the IEEE-754 rounding mode @code{roundTiesToEven}. -The value @code{"A"}, which selects the IEEE-754 mode -@code{roundTiesToAway}, -takes effect only if your version of the MPFR library supports it; otherwise setting -@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode}, -for the meanings of the various rounding modes. - -Here is an example of how to change the default rounding behavior of -@code{printf}'s output: - -@example -$ @kbd{gawk -M -vROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'} -@print{} 1.37 -@end example - -@node Floating-point Constants -@subsection Representing Floating-point Constants -@cindex constants, floating-point - -Be wary of floating-point constants!
When reading a floating-point constant -from program source code, @command{gawk} uses the default precision, -unless overridden -by an assignment to the special variable @code{PREC} on the command -line, to store it internally as an MPFR number. -Changing the precision using @code{PREC} in the program text does -not change the precision of a constant. If you need to -represent a floating-point constant at a higher precision than the -default and cannot use a command-line assignment to @code{PREC}, -you should either specify the constant as a string, or -as a rational number whenever possible. The following example -illustrates the differences among various ways to -print a floating-point constant: - -@example -$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'} -@print{} 0.1000000000000000055511151 -$ @kbd{gawk -M -vPREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'} -@print{} 0.1000000000000000000000000 -$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'} -@print{} 0.1000000000000000000000000 -$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'} -@print{} 0.1000000000000000000000000 -@end example - -In the first case, the constant @code{0.1} is stored with the default precision of 53, -because it is read before the assignment to @code{PREC} is executed. - -@node Changing Precision -@subsection Changing the Precision of a Number - -@cindex Laurie, Dirk -@quotation -@i{The point is that in any variable-precision package, -a decision is made on how to treat numbers given as data, -or arising in intermediate results, which are represented in -floating-point format to a precision lower than working precision. -Do we promote them to full membership of the high-precision club, -or do we treat them and all their associates as second-class citizens? -Sometimes the first course is proper, sometimes the second, and it takes -careful analysis to tell which.} - -Dirk Laurie@footnote{Dirk Laurie. -@cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}.
-Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.} -@end quotation - -@command{gawk} does not implicitly modify the precision of any previously -computed results when the working precision is changed with an assignment -to @code{PREC}. The precision of a number is always the one that was -used at the time of its creation, and there is no way for the user -to explicitly change it afterwards. However, since the result of a -floating-point arithmetic operation is always an arbitrary precision -floating-point value---with a precision set by the value of @code{PREC}---one of the -following workarounds effectively accomplishes the desired behavior: - -@example -x = x + 0.0 -@end example - -@noindent -or: - -@example -x += 0.0 -@end example - -@node Exact Arithmetic -@subsection Exact Arithmetic with Floating-point Numbers - -@quotation CAUTION -Never depend on the exactness of floating-point arithmetic, -even for apparently simple expressions! -@end quotation - -Can arbitrary precision arithmetic give exact results? There are -no easy answers. The standard rules of algebra often do not apply -when using floating-point arithmetic. -Among other things, the distributive and associative laws -do not hold completely, and order of operation may be important -for your computation. Rounding error, cumulative precision loss -and underflow are often troublesome. - -When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3} -for equality -using the machine double precision arithmetic, it decides that they -are not equal! -(@xref{Floating-point Programming}.) -You can get the result you want by increasing the precision; -56 in this case will get the job done: - -@example -$ @kbd{gawk -M -vPREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} -@print{} 1 -@end example - -If adding more bits is good, perhaps adding even more bits of -precision is better? 
-Here is what happens if we use an even larger value of @code{PREC}: - -@example -$ @kbd{gawk -M -vPREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} -@print{} 0 -@end example - -This is not a bug in @command{gawk} or in the MPFR library. -It is easy to forget that the finite number of bits used to store the value -is often just an approximation after proper rounding. -The test for equality succeeds if and only if @emph{all} bits in the two operands -are exactly the same. Since this is not necessarily true after floating-point -computations with a particular precision and effective rounding rule, -a straight test for equality may not work. - -So, don't assume that floating-point values can be compared for equality. -You should also exercise caution when using other forms of comparisons. -The standard way to compare between floating-point numbers is to determine -how much error (or @dfn{tolerance}) you will allow in a comparison and -check to see if one value is within this error range of the other. - -In applications where 15 or fewer decimal places suffice, -hardware double precision arithmetic can be adequate, and is usually much faster. -But you do need to keep in mind that every floating-point operation -can suffer a new rounding error with catastrophic consequences as illustrated -by our attempt to compute the value of the constant @value{PI} -(@pxref{Floating-point Programming}). -Extra precision can greatly enhance the stability and the accuracy -of your computation in such cases. - -Repeated addition is not necessarily equivalent to multiplication -in floating-point arithmetic. In the example in -@ref{Floating-point Programming}: - -@example -$ @kbd{gawk 'BEGIN @{} -> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} -> @kbd{i++} -> @kbd{print i} -> @kbd{@}'} -@print{} 4 -@end example - -@noindent -you may or may not succeed in getting the correct result by choosing -an arbitrarily large value for @code{PREC}. 
 Reformulation of -the problem at hand is often the correct approach in such situations. - -@node Arbitrary Precision Integers -@section Arbitrary Precision Integer Arithmetic with @command{gawk} -@cindex integer, arbitrary precision - -If the option @option{--bignum} or @option{-M} is specified, -@command{gawk} performs all -integer arithmetic using GMP arbitrary precision integers. -Any number that looks like an integer in program source or in a data file -is stored as an arbitrary precision integer. -The size of the integer is limited only by your computer's memory. -The current floating-point context has no effect on operations involving integers. -For example, the following computes -@iftex -@math{5^{4^{3^{2}}}}, -@end iftex -@ifnottex -5^4^3^2, -@end ifnottex -the result of which is beyond the -limits of ordinary @command{gawk} numbers: - -@example -$ @kbd{gawk -M 'BEGIN @{} -> @kbd{x = 5^4^3^2} -> @kbd{print "# of digits =", length(x)} -> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)} -> @kbd{@}'} -@print{} # of digits = 183231 -@print{} 62060698786608744707 ... 92256259918212890625 -@end example - -If you were to compute the same value using arbitrary precision -floating-point values instead, the precision needed for correct output -(using the formula -@iftex -@math{prec = 3.322 @cdot dps}), -would be @math{3.322 @cdot 183231}, -@end iftex -@ifnottex -@samp{prec = 3.322 * dps}), -would be 3.322 x 183231, -@end ifnottex -or 608693 bits. -(That is, about 3.3 bits of precision are needed for each -decimal digit of the result.) - -The result from an arithmetic operation with an integer and a floating-point value -is a floating-point value with a precision equal to the working precision. -The following program calculates the eighth term in -Sylvester's sequence@footnote{Weisstein, Eric W. -@cite{Sylvester's Sequence}. From MathWorld---A Wolfram Web Resource.
-@url{http://mathworld.wolfram.com/SylvestersSequence.html}} -using a recurrence: - -@example -$ @kbd{gawk -M 'BEGIN @{} -> @kbd{s = 2.0} -> @kbd{for (i = 1; i <= 7; i++)} -> @kbd{s = s * (s - 1) + 1} -> @kbd{print s} -> @kbd{@}'} -@print{} 113423713055421845118910464 -@end example - -The output differs from the actual number, 113423713055421844361000443, -because the default precision of 53 is not enough to represent the -floating-point results exactly. To get the correct output, you can either increase the precision -(100 is enough in this case), or replace the floating-point constant -@samp{2.0} with an integer, so that all computations use integer -arithmetic. - -It will sometimes be necessary for @command{gawk} to implicitly convert an -arbitrary precision integer into an arbitrary precision floating-point value. -This is primarily because the MPFR library does not always provide the -relevant interface to process arbitrary precision integers or mixed-mode -numbers as needed by an operation or function. -In such a case, the precision is set to the minimum value necessary -for exact conversion, and the working precision is not used for this purpose.
-If this is not what you need or want, you can employ a subterfuge -like this: - -@example -gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}' -@end example - -You can avoid this issue altogether by specifying the number as a floating-point value -to begin with: - -@example -gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}' -@end example - -Note that for the particular example above, it is likely best -to just use the following: - -@example -gawk -M 'BEGIN @{ n = 13; print n % 2 @}' -@end example - @node Advanced Features @chapter Advanced Features of @command{gawk} @cindex advanced features, network connections, See Also networks, connections @@ -27939,6 +26723,1229 @@ The @command{gawk} debugger only accepts source supplied with the @option{-f} op Look forward to a future release when these and other missing features may be added, and of course feel free to try to add them yourself! +@node Arbitrary Precision Arithmetic +@chapter Arithmetic and Arbitrary Precision Arithmetic with @command{gawk} +@cindex arbitrary precision +@cindex multiple precision +@cindex infinite precision +@cindex floating-point numbers, arbitrary precision +@cindex MPFR +@cindex GMP + +@cindex Knuth, Donald +@quotation +@i{There's a credibility gap: We don't know how much of the computer's answers +to believe. Novice computer users solve this problem by implicitly trusting +in the computer as an infallible authority; they tend to believe that all +digits of a printed answer are significant. Disillusioned computer users have +just the opposite approach; they are constantly afraid that their answers +are almost meaningless.}@* +Donald Knuth@footnote{Donald E.@: Knuth. +@cite{The Art of Computer Programming}. Volume 2, +@cite{Seminumerical Algorithms}, third edition, +1998, ISBN 0-201-89683-4, p.@: 229.} +@end quotation + +This @value{CHAPTER} discusses issues that you may encounter +when performing arithmetic.
 It begins by discussing some of +the general attributes of computer arithmetic, along with how +this can influence what you see when running @command{awk} programs. +This discussion applies to all versions of @command{awk}. + +Then the discussion moves on to @dfn{arbitrary precision +arithmetic}, a feature which is specific to @command{gawk}. + +@menu +* General Arithmetic:: An introduction to computer arithmetic. +* Floating-point Programming:: Effective Floating-point Programming. +* Gawk and MPFR:: How @command{gawk} provides + arbitrary-precision arithmetic. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic + with @command{gawk}. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with + @command{gawk}. +@end menu + +@node General Arithmetic +@section A General Description of Computer Arithmetic + +@cindex integers +@cindex floating-point, numbers +@cindex numbers, floating-point +Within computers, there are two kinds of numeric values: @dfn{integers} +and @dfn{floating-point}. +In school, integer values were referred to as ``whole'' numbers---that is, +numbers without any fractional part, such as 1, 42, or @minus{}17. +The advantage of integer numbers is that they represent values exactly. +The disadvantage is that their range is limited. On most systems, +this range is @minus{}2,147,483,648 to 2,147,483,647. +However, many systems now support a range from +@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. + +@cindex unsigned integers +@cindex integers, unsigned +Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}. +Signed values may be negative or positive, with the range of values just +described. +Unsigned values are never negative. On most systems, +the range is from 0 to 4,294,967,295. +However, many systems now support a range from +0 to 18,446,744,073,709,551,615.
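As a rough illustration of these limits (a minimal sketch, assuming a POSIX @command{awk} on a system whose numbers are IEEE-754 doubles, as discussed in the following sections): @command{awk} has no separate integer type, so the 32-bit limits above survive exactly as @command{awk} numbers, while the 64-bit limits do not. The shell function name @code{show_int_limits} is hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Sketch: awk stores every number as a double, so it can hold the 32-bit
# integer limits exactly but must round the 64-bit ones.
# show_int_limits is a hypothetical helper name used only for illustration.
show_int_limits() {
    awk 'BEGIN {
        # Signed and unsigned 32-bit limits: exactly representable.
        printf "%.0f %.0f\n", -2^31, 2^31 - 1
        printf "%.0f\n", 2^32 - 1
        # Signed 64-bit limits: 2^63 - 1 rounds up to 2^63.
        printf "%.0f %.0f\n", -2^63, 2^63 - 1
        # Unsigned 64-bit limit: 2^64 - 1 rounds up to 2^64.
        printf "%.0f\n", 2^64 - 1
    }'
}

show_int_limits
```

On such a system the last two lines print 9223372036854775808 for 2^63 - 1 and 18446744073709551616 for 2^64 - 1: the subtracted 1 is lost to rounding, the same effect that @ref{Integer Programming} examines in more detail.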
+ +@cindex double precision floating-point +@cindex single precision floating-point +Floating-point numbers represent what are called ``real'' numbers; i.e., +those that do have a fractional part, such as 3.1415927. +The advantage of floating-point numbers is that they +can represent a much larger range of values. +The disadvantage is that there are numbers that they cannot represent +exactly. +@command{awk} uses @dfn{double precision} floating-point numbers, which +can hold more digits than @dfn{single precision} +floating-point numbers. +@c Floating-point issues are discussed more fully in +@c @ref{Floating Point Issues}. + +There are several important issues to be aware of, described next. + +@menu +* Floating Point Issues:: Stuff to know about floating-point numbers. +* Integer Programming:: Effective integer programming. +@end menu + +@node Floating Point Issues +@subsection Floating-Point Number Caveats + +As mentioned earlier, floating-point numbers represent what are called +``real'' numbers, i.e., those that have a fractional part. @command{awk} +uses double precision floating-point numbers to represent all +numeric values. This @value{SECTION} describes some of the issues +involved in using floating-point numbers. + +There is a very nice +@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic} +by David Goldberg, +``What Every Computer Scientist Should Know About Floating-point Arithmetic,'' +@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48. +This is worth reading if you are interested in the details, +but it does require a background in computer science. + +@menu +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not Abstract + Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+@end menu + +@node String Conversion Precision +@subsubsection The String Value Can Lie + +Internally, @command{awk} keeps both the numeric value +(double precision floating-point) and the string value for a variable. +Separately, @command{awk} keeps +track of what type the variable has +(@pxref{Typing and Comparison}), +which plays a role in how variables are used in comparisons. + +It is important to note that the string value for a number may not +reflect the full value (all the digits) that the numeric value +actually contains. +The following program (@file{values.awk}) illustrates this: + +@example +@{ + sum = $1 + $2 + # see it for what it is + printf("sum = %.12g\n", sum) + # use CONVFMT + a = "<" sum ">" + print "a =", a + # use OFMT + print "sum =", sum +@} +@end example + +@noindent +This program shows the full value of the sum of @code{$1} and @code{$2} +using @code{printf}, and then prints the string values obtained +from both automatic conversion (via @code{CONVFMT}) and +from printing (via @code{OFMT}). + +Here is what happens when the program is run: + +@example +$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk} +@print{} sum = 4.8888888 +@print{} a = <4.88889> +@print{} sum = 4.88889 +@end example + +This makes it clear that the full numeric value is different from +what the default string representations show. + +@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with +at least six significant digits. For some applications, you might want to +change it to specify more precision. 
+On most modern machines, most of the time, +17 digits is enough to capture a floating-point number's +value exactly.@footnote{Pathological cases can require up to +752 digits (!), but we doubt that you need to worry about this.} + +@node Unexpected Results +@subsubsection Floating Point Numbers Are Not Abstract Numbers + +@cindex floating-point, numbers +Unlike numbers in the abstract sense (such as what you studied in high school +or college arithmetic), numbers stored in computers are limited in certain ways. +They cannot represent an infinite number of digits, nor can they always +represent things exactly. +In particular, +floating-point numbers cannot +always represent values exactly. Here is an example: + +@example +$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'} +515.79 +@print{} 0000051579 +515.80 +@print{} 0000051579 +515.81 +@print{} 0000051580 +515.82 +@print{} 0000051582 +@kbd{@value{CTL}-d} +@end example + +@noindent +This shows that some values can be represented exactly, +whereas others are only approximated. This is not a ``bug'' +in @command{awk}, but simply an artifact of how computers +represent numbers. + +@quotation NOTE +It cannot be emphasized enough that the behavior just +described is fundamental to modern computers. You will +see this kind of thing happen in @emph{any} programming +language using hardware floating-point numbers. It is @emph{not} +a bug in @command{gawk}, nor is it something that can be ``just +fixed.'' +@end quotation + +@cindex negative zero +@cindex positive zero +@cindex zero@comma{} negative vs.@: positive +Another peculiarity of floating-point numbers on modern systems +is that they often have more than one representation for the number zero! +In particular, it is possible to represent ``minus zero'' as well as +regular, or ``positive'' zero. 
+ +This example shows that negative and positive zero are distinct values +when stored internally, but that they are in fact equal to each other, +as well as to ``regular'' zero: + +@example +$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0} +> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz} +> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0} +> @kbd{@}'} +@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1 +@print{} mz == 0 -> 1, pz == 0 -> 1 +@end example + +It helps to keep this in mind should you process numeric data +that contains negative zero values; the fact that the zero is negative +is noted and can affect comparisons. + +@node POSIX Floating Point Problems +@subsubsection Standards Versus Existing Practice + +Historically, @command{awk} has converted any non-numeric looking string +to the numeric value zero, when required. Furthermore, the original +definition of the language and the original POSIX standards specified that +@command{awk} only understands decimal numbers (base 10), and not octal +(base 8) or hexadecimal numbers (base 16). + +Changes in the language of the +2001 and 2004 POSIX standards can be interpreted to imply that @command{awk} +should support additional features. These features are: + +@itemize @bullet +@item +Interpretation of floating point data values specified in hexadecimal +notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not} +source code constants.) + +@item +Support for the special IEEE 754 floating point values ``Not A Number'' +(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf''). +In particular, the format for these values is as specified by the ISO 1999 +C standard, which ignores case and can allow machine-dependent additional +characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}. 
+@end itemize + +The first problem is that both of these are clear changes to historical +practice: + +@itemize @bullet +@item +The @command{gawk} maintainer feels that supporting hexadecimal floating +point values, in particular, is ugly, and was never intended by the +original designers to be part of the language. + +@item +Allowing completely alphabetic strings to have valid numeric +values is also a very severe departure from historical practice. +@end itemize + +The second problem is that the @code{gawk} maintainer feels that this +interpretation of the standard, which requires a certain amount of +``language lawyering'' to arrive at in the first place, was not even +intended by the standard developers. In other words, ``we see how you +got where you are, but we don't think that that's where you want to be.'' + +Recognizing the above issues, but attempting to provide compatibility +with the earlier versions of the standard, +the 2008 POSIX standard added explicit wording to allow, but not require, +that @command{awk} support hexadecimal floating point values and +special values for ``Not A Number'' and infinity. + +Although the @command{gawk} maintainer continues to feel that +providing those features is inadvisable, +nevertheless, on systems that support IEEE floating point, it seems +reasonable to provide @emph{some} way to support NaN and Infinity values. +The solution implemented in @command{gawk} is as follows: + +@itemize @bullet +@item +With the @option{--posix} command-line option, @command{gawk} becomes +``hands off.'' String values are passed directly to the system library's +@code{strtod()} function, and if it successfully returns a numeric value, +that is what's used.@footnote{You asked for it, you got it.} +By definition, the results are not portable across +different systems. 
 They are also a little surprising: + +@example +$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'} +@print{} nan +$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'} +@print{} 3735928559 +@end example + +@item +Without @option{--posix}, @command{gawk} interprets the four strings +@samp{+inf}, +@samp{-inf}, +@samp{+nan}, +and +@samp{-nan} +specially, producing the corresponding special numeric values. +The leading sign acts as a signal to @command{gawk} (and the user) +that the value is really numeric. Hexadecimal floating point is +not supported (unless you also use @option{--non-decimal-data}, +which is @emph{not} recommended). For example: + +@example +$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'} +@print{} 0 +$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'} +@print{} nan +$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'} +@print{} 0 +@end example + +@command{gawk} does ignore case in the four special values. +Thus @samp{+nan} and @samp{+NaN} are the same. +@end itemize + +@node Integer Programming +@subsection Mixing Integers And Floating-point + +As has been mentioned already, @command{gawk} ordinarily uses hardware double +precision with 64-bit IEEE binary floating-point representation +for numbers on most systems. A large integer like 9007199254740997 +has a binary representation that, although finite, is more than 53 bits long; +so it must be rounded to 53 bits. +The biggest integer that can be stored in a C @code{double} is usually the same +as the largest possible value of a @code{double}. If your system @code{double} +is an IEEE 64-bit @code{double}, this largest possible value is an integer and +can be represented precisely. What more should one know about integers? + +If you want to know the largest integer such that it and +all smaller integers can be stored in 64-bit doubles without losing precision, +then the answer is +@iftex +@math{2^{53}}. +@end iftex +@ifnottex +2^53.
+@end ifnottex +The next representable number is the even number +@iftex +@math{2^{53} + 2}, +@end iftex +@ifnottex +2^53 + 2, +@end ifnottex +meaning that you will not be able to make +@command{gawk} print +@iftex +@math{2^{53} + 1} +@end iftex +@ifnottex +2^53 + 1 +@end ifnottex +in integer format. +The range of integers exactly representable by a 64-bit double +is +@iftex +@math{[-2^{53}, 2^{53}]}. +@end iftex +@ifnottex +[@minus{}2^53, 2^53]. +@end ifnottex +If you ever see an integer outside this range in @command{gawk} +using 64-bit doubles, you have reason to be very suspicious about +the accuracy of the output. Here is a simple program with erroneous output: + +@example +$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'} +@print{} 9007199254740991 +@print{} 9007199254740992 +@print{} 9007199254740992 +@print{} 9007199254740994 +@end example + +The lesson is not to assume that any large integer printed by @command{gawk} +represents an exact result from your computation, especially if it wraps +around on your screen. + +@node Floating-point Programming +@section Understanding Floating-point Programming + +Numerical programming is an extensive area; if you need to develop +sophisticated numerical algorithms, then @command{gawk} may not be +the ideal tool, and this documentation may not be sufficient. +@c FIXME: JOHN: Do you want to cite some actual books? +It might require digesting a book or two to really internalize how to compute +with ideal accuracy and precision, +and the result often depends on the particular application. + +@quotation NOTE +A floating-point calculation's @dfn{accuracy} is how close it comes +to the real value. This is as opposed to the @dfn{precision}, which +usually refers to the number of bits used to represent the number +(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision, +the Wikipedia article} for more information).
+@end quotation + +There are two options for doing floating-point calculations: +hardware floating-point (as used by standard @command{awk} and +the default for @command{gawk}), and @dfn{arbitrary-precision} +floating-point, which is software based. This @value{CHAPTER} +aims to provide enough information to understand both, and then +will focus on @command{gawk}'s facilities for the latter.@footnote{If you +are interested in other tools that perform arbitrary precision arithmetic, +you may want to investigate the POSIX @command{bc} tool. See +@uref{http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html, +the POSIX specification for it}, for more information.} + +Binary floating-point representations and arithmetic are inexact. +Simple values like 0.1 cannot be precisely represented using +binary floating-point numbers, and the limited precision of +floating-point numbers means that slight changes in +the order of operations or the precision of intermediate storage +can change the result. To make matters worse, with arbitrary precision +floating-point, you can set the precision before starting a computation, +but then you cannot be sure of the number of significant decimal places +in the final result. + +Sometimes, before you start to write any code, you should think more +about what you really want and what's really happening. Consider the +two numbers in the following example: + +@example +x = 0.875 # 1/2 + 1/4 + 1/8 +y = 0.425 +@end example + +Unlike the number in @code{y}, the number stored in @code{x} +is exactly representable +in binary since it can be written as a finite sum of one or +more fractions whose denominators are all powers of two. +When @command{gawk} reads a floating-point number from +program source, it automatically rounds that number to whatever +precision your machine supports. 
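@command{gawk} ordinarily stores numbers as C @code{double}s. That means you can check two things directly in C: the rule stated above (a fraction is exact in binary when it is a finite sum of powers of two), and the 2^53 integer limit discussed earlier. The following is a minimal sketch. It assumes IEEE-754 64-bit doubles evaluated without extra intermediate precision, as on typical x86-64 systems. The names @code{gcd_u64()}, @code{has_finite_binary_expansion()}, and @code{first_integer_losing_precision()} are hypothetical and are not part of @command{gawk}.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: greatest common divisor, used to reduce num/den. */
static uint64_t gcd_u64(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* True when num/den (den > 0) is a finite sum of fractions whose
   denominators are powers of two, i.e. when the reduced denominator is
   itself a power of two.  A double additionally limits the value to 53
   significant bits, which this check does not model. */
bool has_finite_binary_expansion(uint64_t num, uint64_t den) {
    den /= gcd_u64(num, den);
    return (den & (den - 1)) == 0;   /* 1, 2, 4, 8, ... */
}

/* Keep doubling x while x + 1 is still exactly representable.
   With a 53-bit significand the loop stops at 2^53. */
double first_integer_losing_precision(void) {
    double x = 1.0;
    while ((x + 1.0) - x == 1.0)
        x *= 2.0;
    return x;
}
```

For example, @code{has_finite_binary_expansion(875, 1000)} is true because 0.875 reduces to 7/8. @code{has_finite_binary_expansion(425, 1000)} is false because 0.425 reduces to 17/40.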
If you try to print the numeric +content of a variable using an output format string of @code{"%.17g"}, +it may not produce the same number as you assigned to it: + +@example +$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425} +> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'} +@print{} 0.875, 0.42499999999999999 +@end example + +Often the error is so small you do not even notice it, and if you do, +you can always specify how much precision you would like in your output. +Usually this is a format string like @code{"%.15g"}, which, when +used in the previous example, produces an output identical to the input. + +Because the underlying representation can be a little bit off from the exact value, +comparing floating-point values to see if they are equal is generally not a good idea. +Here is an example where it does not work as you might expect: + +@example +$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +The loss of accuracy during a single computation with floating-point numbers +usually isn't enough to worry about. However, if you compute a value +which is the result of a sequence of floating point operations, +the error can accumulate and greatly affect the computation itself. +Here is an attempt to compute the value of the constant +@value{PI} using one of its many series representations: + +@example +BEGIN @{ + x = 1.0 / sqrt(3.0) + n = 6 + for (i = 1; i < 30; i++) @{ + n = n * 2.0 + x = (sqrt(x * x + 1) - 1) / x + printf("%.15f\n", n * x) + @} +@} +@end example + +When run, the early errors propagating through later computations +cause the loop to terminate prematurely after an attempt to divide by zero.
+ +@example +$ @kbd{gawk -f pi.awk} +@print{} 3.215390309173475 +@print{} 3.159659942097510 +@print{} 3.146086215131467 +@print{} 3.142714599645573 +@dots{} +@print{} 3.224515243534819 +@print{} 2.791117213058638 +@print{} 0.000000000000000 +@error{} gawk: pi.awk:6: fatal: division by zero attempted +@end example + +Here is one more example where the inaccuracies in internal representations +yield an unexpected result: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} +> @kbd{i++} +> @kbd{print i} +> @kbd{@}'} +@print{} 4 +@end example + +Can computation using arbitrary precision help with the previous examples? +If you are impatient to know, see +@ref{Exact Arithmetic}. + +Instead of arbitrary precision floating-point arithmetic, +often all you need is an adjustment of your logic +or a different order for the operations in your calculation. +The stability and the accuracy of the computation of the constant @value{PI} +in the previous example can be enhanced by using the following +simple algebraic transformation: + +@example +(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1) +@end example + +@noindent +After making this change, the program does converge to +@value{PI} in under 30 iterations: + +@example +$ @kbd{gawk -f /tmp/pi2.awk} +@print{} 3.215390309173473 +@print{} 3.159659942097501 +@print{} 3.146086215131436 +@print{} 3.142714599645370 +@print{} 3.141873049979825 +@dots{} +@print{} 3.141592653589797 +@print{} 3.141592653589797 +@end example + +There is no need to be unduly suspicious about the results from +floating-point arithmetic. The lesson to remember is that +floating-point arithmetic is always more complex than the arithmetic using +pencil and paper. In order to take advantage of the power +of computer floating-point, you need to know its limitations +and work within them.
For most casual use of floating-point arithmetic, +you will often get the expected result in the end if you simply round +the display of your final results to the correct number of significant +decimal digits. Also, avoid presenting numerical data in a manner that +implies better precision than is actually the case. + +@menu +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +@end menu + +@node Floating-point Representation +@subsection Binary Floating-point Representation +@cindex IEEE-754 format + +Although floating-point representations vary from machine to machine, +the most commonly encountered representation is that defined by the +IEEE 754 Standard. An IEEE-754 format value has three components: + +@itemize @bullet +@item +A sign bit telling whether the number is positive or negative. + +@item +An @dfn{exponent} giving its order of magnitude, @var{e}. + +@item +A @dfn{significand}, @var{s}, +specifying the actual digits of the number. +@end itemize + +The value of the +number is then +@iftex +@math{s @cdot 2^e}, +@end iftex +@ifnottex +@var{s} * 2^@var{e}, +@end ifnottex +negated if the sign bit is set. +The first bit of a non-zero binary significand +is always one, so the significand in an IEEE-754 format only includes the +fractional part, leaving the leading one implicit. + +Three of the standard IEEE-754 types are 32-bit single precision, +64-bit double precision, and 128-bit quadruple precision. +The standard also specifies extended precision formats +to allow greater precisions and larger exponent ranges. + +The significand is stored in @dfn{normalized} format, +which means that the first bit is always a one. + +@node Floating-point Context +@subsection Floating-point Context +@cindex context, floating-point + +A floating-point @dfn{context} defines the environment for arithmetic operations.
+It governs precision, sets rules for rounding, and limits the range for exponents. +The context has the following primary components: + +@table @dfn +@item Precision +Precision of the floating-point format in bits. +@item emax +Maximum exponent allowed for this format. +@item emin +Minimum exponent allowed for this format. +@item Underflow behavior +The format may or may not support gradual underflow. +@item Rounding +The rounding mode of this context. +@end table + +@ref{table-ieee-formats} lists the precision and exponent +field values for the basic IEEE-754 binary formats: + +@float Table,table-ieee-formats +@caption{Basic IEEE Format Context Values} +@multitable @columnfractions .20 .20 .20 .20 .20 +@headitem Name @tab Total bits @tab Precision @tab emin @tab emax +@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127 +@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023 +@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383 +@end multitable +@end float + +@quotation NOTE +The precision numbers include the implied leading one that gives them +one extra bit of significand. +@end quotation + +A floating-point context can also determine which signals are treated +as exceptions, and can set rules for arithmetic with special values. +Please consult the IEEE-754 standard or other resources for details. + +@command{gawk} ordinarily uses the hardware double precision +representation for numbers. On most systems, this is IEEE-754 +floating-point format, corresponding to 64-bit binary with 53 bits +of precision. + +@quotation NOTE +In case an underflow occurs, the standard allows, but does not require, +the result from an arithmetic operation to be a number smaller than +the smallest nonzero normalized number. Such numbers do +not have as many significant digits as normal numbers, and are called +@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero, +is called @dfn{flush to zero}. 
The basic IEEE-754 binary formats +support subnormal numbers. +@end quotation + +@node Rounding Mode +@subsection Floating-point Rounding Mode +@cindex rounding mode, floating-point + +The @dfn{rounding mode} specifies the behavior for the results of numerical +operations when discarding extra precision. Each rounding mode indicates +how the least significant returned digit of a rounded result is to +be calculated. +@ref{table-rounding-modes} lists the IEEE-754 defined +rounding modes: + +@float Table,table-rounding-modes +@caption{IEEE 754 Rounding Modes} +@multitable @columnfractions .45 .55 +@headitem Rounding Mode @tab IEEE Name +@item Round to nearest, ties to even @tab @code{roundTiesToEven} +@item Round toward plus Infinity @tab @code{roundTowardPositive} +@item Round toward negative Infinity @tab @code{roundTowardNegative} +@item Round toward zero @tab @code{roundTowardZero} +@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} +@end multitable +@end float + +The default mode @code{roundTiesToEven} is the most preferred, +but the least intuitive. This method does the obvious thing for most values, +by rounding them up or down to the nearest digit. +For example, rounding 1.132 to two digits yields 1.13, +and rounding 1.157 yields 1.16. + +However, when it comes to rounding a value that is exactly halfway between, +things do not work the way you probably learned in school. +In this case, the number is rounded to the nearest even digit. +So rounding 0.125 to two digits rounds down to 0.12, +but rounding 0.6875 to three digits rounds up to 0.688. +You probably have already encountered this rounding mode when +using the @code{printf} routine to format floating-point numbers. 
+For example: + +@example +BEGIN @{ + x = -4.5 + for (i = 1; i < 10; i++) @{ + x += 1.0 + printf("%4.1f => %2.0f\n", x, x) + @} +@} +@end example + +@noindent +produces the following output when run:@footnote{It +is possible for the output to be completely different if the +C library in your system does not use the IEEE-754 even-rounding +rule to round halfway cases for @code{printf()}.} + +@example +-3.5 => -4 +-2.5 => -2 +-1.5 => -2 +-0.5 => 0 + 0.5 => 0 + 1.5 => 2 + 2.5 => 2 + 3.5 => 4 + 4.5 => 4 +@end example + +The theory behind the rounding mode @code{roundTiesToEven} is that +it more or less evenly distributes upward and downward rounds +of exact halves, which might cause the round-off error +to cancel itself out. This is the default rounding mode used +in IEEE-754 computing functions and operators. + +The other rounding modes are rarely used. +Round toward positive infinity (@code{roundTowardPositive}) +and round toward negative infinity (@code{roundTowardNegative}) +are often used to implement interval arithmetic, +where you adjust the rounding mode to calculate upper and lower bounds +for the range of output. The @code{roundTowardZero} +mode can be used for converting floating-point numbers to integers. +The rounding mode @code{roundTiesToAway} rounds the result to the +nearest number and selects the number with the larger magnitude +if a tie occurs. + +Some numerical analysts will tell you that your choice of rounding style +has tremendous impact on the final outcome, and advise you to wait until +final output for any rounding. Instead, you can often avoid round-off error problems by +setting the precision initially to some value sufficiently larger than +the final desired precision, so that the accumulation of round-off error +does not influence the outcome. 
+If you suspect that results from your computation are +sensitive to accumulation of round-off error, +one way to check is to look for a significant difference in output +when you change the rounding mode. + +@node Gawk and MPFR +@section @command{gawk} + MPFR = Powerful Arithmetic + +The rest of this @value{CHAPTER} describes how to use the arbitrary precision +(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric +capabilities in @command{gawk} to produce maximally accurate results +when you need them. + +But first you should check whether your version of +@command{gawk} supports arbitrary precision arithmetic. +The easiest way to find out is to look at the output of +the following command: + +@example +$ @kbd{gawk --version} +@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3) +@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation. +@dots{} +@end example + +@command{gawk} uses the +@uref{http://www.mpfr.org, GNU MPFR} +and +@uref{http://gmplib.org, GNU MP} (GMP) +libraries for arbitrary precision +arithmetic on numbers. So if you do not see the names of these libraries +in the output, then your version of @command{gawk} does not support +arbitrary precision arithmetic. + +Additionally, +there are a few elements available in the @code{PROCINFO} array +to provide information about the MPFR and GMP libraries. +@xref{Auto-set}, for more information. + +@ignore +Even if you aren't interested in arbitrary precision arithmetic, you +may still benefit from knowing about how @command{gawk} handles numbers +in general, and the limitations of doing arithmetic with ordinary +@command{gawk} numbers. +@end ignore + + +@node Arbitrary Precision Floats +@section Arbitrary Precision Floating-point Arithmetic with @command{gawk} + +@command{gawk} uses the GNU MPFR library +for arbitrary precision floating-point arithmetic.
The MPFR library +provides precise control over precisions and rounding modes, and gives +correctly rounded, reproducible, platform-independent results. With the +command-line option @option{--bignum} or @option{-M}, +all floating-point arithmetic operators and numeric functions can yield +results to any desired precision level supported by MPFR. +The two built-in +variables @code{PREC} +(@pxref{Setting Precision}) +and @code{ROUNDMODE} +(@pxref{Setting Rounding Mode}) +provide control over the working precision and the rounding mode. +The precision and the rounding mode are set globally for every operation +to follow. + +The default working precision for arbitrary precision floating-point values is 53, +and the default value for @code{ROUNDMODE} is @code{"N"}, +which selects the IEEE-754 +@code{roundTiesToEven} (@pxref{Rounding Mode}) rounding mode.@footnote{The +default precision is 53 because, according to the MPFR documentation, +the library should be able to exactly reproduce all computations with +double-precision machine floating-point numbers (@code{double} type +in C), except the default exponent range is much wider and subnormal +numbers are not implemented.} +@command{gawk} uses the default exponent range in MPFR +@iftex +(@math{emax = 2^{30} - 1, emin = -emax}) +@end iftex +@ifnottex +(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax}) +@end ifnottex +for all floating-point contexts. +There is no explicit mechanism to adjust the exponent range. +MPFR does not implement subnormal numbers by default, +and this behavior cannot be changed in @command{gawk}. + +@quotation NOTE +When emulating an IEEE-754 format (@pxref{Setting Precision}), +@command{gawk} internally adjusts the exponent range +to the value defined for the format and also performs computations needed for +gradual underflow (subnormal numbers).
+@end quotation + +@quotation NOTE +MPFR numbers are variable-size entities, consuming only as much space as +needed to store the significant digits. Since arithmetic on MPFR +numbers is much slower than arithmetic on the underlying machine +types, you should use only as much precision as your +program needs. +@end quotation + +@menu +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point numbers. +@end menu + +@node Setting Precision +@subsection Setting the Working Precision +@cindex @code{PREC} variable + +@command{gawk} uses a global working precision; it does not keep track of +the precision or accuracy of individual numbers. Performing an arithmetic +operation or calling a built-in function rounds the result to the current +working precision. The default working precision is 53 bits, which can be +modified using the built-in variable @code{PREC}. You can also set the +value to one of the following pre-defined case-insensitive strings +to emulate an IEEE-754 binary format: + +@multitable {@code{"double"}} {12345678901234567890123456789012345} +@headitem @code{PREC} @tab IEEE-754 Binary Format +@item @code{"half"} @tab 16-bit half-precision. +@item @code{"single"} @tab Basic 32-bit single precision. +@item @code{"double"} @tab Basic 64-bit double precision. +@item @code{"quad"} @tab Basic 128-bit quadruple precision. +@item @code{"oct"} @tab 256-bit octuple precision.
+@end multitable + +The following example illustrates the effects of changing precision +on arithmetic operations: + +@example +$ @kbd{gawk -M -vPREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \} +> @kbd{PREC = "double"; print x + 0 @}'} +@print{} 1e-400 +@print{} 0 +@end example + +Binary and decimal precisions are related approximately according to the +formula: + +@iftex +@math{prec = 3.322 @cdot dps} +@end iftex +@ifnottex +@var{prec} = 3.322 * @var{dps} +@end ifnottex + +@noindent +Here, @var{prec} denotes the binary precision +(measured in bits) and @var{dps} (short for decimal places) +is the number of decimal digits. We can easily calculate how many decimal +digits the 53-bit significand of an IEEE double is equivalent to: +53 / 3.322, which is about 15.95. +But what does 15.95 digits actually mean? It depends on whether you are +concerned about how many digits you can rely on, or how many digits +you need. + +It is important to know how many decimal digits it takes to uniquely identify +a double-precision value (the C type @code{double}). If you want to +convert from @code{double} to decimal and back to @code{double} (e.g., +saving a @code{double} representing an intermediate result to a file, and +later reading it back to restart the computation), then a few more decimal +digits are required. 17 significant digits are enough for an IEEE-754 @code{double}. + +It can also be important to know what decimal numbers can be uniquely +represented with a @code{double}. If you want to convert +from decimal to @code{double} and back again, 15 digits is the most that +you can get. Stated differently, you should not present +the numbers from your floating-point computations with more than 15 +significant digits in them. + +Conversely, it takes a precision of 332 bits to hold an approximation +of the constant @value{PI} that is accurate to 100 decimal places.
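The formula and the 15- and 17-digit rules of thumb above can be checked with a short C sketch. It assumes IEEE-754 doubles and a C library whose @code{printf()} and @code{strtod()} convert correctly, as GNU libc does. In @code{<float.h>}, @code{DBL_DIG} (15) is the guaranteed decimal-to-@code{double}-to-decimal digit count, and C11's @code{DBL_DECIMAL_DIG} (17) is the count that makes @code{double}-to-decimal-to-@code{double} exact. The three function names are hypothetical.

```c
#include <float.h>    /* DBL_DIG, DBL_DECIMAL_DIG */
#include <math.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Approximate decimal digits carried by a `bits`-bit significand:
   dps = prec / log2(10), where log2(10) is roughly 3.322. */
double decimal_digits_for_bits(int bits) {
    return bits * log10(2.0);
}

/* double -> text -> double: print x with `digits` significant digits
   and check that strtod() gives back exactly the same double. */
bool double_survives_round_trip(double x, int digits) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}

/* text -> double -> text: convert a decimal string, print it again with
   `digits` significant digits, and compare the two strings. */
bool decimal_survives_round_trip(const char *s, int digits) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, strtod(s, NULL));
    return strcmp(buf, s) == 0;
}
```

Two cases illustrate the rules:

@itemize @bullet
@item
@code{0.1 + 0.2} printed with 15 digits reads back as 0.3, which is a different @code{double}. Printed with 17 digits, it survives the round trip.
@item
The 16-digit decimal 9007199254740993 is read as 9007199254740992, so it does not survive.
@end itemize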
+You should always add some extra bits in order to avoid the confusing round-off +issues that occur because numbers are stored internally in binary. + +@node Setting Rounding Mode +@subsection Setting the Rounding Mode +@cindex @code{ROUNDMODE} variable + +The @code{ROUNDMODE} variable provides +program-level control over the rounding mode. +The correspondence between @code{ROUNDMODE} and the IEEE +rounding modes is shown in @ref{table-gawk-rounding-modes}. + +@float Table,table-gawk-rounding-modes +@caption{@command{gawk} Rounding Modes} +@multitable @columnfractions .45 .30 .25 +@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE} +@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"} +@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"} +@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"} +@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"} +@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"} +@end multitable +@end float + +@code{ROUNDMODE} has the default value @code{"N"}, +which selects the IEEE-754 rounding mode @code{roundTiesToEven}. +The value @code{"A"}, which selects the IEEE-754 mode +@code{roundTiesToAway}, takes effect only +if your version of the MPFR library supports it; otherwise setting +@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode}, +for the meanings of the various rounding modes. + +Here is an example of how to change the default rounding behavior of +@code{printf}'s output: + +@example +$ @kbd{gawk -M -vROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'} +@print{} 1.37 +@end example + +@node Floating-point Constants +@subsection Representing Floating-point Constants +@cindex constants, floating-point + +Be wary of floating-point constants!
When reading a floating-point constant +from program source code, @command{gawk} uses the default precision, +unless overridden +by an assignment to the special variable @code{PREC} on the command +line, to store it internally as an MPFR number. +Changing the precision using @code{PREC} in the program text does +not change the precision of a constant. If you need to +represent a floating-point constant at a higher precision than the +default and cannot use a command-line assignment to @code{PREC}, +you should either specify the constant as a string or, whenever possible, +as a rational number. The following example +illustrates the differences among various ways to +print a floating-point constant: + +@example +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000055511151 +$ @kbd{gawk -M -vPREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'} +@print{} 0.1000000000000000000000000 +@end example + +In the first case, the number is stored with the default precision of 53. +In the remaining cases, the value is converted or computed with 113 bits of precision. + +@node Changing Precision +@subsection Changing the Precision of a Number + +@cindex Laurie, Dirk +@quotation +@i{The point is that in any variable-precision package, +a decision is made on how to treat numbers given as data, +or arising in intermediate results, which are represented in +floating-point format to a precision lower than working precision. +Do we promote them to full membership of the high-precision club, +or do we treat them and all their associates as second-class citizens? +Sometimes the first course is proper, sometimes the second, and it takes +careful analysis to tell which.} + +Dirk Laurie@footnote{Dirk Laurie. +@cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}.
+Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.} +@end quotation + +@command{gawk} does not implicitly modify the precision of any previously +computed results when the working precision is changed with an assignment +to @code{PREC}. The precision of a number is always the one that was +used at the time of its creation, and there is no way for the user +to explicitly change it afterwards. However, since the result of a +floating-point arithmetic operation is always an arbitrary precision +floating-point value---with a precision set by the value of @code{PREC}---one of the +following workarounds effectively accomplishes the desired behavior: + +@example +x = x + 0.0 +@end example + +@noindent +or: + +@example +x += 0.0 +@end example + +@node Exact Arithmetic +@subsection Exact Arithmetic with Floating-point Numbers + +@quotation CAUTION +Never depend on the exactness of floating-point arithmetic, +even for apparently simple expressions! +@end quotation + +Can arbitrary precision arithmetic give exact results? There are +no easy answers. The standard rules of algebra often do not apply +when using floating-point arithmetic. +Among other things, the distributive and associative laws +do not hold completely, and order of operation may be important +for your computation. Rounding error, cumulative precision loss +and underflow are often troublesome. + +When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3} +for equality +using the machine double precision arithmetic, it decides that they +are not equal! +(@xref{Floating-point Programming}.) +You can get the result you want by increasing the precision; +56 in this case will get the job done: + +@example +$ @kbd{gawk -M -vPREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 1 +@end example + +If adding more bits is good, perhaps adding even more bits of +precision is better? 
+Here is what happens if we use an even larger value of @code{PREC}: + +@example +$ @kbd{gawk -M -vPREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +This is not a bug in @command{gawk} or in the MPFR library. +It is easy to forget that the finite number of bits used to store the value +is often just an approximation after proper rounding. +The test for equality succeeds if and only if @emph{all} bits in the two operands +are exactly the same. Since this is not necessarily true after floating-point +computations with a particular precision and effective rounding rule, +a straight test for equality may not work. + +So, don't assume that floating-point values can be compared for equality. +You should also exercise caution when using other forms of comparisons. +The standard way to compare floating-point numbers is to determine +how much error (or @dfn{tolerance}) you will allow in a comparison and +check whether one value is within this error range of the other. + +In applications where 15 or fewer decimal places suffice, +hardware double precision arithmetic can be adequate, and is usually much faster. +But you do need to keep in mind that every floating-point operation +can suffer a new rounding error, with potentially catastrophic consequences, as illustrated +by our attempt to compute the value of the constant @value{PI} +(@pxref{Floating-point Programming}). +Extra precision can greatly enhance the stability and the accuracy +of your computation in such cases. + +Repeated addition is not necessarily equivalent to multiplication +in floating-point arithmetic. In the example in +@ref{Floating-point Programming}: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} +> @kbd{i++} +> @kbd{print i} +> @kbd{@}'} +@print{} 4 +@end example + +@noindent +you may or may not succeed in getting the correct result by choosing +an arbitrarily large value for @code{PREC}.
Reformulation of +the problem at hand is often the correct approach in such situations. + +@node Arbitrary Precision Integers +@section Arbitrary Precision Integer Arithmetic with @command{gawk} +@cindex integer, arbitrary precision + +If the option @option{--bignum} or @option{-M} is specified, +@command{gawk} performs all +integer arithmetic using GMP arbitrary precision integers. +Any number that looks like an integer in program source or a data file +is stored as an arbitrary precision integer. +The size of the integer is limited only by your computer's memory. +The current floating-point context has no effect on operations involving integers. +For example, the following computes +@iftex +@math{5^{4^{3^{2}}}}, +@end iftex +@ifnottex +5^4^3^2, +@end ifnottex +the result of which is beyond the +limits of ordinary @command{gawk} numbers: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{x = 5^4^3^2} +> @kbd{print "# of digits =", length(x)} +> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)} +> @kbd{@}'} +@print{} # of digits = 183231 +@print{} 62060698786608744707 ... 92256259918212890625 +@end example + +If you were to compute the same value using arbitrary precision +floating-point values instead, the precision needed for correct output +(using the formula +@iftex +@math{prec = 3.322 @cdot dps}) +would be @math{3.322 @cdot 183231}, +@end iftex +@ifnottex +@samp{prec = 3.322 * dps}) +would be 3.322 x 183231, +@end ifnottex +or 608693 bits. + +The result from an arithmetic operation with an integer and a floating-point value +is a floating-point value with a precision equal to the working precision. +The following program calculates the eighth term in +Sylvester's sequence@footnote{Weisstein, Eric W. +@cite{Sylvester's Sequence}. From MathWorld---A Wolfram Web Resource.
+@url{http://mathworld.wolfram.com/SylvestersSequence.html}} +using a recurrence: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{s = 2.0} +> @kbd{for (i = 1; i <= 7; i++)} +> @kbd{s = s * (s - 1) + 1} +> @kbd{print s} +> @kbd{@}'} +@print{} 113423713055421845118910464 +@end example + +The output differs from the actual number, 113423713055421844361000443, +because the default precision of 53 is not enough to represent the +floating-point results exactly. To get the correct output, you can either increase the precision +(100 is enough in this case), or replace the floating-point constant +@samp{2.0} with an integer so that all computations use integer +arithmetic. + +It will sometimes be necessary for @command{gawk} to implicitly convert an +arbitrary precision integer into an arbitrary precision floating-point value. +This is primarily because the MPFR library does not always provide the +relevant interface to process arbitrary precision integers or mixed-mode +numbers as needed by an operation or function. +In such a case, the precision is set to the minimum value necessary +for exact conversion, and the working precision is not used for this purpose. +If this is not what you need or want, you can employ a subterfuge +like this: + +@example +gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}' +@end example + +You can avoid this issue altogether by specifying the number as a floating-point value +to begin with: + +@example +gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}' +@end example + +Note that for the particular example above, it is probably best +to just use the following: + +@example +gawk -M 'BEGIN @{ n = 13; print n % 2 @}' +@end example + @node Dynamic Extensions @chapter Writing Extensions for @command{gawk}