Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- doc/gawk.texi 2503
1 files changed, 1255 insertions, 1248 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 7d463a3d..d700f2a7 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -296,14 +296,14 @@ particular records in a file and perform operations upon them.
* Functions:: Built-in and user-defined functions.
* Internationalization:: Getting @command{gawk} to speak your
language.
-* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
- @command{gawk}.
* Advanced Features:: Stuff for advanced users, specific to
@command{gawk}.
* Library Functions:: A Library of @command{awk} Functions.
* Sample Programs:: Many @command{awk} programs with complete
explanations.
* Debugger:: The @code{gawk} debugger.
+* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
+ @command{gawk}.
* Dynamic Extensions:: Adding new built-in functions to
@command{gawk}.
* Language History:: The evolution of the @command{awk}
@@ -569,29 +569,6 @@ particular records in a file and perform operations upon them.
* I18N Portability:: @command{awk}-level portability issues.
* I18N Example:: A simple i18n example.
* Gawk I18N:: @command{gawk} is also internationalized.
-* General Arithmetic:: An introduction to computer arithmetic.
-* Floating Point Issues:: Stuff to know about floating-point numbers.
-* String Conversion Precision:: The String Value Can Lie.
-* Unexpected Results:: Floating Point Numbers Are Not Abstract
- Numbers.
-* POSIX Floating Point Problems:: Standards Versus Existing Practice.
-* Integer Programming:: Effective integer programming.
-* Floating-point Programming:: Effective Floating-point Programming.
-* Floating-point Representation:: Binary floating-point representation.
-* Floating-point Context:: Floating-point context.
-* Rounding Mode:: Floating-point rounding mode.
-* Gawk and MPFR:: How @command{gawk} provides
- arbitrary-precision arithmetic.
-* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
- Arithmetic with @command{gawk}.
-* Setting Precision:: Setting the working precision.
-* Setting Rounding Mode:: Setting the rounding mode.
-* Floating-point Constants:: Representing floating-point constants.
-* Changing Precision:: Changing the precision of a number.
-* Exact Arithmetic:: Exact arithmetic with floating-point
- numbers.
-* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
- @command{gawk}.
* Nondecimal Data:: Allowing nondecimal input data.
* Array Sorting:: Facilities for controlling array traversal
and sorting arrays.
@@ -673,6 +650,29 @@ particular records in a file and perform operations upon them.
* Miscellaneous Debugger Commands:: Miscellaneous Commands.
* Readline Support:: Readline support.
* Limitations:: Limitations and future plans.
+* General Arithmetic:: An introduction to computer arithmetic.
+* Floating Point Issues:: Stuff to know about floating-point numbers.
+* String Conversion Precision:: The String Value Can Lie.
+* Unexpected Results:: Floating Point Numbers Are Not Abstract
+ Numbers.
+* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+* Integer Programming:: Effective integer programming.
+* Floating-point Programming:: Effective Floating-point Programming.
+* Floating-point Representation:: Binary floating-point representation.
+* Floating-point Context:: Floating-point context.
+* Rounding Mode:: Floating-point rounding mode.
+* Gawk and MPFR:: How @command{gawk} provides
+ arbitrary-precision arithmetic.
+* Arbitrary Precision Floats:: Arbitrary Precision Floating-point
+ Arithmetic with @command{gawk}.
+* Setting Precision:: Setting the working precision.
+* Setting Rounding Mode:: Setting the rounding mode.
+* Floating-point Constants:: Representing floating-point constants.
+* Changing Precision:: Changing the precision of a number.
+* Exact Arithmetic:: Exact arithmetic with floating-point
+ numbers.
+* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
+ @command{gawk}.
* Plugin License:: A note about licensing.
* Sample Library:: An example of new functions.
* Internal File Description:: What the new functions will do.
@@ -1201,6 +1201,13 @@ solving real problems.
@ref{Debugger}, describes the @command{awk} debugger.
+@ref{Arbitrary Precision Arithmetic},
+describes advanced arithmetic facilities provided by
+@command{gawk}.
+
+@ref{Dynamic Extensions}, describes how to add new variables and
+functions to @command{gawk} by writing extensions in C.
+
@ref{Language History},
describes how the @command{awk} language has evolved since
its first release to the present. It also describes how @command{gawk}
@@ -18497,1229 +18504,6 @@ then @command{gawk} produces usage messages, warnings,
and fatal errors in the local language.
@c ENDOFRANGE inloc
-@node Arbitrary Precision Arithmetic
-@chapter Arithmetic and Arbitrary Precision Arithmetic with @command{gawk}
-@cindex arbitrary precision
-@cindex multiple precision
-@cindex infinite precision
-@cindex floating-point numbers, arbitrary precision
-@cindex MPFR
-@cindex GMP
-
-@cindex Knuth, Donald
-@quotation
-@i{There's a credibility gap: We don't know how much of the computer's answers
-to believe. Novice computer users solve this problem by implicitly trusting
-in the computer as an infallible authority; they tend to believe that all
-digits of a printed answer are significant. Disillusioned computer users have
-just the opposite approach; they are constantly afraid that their answers
-are almost meaningless.}@*
-Donald Knuth@footnote{Donald E.@: Knuth.
-@cite{The Art of Computer Programming}. Volume 2,
-@cite{Seminumerical Algorithms}, third edition,
-1998, ISBN 0-201-89683-4, p.@: 229.}
-@end quotation
-
-This @value{CHAPTER} discusses issues that you may encounter
-when performing arithmetic. It begins by discussing some of
-the general attributes of computer arithmetic, along with how
-this can influence what you see when running @command{awk} programs.
-This discussion applies to all versions of @command{awk}.
-
-Then the discussion moves on to @dfn{arbitrary precision
-arithmetic}, a feature which is specific to @command{gawk}.
-
-@menu
-* General Arithmetic:: An introduction to computer arithmetic.
-* Floating-point Programming:: Effective Floating-point Programming.
-* Gawk and MPFR:: How @command{gawk} provides
- arbitrary-precision arithmetic.
-* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic
- with @command{gawk}.
-* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
- @command{gawk}.
-@end menu
-
-@node General Arithmetic
-@section A General Description of Computer Arithmetic
-
-@cindex integers
-@cindex floating-point, numbers
-@cindex numbers, floating-point
-Within computers, there are two kinds of numeric values: @dfn{integers}
-and @dfn{floating-point}.
-In school, integer values were referred to as ``whole'' numbers---that is,
-numbers without any fractional part, such as 1, 42, or @minus{}17.
-The advantage to integer numbers is that they represent values exactly.
-The disadvantage is that their range is limited. On most systems,
-this range is @minus{}2,147,483,648 to 2,147,483,647.
-However, many systems now support a range from
-@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
-
-@cindex unsigned integers
-@cindex integers, unsigned
-Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}.
-Signed values may be negative or positive, with the range of values just
-described.
-Unsigned values are never negative. On most systems,
-the range is from 0 to 4,294,967,295.
-However, many systems now support a range from
-0 to 18,446,744,073,709,551,615.
-
-@cindex double precision floating-point
-@cindex single precision floating-point
-Floating-point numbers represent what are called ``real'' numbers; i.e.,
-those that do have a fractional part, such as 3.1415927.
-The advantage to floating-point numbers is that they
-can represent a much larger range of values.
-The disadvantage is that there are numbers that they cannot represent
-exactly.
-@command{awk} uses @dfn{double precision} floating-point numbers, which
-can hold more digits than @dfn{single precision}
-floating-point numbers.
-@c Floating-point issues are discussed more fully in
-@c @ref{Floating Point Issues}.
-
-There are several important issues to be aware of, described next.
-
-@menu
-* Floating Point Issues:: Stuff to know about floating-point numbers.
-* Integer Programming:: Effective integer programming.
-@end menu
-
-@node Floating Point Issues
-@subsection Floating-Point Number Caveats
-
-As mentioned earlier, floating-point numbers represent what are called
-``real'' numbers, i.e., those that have a fractional part. @command{awk}
-uses double precision floating-point numbers to represent all
-numeric values. This @value{SECTION} describes some of the issues
-involved in using floating-point numbers.
-
-There is a very nice
-@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic}
-by David Goldberg,
-``What Every Computer Scientist Should Know About Floating-point Arithmetic,''
-@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48.
-This is worth reading if you are interested in the details,
-but it does require a background in computer science.
-
-@menu
-* String Conversion Precision:: The String Value Can Lie.
-* Unexpected Results:: Floating Point Numbers Are Not Abstract
- Numbers.
-* POSIX Floating Point Problems:: Standards Versus Existing Practice.
-@end menu
-
-@node String Conversion Precision
-@subsubsection The String Value Can Lie
-
-Internally, @command{awk} keeps both the numeric value
-(double precision floating-point) and the string value for a variable.
-Separately, @command{awk} keeps
-track of what type the variable has
-(@pxref{Typing and Comparison}),
-which plays a role in how variables are used in comparisons.
-
-It is important to note that the string value for a number may not
-reflect the full value (all the digits) that the numeric value
-actually contains.
-The following program (@file{values.awk}) illustrates this:
-
-@example
-@{
- sum = $1 + $2
- # see it for what it is
- printf("sum = %.12g\n", sum)
- # use CONVFMT
- a = "<" sum ">"
- print "a =", a
- # use OFMT
- print "sum =", sum
-@}
-@end example
-
-@noindent
-This program shows the full value of the sum of @code{$1} and @code{$2}
-using @code{printf}, and then prints the string values obtained
-from both automatic conversion (via @code{CONVFMT}) and
-from printing (via @code{OFMT}).
-
-Here is what happens when the program is run:
-
-@example
-$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk}
-@print{} sum = 4.8888888
-@print{} a = <4.88889>
-@print{} sum = 4.88889
-@end example
-
-This makes it clear that the full numeric value is different from
-what the default string representations show.
-
-@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with
-at most six significant digits. For some applications, you might want to
-change it to specify more precision.
-On most modern machines, most of the time,
-17 digits is enough to capture a floating-point number's
-value exactly.@footnote{Pathological cases can require up to
-752 digits (!), but we doubt that you need to worry about this.}
-
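A minimal sketch of raising @code{CONVFMT}, assuming an @command{awk}
that stores numbers as IEEE 754 doubles. The shell function name
@code{show_full_value} is made up for illustration. With 17 significant
digits, the string conversion exposes the stored value of 0.1 instead of
the rounded default:

```shell
# Hypothetical helper: convert 0.1 to a string with 17 significant digits.
show_full_value() {
    awk 'BEGIN {
        CONVFMT = "%.17g"   # the default is "%.6g"
        x = 0.1
        s = x ""            # number-to-string conversion uses CONVFMT
        print s
    }'
}

show_full_value
```

On such a system this prints 0.10000000000000001, which shows that the
stored value is close to, but not exactly, one tenth.
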
-@node Unexpected Results
-@subsubsection Floating Point Numbers Are Not Abstract Numbers
-
-@cindex floating-point, numbers
-Unlike numbers in the abstract sense (such as what you studied in high school
-or college arithmetic), numbers stored in computers are limited in certain ways.
-They cannot represent an infinite number of digits, nor can they always
-represent things exactly.
-In particular,
-floating-point numbers cannot
-always represent values exactly. Here is an example:
-
-@example
-$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'}
-515.79
-@print{} 0000051579
-515.80
-@print{} 0000051579
-515.81
-@print{} 0000051580
-515.82
-@print{} 0000051582
-@kbd{@value{CTL}-d}
-@end example
-
-@noindent
-This shows that some values can be represented exactly,
-whereas others are only approximated. This is not a ``bug''
-in @command{awk}, but simply an artifact of how computers
-represent numbers.
-
-@quotation NOTE
-It cannot be emphasized enough that the behavior just
-described is fundamental to modern computers. You will
-see this kind of thing happen in @emph{any} programming
-language using hardware floating-point numbers. It is @emph{not}
-a bug in @command{gawk}, nor is it something that can be ``just
-fixed.''
-@end quotation
-
-@cindex negative zero
-@cindex positive zero
-@cindex zero@comma{} negative vs.@: positive
-Another peculiarity of floating-point numbers on modern systems
-is that they often have more than one representation for the number zero!
-In particular, it is possible to represent ``minus zero'' as well as
-regular, or ``positive'' zero.
-
-This example shows that negative and positive zero are distinct values
-when stored internally, but that they are in fact equal to each other,
-as well as to ``regular'' zero:
-
-@example
-$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0}
-> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz}
-> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0}
-> @kbd{@}'}
-@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1
-@print{} mz == 0 -> 1, pz == 0 -> 1
-@end example
-
-It helps to keep this in mind should you process numeric data
-that contains negative zero values; the fact that the zero is negative
-is noted and can affect comparisons.
-
-@node POSIX Floating Point Problems
-@subsubsection Standards Versus Existing Practice
-
-Historically, @command{awk} has converted any non-numeric looking string
-to the numeric value zero, when required. Furthermore, the original
-definition of the language and the original POSIX standards specified that
-@command{awk} only understands decimal numbers (base 10), and not octal
-(base 8) or hexadecimal numbers (base 16).
-
-Changes in the language of the
-2001 and 2004 POSIX standards can be interpreted to imply that @command{awk}
-should support additional features. These features are:
-
-@itemize @bullet
-@item
-Interpretation of floating point data values specified in hexadecimal
-notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not}
-source code constants.)
-
-@item
-Support for the special IEEE 754 floating point values ``Not A Number''
-(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf'').
-In particular, the format for these values is as specified by the ISO 1999
-C standard, which ignores case and can allow machine-dependent additional
-characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}.
-@end itemize
-
-The first problem is that both of these are clear changes to historical
-practice:
-
-@itemize @bullet
-@item
-The @command{gawk} maintainer feels that supporting hexadecimal floating
-point values, in particular, is ugly, and was never intended by the
-original designers to be part of the language.
-
-@item
-Allowing completely alphabetic strings to have valid numeric
-values is also a very severe departure from historical practice.
-@end itemize
-
-The second problem is that the @code{gawk} maintainer feels that this
-interpretation of the standard, which requires a certain amount of
-``language lawyering'' to arrive at in the first place, was not even
-intended by the standard developers. In other words, ``we see how you
-got where you are, but we don't think that that's where you want to be.''
-
-Recognizing the above issues, but attempting to provide compatibility
-with the earlier versions of the standard,
-the 2008 POSIX standard added explicit wording to allow, but not require,
-that @command{awk} support hexadecimal floating point values and
-special values for ``Not A Number'' and infinity.
-
-Although the @command{gawk} maintainer continues to feel that
-providing those features is inadvisable,
-nevertheless, on systems that support IEEE floating point, it seems
-reasonable to provide @emph{some} way to support NaN and Infinity values.
-The solution implemented in @command{gawk} is as follows:
-
-@itemize @bullet
-@item
-With the @option{--posix} command-line option, @command{gawk} becomes
-``hands off.'' String values are passed directly to the system library's
-@code{strtod()} function, and if it successfully returns a numeric value,
-that is what's used.@footnote{You asked for it, you got it.}
-By definition, the results are not portable across
-different systems. They are also a little surprising:
-
-@example
-$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'}
-@print{} nan
-$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'}
-@print{} 3735928559
-@end example
-
-@item
-Without @option{--posix}, @command{gawk} interprets the four strings
-@samp{+inf},
-@samp{-inf},
-@samp{+nan},
-and
-@samp{-nan}
-specially, producing the corresponding special numeric values.
-The leading sign acts as a signal to @command{gawk} (and the user)
-that the value is really numeric. Hexadecimal floating point is
-not supported (unless you also use @option{--non-decimal-data},
-which is @emph{not} recommended). For example:
-
-@example
-$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'}
-@print{} 0
-$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'}
-@print{} nan
-$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'}
-@print{} 0
-@end example
-
-@command{gawk} does ignore case in the four special values.
-Thus @samp{+nan} and @samp{+NaN} are the same.
-@end itemize
-
-@node Integer Programming
-@subsection Mixing Integers And Floating-point
-
-As has been mentioned already, @command{gawk} ordinarily uses hardware double
-precision with 64-bit IEEE binary floating-point representation
-for numbers on most systems. A large integer like 9007199254740997
-has a binary representation that, although finite, is more than 53 bits long;
-it must therefore be rounded to 53 bits.
-The biggest integer that can be stored in a C @code{double} is usually the same
-as the largest possible value of a @code{double}. If your system @code{double}
-is an IEEE 64-bit @code{double}, this largest possible value is an integer and
-can be represented precisely. What more should one know about integers?
-
-If you want to know what is the largest integer, such that it and
-all smaller integers can be stored in 64-bit doubles without losing precision,
-then the answer is
-@iftex
-@math{2^{53}}.
-@end iftex
-@ifnottex
-2^53.
-@end ifnottex
-The next representable number is the even number
-@iftex
-@math{2^{53} + 2},
-@end iftex
-@ifnottex
-2^53 + 2,
-@end ifnottex
-meaning it is unlikely that you will be able to make
-@command{gawk} print
-@iftex
-@math{2^{53} + 1}
-@end iftex
-@ifnottex
-2^53 + 1
-@end ifnottex
-in integer format.
-The range of integers exactly representable by a 64-bit double
-is
-@iftex
-@math{[-2^{53}, 2^{53}]}.
-@end iftex
-@ifnottex
-[@minus{}2^53, 2^53].
-@end ifnottex
-If you ever see an integer outside this range in @command{gawk}
-using 64-bit doubles, you have reason to be very suspicious about
-the accuracy of the output. Here is a simple program with erroneous output:
-
-@example
-$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'}
-@print{} 9007199254740991
-@print{} 9007199254740992
-@print{} 9007199254740992
-@print{} 9007199254740994
-@end example
-
-The lesson is to not assume that any large integer printed by @command{gawk}
-represents an exact result from your computation, especially if it wraps
-around on your screen.
-
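One way to guard against this, sketched under the assumption of 64-bit
IEEE doubles: test that a value is a whole number strictly inside the
range before trusting it as an integer. The function name
@code{is_safe_int} is hypothetical, and the strict bounds are a design
choice made here, because 2^53 itself may be the rounded image of
2^53 + 1.

```shell
# Hypothetical helper: report whether $1 can be trusted as an exact integer.
is_safe_int() {
    awk -v n="$1" 'BEGIN {
        limit = 2^53
        # Whole number, strictly inside (-2^53, 2^53)
        ok = (n == int(n) && n > -limit && n < limit)
        print (ok ? "safe" : "unsafe")
    }'
}

is_safe_int 9007199254740991    # 2^53 - 1: safe
is_safe_int 9007199254740993    # 2^53 + 1 is rounded to 2^53 on input: unsafe
```
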
-@node Floating-point Programming
-@section Understanding Floating-point Programming
-
-Numerical programming is an extensive area; if you need to develop
-sophisticated numerical algorithms then @command{gawk} may not be
-the ideal tool, and this documentation may not be sufficient.
-@c FIXME: JOHN: Do you want to cite some actual books?
-It might require digesting a book or two to really internalize how to compute
-with ideal accuracy and precision,
-and the result often depends on the particular application.
-
-@quotation NOTE
-A floating-point calculation's @dfn{accuracy} is how close it comes
-to the real value. This is as opposed to the @dfn{precision}, which
-usually refers to the number of bits used to represent the number
-(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision,
-the Wikipedia article} for more information).
-@end quotation
-
-There are two options for doing floating-point calculations:
-hardware floating-point (as used by standard @command{awk} and
-the default for @command{gawk}), and @dfn{arbitrary-precision}
-floating-point, which is software based. This @value{CHAPTER}
-aims to provide enough information to understand both, and then
-will focus on @command{gawk}'s facilities for the latter.@footnote{If you
-are interested in other tools that perform arbitrary precision arithmetic,
-you may want to investigate the POSIX @command{bc} tool. See
-@uref{http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html,
-the POSIX specification for it}, for more information.}
-
-Binary floating-point representations and arithmetic are inexact.
-Simple values like 0.1 cannot be precisely represented using
-binary floating-point numbers, and the limited precision of
-floating-point numbers means that slight changes in
-the order of operations or the precision of intermediate storage
-can change the result. To make matters worse, with arbitrary precision
-floating-point, you can set the precision before starting a computation,
-but then you cannot be sure of the number of significant decimal places
-in the final result.
-
-Sometimes, before you start to write any code, you should think more
-about what you really want and what's really happening. Consider the
-two numbers in the following example:
-
-@example
-x = 0.875 # 1/2 + 1/4 + 1/8
-y = 0.425
-@end example
-
-Unlike the number in @code{y}, the number stored in @code{x}
-is exactly representable
-in binary since it can be written as a finite sum of one or
-more fractions whose denominators are all powers of two.
-When @command{gawk} reads a floating-point number from
-program source, it automatically rounds that number to whatever
-precision your machine supports. If you try to print the numeric
-content of a variable using an output format string of @code{"%.17g"},
-it may not produce the same number as you assigned to it:
-
-@example
-$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425}
-> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'}
-@print{} 0.875, 0.42499999999999999
-@end example
-
-Often the error is so small you do not even notice it, and if you do,
-you can always specify how much precision you would like in your output.
-Usually this is a format string like @code{"%.15g"}, which when
-used in the previous example, produces an output identical to the input.
-
-Because the underlying representation can be a little bit off from the exact value,
-comparing floating-point values to see if they are equal is generally not a good idea.
-Here is an example where it does not work as you expect:
-
-@example
-$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
-@print{} 0
-@end example
-
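A common alternative is to compare against a tolerance instead of
testing for exact equality. The following is only a sketch: the names
@code{approx_equal} and @code{absval} are hypothetical, and the absolute
tolerance passed in is an assumption that suits values of this
magnitude, not a general rule.

```shell
# Hypothetical helper: compare 0.1 + 12.2 with 12.3 using tolerance $1.
approx_equal() {
    awk -v eps="$1" '
    function absval(v) { return v < 0 ? -v : v }
    BEGIN {
        a = 0.1 + 12.2
        b = 12.3
        # Treat the values as equal when they differ by at most eps
        result = (absval(a - b) <= eps) ? "equal" : "different"
        print result
    }'
}

approx_equal 1e-9
```

With a tolerance of 1e-9 the two values are reported as equal; with a
tolerance of 0 the test degenerates to the exact comparison shown above.
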
-The loss of accuracy during a single computation with floating-point numbers
-usually isn't enough to worry about. However, if you compute a value
-which is the result of a sequence of floating point operations,
-the error can accumulate and greatly affect the computation itself.
-Here is an attempt to compute the value of the constant
-@value{PI} using one of its many series representations:
-
-@example
-BEGIN @{
- x = 1.0 / sqrt(3.0)
- n = 6
- for (i = 1; i < 30; i++) @{
- n = n * 2.0
- x = (sqrt(x * x + 1) - 1) / x
- printf("%.15f\n", n * x)
- @}
-@}
-@end example
-
-When run, the early errors propagating through later computations
-cause the loop to terminate prematurely after an attempt to divide by zero.
-
-@example
-$ @kbd{gawk -f pi.awk}
-@print{} 3.215390309173475
-@print{} 3.159659942097510
-@print{} 3.146086215131467
-@print{} 3.142714599645573
-@dots{}
-@print{} 3.224515243534819
-@print{} 2.791117213058638
-@print{} 0.000000000000000
-@error{} gawk: pi.awk:6: fatal: division by zero attempted
-@end example
-
-Here is one more example where the inaccuracies in internal representations
-yield an unexpected result:
-
-@example
-$ @kbd{gawk 'BEGIN @{}
-> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)}
-> @kbd{i++}
-> @kbd{print i}
-> @kbd{@}'}
-@print{} 4
-@end example
-
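One adjustment that avoids the accumulated error is to drive the loop
with an integer counter and derive the floating-point value from it.
This is a sketch with a hypothetical function name, @code{count_steps};
it assumes only that small integers are represented exactly.

```shell
# Hypothetical helper: count the steps from 1.1 to 1.5 in tenths.
count_steps() {
    awk 'BEGIN {
        # Count in whole tenths; integers this small are exact
        for (k = 11; k <= 15; k++) {
            d = k / 10      # 1.1, 1.2, ..., 1.5, each rounded only once
            i++
        }
        print i
    }'
}

count_steps
```

This prints 5, because the loop condition now compares exact integers
and no rounding error carries over from one iteration to the next.
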
-Can computation using arbitrary precision help with the previous examples?
-If you are impatient to know, see
-@ref{Exact Arithmetic}.
-
-Instead of arbitrary precision floating-point arithmetic,
-often all you need is an adjustment of your logic
-or a different order for the operations in your calculation.
-The stability and the accuracy of the computation of the constant @value{PI}
-in the previous example can be enhanced by using the following
-simple algebraic transformation:
-
-@example
-(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1)
-@end example
-
-@noindent
-After making this change, the program does converge to
-@value{PI} in under 30 iterations:
-
-@example
-$ @kbd{gawk -f /tmp/pi2.awk}
-@print{} 3.215390309173473
-@print{} 3.159659942097501
-@print{} 3.146086215131436
-@print{} 3.142714599645370
-@print{} 3.141873049979825
-@dots{}
-@print{} 3.141592653589797
-@print{} 3.141592653589797
-@end example
-
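The text does not show the contents of @file{pi2.awk}. The following is
a reconstruction, assuming that the only difference from the earlier
program is the transformed update of @code{x}. The shell function name
@code{stable_pi} is hypothetical.

```shell
# Hypothetical reconstruction of the transformed program.
stable_pi() {
    awk 'BEGIN {
        x = 1.0 / sqrt(3.0)
        n = 6
        for (i = 1; i < 30; i++) {
            n = n * 2.0
            # Transformed update: no subtraction of nearly equal values
            x = x / (sqrt(x * x + 1) + 1)
            printf("%.15f\n", n * x)
        }
    }'
}

stable_pi | tail -n 1
```

The last line printed should agree with @value{PI} to many digits. The
exact trailing digits may vary with the C library in use.
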
-There is no need to be unduly suspicious about the results from
-floating-point arithmetic. The lesson to remember is that
-floating-point arithmetic is always more complex than the arithmetic using
-pencil and paper. In order to take advantage of the power
-of computer floating-point, you need to know its limitations
-and work within them. For most casual use of floating-point arithmetic,
-you will often get the expected result in the end if you simply round
-the display of your final results to the correct number of significant
-decimal digits. And, avoid presenting numerical data in a manner that
-implies better precision than is actually the case.
-
-@menu
-* Floating-point Representation:: Binary floating-point representation.
-* Floating-point Context:: Floating-point context.
-* Rounding Mode:: Floating-point rounding mode.
-@end menu
-
-@node Floating-point Representation
-@subsection Binary Floating-point Representation
-@cindex IEEE-754 format
-
-Although floating-point representations vary from machine to machine,
-the most commonly encountered representation is that defined by the
-IEEE 754 Standard. An IEEE-754 format value has three components:
-
-@itemize @bullet
-@item
-A sign bit telling whether the number is positive or negative.
-
-@item
-An @dfn{exponent} giving its order of magnitude, @var{e}.
-
-@item
-A @dfn{significand}, @var{s},
-specifying the actual digits of the number.
-@end itemize
-
-The value of the
-number is then
-@iftex
-@math{s @cdot 2^e}.
-@end iftex
-@ifnottex
-@var{s * 2^e}.
-@end ifnottex
-The first bit of a non-zero binary significand
-is always one, so the significand in an IEEE-754 format only includes the
-fractional part, leaving the leading one implicit.
-
-Three of the standard IEEE-754 types are 32-bit single precision,
-64-bit double precision and 128-bit quadruple precision.
-The standard also specifies extended precision formats
-to allow greater precisions and larger exponent ranges.
-
-The significand is stored in @dfn{normalized} format,
-which means that the first bit is always a one.
-
-@node Floating-point Context
-@subsection Floating-point Context
-@cindex context, floating-point
-
-A floating-point @dfn{context} defines the environment for arithmetic operations.
-It governs precision, sets rules for rounding, and limits the range for exponents.
-The context has the following primary components:
-
-@table @dfn
-@item Precision
-Precision of the floating-point format in bits.
-@item emax
-Maximum exponent allowed for this format.
-@item emin
-Minimum exponent allowed for this format.
-@item Underflow behavior
-The format may or may not support gradual underflow.
-@item Rounding
-The rounding mode of this context.
-@end table
-
-@ref{table-ieee-formats} lists the precision and exponent
-field values for the basic IEEE-754 binary formats:
-
-@float Table,table-ieee-formats
-@caption{Basic IEEE Format Context Values}
-@multitable @columnfractions .20 .20 .20 .20 .20
-@headitem Name @tab Total bits @tab Precision @tab emin @tab emax
-@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127
-@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023
-@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383
-@end multitable
-@end float
-
-@quotation NOTE
-The precision numbers include the implied leading one that gives them
-one extra bit of significand.
-@end quotation
-
-A floating-point context can also determine which signals are treated
-as exceptions, and can set rules for arithmetic with special values.
-Please consult the IEEE-754 standard or other resources for details.
-
-@command{gawk} ordinarily uses the hardware double precision
-representation for numbers. On most systems, this is IEEE-754
-floating-point format, corresponding to 64-bit binary with 53 bits
-of precision.
-
-@quotation NOTE
-In case an underflow occurs, the standard allows, but does not require,
-the result from an arithmetic operation to be a number smaller than
-the smallest nonzero normalized number. Such numbers do
-not have as many significant digits as normal numbers, and are called
-@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero,
-is called @dfn{flush to zero}. The basic IEEE-754 binary formats
-support subnormal numbers.
-@end quotation
-
-@node Rounding Mode
-@subsection Floating-point Rounding Mode
-@cindex rounding mode, floating-point
-
-The @dfn{rounding mode} specifies the behavior for the results of numerical
-operations when discarding extra precision. Each rounding mode indicates
-how the least significant returned digit of a rounded result is to
-be calculated.
-@ref{table-rounding-modes} lists the IEEE-754 defined
-rounding modes:
-
-@float Table,table-rounding-modes
-@caption{IEEE 754 Rounding Modes}
-@multitable @columnfractions .45 .55
-@headitem Rounding Mode @tab IEEE Name
-@item Round to nearest, ties to even @tab @code{roundTiesToEven}
-@item Round toward plus Infinity @tab @code{roundTowardPositive}
-@item Round toward negative Infinity @tab @code{roundTowardNegative}
-@item Round toward zero @tab @code{roundTowardZero}
-@item Round to nearest, ties away from zero @tab @code{roundTiesToAway}
-@end multitable
-@end float
-
-The default mode @code{roundTiesToEven} is the most preferred,
-but the least intuitive. This method does the obvious thing for most values,
-by rounding them up or down to the nearest digit.
-For example, rounding 1.132 to two digits yields 1.13,
-and rounding 1.157 yields 1.16.
-
-However, when it comes to rounding a value that is exactly halfway between,
-things do not work the way you probably learned in school.
-In this case, the number is rounded to the nearest even digit.
-So rounding 0.125 to two digits rounds down to 0.12,
-but rounding 0.6875 to three digits rounds up to 0.688.
-You probably have already encountered this rounding mode when
-using the @code{printf} routine to format floating-point numbers.
-For example:
-
-@example
-BEGIN @{
- x = -4.5
- for (i = 1; i < 10; i++) @{
- x += 1.0
- printf("%4.1f => %2.0f\n", x, x)
- @}
-@}
-@end example
-
-@noindent
-produces the following output when run:@footnote{It
-is possible for the output to be completely different if the
-C library in your system does not use the IEEE-754 even-rounding
-rule to round halfway cases for @code{printf()}.}
-
-@example
--3.5 => -4
--2.5 => -2
--1.5 => -2
--0.5 => 0
- 0.5 => 0
- 1.5 => 2
- 2.5 => 2
- 3.5 => 4
- 4.5 => 4
-@end example
-
-The theory behind the rounding mode @code{roundTiesToEven} is that
-it more or less evenly distributes upward and downward rounds
-of exact halves, which might cause the round-off error
-to cancel itself out. This is the default rounding mode used
-in IEEE-754 computing functions and operators.
-
-The other rounding modes are rarely used.
-Round toward positive infinity (@code{roundTowardPositive})
-and round toward negative infinity (@code{roundTowardNegative})
-are often used to implement interval arithmetic,
-where you adjust the rounding mode to calculate upper and lower bounds
-for the range of output. The @code{roundTowardZero}
-mode can be used for converting floating-point numbers to integers.
-The rounding mode @code{roundTiesToAway} rounds the result to the
-nearest number and selects the number with the larger magnitude
-if a tie occurs.
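The five modes in the table can be compared side by side. The following is a minimal sketch in Python rather than @command{awk}; it assumes that the rounding constants of Python's @code{decimal} module are close enough stand-ins for the IEEE names, and the helper name @code{round_to_integer} is made up for illustration.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR,
                     ROUND_DOWN, ROUND_HALF_UP)

# IEEE 754 name -> closest rounding constant in Python's decimal module
# (this mapping is an assumption made for illustration).
MODES = {
    "roundTiesToEven": ROUND_HALF_EVEN,
    "roundTowardPositive": ROUND_CEILING,
    "roundTowardNegative": ROUND_FLOOR,
    "roundTowardZero": ROUND_DOWN,
    "roundTiesToAway": ROUND_HALF_UP,
}

def round_to_integer(text, ieee_name):
    """Round the decimal string `text` to an integer under the named mode."""
    rounded = Decimal(text).quantize(Decimal("1"), rounding=MODES[ieee_name])
    return int(rounded)

# Exact halves are where the modes disagree.
for name in MODES:
    print(name, [round_to_integer(v, name) for v in ("-2.5", "2.5", "3.5")])
```

Only the halfway inputs distinguish the two round-to-nearest modes; the three directed modes differ on every inexact value.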
-
-Some numerical analysts will tell you that your choice of rounding style
-has tremendous impact on the final outcome, and advise you to wait until
-final output for any rounding. Instead, you can often avoid round-off error problems by
-setting the precision initially to some value sufficiently larger than
-the final desired precision, so that the accumulation of round-off error
-does not influence the outcome.
-If you suspect that results from your computation are
-sensitive to accumulation of round-off error,
-one way to be sure is to look for a significant difference in output
-when you change the rounding mode.
-
-@node Gawk and MPFR
-@section @command{gawk} + MPFR = Powerful Arithmetic
-
-The rest of this @value{CHAPTER} describes how to use the arbitrary precision
-(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric
-capabilities in @command{gawk} to produce maximally accurate results
-when you need them.
-
-But first you should check if your version of
-@command{gawk} supports arbitrary precision arithmetic.
-The easiest way to find out is to look at the output of
-the following command:
-
-@example
-$ @kbd{gawk --version}
-@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3)
-@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation.
-@dots{}
-@end example
-
-@command{gawk} uses the
-@uref{http://www.mpfr.org, GNU MPFR}
-and
-@uref{http://gmplib.org, GNU MP} (GMP)
-libraries for arbitrary precision
-arithmetic on numbers. So if you do not see the names of these libraries
-in the output, then your version of @command{gawk} does not support
-arbitrary precision arithmetic.
-
-Additionally,
-there are a few elements available in the @code{PROCINFO} array
-to provide information about the MPFR and GMP libraries.
-@xref{Auto-set}, for more information.
-
-@ignore
-Even if you aren't interested in arbitrary precision arithmetic, you
-may still benefit from knowing about how @command{gawk} handles numbers
-in general, and the limitations of doing arithmetic with ordinary
-@command{gawk} numbers.
-@end ignore
-
-
-@node Arbitrary Precision Floats
-@section Arbitrary Precision Floating-point Arithmetic with @command{gawk}
-
-@command{gawk} uses the GNU MPFR library
-for arbitrary precision floating-point arithmetic. The MPFR library
-provides precise control over precisions and rounding modes, and gives
-correctly rounded reproducible platform-independent results. With the
-command-line option @option{--bignum} or @option{-M},
-all floating-point arithmetic operators and numeric functions can yield
-results to any desired precision level supported by MPFR.
-Two built-in
-variables @code{PREC}
-(@pxref{Setting Precision})
-and @code{ROUNDMODE}
-(@pxref{Setting Rounding Mode})
-provide control over the working precision and the rounding mode.
-The precision and the rounding mode are set globally for every operation
-to follow.
-
-The default working precision for arbitrary precision floating-point values is 53,
-and the default value for @code{ROUNDMODE} is @code{"N"},
-which selects the IEEE-754
-@code{roundTiesToEven} (@pxref{Rounding Mode}) rounding mode.@footnote{The
-default precision is 53, since according to the MPFR documentation,
-the library should be able to exactly reproduce all computations with
-double-precision machine floating-point numbers (@code{double} type
-in C), except the default exponent range is much wider and subnormal
-numbers are not implemented.}
-@command{gawk} uses the default exponent range in MPFR
-@iftex
-(@math{emax = 2^{30} - 1, emin = -emax})
-@end iftex
-@ifnottex
-(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax})
-@end ifnottex
-for all floating-point contexts.
-There is no explicit mechanism to adjust the exponent range.
-MPFR does not implement subnormal numbers by default,
-and this behavior cannot be changed in @command{gawk}.
-
-@quotation NOTE
-When emulating an IEEE-754 format (@pxref{Setting Precision}),
-@command{gawk} internally adjusts the exponent range
-to the value defined for the format and also performs computations needed for
-gradual underflow (subnormal numbers).
-@end quotation
-
-@quotation NOTE
-MPFR numbers are variable-size entities, consuming only as much space as
-needed to store the significant digits. Since the performance using MPFR
-numbers pales in comparison to doing arithmetic using the underlying machine
-types, you should consider using only as much precision as needed by
-your program.
-@end quotation
-
-@menu
-* Setting Precision:: Setting the working precision.
-* Setting Rounding Mode:: Setting the rounding mode.
-* Floating-point Constants:: Representing floating-point constants.
-* Changing Precision:: Changing the precision of a number.
-* Exact Arithmetic:: Exact arithmetic with floating-point numbers.
-@end menu
-
-@node Setting Precision
-@subsection Setting the Working Precision
-@cindex @code{PREC} variable
-
-@command{gawk} uses a global working precision; it does not keep track of
-the precision or accuracy of individual numbers. Performing an arithmetic
-operation or calling a built-in function rounds the result to the current
-working precision. The default working precision is 53 which can be
-modified using the built-in variable @code{PREC}. You can also set the
-value to one of the following pre-defined case-insensitive strings
-to emulate an IEEE-754 binary format:
-
-@multitable {@code{"double"}} {12345678901234567890123456789012345}
-@headitem @code{PREC} @tab IEEE-754 Binary Format
-@item @code{"half"} @tab 16-bit half-precision.
-@item @code{"single"} @tab Basic 32-bit single precision.
-@item @code{"double"} @tab Basic 64-bit double precision.
-@item @code{"quad"} @tab Basic 128-bit quadruple precision.
-@item @code{"oct"} @tab 256-bit octuple precision.
-@end multitable
-
-The following example illustrates the effects of changing precision
-on arithmetic operations:
-
-@example
-$ @kbd{gawk -M -vPREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \}
-> @kbd{PREC = "double"; print x + 0 @}'}
-@print{} 1e-400
-@print{} 0
-@end example
-
-Binary and decimal precisions are related approximately according to the
-formula:
-
-@iftex
-@math{prec = 3.322 @cdot dps}
-@end iftex
-@ifnottex
-@var{prec} = 3.322 * @var{dps}
-@end ifnottex
-
-@noindent
-Here, @var{prec} denotes the binary precision
-(measured in bits) and @var{dps} (short for decimal places)
-is the number of decimal digits. We can easily calculate how many decimal
-digits the 53-bit significand of an IEEE double is equivalent to:
-53 / 3.322 which is equal to about 15.95.
-But what does 15.95 digits actually mean? It depends whether you are
-concerned about how many digits you can rely on, or how many digits
-you need.
-
-It is important to know how many decimal digits it takes to uniquely identify
-a double-precision value (the C type @code{double}). If you want to
-convert from @code{double} to decimal and back to @code{double} (e.g.,
-saving a @code{double} representing an intermediate result to a file, and
-later reading it back to restart the computation), then a few more decimal
-digits are required. 17 digits is generally enough for a @code{double}.
-
-It can also be important to know what decimal numbers can be uniquely
-represented with a @code{double}. If you want to convert
-from decimal to @code{double} and back again, 15 digits is the most that
-you can get. Stated differently, you should not present
-the numbers from your floating-point computations with more than 15
-significant digits in them.
-
-Conversely, it takes a precision of 332 bits to hold an approximation
-of the constant @value{PI} that is accurate to 100 decimal places.
-You should always add some extra bits in order to avoid the confusing round-off
-issues that occur because numbers are stored internally in binary.
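The conversion factor 3.322 is the base-2 logarithm of 10, so the arithmetic above can be checked directly. This is a small Python sketch; the helper names @code{bits_for_digits} and @code{digits_for_bits} are hypothetical, and rounding the bit count up (instead of truncating to 332) is a choice made here to stay on the safe side.

```python
import math

BITS_PER_DIGIT = math.log2(10)      # about 3.322 bits per decimal digit

def bits_for_digits(dps):
    """Binary precision needed for `dps` decimal digits, rounded up."""
    return math.ceil(dps * BITS_PER_DIGIT)

def digits_for_bits(prec):
    """Decimal digits that `prec` bits of precision correspond to."""
    return prec / BITS_PER_DIGIT

print(round(BITS_PER_DIGIT, 3))         # 3.322
print(round(digits_for_bits(53), 2))    # 15.95
print(bits_for_digits(100))             # 333, one bit more than truncating
```

Rounding up gives 333 bits for 100 digits, which is consistent with the advice to add a few extra bits beyond the computed minimum.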
-
-@node Setting Rounding Mode
-@subsection Setting the Rounding Mode
-@cindex @code{ROUNDMODE} variable
-
-The @code{ROUNDMODE} variable provides
-program level control over the rounding mode.
-The correspondence between @code{ROUNDMODE} and the IEEE
-rounding modes is shown in @ref{table-gawk-rounding-modes}.
-
-@float Table,table-gawk-rounding-modes
-@caption{@command{gawk} Rounding Modes}
-@multitable @columnfractions .45 .30 .25
-@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE}
-@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"}
-@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"}
-@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"}
-@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"}
-@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"}
-@end multitable
-@end float
-
-@code{ROUNDMODE} has the default value @code{"N"},
-which selects the IEEE-754 rounding mode @code{roundTiesToEven}.
-Although @ref{table-gawk-rounding-modes} lists @code{"A"}, which selects
-the IEEE-754 mode @code{roundTiesToAway}, @command{gawk} honors it only
-if your version of the MPFR library supports it; otherwise setting
-@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode},
-for the meanings of the various rounding modes.
-
-Here is an example of how to change the default rounding behavior of
-@code{printf}'s output:
-
-@example
-$ @kbd{gawk -M -vROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'}
-@print{} 1.37
-@end example
-
-@node Floating-point Constants
-@subsection Representing Floating-point Constants
-@cindex constants, floating-point
-
-Be wary of floating-point constants! When reading a floating-point constant
-from program source code, @command{gawk} uses the default precision,
-unless overridden
-by an assignment to the special variable @code{PREC} on the command
-line, to store it internally as an MPFR number.
-Changing the precision using @code{PREC} in the program text does
-not change the precision of a constant. If you need to
-represent a floating-point constant at a higher precision than the
-default and cannot use a command line assignment to @code{PREC},
-you should either specify the constant as a string, or
-as a rational number whenever possible. The following example
-illustrates the differences among various ways to
-print a floating-point constant:
-
-@example
-$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'}
-@print{} 0.1000000000000000055511151
-$ @kbd{gawk -M -vPREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'}
-@print{} 0.1000000000000000000000000
-$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'}
-@print{} 0.1000000000000000000000000
-$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'}
-@print{} 0.1000000000000000000000000
-@end example
-
-In the first case, the number is stored with the default precision of 53.
-
-@node Changing Precision
-@subsection Changing the Precision of a Number
-
-@cindex Laurie, Dirk
-@quotation
-@i{The point is that in any variable-precision package,
-a decision is made on how to treat numbers given as data,
-or arising in intermediate results, which are represented in
-floating-point format to a precision lower than working precision.
-Do we promote them to full membership of the high-precision club,
-or do we treat them and all their associates as second-class citizens?
-Sometimes the first course is proper, sometimes the second, and it takes
-careful analysis to tell which.}
-
-Dirk Laurie@footnote{Dirk Laurie.
-@cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}.
-Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.}
-@end quotation
-
-@command{gawk} does not implicitly modify the precision of any previously
-computed results when the working precision is changed with an assignment
-to @code{PREC}. The precision of a number is always the one that was
-used at the time of its creation, and there is no way for the user
-to explicitly change it afterwards. However, since the result of a
-floating-point arithmetic operation is always an arbitrary precision
-floating-point value---with a precision set by the value of @code{PREC}---one of the
-following workarounds effectively accomplishes the desired behavior:
-
-@example
-x = x + 0.0
-@end example
-
-@noindent
-or:
-
-@example
-x += 0.0
-@end example
-
-@node Exact Arithmetic
-@subsection Exact Arithmetic with Floating-point Numbers
-
-@quotation CAUTION
-Never depend on the exactness of floating-point arithmetic,
-even for apparently simple expressions!
-@end quotation
-
-Can arbitrary precision arithmetic give exact results? There are
-no easy answers. The standard rules of algebra often do not apply
-when using floating-point arithmetic.
-Among other things, the distributive and associative laws
-do not hold completely, and order of operation may be important
-for your computation. Rounding error, cumulative precision loss
-and underflow are often troublesome.
-
-When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3}
-for equality
-using the machine double precision arithmetic, it decides that they
-are not equal!
-(@xref{Floating-point Programming}.)
-You can get the result you want by increasing the precision;
-56 in this case will get the job done:
-
-@example
-$ @kbd{gawk -M -vPREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
-@print{} 1
-@end example
-
-If adding more bits is good, perhaps adding even more bits of
-precision is better?
-Here is what happens if we use an even larger value of @code{PREC}:
-
-@example
-$ @kbd{gawk -M -vPREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
-@print{} 0
-@end example
-
-This is not a bug in @command{gawk} or in the MPFR library.
-It is easy to forget that the finite number of bits used to store the value
-is often just an approximation after proper rounding.
-The test for equality succeeds if and only if @emph{all} bits in the two operands
-are exactly the same. Since this is not necessarily true after floating-point
-computations with a particular precision and effective rounding rule,
-a straight test for equality may not work.
-
-So, don't assume that floating-point values can be compared for equality.
-You should also exercise caution when using other forms of comparisons.
-The standard way to compare floating-point numbers is to determine
-how much error (or @dfn{tolerance}) you will allow in a comparison and
-check to see if one value is within this error range of the other.
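A tolerance comparison of this kind can be sketched as follows. The example is in Python, whose @code{float} is assumed to be the same hardware double that @command{awk} uses; the function name @code{nearly_equal} and the default tolerance of 1e-9 are illustrative choices, not values taken from @command{gawk}.

```python
def nearly_equal(a, b, rel_tol=1e-9, abs_tol=0.0):
    """True if a and b differ by no more than the allowed tolerance."""
    # Scale the relative tolerance by the larger magnitude; abs_tol
    # covers comparisons against values at or near zero.
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

total = 0.1 + 12.2
print(total == 12.3)               # False: the two doubles differ in the last bit
print(nearly_equal(total, 12.3))   # True: well within the tolerance
```

The right tolerance depends on the problem; a relative tolerance suits values of widely varying magnitude, while an absolute one is needed when comparing against zero.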
-
-In applications where 15 or fewer decimal places suffice,
-hardware double precision arithmetic can be adequate, and is usually much faster.
-But you do need to keep in mind that every floating-point operation
-can suffer a new rounding error with catastrophic consequences as illustrated
-by our attempt to compute the value of the constant @value{PI}
-(@pxref{Floating-point Programming}).
-Extra precision can greatly enhance the stability and the accuracy
-of your computation in such cases.
-
-Repeated addition is not necessarily equivalent to multiplication
-in floating-point arithmetic. In the example in
-@ref{Floating-point Programming}:
-
-@example
-$ @kbd{gawk 'BEGIN @{}
-> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)}
-> @kbd{i++}
-> @kbd{print i}
-> @kbd{@}'}
-@print{} 4
-@end example
-
-@noindent
-you may or may not succeed in getting the correct result by choosing
-an arbitrarily large value for @code{PREC}. Reformulation of
-the problem at hand is often the correct approach in such situations.
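One possible reformulation, sketched here in Python under the assumption that its @code{float} behaves like the double used by @command{awk}, is to count with integers and derive the floating-point value from the counter instead of accumulating it. The function names are invented for this illustration.

```python
def count_by_accumulation():
    """Mirror of the loop above: repeatedly add 0.1 to a double."""
    count = 0
    d = 1.1
    while d <= 1.5:
        count += 1
        d += 0.1          # each addition can introduce a rounding error
    return count

def count_by_index():
    """Reformulated loop: iterate over exact integers (tenths)."""
    count = 0
    for tenths in range(11, 16):    # 11, 12, ..., 15
        d = tenths / 10             # a single rounding per value, no accumulation
        if d <= 1.5:
            count += 1
    return count

print(count_by_accumulation())   # 4
print(count_by_index())          # 5
```

The integer counter is exact, so the loop runs the intended five times no matter how the individual tenths round.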
-
-@node Arbitrary Precision Integers
-@section Arbitrary Precision Integer Arithmetic with @command{gawk}
-@cindex integer, arbitrary precision
-
-If the option @option{--bignum} or @option{-M} is specified,
-@command{gawk} performs all
-integer arithmetic using GMP arbitrary precision integers.
-Any number that looks like an integer in a program source or data file
-is stored as an arbitrary precision integer.
-The size of the integer is limited only by your computer's memory.
-The current floating-point context has no effect on operations involving integers.
-For example, the following computes
-@iftex
-@math{5^{4^{3^{2}}}},
-@end iftex
-@ifnottex
-5^4^3^2,
-@end ifnottex
-the result of which is beyond the
-limits of ordinary @command{gawk} numbers:
-
-@example
-$ @kbd{gawk -M 'BEGIN @{}
-> @kbd{x = 5^4^3^2}
-> @kbd{print "# of digits =", length(x)}
-> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)}
-> @kbd{@}'}
-@print{} # of digits = 183231
-@print{} 62060698786608744707 ... 92256259918212890625
-@end example
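The figures in this output can be cross-checked without building the full number. The sketch below uses Python, where @code{**} is right-associative like @code{^} in @command{awk}; the digit count comes from a logarithm and the trailing digits from modular exponentiation, both of which are shortcuts chosen for this illustration and not what @command{gawk} does internally.

```python
import math

exponent = 4 ** 3 ** 2      # right-associative: 4 ** (3 ** 2) == 262144

# Number of decimal digits in 5 ** exponent, from its base-10 logarithm.
digit_count = math.floor(exponent * math.log10(5)) + 1

# Low 20 decimal digits only, via three-argument pow (modular power).
last_twenty = pow(5, exponent, 10 ** 20)

print("# of digits =", digit_count)
print("...", last_twenty)
```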
-
-If you were to compute the same value using arbitrary precision
-floating-point values instead, the precision needed for correct output
-(using the formula
-@iftex
-@math{prec = 3.322 @cdot dps}),
-would be @math{3.322 @cdot 183231},
-@end iftex
-@ifnottex
-@samp{prec = 3.322 * dps}),
-would be 3.322 x 183231,
-@end ifnottex
-or 608693.
-(Thus, the precision in bits is more than three times the number of
-decimal digits in the result.)
-
-The result from an arithmetic operation with an integer and a floating-point value
-is a floating-point value with a precision equal to the working precision.
-The following program calculates the eighth term in
-Sylvester's sequence@footnote{Weisstein, Eric W.
-@cite{Sylvester's Sequence}. From MathWorld---A Wolfram Web Resource.
-@url{http://mathworld.wolfram.com/SylvestersSequence.html}}
-using a recurrence:
-
-@example
-$ @kbd{gawk -M 'BEGIN @{}
-> @kbd{s = 2.0}
-> @kbd{for (i = 1; i <= 7; i++)}
-> @kbd{s = s * (s - 1) + 1}
-> @kbd{print s}
-> @kbd{@}'}
-@print{} 113423713055421845118910464
-@end example
-
-The output differs from the actual number, 113423713055421844361000443,
-because the default precision of 53 is not enough to represent the
-floating-point results exactly. You can either increase the precision
-(100 is enough in this case), or replace the floating-point constant
-@samp{2.0} with an integer, to perform all computations using integer
-arithmetic to get the correct output.
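The effect of the starting constant can be reproduced in Python, where integers have arbitrary precision and @code{float} has a 53-bit significand. This is only an analogy to @command{gawk}'s GMP integers and default MPFR precision, and the function name @code{sylvester} is made up for the example.

```python
def sylvester(n_terms, start):
    """Return term number `n_terms` of the recurrence s = s * (s - 1) + 1."""
    s = start
    for _ in range(n_terms - 1):
        s = s * (s - 1) + 1
    return s

exact = sylvester(8, 2)       # integer arithmetic throughout: exact
approx = sylvester(8, 2.0)    # floating point: rounded to 53 bits at each step

print(exact)                  # 113423713055421844361000443
print(int(approx) == exact)   # False
```

Only the type of the starting value changes between the two calls, which is the same switch as replacing @samp{2.0} with @samp{2} in the program above.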
-
-It will sometimes be necessary for @command{gawk} to implicitly convert an
-arbitrary precision integer into an arbitrary precision floating-point value.
-This is primarily because the MPFR library does not always provide the
-relevant interface to process arbitrary precision integers or mixed-mode
-numbers as needed by an operation or function.
-In such a case, the precision is set to the minimum value necessary
-for exact conversion, and the working precision is not used for this purpose.
-If this is not what you need or want, you can employ a subterfuge
-like this:
-
-@example
-gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}'
-@end example
-
-You can avoid this issue altogether by specifying the number as a floating-point value
-to begin with:
-
-@example
-gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}'
-@end example
-
-Note that for the particular example above, it is likely best
-to just use the following:
-
-@example
-gawk -M 'BEGIN @{ n = 13; print n % 2 @}'
-@end example
-
@node Advanced Features
@chapter Advanced Features of @command{gawk}
@cindex advanced features, network connections, See Also networks, connections
@@ -27939,6 +26723,1229 @@ The @command{gawk} debugger only accepts source supplied with the @option{-f} op
Look forward to a future release when these and other missing features may
be added, and of course feel free to try to add them yourself!
+@node Arbitrary Precision Arithmetic
+@chapter Arithmetic and Arbitrary Precision Arithmetic with @command{gawk}
+@cindex arbitrary precision
+@cindex multiple precision
+@cindex infinite precision
+@cindex floating-point numbers, arbitrary precision
+@cindex MPFR
+@cindex GMP
+
+@cindex Knuth, Donald
+@quotation
+@i{There's a credibility gap: We don't know how much of the computer's answers
+to believe. Novice computer users solve this problem by implicitly trusting
+in the computer as an infallible authority; they tend to believe that all
+digits of a printed answer are significant. Disillusioned computer users have
+just the opposite approach; they are constantly afraid that their answers
+are almost meaningless.}@*
+Donald Knuth@footnote{Donald E.@: Knuth.
+@cite{The Art of Computer Programming}. Volume 2,
+@cite{Seminumerical Algorithms}, third edition,
+1998, ISBN 0-201-89683-4, p.@: 229.}
+@end quotation
+
+This @value{CHAPTER} discusses issues that you may encounter
+when performing arithmetic. It begins by discussing some of
+the general attributes of computer arithmetic, along with how
+this can influence what you see when running @command{awk} programs.
+This discussion applies to all versions of @command{awk}.
+
+Then the discussion moves on to @dfn{arbitrary precision
+arithmetic}, a feature which is specific to @command{gawk}.
+
+@menu
+* General Arithmetic:: An introduction to computer arithmetic.
+* Floating-point Programming:: Effective Floating-point Programming.
+* Gawk and MPFR:: How @command{gawk} provides
+ arbitrary-precision arithmetic.
+* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic
+ with @command{gawk}.
+* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with
+ @command{gawk}.
+@end menu
+
+@node General Arithmetic
+@section A General Description of Computer Arithmetic
+
+@cindex integers
+@cindex floating-point, numbers
+@cindex numbers, floating-point
+Within computers, there are two kinds of numeric values: @dfn{integers}
+and @dfn{floating-point}.
+In school, integer values were referred to as ``whole'' numbers---that is,
+numbers without any fractional part, such as 1, 42, or @minus{}17.
+The advantage to integer numbers is that they represent values exactly.
+The disadvantage is that their range is limited. On most systems,
+this range is @minus{}2,147,483,648 to 2,147,483,647.
+However, many systems now support a range from
+@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
+
+@cindex unsigned integers
+@cindex integers, unsigned
+Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}.
+Signed values may be negative or positive, with the range of values just
+described.
+Unsigned values are never negative. On most systems,
+the range is from 0 to 4,294,967,295.
+However, many systems now support a range from
+0 to 18,446,744,073,709,551,615.
+
+@cindex double precision floating-point
+@cindex single precision floating-point
+Floating-point numbers represent what are called ``real'' numbers; i.e.,
+those that do have a fractional part, such as 3.1415927.
+The advantage to floating-point numbers is that they
+can represent a much larger range of values.
+The disadvantage is that there are numbers that they cannot represent
+exactly.
+@command{awk} uses @dfn{double precision} floating-point numbers, which
+can hold more digits than @dfn{single precision}
+floating-point numbers.
+@c Floating-point issues are discussed more fully in
+@c @ref{Floating Point Issues}.
+
+There are several important issues to be aware of, described next.
+
+@menu
+* Floating Point Issues:: Stuff to know about floating-point numbers.
+* Integer Programming:: Effective integer programming.
+@end menu
+
+@node Floating Point Issues
+@subsection Floating-Point Number Caveats
+
+As mentioned earlier, floating-point numbers represent what are called
+``real'' numbers, i.e., those that have a fractional part. @command{awk}
+uses double precision floating-point numbers to represent all
+numeric values. This @value{SECTION} describes some of the issues
+involved in using floating-point numbers.
+
+There is a very nice
+@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic}
+by David Goldberg,
+``What Every Computer Scientist Should Know About Floating-point Arithmetic,''
+@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48.
+This is worth reading if you are interested in the details,
+but it does require a background in computer science.
+
+@menu
+* String Conversion Precision:: The String Value Can Lie.
+* Unexpected Results:: Floating Point Numbers Are Not Abstract
+ Numbers.
+* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+@end menu
+
+@node String Conversion Precision
+@subsubsection The String Value Can Lie
+
+Internally, @command{awk} keeps both the numeric value
+(double precision floating-point) and the string value for a variable.
+Separately, @command{awk} keeps
+track of what type the variable has
+(@pxref{Typing and Comparison}),
+which plays a role in how variables are used in comparisons.
+
+It is important to note that the string value for a number may not
+reflect the full value (all the digits) that the numeric value
+actually contains.
+The following program (@file{values.awk}) illustrates this:
+
+@example
+@{
+ sum = $1 + $2
+ # see it for what it is
+ printf("sum = %.12g\n", sum)
+ # use CONVFMT
+ a = "<" sum ">"
+ print "a =", a
+ # use OFMT
+ print "sum =", sum
+@}
+@end example
+
+@noindent
+This program shows the full value of the sum of @code{$1} and @code{$2}
+using @code{printf}, and then prints the string values obtained
+from both automatic conversion (via @code{CONVFMT}) and
+from printing (via @code{OFMT}).
+
+Here is what happens when the program is run:
+
+@example
+$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk}
+@print{} sum = 4.8888888
+@print{} a = <4.88889>
+@print{} sum = 4.88889
+@end example
+
+This makes it clear that the full numeric value is different from
+what the default string representations show.
+
+@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with
+at most six significant digits. For some applications, you might want to
+change it to specify more precision.
+On most modern machines, most of the time,
+17 digits is enough to capture a floating-point number's
+value exactly.@footnote{Pathological cases can require up to
+752 digits (!), but we doubt that you need to worry about this.}
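The difference between six and 17 significant digits can be seen by converting a number to a string and back. The sketch uses Python's @code{%} formatting, which follows the same C conventions as @code{CONVFMT}; treating a Python @code{float} as equivalent to an @command{awk} number is an assumption that holds on IEEE-754 systems.

```python
total = 3.654321 + 1.2345678

six = "%.6g" % total            # what the default CONVFMT would produce
seventeen = "%.17g" % total     # enough digits to identify the double uniquely

print(six)                          # 4.88889
print(float(six) == total)          # False: digits were lost in the string
print(float(seventeen) == total)    # True: the value survives the round trip
```

Seventeen digits always round-trip an IEEE double, at the cost of string values that are longer and less pleasant to read.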
+
+@node Unexpected Results
+@subsubsection Floating Point Numbers Are Not Abstract Numbers
+
+@cindex floating-point, numbers
+Unlike numbers in the abstract sense (such as what you studied in high school
+or college arithmetic), numbers stored in computers are limited in certain ways.
+They cannot represent an infinite number of digits, nor can they always
+represent things exactly.
+In particular,
+floating-point numbers cannot
+always represent values exactly. Here is an example:
+
+@example
+$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'}
+515.79
+@print{} 0000051579
+515.80
+@print{} 0000051579
+515.81
+@print{} 0000051580
+515.82
+@print{} 0000051582
+@kbd{@value{CTL}-d}
+@end example
+
+@noindent
+This shows that some values can be represented exactly,
+whereas others are only approximated. This is not a ``bug''
+in @command{awk}, but simply an artifact of how computers
+represent numbers.
+
+@quotation NOTE
+It cannot be emphasized enough that the behavior just
+described is fundamental to modern computers. You will
+see this kind of thing happen in @emph{any} programming
+language using hardware floating-point numbers. It is @emph{not}
+a bug in @command{gawk}, nor is it something that can be ``just
+fixed.''
+@end quotation
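The same effect appears in any language that uses hardware doubles. The following Python sketch looks at the second input above, 515.80: it prints the value that is actually stored, then scales it by 100 and converts to an integer by truncation (as @samp{%d} does) and by rounding. Using Python's @code{decimal.Decimal} to display the stored value is a convenience of this example.

```python
from decimal import Decimal

stored = 515.80
print(Decimal(stored))    # the exact binary value, slightly below 515.80

scaled = stored * 100
print(int(scaled))        # 51579: truncation drops the almost-whole fraction
print(round(scaled))      # 51580: rounding to nearest recovers the intent
```

Rounding before converting to an integer is the usual remedy when the data are known to have a fixed number of decimal places.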
+
+@cindex negative zero
+@cindex positive zero
+@cindex zero@comma{} negative vs.@: positive
+Another peculiarity of floating-point numbers on modern systems
+is that they often have more than one representation for the number zero!
+In particular, it is possible to represent ``minus zero'' as well as
+regular, or ``positive'' zero.
+
+This example shows that negative and positive zero are distinct values
+when stored internally, but that they are in fact equal to each other,
+as well as to ``regular'' zero:
+
+@example
+$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0}
+> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz}
+> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0}
+> @kbd{@}'}
+@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1
+@print{} mz == 0 -> 1, pz == 0 -> 1
+@end example
+
+It helps to keep this in mind should you process numeric data
+that contains negative zero values; the fact that the zero is negative
+is noted and can affect comparisons.
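The sign of a zero can be inspected without relying on how it prints. This is a small Python sketch, assuming IEEE-754 doubles; @code{math.copysign} transfers only the sign bit, so it distinguishes the two zeros even though they compare equal.

```python
import math

minus_zero = -0.0
plus_zero = 0.0

print(minus_zero == plus_zero)             # True: equal as numbers
print(math.copysign(1.0, minus_zero))      # -1.0: the sign bit is set
print(math.copysign(1.0, plus_zero))       # 1.0
```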
+
+@node POSIX Floating Point Problems
+@subsubsection Standards Versus Existing Practice
+
+Historically, @command{awk} has converted any non-numeric looking string
+to the numeric value zero, when required. Furthermore, the original
+definition of the language and the original POSIX standards specified that
+@command{awk} only understands decimal numbers (base 10), and not octal
+(base 8) or hexadecimal numbers (base 16).
+
+Changes in the language of the
+2001 and 2004 POSIX standards can be interpreted to imply that @command{awk}
+should support additional features. These features are:
+
+@itemize @bullet
+@item
+Interpretation of floating point data values specified in hexadecimal
+notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not}
+source code constants.)
+
+@item
+Support for the special IEEE 754 floating point values ``Not A Number''
+(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf'').
+In particular, the format for these values is as specified by the ISO 1999
+C standard, which ignores case and can allow machine-dependent additional
+characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}.
+@end itemize
+
+The first problem is that both of these are clear changes to historical
+practice:
+
+@itemize @bullet
+@item
+The @command{gawk} maintainer feels that supporting hexadecimal floating
+point values, in particular, is ugly, and was never intended by the
+original designers to be part of the language.
+
+@item
+Allowing completely alphabetic strings to have valid numeric
+values is also a very severe departure from historical practice.
+@end itemize
+
+The second problem is that the @command{gawk} maintainer feels that this
+interpretation of the standard, which requires a certain amount of
+``language lawyering'' to arrive at in the first place, was not even
+intended by the standard developers. In other words, ``we see how you
+got where you are, but we don't think that that's where you want to be.''
+
+Recognizing the above issues, but attempting to provide compatibility
+with the earlier versions of the standard,
+the 2008 POSIX standard added explicit wording to allow, but not require,
+that @command{awk} support hexadecimal floating point values and
+special values for ``Not A Number'' and infinity.
+
+Although the @command{gawk} maintainer continues to feel that
+providing those features is inadvisable,
+nevertheless, on systems that support IEEE floating point, it seems
+reasonable to provide @emph{some} way to support NaN and Infinity values.
+The solution implemented in @command{gawk} is as follows:
+
+@itemize @bullet
+@item
+With the @option{--posix} command-line option, @command{gawk} becomes
+``hands off.'' String values are passed directly to the system library's
+@code{strtod()} function, and if it successfully returns a numeric value,
+that is what's used.@footnote{You asked for it, you got it.}
+By definition, the results are not portable across
+different systems. They are also a little surprising:
+
+@example
+$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'}
+@print{} nan
+$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'}
+@print{} 3735928559
+@end example
+
+@item
+Without @option{--posix}, @command{gawk} interprets the four strings
+@samp{+inf},
+@samp{-inf},
+@samp{+nan},
+and
+@samp{-nan}
+specially, producing the corresponding special numeric values.
+The leading sign acts as a signal to @command{gawk} (and the user)
+that the value is really numeric. Hexadecimal floating point is
+not supported (unless you also use @option{--non-decimal-data},
+which is @emph{not} recommended). For example:
+
+@example
+$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'}
+@print{} 0
+$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'}
+@print{} nan
+$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'}
+@print{} 0
+@end example
+
+@command{gawk} does ignore case in the four special values.
+Thus @samp{+nan} and @samp{+NaN} are the same.
+@end itemize
+
+@node Integer Programming
+@subsection Mixing Integers And Floating-point
+
+As has been mentioned already, @command{gawk} ordinarily uses hardware double
+precision with 64-bit IEEE binary floating-point representation
+for numbers on most systems. A large integer like 9007199254740997
+has a binary representation that, although finite, is more than 53 bits long;
+it must also be rounded to 53 bits.
+The biggest integer that can be stored in a C @code{double} is usually the same
+as the largest possible value of a @code{double}. If your system @code{double}
+is an IEEE 64-bit @code{double}, this largest possible value is an integer and
+can be represented precisely. What more should one know about integers?
+
+If you want to know what is the largest integer, such that it and
+all smaller integers can be stored in 64-bit doubles without losing precision,
+then the answer is
+@iftex
+@math{2^{53}}.
+@end iftex
+@ifnottex
+2^53.
+@end ifnottex
+The next representable number is the even number
+@iftex
+@math{2^{53} + 2},
+@end iftex
+@ifnottex
+2^53 + 2,
+@end ifnottex
+meaning it is unlikely that you will be able to make
+@command{gawk} print
+@iftex
+@math{2^{53} + 1}
+@end iftex
+@ifnottex
+2^53 + 1
+@end ifnottex
+in integer format.
+The range of integers exactly representable by a 64-bit double
+is
+@iftex
+@math{[-2^{53}, 2^{53}]}.
+@end iftex
+@ifnottex
+[@minus{}2^53, 2^53].
+@end ifnottex
+If you ever see an integer outside this range in @command{gawk}
+using 64-bit doubles, you have reason to be very suspicious about
+the accuracy of the output. Here is a simple program with erroneous output:
+
+@example
+$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'}
+@print{} 9007199254740991
+@print{} 9007199254740992
+@print{} 9007199254740992
+@print{} 9007199254740994
+@end example
+
+The lesson is to not assume that any large integer printed by @command{gawk}
+represents an exact result from your computation, especially if it wraps
+around on your screen.
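+
+As a minimal sketch of how a program might guard against this, the
+following function reports whether a value is an integer inside the
+exactly representable range.  The name @code{is_safe_int()} is
+hypothetical, and the code assumes ordinary 64-bit IEEE doubles:
+
+@example
+# is_safe_int: return 1 if x is an integer in [-2^53, 2^53], else 0.
+# Assumes hardware IEEE 64-bit doubles (that is, gawk run without -M).
+function is_safe_int(x)
+@{
+    return (x == int(x) && x >= -(2^53) && x <= 2^53)
+@}
+@end example
+
+@noindent
+A value that passes this test is not guaranteed to be correct, since
+an earlier step may already have been rounded, but a value that fails
+it deserves a closer look.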
+
+@node Floating-point Programming
+@section Understanding Floating-point Programming
+
+Numerical programming is an extensive area; if you need to develop
+sophisticated numerical algorithms then @command{gawk} may not be
+the ideal tool, and this documentation may not be sufficient.
+@c FIXME: JOHN: Do you want to cite some actual books?
+It might require digesting a book or two to really internalize how to compute
+with ideal accuracy and precision,
+and the result often depends on the particular application.
+
+@quotation NOTE
+A floating-point calculation's @dfn{accuracy} is how close it comes
+to the real value. This is as opposed to the @dfn{precision}, which
+usually refers to the number of bits used to represent the number
+(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision,
+the Wikipedia article} for more information).
+@end quotation
+
+There are two options for doing floating-point calculations:
+hardware floating-point (as used by standard @command{awk} and
+the default for @command{gawk}), and @dfn{arbitrary-precision}
+floating-point, which is software based. This @value{CHAPTER}
+aims to provide enough information to understand both, and then
+will focus on @command{gawk}'s facilities for the latter.@footnote{If you
+are interested in other tools that perform arbitrary precision arithmetic,
+you may want to investigate the POSIX @command{bc} tool. See
+@uref{http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html,
+the POSIX specification for it}, for more information.}
+
+Binary floating-point representations and arithmetic are inexact.
+Simple values like 0.1 cannot be precisely represented using
+binary floating-point numbers, and the limited precision of
+floating-point numbers means that slight changes in
+the order of operations or the precision of intermediate storage
+can change the result. To make matters worse, with arbitrary precision
+floating-point, you can set the precision before starting a computation,
+but then you cannot be sure of the number of significant decimal places
+in the final result.
+
+Sometimes, before you start to write any code, you should think more
+about what you really want and what's really happening. Consider the
+two numbers in the following example:
+
+@example
+x = 0.875 # 1/2 + 1/4 + 1/8
+y = 0.425
+@end example
+
+Unlike the number in @code{y}, the number stored in @code{x}
+is exactly representable
+in binary since it can be written as a finite sum of one or
+more fractions whose denominators are all powers of two.
+When @command{gawk} reads a floating-point number from
+program source, it automatically rounds that number to whatever
+precision your machine supports. If you try to print the numeric
+content of a variable using an output format string of @code{"%.17g"},
+it may not produce the same number as you assigned to it:
+
+@example
+$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425}
+> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'}
+@print{} 0.875, 0.42499999999999999
+@end example
+
+Often the error is so small you do not even notice it, and if you do,
+you can always specify how much precision you would like in your output.
+Usually this is a format string like @code{"%.15g"}, which when
+used in the previous example, produces an output identical to the input.
+
+Because the underlying representation can be a little bit off from the exact value,
+comparing floating-point values to see if they are equal is generally not a good idea.
+Here is an example where it does not work like you expect:
+
+@example
+$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
+@print{} 0
+@end example
+
+The loss of accuracy during a single computation with floating-point numbers
+usually isn't enough to worry about. However, if you compute a value
+which is the result of a sequence of floating point operations,
+the error can accumulate and greatly affect the computation itself.
+Here is an attempt to compute the value of the constant
+@value{PI} using one of its many series representations:
+
+@example
+BEGIN @{
+ x = 1.0 / sqrt(3.0)
+ n = 6
+ for (i = 1; i < 30; i++) @{
+ n = n * 2.0
+ x = (sqrt(x * x + 1) - 1) / x
+ printf("%.15f\n", n * x)
+ @}
+@}
+@end example
+
+When run, the early errors propagating through later computations
+cause the loop to terminate prematurely after an attempt to divide by zero.
+
+@example
+$ @kbd{gawk -f pi.awk}
+@print{} 3.215390309173475
+@print{} 3.159659942097510
+@print{} 3.146086215131467
+@print{} 3.142714599645573
+@dots{}
+@print{} 3.224515243534819
+@print{} 2.791117213058638
+@print{} 0.000000000000000
+@error{} gawk: pi.awk:6: fatal: division by zero attempted
+@end example
+
+Here is one more example where the inaccuracies in internal representations
+yield an unexpected result:
+
+@example
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)}
+> @kbd{i++}
+> @kbd{print i}
+> @kbd{@}'}
+@print{} 4
+@end example
+
+Can computation using arbitrary precision help with the previous examples?
+If you are impatient to know, see
+@ref{Exact Arithmetic}.
+
+Instead of arbitrary precision floating-point arithmetic,
+often all you need is an adjustment of your logic
+or a different order for the operations in your calculation.
+The stability and the accuracy of the computation of the constant @value{PI}
+in the previous example can be enhanced by using the following
+simple algebraic transformation:
+
+@example
+(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1)
+@end example
+
+@noindent
+After making this change, the program does converge to
+@value{PI} in under 30 iterations:
+
+@example
+$ @kbd{gawk -f /tmp/pi2.awk}
+@print{} 3.215390309173473
+@print{} 3.159659942097501
+@print{} 3.146086215131436
+@print{} 3.142714599645370
+@print{} 3.141873049979825
+@dots{}
+@print{} 3.141592653589797
+@print{} 3.141592653589797
+@end example
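+
+For reference, here is a sketch of what the modified program might look
+like, assuming the only change from the earlier version is the
+transformed expression inside the loop:
+
+@example
+BEGIN @{
+    x = 1.0 / sqrt(3.0)
+    n = 6
+    for (i = 1; i < 30; i++) @{
+        n = n * 2.0
+        # Adding 1 to the square root avoids subtracting two nearly
+        # equal values, which is where the first version lost accuracy.
+        x = x / (sqrt(x * x + 1) + 1)
+        printf("%.15f\n", n * x)
+    @}
+@}
+@end example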
+
+There is no need to be unduly suspicious about the results from
+floating-point arithmetic. The lesson to remember is that
+floating-point arithmetic is always more complex than the arithmetic using
+pencil and paper. In order to take advantage of the power
+of computer floating-point, you need to know its limitations
+and work within them. For most casual use of floating-point arithmetic,
+you will often get the expected result in the end if you simply round
+the display of your final results to the correct number of significant
+decimal digits. And, avoid presenting numerical data in a manner that
+implies better precision than is actually the case.
+
+@menu
+* Floating-point Representation:: Binary floating-point representation.
+* Floating-point Context:: Floating-point context.
+* Rounding Mode:: Floating-point rounding mode.
+@end menu
+
+@node Floating-point Representation
+@subsection Binary Floating-point Representation
+@cindex IEEE-754 format
+
+Although floating-point representations vary from machine to machine,
+the most commonly encountered representation is that defined by the
+IEEE 754 Standard. An IEEE-754 format value has three components:
+
+@itemize @bullet
+@item
+A sign bit telling whether the number is positive or negative.
+
+@item
+An @dfn{exponent} giving its order of magnitude, @var{e}.
+
+@item
+A @dfn{significand}, @var{s},
+specifying the actual digits of the number.
+@end itemize
+
+The value of the
+number is then
+@iftex
+@math{s @cdot 2^e}.
+@end iftex
+@ifnottex
+@var{s} * 2^@var{e}.
+@end ifnottex
+The first bit of a non-zero binary significand
+is always one, so the significand in an IEEE-754 format only includes the
+fractional part, leaving the leading one implicit.
+
+Three of the standard IEEE-754 types are 32-bit single precision,
+64-bit double precision and 128-bit quadruple precision.
+The standard also specifies extended precision formats
+to allow greater precisions and larger exponent ranges.
+
+The significand is stored in @dfn{normalized} format,
+which means that the first bit is always a one.
+
+@node Floating-point Context
+@subsection Floating-point Context
+@cindex context, floating-point
+
+A floating-point @dfn{context} defines the environment for arithmetic operations.
+It governs precision, sets rules for rounding, and limits the range for exponents.
+The context has the following primary components:
+
+@table @dfn
+@item Precision
+Precision of the floating-point format in bits.
+@item emax
+Maximum exponent allowed for this format.
+@item emin
+Minimum exponent allowed for this format.
+@item Underflow behavior
+The format may or may not support gradual underflow.
+@item Rounding
+The rounding mode of this context.
+@end table
+
+@ref{table-ieee-formats} lists the precision and exponent
+field values for the basic IEEE-754 binary formats:
+
+@float Table,table-ieee-formats
+@caption{Basic IEEE Format Context Values}
+@multitable @columnfractions .20 .20 .20 .20 .20
+@headitem Name @tab Total bits @tab Precision @tab emin @tab emax
+@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127
+@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023
+@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383
+@end multitable
+@end float
+
+@quotation NOTE
+The precision numbers include the implied leading one that gives them
+one extra bit of significand.
+@end quotation
+
+A floating-point context can also determine which signals are treated
+as exceptions, and can set rules for arithmetic with special values.
+Please consult the IEEE-754 standard or other resources for details.
+
+@command{gawk} ordinarily uses the hardware double precision
+representation for numbers. On most systems, this is IEEE-754
+floating-point format, corresponding to 64-bit binary with 53 bits
+of precision.
+
+@quotation NOTE
+In case an underflow occurs, the standard allows, but does not require,
+the result from an arithmetic operation to be a number smaller than
+the smallest nonzero normalized number. Such numbers do
+not have as many significant digits as normal numbers, and are called
+@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero,
+is called @dfn{flush to zero}. The basic IEEE-754 binary formats
+support subnormal numbers.
+@end quotation
+
+@node Rounding Mode
+@subsection Floating-point Rounding Mode
+@cindex rounding mode, floating-point
+
+The @dfn{rounding mode} specifies the behavior for the results of numerical
+operations when discarding extra precision. Each rounding mode indicates
+how the least significant returned digit of a rounded result is to
+be calculated.
+@ref{table-rounding-modes} lists the IEEE-754 defined
+rounding modes:
+
+@float Table,table-rounding-modes
+@caption{IEEE 754 Rounding Modes}
+@multitable @columnfractions .45 .55
+@headitem Rounding Mode @tab IEEE Name
+@item Round to nearest, ties to even @tab @code{roundTiesToEven}
+@item Round toward plus Infinity @tab @code{roundTowardPositive}
+@item Round toward negative Infinity @tab @code{roundTowardNegative}
+@item Round toward zero @tab @code{roundTowardZero}
+@item Round to nearest, ties away from zero @tab @code{roundTiesToAway}
+@end multitable
+@end float
+
+The default mode @code{roundTiesToEven} is the most preferred,
+but the least intuitive. This method does the obvious thing for most values,
+by rounding them up or down to the nearest digit.
+For example, rounding 1.132 to two digits yields 1.13,
+and rounding 1.157 yields 1.16.
+
+However, when it comes to rounding a value that is exactly halfway between,
+things do not work the way you probably learned in school.
+In this case, the number is rounded to the nearest even digit.
+So rounding 0.125 to two digits rounds down to 0.12,
+but rounding 0.6875 to three digits rounds up to 0.688.
+You probably have already encountered this rounding mode when
+using the @code{printf} routine to format floating-point numbers.
+For example:
+
+@example
+BEGIN @{
+ x = -4.5
+ for (i = 1; i < 10; i++) @{
+ x += 1.0
+ printf("%4.1f => %2.0f\n", x, x)
+ @}
+@}
+@end example
+
+@noindent
+produces the following output when run:@footnote{It
+is possible for the output to be completely different if the
+C library in your system does not use the IEEE-754 even-rounding
+rule to round halfway cases for @code{printf()}.}
+
+@example
+-3.5 => -4
+-2.5 => -2
+-1.5 => -2
+-0.5 => 0
+ 0.5 => 0
+ 1.5 => 2
+ 2.5 => 2
+ 3.5 => 4
+ 4.5 => 4
+@end example
+
+The theory behind the rounding mode @code{roundTiesToEven} is that
+it more or less evenly distributes upward and downward rounds
+of exact halves, which might cause the round-off error
+to cancel itself out. This is the default rounding mode used
+in IEEE-754 computing functions and operators.
+
+The other rounding modes are rarely used.
+Round toward positive infinity (@code{roundTowardPositive})
+and round toward negative infinity (@code{roundTowardNegative})
+are often used to implement interval arithmetic,
+where you adjust the rounding mode to calculate upper and lower bounds
+for the range of output. The @code{roundTowardZero}
+mode can be used for converting floating-point numbers to integers.
+The rounding mode @code{roundTiesToAway} rounds the result to the
+nearest number and selects the number with the larger magnitude
+if a tie occurs.
+
+Some numerical analysts will tell you that your choice of rounding style
+has tremendous impact on the final outcome, and advise you to wait until
+final output for any rounding. Instead, you can often avoid round-off error problems by
+setting the precision initially to some value sufficiently larger than
+the final desired precision, so that the accumulation of round-off error
+does not influence the outcome.
+If you suspect that results from your computation are
+sensitive to accumulation of round-off error,
+one way to be sure is to look for a significant difference in output
+when you change the rounding mode.
+
+@node Gawk and MPFR
+@section @command{gawk} + MPFR = Powerful Arithmetic
+
+The rest of this @value{CHAPTER} describes how to use the arbitrary precision
+(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric
+capabilities in @command{gawk} to produce maximally accurate results
+when you need them.
+
+But first you should check if your version of
+@command{gawk} supports arbitrary precision arithmetic.
+The easiest way to find out is to look at the output of
+the following command:
+
+@example
+$ @kbd{gawk --version}
+@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3)
+@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation.
+@dots{}
+@end example
+
+@command{gawk} uses the
+@uref{http://www.mpfr.org, GNU MPFR}
+and
+@uref{http://gmplib.org, GNU MP} (GMP)
+libraries for arbitrary precision
+arithmetic on numbers. So if you do not see the names of these libraries
+in the output, then your version of @command{gawk} does not support
+arbitrary precision arithmetic.
+
+Additionally,
+there are a few elements available in the @code{PROCINFO} array
+to provide information about the MPFR and GMP libraries.
+@xref{Auto-set}, for more information.
+
+@ignore
+Even if you aren't interested in arbitrary precision arithmetic, you
+may still benefit from knowing about how @command{gawk} handles numbers
+in general, and the limitations of doing arithmetic with ordinary
+@command{gawk} numbers.
+@end ignore
+
+
+@node Arbitrary Precision Floats
+@section Arbitrary Precision Floating-point Arithmetic with @command{gawk}
+
+@command{gawk} uses the GNU MPFR library
+for arbitrary precision floating-point arithmetic. The MPFR library
+provides precise control over precisions and rounding modes, and gives
+correctly rounded reproducible platform-independent results. With the
+command-line option @option{--bignum} or @option{-M},
+all floating-point arithmetic operators and numeric functions can yield
+results to any desired precision level supported by MPFR.
+Two built-in
+variables @code{PREC}
+(@pxref{Setting Precision})
+and @code{ROUNDMODE}
+(@pxref{Setting Rounding Mode})
+provide control over the working precision and the rounding mode.
+The precision and the rounding mode are set globally for every operation
+to follow.
+
+The default working precision for arbitrary precision floating-point values is 53,
+and the default value for @code{ROUNDMODE} is @code{"N"},
+which selects the IEEE-754
+@code{roundTiesToEven} (@pxref{Rounding Mode}) rounding mode.@footnote{The
+default precision is 53, since according to the MPFR documentation,
+the library should be able to exactly reproduce all computations with
+double-precision machine floating-point numbers (@code{double} type
+in C), except the default exponent range is much wider and subnormal
+numbers are not implemented.}
+@command{gawk} uses the default exponent range in MPFR
+@iftex
+(@math{emax = 2^{30} - 1, emin = -emax})
+@end iftex
+@ifnottex
+(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax})
+@end ifnottex
+for all floating-point contexts.
+There is no explicit mechanism to adjust the exponent range.
+MPFR does not implement subnormal numbers by default,
+and this behavior cannot be changed in @command{gawk}.
+
+@quotation NOTE
+When emulating an IEEE-754 format (@pxref{Setting Precision}),
+@command{gawk} internally adjusts the exponent range
+to the value defined for the format and also performs computations needed for
+gradual underflow (subnormal numbers).
+@end quotation
+
+@quotation NOTE
+MPFR numbers are variable-size entities, consuming only as much space as
+needed to store the significant digits. Since the performance using MPFR
+numbers pales in comparison to doing arithmetic using the underlying machine
+types, you should consider using only as much precision as needed by
+your program.
+@end quotation
+
+@menu
+* Setting Precision:: Setting the working precision.
+* Setting Rounding Mode:: Setting the rounding mode.
+* Floating-point Constants:: Representing floating-point constants.
+* Changing Precision:: Changing the precision of a number.
+* Exact Arithmetic:: Exact arithmetic with floating-point numbers.
+@end menu
+
+@node Setting Precision
+@subsection Setting the Working Precision
+@cindex @code{PREC} variable
+
+@command{gawk} uses a global working precision; it does not keep track of
+the precision or accuracy of individual numbers. Performing an arithmetic
+operation or calling a built-in function rounds the result to the current
+working precision. The default working precision is 53 bits, which can be
+modified using the built-in variable @code{PREC}. You can also set the
+value to one of the following pre-defined case-insensitive strings
+to emulate an IEEE-754 binary format:
+
+@multitable {@code{"double"}} {12345678901234567890123456789012345}
+@headitem @code{PREC} @tab IEEE-754 Binary Format
+@item @code{"half"} @tab 16-bit half-precision.
+@item @code{"single"} @tab Basic 32-bit single precision.
+@item @code{"double"} @tab Basic 64-bit double precision.
+@item @code{"quad"} @tab Basic 128-bit quadruple precision.
+@item @code{"oct"} @tab 256-bit octuple precision.
+@end multitable
+
+The following example illustrates the effects of changing precision
+on arithmetic operations:
+
+@example
+$ @kbd{gawk -M -vPREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \}
+> @kbd{PREC = "double"; print x + 0 @}'}
+@print{} 1e-400
+@print{} 0
+@end example
+
+Binary and decimal precisions are related approximately according to the
+formula:
+
+@iftex
+@math{prec = 3.322 @cdot dps}
+@end iftex
+@ifnottex
+@var{prec} = 3.322 * @var{dps}
+@end ifnottex
+
+@noindent
+Here, @var{prec} denotes the binary precision
+(measured in bits) and @var{dps} (short for decimal places)
+is the number of decimal digits. We can easily calculate how many decimal
+digits the 53-bit significand of an IEEE double is equivalent to:
+53 / 3.322 which is equal to about 15.95.
+But what does 15.95 digits actually mean? It depends whether you are
+concerned about how many digits you can rely on, or how many digits
+you need.
+
+It is important to know how many decimal digits it takes to uniquely identify
+a double-precision value (the C type @code{double}). If you want to
+convert from @code{double} to decimal and back to @code{double} (e.g.,
+saving a @code{double} representing an intermediate result to a file, and
+later reading it back to restart the computation), then a few more decimal
+digits are required. 17 digits is generally enough for a @code{double}.
+
+It can also be important to know what decimal numbers can be uniquely
+represented with a @code{double}. If you want to convert
+from decimal to @code{double} and back again, 15 digits is the most that
+you can get. Stated differently, you should not present
+the numbers from your floating-point computations with more than 15
+significant digits in them.
+
+Conversely, it takes a precision of 332 bits to hold an approximation
+of the constant @value{PI} that is accurate to 100 decimal places.
+You should always add some extra bits in order to avoid the confusing round-off
+issues that occur because numbers are stored internally in binary.
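+
+As a rough illustration, the following sketch turns the formula above
+into a function.  The name @code{bits_for_digits()} and the @code{guard}
+parameter (the number of extra bits to add) are hypothetical:
+
+@example
+# bits_for_digits: estimate the binary precision needed to hold
+# dps decimal digits, plus guard extra bits.
+function bits_for_digits(dps, guard)
+@{
+    # log(10) / log(2) is approximately 3.322; move the product
+    # up to the next whole bit before adding the guard bits.
+    return int(dps * log(10) / log(2)) + 1 + guard
+@}
+@end example
+
+@noindent
+The result could then be assigned to @code{PREC} before starting the
+computation.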
+
+@node Setting Rounding Mode
+@subsection Setting the Rounding Mode
+@cindex @code{ROUNDMODE} variable
+
+The @code{ROUNDMODE} variable provides
+program level control over the rounding mode.
+The correspondence between @code{ROUNDMODE} and the IEEE
+rounding modes is shown in @ref{table-gawk-rounding-modes}.
+
+@float Table,table-gawk-rounding-modes
+@caption{@command{gawk} Rounding Modes}
+@multitable @columnfractions .45 .30 .25
+@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE}
+@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"}
+@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"}
+@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"}
+@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"}
+@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"}
+@end multitable
+@end float
+
+@code{ROUNDMODE} has the default value @code{"N"},
+which selects the IEEE-754 rounding mode @code{roundTiesToEven}.
+The value @code{"A"} listed in @ref{table-gawk-rounding-modes},
+which selects the IEEE-754 mode @code{roundTiesToAway},
+is available only
+if your version of the MPFR library supports it; otherwise setting
+@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode},
+for the meanings of the various rounding modes.
+
+Here is an example of how to change the default rounding behavior of
+@code{printf}'s output:
+
+@example
+$ @kbd{gawk -M -vROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'}
+@print{} 1.37
+@end example
+
+@node Floating-point Constants
+@subsection Representing Floating-point Constants
+@cindex constants, floating-point
+
+Be wary of floating-point constants! When reading a floating-point constant
+from program source code, @command{gawk} uses the default precision,
+unless overridden
+by an assignment to the special variable @code{PREC} on the command
+line, to store it internally as an MPFR number.
+Changing the precision using @code{PREC} in the program text does
+not change the precision of a constant. If you need to
+represent a floating-point constant at a higher precision than the
+default and cannot use a command line assignment to @code{PREC},
+you should either specify the constant as a string, or
+as a rational number whenever possible. The following example
+illustrates the differences among various ways to
+print a floating-point constant:
+
+@example
+$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'}
+@print{} 0.1000000000000000055511151
+$ @kbd{gawk -M -vPREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'}
+@print{} 0.1000000000000000000000000
+$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'}
+@print{} 0.1000000000000000000000000
+$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'}
+@print{} 0.1000000000000000000000000
+@end example
+
+In the first case, the number is stored with the default precision of 53.
+
+@node Changing Precision
+@subsection Changing the Precision of a Number
+
+@cindex Laurie, Dirk
+@quotation
+@i{The point is that in any variable-precision package,
+a decision is made on how to treat numbers given as data,
+or arising in intermediate results, which are represented in
+floating-point format to a precision lower than working precision.
+Do we promote them to full membership of the high-precision club,
+or do we treat them and all their associates as second-class citizens?
+Sometimes the first course is proper, sometimes the second, and it takes
+careful analysis to tell which.}
+
+Dirk Laurie@footnote{Dirk Laurie.
+@cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}.
+Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.}
+@end quotation
+
+@command{gawk} does not implicitly modify the precision of any previously
+computed results when the working precision is changed with an assignment
+to @code{PREC}. The precision of a number is always the one that was
+used at the time of its creation, and there is no way for the user
+to explicitly change it afterwards. However, since the result of a
+floating-point arithmetic operation is always an arbitrary precision
+floating-point value---with a precision set by the value of @code{PREC}---one of the
+following workarounds effectively accomplishes the desired behavior:
+
+@example
+x = x + 0.0
+@end example
+
+@noindent
+or:
+
+@example
+x += 0.0
+@end example
+
+@node Exact Arithmetic
+@subsection Exact Arithmetic with Floating-point Numbers
+
+@quotation CAUTION
+Never depend on the exactness of floating-point arithmetic,
+even for apparently simple expressions!
+@end quotation
+
+Can arbitrary precision arithmetic give exact results? There are
+no easy answers. The standard rules of algebra often do not apply
+when using floating-point arithmetic.
+Among other things, the distributive and associative laws
+do not hold completely, and order of operation may be important
+for your computation. Rounding error, cumulative precision loss
+and underflow are often troublesome.
+
+When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3}
+for equality
+using the machine double precision arithmetic, it decides that they
+are not equal!
+(@xref{Floating-point Programming}.)
+You can get the result you want by increasing the precision;
+56 in this case will get the job done:
+
+@example
+$ @kbd{gawk -M -vPREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
+@print{} 1
+@end example
+
+If adding more bits is good, perhaps adding even more bits of
+precision is better?
+Here is what happens if we use an even larger value of @code{PREC}:
+
+@example
+$ @kbd{gawk -M -vPREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'}
+@print{} 0
+@end example
+
+This is not a bug in @command{gawk} or in the MPFR library.
+It is easy to forget that the finite number of bits used to store the value
+is often just an approximation after proper rounding.
+The test for equality succeeds if and only if @emph{all} bits in the two operands
+are exactly the same. Since this is not necessarily true after floating-point
+computations with a particular precision and effective rounding rule,
+a straight test for equality may not work.
+
+So, don't assume that floating-point values can be compared for equality.
+You should also exercise caution when using other forms of comparisons.
+The standard way to compare floating-point numbers is to determine
+how much error (or @dfn{tolerance}) you will allow in a comparison and
+check to see if one value is within this error range of the other.
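+
+A minimal sketch of such a test follows.  The function name
+@code{approx_equal()} and the use of a relative tolerance are
+assumptions made for illustration; the right tolerance depends on
+your application:
+
+@example
+# approx_equal: return 1 if a and b differ by no more than tol,
+# taken relative to the larger of the two magnitudes; else 0.
+function approx_equal(a, b, tol,    diff, scale, absb)
+@{
+    diff = a - b
+    if (diff < 0)
+        diff = -diff
+    scale = (a < 0 ? -a : a)
+    absb = (b < 0 ? -b : b)
+    if (absb > scale)
+        scale = absb
+    return (diff <= tol * scale)
+@}
+
+BEGIN @{ print approx_equal(0.1 + 12.2, 12.3, 1e-12) @}
+@end example
+
+@noindent
+With this tolerance, the comparison that failed earlier prints 1.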
+
+In applications where 15 or fewer decimal places suffice,
+hardware double precision arithmetic can be adequate, and is usually much faster.
+But you do need to keep in mind that every floating-point operation
+can suffer a new rounding error with catastrophic consequences as illustrated
+by our attempt to compute the value of the constant @value{PI}
+(@pxref{Floating-point Programming}).
+Extra precision can greatly enhance the stability and the accuracy
+of your computation in such cases.
+
+Repeated addition is not necessarily equivalent to multiplication
+in floating-point arithmetic. In the example in
+@ref{Floating-point Programming}:
+
+@example
+$ @kbd{gawk 'BEGIN @{}
+> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)}
+> @kbd{i++}
+> @kbd{print i}
+> @kbd{@}'}
+@print{} 4
+@end example
+
+@noindent
+you may or may not succeed in getting the correct result by choosing
+an arbitrarily large value for @code{PREC}. Reformulation of
+the problem at hand is often the correct approach in such situations.
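+
+For this particular loop, one possible reformulation (a sketch, not the
+only approach) is to count with integers, which are represented exactly,
+and to derive the floating-point value from the counter:
+
+@example
+BEGIN @{
+    for (k = 11; k <= 15; k++) @{
+        d = k / 10    # 1.1, 1.2, ..., 1.5, each rounded only once
+        i++
+    @}
+    print i
+@}
+@end example
+
+@noindent
+Because the loop condition now compares integers, the loop runs exactly
+five times and the program prints 5.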
+
+@node Arbitrary Precision Integers
+@section Arbitrary Precision Integer Arithmetic with @command{gawk}
+@cindex integer, arbitrary precision
+
+If the option @option{--bignum} or @option{-M} is specified,
+@command{gawk} performs all
+integer arithmetic using GMP arbitrary precision integers.
+Any number that looks like an integer in a program source or data file
+is stored as an arbitrary precision integer.
+The size of the integer is limited only by your computer's memory.
+The current floating-point context has no effect on operations involving integers.
+For example, the following computes
+@iftex
+@math{5^{4^{3^{2}}}},
+@end iftex
+@ifnottex
+5^4^3^2,
+@end ifnottex
+the result of which is beyond the
+limits of ordinary @command{gawk} numbers:
+
+@example
+$ @kbd{gawk -M 'BEGIN @{}
+> @kbd{x = 5^4^3^2}
+> @kbd{print "# of digits =", length(x)}
+> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)}
+> @kbd{@}'}
+@print{} # of digits = 183231
+@print{} 62060698786608744707 ... 92256259918212890625
+@end example
+
+If you were to compute the same value using arbitrary precision
+floating-point values instead, the precision needed for correct output
+(using the formula
+@iftex
+@math{prec = 3.322 @cdot dps}),
+would be @math{3.322 @cdot 183231},
+@end iftex
+@ifnottex
+@samp{prec = 3.322 * dps}),
+would be 3.322 x 183231,
+@end ifnottex
+or 608693.
+(Thus, the floating-point representation requires a precision of more than
+600,000 bits to hold all 183,231 decimal digits.)
+
+The result from an arithmetic operation with an integer and a floating-point value
+is a floating-point value with a precision equal to the working precision.
+The following program calculates the eighth term in
+Sylvester's sequence@footnote{Weisstein, Eric W.
+@cite{Sylvester's Sequence}. From MathWorld---A Wolfram Web Resource.
+@url{http://mathworld.wolfram.com/SylvestersSequence.html}}
+using a recurrence:
+
+@example
+$ @kbd{gawk -M 'BEGIN @{}
+> @kbd{s = 2.0}
+> @kbd{for (i = 1; i <= 7; i++)}
+> @kbd{s = s * (s - 1) + 1}
+> @kbd{print s}
+> @kbd{@}'}
+@print{} 113423713055421845118910464
+@end example
+
+The output differs from the actual number, 113423713055421844361000443,
+because the default precision of 53 is not enough to represent the
+floating-point results exactly. You can either increase the precision
+(100 is enough in this case), or replace the floating-point constant
+@samp{2.0} with an integer, to perform all computations using integer
+arithmetic to get the correct output.
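+
+As a sketch of the second alternative, starting from the integer
+@samp{2} keeps every step in arbitrary precision integer arithmetic.
+This assumes a @command{gawk} built with MPFR and GMP support, so that
+@option{-M} is available:
+
+@example
+$ @kbd{gawk -M 'BEGIN @{}
+> @kbd{s = 2        # integer constant, so no floating-point rounding}
+> @kbd{for (i = 1; i <= 7; i++)}
+> @kbd{s = s * (s - 1) + 1}
+> @kbd{print s}
+> @kbd{@}'}
+@print{} 113423713055421844361000443
+@end example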
+
+It will sometimes be necessary for @command{gawk} to implicitly convert an
+arbitrary precision integer into an arbitrary precision floating-point value.
+This is primarily because the MPFR library does not always provide the
+relevant interface to process arbitrary precision integers or mixed-mode
+numbers as needed by an operation or function.
+In such a case, the precision is set to the minimum value necessary
+for exact conversion, and the working precision is not used for this purpose.
+If this is not what you need or want, you can employ a subterfuge
+like this:
+
+@example
+gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}'
+@end example
+
+You can avoid this issue altogether by specifying the number as a floating-point value
+to begin with:
+
+@example
+gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}'
+@end example
+
+Note that for the particular example above, it is likely best
+to just use the following:
+
+@example
+gawk -M 'BEGIN @{ n = 13; print n % 2 @}'
+@end example
+
@node Dynamic Extensions
@chapter Writing Extensions for @command{gawk}