-rw-r--r-- | doc/ChangeLog | 2
-rw-r--r-- | doc/gawk.info | 3145
-rw-r--r-- | doc/gawk.texi | 2503
3 files changed, 2831 insertions, 2819 deletions
diff --git a/doc/ChangeLog b/doc/ChangeLog index 5372f7da..b112c720 100644 --- a/doc/ChangeLog +++ b/doc/ChangeLog @@ -2,6 +2,8 @@ * gawk.texi: Emphasize more that floating point behavior is not a language issue. Add a pointer to POSIX bc. + Move arithmetic chapter to later in the book, before chapter + on dynamic extensions. 2012-08-17 Arnold D. Robbins <arnold@skeeve.com> diff --git a/doc/gawk.info b/doc/gawk.info index 18e455cc..4c10ab10 100644 --- a/doc/gawk.info +++ b/doc/gawk.info @@ -875,6 +875,9 @@ real problems. *note Debugger::, describes the `awk' debugger. + *note Arbitrary Precision Arithmetic::, describes advanced +arithmetic facilities provided by `gawk'. + *note Language History::, describes how the `awk' language has evolved since its first release to present. It also describes how `gawk' has acquired features over time. @@ -13757,1071 +13760,9 @@ writing, the latest version of GNU `gettext' is version 0.18.1 usage messages, warnings, and fatal errors in the local language. -File: gawk.info, Node: Arbitrary Precision Arithmetic, Next: Advanced Features, Prev: Internationalization, Up: Top - -11 Arithmetic and Arbitrary Precision Arithmetic with `gawk' -************************************************************ - - There's a credibility gap: We don't know how much of the - computer's answers to believe. Novice computer users solve this - problem by implicitly trusting in the computer as an infallible - authority; they tend to believe that all digits of a printed - answer are significant. Disillusioned computer users have just the - opposite approach; they are constantly afraid that their answers - are almost meaningless. - Donald Knuth(1) - - This major node discusses issues that you may encounter when -performing arithmetic. It begins by discussing some of the general -atributes of computer arithmetic, along with how this can influence -what you see when running `awk' programs. This discussion applies to -all versions of `awk'. 
- - Then the discussion moves on to "arbitrary precsion arithmetic", a -feature which is specific to `gawk'. - -* Menu: - -* General Arithmetic:: An introduction to computer arithmetic. -* Floating-point Programming:: Effective Floating-point Programming. -* Gawk and MPFR:: How `gawk' provides - aribitrary-precision arithmetic. -* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic - with `gawk'. -* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with - `gawk'. - - ---------- Footnotes ---------- - - (1) Donald E. Knuth. `The Art of Computer Programming'. Volume 2, -`Seminumerical Algorithms', third edition, 1998, ISBN 0-201-89683-4, p. -229. - - -File: gawk.info, Node: General Arithmetic, Next: Floating-point Programming, Up: Arbitrary Precision Arithmetic - -11.1 A General Description of Computer Arithmetic -================================================= - -Within computers, there are two kinds of numeric values: "integers" and -"floating-point". In school, integer values were referred to as -"whole" numbers--that is, numbers without any fractional part, such as -1, 42, or -17. The advantage to integer numbers is that they represent -values exactly. The disadvantage is that their range is limited. On -most systems, this range is -2,147,483,648 to 2,147,483,647. However, -many systems now support a range from -9,223,372,036,854,775,808 to -9,223,372,036,854,775,807. - - Integer values come in two flavors: "signed" and "unsigned". Signed -values may be negative or positive, with the range of values just -described. Unsigned values are always positive. On most systems, the -range is from 0 to 4,294,967,295. However, many systems now support a -range from 0 to 18,446,744,073,709,551,615. - - Floating-point numbers represent what are called "real" numbers; -i.e., those that do have a fractional part, such as 3.1415927. The -advantage to floating-point numbers is that they can represent a much -larger range of values. 
The disadvantage is that there are numbers -that they cannot represent exactly. `awk' uses "double precision" -floating-point numbers, which can hold more digits than "single -precision" floating-point numbers. - - There a several important issues to be aware of, described next. - -* Menu: - -* Floating Point Issues:: Stuff to know about floating-point numbers. -* Integer Programming:: Effective integer programming. - - -File: gawk.info, Node: Floating Point Issues, Next: Integer Programming, Up: General Arithmetic - -11.1.1 Floating-Point Number Caveats ------------------------------------- - -As mentioned earlier, floating-point numbers represent what are called -"real" numbers, i.e., those that have a fractional part. `awk' uses -double precision floating-point numbers to represent all numeric -values. This minor node describes some of the issues involved in using -floating-point numbers. - - There is a very nice paper on floating-point arithmetic -(http://www.validlab.com/goldberg/paper.pdf) by David Goldberg, "What -Every Computer Scientist Should Know About Floating-point Arithmetic," -`ACM Computing Surveys' *23*, 1 (1991-03), 5-48. This is worth reading -if you are interested in the details, but it does require a background -in computer science. - -* Menu: - -* String Conversion Precision:: The String Value Can Lie. -* Unexpected Results:: Floating Point Numbers Are Not Abstract - Numbers. -* POSIX Floating Point Problems:: Standards Versus Existing Practice. - - -File: gawk.info, Node: String Conversion Precision, Next: Unexpected Results, Up: Floating Point Issues - -11.1.1.1 The String Value Can Lie -................................. - -Internally, `awk' keeps both the numeric value (double precision -floating-point) and the string value for a variable. Separately, `awk' -keeps track of what type the variable has (*note Typing and -Comparison::), which plays a role in how variables are used in -comparisons. 
- - It is important to note that the string value for a number may not -reflect the full value (all the digits) that the numeric value actually -contains. The following program (`values.awk') illustrates this: - - { - sum = $1 + $2 - # see it for what it is - printf("sum = %.12g\n", sum) - # use CONVFMT - a = "<" sum ">" - print "a =", a - # use OFMT - print "sum =", sum - } - -This program shows the full value of the sum of `$1' and `$2' using -`printf', and then prints the string values obtained from both -automatic conversion (via `CONVFMT') and from printing (via `OFMT'). - - Here is what happens when the program is run: - - $ echo 3.654321 1.2345678 | awk -f values.awk - -| sum = 4.8888888 - -| a = <4.88889> - -| sum = 4.88889 - - This makes it clear that the full numeric value is different from -what the default string representations show. - - `CONVFMT''s default value is `"%.6g"', which yields a value with at -least six significant digits. For some applications, you might want to -change it to specify more precision. On most modern machines, most of -the time, 17 digits is enough to capture a floating-point number's -value exactly.(1) - - ---------- Footnotes ---------- - - (1) Pathological cases can require up to 752 digits (!), but we -doubt that you need to worry about this. - - -File: gawk.info, Node: Unexpected Results, Next: POSIX Floating Point Problems, Prev: String Conversion Precision, Up: Floating Point Issues - -11.1.1.2 Floating Point Numbers Are Not Abstract Numbers -........................................................ - -Unlike numbers in the abstract sense (such as what you studied in high -school or college arithmetic), numbers stored in computers are limited -in certain ways. They cannot represent an infinite number of digits, -nor can they always represent things exactly. In particular, -floating-point numbers cannot always represent values exactly. 
Here is -an example: - - $ awk '{ printf("%010d\n", $1 * 100) }' - 515.79 - -| 0000051579 - 515.80 - -| 0000051579 - 515.81 - -| 0000051580 - 515.82 - -| 0000051582 - Ctrl-d - -This shows that some values can be represented exactly, whereas others -are only approximated. This is not a "bug" in `awk', but simply an -artifact of how computers represent numbers. - - NOTE: It cannot be emphasized enough that the behavior just - described is fundamental to modern computers. You will see this - kind of thing happen in _any_ programming language using hardware - floating-point numbers. It is _not_ a bug in `gawk', nor is it - something that can be "just fixed." - - Another peculiarity of floating-point numbers on modern systems is -that they often have more than one representation for the number zero! -In particular, it is possible to represent "minus zero" as well as -regular, or "positive" zero. - - This example shows that negative and positive zero are distinct -values when stored internally, but that they are in fact equal to each -other, as well as to "regular" zero: - - $ gawk 'BEGIN { mz = -0 ; pz = 0 - > printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz - > printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0 - > }' - -| -0 = -0, +0 = 0, (-0 == +0) -> 1 - -| mz == 0 -> 1, pz == 0 -> 1 - - It helps to keep this in mind should you process numeric data that -contains negative zero values; the fact that the zero is negative is -noted and can affect comparisons. - - -File: gawk.info, Node: POSIX Floating Point Problems, Prev: Unexpected Results, Up: Floating Point Issues - -11.1.1.3 Standards Versus Existing Practice -........................................... - -Historically, `awk' has converted any non-numeric looking string to the -numeric value zero, when required. 
Furthermore, the original -definition of the language and the original POSIX standards specified -that `awk' only understands decimal numbers (base 10), and not octal -(base 8) or hexadecimal numbers (base 16). - - Changes in the language of the 2001 and 2004 POSIX standards can be -interpreted to imply that `awk' should support additional features. -These features are: - - * Interpretation of floating point data values specified in - hexadecimal notation (`0xDEADBEEF'). (Note: data values, _not_ - source code constants.) - - * Support for the special IEEE 754 floating point values "Not A - Number" (NaN), positive Infinity ("inf") and negative Infinity - ("-inf"). In particular, the format for these values is as - specified by the ISO 1999 C standard, which ignores case and can - allow machine-dependent additional characters after the `nan' and - allow either `inf' or `infinity'. - - The first problem is that both of these are clear changes to -historical practice: - - * The `gawk' maintainer feels that supporting hexadecimal floating - point values, in particular, is ugly, and was never intended by the - original designers to be part of the language. - - * Allowing completely alphabetic strings to have valid numeric - values is also a very severe departure from historical practice. - - The second problem is that the `gawk' maintainer feels that this -interpretation of the standard, which requires a certain amount of -"language lawyering" to arrive at in the first place, was not even -intended by the standard developers. In other words, "we see how you -got where you are, but we don't think that that's where you want to be." - - Recognizing the above issues, but attempting to provide compatibility -with the earlier versions of the standard, the 2008 POSIX standard -added explicit wording to allow, but not require, that `awk' support -hexadecimal floating point values and special values for "Not A Number" -and infinity. 
- - Although the `gawk' maintainer continues to feel that providing -those features is inadvisable, nevertheless, on systems that support -IEEE floating point, it seems reasonable to provide _some_ way to -support NaN and Infinity values. The solution implemented in `gawk' is -as follows: - - * With the `--posix' command-line option, `gawk' becomes "hands - off." String values are passed directly to the system library's - `strtod()' function, and if it successfully returns a numeric - value, that is what's used.(1) By definition, the results are not - portable across different systems. They are also a little - surprising: - - $ echo nanny | gawk --posix '{ print $1 + 0 }' - -| nan - $ echo 0xDeadBeef | gawk --posix '{ print $1 + 0 }' - -| 3735928559 - - * Without `--posix', `gawk' interprets the four strings `+inf', - `-inf', `+nan', and `-nan' specially, producing the corresponding - special numeric values. The leading sign acts a signal to `gawk' - (and the user) that the value is really numeric. Hexadecimal - floating point is not supported (unless you also use - `--non-decimal-data', which is _not_ recommended). For example: - - $ echo nanny | gawk '{ print $1 + 0 }' - -| 0 - $ echo +nan | gawk '{ print $1 + 0 }' - -| nan - $ echo 0xDeadBeef | gawk '{ print $1 + 0 }' - -| 0 - - `gawk' does ignore case in the four special values. Thus `+nan' - and `+NaN' are the same. - - ---------- Footnotes ---------- - - (1) You asked for it, you got it. - - -File: gawk.info, Node: Integer Programming, Prev: Floating Point Issues, Up: General Arithmetic - -11.1.2 Mixing Integers And Floating-point ------------------------------------------ - -As has been mentioned already, `gawk' ordinarily uses hardware double -precision with 64-bit IEEE binary floating-point representation for -numbers on most systems. A large integer like 9007199254740997 has a -binary representation that, although finite, is more than 53 bits long; -it must also be rounded to 53 bits. 
The biggest integer that can be -stored in a C `double' is usually the same as the largest possible -value of a `double'. If your system `double' is an IEEE 64-bit -`double', this largest possible value is an integer and can be -represented precisely. What more should one know about integers? - - If you want to know what is the largest integer, such that it and -all smaller integers can be stored in 64-bit doubles without losing -precision, then the answer is 2^53. The next representable number is -the even number 2^53 + 2, meaning it is unlikely that you will be able -to make `gawk' print 2^53 + 1 in integer format. The range of integers -exactly representable by a 64-bit double is [-2^53, 2^53]. If you ever -see an integer outside this range in `gawk' using 64-bit doubles, you -have reason to be very suspicious about the accuracy of the output. -Here is a simple program with erroneous output: - - $ gawk 'BEGIN { i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j }' - -| 9007199254740991 - -| 9007199254740992 - -| 9007199254740992 - -| 9007199254740994 - - The lesson is to not assume that any large integer printed by `gawk' -represents an exact result from your computation, especially if it wraps -around on your screen. - - -File: gawk.info, Node: Floating-point Programming, Next: Gawk and MPFR, Prev: General Arithmetic, Up: Arbitrary Precision Arithmetic - -11.2 Understanding Floating-point Programming -============================================= - -Numerical programming is an extensive area; if you need to develop -sophisticated numerical algorithms then `gawk' may not be the ideal -tool, and this documentation may not be sufficient. It might require -digesting a book or two to really internalize how to compute with ideal -accuracy and precision and the result often depends on the particular -application. - - NOTE: A floating-point calculation's "accuracy" is how close it - comes to the real value. 
This is as opposed to the "precision", - which usually refers to the number of bits used to represent the - number (see the Wikipedia article - (http://en.wikipedia.org/wiki/Accuracy_and_precision) for more - information). - - There are two options for doing floating-point calculations: -hardware floating-point (as used by standard `awk' and the default for -`gawk'), and "arbitrary-precision" floating-point, which is software -based. This major node aims to provide enough information to -understand both, and then will focus on `gawk''s facilities for the -latter.(1) - - Binary floating-point representations and arithmetic are inexact. -Simple values like 0.1 cannot be precisely represented using binary -floating-point numbers, and the limited precision of floating-point -numbers means that slight changes in the order of operations or the -precision of intermediate storage can change the result. To make -matters worse, with arbitrary precision floating-point, you can set the -precision before starting a computation, but then you cannot be sure of -the number of significant decimal places in the final result. - - Sometimes, before you start to write any code, you should think more -about what you really want and what's really happening. Consider the -two numbers in the following example: - - x = 0.875 # 1/2 + 1/4 + 1/8 - y = 0.425 - - Unlike the number in `y', the number stored in `x' is exactly -representable in binary since it can be written as a finite sum of one -or more fractions whose denominators are all powers of two. When -`gawk' reads a floating-point number from program source, it -automatically rounds that number to whatever precision your machine -supports. 
If you try to print the numeric content of a variable using -an output format string of `"%.17g"', it may not produce the same -number as you assigned to it: - - $ gawk 'BEGIN { x = 0.875; y = 0.425 - > printf("%0.17g, %0.17g\n", x, y) }' - -| 0.875, 0.42499999999999999 - - Often the error is so small you do not even notice it, and if you do, -you can always specify how much precision you would like in your output. -Usually this is a format string like `"%.15g"', which when used in the -previous example, produces an output identical to the input. - - Because the underlying representation can be little bit off from the -exact value, comparing floating-point values to see if they are equal -is generally not a good idea. Here is an example where it does not -work like you expect: - - $ gawk 'BEGIN { print (0.1 + 12.2 == 12.3) }' - -| 0 - - The loss of accuracy during a single computation with floating-point -numbers usually isn't enough to worry about. However, if you compute a -value which is the result of a sequence of floating point operations, -the error can accumulate and greatly affect the computation itself. -Here is an attempt to compute the value of the constant pi using one of -its many series representations: - - BEGIN { - x = 1.0 / sqrt(3.0) - n = 6 - for (i = 1; i < 30; i++) { - n = n * 2.0 - x = (sqrt(x * x + 1) - 1) / x - printf("%.15f\n", n * x) - } - } - - When run, the early errors propagating through later computations -cause the loop to terminate prematurely after an attempt to divide by -zero. - - $ gawk -f pi.awk - -| 3.215390309173475 - -| 3.159659942097510 - -| 3.146086215131467 - -| 3.142714599645573 - ... 
- -| 3.224515243534819 - -| 2.791117213058638 - -| 0.000000000000000 - error--> gawk: pi.awk:6: fatal: division by zero attempted - - Here is one more example where the inaccuracies in internal -representations yield an unexpected result: - - $ gawk 'BEGIN { - > for (d = 1.1; d <= 1.5; d += 0.1) - > i++ - > print i - > }' - -| 4 - - Can computation using aribitrary precision help with the previous -examples? If you are impatient to know, see *note Exact Arithmetic::. - - Instead of aribitrary precision floating-point arithmetic, often all -you need is an adjustment of your logic or a different order for the -operations in your calculation. The stability and the accuracy of the -computation of the constant pi in the previous example can be enhanced -by using the following simple algebraic transformation: - - (sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1) - -After making this, change the program does converge to pi in under 30 -iterations: - - $ gawk -f /tmp/pi2.awk - -| 3.215390309173473 - -| 3.159659942097501 - -| 3.146086215131436 - -| 3.142714599645370 - -| 3.141873049979825 - ... - -| 3.141592653589797 - -| 3.141592653589797 - - There is no need to be unduly suspicious about the results from -floating-point arithmetic. The lesson to remember is that -floating-point arithmetic is always more complex than the arithmetic -using pencil and paper. In order to take advantage of the power of -computer floating-point, you need to know its limitations and work -within them. For most casual use of floating-point arithmetic, you will -often get the expected result in the end if you simply round the -display of your final results to the correct number of significant -decimal digits. And, avoid presenting numerical data in a manner that -implies better precision than is actually the case. - -* Menu: - -* Floating-point Representation:: Binary floating-point representation. -* Floating-point Context:: Floating-point context. 
-* Rounding Mode:: Floating-point rounding mode. - - ---------- Footnotes ---------- - - (1) If you are interested in other tools that perform arbitrary -precision arithmetic, you may want to investigate the POSIX `bc' tool. -See the POSIX specification for it -(http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html), for -more information. - - -File: gawk.info, Node: Floating-point Representation, Next: Floating-point Context, Up: Floating-point Programming - -11.2.1 Binary Floating-point Representation -------------------------------------------- - -Although floating-point representations vary from machine to machine, -the most commonly encountered representation is that defined by the -IEEE 754 Standard. An IEEE-754 format value has three components: - - * A sign bit telling whether the number is positive or negative. - - * An "exponent" giving its order of magnitude, E. - - * A "significand", S, specifying the actual digits of the number. - - The value of the number is then S * 2^E. The first bit of a -non-zero binary significand is always one, so the significand in an -IEEE-754 format only includes the fractional part, leaving the leading -one implicit. - - Three of the standard IEEE-754 types are 32-bit single precision, -64-bit double precision and 128-bit quadruple precision. The standard -also specifies extended precision formats to allow greater precisions -and larger exponent ranges. - - The significand is stored in "normalized" format, which means that -the first bit is always a one. - - -File: gawk.info, Node: Floating-point Context, Next: Rounding Mode, Prev: Floating-point Representation, Up: Floating-point Programming - -11.2.2 Floating-point Context ------------------------------ - -A floating-point "context" defines the environment for arithmetic -operations. It governs precision, sets rules for rounding, and limits -the range for exponents. 
The context has the following primary -components: - -"Precision" - Precision of the floating-point format in bits. - -"emax" - Maximum exponent allowed for this format. - -"emin" - Minimum exponent allowed for this format. - -"Underflow behavior" - The format may or may not support gradual underflow. - -"Rounding" - The rounding mode of this context. - - *note table-ieee-formats:: lists the precision and exponent field -values for the basic IEEE-754 binary formats: - -Name Total bits Precision emin emax ---------------------------------------------------------------------------- -Single 32 24 -126 +127 -Double 64 53 -1022 +1023 -Quadruple 128 113 -16382 +16383 - -Table 11.1: Basic IEEE Format Context Values - - NOTE: The precision numbers include the implied leading one that - gives them one extra bit of significand. - - A floating-point context can also determine which signals are treated -as exceptions, and can set rules for arithmetic with special values. -Please consult the IEEE-754 standard or other resources for details. - - `gawk' ordinarily uses the hardware double precision representation -for numbers. On most systems, this is IEEE-754 floating-point format, -corresponding to 64-bit binary with 53 bits of precision. - - NOTE: In case an underflow occurs, the standard allows, but does - not require, the result from an arithmetic operation to be a - number smaller than the smallest nonzero normalized number. Such - numbers do not have as many significant digits as normal numbers, - and are called "denormals" or "subnormals". The alternative, - simply returning a zero, is called "flush to zero". The basic - IEEE-754 binary formats support subnormal numbers. - - -File: gawk.info, Node: Rounding Mode, Prev: Floating-point Context, Up: Floating-point Programming - -11.2.3 Floating-point Rounding Mode ------------------------------------ - -The "rounding mode" specifies the behavior for the results of numerical -operations when discarding extra precision. 
Each rounding mode indicates -how the least significant returned digit of a rounded result is to be -calculated. *note table-rounding-modes:: lists the IEEE-754 defined -rounding modes: - -Rounding Mode IEEE Name --------------------------------------------------------------------------- -Round to nearest, ties to even `roundTiesToEven' -Round toward plus Infinity `roundTowardPositive' -Round toward negative Infinity `roundTowardNegative' -Round toward zero `roundTowardZero' -Round to nearest, ties away `roundTiesToAway' -from zero - -Table 11.2: IEEE 754 Rounding Modes - - The default mode `roundTiesToEven' is the most preferred, but the -least intuitive. This method does the obvious thing for most values, by -rounding them up or down to the nearest digit. For example, rounding -1.132 to two digits yields 1.13, and rounding 1.157 yields 1.16. - - However, when it comes to rounding a value that is exactly halfway -between, things do not work the way you probably learned in school. In -this case, the number is rounded to the nearest even digit. So -rounding 0.125 to two digits rounds down to 0.12, but rounding 0.6875 -to three digits rounds up to 0.688. You probably have already -encountered this rounding mode when using the `printf' routine to -format floating-point numbers. For example: - - BEGIN { - x = -4.5 - for (i = 1; i < 10; i++) { - x += 1.0 - printf("%4.1f => %2.0f\n", x, x) - } - } - -produces the following output when run:(1) - - -3.5 => -4 - -2.5 => -2 - -1.5 => -2 - -0.5 => 0 - 0.5 => 0 - 1.5 => 2 - 2.5 => 2 - 3.5 => 4 - 4.5 => 4 - - The theory behind the rounding mode `roundTiesToEven' is that it -more or less evenly distributes upward and downward rounds of exact -halves, which might cause the round-off error to cancel itself out. -This is the default rounding mode used in IEEE-754 computing functions -and operators. - - The other rounding modes are rarely used. 
Round toward positive -infinity (`roundTowardPositive') and round toward negative infinity -(`roundTowardNegative') are often used to implement interval arithmetic, -where you adjust the rounding mode to calculate upper and lower bounds -for the range of output. The `roundTowardZero' mode can be used for -converting floating-point numbers to integers. The rounding mode -`roundTiesToAway' rounds the result to the nearest number and selects -the number with the larger magnitude if a tie occurs. - - Some numerical analysts will tell you that your choice of rounding -style has tremendous impact on the final outcome, and advise you to -wait until final output for any rounding. Instead, you can often avoid -round-off error problems by setting the precision initially to some -value sufficiently larger than the final desired precision, so that the -accumulation of round-off error does not influence the outcome. If you -suspect that results from your computation are sensitive to -accumulation of round-off error, one way to be sure is to look for a -significant difference in output when you change the rounding mode. - - ---------- Footnotes ---------- - - (1) It is possible for the output to be completely different if the -C library in your system does not use the IEEE-754 even-rounding rule -to round halfway cases for `printf()'. - - -File: gawk.info, Node: Gawk and MPFR, Next: Arbitrary Precision Floats, Prev: Floating-point Programming, Up: Arbitrary Precision Arithmetic - -11.3 `gawk' + MPFR = Powerful Arithmetic -======================================== - -The rest of this major node decsribes how to use the arbitrary precision -(also known as "multiple precision" or "infinite precision") numeric -capabilites in `gawk' to produce maximally accurate results when you -need it. - - But first you should check if your version of `gawk' supports -arbitrary precision arithmetic. 
The easiest way to find out is to look -at the output of the following command: - - $ gawk --version - -| GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3) - -| Copyright (C) 1989, 1991-2012 Free Software Foundation. - ... - - `gawk' uses the GNU MPFR (http://www.mpfr.org) and GNU MP -(http://gmplib.org) (GMP) libraries for arbitrary precision arithmetic -on numbers. So if you do not see the names of these libraries in the -output, then your version of `gawk' does not support arbitrary -precision arithmetic. - - Additionally, there are a few elements available in the `PROCINFO' -array to provide information about the MPFR and GMP libraries. *Note -Auto-set::, for more information. - - -File: gawk.info, Node: Arbitrary Precision Floats, Next: Arbitrary Precision Integers, Prev: Gawk and MPFR, Up: Arbitrary Precision Arithmetic - -11.4 Arbitrary Precision Floating-point Arithmetic with `gawk' -============================================================== - -`gawk' uses the GNU MPFR library for arbitrary precision floating-point -arithmetic. The MPFR library provides precise control over precisions -and rounding modes, and gives correctly rounded reproducible -platform-independent results. With the command-line option `--bignum' -or `-M', all floating-point arithmetic operators and numeric functions -can yield results to any desired precision level supported by MPFR. -Two built-in variables `PREC' (*note Setting Precision::) and -`ROUNDMODE' (*note Setting Rounding Mode::) provide control over the -working precision and the rounding mode. The precision and the -rounding mode are set globally for every operation to follow. - - The default working precision for arbitrary precision floating-point -values is 53, and the default value for `ROUNDMODE' is `"N"', which -selects the IEEE-754 `roundTiesToEven' (*note Rounding Mode::) rounding -mode.(1) `gawk' uses the default exponent range in MPFR (EMAX = 2^30 - -1, EMIN = -EMAX) for all floating-point contexts. 
There is no explicit -mechanism to adjust the exponent range. MPFR does not implement -subnormal numbers by default, and this behavior cannot be changed in -`gawk'. - - NOTE: When emulating an IEEE-754 format (*note Setting - Precision::), `gawk' internally adjusts the exponent range to the - value defined for the format and also performs computations needed - for gradual underflow (subnormal numbers). - - NOTE: MPFR numbers are variable-size entities, consuming only as - much space as needed to store the significant digits. Since the - performance using MPFR numbers pales in comparison to doing - arithmetic using the underlying machine types, you should consider - using only as much precision as needed by your program. - -* Menu: - -* Setting Precision:: Setting the working precision. -* Setting Rounding Mode:: Setting the rounding mode. -* Floating-point Constants:: Representing floating-point constants. -* Changing Precision:: Changing the precision of a number. -* Exact Arithmetic:: Exact arithmetic with floating-point numbers. - - ---------- Footnotes ---------- - - (1) The default precision is 53, since according to the MPFR -documentation, the library should be able to exactly reproduce all -computations with double-precision machine floating-point numbers -(`double' type in C), except the default exponent range is much wider -and subnormal numbers are not implemented. - - -File: gawk.info, Node: Setting Precision, Next: Setting Rounding Mode, Up: Arbitrary Precision Floats - -11.4.1 Setting the Working Precision ------------------------------------- - -`gawk' uses a global working precision; it does not keep track of the -precision or accuracy of individual numbers. Performing an arithmetic -operation or calling a built-in function rounds the result to the -current working precision. The default working precision is 53 which -can be modified using the built-in variable `PREC'. 
You can also set the -value to one of the following pre-defined case-insensitive strings to -emulate an IEEE-754 binary format: - -`PREC' IEEE-754 Binary Format ---------------------------------------------------- -`"half"' 16-bit half-precision. -`"single"' Basic 32-bit single precision. -`"double"' Basic 64-bit double precision. -`"quad"' Basic 128-bit quadruple precision. -`"oct"' 256-bit octuple precision. - - The following example illustrates the effects of changing precision -on arithmetic operations: - - $ gawk -M -vPREC=100 'BEGIN { x = 1.0e-400; print x + 0; \ - > PREC = "double"; print x + 0 }' - -| 1e-400 - -| 0 - - Binary and decimal precisions are related approximately according to -the formula: - - PREC = 3.322 * DPS - -Here, PREC denotes the binary precision (measured in bits) and DPS -(short for decimal places) is the decimal digits. We can easily -calculate how many decimal digits the 53-bit significand of an IEEE -double is equivalent to: 53 / 3.332 which is equal to about 15.95. But -what does 15.95 digits actually mean? It depends whether you are -concerned about how many digits you can rely on, or how many digits you -need. - - It is important to know how many bits it takes to uniquely identify -a double-precision value (the C type `double'). If you want to convert -from `double' to decimal and back to `double' (e.g., saving a `double' -representing an intermediate result to a file, and later reading it -back to restart the computation), then a few more decimal digits are -required. 17 digits is generally enough for a `double'. - - It can also be important to know what decimal numbers can be uniquely -represented with a `double'. If you want to convert from decimal to -`double' and back again, 15 digits is the most that you can get. Stated -differently, you should not present the numbers from your -floating-point computations with more than 15 significant digits in -them. 
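The 15-digit and 17-digit figures above can be checked with ordinary double-precision arithmetic, without `-M'. The following is a minimal sketch under stated assumptions: the function name `roundtrip_demo' and the test value `0.1 + 0.2' are chosen here purely for illustration, and the `awk' in use is assumed to store numbers as IEEE-754 doubles and to support `%.17g' in `sprintf()'.

```shell
# Hypothetical helper (the name is chosen for illustration only).
# Formats a double with 15 and with 17 significant digits, converts
# each string back to a number, and reports whether the original
# value was recovered (1) or not (0).
roundtrip_demo() {
    awk 'BEGIN {
        x = 0.1 + 0.2                  # a double with no short exact decimal form
        s15 = sprintf("%.15g", x)      # 15 significant digits
        s17 = sprintf("%.17g", x)      # 17 significant digits
        print s15, (s15 + 0 == x)      # string -> number, then compare with x
        print s17, (s17 + 0 == x)
    }'
}

roundtrip_demo
```

Under those assumptions, the 15-digit string reads back as a different `double', while the 17-digit string reads back as the same one. This is why 17 digits is the figure to use when saving an intermediate result for later reloading, and 15 digits is the figure to use when presenting results.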
- - Conversely, it takes a precision of 332 bits to hold an approximation -of the constant pi that is accurate to 100 decimal places. You should -always add some extra bits in order to avoid the confusing round-off -issues that occur because numbers are stored internally in binary. - - -File: gawk.info, Node: Setting Rounding Mode, Next: Floating-point Constants, Prev: Setting Precision, Up: Arbitrary Precision Floats - -11.4.2 Setting the Rounding Mode --------------------------------- - -The `ROUNDMODE' variable provides program level control over the -rounding mode. The correspondance between `ROUNDMODE' and the IEEE -rounding modes is shown in *note table-gawk-rounding-modes::. - -Rounding Mode IEEE Name `ROUNDMODE' ---------------------------------------------------------------------------- -Round to nearest, ties to even `roundTiesToEven' `"N"' or `"n"' -Round toward plus Infinity `roundTowardPositive' `"U"' or `"u"' -Round toward negative Infinity `roundTowardNegative' `"D"' or `"d"' -Round toward zero `roundTowardZero' `"Z"' or `"z"' -Round to nearest, ties away `roundTiesToAway' `"A"' or `"a"' -from zero - -Table 11.3: `gawk' Rounding Modes - - `ROUNDMODE' has the default value `"N"', which selects the IEEE-754 -rounding mode `roundTiesToEven'. Besides the values listed in *note -Table 11.3: table-gawk-rounding-modes, `gawk' also accepts `"A"' to -select the IEEE-754 mode `roundTiesToAway' if your version of the MPFR -library supports it; otherwise setting `ROUNDMODE' to this value has no -effect. *Note Rounding Mode::, for the meanings of the various rounding -modes. 
- - Here is an example of how to change the default rounding behavior of -`printf''s output: - - $ gawk -M -vROUNDMODE="Z" 'BEGIN { printf("%.2f\n", 1.378) }' - -| 1.37 - - -File: gawk.info, Node: Floating-point Constants, Next: Changing Precision, Prev: Setting Rounding Mode, Up: Arbitrary Precision Floats - -11.4.3 Representing Floating-point Constants --------------------------------------------- - -Be wary of floating-point constants! When reading a floating-point -constant from program source code, `gawk' uses the default precision, -unless overridden by an assignment to the special variable `PREC' on -the command line, to store it internally as a MPFR number. Changing -the precision using `PREC' in the program text does not change the -precision of a constant. If you need to represent a floating-point -constant at a higher precision than the default and cannot use a -command line assignment to `PREC', you should either specify the -constant as a string, or as a rational number whenever possible. The -following example illustrates the differences among various ways to -print a floating-point constant: - - $ gawk -M 'BEGIN { PREC = 113; printf("%0.25f\n", 0.1) }' - -| 0.1000000000000000055511151 - $ gawk -M -vPREC = 113 'BEGIN { printf("%0.25f\n", 0.1) }' - -| 0.1000000000000000000000000 - $ gawk -M 'BEGIN { PREC = 113; printf("%0.25f\n", "0.1") }' - -| 0.1000000000000000000000000 - $ gawk -M 'BEGIN { PREC = 113; printf("%0.25f\n", 1/10) }' - -| 0.1000000000000000000000000 - - In the first case, the number is stored with the default precision -of 53. 
- - -File: gawk.info, Node: Changing Precision, Next: Exact Arithmetic, Prev: Floating-point Constants, Up: Arbitrary Precision Floats - -11.4.4 Changing the Precision of a Number ------------------------------------------ - - The point is that in any variable-precision package, a decision is - made on how to treat numbers given as data, or arising in - intermediate results, which are represented in floating-point - format to a precision lower than working precision. Do we promote - them to full membership of the high-precision club, or do we treat - them and all their associates as second-class citizens? Sometimes - the first course is proper, sometimes the second, and it takes - careful analysis to tell which. - - Dirk Laurie(1) - - `gawk' does not implicitly modify the precision of any previously -computed results when the working precision is changed with an -assignment to `PREC'. The precision of a number is always the one that -was used at the time of its creation, and there is no way for the user -to explicitly change it afterwards. However, since the result of a -floating-point arithmetic operation is always an arbitrary precision -floating-point value--with a precision set by the value of `PREC'--one -of the following workarounds effectively accomplishes the desired -behavior: - - x = x + 0.0 - -or: - - x += 0.0 - - ---------- Footnotes ---------- - - (1) Dirk Laurie. `Variable-precision Arithmetic Considered Perilous --- A Detective Story'. Electronic Transactions on Numerical Analysis. -Volume 28, pp. 168-173, 2008. - - -File: gawk.info, Node: Exact Arithmetic, Prev: Changing Precision, Up: Arbitrary Precision Floats - -11.4.5 Exact Arithmetic with Floating-point Numbers ---------------------------------------------------- - - CAUTION: Never depend on the exactness of floating-point - arithmetic, even for apparently simple expressions! - - Can arbitrary precision arithmetic give exact results? There are no -easy answers. 
The standard rules of algebra often do not apply when -using floating-point arithmetic. Among other things, the distributive -and associative laws do not hold completely, and order of operation may -be important for your computation. Rounding error, cumulative precision -loss and underflow are often troublesome. - - When `gawk' tests the expressions `0.1 + 12.2' and `12.3' for -equality using the machine double precision arithmetic, it decides that -they are not equal! (*Note Floating-point Programming::.) You can get -the result you want by increasing the precision; 56 in this case will -get the job done: - - $ gawk -M -vPREC=56 'BEGIN { print (0.1 + 12.2 == 12.3) }' - -| 1 - - If adding more bits is good, perhaps adding even more bits of -precision is better? Here is what happens if we use an even larger -value of `PREC': - - $ gawk -M -vPREC=201 'BEGIN { print (0.1 + 12.2 == 12.3) }' - -| 0 - - This is not a bug in `gawk' or in the MPFR library. It is easy to -forget that the finite number of bits used to store the value is often -just an approximation after proper rounding. The test for equality -succeeds if and only if _all_ bits in the two operands are exactly the -same. Since this is not necessarily true after floating-point -computations with a particular precision and effective rounding rule, a -straight test for equality may not work. - - So, don't assume that floating-point values can be compared for -equality. You should also exercise caution when using other forms of -comparisons. The standard way to compare between floating-point -numbers is to determine how much error (or "tolerance") you will allow -in a comparison and check to see if one value is within this error -range of the other. - - In applications where 15 or fewer decimal places suffice, hardware -double precision arithmetic can be adequate, and is usually much faster. 
-But you do need to keep in mind that every floating-point operation can -suffer a new rounding error with catastrophic consequences as -illustrated by our attempt to compute the value of the constant pi -(*note Floating-point Programming::). Extra precision can greatly -enhance the stability and the accuracy of your computation in such -cases. - - Repeated addition is not necessarily equivalent to multiplication in -floating-point arithmetic. In the example in *note Floating-point -Programming::: - - $ gawk 'BEGIN { - > for (d = 1.1; d <= 1.5; d += 0.1) - > i++ - > print i - > }' - -| 4 - -you may or may not succeed in getting the correct result by choosing an -arbitrarily large value for `PREC'. Reformulation of the problem at -hand is often the correct approach in such situations. - - -File: gawk.info, Node: Arbitrary Precision Integers, Prev: Arbitrary Precision Floats, Up: Arbitrary Precision Arithmetic - -11.5 Arbitrary Precision Integer Arithmetic with `gawk' -======================================================= - -If the option `--bignum' or `-M' is specified, `gawk' performs all -integer arithmetic using GMP arbitrary precision integers. Any number -that looks like an integer in a program source or data file is stored -as an arbitrary precision integer. The size of the integer is limited -only by your computer's memory. The current floating-point context has -no effect on operations involving integers. For example, the following -computes 5^4^3^2, the result of which is beyond the limits of ordinary -`gawk' numbers: - - $ gawk -M 'BEGIN { - > x = 5^4^3^2 - > print "# of digits =", length(x) - > print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20) - > }' - -| # of digits = 183231 - -| 62060698786608744707 ... 92256259918212890625 - - If you were to compute the same value using arbitrary precision -floating-point values instead, the precision needed for correct output -(using the formula `prec = 3.322 * dps'), would be 3.322 x 183231, or -608693. 
(Thus, the floating-point representation requires over 30 -times as many decimal digits!) - - The result from an arithmetic operation with an integer and a -floating-point value is a floating-point value with a precision equal -to the working precision. The following program calculates the eighth -term in Sylvester's sequence(1) using a recurrence: - - $ gawk -M 'BEGIN { - > s = 2.0 - > for (i = 1; i <= 7; i++) - > s = s * (s - 1) + 1 - > print s - > }' - -| 113423713055421845118910464 - - The output differs from the acutal number, -113423713055421844361000443, because the default precision of 53 is not -enough to represent the floating-point results exactly. You can either -increase the precision (100 is enough in this case), or replace the -floating-point constant `2.0' with an integer, to perform all -computations using integer arithmetic to get the correct output. - - It will sometimes be necessary for `gawk' to implicitly convert an -arbitrary precision integer into an arbitrary precision floating-point -value. This is primarily because the MPFR library does not always -provide the relevant interface to process arbitrary precision integers -or mixed-mode numbers as needed by an operation or function. In such a -case, the precision is set to the minimum value necessary for exact -conversion, and the working precision is not used for this purpose. If -this is not what you need or want, you can employ a subterfuge like -this: - - gawk -M 'BEGIN { n = 13; print (n + 0.0) % 2.0 }' - - You can avoid this issue altogether by specifying the number as a -floating-point value to begin with: - - gawk -M 'BEGIN { n = 13.0; print n % 2.0 }' - - Note that for the particular example above, there is likely best to -just use the following: - - gawk -M 'BEGIN { n = 13; print n % 2 }' - - ---------- Footnotes ---------- - - (1) Weisstein, Eric W. `Sylvester's Sequence'. From MathWorld--A -Wolfram Web Resource. 
-`http://mathworld.wolfram.com/SylvestersSequence.html' - - File: gawk.info, Node: Advanced Features, Next: Library Functions, Prev: Arbitrary Precision Arithmetic, Up: Top -12 Advanced Features of `gawk' +11 Advanced Features of `gawk' ****************************** Write documentation as if whoever reads it is a violent psychopath @@ -14854,7 +13795,7 @@ and likely to change, its description is relegated to an appendix. File: gawk.info, Node: Nondecimal Data, Next: Array Sorting, Up: Advanced Features -12.1 Allowing Nondecimal Input Data +11.1 Allowing Nondecimal Input Data =================================== If you run `gawk' with the `--non-decimal-data' option, you can have @@ -14896,7 +13837,7 @@ request it. File: gawk.info, Node: Array Sorting, Next: Two-way I/O, Prev: Nondecimal Data, Up: Advanced Features -12.2 Controlling Array Traversal and Array Sorting +11.2 Controlling Array Traversal and Array Sorting ================================================== `gawk' lets you control the order in which a `for (i in array)' loop @@ -14915,7 +13856,7 @@ to order the elements during sorting. File: gawk.info, Node: Controlling Array Traversal, Next: Array Sorting Functions, Up: Array Sorting -12.2.1 Controlling Array Traversal +11.2.1 Controlling Array Traversal ---------------------------------- By default, the order in which a `for (i in array)' loop scans an array @@ -15146,7 +14087,7 @@ the default. File: gawk.info, Node: Array Sorting Functions, Prev: Controlling Array Traversal, Up: Array Sorting -12.2.2 Sorting Array Values and Indices with `gawk' +11.2.2 Sorting Array Values and Indices with `gawk' --------------------------------------------------- In most `awk' implementations, sorting an array requires writing a @@ -15241,7 +14182,7 @@ extensions, they are not available in that case. 
File: gawk.info, Node: Two-way I/O, Next: TCP/IP Networking, Prev: Array Sorting, Up: Advanced Features -12.3 Two-Way Communications with Another Process +11.3 Two-Way Communications with Another Process ================================================ From: brennan@whidbey.com (Mike Brennan) @@ -15376,7 +14317,7 @@ regular pipes. File: gawk.info, Node: TCP/IP Networking, Next: Profiling, Prev: Two-way I/O, Up: Advanced Features -12.4 Using `gawk' for Network Programming +11.4 Using `gawk' for Network Programming ========================================= `EMISTERED': @@ -15453,7 +14394,7 @@ examples. File: gawk.info, Node: Profiling, Prev: TCP/IP Networking, Up: Advanced Features -12.5 Profiling Your `awk' Programs +11.5 Profiling Your `awk' Programs ================================== You may produce execution traces of your `awk' programs. This is done @@ -15671,7 +14612,7 @@ without any execution counts. File: gawk.info, Node: Library Functions, Next: Sample Programs, Prev: Advanced Features, Up: Top -13 A Library of `awk' Functions +12 A Library of `awk' Functions ******************************* *note User-defined::, describes how to write your own `awk' functions. @@ -15743,7 +14684,7 @@ contents of the input record. File: gawk.info, Node: Library Names, Next: General Functions, Up: Library Functions -13.1 Naming Library Function Global Variables +12.1 Naming Library Function Global Variables ============================================= Due to the way the `awk' language evolved, variables are either @@ -15823,7 +14764,7 @@ verifying this. File: gawk.info, Node: General Functions, Next: Data File Management, Prev: Library Names, Up: Library Functions -13.2 General Programming +12.2 General Programming ======================== This minor node presents a number of functions that are of general @@ -15846,7 +14787,7 @@ programming use. 
File: gawk.info, Node: Strtonum Function, Next: Assert Function, Up: General Functions -13.2.1 Converting Strings To Numbers +12.2.1 Converting Strings To Numbers ------------------------------------ The `strtonum()' function (*note String Functions::) is a `gawk' @@ -15930,7 +14871,7 @@ be tested with `gawk' and the results compared to the built-in File: gawk.info, Node: Assert Function, Next: Round Function, Prev: Strtonum Function, Up: General Functions -13.2.2 Assertions +12.2.2 Assertions ----------------- When writing large programs, it is often useful to know that a @@ -16016,7 +14957,7 @@ rule always ends with an `exit' statement. File: gawk.info, Node: Round Function, Next: Cliff Random Function, Prev: Assert Function, Up: General Functions -13.2.3 Rounding Numbers +12.2.3 Rounding Numbers ----------------------- The way `printf' and `sprintf()' (*note Printf::) perform rounding @@ -16062,7 +15003,7 @@ might be useful if your `awk''s `printf' does unbiased rounding: File: gawk.info, Node: Cliff Random Function, Next: Ordinal Functions, Prev: Round Function, Up: General Functions -13.2.4 The Cliff Random Number Generator +12.2.4 The Cliff Random Number Generator ---------------------------------------- The Cliff random number generator @@ -16091,7 +15032,7 @@ might try using this function instead. File: gawk.info, Node: Ordinal Functions, Next: Join Function, Prev: Cliff Random Function, Up: General Functions -13.2.5 Translating Between Characters and Numbers +12.2.5 Translating Between Characters and Numbers ------------------------------------------------- One commercial implementation of `awk' supplies a built-in function, @@ -16189,7 +15130,7 @@ extensions, you can simplify `_ord_init' to loop from 0 to 255. 
File: gawk.info, Node: Join Function, Next: Getlocaltime Function, Prev: Ordinal Functions, Up: General Functions -13.2.6 Merging an Array into a String +12.2.6 Merging an Array into a String ------------------------------------- When doing string processing, it is often useful to be able to join all @@ -16236,7 +15177,7 @@ makes string operations more difficult than they really need to be. File: gawk.info, Node: Getlocaltime Function, Prev: Join Function, Up: General Functions -13.2.7 Managing the Time of Day +12.2.7 Managing the Time of Day ------------------------------- The `systime()' and `strftime()' functions described in *note Time @@ -16318,7 +15259,7 @@ optional timestamp value to use instead of the current time. File: gawk.info, Node: Data File Management, Next: Getopt Function, Prev: General Functions, Up: Library Functions -13.3 Data File Management +12.3 Data File Management ========================= This minor node presents functions that are useful for managing @@ -16335,7 +15276,7 @@ command-line data files. File: gawk.info, Node: Filetrans Function, Next: Rewind Function, Up: Data File Management -13.3.1 Noting Data File Boundaries +12.3.1 Noting Data File Boundaries ---------------------------------- The `BEGIN' and `END' rules are each executed exactly once at the @@ -16433,7 +15374,7 @@ it provides an easy way to do per-file cleanup processing. File: gawk.info, Node: Rewind Function, Next: File Checking, Prev: Filetrans Function, Up: Data File Management -13.3.2 Rereading the Current File +12.3.2 Rereading the Current File --------------------------------- Another request for a new built-in function was for a `rewind()' @@ -16475,7 +15416,7 @@ Nextfile Statement::). 
File: gawk.info, Node: File Checking, Next: Empty Files, Prev: Rewind Function, Up: Data File Management -13.3.3 Checking for Readable Data Files +12.3.3 Checking for Readable Data Files --------------------------------------- Normally, if you give `awk' a data file that isn't readable, it stops @@ -16504,7 +15445,7 @@ in the list). See also *note ARGC and ARGV::. File: gawk.info, Node: Empty Files, Next: Ignoring Assigns, Prev: File Checking, Up: Data File Management -13.3.4 Checking For Zero-length Files +12.3.4 Checking For Zero-length Files ------------------------------------- All known `awk' implementations silently skip over zero-length files. @@ -16561,7 +15502,7 @@ intervening value in `ARGV' is a variable assignment. File: gawk.info, Node: Ignoring Assigns, Prev: Empty Files, Up: Data File Management -13.3.5 Treating Assignments as File Names +12.3.5 Treating Assignments as File Names ----------------------------------------- Occasionally, you might not want `awk' to process command-line variable @@ -16604,7 +15545,7 @@ arguments are left alone. File: gawk.info, Node: Getopt Function, Next: Passwd Functions, Prev: Data File Management, Up: Library Functions -13.4 Processing Command-Line Options +12.4 Processing Command-Line Options ==================================== Most utilities on POSIX compatible systems take options on the command @@ -16897,7 +15838,7 @@ have left it alone, since using `substr()' is more portable. File: gawk.info, Node: Passwd Functions, Next: Group Functions, Prev: Getopt Function, Up: Library Functions -13.5 Reading the User Database +12.5 Reading the User Database ============================== The `PROCINFO' array (*note Built-in Variables::) provides access to @@ -17140,7 +16081,7 @@ network database. 
File: gawk.info, Node: Group Functions, Next: Walking Arrays, Prev: Passwd Functions, Up: Library Functions -13.6 Reading the Group Database +12.6 Reading the Group Database =============================== Much of the discussion presented in *note Passwd Functions::, applies @@ -17374,7 +16315,7 @@ very simple, relying on `awk''s associative arrays to do work. File: gawk.info, Node: Walking Arrays, Prev: Group Functions, Up: Library Functions -13.7 Traversing Arrays of Arrays +12.7 Traversing Arrays of Arrays ================================ *note Arrays of Arrays::, described how `gawk' provides arrays of @@ -17425,7 +16366,7 @@ value. Here is a main program to demonstrate: File: gawk.info, Node: Sample Programs, Next: Debugger, Prev: Library Functions, Up: Top -14 Practical `awk' Programs +13 Practical `awk' Programs *************************** *note Library Functions::, presents the idea that reading programs in a @@ -17445,7 +16386,7 @@ Library Functions::. File: gawk.info, Node: Running Examples, Next: Clones, Up: Sample Programs -14.1 Running the Example Programs +13.1 Running the Example Programs ================================= To run a given program, you would typically do something like this: @@ -17468,7 +16409,7 @@ OPTIONS are any command-line options for the program that start with a File: gawk.info, Node: Clones, Next: Miscellaneous Programs, Prev: Running Examples, Up: Sample Programs -14.2 Reinventing Wheels for Fun and Profit +13.2 Reinventing Wheels for Fun and Profit ========================================== This minor node presents a number of POSIX utilities implemented in @@ -17498,7 +16439,7 @@ programming for "real world" tasks. 
File: gawk.info, Node: Cut Program, Next: Egrep Program, Up: Clones -14.2.1 Cutting out Fields and Columns +13.2.1 Cutting out Fields and Columns ------------------------------------- The `cut' utility selects, or "cuts," characters or fields from its @@ -17757,7 +16698,7 @@ solution to the problem of picking the input line apart by characters. File: gawk.info, Node: Egrep Program, Next: Id Program, Prev: Cut Program, Up: Clones -14.2.2 Searching for Regular Expressions in Files +13.2.2 Searching for Regular Expressions in Files ------------------------------------------------- The `egrep' utility searches files for patterns. It uses regular @@ -17989,7 +16930,7 @@ the translated line, not the original. File: gawk.info, Node: Id Program, Next: Split Program, Prev: Egrep Program, Up: Clones -14.2.3 Printing out User Information +13.2.3 Printing out User Information ------------------------------------ The `id' utility lists a user's real and effective user ID numbers, @@ -18096,7 +17037,7 @@ body never executes. File: gawk.info, Node: Split Program, Next: Tee Program, Prev: Id Program, Up: Clones -14.2.4 Splitting a Large File into Pieces +13.2.4 Splitting a Large File into Pieces ----------------------------------------- The `split' program splits large text files into smaller pieces. Usage @@ -18204,7 +17145,7 @@ not relevant for what the program aims to demonstrate. File: gawk.info, Node: Tee Program, Next: Uniq Program, Prev: Split Program, Up: Clones -14.2.5 Duplicating Output into Multiple Files +13.2.5 Duplicating Output into Multiple Files --------------------------------------------- The `tee' program is known as a "pipe fitting." 
`tee' copies its @@ -18292,7 +17233,7 @@ N input records and M output files, the first method only executes N File: gawk.info, Node: Uniq Program, Next: Wc Program, Prev: Tee Program, Up: Clones -14.2.6 Printing Nonduplicated Lines of Text +13.2.6 Printing Nonduplicated Lines of Text ------------------------------------------- The `uniq' utility reads sorted lines of data on its standard input, @@ -18511,7 +17452,7 @@ line of input data: File: gawk.info, Node: Wc Program, Prev: Uniq Program, Up: Clones -14.2.7 Counting Things +13.2.7 Counting Things ---------------------- The `wc' (word count) utility counts lines, words, and characters in @@ -18656,7 +17597,7 @@ characters, not bytes. File: gawk.info, Node: Miscellaneous Programs, Prev: Clones, Up: Sample Programs -14.3 A Grab Bag of `awk' Programs +13.3 A Grab Bag of `awk' Programs ================================= This minor node is a large "grab bag" of miscellaneous programs. We @@ -18683,7 +17624,7 @@ hope you find them both interesting and enjoyable. File: gawk.info, Node: Dupword Program, Next: Alarm Program, Up: Miscellaneous Programs -14.3.1 Finding Duplicated Words in a Document +13.3.1 Finding Duplicated Words in a Document --------------------------------------------- A common error when writing large amounts of prose is to accidentally @@ -18731,7 +17672,7 @@ word, comparing it to the previous one: File: gawk.info, Node: Alarm Program, Next: Translate Program, Prev: Dupword Program, Up: Miscellaneous Programs -14.3.2 An Alarm Clock Program +13.3.2 An Alarm Clock Program ----------------------------- Nothing cures insomnia like a ringing alarm clock. @@ -18864,7 +17805,7 @@ necessary: File: gawk.info, Node: Translate Program, Next: Labels Program, Prev: Alarm Program, Up: Miscellaneous Programs -14.3.3 Transliterating Characters +13.3.3 Transliterating Characters --------------------------------- The system `tr' utility transliterates characters. 
For example, it is @@ -18990,7 +17931,7 @@ split each character in a string into separate array elements. File: gawk.info, Node: Labels Program, Next: Word Sorting, Prev: Translate Program, Up: Miscellaneous Programs -14.3.4 Printing Mailing Labels +13.3.4 Printing Mailing Labels ------------------------------ Here is a "real world"(1) program. This script reads lists of names and @@ -19097,7 +18038,7 @@ something done." File: gawk.info, Node: Word Sorting, Next: History Sorting, Prev: Labels Program, Up: Miscellaneous Programs -14.3.5 Generating Word-Usage Counts +13.3.5 Generating Word-Usage Counts ----------------------------------- When working with large amounts of text, it can be interesting to know @@ -19201,7 +18142,7 @@ operating system documentation for more information on how to use the File: gawk.info, Node: History Sorting, Next: Extract Program, Prev: Word Sorting, Up: Miscellaneous Programs -14.3.6 Removing Duplicates from Unsorted Text +13.3.6 Removing Duplicates from Unsorted Text --------------------------------------------- The `uniq' program (*note Uniq Program::), removes duplicate lines from @@ -19248,7 +18189,7 @@ seen. File: gawk.info, Node: Extract Program, Next: Simple Sed, Prev: History Sorting, Up: Miscellaneous Programs -14.3.7 Extracting Programs from Texinfo Source Files +13.3.7 Extracting Programs from Texinfo Source Files ---------------------------------------------------- The nodes *note Library Functions::, and *note Sample Programs::, are @@ -19448,7 +18389,7 @@ function. Consider how you might use it to simplify the code. 
File: gawk.info, Node: Simple Sed, Next: Igawk Program, Prev: Extract Program, Up: Miscellaneous Programs -14.3.8 A Simple Stream Editor +13.3.8 A Simple Stream Editor ----------------------------- The `sed' utility is a stream editor, a program that reads a stream of @@ -19529,7 +18470,7 @@ the single rule handles the printing scheme outlined above, using File: gawk.info, Node: Igawk Program, Next: Anagram Program, Prev: Simple Sed, Up: Miscellaneous Programs -14.3.9 An Easy Way to Use Library Functions +13.3.9 An Easy Way to Use Library Functions ------------------------------------------- In *note Include Files::, we saw how `gawk' provides a built-in @@ -19926,7 +18867,7 @@ can loop forever if the file exists but is empty. Caveat emptor. File: gawk.info, Node: Anagram Program, Next: Signature Program, Prev: Igawk Program, Up: Miscellaneous Programs -14.3.10 Finding Anagrams From A Dictionary +13.3.10 Finding Anagrams From A Dictionary ------------------------------------------ An interesting programming challenge is to search for "anagrams" in a @@ -20016,7 +18957,7 @@ otherwise the anagrams would appear in arbitrary order: File: gawk.info, Node: Signature Program, Prev: Anagram Program, Up: Miscellaneous Programs -14.3.11 And Now For Something Completely Different +13.3.11 And Now For Something Completely Different -------------------------------------------------- The following program was written by Davide Brini and is published on @@ -20043,7 +18984,7 @@ supplies the following copyright terms: File: gawk.info, Node: Debugger, Next: Dynamic Extensions, Prev: Sample Programs, Up: Top -15 Debugging `awk' Programs +14 Debugging `awk' Programs *************************** It would be nice if computer programs worked perfectly the first time @@ -20067,7 +19008,7 @@ program is easy. 
File: gawk.info, Node: Debugging, Next: Sample Debugging Session, Up: Debugger -15.1 Introduction to `gawk' Debugger +14.1 Introduction to `gawk' Debugger ==================================== This minor node introduces debugging in general and begins the @@ -20082,7 +19023,7 @@ discussion of debugging in `gawk'. File: gawk.info, Node: Debugging Concepts, Next: Debugging Terms, Up: Debugging -15.1.1 Debugging in General +14.1.1 Debugging in General --------------------------- (If you have used debuggers in other languages, you may want to skip @@ -20122,7 +19063,7 @@ functional program that you or someone else wrote). File: gawk.info, Node: Debugging Terms, Next: Awk Debugging, Prev: Debugging Concepts, Up: Debugging -15.1.2 Additional Debugging Concepts +14.1.2 Additional Debugging Concepts ------------------------------------ Before diving in to the details, we need to introduce several important @@ -20174,7 +19115,7 @@ defines terms used throughout the rest of this major node. File: gawk.info, Node: Awk Debugging, Prev: Debugging Terms, Up: Debugging -15.1.3 Awk Debugging +14.1.3 Awk Debugging -------------------- Debugging an `awk' program has some specific aspects that are not @@ -20196,7 +19137,7 @@ commands. File: gawk.info, Node: Sample Debugging Session, Next: List of Debugger Commands, Prev: Debugging, Up: Debugger -15.2 Sample Debugging Session +14.2 Sample Debugging Session ============================= In order to illustrate the use of `gawk' as a debugger, let's look at a @@ -20212,7 +19153,7 @@ example. File: gawk.info, Node: Debugger Invocation, Next: Finding The Bug, Up: Sample Debugging Session -15.2.1 How to Start the Debugger +14.2.1 How to Start the Debugger -------------------------------- Starting the debugger is almost exactly like running `awk', except you @@ -20244,7 +19185,7 @@ code has been executed. 
File: gawk.info, Node: Finding The Bug, Prev: Debugger Invocation, Up: Sample Debugging Session -15.2.2 Finding the Bug +14.2.2 Finding the Bug ---------------------- Let's say that we are having a problem using (a faulty version of) @@ -20441,7 +19382,7 @@ and problem solved! File: gawk.info, Node: List of Debugger Commands, Next: Readline Support, Prev: Sample Debugging Session, Up: Debugger -15.3 Main Debugger Commands +14.3 Main Debugger Commands =========================== The `gawk' debugger command set can be divided into the following @@ -20480,7 +19421,7 @@ when just hitting <Enter>. This works for the commands `list', `next', File: gawk.info, Node: Breakpoint Control, Next: Debugger Execution Control, Up: List of Debugger Commands -15.3.1 Control of Breakpoints +14.3.1 Control of Breakpoints ----------------------------- As we saw above, the first thing you probably want to do in a debugging @@ -20575,7 +19516,7 @@ controlling breakpoints are: File: gawk.info, Node: Debugger Execution Control, Next: Viewing And Changing Data, Prev: Breakpoint Control, Up: List of Debugger Commands -15.3.2 Control of Execution +14.3.2 Control of Execution --------------------------- Now that your breakpoints are ready, you can start running the program @@ -20665,7 +19606,7 @@ execution of the program than we saw in our earlier example: File: gawk.info, Node: Viewing And Changing Data, Next: Execution Stack, Prev: Debugger Execution Control, Up: List of Debugger Commands -15.3.3 Viewing and Changing Data +14.3.3 Viewing and Changing Data -------------------------------- The commands for viewing and changing variables inside of `gawk' are: @@ -20754,7 +19695,7 @@ AWK STATEMENTS File: gawk.info, Node: Execution Stack, Next: Debugger Info, Prev: Viewing And Changing Data, Up: List of Debugger Commands -15.3.4 Dealing with the Stack +14.3.4 Dealing with the Stack ----------------------------- Whenever you run a program which contains any function calls, `gawk' @@ -20791,7 
+19732,7 @@ are: File: gawk.info, Node: Debugger Info, Next: Miscellaneous Debugger Commands, Prev: Execution Stack, Up: List of Debugger Commands -15.3.5 Obtaining Information about the Program and the Debugger State +14.3.5 Obtaining Information about the Program and the Debugger State --------------------------------------------------------------------- Besides looking at the values of variables, there is often a need to get @@ -20900,7 +19841,7 @@ from a file. The commands are: File: gawk.info, Node: Miscellaneous Debugger Commands, Prev: Debugger Info, Up: List of Debugger Commands -15.3.6 Miscellaneous Commands +14.3.6 Miscellaneous Commands ----------------------------- There are a few more commands which do not fit into the previous @@ -21020,7 +19961,7 @@ categories, as follows: File: gawk.info, Node: Readline Support, Next: Limitations, Prev: List of Debugger Commands, Up: Debugger -15.4 Readline Support +14.4 Readline Support ===================== If `gawk' is compiled with the `readline' library, you can take @@ -21047,7 +19988,7 @@ Variable name completion File: gawk.info, Node: Limitations, Prev: Readline Support, Up: Debugger -15.5 Limitations and Future Plans +14.5 Limitations and Future Plans ================================= We hope you find the `gawk' debugger useful and enjoyable to work with, @@ -21095,7 +20036,7 @@ yourself! File: gawk.info, Node: Dynamic Extensions, Next: Language History, Prev: Debugger, Up: Top -16 Writing Extensions for `gawk' +15 Writing Extensions for `gawk' ******************************** This chapter is a placeholder, pending a rewrite for the new API. Some @@ -21119,7 +20060,7 @@ is necessary when reading this minor node. File: gawk.info, Node: Plugin License, Next: Sample Library, Up: Dynamic Extensions -16.1 Extension Licensing +15.1 Extension Licensing ======================== Every dynamic extension should define the global symbol @@ -21136,7 +20077,7 @@ the symbol exists in the global scope. 
Something like this is enough: File: gawk.info, Node: Sample Library, Prev: Plugin License, Up: Dynamic Extensions -16.2 Example: Directory and File Operation Built-ins +15.2 Example: Directory and File Operation Built-ins ==================================================== Two useful functions that are not in `awk' are `chdir()' (so that an @@ -21153,7 +20094,7 @@ implements these functions for `gawk' in an external extension library. File: gawk.info, Node: Internal File Description, Next: Internal File Ops, Up: Sample Library -16.2.1 Using `chdir()' and `stat()' +15.2.1 Using `chdir()' and `stat()' ----------------------------------- This minor node shows how to use the new functions at the `awk' level @@ -21276,7 +20217,7 @@ Elements::): File: gawk.info, Node: Internal File Ops, Next: Using Internal File Ops, Prev: Internal File Description, Up: Sample Library -16.2.2 C Code for `chdir()' and `stat()' +15.2.2 C Code for `chdir()' and `stat()' ---------------------------------------- Here is the C code for these extensions. They were written for @@ -21417,7 +20358,7 @@ version. File: gawk.info, Node: Using Internal File Ops, Prev: Internal File Ops, Up: Sample Library -16.2.3 Integrating the Extensions +15.2.3 Integrating the Extensions --------------------------------- Now that the code is written, it must be possible to add it at runtime @@ -21486,6 +20427,1068 @@ shared library: -| JUNK modified: 01 01 70 02:00:00 +File: gawk.info, Node: Arbitrary Precision Arithmetic, Next: Advanced Features, Prev: Internationalization, Up: Top + +16 Arithmetic and Arbitrary Precision Arithmetic with `gawk' +************************************************************ + + There's a credibility gap: We don't know how much of the + computer's answers to believe. Novice computer users solve this + problem by implicitly trusting in the computer as an infallible + authority; they tend to believe that all digits of a printed + answer are significant. 
Disillusioned computer users have just the + opposite approach; they are constantly afraid that their answers + are almost meaningless. + Donald Knuth(1) + + This major node discusses issues that you may encounter when +performing arithmetic. It begins by discussing some of the general +attributes of computer arithmetic, along with how this can influence +what you see when running `awk' programs. This discussion applies to +all versions of `awk'. + + Then the discussion moves on to "arbitrary precision arithmetic", a +feature which is specific to `gawk'. + +* Menu: + +* General Arithmetic:: An introduction to computer arithmetic. +* Floating-point Programming:: Effective Floating-point Programming. +* Gawk and MPFR:: How `gawk' provides + arbitrary-precision arithmetic. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic + with `gawk'. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with + `gawk'. + + ---------- Footnotes ---------- + + (1) Donald E. Knuth. `The Art of Computer Programming'. Volume 2, +`Seminumerical Algorithms', third edition, 1998, ISBN 0-201-89683-4, p. +229. + + +File: gawk.info, Node: General Arithmetic, Next: Floating-point Programming, Up: Arbitrary Precision Arithmetic + +16.1 A General Description of Computer Arithmetic +================================================= + +Within computers, there are two kinds of numeric values: "integers" and +"floating-point". In school, integer values were referred to as +"whole" numbers--that is, numbers without any fractional part, such as +1, 42, or -17. The advantage to integer numbers is that they represent +values exactly. The disadvantage is that their range is limited. On +most systems, this range is -2,147,483,648 to 2,147,483,647. However, +many systems now support a range from -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807. + + Integer values come in two flavors: "signed" and "unsigned". 
Signed +values may be negative or positive, with the range of values just +described. Unsigned values are never negative. On most systems, the +range is from 0 to 4,294,967,295. However, many systems now support a +range from 0 to 18,446,744,073,709,551,615. + + Floating-point numbers represent what are called "real" numbers; +i.e., those that do have a fractional part, such as 3.1415927. The +advantage to floating-point numbers is that they can represent a much +larger range of values. The disadvantage is that there are numbers +that they cannot represent exactly. `awk' uses "double precision" +floating-point numbers, which can hold more digits than "single +precision" floating-point numbers. + + There are several important issues to be aware of, described next. + +* Menu: + +* Floating Point Issues:: Stuff to know about floating-point numbers. +* Integer Programming:: Effective integer programming. + + +File: gawk.info, Node: Floating Point Issues, Next: Integer Programming, Up: General Arithmetic + +16.1.1 Floating-Point Number Caveats +------------------------------------ + +As mentioned earlier, floating-point numbers represent what are called +"real" numbers, i.e., those that have a fractional part. `awk' uses +double precision floating-point numbers to represent all numeric +values. This minor node describes some of the issues involved in using +floating-point numbers. + + There is a very nice paper on floating-point arithmetic +(http://www.validlab.com/goldberg/paper.pdf) by David Goldberg, "What +Every Computer Scientist Should Know About Floating-point Arithmetic," +`ACM Computing Surveys' *23*, 1 (1991-03), 5-48. This is worth reading +if you are interested in the details, but it does require a background +in computer science. + +* Menu: + +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not Abstract + Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice. 
+ + +File: gawk.info, Node: String Conversion Precision, Next: Unexpected Results, Up: Floating Point Issues + +16.1.1.1 The String Value Can Lie +................................. + +Internally, `awk' keeps both the numeric value (double precision +floating-point) and the string value for a variable. Separately, `awk' +keeps track of what type the variable has (*note Typing and +Comparison::), which plays a role in how variables are used in +comparisons. + + It is important to note that the string value for a number may not +reflect the full value (all the digits) that the numeric value actually +contains. The following program (`values.awk') illustrates this: + + { + sum = $1 + $2 + # see it for what it is + printf("sum = %.12g\n", sum) + # use CONVFMT + a = "<" sum ">" + print "a =", a + # use OFMT + print "sum =", sum + } + +This program shows the full value of the sum of `$1' and `$2' using +`printf', and then prints the string values obtained from both +automatic conversion (via `CONVFMT') and from printing (via `OFMT'). + + Here is what happens when the program is run: + + $ echo 3.654321 1.2345678 | awk -f values.awk + -| sum = 4.8888888 + -| a = <4.88889> + -| sum = 4.88889 + + This makes it clear that the full numeric value is different from +what the default string representations show. + + `CONVFMT''s default value is `"%.6g"', which yields a value with at +most six significant digits. For some applications, you might want to +change it to specify more precision. On most modern machines, most of +the time, 17 digits is enough to capture a floating-point number's +value exactly.(1) + + ---------- Footnotes ---------- + + (1) Pathological cases can require up to 752 digits (!), but we +doubt that you need to worry about this. 
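As a rough illustration of the point about `CONVFMT', the following sketch checks whether a value survives a number-to-string-to-number round trip. The function name `roundtrip' and the test value 0.1 + 0.2 are illustrative choices that do not come from the manual, and the behavior described assumes IEEE-754 double precision:

```shell
# Hypothetical helper (not part of gawk): prints 1 if a number survives a
# number -> string -> number round trip under the given CONVFMT, else 0.
roundtrip() {
    awk -v fmt="$1" 'BEGIN {
        CONVFMT = fmt
        x = 0.1 + 0.2        # a sum that is not exactly representable in binary
        s = x ""             # concatenation converts the number via CONVFMT
        print (s + 0 == x)   # convert the string back and compare numerically
    }'
}

roundtrip "%.6g"     # default CONVFMT: only six significant digits are kept
roundtrip "%.17g"    # 17 significant digits
```

With the default `"%.6g"' the string keeps only six significant digits, so the check should print 0; with `"%.17g"' it should print 1.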
+ + +File: gawk.info, Node: Unexpected Results, Next: POSIX Floating Point Problems, Prev: String Conversion Precision, Up: Floating Point Issues + +16.1.1.2 Floating Point Numbers Are Not Abstract Numbers +........................................................ + +Unlike numbers in the abstract sense (such as what you studied in high +school or college arithmetic), numbers stored in computers are limited +in certain ways. They cannot represent an infinite number of digits, +nor can they always represent things exactly. In particular, +floating-point numbers cannot always represent values exactly. Here is +an example: + + $ awk '{ printf("%010d\n", $1 * 100) }' + 515.79 + -| 0000051579 + 515.80 + -| 0000051579 + 515.81 + -| 0000051580 + 515.82 + -| 0000051582 + Ctrl-d + +This shows that some values can be represented exactly, whereas others +are only approximated. This is not a "bug" in `awk', but simply an +artifact of how computers represent numbers. + + NOTE: It cannot be emphasized enough that the behavior just + described is fundamental to modern computers. You will see this + kind of thing happen in _any_ programming language using hardware + floating-point numbers. It is _not_ a bug in `gawk', nor is it + something that can be "just fixed." + + Another peculiarity of floating-point numbers on modern systems is +that they often have more than one representation for the number zero! +In particular, it is possible to represent "minus zero" as well as +regular, or "positive" zero. 
+ + This example shows that negative and positive zero are distinct +values when stored internally, but that they are in fact equal to each +other, as well as to "regular" zero: + + $ gawk 'BEGIN { mz = -0 ; pz = 0 + > printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz + > printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0 + > }' + -| -0 = -0, +0 = 0, (-0 == +0) -> 1 + -| mz == 0 -> 1, pz == 0 -> 1 + + It helps to keep this in mind should you process numeric data that +contains negative zero values; the fact that the zero is negative is +noted and can affect comparisons. + + +File: gawk.info, Node: POSIX Floating Point Problems, Prev: Unexpected Results, Up: Floating Point Issues + +16.1.1.3 Standards Versus Existing Practice +........................................... + +Historically, `awk' has converted any non-numeric looking string to the +numeric value zero, when required. Furthermore, the original +definition of the language and the original POSIX standards specified +that `awk' only understands decimal numbers (base 10), and not octal +(base 8) or hexadecimal numbers (base 16). + + Changes in the language of the 2001 and 2004 POSIX standards can be +interpreted to imply that `awk' should support additional features. +These features are: + + * Interpretation of floating point data values specified in + hexadecimal notation (`0xDEADBEEF'). (Note: data values, _not_ + source code constants.) + + * Support for the special IEEE 754 floating point values "Not A + Number" (NaN), positive Infinity ("inf") and negative Infinity + ("-inf"). In particular, the format for these values is as + specified by the ISO 1999 C standard, which ignores case and can + allow machine-dependent additional characters after the `nan' and + allow either `inf' or `infinity'. 
+ + The first problem is that both of these are clear changes to +historical practice: + + * The `gawk' maintainer feels that supporting hexadecimal floating + point values, in particular, is ugly, and was never intended by the + original designers to be part of the language. + + * Allowing completely alphabetic strings to have valid numeric + values is also a very severe departure from historical practice. + + The second problem is that the `gawk' maintainer feels that this +interpretation of the standard, which requires a certain amount of +"language lawyering" to arrive at in the first place, was not even +intended by the standard developers. In other words, "we see how you +got where you are, but we don't think that that's where you want to be." + + Recognizing the above issues, but attempting to provide compatibility +with the earlier versions of the standard, the 2008 POSIX standard +added explicit wording to allow, but not require, that `awk' support +hexadecimal floating point values and special values for "Not A Number" +and infinity. + + Although the `gawk' maintainer continues to feel that providing +those features is inadvisable, nevertheless, on systems that support +IEEE floating point, it seems reasonable to provide _some_ way to +support NaN and Infinity values. The solution implemented in `gawk' is +as follows: + + * With the `--posix' command-line option, `gawk' becomes "hands + off." String values are passed directly to the system library's + `strtod()' function, and if it successfully returns a numeric + value, that is what's used.(1) By definition, the results are not + portable across different systems. They are also a little + surprising: + + $ echo nanny | gawk --posix '{ print $1 + 0 }' + -| nan + $ echo 0xDeadBeef | gawk --posix '{ print $1 + 0 }' + -| 3735928559 + + * Without `--posix', `gawk' interprets the four strings `+inf', + `-inf', `+nan', and `-nan' specially, producing the corresponding + special numeric values. 
The leading sign acts as a signal to `gawk' + (and the user) that the value is really numeric. Hexadecimal + floating point is not supported (unless you also use + `--non-decimal-data', which is _not_ recommended). For example: + + $ echo nanny | gawk '{ print $1 + 0 }' + -| 0 + $ echo +nan | gawk '{ print $1 + 0 }' + -| nan + $ echo 0xDeadBeef | gawk '{ print $1 + 0 }' + -| 0 + + `gawk' does ignore case in the four special values. Thus `+nan' + and `+NaN' are the same. + + ---------- Footnotes ---------- + + (1) You asked for it, you got it. + + +File: gawk.info, Node: Integer Programming, Prev: Floating Point Issues, Up: General Arithmetic + +16.1.2 Mixing Integers And Floating-point +----------------------------------------- + +As has been mentioned already, `gawk' ordinarily uses hardware double +precision with 64-bit IEEE binary floating-point representation for +numbers on most systems. A large integer like 9007199254740997 has a +binary representation that, although finite, is more than 53 bits long; +it must also be rounded to 53 bits. The biggest integer that can be +stored in a C `double' is usually the same as the largest possible +value of a `double'. If your system `double' is an IEEE 64-bit +`double', this largest possible value is an integer and can be +represented precisely. What more should one know about integers? + + If you want to know what is the largest integer, such that it and +all smaller integers can be stored in 64-bit doubles without losing +precision, then the answer is 2^53. The next representable number is +the even number 2^53 + 2, meaning it is unlikely that you will be able +to make `gawk' print 2^53 + 1 in integer format. The range of integers +exactly representable by a 64-bit double is [-2^53, 2^53]. If you ever +see an integer outside this range in `gawk' using 64-bit doubles, you +have reason to be very suspicious about the accuracy of the output. 
+Here is a simple program with erroneous output: + + $ gawk 'BEGIN { i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j }' + -| 9007199254740991 + -| 9007199254740992 + -| 9007199254740992 + -| 9007199254740994 + + The lesson is to not assume that any large integer printed by `gawk' +represents an exact result from your computation, especially if it wraps +around on your screen. + + +File: gawk.info, Node: Floating-point Programming, Next: Gawk and MPFR, Prev: General Arithmetic, Up: Arbitrary Precision Arithmetic + +16.2 Understanding Floating-point Programming +============================================= + +Numerical programming is an extensive area; if you need to develop +sophisticated numerical algorithms then `gawk' may not be the ideal +tool, and this documentation may not be sufficient. It might require +digesting a book or two to really internalize how to compute with ideal +accuracy and precision and the result often depends on the particular +application. + + NOTE: A floating-point calculation's "accuracy" is how close it + comes to the real value. This is as opposed to the "precision", + which usually refers to the number of bits used to represent the + number (see the Wikipedia article + (http://en.wikipedia.org/wiki/Accuracy_and_precision) for more + information). + + There are two options for doing floating-point calculations: +hardware floating-point (as used by standard `awk' and the default for +`gawk'), and "arbitrary-precision" floating-point, which is software +based. This major node aims to provide enough information to +understand both, and then will focus on `gawk''s facilities for the +latter.(1) + + Binary floating-point representations and arithmetic are inexact. +Simple values like 0.1 cannot be precisely represented using binary +floating-point numbers, and the limited precision of floating-point +numbers means that slight changes in the order of operations or the +precision of intermediate storage can change the result. 
To make +matters worse, with arbitrary precision floating-point, you can set the +precision before starting a computation, but then you cannot be sure of +the number of significant decimal places in the final result. + + Sometimes, before you start to write any code, you should think more +about what you really want and what's really happening. Consider the +two numbers in the following example: + + x = 0.875 # 1/2 + 1/4 + 1/8 + y = 0.425 + + Unlike the number in `y', the number stored in `x' is exactly +representable in binary since it can be written as a finite sum of one +or more fractions whose denominators are all powers of two. When +`gawk' reads a floating-point number from program source, it +automatically rounds that number to whatever precision your machine +supports. If you try to print the numeric content of a variable using +an output format string of `"%.17g"', it may not produce the same +number as you assigned to it: + + $ gawk 'BEGIN { x = 0.875; y = 0.425 + > printf("%0.17g, %0.17g\n", x, y) }' + -| 0.875, 0.42499999999999999 + + Often the error is so small you do not even notice it, and if you do, +you can always specify how much precision you would like in your output. +Usually this is a format string like `"%.15g"', which when used in the +previous example, produces an output identical to the input. + + Because the underlying representation can be a little bit off from the +exact value, comparing floating-point values to see if they are equal +is generally not a good idea. Here is an example where it does not +work like you expect: + + $ gawk 'BEGIN { print (0.1 + 12.2 == 12.3) }' + -| 0 + + The loss of accuracy during a single computation with floating-point +numbers usually isn't enough to worry about. However, if you compute a +value which is the result of a sequence of floating point operations, +the error can accumulate and greatly affect the computation itself. 
+Here is an attempt to compute the value of the constant pi using one of +its many series representations: + + BEGIN { + x = 1.0 / sqrt(3.0) + n = 6 + for (i = 1; i < 30; i++) { + n = n * 2.0 + x = (sqrt(x * x + 1) - 1) / x + printf("%.15f\n", n * x) + } + } + + When run, the early errors propagating through later computations +cause the loop to terminate prematurely after an attempt to divide by +zero. + + $ gawk -f pi.awk + -| 3.215390309173475 + -| 3.159659942097510 + -| 3.146086215131467 + -| 3.142714599645573 + ... + -| 3.224515243534819 + -| 2.791117213058638 + -| 0.000000000000000 + error--> gawk: pi.awk:6: fatal: division by zero attempted + + Here is one more example where the inaccuracies in internal +representations yield an unexpected result: + + $ gawk 'BEGIN { + > for (d = 1.1; d <= 1.5; d += 0.1) + > i++ + > print i + > }' + -| 4 + + Can computation using arbitrary precision help with the previous +examples? If you are impatient to know, see *note Exact Arithmetic::. + + Instead of arbitrary precision floating-point arithmetic, often all +you need is an adjustment of your logic or a different order for the +operations in your calculation. The stability and the accuracy of the +computation of the constant pi in the previous example can be enhanced +by using the following simple algebraic transformation: + + (sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1) + +After making this change, the program does converge to pi in under 30 +iterations: + + $ gawk -f /tmp/pi2.awk + -| 3.215390309173473 + -| 3.159659942097501 + -| 3.146086215131436 + -| 3.142714599645370 + -| 3.141873049979825 + ... + -| 3.141592653589797 + -| 3.141592653589797 + + There is no need to be unduly suspicious about the results from +floating-point arithmetic. The lesson to remember is that +floating-point arithmetic is always more complex than the arithmetic +using pencil and paper. 
In order to take advantage of the power of +computer floating-point, you need to know its limitations and work +within them. For most casual use of floating-point arithmetic, you will +often get the expected result in the end if you simply round the +display of your final results to the correct number of significant +decimal digits. And, avoid presenting numerical data in a manner that +implies better precision than is actually the case. + +* Menu: + +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. + + ---------- Footnotes ---------- + + (1) If you are interested in other tools that perform arbitrary +precision arithmetic, you may want to investigate the POSIX `bc' tool. +See the POSIX specification for it +(http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html), for +more information. + + +File: gawk.info, Node: Floating-point Representation, Next: Floating-point Context, Up: Floating-point Programming + +16.2.1 Binary Floating-point Representation +------------------------------------------- + +Although floating-point representations vary from machine to machine, +the most commonly encountered representation is that defined by the +IEEE 754 Standard. An IEEE-754 format value has three components: + + * A sign bit telling whether the number is positive or negative. + + * An "exponent" giving its order of magnitude, E. + + * A "significand", S, specifying the actual digits of the number. + + The value of the number is then S * 2^E. The first bit of a +non-zero binary significand is always one, so the significand in an +IEEE-754 format only includes the fractional part, leaving the leading +one implicit. + + Three of the standard IEEE-754 types are 32-bit single precision, +64-bit double precision and 128-bit quadruple precision. 
The standard +also specifies extended precision formats to allow greater precisions +and larger exponent ranges. + + The significand is stored in "normalized" format, which means that +the first bit is always a one. + + +File: gawk.info, Node: Floating-point Context, Next: Rounding Mode, Prev: Floating-point Representation, Up: Floating-point Programming + +16.2.2 Floating-point Context +----------------------------- + +A floating-point "context" defines the environment for arithmetic +operations. It governs precision, sets rules for rounding, and limits +the range for exponents. The context has the following primary +components: + +"Precision" + Precision of the floating-point format in bits. + +"emax" + Maximum exponent allowed for this format. + +"emin" + Minimum exponent allowed for this format. + +"Underflow behavior" + The format may or may not support gradual underflow. + +"Rounding" + The rounding mode of this context. + + *note table-ieee-formats:: lists the precision and exponent field +values for the basic IEEE-754 binary formats: + +Name Total bits Precision emin emax +--------------------------------------------------------------------------- +Single 32 24 -126 +127 +Double 64 53 -1022 +1023 +Quadruple 128 113 -16382 +16383 + +Table 16.1: Basic IEEE Format Context Values + + NOTE: The precision numbers include the implied leading one that + gives them one extra bit of significand. + + A floating-point context can also determine which signals are treated +as exceptions, and can set rules for arithmetic with special values. +Please consult the IEEE-754 standard or other resources for details. + + `gawk' ordinarily uses the hardware double precision representation +for numbers. On most systems, this is IEEE-754 floating-point format, +corresponding to 64-bit binary with 53 bits of precision. 
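The 53-bit figure can be checked empirically. The following is a minimal sketch, assuming the local `awk' stores numbers as IEEE-754 doubles with round-to-nearest; the function name `significand_bits' is hypothetical and used only for illustration:

```shell
# Hypothetical helper (not part of gawk): counts how many times 1 can be
# halved before adding the result to 1 no longer changes the sum.
significand_bits() {
    awk 'BEGIN {
        eps = 1
        bits = 0
        do {
            eps /= 2         # eps is now 2^-bits after the increment below
            bits++
            sum = 1 + eps    # store the sum so it is rounded to a double
        } while (sum != 1)
        print bits
    }'
}

significand_bits
```

On a system that uses IEEE-754 double precision this should print 53, matching the precision listed for the Double format, including the implied leading one.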
+ + NOTE: In case an underflow occurs, the standard allows, but does + not require, the result from an arithmetic operation to be a + number smaller than the smallest nonzero normalized number. Such + numbers do not have as many significant digits as normal numbers, + and are called "denormals" or "subnormals". The alternative, + simply returning a zero, is called "flush to zero". The basic + IEEE-754 binary formats support subnormal numbers. + + +File: gawk.info, Node: Rounding Mode, Prev: Floating-point Context, Up: Floating-point Programming + +16.2.3 Floating-point Rounding Mode +----------------------------------- + +The "rounding mode" specifies the behavior for the results of numerical +operations when discarding extra precision. Each rounding mode indicates +how the least significant returned digit of a rounded result is to be +calculated. *note table-rounding-modes:: lists the IEEE-754 defined +rounding modes: + +Rounding Mode IEEE Name +-------------------------------------------------------------------------- +Round to nearest, ties to even `roundTiesToEven' +Round toward plus Infinity `roundTowardPositive' +Round toward negative Infinity `roundTowardNegative' +Round toward zero `roundTowardZero' +Round to nearest, ties away `roundTiesToAway' +from zero + +Table 16.2: IEEE 754 Rounding Modes + + The default mode `roundTiesToEven' is the most preferred, but the +least intuitive. This method does the obvious thing for most values, by +rounding them up or down to the nearest digit. For example, rounding +1.132 to two digits yields 1.13, and rounding 1.157 yields 1.16. + + However, when it comes to rounding a value that is exactly halfway +between, things do not work the way you probably learned in school. In +this case, the number is rounded to the nearest even digit. So +rounding 0.125 to two digits rounds down to 0.12, but rounding 0.6875 +to three digits rounds up to 0.688. 
You probably have already +encountered this rounding mode when using the `printf' routine to +format floating-point numbers. For example: + + BEGIN { + x = -4.5 + for (i = 1; i < 10; i++) { + x += 1.0 + printf("%4.1f => %2.0f\n", x, x) + } + } + +produces the following output when run:(1) + + -3.5 => -4 + -2.5 => -2 + -1.5 => -2 + -0.5 => 0 + 0.5 => 0 + 1.5 => 2 + 2.5 => 2 + 3.5 => 4 + 4.5 => 4 + + The theory behind the rounding mode `roundTiesToEven' is that it +more or less evenly distributes upward and downward rounds of exact +halves, which might cause the round-off error to cancel itself out. +This is the default rounding mode used in IEEE-754 computing functions +and operators. + + The other rounding modes are rarely used. Round toward positive +infinity (`roundTowardPositive') and round toward negative infinity +(`roundTowardNegative') are often used to implement interval arithmetic, +where you adjust the rounding mode to calculate upper and lower bounds +for the range of output. The `roundTowardZero' mode can be used for +converting floating-point numbers to integers. The rounding mode +`roundTiesToAway' rounds the result to the nearest number and selects +the number with the larger magnitude if a tie occurs. + + Some numerical analysts will tell you that your choice of rounding +style has tremendous impact on the final outcome, and advise you to +wait until final output for any rounding. Instead, you can often avoid +round-off error problems by setting the precision initially to some +value sufficiently larger than the final desired precision, so that the +accumulation of round-off error does not influence the outcome. If you +suspect that results from your computation are sensitive to +accumulation of round-off error, one way to be sure is to look for a +significant difference in output when you change the rounding mode. 
+ + ---------- Footnotes ---------- + + (1) It is possible for the output to be completely different if the +C library in your system does not use the IEEE-754 even-rounding rule +to round halfway cases for `printf()'. + + +File: gawk.info, Node: Gawk and MPFR, Next: Arbitrary Precision Floats, Prev: Floating-point Programming, Up: Arbitrary Precision Arithmetic + +16.3 `gawk' + MPFR = Powerful Arithmetic +======================================== + +The rest of this major node describes how to use the arbitrary precision +(also known as "multiple precision" or "infinite precision") numeric +capabilities in `gawk' to produce maximally accurate results when you +need them. + + But first you should check if your version of `gawk' supports +arbitrary precision arithmetic. The easiest way to find out is to look +at the output of the following command: + + $ gawk --version + -| GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3) + -| Copyright (C) 1989, 1991-2012 Free Software Foundation. + ... + + `gawk' uses the GNU MPFR (http://www.mpfr.org) and GNU MP +(http://gmplib.org) (GMP) libraries for arbitrary precision arithmetic +on numbers. So if you do not see the names of these libraries in the +output, then your version of `gawk' does not support arbitrary +precision arithmetic. + + Additionally, there are a few elements available in the `PROCINFO' +array to provide information about the MPFR and GMP libraries. *Note +Auto-set::, for more information. + + +File: gawk.info, Node: Arbitrary Precision Floats, Next: Arbitrary Precision Integers, Prev: Gawk and MPFR, Up: Arbitrary Precision Arithmetic + +16.4 Arbitrary Precision Floating-point Arithmetic with `gawk' +============================================================== + +`gawk' uses the GNU MPFR library for arbitrary precision floating-point +arithmetic. The MPFR library provides precise control over precisions +and rounding modes, and gives correctly rounded reproducible +platform-independent results.
With the command-line option `--bignum' +or `-M', all floating-point arithmetic operators and numeric functions +can yield results to any desired precision level supported by MPFR. +Two built-in variables `PREC' (*note Setting Precision::) and +`ROUNDMODE' (*note Setting Rounding Mode::) provide control over the +working precision and the rounding mode. The precision and the +rounding mode are set globally for every operation to follow. + + The default working precision for arbitrary precision floating-point +values is 53, and the default value for `ROUNDMODE' is `"N"', which +selects the IEEE-754 `roundTiesToEven' (*note Rounding Mode::) rounding +mode.(1) `gawk' uses the default exponent range in MPFR (EMAX = 2^30 - +1, EMIN = -EMAX) for all floating-point contexts. There is no explicit +mechanism to adjust the exponent range. MPFR does not implement +subnormal numbers by default, and this behavior cannot be changed in +`gawk'. + + NOTE: When emulating an IEEE-754 format (*note Setting + Precision::), `gawk' internally adjusts the exponent range to the + value defined for the format and also performs computations needed + for gradual underflow (subnormal numbers). + + NOTE: MPFR numbers are variable-size entities, consuming only as + much space as needed to store the significant digits. Since the + performance using MPFR numbers pales in comparison to doing + arithmetic using the underlying machine types, you should consider + using only as much precision as needed by your program. + +* Menu: + +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point numbers. 
+ + ---------- Footnotes ---------- + + (1) The default precision is 53, since according to the MPFR +documentation, the library should be able to exactly reproduce all +computations with double-precision machine floating-point numbers +(`double' type in C), except the default exponent range is much wider +and subnormal numbers are not implemented. + + +File: gawk.info, Node: Setting Precision, Next: Setting Rounding Mode, Up: Arbitrary Precision Floats + +16.4.1 Setting the Working Precision +------------------------------------ + +`gawk' uses a global working precision; it does not keep track of the +precision or accuracy of individual numbers. Performing an arithmetic +operation or calling a built-in function rounds the result to the +current working precision. The default working precision is 53, which +can be modified using the built-in variable `PREC'. You can also set the +value to one of the following pre-defined case-insensitive strings to +emulate an IEEE-754 binary format: + +`PREC' IEEE-754 Binary Format +--------------------------------------------------- +`"half"' 16-bit half-precision. +`"single"' Basic 32-bit single precision. +`"double"' Basic 64-bit double precision. +`"quad"' Basic 128-bit quadruple precision. +`"oct"' 256-bit octuple precision. + + The following example illustrates the effects of changing precision +on arithmetic operations: + + $ gawk -M -vPREC=100 'BEGIN { x = 1.0e-400; print x + 0; \ + > PREC = "double"; print x + 0 }' + -| 1e-400 + -| 0 + + Binary and decimal precisions are related approximately according to +the formula: + + PREC = 3.322 * DPS + +Here, PREC denotes the binary precision (measured in bits) and DPS +(short for decimal places) is the number of decimal digits. We can easily +calculate how many decimal digits the 53-bit significand of an IEEE +double is equivalent to: 53 / 3.322 which is equal to about 15.95. But +what does 15.95 digits actually mean?
It depends on whether you are +concerned about how many digits you can rely on, or how many digits you +need. + + It is important to know how many bits it takes to uniquely identify +a double-precision value (the C type `double'). If you want to convert +from `double' to decimal and back to `double' (e.g., saving a `double' +representing an intermediate result to a file, and later reading it +back to restart the computation), then a few more decimal digits are +required. 17 digits is generally enough for a `double'. + + It can also be important to know what decimal numbers can be uniquely +represented with a `double'. If you want to convert from decimal to +`double' and back again, 15 digits is the most that you can get. Stated +differently, you should not present the numbers from your +floating-point computations with more than 15 significant digits in +them. + + Conversely, it takes a precision of 332 bits to hold an approximation +of the constant pi that is accurate to 100 decimal places. You should +always add some extra bits in order to avoid the confusing round-off +issues that occur because numbers are stored internally in binary. + + +File: gawk.info, Node: Setting Rounding Mode, Next: Floating-point Constants, Prev: Setting Precision, Up: Arbitrary Precision Floats + +16.4.2 Setting the Rounding Mode +-------------------------------- + +The `ROUNDMODE' variable provides program level control over the +rounding mode. The correspondence between `ROUNDMODE' and the IEEE +rounding modes is shown in *note table-gawk-rounding-modes::.
+ +Rounding Mode IEEE Name `ROUNDMODE' +--------------------------------------------------------------------------- +Round to nearest, ties to even `roundTiesToEven' `"N"' or `"n"' +Round toward plus Infinity `roundTowardPositive' `"U"' or `"u"' +Round toward negative Infinity `roundTowardNegative' `"D"' or `"d"' +Round toward zero `roundTowardZero' `"Z"' or `"z"' +Round to nearest, ties away `roundTiesToAway' `"A"' or `"a"' +from zero + +Table 16.3: `gawk' Rounding Modes + + `ROUNDMODE' has the default value `"N"', which selects the IEEE-754 +rounding mode `roundTiesToEven'. Besides the values listed in *note +Table 16.3: table-gawk-rounding-modes, `gawk' also accepts `"A"' to +select the IEEE-754 mode `roundTiesToAway' if your version of the MPFR +library supports it; otherwise setting `ROUNDMODE' to this value has no +effect. *Note Rounding Mode::, for the meanings of the various rounding +modes. + + Here is an example of how to change the default rounding behavior of +`printf''s output: + + $ gawk -M -vROUNDMODE="Z" 'BEGIN { printf("%.2f\n", 1.378) }' + -| 1.37 + + +File: gawk.info, Node: Floating-point Constants, Next: Changing Precision, Prev: Setting Rounding Mode, Up: Arbitrary Precision Floats + +16.4.3 Representing Floating-point Constants +-------------------------------------------- + +Be wary of floating-point constants! When reading a floating-point +constant from program source code, `gawk' uses the default precision, +unless overridden by an assignment to the special variable `PREC' on +the command line, to store it internally as a MPFR number. Changing +the precision using `PREC' in the program text does not change the +precision of a constant. If you need to represent a floating-point +constant at a higher precision than the default and cannot use a +command line assignment to `PREC', you should either specify the +constant as a string, or as a rational number whenever possible. 
The +following example illustrates the differences among various ways to +print a floating-point constant: + + $ gawk -M 'BEGIN { PREC = 113; printf("%0.25f\n", 0.1) }' + -| 0.1000000000000000055511151 + $ gawk -M -vPREC=113 'BEGIN { printf("%0.25f\n", 0.1) }' + -| 0.1000000000000000000000000 + $ gawk -M 'BEGIN { PREC = 113; printf("%0.25f\n", "0.1") }' + -| 0.1000000000000000000000000 + $ gawk -M 'BEGIN { PREC = 113; printf("%0.25f\n", 1/10) }' + -| 0.1000000000000000000000000 + + In the first case, the number is stored with the default precision +of 53. + + +File: gawk.info, Node: Changing Precision, Next: Exact Arithmetic, Prev: Floating-point Constants, Up: Arbitrary Precision Floats + +16.4.4 Changing the Precision of a Number +----------------------------------------- + + The point is that in any variable-precision package, a decision is + made on how to treat numbers given as data, or arising in + intermediate results, which are represented in floating-point + format to a precision lower than working precision. Do we promote + them to full membership of the high-precision club, or do we treat + them and all their associates as second-class citizens? Sometimes + the first course is proper, sometimes the second, and it takes + careful analysis to tell which. + + Dirk Laurie(1) + + `gawk' does not implicitly modify the precision of any previously +computed results when the working precision is changed with an +assignment to `PREC'. The precision of a number is always the one that +was used at the time of its creation, and there is no way for the user +to explicitly change it afterwards. However, since the result of a +floating-point arithmetic operation is always an arbitrary precision +floating-point value--with a precision set by the value of `PREC'--one +of the following workarounds effectively accomplishes the desired +behavior: + + x = x + 0.0 + +or: + + x += 0.0 + + ---------- Footnotes ---------- + + (1) Dirk Laurie.
`Variable-precision Arithmetic Considered Perilous +-- A Detective Story'. Electronic Transactions on Numerical Analysis. +Volume 28, pp. 168-173, 2008. + + +File: gawk.info, Node: Exact Arithmetic, Prev: Changing Precision, Up: Arbitrary Precision Floats + +16.4.5 Exact Arithmetic with Floating-point Numbers +--------------------------------------------------- + + CAUTION: Never depend on the exactness of floating-point + arithmetic, even for apparently simple expressions! + + Can arbitrary precision arithmetic give exact results? There are no +easy answers. The standard rules of algebra often do not apply when +using floating-point arithmetic. Among other things, the distributive +and associative laws do not hold completely, and order of operation may +be important for your computation. Rounding error, cumulative precision +loss and underflow are often troublesome. + + When `gawk' tests the expressions `0.1 + 12.2' and `12.3' for +equality using the machine double precision arithmetic, it decides that +they are not equal! (*Note Floating-point Programming::.) You can get +the result you want by increasing the precision; 56 in this case will +get the job done: + + $ gawk -M -vPREC=56 'BEGIN { print (0.1 + 12.2 == 12.3) }' + -| 1 + + If adding more bits is good, perhaps adding even more bits of +precision is better? Here is what happens if we use an even larger +value of `PREC': + + $ gawk -M -vPREC=201 'BEGIN { print (0.1 + 12.2 == 12.3) }' + -| 0 + + This is not a bug in `gawk' or in the MPFR library. It is easy to +forget that the finite number of bits used to store the value is often +just an approximation after proper rounding. The test for equality +succeeds if and only if _all_ bits in the two operands are exactly the +same. Since this is not necessarily true after floating-point +computations with a particular precision and effective rounding rule, a +straight test for equality may not work. 
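One common alternative to a straight equality test is to accept two values as equal when they differ by no more than a chosen tolerance. The following is a minimal sketch using ordinary double-precision `awk': the function names `absval' and `approx_equal', the shell wrapper `demo_approx_equal', and the relative tolerance of 1e-12 are assumptions chosen for illustration, not features of `gawk'.

```shell
# Hypothetical wrapper: contrasts an exact comparison with a tolerant one
# (assumes any POSIX awk using machine double precision).
demo_approx_equal() {
    awk '
    function absval(v) { return (v < 0) ? -v : v }
    # True if a and b differ by at most tol, relative to the larger magnitude.
    function approx_equal(a, b, tol,    scale) {
        scale = (absval(a) > absval(b)) ? absval(a) : absval(b)
        return absval(a - b) <= tol * scale
    }
    BEGIN {
        # First field: exact test. Second field: tolerant test.
        printf("%d %d\n", (0.1 + 12.2 == 12.3),
               approx_equal(0.1 + 12.2, 12.3, 1e-12))
    }'
}

demo_approx_equal
```

The first field is 0 because the two doubles differ in their last bit, and the second is 1 because that difference is far below the tolerance. A suitable tolerance depends on the computation; 1e-12 is only a placeholder here.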
+ + So, don't assume that floating-point values can be compared for +equality. You should also exercise caution when using other forms of +comparisons. The standard way to compare between floating-point +numbers is to determine how much error (or "tolerance") you will allow +in a comparison and check to see if one value is within this error +range of the other. + + In applications where 15 or fewer decimal places suffice, hardware +double precision arithmetic can be adequate, and is usually much faster. +But you do need to keep in mind that every floating-point operation can +suffer a new rounding error with catastrophic consequences as +illustrated by our attempt to compute the value of the constant pi +(*note Floating-point Programming::). Extra precision can greatly +enhance the stability and the accuracy of your computation in such +cases. + + Repeated addition is not necessarily equivalent to multiplication in +floating-point arithmetic. In the example in *note Floating-point +Programming::: + + $ gawk 'BEGIN { + > for (d = 1.1; d <= 1.5; d += 0.1) + > i++ + > print i + > }' + -| 4 + +you may or may not succeed in getting the correct result by choosing an +arbitrarily large value for `PREC'. Reformulation of the problem at +hand is often the correct approach in such situations. + + +File: gawk.info, Node: Arbitrary Precision Integers, Prev: Arbitrary Precision Floats, Up: Arbitrary Precision Arithmetic + +16.5 Arbitrary Precision Integer Arithmetic with `gawk' +======================================================= + +If the option `--bignum' or `-M' is specified, `gawk' performs all +integer arithmetic using GMP arbitrary precision integers. Any number +that looks like an integer in a program source or data file is stored +as an arbitrary precision integer. The size of the integer is limited +only by your computer's memory. The current floating-point context has +no effect on operations involving integers. 
For example, the following +computes 5^4^3^2, the result of which is beyond the limits of ordinary +`gawk' numbers: + + $ gawk -M 'BEGIN { + > x = 5^4^3^2 + > print "# of digits =", length(x) + > print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20) + > }' + -| # of digits = 183231 + -| 62060698786608744707 ... 92256259918212890625 + + If you were to compute the same value using arbitrary precision +floating-point values instead, the precision needed for correct output +(using the formula `prec = 3.322 * dps'), would be 3.322 x 183231, or +608693. (Thus, the floating-point representation requires over 30 +times as many decimal digits!) + + The result from an arithmetic operation with an integer and a +floating-point value is a floating-point value with a precision equal +to the working precision. The following program calculates the eighth +term in Sylvester's sequence(1) using a recurrence: + + $ gawk -M 'BEGIN { + > s = 2.0 + > for (i = 1; i <= 7; i++) + > s = s * (s - 1) + 1 + > print s + > }' + -| 113423713055421845118910464 + + The output differs from the actual number, +113423713055421844361000443, because the default precision of 53 is not +enough to represent the floating-point results exactly. You can either +increase the precision (100 is enough in this case), or replace the +floating-point constant `2.0' with an integer, to perform all +computations using integer arithmetic to get the correct output. + + It will sometimes be necessary for `gawk' to implicitly convert an +arbitrary precision integer into an arbitrary precision floating-point +value. This is primarily because the MPFR library does not always +provide the relevant interface to process arbitrary precision integers +or mixed-mode numbers as needed by an operation or function. In such a +case, the precision is set to the minimum value necessary for exact +conversion, and the working precision is not used for this purpose.
If +this is not what you need or want, you can employ a subterfuge like +this: + + gawk -M 'BEGIN { n = 13; print (n + 0.0) % 2.0 }' + + You can avoid this issue altogether by specifying the number as a +floating-point value to begin with: + + gawk -M 'BEGIN { n = 13.0; print n % 2.0 }' + + Note that for the particular example above, it is likely best to +just use the following: + + gawk -M 'BEGIN { n = 13; print n % 2 }' + + ---------- Footnotes ---------- + + (1) Weisstein, Eric W. `Sylvester's Sequence'. From MathWorld--A +Wolfram Web Resource. +`http://mathworld.wolfram.com/SylvestersSequence.html' + + File: gawk.info, Node: Language History, Next: Installation, Prev: Dynamic Extensions, Up: Top Appendix A The Evolution of the `awk' Language @@ -28426,441 +28429,441 @@ Node: History39607 Node: Names41998 Ref: Names-Footnote-143475 Node: This Manual43547 -Ref: This Manual-Footnote-148451 -Node: Conventions48551 -Node: Manual History50685 -Ref: Manual History-Footnote-153955 -Ref: Manual History-Footnote-253996 -Node: How To Contribute54070 -Node: Acknowledgments55214 -Node: Getting Started59710 -Node: Running gawk62089 -Node: One-shot63275 -Node: Read Terminal64500 -Ref: Read Terminal-Footnote-166150 -Ref: Read Terminal-Footnote-266426 -Node: Long66597 -Node: Executable Scripts67973 -Ref: Executable Scripts-Footnote-169842 -Ref: Executable Scripts-Footnote-269944 -Node: Comments70491 -Node: Quoting72958 -Node: DOS Quoting77581 -Node: Sample Data Files78256 -Node: Very Simple81288 -Node: Two Rules85887 -Node: More Complex88034 -Ref: More Complex-Footnote-190964 -Node: Statements/Lines91049 -Ref: Statements/Lines-Footnote-195511 -Node: Other Features95776 -Node: When96704 -Node: Invoking Gawk98851 -Node: Command Line100312 -Node: Options101095 -Ref: Options-Footnote-1116493 -Node: Other Arguments116518 -Node: Naming Standard Input119176 -Node: Environment Variables120270 -Node: AWKPATH Variable120828 -Ref: AWKPATH Variable-Footnote-1123586 -Node: AWKLIBPATH
Variable123846 -Node: Other Environment Variables124443 -Node: Exit Status126938 -Node: Include Files127613 -Node: Loading Shared Libraries131182 -Node: Obsolete132407 -Node: Undocumented133104 -Node: Regexp133347 -Node: Regexp Usage134736 -Node: Escape Sequences136762 -Node: Regexp Operators142525 -Ref: Regexp Operators-Footnote-1149905 -Ref: Regexp Operators-Footnote-2150052 -Node: Bracket Expressions150150 -Ref: table-char-classes152040 -Node: GNU Regexp Operators154563 -Node: Case-sensitivity158286 -Ref: Case-sensitivity-Footnote-1161254 -Ref: Case-sensitivity-Footnote-2161489 -Node: Leftmost Longest161597 -Node: Computed Regexps162798 -Node: Reading Files166208 -Node: Records168211 -Ref: Records-Footnote-1177135 -Node: Fields177172 -Ref: Fields-Footnote-1180205 -Node: Nonconstant Fields180291 -Node: Changing Fields182493 -Node: Field Separators188474 -Node: Default Field Splitting191103 -Node: Regexp Field Splitting192220 -Node: Single Character Fields195562 -Node: Command Line Field Separator196621 -Node: Field Splitting Summary200062 -Ref: Field Splitting Summary-Footnote-1203254 -Node: Constant Size203355 -Node: Splitting By Content207939 -Ref: Splitting By Content-Footnote-1211665 -Node: Multiple Line211705 -Ref: Multiple Line-Footnote-1217552 -Node: Getline217731 -Node: Plain Getline219947 -Node: Getline/Variable222036 -Node: Getline/File223177 -Node: Getline/Variable/File224499 -Ref: Getline/Variable/File-Footnote-1226098 -Node: Getline/Pipe226185 -Node: Getline/Variable/Pipe228745 -Node: Getline/Coprocess229852 -Node: Getline/Variable/Coprocess231095 -Node: Getline Notes231809 -Node: Getline Summary234596 -Ref: table-getline-variants235004 -Node: Read Timeout235860 -Ref: Read Timeout-Footnote-1239605 -Node: Command line directories239662 -Node: Printing240292 -Node: Print241923 -Node: Print Examples243260 -Node: Output Separators246044 -Node: OFMT247804 -Node: Printf249162 -Node: Basic Printf250068 -Node: Control Letters251607 -Node: Format 
Modifiers255419 -Node: Printf Examples261428 -Node: Redirection264143 -Node: Special Files271127 -Node: Special FD271660 -Ref: Special FD-Footnote-1275285 -Node: Special Network275359 -Node: Special Caveats276209 -Node: Close Files And Pipes277005 -Ref: Close Files And Pipes-Footnote-1284028 -Ref: Close Files And Pipes-Footnote-2284176 -Node: Expressions284326 -Node: Values285458 -Node: Constants286134 -Node: Scalar Constants286814 -Ref: Scalar Constants-Footnote-1287673 -Node: Nondecimal-numbers287855 -Node: Regexp Constants290914 -Node: Using Constant Regexps291389 -Node: Variables294444 -Node: Using Variables295099 -Node: Assignment Options296823 -Node: Conversion298695 -Ref: table-locale-affects304071 -Ref: Conversion-Footnote-1304695 -Node: All Operators304804 -Node: Arithmetic Ops305434 -Node: Concatenation307939 -Ref: Concatenation-Footnote-1310732 -Node: Assignment Ops310852 -Ref: table-assign-ops315840 -Node: Increment Ops317248 -Node: Truth Values and Conditions320718 -Node: Truth Values321801 -Node: Typing and Comparison322850 -Node: Variable Typing323639 -Ref: Variable Typing-Footnote-1327536 -Node: Comparison Operators327658 -Ref: table-relational-ops328068 -Node: POSIX String Comparison331617 -Ref: POSIX String Comparison-Footnote-1332573 -Node: Boolean Ops332711 -Ref: Boolean Ops-Footnote-1336789 -Node: Conditional Exp336880 -Node: Function Calls338612 -Node: Precedence342206 -Node: Locales345875 -Node: Patterns and Actions346964 -Node: Pattern Overview348018 -Node: Regexp Patterns349687 -Node: Expression Patterns350230 -Node: Ranges353915 -Node: BEGIN/END356881 -Node: Using BEGIN/END357643 -Ref: Using BEGIN/END-Footnote-1360374 -Node: I/O And BEGIN/END360480 -Node: BEGINFILE/ENDFILE362762 -Node: Empty365666 -Node: Using Shell Variables365982 -Node: Action Overview368267 -Node: Statements370624 -Node: If Statement372478 -Node: While Statement373977 -Node: Do Statement376021 -Node: For Statement377177 -Node: Switch Statement380329 -Node: Break 
Statement382426 -Node: Continue Statement384416 -Node: Next Statement386209 -Node: Nextfile Statement388599 -Node: Exit Statement391144 -Node: Built-in Variables393560 -Node: User-modified394655 -Ref: User-modified-Footnote-1403010 -Node: Auto-set403072 -Ref: Auto-set-Footnote-1412980 -Node: ARGC and ARGV413185 -Node: Arrays417036 -Node: Array Basics418541 -Node: Array Intro419367 -Node: Reference to Elements423685 -Node: Assigning Elements425955 -Node: Array Example426446 -Node: Scanning an Array428178 -Node: Controlling Scanning430492 -Ref: Controlling Scanning-Footnote-1435425 -Node: Delete435741 -Ref: Delete-Footnote-1438176 -Node: Numeric Array Subscripts438233 -Node: Uninitialized Subscripts440416 -Node: Multi-dimensional442044 -Node: Multi-scanning445138 -Node: Arrays of Arrays446729 -Node: Functions451374 -Node: Built-in452196 -Node: Calling Built-in453274 -Node: Numeric Functions455262 -Ref: Numeric Functions-Footnote-1459094 -Ref: Numeric Functions-Footnote-2459451 -Ref: Numeric Functions-Footnote-3459499 -Node: String Functions459768 -Ref: String Functions-Footnote-1483265 -Ref: String Functions-Footnote-2483394 -Ref: String Functions-Footnote-3483642 -Node: Gory Details483729 -Ref: table-sub-escapes485408 -Ref: table-sub-posix-92486762 -Ref: table-sub-proposed488105 -Ref: table-posix-sub489455 -Ref: table-gensub-escapes491001 -Ref: Gory Details-Footnote-1492208 -Ref: Gory Details-Footnote-2492259 -Node: I/O Functions492410 -Ref: I/O Functions-Footnote-1499065 -Node: Time Functions499212 -Ref: Time Functions-Footnote-1510104 -Ref: Time Functions-Footnote-2510172 -Ref: Time Functions-Footnote-3510330 -Ref: Time Functions-Footnote-4510441 -Ref: Time Functions-Footnote-5510553 -Ref: Time Functions-Footnote-6510780 -Node: Bitwise Functions511046 -Ref: table-bitwise-ops511604 -Ref: Bitwise Functions-Footnote-1515825 -Node: Type Functions516009 -Node: I18N Functions516479 -Node: User-defined518106 -Node: Definition Syntax518910 -Ref: Definition 
Syntax-Footnote-1523820 -Node: Function Example523889 -Node: Function Caveats526483 -Node: Calling A Function526904 -Node: Variable Scope528019 -Node: Pass By Value/Reference529994 -Node: Return Statement533434 -Node: Dynamic Typing536415 -Node: Indirect Calls537150 -Node: Internationalization546835 -Node: I18N and L10N548274 -Node: Explaining gettext548960 -Ref: Explaining gettext-Footnote-1554026 -Ref: Explaining gettext-Footnote-2554210 -Node: Programmer i18n554375 -Node: Translator i18n558575 -Node: String Extraction559368 -Ref: String Extraction-Footnote-1560329 -Node: Printf Ordering560415 -Ref: Printf Ordering-Footnote-1563199 -Node: I18N Portability563263 -Ref: I18N Portability-Footnote-1565712 -Node: I18N Example565775 -Ref: I18N Example-Footnote-1568410 -Node: Gawk I18N568482 -Node: Arbitrary Precision Arithmetic569099 -Ref: Arbitrary Precision Arithmetic-Footnote-1570751 -Node: General Arithmetic570899 -Node: Floating Point Issues572619 -Node: String Conversion Precision573714 -Ref: String Conversion Precision-Footnote-1575420 -Node: Unexpected Results575529 -Node: POSIX Floating Point Problems577682 -Ref: POSIX Floating Point Problems-Footnote-1581507 -Node: Integer Programming581545 -Node: Floating-point Programming583293 -Ref: Floating-point Programming-Footnote-1589557 -Node: Floating-point Representation589821 -Node: Floating-point Context590988 -Ref: table-ieee-formats591830 -Node: Rounding Mode593214 -Ref: table-rounding-modes593693 -Ref: Rounding Mode-Footnote-1596697 -Node: Gawk and MPFR596878 -Node: Arbitrary Precision Floats598119 -Ref: Arbitrary Precision Floats-Footnote-1600541 -Node: Setting Precision600852 -Node: Setting Rounding Mode603579 -Ref: table-gawk-rounding-modes603983 -Node: Floating-point Constants605180 -Node: Changing Precision606602 -Ref: Changing Precision-Footnote-1608002 -Node: Exact Arithmetic608176 -Node: Arbitrary Precision Integers611274 -Ref: Arbitrary Precision Integers-Footnote-1614356 -Node: Advanced Features614503 
-Node: Nondecimal Data616026 -Node: Array Sorting617609 -Node: Controlling Array Traversal618306 -Node: Array Sorting Functions626543 -Ref: Array Sorting Functions-Footnote-1630217 -Ref: Array Sorting Functions-Footnote-2630310 -Node: Two-way I/O630504 -Ref: Two-way I/O-Footnote-1635936 -Node: TCP/IP Networking636006 -Node: Profiling638850 -Node: Library Functions646304 -Ref: Library Functions-Footnote-1649311 -Node: Library Names649482 -Ref: Library Names-Footnote-1652953 -Ref: Library Names-Footnote-2653173 -Node: General Functions653259 -Node: Strtonum Function654212 -Node: Assert Function657142 -Node: Round Function660468 -Node: Cliff Random Function662011 -Node: Ordinal Functions663027 -Ref: Ordinal Functions-Footnote-1666097 -Ref: Ordinal Functions-Footnote-2666349 -Node: Join Function666558 -Ref: Join Function-Footnote-1668329 -Node: Getlocaltime Function668529 -Node: Data File Management672244 -Node: Filetrans Function672876 -Node: Rewind Function677015 -Node: File Checking678402 -Node: Empty Files679496 -Node: Ignoring Assigns681726 -Node: Getopt Function683279 -Ref: Getopt Function-Footnote-1694583 -Node: Passwd Functions694786 -Ref: Passwd Functions-Footnote-1703761 -Node: Group Functions703849 -Node: Walking Arrays711933 -Node: Sample Programs713502 -Node: Running Examples714167 -Node: Clones714895 -Node: Cut Program716119 -Node: Egrep Program725964 -Ref: Egrep Program-Footnote-1733737 -Node: Id Program733847 -Node: Split Program737463 -Ref: Split Program-Footnote-1740982 -Node: Tee Program741110 -Node: Uniq Program743913 -Node: Wc Program751342 -Ref: Wc Program-Footnote-1755608 -Ref: Wc Program-Footnote-2755808 -Node: Miscellaneous Programs755900 -Node: Dupword Program757088 -Node: Alarm Program759119 -Node: Translate Program763868 -Ref: Translate Program-Footnote-1768255 -Ref: Translate Program-Footnote-2768483 -Node: Labels Program768617 -Ref: Labels Program-Footnote-1771988 -Node: Word Sorting772072 -Node: History Sorting775956 -Node: Extract 
Program777795 -Ref: Extract Program-Footnote-1785278 -Node: Simple Sed785406 -Node: Igawk Program788468 -Ref: Igawk Program-Footnote-1803625 -Ref: Igawk Program-Footnote-2803826 -Node: Anagram Program803964 -Node: Signature Program807032 -Node: Debugger808132 -Node: Debugging809086 -Node: Debugging Concepts809519 -Node: Debugging Terms811375 -Node: Awk Debugging813972 -Node: Sample Debugging Session814864 -Node: Debugger Invocation815384 -Node: Finding The Bug816713 -Node: List of Debugger Commands823201 -Node: Breakpoint Control824535 -Node: Debugger Execution Control828199 -Node: Viewing And Changing Data831559 -Node: Execution Stack834915 -Node: Debugger Info836382 -Node: Miscellaneous Debugger Commands840363 -Node: Readline Support845808 -Node: Limitations846639 -Node: Dynamic Extensions848891 -Node: Plugin License849787 -Node: Sample Library850401 -Node: Internal File Description851085 -Node: Internal File Ops854798 -Ref: Internal File Ops-Footnote-1859361 -Node: Using Internal File Ops859501 -Node: Language History861877 -Node: V7/SVR3.1863399 -Node: SVR4865720 -Node: POSIX867162 -Node: BTL868170 -Node: POSIX/GNU868904 -Node: Common Extensions874439 -Node: Ranges and Locales875546 -Ref: Ranges and Locales-Footnote-1880164 -Ref: Ranges and Locales-Footnote-2880191 -Ref: Ranges and Locales-Footnote-3880451 -Node: Contributors880672 -Node: Installation884968 -Node: Gawk Distribution885862 -Node: Getting886346 -Node: Extracting887172 -Node: Distribution contents888864 -Node: Unix Installation894086 -Node: Quick Installation894703 -Node: Additional Configuration Options896665 -Node: Configuration Philosophy898142 -Node: Non-Unix Installation900484 -Node: PC Installation900942 -Node: PC Binary Installation902241 -Node: PC Compiling904089 -Node: PC Testing907033 -Node: PC Using908209 -Node: Cygwin912394 -Node: MSYS913394 -Node: VMS Installation913908 -Node: VMS Compilation914511 -Ref: VMS Compilation-Footnote-1915518 -Node: VMS Installation Details915576 -Node: VMS 
Running917211 -Node: VMS Old Gawk918818 -Node: Bugs919292 -Node: Other Versions923144 -Node: Notes928459 -Node: Compatibility Mode929046 -Node: Additions929829 -Node: Accessing The Source930756 -Node: Adding Code932181 -Node: New Ports938189 -Node: Derived Files942324 -Ref: Derived Files-Footnote-1947628 -Ref: Derived Files-Footnote-2947662 -Ref: Derived Files-Footnote-3948262 -Node: Future Extensions948360 -Node: Basic Concepts949847 -Node: Basic High Level950528 -Ref: Basic High Level-Footnote-1954563 -Node: Basic Data Typing954748 -Node: Glossary958103 -Node: Copying983079 -Node: GNU Free Documentation License1020636 -Node: Index1045773 +Ref: This Manual-Footnote-148556 +Node: Conventions48656 +Node: Manual History50790 +Ref: Manual History-Footnote-154060 +Ref: Manual History-Footnote-254101 +Node: How To Contribute54175 +Node: Acknowledgments55319 +Node: Getting Started59815 +Node: Running gawk62194 +Node: One-shot63380 +Node: Read Terminal64605 +Ref: Read Terminal-Footnote-166255 +Ref: Read Terminal-Footnote-266531 +Node: Long66702 +Node: Executable Scripts68078 +Ref: Executable Scripts-Footnote-169947 +Ref: Executable Scripts-Footnote-270049 +Node: Comments70596 +Node: Quoting73063 +Node: DOS Quoting77686 +Node: Sample Data Files78361 +Node: Very Simple81393 +Node: Two Rules85992 +Node: More Complex88139 +Ref: More Complex-Footnote-191069 +Node: Statements/Lines91154 +Ref: Statements/Lines-Footnote-195616 +Node: Other Features95881 +Node: When96809 +Node: Invoking Gawk98956 +Node: Command Line100417 +Node: Options101200 +Ref: Options-Footnote-1116598 +Node: Other Arguments116623 +Node: Naming Standard Input119281 +Node: Environment Variables120375 +Node: AWKPATH Variable120933 +Ref: AWKPATH Variable-Footnote-1123691 +Node: AWKLIBPATH Variable123951 +Node: Other Environment Variables124548 +Node: Exit Status127043 +Node: Include Files127718 +Node: Loading Shared Libraries131287 +Node: Obsolete132512 +Node: Undocumented133209 +Node: Regexp133452 +Node: Regexp 
Usage134841 +Node: Escape Sequences136867 +Node: Regexp Operators142630 +Ref: Regexp Operators-Footnote-1150010 +Ref: Regexp Operators-Footnote-2150157 +Node: Bracket Expressions150255 +Ref: table-char-classes152145 +Node: GNU Regexp Operators154668 +Node: Case-sensitivity158391 +Ref: Case-sensitivity-Footnote-1161359 +Ref: Case-sensitivity-Footnote-2161594 +Node: Leftmost Longest161702 +Node: Computed Regexps162903 +Node: Reading Files166313 +Node: Records168316 +Ref: Records-Footnote-1177240 +Node: Fields177277 +Ref: Fields-Footnote-1180310 +Node: Nonconstant Fields180396 +Node: Changing Fields182598 +Node: Field Separators188579 +Node: Default Field Splitting191208 +Node: Regexp Field Splitting192325 +Node: Single Character Fields195667 +Node: Command Line Field Separator196726 +Node: Field Splitting Summary200167 +Ref: Field Splitting Summary-Footnote-1203359 +Node: Constant Size203460 +Node: Splitting By Content208044 +Ref: Splitting By Content-Footnote-1211770 +Node: Multiple Line211810 +Ref: Multiple Line-Footnote-1217657 +Node: Getline217836 +Node: Plain Getline220052 +Node: Getline/Variable222141 +Node: Getline/File223282 +Node: Getline/Variable/File224604 +Ref: Getline/Variable/File-Footnote-1226203 +Node: Getline/Pipe226290 +Node: Getline/Variable/Pipe228850 +Node: Getline/Coprocess229957 +Node: Getline/Variable/Coprocess231200 +Node: Getline Notes231914 +Node: Getline Summary234701 +Ref: table-getline-variants235109 +Node: Read Timeout235965 +Ref: Read Timeout-Footnote-1239710 +Node: Command line directories239767 +Node: Printing240397 +Node: Print242028 +Node: Print Examples243365 +Node: Output Separators246149 +Node: OFMT247909 +Node: Printf249267 +Node: Basic Printf250173 +Node: Control Letters251712 +Node: Format Modifiers255524 +Node: Printf Examples261533 +Node: Redirection264248 +Node: Special Files271232 +Node: Special FD271765 +Ref: Special FD-Footnote-1275390 +Node: Special Network275464 +Node: Special Caveats276314 +Node: Close Files And 
Pipes277110 +Ref: Close Files And Pipes-Footnote-1284133 +Ref: Close Files And Pipes-Footnote-2284281 +Node: Expressions284431 +Node: Values285563 +Node: Constants286239 +Node: Scalar Constants286919 +Ref: Scalar Constants-Footnote-1287778 +Node: Nondecimal-numbers287960 +Node: Regexp Constants291019 +Node: Using Constant Regexps291494 +Node: Variables294549 +Node: Using Variables295204 +Node: Assignment Options296928 +Node: Conversion298800 +Ref: table-locale-affects304176 +Ref: Conversion-Footnote-1304800 +Node: All Operators304909 +Node: Arithmetic Ops305539 +Node: Concatenation308044 +Ref: Concatenation-Footnote-1310837 +Node: Assignment Ops310957 +Ref: table-assign-ops315945 +Node: Increment Ops317353 +Node: Truth Values and Conditions320823 +Node: Truth Values321906 +Node: Typing and Comparison322955 +Node: Variable Typing323744 +Ref: Variable Typing-Footnote-1327641 +Node: Comparison Operators327763 +Ref: table-relational-ops328173 +Node: POSIX String Comparison331722 +Ref: POSIX String Comparison-Footnote-1332678 +Node: Boolean Ops332816 +Ref: Boolean Ops-Footnote-1336894 +Node: Conditional Exp336985 +Node: Function Calls338717 +Node: Precedence342311 +Node: Locales345980 +Node: Patterns and Actions347069 +Node: Pattern Overview348123 +Node: Regexp Patterns349792 +Node: Expression Patterns350335 +Node: Ranges354020 +Node: BEGIN/END356986 +Node: Using BEGIN/END357748 +Ref: Using BEGIN/END-Footnote-1360479 +Node: I/O And BEGIN/END360585 +Node: BEGINFILE/ENDFILE362867 +Node: Empty365771 +Node: Using Shell Variables366087 +Node: Action Overview368372 +Node: Statements370729 +Node: If Statement372583 +Node: While Statement374082 +Node: Do Statement376126 +Node: For Statement377282 +Node: Switch Statement380434 +Node: Break Statement382531 +Node: Continue Statement384521 +Node: Next Statement386314 +Node: Nextfile Statement388704 +Node: Exit Statement391249 +Node: Built-in Variables393665 +Node: User-modified394760 +Ref: User-modified-Footnote-1403115 +Node: 
Auto-set403177 +Ref: Auto-set-Footnote-1413085 +Node: ARGC and ARGV413290 +Node: Arrays417141 +Node: Array Basics418646 +Node: Array Intro419472 +Node: Reference to Elements423790 +Node: Assigning Elements426060 +Node: Array Example426551 +Node: Scanning an Array428283 +Node: Controlling Scanning430597 +Ref: Controlling Scanning-Footnote-1435530 +Node: Delete435846 +Ref: Delete-Footnote-1438281 +Node: Numeric Array Subscripts438338 +Node: Uninitialized Subscripts440521 +Node: Multi-dimensional442149 +Node: Multi-scanning445243 +Node: Arrays of Arrays446834 +Node: Functions451479 +Node: Built-in452301 +Node: Calling Built-in453379 +Node: Numeric Functions455367 +Ref: Numeric Functions-Footnote-1459199 +Ref: Numeric Functions-Footnote-2459556 +Ref: Numeric Functions-Footnote-3459604 +Node: String Functions459873 +Ref: String Functions-Footnote-1483370 +Ref: String Functions-Footnote-2483499 +Ref: String Functions-Footnote-3483747 +Node: Gory Details483834 +Ref: table-sub-escapes485513 +Ref: table-sub-posix-92486867 +Ref: table-sub-proposed488210 +Ref: table-posix-sub489560 +Ref: table-gensub-escapes491106 +Ref: Gory Details-Footnote-1492313 +Ref: Gory Details-Footnote-2492364 +Node: I/O Functions492515 +Ref: I/O Functions-Footnote-1499170 +Node: Time Functions499317 +Ref: Time Functions-Footnote-1510209 +Ref: Time Functions-Footnote-2510277 +Ref: Time Functions-Footnote-3510435 +Ref: Time Functions-Footnote-4510546 +Ref: Time Functions-Footnote-5510658 +Ref: Time Functions-Footnote-6510885 +Node: Bitwise Functions511151 +Ref: table-bitwise-ops511709 +Ref: Bitwise Functions-Footnote-1515930 +Node: Type Functions516114 +Node: I18N Functions516584 +Node: User-defined518211 +Node: Definition Syntax519015 +Ref: Definition Syntax-Footnote-1523925 +Node: Function Example523994 +Node: Function Caveats526588 +Node: Calling A Function527009 +Node: Variable Scope528124 +Node: Pass By Value/Reference530099 +Node: Return Statement533539 +Node: Dynamic Typing536520 +Node: Indirect 
Calls537255 +Node: Internationalization546940 +Node: I18N and L10N548379 +Node: Explaining gettext549065 +Ref: Explaining gettext-Footnote-1554131 +Ref: Explaining gettext-Footnote-2554315 +Node: Programmer i18n554480 +Node: Translator i18n558680 +Node: String Extraction559473 +Ref: String Extraction-Footnote-1560434 +Node: Printf Ordering560520 +Ref: Printf Ordering-Footnote-1563304 +Node: I18N Portability563368 +Ref: I18N Portability-Footnote-1565817 +Node: I18N Example565880 +Ref: I18N Example-Footnote-1568515 +Node: Gawk I18N568587 +Node: Advanced Features569204 +Node: Nondecimal Data570727 +Node: Array Sorting572310 +Node: Controlling Array Traversal573007 +Node: Array Sorting Functions581244 +Ref: Array Sorting Functions-Footnote-1584918 +Ref: Array Sorting Functions-Footnote-2585011 +Node: Two-way I/O585205 +Ref: Two-way I/O-Footnote-1590637 +Node: TCP/IP Networking590707 +Node: Profiling593551 +Node: Library Functions601005 +Ref: Library Functions-Footnote-1604012 +Node: Library Names604183 +Ref: Library Names-Footnote-1607654 +Ref: Library Names-Footnote-2607874 +Node: General Functions607960 +Node: Strtonum Function608913 +Node: Assert Function611843 +Node: Round Function615169 +Node: Cliff Random Function616712 +Node: Ordinal Functions617728 +Ref: Ordinal Functions-Footnote-1620798 +Ref: Ordinal Functions-Footnote-2621050 +Node: Join Function621259 +Ref: Join Function-Footnote-1623030 +Node: Getlocaltime Function623230 +Node: Data File Management626945 +Node: Filetrans Function627577 +Node: Rewind Function631716 +Node: File Checking633103 +Node: Empty Files634197 +Node: Ignoring Assigns636427 +Node: Getopt Function637980 +Ref: Getopt Function-Footnote-1649284 +Node: Passwd Functions649487 +Ref: Passwd Functions-Footnote-1658462 +Node: Group Functions658550 +Node: Walking Arrays666634 +Node: Sample Programs668203 +Node: Running Examples668868 +Node: Clones669596 +Node: Cut Program670820 +Node: Egrep Program680665 +Ref: Egrep Program-Footnote-1688438 
+Node: Id Program688548 +Node: Split Program692164 +Ref: Split Program-Footnote-1695683 +Node: Tee Program695811 +Node: Uniq Program698614 +Node: Wc Program706043 +Ref: Wc Program-Footnote-1710309 +Ref: Wc Program-Footnote-2710509 +Node: Miscellaneous Programs710601 +Node: Dupword Program711789 +Node: Alarm Program713820 +Node: Translate Program718569 +Ref: Translate Program-Footnote-1722956 +Ref: Translate Program-Footnote-2723184 +Node: Labels Program723318 +Ref: Labels Program-Footnote-1726689 +Node: Word Sorting726773 +Node: History Sorting730657 +Node: Extract Program732496 +Ref: Extract Program-Footnote-1739979 +Node: Simple Sed740107 +Node: Igawk Program743169 +Ref: Igawk Program-Footnote-1758326 +Ref: Igawk Program-Footnote-2758527 +Node: Anagram Program758665 +Node: Signature Program761733 +Node: Debugger762833 +Node: Debugging763787 +Node: Debugging Concepts764220 +Node: Debugging Terms766076 +Node: Awk Debugging768673 +Node: Sample Debugging Session769565 +Node: Debugger Invocation770085 +Node: Finding The Bug771414 +Node: List of Debugger Commands777902 +Node: Breakpoint Control779236 +Node: Debugger Execution Control782900 +Node: Viewing And Changing Data786260 +Node: Execution Stack789616 +Node: Debugger Info791083 +Node: Miscellaneous Debugger Commands795064 +Node: Readline Support800509 +Node: Limitations801340 +Node: Dynamic Extensions803592 +Node: Plugin License804488 +Node: Sample Library805102 +Node: Internal File Description805786 +Node: Internal File Ops809499 +Ref: Internal File Ops-Footnote-1814062 +Node: Using Internal File Ops814202 +Node: Arbitrary Precision Arithmetic816578 +Ref: Arbitrary Precision Arithmetic-Footnote-1818230 +Node: General Arithmetic818378 +Node: Floating Point Issues820098 +Node: String Conversion Precision821193 +Ref: String Conversion Precision-Footnote-1822899 +Node: Unexpected Results823008 +Node: POSIX Floating Point Problems825161 +Ref: POSIX Floating Point Problems-Footnote-1828986 +Node: Integer 
Programming829024 +Node: Floating-point Programming830772 +Ref: Floating-point Programming-Footnote-1837036 +Node: Floating-point Representation837300 +Node: Floating-point Context838467 +Ref: table-ieee-formats839309 +Node: Rounding Mode840693 +Ref: table-rounding-modes841172 +Ref: Rounding Mode-Footnote-1844176 +Node: Gawk and MPFR844357 +Node: Arbitrary Precision Floats845598 +Ref: Arbitrary Precision Floats-Footnote-1848020 +Node: Setting Precision848331 +Node: Setting Rounding Mode851058 +Ref: table-gawk-rounding-modes851462 +Node: Floating-point Constants852659 +Node: Changing Precision854081 +Ref: Changing Precision-Footnote-1855481 +Node: Exact Arithmetic855655 +Node: Arbitrary Precision Integers858753 +Ref: Arbitrary Precision Integers-Footnote-1861835 +Node: Language History861982 +Node: V7/SVR3.1863504 +Node: SVR4865825 +Node: POSIX867267 +Node: BTL868275 +Node: POSIX/GNU869009 +Node: Common Extensions874544 +Node: Ranges and Locales875651 +Ref: Ranges and Locales-Footnote-1880269 +Ref: Ranges and Locales-Footnote-2880296 +Ref: Ranges and Locales-Footnote-3880556 +Node: Contributors880777 +Node: Installation885073 +Node: Gawk Distribution885967 +Node: Getting886451 +Node: Extracting887277 +Node: Distribution contents888969 +Node: Unix Installation894191 +Node: Quick Installation894808 +Node: Additional Configuration Options896770 +Node: Configuration Philosophy898247 +Node: Non-Unix Installation900589 +Node: PC Installation901047 +Node: PC Binary Installation902346 +Node: PC Compiling904194 +Node: PC Testing907138 +Node: PC Using908314 +Node: Cygwin912499 +Node: MSYS913499 +Node: VMS Installation914013 +Node: VMS Compilation914616 +Ref: VMS Compilation-Footnote-1915623 +Node: VMS Installation Details915681 +Node: VMS Running917316 +Node: VMS Old Gawk918923 +Node: Bugs919397 +Node: Other Versions923249 +Node: Notes928564 +Node: Compatibility Mode929151 +Node: Additions929934 +Node: Accessing The Source930861 +Node: Adding Code932286 +Node: New Ports938294 
+Node: Derived Files942429 +Ref: Derived Files-Footnote-1947733 +Ref: Derived Files-Footnote-2947767 +Ref: Derived Files-Footnote-3948367 +Node: Future Extensions948465 +Node: Basic Concepts949952 +Node: Basic High Level950633 +Ref: Basic High Level-Footnote-1954668 +Node: Basic Data Typing954853 +Node: Glossary958208 +Node: Copying983184 +Node: GNU Free Documentation License1020741 +Node: Index1045878 End Tag Table diff --git a/doc/gawk.texi b/doc/gawk.texi index 7d463a3d..d700f2a7 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -296,14 +296,14 @@ particular records in a file and perform operations upon them. * Functions:: Built-in and user-defined functions. * Internationalization:: Getting @command{gawk} to speak your language. -* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with - @command{gawk}. * Advanced Features:: Stuff for advanced users, specific to @command{gawk}. * Library Functions:: A Library of @command{awk} Functions. * Sample Programs:: Many @command{awk} programs with complete explanations. * Debugger:: The @code{gawk} debugger. +* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with + @command{gawk}. * Dynamic Extensions:: Adding new built-in functions to @command{gawk}. * Language History:: The evolution of the @command{awk} @@ -569,29 +569,6 @@ particular records in a file and perform operations upon them. * I18N Portability:: @command{awk}-level portability issues. * I18N Example:: A simple i18n example. * Gawk I18N:: @command{gawk} is also internationalized. -* General Arithmetic:: An introduction to computer arithmetic. -* Floating Point Issues:: Stuff to know about floating-point numbers. -* String Conversion Precision:: The String Value Can Lie. -* Unexpected Results:: Floating Point Numbers Are Not Abstract - Numbers. -* POSIX Floating Point Problems:: Standards Versus Existing Practice. -* Integer Programming:: Effective integer programming. 
-* Floating-point Programming:: Effective Floating-point Programming. -* Floating-point Representation:: Binary floating-point representation. -* Floating-point Context:: Floating-point context. -* Rounding Mode:: Floating-point rounding mode. -* Gawk and MPFR:: How @command{gawk} provides - aribitrary-precision arithmetic. -* Arbitrary Precision Floats:: Arbitrary Precision Floating-point - Arithmetic with @command{gawk}. -* Setting Precision:: Setting the working precision. -* Setting Rounding Mode:: Setting the rounding mode. -* Floating-point Constants:: Representing floating-point constants. -* Changing Precision:: Changing the precision of a number. -* Exact Arithmetic:: Exact arithmetic with floating-point - numbers. -* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with - @command{gawk}. * Nondecimal Data:: Allowing nondecimal input data. * Array Sorting:: Facilities for controlling array traversal and sorting arrays. @@ -673,6 +650,29 @@ particular records in a file and perform operations upon them. * Miscellaneous Debugger Commands:: Miscellaneous Commands. * Readline Support:: Readline support. * Limitations:: Limitations and future plans. +* General Arithmetic:: An introduction to computer arithmetic. +* Floating Point Issues:: Stuff to know about floating-point numbers. +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not Abstract + Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice. +* Integer Programming:: Effective integer programming. +* Floating-point Programming:: Effective Floating-point Programming. +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +* Gawk and MPFR:: How @command{gawk} provides + aribitrary-precision arithmetic. 
+* Arbitrary Precision Floats:: Arbitrary Precision Floating-point + Arithmetic with @command{gawk}. +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point + numbers. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with + @command{gawk}. * Plugin License:: A note about licensing. * Sample Library:: A example of new functions. * Internal File Description:: What the new functions will do. @@ -1201,6 +1201,13 @@ solving real problems. @ref{Debugger}, describes the @command{awk} debugger. +@ref{Arbitrary Precision Arithmetic}, +describes advanced arithmetic facilities provided by +@command{gawk}. + +@ref{Dynamic Extensions}, describes how to add new variables and +functions to @command{gawk} by writing extensions in C. + @ref{Language History}, describes how the @command{awk} language has evolved since its first release to present. It also describes how @command{gawk} @@ -18497,1229 +18504,6 @@ then @command{gawk} produces usage messages, warnings, and fatal errors in the local language. @c ENDOFRANGE inloc -@node Arbitrary Precision Arithmetic -@chapter Arithmetic and Arbitrary Precision Arithmetic with @command{gawk} -@cindex arbitrary precision -@cindex multiple precision -@cindex infinite precision -@cindex floating-point numbers, arbitrary precision -@cindex MPFR -@cindex GMP - -@cindex Knuth, Donald -@quotation -@i{There's a credibility gap: We don't know how much of the computer's answers -to believe. Novice computer users solve this problem by implicitly trusting -in the computer as an infallible authority; they tend to believe that all -digits of a printed answer are significant. 
Disillusioned computer users have -just the opposite approach; they are constantly afraid that their answers -are almost meaningless.}@* -Donald Knuth@footnote{Donald E.@: Knuth. -@cite{The Art of Computer Programming}. Volume 2, -@cite{Seminumerical Algorithms}, third edition, -1998, ISBN 0-201-89683-4, p.@: 229.} -@end quotation - -This @value{CHAPTER} discusses issues that you may encounter -when performing arithmetic. It begins by discussing some of -the general atributes of computer arithmetic, along with how -this can influence what you see when running @command{awk} programs. -This discussion applies to all versions of @command{awk}. - -Then the discussion moves on to @dfn{arbitrary precsion -arithmetic}, a feature which is specific to @command{gawk}. - -@menu -* General Arithmetic:: An introduction to computer arithmetic. -* Floating-point Programming:: Effective Floating-point Programming. -* Gawk and MPFR:: How @command{gawk} provides - aribitrary-precision arithmetic. -* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic - with @command{gawk}. -* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with - @command{gawk}. -@end menu - -@node General Arithmetic -@section A General Description of Computer Arithmetic - -@cindex integers -@cindex floating-point, numbers -@cindex numbers, floating-point -Within computers, there are two kinds of numeric values: @dfn{integers} -and @dfn{floating-point}. -In school, integer values were referred to as ``whole'' numbers---that is, -numbers without any fractional part, such as 1, 42, or @minus{}17. -The advantage to integer numbers is that they represent values exactly. -The disadvantage is that their range is limited. On most systems, -this range is @minus{}2,147,483,648 to 2,147,483,647. -However, many systems now support a range from -@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. 
- -@cindex unsigned integers -@cindex integers, unsigned -Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}. -Signed values may be negative or positive, with the range of values just -described. -Unsigned values are always positive. On most systems, -the range is from 0 to 4,294,967,295. -However, many systems now support a range from -0 to 18,446,744,073,709,551,615. - -@cindex double precision floating-point -@cindex single precision floating-point -Floating-point numbers represent what are called ``real'' numbers; i.e., -those that do have a fractional part, such as 3.1415927. -The advantage to floating-point numbers is that they -can represent a much larger range of values. -The disadvantage is that there are numbers that they cannot represent -exactly. -@command{awk} uses @dfn{double precision} floating-point numbers, which -can hold more digits than @dfn{single precision} -floating-point numbers. -@c Floating-point issues are discussed more fully in -@c @ref{Floating Point Issues}. - -There a several important issues to be aware of, described next. - -@menu -* Floating Point Issues:: Stuff to know about floating-point numbers. -* Integer Programming:: Effective integer programming. -@end menu - -@node Floating Point Issues -@subsection Floating-Point Number Caveats - -As mentioned earlier, floating-point numbers represent what are called -``real'' numbers, i.e., those that have a fractional part. @command{awk} -uses double precision floating-point numbers to represent all -numeric values. This @value{SECTION} describes some of the issues -involved in using floating-point numbers. - -There is a very nice -@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic} -by David Goldberg, -``What Every Computer Scientist Should Know About Floating-point Arithmetic,'' -@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48. 
-This is worth reading if you are interested in the details, -but it does require a background in computer science. - -@menu -* String Conversion Precision:: The String Value Can Lie. -* Unexpected Results:: Floating Point Numbers Are Not Abstract - Numbers. -* POSIX Floating Point Problems:: Standards Versus Existing Practice. -@end menu - -@node String Conversion Precision -@subsubsection The String Value Can Lie - -Internally, @command{awk} keeps both the numeric value -(double precision floating-point) and the string value for a variable. -Separately, @command{awk} keeps -track of what type the variable has -(@pxref{Typing and Comparison}), -which plays a role in how variables are used in comparisons. - -It is important to note that the string value for a number may not -reflect the full value (all the digits) that the numeric value -actually contains. -The following program (@file{values.awk}) illustrates this: - -@example -@{ - sum = $1 + $2 - # see it for what it is - printf("sum = %.12g\n", sum) - # use CONVFMT - a = "<" sum ">" - print "a =", a - # use OFMT - print "sum =", sum -@} -@end example - -@noindent -This program shows the full value of the sum of @code{$1} and @code{$2} -using @code{printf}, and then prints the string values obtained -from both automatic conversion (via @code{CONVFMT}) and -from printing (via @code{OFMT}). - -Here is what happens when the program is run: - -@example -$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk} -@print{} sum = 4.8888888 -@print{} a = <4.88889> -@print{} sum = 4.88889 -@end example - -This makes it clear that the full numeric value is different from -what the default string representations show. - -@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with -at least six significant digits. For some applications, you might want to -change it to specify more precision. 
-On most modern machines, most of the time, -17 digits is enough to capture a floating-point number's -value exactly.@footnote{Pathological cases can require up to -752 digits (!), but we doubt that you need to worry about this.} - -@node Unexpected Results -@subsubsection Floating Point Numbers Are Not Abstract Numbers - -@cindex floating-point, numbers -Unlike numbers in the abstract sense (such as what you studied in high school -or college arithmetic), numbers stored in computers are limited in certain ways. -They cannot represent an infinite number of digits, nor can they always -represent things exactly. -In particular, -floating-point numbers cannot -always represent values exactly. Here is an example: - -@example -$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'} -515.79 -@print{} 0000051579 -515.80 -@print{} 0000051579 -515.81 -@print{} 0000051580 -515.82 -@print{} 0000051582 -@kbd{@value{CTL}-d} -@end example - -@noindent -This shows that some values can be represented exactly, -whereas others are only approximated. This is not a ``bug'' -in @command{awk}, but simply an artifact of how computers -represent numbers. - -@quotation NOTE -It cannot be emphasized enough that the behavior just -described is fundamental to modern computers. You will -see this kind of thing happen in @emph{any} programming -language using hardware floating-point numbers. It is @emph{not} -a bug in @command{gawk}, nor is it something that can be ``just -fixed.'' -@end quotation - -@cindex negative zero -@cindex positive zero -@cindex zero@comma{} negative vs.@: positive -Another peculiarity of floating-point numbers on modern systems -is that they often have more than one representation for the number zero! -In particular, it is possible to represent ``minus zero'' as well as -regular, or ``positive'' zero. 
- -This example shows that negative and positive zero are distinct values -when stored internally, but that they are in fact equal to each other, -as well as to ``regular'' zero: - -@example -$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0} -> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz} -> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0} -> @kbd{@}'} -@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1 -@print{} mz == 0 -> 1, pz == 0 -> 1 -@end example - -It helps to keep this in mind should you process numeric data -that contains negative zero values; the fact that the zero is negative -is noted and can affect comparisons. - -@node POSIX Floating Point Problems -@subsubsection Standards Versus Existing Practice - -Historically, @command{awk} has converted any non-numeric looking string -to the numeric value zero, when required. Furthermore, the original -definition of the language and the original POSIX standards specified that -@command{awk} only understands decimal numbers (base 10), and not octal -(base 8) or hexadecimal numbers (base 16). - -Changes in the language of the -2001 and 2004 POSIX standards can be interpreted to imply that @command{awk} -should support additional features. These features are: - -@itemize @bullet -@item -Interpretation of floating point data values specified in hexadecimal -notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not} -source code constants.) - -@item -Support for the special IEEE 754 floating point values ``Not A Number'' -(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf''). -In particular, the format for these values is as specified by the ISO 1999 -C standard, which ignores case and can allow machine-dependent additional -characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}. 
-@end itemize - -The first problem is that both of these are clear changes to historical -practice: - -@itemize @bullet -@item -The @command{gawk} maintainer feels that supporting hexadecimal floating -point values, in particular, is ugly, and was never intended by the -original designers to be part of the language. - -@item -Allowing completely alphabetic strings to have valid numeric -values is also a very severe departure from historical practice. -@end itemize - -The second problem is that the @code{gawk} maintainer feels that this -interpretation of the standard, which requires a certain amount of -``language lawyering'' to arrive at in the first place, was not even -intended by the standard developers. In other words, ``we see how you -got where you are, but we don't think that that's where you want to be.'' - -Recognizing the above issues, but attempting to provide compatibility -with the earlier versions of the standard, -the 2008 POSIX standard added explicit wording to allow, but not require, -that @command{awk} support hexadecimal floating point values and -special values for ``Not A Number'' and infinity. - -Although the @command{gawk} maintainer continues to feel that -providing those features is inadvisable, -nevertheless, on systems that support IEEE floating point, it seems -reasonable to provide @emph{some} way to support NaN and Infinity values. -The solution implemented in @command{gawk} is as follows: - -@itemize @bullet -@item -With the @option{--posix} command-line option, @command{gawk} becomes -``hands off.'' String values are passed directly to the system library's -@code{strtod()} function, and if it successfully returns a numeric value, -that is what's used.@footnote{You asked for it, you got it.} -By definition, the results are not portable across -different systems. 
They are also a little surprising: - -@example -$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'} -@print{} nan -$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'} -@print{} 3735928559 -@end example - -@item -Without @option{--posix}, @command{gawk} interprets the four strings -@samp{+inf}, -@samp{-inf}, -@samp{+nan}, -and -@samp{-nan} -specially, producing the corresponding special numeric values. -The leading sign acts a signal to @command{gawk} (and the user) -that the value is really numeric. Hexadecimal floating point is -not supported (unless you also use @option{--non-decimal-data}, -which is @emph{not} recommended). For example: - -@example -$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'} -@print{} 0 -$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'} -@print{} nan -$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'} -@print{} 0 -@end example - -@command{gawk} does ignore case in the four special values. -Thus @samp{+nan} and @samp{+NaN} are the same. -@end itemize - -@node Integer Programming -@subsection Mixing Integers And Floating-point - -As has been mentioned already, @command{gawk} ordinarily uses hardware double -precision with 64-bit IEEE binary floating-point representation -for numbers on most systems. A large integer like 9007199254740997 -has a binary representation that, although finite, is more than 53 bits long; -it must also be rounded to 53 bits. -The biggest integer that can be stored in a C @code{double} is usually the same -as the largest possible value of a @code{double}. If your system @code{double} -is an IEEE 64-bit @code{double}, this largest possible value is an integer and -can be represented precisely. What more should one know about integers? - -If you want to know what is the largest integer, such that it and -all smaller integers can be stored in 64-bit doubles without losing precision, -then the answer is -@iftex -@math{2^{53}}. -@end iftex -@ifnottex -2^53. 
-@end ifnottex -The next representable number is the even number -@iftex -@math{2^{53} + 2}, -@end iftex -@ifnottex -2^53 + 2, -@end ifnottex -meaning it is unlikely that you will be able to make -@command{gawk} print -@iftex -@math{2^{53} + 1} -@end iftex -@ifnottex -2^53 + 1 -@end ifnottex -in integer format. -The range of integers exactly representable by a 64-bit double -is -@iftex -@math{[-2^{53}, 2^{53}]}. -@end iftex -@ifnottex -[@minus{}2^53, 2^53]. -@end ifnottex -If you ever see an integer outside this range in @command{gawk} -using 64-bit doubles, you have reason to be very suspicious about -the accuracy of the output. Here is a simple program with erroneous output: - -@example -$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'} -@print{} 9007199254740991 -@print{} 9007199254740992 -@print{} 9007199254740992 -@print{} 9007199254740994 -@end example - -The lesson is to not assume that any large integer printed by @command{gawk} -represents an exact result from your computation, especially if it wraps -around on your screen. - -@node Floating-point Programming -@section Understanding Floating-point Programming - -Numerical programming is an extensive area; if you need to develop -sophisticated numerical algorithms then @command{gawk} may not be -the ideal tool, and this documentation may not be sufficient. -@c FIXME: JOHN: Do you want to cite some actual books? -It might require digesting a book or two to really internalize how to compute -with ideal accuracy and precision -and the result often depends on the particular application. - -@quotation NOTE -A floating-point calculation's @dfn{accuracy} is how close it comes -to the real value. This is as opposed to the @dfn{precision}, which -usually refers to the number of bits used to represent the number -(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision, -the Wikipedia article} for more information). 
-@end quotation - -There are two options for doing floating-point calculations: -hardware floating-point (as used by standard @command{awk} and -the default for @command{gawk}), and @dfn{arbitrary-precision} -floating-point, which is software based. This @value{CHAPTER} -aims to provide enough information to understand both, and then -will focus on @command{gawk}'s facilities for the latter.@footnote{If you -are interested in other tools that perform arbitrary precision arithmetic, -you may want to investigate the POSIX @command{bc} tool. See -@uref{http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html, -the POSIX specification for it}, for more information.} - -Binary floating-point representations and arithmetic are inexact. -Simple values like 0.1 cannot be precisely represented using -binary floating-point numbers, and the limited precision of -floating-point numbers means that slight changes in -the order of operations or the precision of intermediate storage -can change the result. To make matters worse, with arbitrary precision -floating-point, you can set the precision before starting a computation, -but then you cannot be sure of the number of significant decimal places -in the final result. - -Sometimes, before you start to write any code, you should think more -about what you really want and what's really happening. Consider the -two numbers in the following example: - -@example -x = 0.875 # 1/2 + 1/4 + 1/8 -y = 0.425 -@end example - -Unlike the number in @code{y}, the number stored in @code{x} -is exactly representable -in binary since it can be written as a finite sum of one or -more fractions whose denominators are all powers of two. -When @command{gawk} reads a floating-point number from -program source, it automatically rounds that number to whatever -precision your machine supports. 
If you try to print the numeric -content of a variable using an output format string of @code{"%.17g"}, -it may not produce the same number as you assigned to it: - -@example -$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425} -> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'} -@print{} 0.875, 0.42499999999999999 -@end example - -Often the error is so small you do not even notice it, and if you do, -you can always specify how much precision you would like in your output. -Usually this is a format string like @code{"%.15g"}, which when -used in the previous example, produces an output identical to the input. - -Because the underlying representation can be a little bit off from the exact value, -comparing floating-point values to see if they are equal is generally not a good idea. -Here is an example where it does not work like you expect: - -@example -$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} -@print{} 0 -@end example - -The loss of accuracy during a single computation with floating-point numbers -usually isn't enough to worry about. However, if you compute a value -which is the result of a sequence of floating point operations, -the error can accumulate and greatly affect the computation itself. -Here is an attempt to compute the value of the constant -@value{PI} using one of its many series representations: - -@example -BEGIN @{ - x = 1.0 / sqrt(3.0) - n = 6 - for (i = 1; i < 30; i++) @{ - n = n * 2.0 - x = (sqrt(x * x + 1) - 1) / x - printf("%.15f\n", n * x) - @} -@} -@end example - -When run, the early errors propagating through later computations -cause the loop to terminate prematurely after an attempt to divide by zero.
- -@example -$ @kbd{gawk -f pi.awk} -@print{} 3.215390309173475 -@print{} 3.159659942097510 -@print{} 3.146086215131467 -@print{} 3.142714599645573 -@dots{} -@print{} 3.224515243534819 -@print{} 2.791117213058638 -@print{} 0.000000000000000 -@error{} gawk: pi.awk:6: fatal: division by zero attempted -@end example - -Here is one more example where the inaccuracies in internal representations -yield an unexpected result: - -@example -$ @kbd{gawk 'BEGIN @{} -> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} -> @kbd{i++} -> @kbd{print i} -> @kbd{@}'} -@print{} 4 -@end example - -Can computation using arbitrary precision help with the previous examples? -If you are impatient to know, see -@ref{Exact Arithmetic}. - -Instead of arbitrary precision floating-point arithmetic, -often all you need is an adjustment of your logic -or a different order for the operations in your calculation. -The stability and the accuracy of the computation of the constant @value{PI} -in the previous example can be enhanced by using the following -simple algebraic transformation: - -@example -(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1) -@end example - -@noindent -After making this change, the program does converge to -@value{PI} in under 30 iterations: - -@example -$ @kbd{gawk -f /tmp/pi2.awk} -@print{} 3.215390309173473 -@print{} 3.159659942097501 -@print{} 3.146086215131436 -@print{} 3.142714599645370 -@print{} 3.141873049979825 -@dots{} -@print{} 3.141592653589797 -@print{} 3.141592653589797 -@end example - -There is no need to be unduly suspicious about the results from -floating-point arithmetic. The lesson to remember is that -floating-point arithmetic is always more complex than the arithmetic using -pencil and paper. In order to take advantage of the power -of computer floating-point, you need to know its limitations -and work within them.
For most casual use of floating-point arithmetic, -you will often get the expected result in the end if you simply round -the display of your final results to the correct number of significant -decimal digits. And, avoid presenting numerical data in a manner that -implies better precision than is actually the case. - -@menu -* Floating-point Representation:: Binary floating-point representation. -* Floating-point Context:: Floating-point context. -* Rounding Mode:: Floating-point rounding mode. -@end menu - -@node Floating-point Representation -@subsection Binary Floating-point Representation -@cindex IEEE-754 format - -Although floating-point representations vary from machine to machine, -the most commonly encountered representation is that defined by the -IEEE 754 Standard. An IEEE-754 format value has three components: - -@itemize @bullet -@item -A sign bit telling whether the number is positive or negative. - -@item -An @dfn{exponent} giving its order of magnitude, @var{e}. - -@item -A @dfn{significand}, @var{s}, -specifying the actual digits of the number. -@end itemize - -The value of the -number is then -@iftex -@math{s @cdot 2^e}. -@end iftex -@ifnottex -@var{s * 2^e}. -@end ifnottex -The first bit of a non-zero binary significand -is always one, so the significand in an IEEE-754 format only includes the -fractional part, leaving the leading one implicit. - -Three of the standard IEEE-754 types are 32-bit single precision, -64-bit double precision and 128-bit quadruple precision. -The standard also specifies extended precision formats -to allow greater precisions and larger exponent ranges. - -The significand is stored in @dfn{normalized} format, -which means that the first bit is always a one. - -@node Floating-point Context -@subsection Floating-point Context -@cindex context, floating-point - -A floating-point @dfn{context} defines the environment for arithmetic operations. 
-It governs precision, sets rules for rounding, and limits the range for exponents. -The context has the following primary components: - -@table @dfn -@item Precision -Precision of the floating-point format in bits. -@item emax -Maximum exponent allowed for this format. -@item emin -Minimum exponent allowed for this format. -@item Underflow behavior -The format may or may not support gradual underflow. -@item Rounding -The rounding mode of this context. -@end table - -@ref{table-ieee-formats} lists the precision and exponent -field values for the basic IEEE-754 binary formats: - -@float Table,table-ieee-formats -@caption{Basic IEEE Format Context Values} -@multitable @columnfractions .20 .20 .20 .20 .20 -@headitem Name @tab Total bits @tab Precision @tab emin @tab emax -@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127 -@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023 -@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383 -@end multitable -@end float - -@quotation NOTE -The precision numbers include the implied leading one that gives them -one extra bit of significand. -@end quotation - -A floating-point context can also determine which signals are treated -as exceptions, and can set rules for arithmetic with special values. -Please consult the IEEE-754 standard or other resources for details. - -@command{gawk} ordinarily uses the hardware double precision -representation for numbers. On most systems, this is IEEE-754 -floating-point format, corresponding to 64-bit binary with 53 bits -of precision. - -@quotation NOTE -In case an underflow occurs, the standard allows, but does not require, -the result from an arithmetic operation to be a number smaller than -the smallest nonzero normalized number. Such numbers do -not have as many significant digits as normal numbers, and are called -@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero, -is called @dfn{flush to zero}. 
The basic IEEE-754 binary formats -support subnormal numbers. -@end quotation - -@node Rounding Mode -@subsection Floating-point Rounding Mode -@cindex rounding mode, floating-point - -The @dfn{rounding mode} specifies the behavior for the results of numerical -operations when discarding extra precision. Each rounding mode indicates -how the least significant returned digit of a rounded result is to -be calculated. -@ref{table-rounding-modes} lists the IEEE-754 defined -rounding modes: - -@float Table,table-rounding-modes -@caption{IEEE 754 Rounding Modes} -@multitable @columnfractions .45 .55 -@headitem Rounding Mode @tab IEEE Name -@item Round to nearest, ties to even @tab @code{roundTiesToEven} -@item Round toward plus Infinity @tab @code{roundTowardPositive} -@item Round toward negative Infinity @tab @code{roundTowardNegative} -@item Round toward zero @tab @code{roundTowardZero} -@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} -@end multitable -@end float - -The default mode @code{roundTiesToEven} is the most preferred, -but the least intuitive. This method does the obvious thing for most values, -by rounding them up or down to the nearest digit. -For example, rounding 1.132 to two digits yields 1.13, -and rounding 1.157 yields 1.16. - -However, when it comes to rounding a value that is exactly halfway between, -things do not work the way you probably learned in school. -In this case, the number is rounded to the nearest even digit. -So rounding 0.125 to two digits rounds down to 0.12, -but rounding 0.6875 to three digits rounds up to 0.688. -You probably have already encountered this rounding mode when -using the @code{printf} routine to format floating-point numbers. 
-For example: - -@example -BEGIN @{ - x = -4.5 - for (i = 1; i < 10; i++) @{ - x += 1.0 - printf("%4.1f => %2.0f\n", x, x) - @} -@} -@end example - -@noindent -produces the following output when run:@footnote{It -is possible for the output to be completely different if the -C library in your system does not use the IEEE-754 even-rounding -rule to round halfway cases for @code{printf()}.} - -@example --3.5 => -4 --2.5 => -2 --1.5 => -2 --0.5 => 0 - 0.5 => 0 - 1.5 => 2 - 2.5 => 2 - 3.5 => 4 - 4.5 => 4 -@end example - -The theory behind the rounding mode @code{roundTiesToEven} is that -it more or less evenly distributes upward and downward rounds -of exact halves, which might cause the round-off error -to cancel itself out. This is the default rounding mode used -in IEEE-754 computing functions and operators. - -The other rounding modes are rarely used. -Round toward positive infinity (@code{roundTowardPositive}) -and round toward negative infinity (@code{roundTowardNegative}) -are often used to implement interval arithmetic, -where you adjust the rounding mode to calculate upper and lower bounds -for the range of output. The @code{roundTowardZero} -mode can be used for converting floating-point numbers to integers. -The rounding mode @code{roundTiesToAway} rounds the result to the -nearest number and selects the number with the larger magnitude -if a tie occurs. - -Some numerical analysts will tell you that your choice of rounding style -has tremendous impact on the final outcome, and advise you to wait until -final output for any rounding. Instead, you can often avoid round-off error problems by -setting the precision initially to some value sufficiently larger than -the final desired precision, so that the accumulation of round-off error -does not influence the outcome. 
-If you suspect that results from your computation are -sensitive to accumulation of round-off error, -one way to be sure is to look for a significant difference in output -when you change the rounding mode. - -@node Gawk and MPFR -@section @command{gawk} + MPFR = Powerful Arithmetic - -The rest of this @value{CHAPTER} describes how to use the arbitrary precision -(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric -capabilities in @command{gawk} to produce maximally accurate results -when you need them. - -But first you should check if your version of -@command{gawk} supports arbitrary precision arithmetic. -The easiest way to find out is to look at the output of -the following command: - -@example -$ @kbd{gawk --version} -@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3) -@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation. -@dots{} -@end example - -@command{gawk} uses the -@uref{http://www.mpfr.org, GNU MPFR} -and -@uref{http://gmplib.org, GNU MP} (GMP) -libraries for arbitrary precision -arithmetic on numbers. So if you do not see the names of these libraries -in the output, then your version of @command{gawk} does not support -arbitrary precision arithmetic. - -Additionally, -there are a few elements available in the @code{PROCINFO} array -to provide information about the MPFR and GMP libraries. -@xref{Auto-set}, for more information. - -@ignore -Even if you aren't interested in arbitrary precision arithmetic, you -may still benefit from knowing about how @command{gawk} handles numbers -in general, and the limitations of doing arithmetic with ordinary -@command{gawk} numbers. -@end ignore - - -@node Arbitrary Precision Floats -@section Arbitrary Precision Floating-point Arithmetic with @command{gawk} - -@command{gawk} uses the GNU MPFR library -for arbitrary precision floating-point arithmetic.
The MPFR library -provides precise control over precisions and rounding modes, and gives -correctly rounded reproducible platform-independent results. With the -command-line option @option{--bignum} or @option{-M}, -all floating-point arithmetic operators and numeric functions can yield -results to any desired precision level supported by MPFR. -Two built-in -variables @code{PREC} -(@pxref{Setting Precision}) -and @code{ROUNDMODE} -(@pxref{Setting Rounding Mode}) -provide control over the working precision and the rounding mode. -The precision and the rounding mode are set globally for every operation -to follow. - -The default working precision for arbitrary precision floating-point values is 53, -and the default value for @code{ROUNDMODE} is @code{"N"}, -which selects the IEEE-754 -@code{roundTiesToEven} (@pxref{Rounding Mode}) rounding mode.@footnote{The -default precision is 53, since according to the MPFR documentation, -the library should be able to exactly reproduce all computations with -double-precision machine floating-point numbers (@code{double} type -in C), except the default exponent range is much wider and subnormal -numbers are not implemented.} -@command{gawk} uses the default exponent range in MPFR -@iftex -(@math{emax = 2^{30} - 1, emin = -emax}) -@end iftex -@ifnottex -(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax}) -@end ifnottex -for all floating-point contexts. -There is no explicit mechanism to adjust the exponent range. -MPFR does not implement subnormal numbers by default, -and this behavior cannot be changed in @command{gawk}. - -@quotation NOTE -When emulating an IEEE-754 format (@pxref{Setting Precision}), -@command{gawk} internally adjusts the exponent range -to the value defined for the format and also performs computations needed for -gradual underflow (subnormal numbers). 
-@end quotation - -@quotation NOTE -MPFR numbers are variable-size entities, consuming only as much space as -needed to store the significant digits. Since the performance using MPFR -numbers pales in comparison to doing arithmetic using the underlying machine -types, you should consider using only as much precision as needed by -your program. -@end quotation - -@menu -* Setting Precision:: Setting the working precision. -* Setting Rounding Mode:: Setting the rounding mode. -* Floating-point Constants:: Representing floating-point constants. -* Changing Precision:: Changing the precision of a number. -* Exact Arithmetic:: Exact arithmetic with floating-point numbers. -@end menu - -@node Setting Precision -@subsection Setting the Working Precision -@cindex @code{PREC} variable - -@command{gawk} uses a global working precision; it does not keep track of -the precision or accuracy of individual numbers. Performing an arithmetic -operation or calling a built-in function rounds the result to the current -working precision. The default working precision is 53 which can be -modified using the built-in variable @code{PREC}. You can also set the -value to one of the following pre-defined case-insensitive strings -to emulate an IEEE-754 binary format: - -@multitable {@code{"double"}} {12345678901234567890123456789012345} -@headitem @code{PREC} @tab IEEE-754 Binary Format -@item @code{"half"} @tab 16-bit half-precision. -@item @code{"single"} @tab Basic 32-bit single precision. -@item @code{"double"} @tab Basic 64-bit double precision. -@item @code{"quad"} @tab Basic 128-bit quadruple precision. -@item @code{"oct"} @tab 256-bit octuple precision. 
-@end multitable - -The following example illustrates the effects of changing precision -on arithmetic operations: - -@example -$ @kbd{gawk -M -vPREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \} -> @kbd{PREC = "double"; print x + 0 @}'} -@print{} 1e-400 -@print{} 0 -@end example - -Binary and decimal precisions are related approximately according to the -formula: - -@iftex -@math{prec = 3.322 @cdot dps} -@end iftex -@ifnottex -@var{prec} = 3.322 * @var{dps} -@end ifnottex - -@noindent -Here, @var{prec} denotes the binary precision -(measured in bits) and @var{dps} (short for decimal places) -is the number of decimal digits. We can easily calculate how many decimal -digits the 53-bit significand of an IEEE double is equivalent to: -53 / 3.322 which is equal to about 15.95. -But what does 15.95 digits actually mean? It depends whether you are -concerned about how many digits you can rely on, or how many digits -you need. - -It is important to know how many bits it takes to uniquely identify -a double-precision value (the C type @code{double}). If you want to -convert from @code{double} to decimal and back to @code{double} (e.g., -saving a @code{double} representing an intermediate result to a file, and -later reading it back to restart the computation), then a few more decimal -digits are required. 17 digits is generally enough for a @code{double}. - -It can also be important to know what decimal numbers can be uniquely -represented with a @code{double}. If you want to convert -from decimal to @code{double} and back again, 15 digits is the most that -you can get. Stated differently, you should not present -the numbers from your floating-point computations with more than 15 -significant digits in them. - -Conversely, it takes a precision of 332 bits to hold an approximation -of the constant @value{PI} that is accurate to 100 decimal places.
-You should always add some extra bits in order to avoid the confusing round-off -issues that occur because numbers are stored internally in binary. - -@node Setting Rounding Mode -@subsection Setting the Rounding Mode -@cindex @code{ROUNDMODE} variable - -The @code{ROUNDMODE} variable provides -program-level control over the rounding mode. -The correspondence between @code{ROUNDMODE} and the IEEE -rounding modes is shown in @ref{table-gawk-rounding-modes}. - -@float Table,table-gawk-rounding-modes -@caption{@command{gawk} Rounding Modes} -@multitable @columnfractions .45 .30 .25 -@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE} -@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"} -@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"} -@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"} -@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"} -@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"} -@end multitable -@end float - -@code{ROUNDMODE} has the default value @code{"N"}, -which selects the IEEE-754 rounding mode @code{roundTiesToEven}. -Besides the values listed in @ref{table-gawk-rounding-modes}, -@command{gawk} also accepts @code{"A"} to select the IEEE-754 mode -@code{roundTiesToAway} -if your version of the MPFR library supports it; otherwise setting -@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode}, -for the meanings of the various rounding modes. - -Here is an example of how to change the default rounding behavior of -@code{printf}'s output: - -@example -$ @kbd{gawk -M -vROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'} -@print{} 1.37 -@end example - -@node Floating-point Constants -@subsection Representing Floating-point Constants -@cindex constants, floating-point - -Be wary of floating-point constants!
When reading a floating-point constant -from program source code, @command{gawk} uses the default precision, -unless overridden -by an assignment to the special variable @code{PREC} on the command -line, to store it internally as an MPFR number. -Changing the precision using @code{PREC} in the program text does -not change the precision of a constant. If you need to -represent a floating-point constant at a higher precision than the -default and cannot use a command line assignment to @code{PREC}, -you should either specify the constant as a string, or -as a rational number whenever possible. The following example -illustrates the differences among various ways to -print a floating-point constant: - -@example -$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'} -@print{} 0.1000000000000000055511151 -$ @kbd{gawk -M -vPREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'} -@print{} 0.1000000000000000000000000 -$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'} -@print{} 0.1000000000000000000000000 -$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'} -@print{} 0.1000000000000000000000000 -@end example - -In the first case, the number is stored with the default precision of 53. - -@node Changing Precision -@subsection Changing the Precision of a Number - -@cindex Laurie, Dirk -@quotation -@i{The point is that in any variable-precision package, -a decision is made on how to treat numbers given as data, -or arising in intermediate results, which are represented in -floating-point format to a precision lower than working precision. -Do we promote them to full membership of the high-precision club, -or do we treat them and all their associates as second-class citizens? -Sometimes the first course is proper, sometimes the second, and it takes -careful analysis to tell which.} - -Dirk Laurie@footnote{Dirk Laurie. -@cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}.
-Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.} -@end quotation - -@command{gawk} does not implicitly modify the precision of any previously -computed results when the working precision is changed with an assignment -to @code{PREC}. The precision of a number is always the one that was -used at the time of its creation, and there is no way for the user -to explicitly change it afterwards. However, since the result of a -floating-point arithmetic operation is always an arbitrary precision -floating-point value---with a precision set by the value of @code{PREC}---one of the -following workarounds effectively accomplishes the desired behavior: - -@example -x = x + 0.0 -@end example - -@noindent -or: - -@example -x += 0.0 -@end example - -@node Exact Arithmetic -@subsection Exact Arithmetic with Floating-point Numbers - -@quotation CAUTION -Never depend on the exactness of floating-point arithmetic, -even for apparently simple expressions! -@end quotation - -Can arbitrary precision arithmetic give exact results? There are -no easy answers. The standard rules of algebra often do not apply -when using floating-point arithmetic. -Among other things, the distributive and associative laws -do not hold completely, and order of operation may be important -for your computation. Rounding error, cumulative precision loss -and underflow are often troublesome. - -When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3} -for equality -using the machine double precision arithmetic, it decides that they -are not equal! -(@xref{Floating-point Programming}.) -You can get the result you want by increasing the precision; -56 in this case will get the job done: - -@example -$ @kbd{gawk -M -vPREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} -@print{} 1 -@end example - -If adding more bits is good, perhaps adding even more bits of -precision is better? 
-Here is what happens if we use an even larger value of @code{PREC}: - -@example -$ @kbd{gawk -M -vPREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} -@print{} 0 -@end example - -This is not a bug in @command{gawk} or in the MPFR library. -It is easy to forget that the finite number of bits used to store the value -is often just an approximation after proper rounding. -The test for equality succeeds if and only if @emph{all} bits in the two operands -are exactly the same. Since this is not necessarily true after floating-point -computations with a particular precision and effective rounding rule, -a straight test for equality may not work. - -So, don't assume that floating-point values can be compared for equality. -You should also exercise caution when using other forms of comparisons. -The standard way to compare between floating-point numbers is to determine -how much error (or @dfn{tolerance}) you will allow in a comparison and -check to see if one value is within this error range of the other. - -In applications where 15 or fewer decimal places suffice, -hardware double precision arithmetic can be adequate, and is usually much faster. -But you do need to keep in mind that every floating-point operation -can suffer a new rounding error with catastrophic consequences as illustrated -by our attempt to compute the value of the constant @value{PI} -(@pxref{Floating-point Programming}). -Extra precision can greatly enhance the stability and the accuracy -of your computation in such cases. - -Repeated addition is not necessarily equivalent to multiplication -in floating-point arithmetic. In the example in -@ref{Floating-point Programming}: - -@example -$ @kbd{gawk 'BEGIN @{} -> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} -> @kbd{i++} -> @kbd{print i} -> @kbd{@}'} -@print{} 4 -@end example - -@noindent -you may or may not succeed in getting the correct result by choosing -an arbitrarily large value for @code{PREC}. 
Reformulation of -the problem at hand is often the correct approach in such situations. - -@node Arbitrary Precision Integers -@section Arbitrary Precision Integer Arithmetic with @command{gawk} -@cindex integer, arbitrary precision - -If the option @option{--bignum} or @option{-M} is specified, -@command{gawk} performs all -integer arithmetic using GMP arbitrary precision integers. -Any number that looks like an integer in a program source or data file -is stored as an arbitrary precision integer. -The size of the integer is limited only by your computer's memory. -The current floating-point context has no effect on operations involving integers. -For example, the following computes -@iftex -@math{5^{4^{3^{2}}}}, -@end iftex -@ifnottex -5^4^3^2, -@end ifnottex -the result of which is beyond the -limits of ordinary @command{gawk} numbers: - -@example -$ @kbd{gawk -M 'BEGIN @{} -> @kbd{x = 5^4^3^2} -> @kbd{print "# of digits =", length(x)} -> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)} -> @kbd{@}'} -@print{} # of digits = 183231 -@print{} 62060698786608744707 ... 92256259918212890625 -@end example - -If you were to compute the same value using arbitrary precision -floating-point values instead, the precision needed for correct output -(using the formula -@iftex -@math{prec = 3.322 @cdot dps}), -would be @math{3.322 @cdot 183231}, -@end iftex -@ifnottex -@samp{prec = 3.322 * dps}), -would be 3.322 x 183231, -@end ifnottex -or 608693. -(Thus, the floating-point representation requires over 30 times as -many decimal digits!) - -The result from an arithmetic operation with an integer and a floating-point value -is a floating-point value with a precision equal to the working precision. -The following program calculates the eighth term in -Sylvester's sequence@footnote{Weisstein, Eric W. -@cite{Sylvester's Sequence}. From MathWorld---A Wolfram Web Resource. 
-@url{http://mathworld.wolfram.com/SylvestersSequence.html}} -using a recurrence: - -@example -$ @kbd{gawk -M 'BEGIN @{} -> @kbd{s = 2.0} -> @kbd{for (i = 1; i <= 7; i++)} -> @kbd{s = s * (s - 1) + 1} -> @kbd{print s} -> @kbd{@}'} -@print{} 113423713055421845118910464 -@end example - -The output differs from the actual number, 113423713055421844361000443, -because the default precision of 53 is not enough to represent the -floating-point results exactly. You can either increase the precision -(100 is enough in this case), or replace the floating-point constant -@samp{2.0} with an integer, to perform all computations using integer -arithmetic to get the correct output. - -It will sometimes be necessary for @command{gawk} to implicitly convert an -arbitrary precision integer into an arbitrary precision floating-point value. -This is primarily because the MPFR library does not always provide the -relevant interface to process arbitrary precision integers or mixed-mode -numbers as needed by an operation or function. -In such a case, the precision is set to the minimum value necessary -for exact conversion, and the working precision is not used for this purpose.
-If this is not what you need or want, you can employ a subterfuge -like this: - -@example -gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}' -@end example - -You can avoid this issue altogether by specifying the number as a floating-point value -to begin with: - -@example -gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}' -@end example - -Note that for the particular example above, it is likely best -to just use the following: - -@example -gawk -M 'BEGIN @{ n = 13; print n % 2 @}' -@end example - @node Advanced Features @chapter Advanced Features of @command{gawk} @cindex advanced features, network connections, See Also networks, connections @@ -27939,6 +26723,1229 @@ The @command{gawk} debugger only accepts source supplied with the @option{-f} op Look forward to a future release when these and other missing features may be added, and of course feel free to try to add them yourself! +@node Arbitrary Precision Arithmetic +@chapter Arithmetic and Arbitrary Precision Arithmetic with @command{gawk} +@cindex arbitrary precision +@cindex multiple precision +@cindex infinite precision +@cindex floating-point numbers, arbitrary precision +@cindex MPFR +@cindex GMP + +@cindex Knuth, Donald +@quotation +@i{There's a credibility gap: We don't know how much of the computer's answers +to believe. Novice computer users solve this problem by implicitly trusting +in the computer as an infallible authority; they tend to believe that all +digits of a printed answer are significant. Disillusioned computer users have +just the opposite approach; they are constantly afraid that their answers +are almost meaningless.}@* +Donald Knuth@footnote{Donald E.@: Knuth. +@cite{The Art of Computer Programming}. Volume 2, +@cite{Seminumerical Algorithms}, third edition, +1998, ISBN 0-201-89683-4, p.@: 229.} +@end quotation + +This @value{CHAPTER} discusses issues that you may encounter +when performing arithmetic.
It begins by discussing some of +the general attributes of computer arithmetic, along with how +this can influence what you see when running @command{awk} programs. +This discussion applies to all versions of @command{awk}. + +Then the discussion moves on to @dfn{arbitrary precision +arithmetic}, a feature which is specific to @command{gawk}. + +@menu +* General Arithmetic:: An introduction to computer arithmetic. +* Floating-point Programming:: Effective Floating-point Programming. +* Gawk and MPFR:: How @command{gawk} provides + arbitrary-precision arithmetic. +* Arbitrary Precision Floats:: Arbitrary Precision Floating-point Arithmetic + with @command{gawk}. +* Arbitrary Precision Integers:: Arbitrary Precision Integer Arithmetic with + @command{gawk}. +@end menu + +@node General Arithmetic +@section A General Description of Computer Arithmetic + +@cindex integers +@cindex floating-point, numbers +@cindex numbers, floating-point +Within computers, there are two kinds of numeric values: @dfn{integers} +and @dfn{floating-point}. +In school, integer values were referred to as ``whole'' numbers---that is, +numbers without any fractional part, such as 1, 42, or @minus{}17. +The advantage to integer numbers is that they represent values exactly. +The disadvantage is that their range is limited. On most systems, +this range is @minus{}2,147,483,648 to 2,147,483,647. +However, many systems now support a range from +@minus{}9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. + +@cindex unsigned integers +@cindex integers, unsigned +Integer values come in two flavors: @dfn{signed} and @dfn{unsigned}. +Signed values may be negative or positive, with the range of values just +described. +Unsigned values are always positive. On most systems, +the range is from 0 to 4,294,967,295. +However, many systems now support a range from +0 to 18,446,744,073,709,551,615. 
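The 32-bit limits quoted above are simply powers of two: a signed 32-bit integer runs from -(2^31) to 2^31 - 1, and an unsigned one from 0 to 2^32 - 1. The following is a minimal sketch, not part of the original manual, that recomputes them with any POSIX awk. The shell function name `int_ranges` is made up for illustration, and `%.0f` is used only to print the integral values in full.

```shell
# Hypothetical helper (name invented for this sketch): recompute the
# 32-bit integer limits from powers of two using plain awk arithmetic.
int_ranges() {
    awk 'BEGIN {
        # signed 32-bit: -(2^31) through 2^31 - 1
        printf "signed 32-bit: %.0f to %.0f\n", -(2^31), 2^31 - 1
        # unsigned 32-bit: 0 through 2^32 - 1
        printf "unsigned 32-bit: 0 to %.0f\n", 2^32 - 1
    }'
}

int_ranges
```

The 64-bit limits are deliberately left out: awk stores numbers as double precision floating-point values, which cannot hold a value such as 2^63 - 1 exactly.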
+ +@cindex double precision floating-point +@cindex single precision floating-point +Floating-point numbers represent what are called ``real'' numbers; i.e., +those that do have a fractional part, such as 3.1415927. +The advantage to floating-point numbers is that they +can represent a much larger range of values. +The disadvantage is that there are numbers that they cannot represent +exactly. +@command{awk} uses @dfn{double precision} floating-point numbers, which +can hold more digits than @dfn{single precision} +floating-point numbers. +@c Floating-point issues are discussed more fully in +@c @ref{Floating Point Issues}. + +There are several important issues to be aware of, described next. + +@menu +* Floating Point Issues:: Stuff to know about floating-point numbers. +* Integer Programming:: Effective integer programming. +@end menu + +@node Floating Point Issues +@subsection Floating-Point Number Caveats + +As mentioned earlier, floating-point numbers represent what are called +``real'' numbers, i.e., those that have a fractional part. @command{awk} +uses double precision floating-point numbers to represent all +numeric values. This @value{SECTION} describes some of the issues +involved in using floating-point numbers. + +There is a very nice +@uref{http://www.validlab.com/goldberg/paper.pdf, paper on floating-point arithmetic} +by David Goldberg, +``What Every Computer Scientist Should Know About Floating-point Arithmetic,'' +@cite{ACM Computing Surveys} @strong{23}, 1 (1991-03), 5-48. +This is worth reading if you are interested in the details, +but it does require a background in computer science. + +@menu +* String Conversion Precision:: The String Value Can Lie. +* Unexpected Results:: Floating Point Numbers Are Not Abstract + Numbers. +* POSIX Floating Point Problems:: Standards Versus Existing Practice. 
+@end menu + +@node String Conversion Precision +@subsubsection The String Value Can Lie + +Internally, @command{awk} keeps both the numeric value +(double precision floating-point) and the string value for a variable. +Separately, @command{awk} keeps +track of what type the variable has +(@pxref{Typing and Comparison}), +which plays a role in how variables are used in comparisons. + +It is important to note that the string value for a number may not +reflect the full value (all the digits) that the numeric value +actually contains. +The following program (@file{values.awk}) illustrates this: + +@example +@{ + sum = $1 + $2 + # see it for what it is + printf("sum = %.12g\n", sum) + # use CONVFMT + a = "<" sum ">" + print "a =", a + # use OFMT + print "sum =", sum +@} +@end example + +@noindent +This program shows the full value of the sum of @code{$1} and @code{$2} +using @code{printf}, and then prints the string values obtained +from both automatic conversion (via @code{CONVFMT}) and +from printing (via @code{OFMT}). + +Here is what happens when the program is run: + +@example +$ @kbd{echo 3.654321 1.2345678 | awk -f values.awk} +@print{} sum = 4.8888888 +@print{} a = <4.88889> +@print{} sum = 4.88889 +@end example + +This makes it clear that the full numeric value is different from +what the default string representations show. + +@code{CONVFMT}'s default value is @code{"%.6g"}, which yields a value with +at most six significant digits. For some applications, you might want to +change it to specify more precision. 
+On most modern machines, most of the time, +17 digits is enough to capture a floating-point number's +value exactly.@footnote{Pathological cases can require up to +752 digits (!), but we doubt that you need to worry about this.} + +@node Unexpected Results +@subsubsection Floating Point Numbers Are Not Abstract Numbers + +@cindex floating-point, numbers +Unlike numbers in the abstract sense (such as what you studied in high school +or college arithmetic), numbers stored in computers are limited in certain ways. +They cannot represent an infinite number of digits, nor can they always +represent things exactly. +In particular, +floating-point numbers cannot +always represent values exactly. Here is an example: + +@example +$ @kbd{awk '@{ printf("%010d\n", $1 * 100) @}'} +515.79 +@print{} 0000051579 +515.80 +@print{} 0000051579 +515.81 +@print{} 0000051580 +515.82 +@print{} 0000051582 +@kbd{@value{CTL}-d} +@end example + +@noindent +This shows that some values can be represented exactly, +whereas others are only approximated. This is not a ``bug'' +in @command{awk}, but simply an artifact of how computers +represent numbers. + +@quotation NOTE +It cannot be emphasized enough that the behavior just +described is fundamental to modern computers. You will +see this kind of thing happen in @emph{any} programming +language using hardware floating-point numbers. It is @emph{not} +a bug in @command{gawk}, nor is it something that can be ``just +fixed.'' +@end quotation + +@cindex negative zero +@cindex positive zero +@cindex zero@comma{} negative vs.@: positive +Another peculiarity of floating-point numbers on modern systems +is that they often have more than one representation for the number zero! +In particular, it is possible to represent ``minus zero'' as well as +regular, or ``positive'' zero. 
+ +This example shows that negative and positive zero are distinct values +when stored internally, but that they are in fact equal to each other, +as well as to ``regular'' zero: + +@example +$ @kbd{gawk 'BEGIN @{ mz = -0 ; pz = 0} +> @kbd{printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz} +> @kbd{printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0} +> @kbd{@}'} +@print{} -0 = -0, +0 = 0, (-0 == +0) -> 1 +@print{} mz == 0 -> 1, pz == 0 -> 1 +@end example + +It helps to keep this in mind should you process numeric data +that contains negative zero values; the fact that the zero is negative +is noted and can affect comparisons. + +@node POSIX Floating Point Problems +@subsubsection Standards Versus Existing Practice + +Historically, @command{awk} has converted any non-numeric looking string +to the numeric value zero, when required. Furthermore, the original +definition of the language and the original POSIX standards specified that +@command{awk} only understands decimal numbers (base 10), and not octal +(base 8) or hexadecimal numbers (base 16). + +Changes in the language of the +2001 and 2004 POSIX standards can be interpreted to imply that @command{awk} +should support additional features. These features are: + +@itemize @bullet +@item +Interpretation of floating point data values specified in hexadecimal +notation (@samp{0xDEADBEEF}). (Note: data values, @emph{not} +source code constants.) + +@item +Support for the special IEEE 754 floating point values ``Not A Number'' +(NaN), positive Infinity (``inf'') and negative Infinity (``@minus{}inf''). +In particular, the format for these values is as specified by the ISO 1999 +C standard, which ignores case and can allow machine-dependent additional +characters after the @samp{nan} and allow either @samp{inf} or @samp{infinity}. 
+@end itemize + +The first problem is that both of these are clear changes to historical +practice: + +@itemize @bullet +@item +The @command{gawk} maintainer feels that supporting hexadecimal floating +point values, in particular, is ugly, and was never intended by the +original designers to be part of the language. + +@item +Allowing completely alphabetic strings to have valid numeric +values is also a very severe departure from historical practice. +@end itemize + +The second problem is that the @code{gawk} maintainer feels that this +interpretation of the standard, which requires a certain amount of +``language lawyering'' to arrive at in the first place, was not even +intended by the standard developers. In other words, ``we see how you +got where you are, but we don't think that that's where you want to be.'' + +Recognizing the above issues, but attempting to provide compatibility +with the earlier versions of the standard, +the 2008 POSIX standard added explicit wording to allow, but not require, +that @command{awk} support hexadecimal floating point values and +special values for ``Not A Number'' and infinity. + +Although the @command{gawk} maintainer continues to feel that +providing those features is inadvisable, +nevertheless, on systems that support IEEE floating point, it seems +reasonable to provide @emph{some} way to support NaN and Infinity values. +The solution implemented in @command{gawk} is as follows: + +@itemize @bullet +@item +With the @option{--posix} command-line option, @command{gawk} becomes +``hands off.'' String values are passed directly to the system library's +@code{strtod()} function, and if it successfully returns a numeric value, +that is what's used.@footnote{You asked for it, you got it.} +By definition, the results are not portable across +different systems. 
They are also a little surprising: + +@example +$ @kbd{echo nanny | gawk --posix '@{ print $1 + 0 @}'} +@print{} nan +$ @kbd{echo 0xDeadBeef | gawk --posix '@{ print $1 + 0 @}'} +@print{} 3735928559 +@end example + +@item +Without @option{--posix}, @command{gawk} interprets the four strings +@samp{+inf}, +@samp{-inf}, +@samp{+nan}, +and +@samp{-nan} +specially, producing the corresponding special numeric values. +The leading sign acts as a signal to @command{gawk} (and the user) +that the value is really numeric. Hexadecimal floating point is +not supported (unless you also use @option{--non-decimal-data}, +which is @emph{not} recommended). For example: + +@example +$ @kbd{echo nanny | gawk '@{ print $1 + 0 @}'} +@print{} 0 +$ @kbd{echo +nan | gawk '@{ print $1 + 0 @}'} +@print{} nan +$ @kbd{echo 0xDeadBeef | gawk '@{ print $1 + 0 @}'} +@print{} 0 +@end example + +@command{gawk} does ignore case in the four special values. +Thus @samp{+nan} and @samp{+NaN} are the same. +@end itemize + +@node Integer Programming +@subsection Mixing Integers And Floating-point + +As has been mentioned already, @command{gawk} ordinarily uses hardware double +precision with 64-bit IEEE binary floating-point representation +for numbers on most systems. A large integer like 9007199254740997 +has a binary representation that, although finite, is more than 53 bits long; +it must also be rounded to 53 bits. +The biggest integer that can be stored in a C @code{double} is usually the same +as the largest possible value of a @code{double}. If your system @code{double} +is an IEEE 64-bit @code{double}, this largest possible value is an integer and +can be represented precisely. What more should one know about integers? + +If you want to know what is the largest integer, such that it and +all smaller integers can be stored in 64-bit doubles without losing precision, +then the answer is +@iftex +@math{2^{53}}. +@end iftex +@ifnottex +2^53. 
+@end ifnottex +The next representable number is the even number +@iftex +@math{2^{53} + 2}, +@end iftex +@ifnottex +2^53 + 2, +@end ifnottex +meaning it is unlikely that you will be able to make +@command{gawk} print +@iftex +@math{2^{53} + 1} +@end iftex +@ifnottex +2^53 + 1 +@end ifnottex +in integer format. +The range of integers exactly representable by a 64-bit double +is +@iftex +@math{[-2^{53}, 2^{53}]}. +@end iftex +@ifnottex +[@minus{}2^53, 2^53]. +@end ifnottex +If you ever see an integer outside this range in @command{gawk} +using 64-bit doubles, you have reason to be very suspicious about +the accuracy of the output. Here is a simple program with erroneous output: + +@example +$ @kbd{gawk 'BEGIN @{ i = 2^53 - 1; for (j = 0; j < 4; j++) print i + j @}'} +@print{} 9007199254740991 +@print{} 9007199254740992 +@print{} 9007199254740992 +@print{} 9007199254740994 +@end example + +The lesson is to not assume that any large integer printed by @command{gawk} +represents an exact result from your computation, especially if it wraps +around on your screen. + +@node Floating-point Programming +@section Understanding Floating-point Programming + +Numerical programming is an extensive area; if you need to develop +sophisticated numerical algorithms then @command{gawk} may not be +the ideal tool, and this documentation may not be sufficient. +@c FIXME: JOHN: Do you want to cite some actual books? +It might require digesting a book or two to really internalize how to compute +with ideal accuracy and precision +and the result often depends on the particular application. + +@quotation NOTE +A floating-point calculation's @dfn{accuracy} is how close it comes +to the real value. This is as opposed to the @dfn{precision}, which +usually refers to the number of bits used to represent the number +(see @uref{http://en.wikipedia.org/wiki/Accuracy_and_precision, +the Wikipedia article} for more information). 
+@end quotation + +There are two options for doing floating-point calculations: +hardware floating-point (as used by standard @command{awk} and +the default for @command{gawk}), and @dfn{arbitrary-precision} +floating-point, which is software based. This @value{CHAPTER} +aims to provide enough information to understand both, and then +will focus on @command{gawk}'s facilities for the latter.@footnote{If you +are interested in other tools that perform arbitrary precision arithmetic, +you may want to investigate the POSIX @command{bc} tool. See +@uref{http://pubs.opengroup.org/onlinepubs/009695399/utilities/bc.html, +the POSIX specification for it}, for more information.} + +Binary floating-point representations and arithmetic are inexact. +Simple values like 0.1 cannot be precisely represented using +binary floating-point numbers, and the limited precision of +floating-point numbers means that slight changes in +the order of operations or the precision of intermediate storage +can change the result. To make matters worse, with arbitrary precision +floating-point, you can set the precision before starting a computation, +but then you cannot be sure of the number of significant decimal places +in the final result. + +Sometimes, before you start to write any code, you should think more +about what you really want and what's really happening. Consider the +two numbers in the following example: + +@example +x = 0.875 # 1/2 + 1/4 + 1/8 +y = 0.425 +@end example + +Unlike the number in @code{y}, the number stored in @code{x} +is exactly representable +in binary since it can be written as a finite sum of one or +more fractions whose denominators are all powers of two. +When @command{gawk} reads a floating-point number from +program source, it automatically rounds that number to whatever +precision your machine supports. 
If you try to print the numeric +content of a variable using an output format string of @code{"%.17g"}, +it may not produce the same number as you assigned to it: + +@example +$ @kbd{gawk 'BEGIN @{ x = 0.875; y = 0.425} +> @kbd{ printf("%0.17g, %0.17g\n", x, y) @}'} +@print{} 0.875, 0.42499999999999999 +@end example + +Often the error is so small you do not even notice it, and if you do, +you can always specify how much precision you would like in your output. +Usually this is a format string like @code{"%.15g"}, which when +used in the previous example, produces an output identical to the input. + +Because the underlying representation can be a little bit off from the exact value, +comparing floating-point values to see if they are equal is generally not a good idea. +Here is an example where it does not work like you expect: + +@example +$ @kbd{gawk 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +The loss of accuracy during a single computation with floating-point numbers +usually isn't enough to worry about. However, if you compute a value +which is the result of a sequence of floating point operations, +the error can accumulate and greatly affect the computation itself. +Here is an attempt to compute the value of the constant +@value{PI} using one of its many series representations: + +@example +BEGIN @{ + x = 1.0 / sqrt(3.0) + n = 6 + for (i = 1; i < 30; i++) @{ + n = n * 2.0 + x = (sqrt(x * x + 1) - 1) / x + printf("%.15f\n", n * x) + @} +@} +@end example + +When run, the early errors propagating through later computations +cause the loop to terminate prematurely after an attempt to divide by zero. 
+ +@example +$ @kbd{gawk -f pi.awk} +@print{} 3.215390309173475 +@print{} 3.159659942097510 +@print{} 3.146086215131467 +@print{} 3.142714599645573 +@dots{} +@print{} 3.224515243534819 +@print{} 2.791117213058638 +@print{} 0.000000000000000 +@error{} gawk: pi.awk:6: fatal: division by zero attempted +@end example + +Here is one more example where the inaccuracies in internal representations +yield an unexpected result: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} +> @kbd{i++} +> @kbd{print i} +> @kbd{@}'} +@print{} 4 +@end example + +Can computation using arbitrary precision help with the previous examples? +If you are impatient to know, see +@ref{Exact Arithmetic}. + +Instead of arbitrary precision floating-point arithmetic, +often all you need is an adjustment of your logic +or a different order for the operations in your calculation. +The stability and the accuracy of the computation of the constant @value{PI} +in the previous example can be enhanced by using the following +simple algebraic transformation: + +@example +(sqrt(x * x + 1) - 1) / x = x / (sqrt(x * x + 1) + 1) +@end example + +@noindent +After making this change, the program does converge to +@value{PI} in under 30 iterations: + +@example +$ @kbd{gawk -f /tmp/pi2.awk} +@print{} 3.215390309173473 +@print{} 3.159659942097501 +@print{} 3.146086215131436 +@print{} 3.142714599645370 +@print{} 3.141873049979825 +@dots{} +@print{} 3.141592653589797 +@print{} 3.141592653589797 +@end example + +There is no need to be unduly suspicious about the results from +floating-point arithmetic. The lesson to remember is that +floating-point arithmetic is always more complex than the arithmetic using +pencil and paper. In order to take advantage of the power +of computer floating-point, you need to know its limitations +and work within them. 
For most casual use of floating-point arithmetic, +you will often get the expected result in the end if you simply round +the display of your final results to the correct number of significant +decimal digits. And, avoid presenting numerical data in a manner that +implies better precision than is actually the case. + +@menu +* Floating-point Representation:: Binary floating-point representation. +* Floating-point Context:: Floating-point context. +* Rounding Mode:: Floating-point rounding mode. +@end menu + +@node Floating-point Representation +@subsection Binary Floating-point Representation +@cindex IEEE-754 format + +Although floating-point representations vary from machine to machine, +the most commonly encountered representation is that defined by the +IEEE 754 Standard. An IEEE-754 format value has three components: + +@itemize @bullet +@item +A sign bit telling whether the number is positive or negative. + +@item +An @dfn{exponent} giving its order of magnitude, @var{e}. + +@item +A @dfn{significand}, @var{s}, +specifying the actual digits of the number. +@end itemize + +The value of the +number is then +@iftex +@math{s @cdot 2^e}. +@end iftex +@ifnottex +@var{s * 2^e}. +@end ifnottex +The first bit of a non-zero binary significand +is always one, so the significand in an IEEE-754 format only includes the +fractional part, leaving the leading one implicit. + +Three of the standard IEEE-754 types are 32-bit single precision, +64-bit double precision and 128-bit quadruple precision. +The standard also specifies extended precision formats +to allow greater precisions and larger exponent ranges. + +The significand is stored in @dfn{normalized} format, +which means that the first bit is always a one. + +@node Floating-point Context +@subsection Floating-point Context +@cindex context, floating-point + +A floating-point @dfn{context} defines the environment for arithmetic operations. 
+It governs precision, sets rules for rounding, and limits the range for exponents. +The context has the following primary components: + +@table @dfn +@item Precision +Precision of the floating-point format in bits. +@item emax +Maximum exponent allowed for this format. +@item emin +Minimum exponent allowed for this format. +@item Underflow behavior +The format may or may not support gradual underflow. +@item Rounding +The rounding mode of this context. +@end table + +@ref{table-ieee-formats} lists the precision and exponent +field values for the basic IEEE-754 binary formats: + +@float Table,table-ieee-formats +@caption{Basic IEEE Format Context Values} +@multitable @columnfractions .20 .20 .20 .20 .20 +@headitem Name @tab Total bits @tab Precision @tab emin @tab emax +@item Single @tab 32 @tab 24 @tab @minus{}126 @tab +127 +@item Double @tab 64 @tab 53 @tab @minus{}1022 @tab +1023 +@item Quadruple @tab 128 @tab 113 @tab @minus{}16382 @tab +16383 +@end multitable +@end float + +@quotation NOTE +The precision numbers include the implied leading one that gives them +one extra bit of significand. +@end quotation + +A floating-point context can also determine which signals are treated +as exceptions, and can set rules for arithmetic with special values. +Please consult the IEEE-754 standard or other resources for details. + +@command{gawk} ordinarily uses the hardware double precision +representation for numbers. On most systems, this is IEEE-754 +floating-point format, corresponding to 64-bit binary with 53 bits +of precision. + +@quotation NOTE +In case an underflow occurs, the standard allows, but does not require, +the result from an arithmetic operation to be a number smaller than +the smallest nonzero normalized number. Such numbers do +not have as many significant digits as normal numbers, and are called +@dfn{denormals} or @dfn{subnormals}. The alternative, simply returning a zero, +is called @dfn{flush to zero}. 
The basic IEEE-754 binary formats +support subnormal numbers. +@end quotation + +@node Rounding Mode +@subsection Floating-point Rounding Mode +@cindex rounding mode, floating-point + +The @dfn{rounding mode} specifies the behavior for the results of numerical +operations when discarding extra precision. Each rounding mode indicates +how the least significant returned digit of a rounded result is to +be calculated. +@ref{table-rounding-modes} lists the IEEE-754 defined +rounding modes: + +@float Table,table-rounding-modes +@caption{IEEE 754 Rounding Modes} +@multitable @columnfractions .45 .55 +@headitem Rounding Mode @tab IEEE Name +@item Round to nearest, ties to even @tab @code{roundTiesToEven} +@item Round toward plus Infinity @tab @code{roundTowardPositive} +@item Round toward negative Infinity @tab @code{roundTowardNegative} +@item Round toward zero @tab @code{roundTowardZero} +@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} +@end multitable +@end float + +The default mode @code{roundTiesToEven} is the most preferred, +but the least intuitive. This method does the obvious thing for most values, +by rounding them up or down to the nearest digit. +For example, rounding 1.132 to two digits yields 1.13, +and rounding 1.157 yields 1.16. + +However, when it comes to rounding a value that is exactly halfway between, +things do not work the way you probably learned in school. +In this case, the number is rounded to the nearest even digit. +So rounding 0.125 to two digits rounds down to 0.12, +but rounding 0.6875 to three digits rounds up to 0.688. +You probably have already encountered this rounding mode when +using the @code{printf} routine to format floating-point numbers. 
+For example: + +@example +BEGIN @{ + x = -4.5 + for (i = 1; i < 10; i++) @{ + x += 1.0 + printf("%4.1f => %2.0f\n", x, x) + @} +@} +@end example + +@noindent +produces the following output when run:@footnote{It +is possible for the output to be completely different if the +C library in your system does not use the IEEE-754 even-rounding +rule to round halfway cases for @code{printf()}.} + +@example +-3.5 => -4 +-2.5 => -2 +-1.5 => -2 +-0.5 => 0 + 0.5 => 0 + 1.5 => 2 + 2.5 => 2 + 3.5 => 4 + 4.5 => 4 +@end example + +The theory behind the rounding mode @code{roundTiesToEven} is that +it more or less evenly distributes upward and downward rounds +of exact halves, which might cause the round-off error +to cancel itself out. This is the default rounding mode used +in IEEE-754 computing functions and operators. + +The other rounding modes are rarely used. +Round toward positive infinity (@code{roundTowardPositive}) +and round toward negative infinity (@code{roundTowardNegative}) +are often used to implement interval arithmetic, +where you adjust the rounding mode to calculate upper and lower bounds +for the range of output. The @code{roundTowardZero} +mode can be used for converting floating-point numbers to integers. +The rounding mode @code{roundTiesToAway} rounds the result to the +nearest number and selects the number with the larger magnitude +if a tie occurs. + +Some numerical analysts will tell you that your choice of rounding style +has tremendous impact on the final outcome, and advise you to wait until +final output for any rounding. Instead, you can often avoid round-off error problems by +setting the precision initially to some value sufficiently larger than +the final desired precision, so that the accumulation of round-off error +does not influence the outcome. 
+If you suspect that results from your computation are +sensitive to accumulation of round-off error, +one way to be sure is to look for a significant difference in output +when you change the rounding mode. + +@node Gawk and MPFR +@section @command{gawk} + MPFR = Powerful Arithmetic + +The rest of this @value{CHAPTER} describes how to use the arbitrary precision +(also known as @dfn{multiple precision} or @dfn{infinite precision}) numeric +capabilities in @command{gawk} to produce maximally accurate results +when you need them. + +But first you should check if your version of +@command{gawk} supports arbitrary precision arithmetic. +The easiest way to find out is to look at the output of +the following command: + +@example +$ @kbd{gawk --version} +@print{} GNU Awk 4.1.0 (GNU MPFR 3.1.0, GNU MP 5.0.3) +@print{} Copyright (C) 1989, 1991-2012 Free Software Foundation. +@dots{} +@end example + +@command{gawk} uses the +@uref{http://www.mpfr.org, GNU MPFR} +and +@uref{http://gmplib.org, GNU MP} (GMP) +libraries for arbitrary precision +arithmetic on numbers. So if you do not see the names of these libraries +in the output, then your version of @command{gawk} does not support +arbitrary precision arithmetic. + +Additionally, +there are a few elements available in the @code{PROCINFO} array +to provide information about the MPFR and GMP libraries. +@xref{Auto-set}, for more information. + +@ignore +Even if you aren't interested in arbitrary precision arithmetic, you +may still benefit from knowing about how @command{gawk} handles numbers +in general, and the limitations of doing arithmetic with ordinary +@command{gawk} numbers. +@end ignore + + +@node Arbitrary Precision Floats +@section Arbitrary Precision Floating-point Arithmetic with @command{gawk} + +@command{gawk} uses the GNU MPFR library +for arbitrary precision floating-point arithmetic. 
The MPFR library +provides precise control over precisions and rounding modes, and gives +correctly rounded reproducible platform-independent results. With the +command-line option @option{--bignum} or @option{-M}, +all floating-point arithmetic operators and numeric functions can yield +results to any desired precision level supported by MPFR. +Two built-in +variables @code{PREC} +(@pxref{Setting Precision}) +and @code{ROUNDMODE} +(@pxref{Setting Rounding Mode}) +provide control over the working precision and the rounding mode. +The precision and the rounding mode are set globally for every operation +to follow. + +The default working precision for arbitrary precision floating-point values is 53, +and the default value for @code{ROUNDMODE} is @code{"N"}, +which selects the IEEE-754 +@code{roundTiesToEven} (@pxref{Rounding Mode}) rounding mode.@footnote{The +default precision is 53, since according to the MPFR documentation, +the library should be able to exactly reproduce all computations with +double-precision machine floating-point numbers (@code{double} type +in C), except the default exponent range is much wider and subnormal +numbers are not implemented.} +@command{gawk} uses the default exponent range in MPFR +@iftex +(@math{emax = 2^{30} - 1, emin = -emax}) +@end iftex +@ifnottex +(@var{emax} = 2^30 @minus{} 1, @var{emin} = @minus{}@var{emax}) +@end ifnottex +for all floating-point contexts. +There is no explicit mechanism to adjust the exponent range. +MPFR does not implement subnormal numbers by default, +and this behavior cannot be changed in @command{gawk}. + +@quotation NOTE +When emulating an IEEE-754 format (@pxref{Setting Precision}), +@command{gawk} internally adjusts the exponent range +to the value defined for the format and also performs computations needed for +gradual underflow (subnormal numbers). 
+@end quotation + +@quotation NOTE +MPFR numbers are variable-size entities, consuming only as much space as +needed to store the significant digits. Since the performance using MPFR +numbers pales in comparison to doing arithmetic using the underlying machine +types, you should consider using only as much precision as needed by +your program. +@end quotation + +@menu +* Setting Precision:: Setting the working precision. +* Setting Rounding Mode:: Setting the rounding mode. +* Floating-point Constants:: Representing floating-point constants. +* Changing Precision:: Changing the precision of a number. +* Exact Arithmetic:: Exact arithmetic with floating-point numbers. +@end menu + +@node Setting Precision +@subsection Setting the Working Precision +@cindex @code{PREC} variable + +@command{gawk} uses a global working precision; it does not keep track of +the precision or accuracy of individual numbers. Performing an arithmetic +operation or calling a built-in function rounds the result to the current +working precision. The default working precision is 53 which can be +modified using the built-in variable @code{PREC}. You can also set the +value to one of the following pre-defined case-insensitive strings +to emulate an IEEE-754 binary format: + +@multitable {@code{"double"}} {12345678901234567890123456789012345} +@headitem @code{PREC} @tab IEEE-754 Binary Format +@item @code{"half"} @tab 16-bit half-precision. +@item @code{"single"} @tab Basic 32-bit single precision. +@item @code{"double"} @tab Basic 64-bit double precision. +@item @code{"quad"} @tab Basic 128-bit quadruple precision. +@item @code{"oct"} @tab 256-bit octuple precision. 
+@end multitable + +The following example illustrates the effects of changing precision +on arithmetic operations: + +@example +$ @kbd{gawk -M -vPREC=100 'BEGIN @{ x = 1.0e-400; print x + 0; \} +> @kbd{PREC = "double"; print x + 0 @}'} +@print{} 1e-400 +@print{} 0 +@end example + +Binary and decimal precisions are related approximately according to the +formula: + +@iftex +@math{prec = 3.322 @cdot dps} +@end iftex +@ifnottex +@var{prec} = 3.322 * @var{dps} +@end ifnottex + +@noindent +Here, @var{prec} denotes the binary precision +(measured in bits) and @var{dps} (short for decimal places) +is the number of decimal digits. We can easily calculate how many decimal +digits the 53-bit significand of an IEEE double is equivalent to: +53 / 3.322, which is equal to about 15.95. +But what does 15.95 digits actually mean? It depends on whether you are +concerned about how many digits you can rely on, or how many digits +you need. + +It is important to know how many bits it takes to uniquely identify +a double-precision value (the C type @code{double}). If you want to +convert from @code{double} to decimal and back to @code{double} (e.g., +saving a @code{double} representing an intermediate result to a file, and +later reading it back to restart the computation), then a few more decimal +digits are required. 17 digits is generally enough for a @code{double}. + +It can also be important to know what decimal numbers can be uniquely +represented with a @code{double}. If you want to convert +from decimal to @code{double} and back again, 15 digits is the most that +you can get. Stated differently, you should not present +the numbers from your floating-point computations with more than 15 +significant digits in them. + +Conversely, it takes a precision of 332 bits to hold an approximation +of the constant @value{PI} that is accurate to 100 decimal places. 
+You should always add some extra bits in order to avoid the confusing round-off +issues that occur because numbers are stored internally in binary. + +@node Setting Rounding Mode +@subsection Setting the Rounding Mode +@cindex @code{ROUNDMODE} variable + +The @code{ROUNDMODE} variable provides +program-level control over the rounding mode. +The correspondence between @code{ROUNDMODE} and the IEEE +rounding modes is shown in @ref{table-gawk-rounding-modes}. + +@float Table,table-gawk-rounding-modes +@caption{@command{gawk} Rounding Modes} +@multitable @columnfractions .45 .30 .25 +@headitem Rounding Mode @tab IEEE Name @tab @code{ROUNDMODE} +@item Round to nearest, ties to even @tab @code{roundTiesToEven} @tab @code{"N"} or @code{"n"} +@item Round toward plus Infinity @tab @code{roundTowardPositive} @tab @code{"U"} or @code{"u"} +@item Round toward negative Infinity @tab @code{roundTowardNegative} @tab @code{"D"} or @code{"d"} +@item Round toward zero @tab @code{roundTowardZero} @tab @code{"Z"} or @code{"z"} +@item Round to nearest, ties away from zero @tab @code{roundTiesToAway} @tab @code{"A"} or @code{"a"} +@end multitable +@end float + +@code{ROUNDMODE} has the default value @code{"N"}, +which selects the IEEE-754 rounding mode @code{roundTiesToEven}. +Besides the values listed in @ref{table-gawk-rounding-modes}, +@command{gawk} also accepts @code{"A"} to select the IEEE-754 mode +@code{roundTiesToAway} +if your version of the MPFR library supports it; otherwise setting +@code{ROUNDMODE} to this value has no effect. @xref{Rounding Mode}, +for the meanings of the various rounding modes. + +Here is an example of how to change the default rounding behavior of +@code{printf}'s output: + +@example +$ @kbd{gawk -M -vROUNDMODE="Z" 'BEGIN @{ printf("%.2f\n", 1.378) @}'} +@print{} 1.37 +@end example + +@node Floating-point Constants +@subsection Representing Floating-point Constants +@cindex constants, floating-point + +Be wary of floating-point constants! 
When reading a floating-point constant +from program source code, @command{gawk} uses the default precision, +unless overridden +by an assignment to the special variable @code{PREC} on the command +line, to store it internally as an MPFR number. +Changing the precision using @code{PREC} in the program text does +not change the precision of a constant. If you need to +represent a floating-point constant at a higher precision than the +default and cannot use a command-line assignment to @code{PREC}, +you should either specify the constant as a string, or +as a rational number whenever possible. The following example +illustrates the differences among various ways to +print a floating-point constant: + +@example +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000055511151 +$ @kbd{gawk -M -vPREC=113 'BEGIN @{ printf("%0.25f\n", 0.1) @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", "0.1") @}'} +@print{} 0.1000000000000000000000000 +$ @kbd{gawk -M 'BEGIN @{ PREC = 113; printf("%0.25f\n", 1/10) @}'} +@print{} 0.1000000000000000000000000 +@end example + +In the first case, the number is stored with the default precision of 53. + +@node Changing Precision +@subsection Changing the Precision of a Number + +@cindex Laurie, Dirk +@quotation +@i{The point is that in any variable-precision package, +a decision is made on how to treat numbers given as data, +or arising in intermediate results, which are represented in +floating-point format to a precision lower than working precision. +Do we promote them to full membership of the high-precision club, +or do we treat them and all their associates as second-class citizens? +Sometimes the first course is proper, sometimes the second, and it takes +careful analysis to tell which.} + +Dirk Laurie@footnote{Dirk Laurie. +@cite{Variable-precision Arithmetic Considered Perilous --- A Detective Story}. 
+Electronic Transactions on Numerical Analysis. Volume 28, pp. 168-173, 2008.} +@end quotation + +@command{gawk} does not implicitly modify the precision of any previously +computed results when the working precision is changed with an assignment +to @code{PREC}. The precision of a number is always the one that was +used at the time of its creation, and there is no way for the user +to explicitly change it afterwards. However, since the result of a +floating-point arithmetic operation is always an arbitrary precision +floating-point value---with a precision set by the value of @code{PREC}---one of the +following workarounds effectively accomplishes the desired behavior: + +@example +x = x + 0.0 +@end example + +@noindent +or: + +@example +x += 0.0 +@end example + +@node Exact Arithmetic +@subsection Exact Arithmetic with Floating-point Numbers + +@quotation CAUTION +Never depend on the exactness of floating-point arithmetic, +even for apparently simple expressions! +@end quotation + +Can arbitrary precision arithmetic give exact results? There are +no easy answers. The standard rules of algebra often do not apply +when using floating-point arithmetic. +Among other things, the distributive and associative laws +do not hold completely, and order of operation may be important +for your computation. Rounding error, cumulative precision loss +and underflow are often troublesome. + +When @command{gawk} tests the expressions @samp{0.1 + 12.2} and @samp{12.3} +for equality +using the machine double precision arithmetic, it decides that they +are not equal! +(@xref{Floating-point Programming}.) +You can get the result you want by increasing the precision; +56 in this case will get the job done: + +@example +$ @kbd{gawk -M -vPREC=56 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 1 +@end example + +If adding more bits is good, perhaps adding even more bits of +precision is better? 
+Here is what happens if we use an even larger value of @code{PREC}: + +@example +$ @kbd{gawk -M -vPREC=201 'BEGIN @{ print (0.1 + 12.2 == 12.3) @}'} +@print{} 0 +@end example + +This is not a bug in @command{gawk} or in the MPFR library. +It is easy to forget that the finite number of bits used to store the value +is often just an approximation after proper rounding. +The test for equality succeeds if and only if @emph{all} bits in the two operands +are exactly the same. Since this is not necessarily true after floating-point +computations with a particular precision and effective rounding rule, +a straight test for equality may not work. + +So, don't assume that floating-point values can be compared for equality. +You should also exercise caution when using other forms of comparisons. +The standard way to compare between floating-point numbers is to determine +how much error (or @dfn{tolerance}) you will allow in a comparison and +check to see if one value is within this error range of the other. + +In applications where 15 or fewer decimal places suffice, +hardware double precision arithmetic can be adequate, and is usually much faster. +But you do need to keep in mind that every floating-point operation +can suffer a new rounding error with catastrophic consequences as illustrated +by our attempt to compute the value of the constant @value{PI} +(@pxref{Floating-point Programming}). +Extra precision can greatly enhance the stability and the accuracy +of your computation in such cases. + +Repeated addition is not necessarily equivalent to multiplication +in floating-point arithmetic. In the example in +@ref{Floating-point Programming}: + +@example +$ @kbd{gawk 'BEGIN @{} +> @kbd{for (d = 1.1; d <= 1.5; d += 0.1)} +> @kbd{i++} +> @kbd{print i} +> @kbd{@}'} +@print{} 4 +@end example + +@noindent +you may or may not succeed in getting the correct result by choosing +an arbitrarily large value for @code{PREC}. 
Reformulation of +the problem at hand is often the correct approach in such situations. + +@node Arbitrary Precision Integers +@section Arbitrary Precision Integer Arithmetic with @command{gawk} +@cindex integer, arbitrary precision + +If the option @option{--bignum} or @option{-M} is specified, +@command{gawk} performs all +integer arithmetic using GMP arbitrary precision integers. +Any number that looks like an integer in a program source or data file +is stored as an arbitrary precision integer. +The size of the integer is limited only by your computer's memory. +The current floating-point context has no effect on operations involving integers. +For example, the following computes +@iftex +@math{5^{4^{3^{2}}}}, +@end iftex +@ifnottex +5^4^3^2, +@end ifnottex +the result of which is beyond the +limits of ordinary @command{gawk} numbers: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{x = 5^4^3^2} +> @kbd{print "# of digits =", length(x)} +> @kbd{print substr(x, 1, 20), "...", substr(x, length(x) - 19, 20)} +> @kbd{@}'} +@print{} # of digits = 183231 +@print{} 62060698786608744707 ... 92256259918212890625 +@end example + +If you were to compute the same value using arbitrary precision +floating-point values instead, the precision needed for correct output +(using the formula +@iftex +@math{prec = 3.322 @cdot dps}), +would be @math{3.322 @cdot 183231}, +@end iftex +@ifnottex +@samp{prec = 3.322 * dps}), +would be 3.322 x 183231, +@end ifnottex +or 608693 bits. +(Thus, the floating-point representation needs more than three bits of +precision for each decimal digit of the result!) + +The result from an arithmetic operation with an integer and a floating-point value +is a floating-point value with a precision equal to the working precision. +The following program calculates the eighth term in +Sylvester's sequence@footnote{Weisstein, Eric W. +@cite{Sylvester's Sequence}. From MathWorld---A Wolfram Web Resource. 
+@url{http://mathworld.wolfram.com/SylvestersSequence.html}} +using a recurrence: + +@example +$ @kbd{gawk -M 'BEGIN @{} +> @kbd{s = 2.0} +> @kbd{for (i = 1; i <= 7; i++)} +> @kbd{s = s * (s - 1) + 1} +> @kbd{print s} +> @kbd{@}'} +@print{} 113423713055421845118910464 +@end example + +The output differs from the actual number, 113423713055421844361000443, +because the default precision of 53 is not enough to represent the +floating-point results exactly. You can either increase the precision +(100 is enough in this case), or replace the floating-point constant +@samp{2.0} with an integer, to perform all computations using integer +arithmetic to get the correct output. + +It will sometimes be necessary for @command{gawk} to implicitly convert an +arbitrary precision integer into an arbitrary precision floating-point value. +This is primarily because the MPFR library does not always provide the +relevant interface to process arbitrary precision integers or mixed-mode +numbers as needed by an operation or function. +In such a case, the precision is set to the minimum value necessary +for exact conversion, and the working precision is not used for this purpose. +If this is not what you need or want, you can employ a subterfuge +like this: + +@example +gawk -M 'BEGIN @{ n = 13; print (n + 0.0) % 2.0 @}' +@end example + +You can avoid this issue altogether by specifying the number as a floating-point value +to begin with: + +@example +gawk -M 'BEGIN @{ n = 13.0; print n % 2.0 @}' +@end example + +Note that for the particular example above, it is likely best +to just use the following: + +@example +gawk -M 'BEGIN @{ n = 13; print n % 2 @}' +@end example + @node Dynamic Extensions @chapter Writing Extensions for @command{gawk} |