Copyright (C) 1992, 1995, 1998, 2001, 2006, 2007 Free Software Foundation, Inc. Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved. -------------------------------------------------------------------------- Mon May 7 14:38:32 IDT 2007 ============================ Thanks to Andrew Schorr for some of the wording here. The Open Group awk specification says that a string value shall be converted to a numeric value using the ISO C standard atof() function. The awk spec is here: http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html And the ISO C standard says that atof() shall work the same way as strtod(), and strtod() must recognize the IEEE special values for NaN (not-a-number) and Infinity. As a result, more than one gawk user has come to the conclusion that gawk is not standard compliant since it does not support converting these special values. Similar reasoning has led other users to conclude that gawk must also support input data in the format of C99-style hexadecimal floating point numbers. I, Arnold Robbins, the gawk maintainer, feel that while these interpretations of the standard can be made, they are _unintended_ side effects of the change in the POSIX standard to refer to the ISO C99 standard everywhere. With resepct to hexadecimal floating point, historically, AWK has always supported the input of only decimal numbers. Adding such a facility is likely to break many programs, very badly, and is a very big departure from historical practice. With respect to "NaN" and "Inf" values, many people point out that Unix awk and mawk "support the standard" and print such values in their output. This is _accidental_ support, since these programs are relying on the system library version of strtod/atof to do the work. If compiled on a system without such library support, those programs do NOT support the special IEEE values. These special interpretations introduce a large departure from existing practice: # Thanks to Brian Kernighan for this great example $ nawk 'BEGIN { print "Nancy Reagan" + 0 }' nan This is just plain wrong. String values with no leading digits must continue to convert to numeric zero. The POSIX people have been asked to clarify these issues, and not recently, but have not, to my knowledge, done so. Further discussion is provided in the node "POSIX Floating Point Problems" in the gawk manual. ---------- Stuff below this line is of historical interest only ---------- March 2001: It looks like the revised 1003.2 standard will actually follow the rules given below. Hallelujah! October 1998: The 1003.2 work has been at a stand-still for ages. Who knows if or when a new revision will actually happen... August 1995: Although the published 1003.2 standard contained the incorrect comparison rules of 11.2 draft as described below, no actual implementation of awk (that I know of) actually used those rules. A revision of the 1003.2 standard is in progress, and in the May 1995 draft, the rules were fixed (based on my submissions for interpretation requests) to match the description given below. Thus, the next version of the standard will have a correct description of the comparison rules. June 1992: Right now, the numeric vs. string comparisons are screwed up in draft 11.2. What prompted me to check it out was the note in gnu.bug.utils which observed that gawk was doing the comparison $1 == "000" numerically. I think that we can agree that intuitively, this should be done as a string comparison. Version 2.13.2 of gawk follows the current POSIX draft. Following is how I (now) think this stuff should be done. 1. A numeric literal or the result of a numeric operation has the NUMERIC attribute. 2. A string literal or the result of a string operation has the STRING attribute. 3. Fields, getline input, FILENAME, ARGV elements, ENVIRON elements and the elements of an array created by split() that are numeric strings have the STRNUM attribute. Otherwise, they have the STRING attribute. Uninitialized variables also have the STRNUM attribute. 4. Attributes propagate across assignments, but are not changed by any use. (Although a use may cause the entity to acquire an additional value such that it has both a numeric and string value -- this leaves the attribute unchanged.) When two operands are compared, either string comparison or numeric comparison may be used, depending on the attributes of the operands, according to the following (symmetric) matrix: +---------------------------------------------- | STRING NUMERIC STRNUM --------+---------------------------------------------- | STRING | string string string | NUMERIC | string numeric numeric | STRNUM | string numeric numeric --------+---------------------------------------------- So, the following program should print all OKs. echo '0e2 0a 0 0b 0e2 0a 0 0b' | $AWK ' NR == 1 { num = 0 str = "0e2" print ++test ": " ( (str == "0e2") ? "OK" : "OOPS" ) print ++test ": " ( ("0e2" != 0) ? "OK" : "OOPS" ) print ++test ": " ( ("0" != $2) ? "OK" : "OOPS" ) print ++test ": " ( ("0e2" == $1) ? "OK" : "OOPS" ) print ++test ": " ( (0 == "0") ? "OK" : "OOPS" ) print ++test ": " ( (0 == num) ? "OK" : "OOPS" ) print ++test ": " ( (0 != $2) ? "OK" : "OOPS" ) print ++test ": " ( (0 == $1) ? "OK" : "OOPS" ) print ++test ": " ( ($1 != "0") ? "OK" : "OOPS" ) print ++test ": " ( ($1 == num) ? "OK" : "OOPS" ) print ++test ": " ( ($2 != 0) ? "OK" : "OOPS" ) print ++test ": " ( ($2 != $1) ? "OK" : "OOPS" ) print ++test ": " ( ($3 == 0) ? "OK" : "OOPS" ) print ++test ": " ( ($3 == $1) ? "OK" : "OOPS" ) print ++test ": " ( ($2 != $4) ? "OK" : "OOPS" ) # 15 } { a = "+2" b = 2 if (NR % 2) c = a + b print ++test ": " ( (a != b) ? "OK" : "OOPS" ) # 16 and 22 d = "2a" b = 2 if (NR % 2) c = d + b print ++test ": " ( (d != b) ? "OK" : "OOPS" ) print ++test ": " ( (d + 0 == b) ? "OK" : "OOPS" ) e = "2" print ++test ": " ( (e == b "") ? "OK" : "OOPS" ) a = "2.13" print ++test ": " ( (a == 2.13) ? "OK" : "OOPS" ) a = "2.130000" print ++test ": " ( (a != 2.13) ? "OK" : "OOPS" ) if (NR == 2) { CONVFMT = "%.6f" print ++test ": " ( (a == 2.13) ? "OK" : "OOPS" ) } }'